

Snap Video
Overview:
Snap Video is a video-first model that systematically addresses the challenges of motion fidelity, visual quality, and scalability in video generation by extending the EDM diffusion framework to video. Exploiting redundancy between frames, it introduces a scalable transformer architecture that compresses the spatial and temporal dimensions into a single, highly compressed 1D latent representation, enabling effective joint modeling of space and time. As a result, the model synthesizes videos with strong temporal coherence and complex motion, trains efficiently at billions of parameters, and achieves state-of-the-art results on multiple benchmarks.
Target Users:
Snap Video handles a variety of text-to-video tasks, such as story videos, commercial advertisements, and course demonstrations, enabling automatic generation of video content.
Use Cases
Generate a video related to the advertisement 'White Rabbit Cream Candy, delicious and safe'.
Generate a short video of 'A cat chasing a butterfly'.
Generate a New Year greeting video based on a festive greeting phrase.
Features
Extends EDM framework to support video generation
Proposes a scalable transformer architecture
Joint modeling of space and time
Synthesis of high-quality and temporally coherent videos
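The EDM extension listed above can be illustrated with a minimal, hypothetical sketch of EDM-style deterministic Euler sampling over a flattened 1D spatiotemporal latent. The noise schedule and update rule follow the standard EDM formulation; the denoiser here is a toy stand-in, not Snap Video's actual transformer, and all names and dimensions are illustrative assumptions:

```python
import numpy as np

def edm_sigmas(n_steps=10, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """EDM (Karras et al.) noise schedule, from sigma_max down to sigma_min."""
    i = np.arange(n_steps)
    return (sigma_max ** (1 / rho)
            + i / (n_steps - 1)
            * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho

def toy_denoiser(x, sigma):
    """Hypothetical stand-in for the video transformer: shrinks toward zero."""
    return x / (1.0 + sigma ** 2)

def edm_euler_sample(denoiser, latent_dim, n_steps=10, seed=0):
    """Deterministic Euler sampler over a single flattened 1D latent."""
    rng = np.random.default_rng(seed)
    sigmas = np.append(edm_sigmas(n_steps), 0.0)  # end at sigma = 0
    x = rng.standard_normal(latent_dim) * sigmas[0]  # start from pure noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoiser(x, sigma)) / sigma   # estimated noise direction
        x = x + (sigma_next - sigma) * d       # Euler step toward less noise
    return x

# In Snap Video, video frames would be jointly encoded into one compressed
# 1D latent; here we simply sample an assumed 256-dimensional latent.
sample = edm_euler_sample(toy_denoiser, latent_dim=256)
print(sample.shape)
```

Because space and time share one latent sequence, a single pass of the denoiser updates all frames jointly, which is what gives the joint space-time modeling described above.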