

Lumiere
Overview
Lumiere is a text-to-video diffusion model designed to synthesize videos that portray realistic, diverse, and coherent motion, a pivotal challenge in video synthesis. We introduce a Space-Time U-Net architecture that generates the entire temporal duration of a video in a single pass through the model. This contrasts with existing video models, which synthesize distant keyframes and then perform temporal super-resolution, an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and, importantly, temporal down- and up-sampling, and by leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art results in text-to-video generation and show that our design readily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.
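To make the idea of joint spatial and temporal down- and up-sampling concrete, below is a minimal PyTorch sketch of a toy space-time U-Net stage. This is not the Lumiere implementation: the class names, channel counts, and sampling factors are illustrative assumptions. It only shows how a single forward pass can process every frame of a clip at a coarse space-time resolution and then upsample back to the full frame rate and resolution.

```python
# Illustrative sketch of a space-time U-Net stage (not the official Lumiere code).
# The key idea: both the spatial (H, W) and temporal (T) axes are downsampled
# and later upsampled, so the whole clip is processed in one pass.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpaceTimeDownBlock(nn.Module):
    """Downsamples a video tensor (B, C, T, H, W) in both space and time."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # stride (2, 2, 2) halves the temporal length and the spatial resolution
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.silu(self.conv(x))


class SpaceTimeUpBlock(nn.Module):
    """Upsamples back to the finer space-time resolution and fuses a skip connection."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # nearest-neighbor upsampling over (T, H, W), then match channels and add the skip
        x = F.interpolate(x, size=skip.shape[-3:], mode="nearest")
        x = self.conv(x)
        return F.silu(x + skip)


class TinySpaceTimeUNet(nn.Module):
    """A two-level toy U-Net that processes the entire clip at once."""

    def __init__(self, channels: int = 16):
        super().__init__()
        self.stem = nn.Conv3d(3, channels, kernel_size=3, padding=1)
        self.down = SpaceTimeDownBlock(channels, channels * 2)
        self.mid = nn.Conv3d(channels * 2, channels * 2, kernel_size=3, padding=1)
        self.up = SpaceTimeUpBlock(channels * 2, channels)
        self.head = nn.Conv3d(channels, 3, kernel_size=3, padding=1)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        h0 = F.silu(self.stem(video))   # (B, C, T, H, W)
        h1 = self.down(h0)              # (B, 2C, T/2, H/2, W/2): coarse space-time scale
        h1 = F.silu(self.mid(h1))
        h = self.up(h1, h0)             # back to (B, C, T, H, W)
        return self.head(h)             # e.g. predicted noise for all frames at once


if __name__ == "__main__":
    clip = torch.randn(1, 3, 16, 64, 64)    # one 16-frame, 64x64 clip
    out = TinySpaceTimeUNet()(clip)
    print(out.shape)                        # torch.Size([1, 3, 16, 64, 64])
```

In this toy setup, the temporal axis is halved alongside the spatial axes at the coarse level, which is the property the overview highlights: the network reasons about the full clip duration at once rather than stitching distant keyframes together afterwards.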
Target Users
Suitable for text-to-video synthesis, image-to-video generation, video inpainting, stylized generation, and other content creation and video editing applications.
Use Cases
Text-to-video synthesis
Image-to-video generation
Video inpainting
Features
Synthesize videos that exhibit realistic, diverse, and coherent motion
Generate the entire temporal duration of a video in a single pass through the model
Readily facilitate a variety of content creation tasks and video editing applications