

Stable Video Portraits
Overview
Stable Video Portraits is a hybrid 2D/3D generation method that combines a pre-trained text-to-image model (2D) with a 3D morphable face model (3D) to generate realistic, dynamic face videos. It lifts a generic 2D Stable Diffusion model to a video model through person-specific fine-tuning, conditioned on a time series of 3D face reconstructions, and introduces a temporal denoising process that produces temporally smooth facial images. The resulting avatar can be edited and morphed into text-defined celebrity likenesses without additional fine-tuning at test time, and the method outperforms existing monocular head avatar methods in both quantitative and qualitative evaluations.
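The core idea can be illustrated with a short sketch: a frozen text-to-image backbone receives a per-frame rendering of the tracked 3D face model through a ControlNet branch. This is not the authors' code; it is a minimal illustration using the Hugging Face diffusers library, a generic public ControlNet checkpoint as a stand-in for the person-specific ControlNets, and a hypothetical file name for the rendered 3DMM condition.

```python
# Illustrative sketch only: condition a frozen Stable Diffusion model on a rendered
# 3D face condition image via a ControlNet branch. Public checkpoints stand in for
# the person-specific models described in this article.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-seg", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Hypothetical conditioning image: the 3DMM mesh / face parsing map rendered for one frame.
condition = Image.open("frame_0001_3dmm_render.png")

frame = pipe(
    "a photo of the subject",   # person-specific prompt after fine-tuning
    image=condition,            # per-frame 3D shape condition
    num_inference_steps=30,
).images[0]
frame.save("frame_0001_generated.png")
```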
Target Users
The target audience includes, but is not limited to, computer vision researchers, AI developers, digital media artists, and film and game producers. Because of its realistic dynamic face generation, Stable Video Portraits is particularly well suited to professionals who create virtual characters or facial animation.
Use Cases
In film production, used to generate realistic virtual characters.
In game development, utilized to create highly realistic NPC facial animations.
In the digital arts field, artists use this technology to create unique artistic works.
Features
Extract, for each frame of the input video, a 3D face reconstruction (3DMM), a face parsing map (FPM), and iris positions, using off-the-shelf 3D face reconstruction methods, a face parsing model, and MediaPipe.
Train two ControlNets in parallel and apply them in two stages: the first stage generates temporally stable facial contours, and the second stage fills in interior details, yielding a realistic personalized avatar.
Further morph the personalized avatar into celebrities using text without additional fine-tuning.
Modify DDIM inference at step t = τ using the prediction from the previous frame to account for temporal continuity, ensuring smooth output (see the sketch after this list).
Enable facial morphing features that can transform personalized avatars into specific celebrities, such as Scarlett Johansson or Emma Watson, while maintaining head pose consistency.
Outperform current monocular head avatar methods in both quantitative and qualitative comparisons.
Analyze the effects of morphing factors, input controls, and denoising process variables on the results through ablation studies.
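The temporal denoising feature mentioned above can be sketched as follows. This is a simplified illustration rather than the paper's exact procedure: it assumes a diffusers-style UNet supplied by the caller and a DDIM scheduler, and the step index tau_index and blending weight mix are hypothetical knobs.

```python
# Minimal sketch (not the official implementation): at one DDIM step t = tau, the
# previous frame's denoised latent is re-noised to the current noise level and
# blended into the current trajectory so consecutive frames stay temporally smooth.
import torch
from diffusers import DDIMScheduler

scheduler = DDIMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(50)

def denoise_frame(unet, latents, cond_embeds, prev_clean_latents=None,
                  tau_index=25, mix=0.5):
    """DDIM sampling for one frame, with a temporal blend at step index tau_index."""
    for i, t in enumerate(scheduler.timesteps):
        if prev_clean_latents is not None and i == tau_index:
            # Re-noise the previous frame's clean latent to the current timestep
            # and mix it with the current latent (mix is a hypothetical weight).
            noise = torch.randn_like(prev_clean_latents)
            prev_noisy = scheduler.add_noise(prev_clean_latents, noise, t)
            latents = mix * prev_noisy + (1.0 - mix) * latents
        noise_pred = unet(latents, t, encoder_hidden_states=cond_embeds).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    return latents
```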
How to Use
1. Visit the official website of Stable Video Portraits.
2. Read the research papers and method overviews related to this technology.
3. Download and install the required software and libraries.
4. Prepare the input video, ensuring that the video quality meets the requirements for 3D facial reconstruction.
5. Use 3D facial reconstruction methods, a face parsing model, and MediaPipe to extract the 3DMM, FPM, and iris positions from the video (a minimal preprocessing sketch follows this list).
6. Train the two ControlNets to generate contours and internal details.
7. Utilize a temporal denoising process to generate temporally smooth video output.
8. If necessary, adjust the facial features of the personalized avatar through text input to match the appearance of a specific celebrity.
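As an example of the preprocessing in step 5, the iris positions can be tracked per frame with MediaPipe Face Mesh; the 3DMM and face parsing steps rely on separate models and are omitted here. This is only a sketch, and the input video path is hypothetical.

```python
# Sketch of the iris-tracking part of preprocessing (step 5). With
# refine_landmarks=True, MediaPipe Face Mesh returns 478 landmarks per face;
# indices 468-477 cover the two irises (centers at indices 468 and 473).
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(
    static_image_mode=False, max_num_faces=1, refine_landmarks=True
)

cap = cv2.VideoCapture("input_video.mp4")  # hypothetical input path
iris_tracks = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        landmarks = results.multi_face_landmarks[0].landmark
        # Normalized (x, y) coordinates of the ten iris landmarks for this frame.
        iris_tracks.append([(lm.x, lm.y) for lm in landmarks[468:478]])
    else:
        iris_tracks.append(None)  # no face detected in this frame
cap.release()
face_mesh.close()
```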