

GAIA
Overview:
GAIA (Generative AI for Avatar) synthesizes natural conversational avatar videos from speech audio and a single portrait image, eliminating domain priors from conversational avatar generation. GAIA consists of two stages: 1) decomposing each video frame into separate motion and appearance representations with a variational autoencoder (VAE); 2) generating a motion sequence conditioned on the speech and a reference portrait image with a diffusion model. The diffusion model is optimized to generate motion sequences conditioned on a speech sequence and randomly sampled frames from a video clip, so the generated motion follows the audio while the reference frame supplies identity and appearance. The model was trained at several scales on a large-scale, high-quality conversational avatar dataset collected for this work, and experimental results validate GAIA's superiority, scalability, and flexibility. GAIA supports applications such as controllable conversational avatar generation and text-guided avatar generation.
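The two-stage design above can be pictured roughly as follows. This is a minimal, hypothetical sketch assuming PyTorch, invented tensor sizes, and placeholder module names (FrameVAE, MotionDenoiser); it is not the released GAIA implementation, only an illustration of the data flow: a VAE splits frames into appearance and motion latents, and a denoising network predicts a motion sequence from speech features and the motion latent of a reference frame.

```python
# Hypothetical sketch of GAIA's two-stage structure (assumed shapes and names).
import torch
import torch.nn as nn


class FrameVAE(nn.Module):
    """Stage 1: encode a frame into separate appearance and motion latents."""

    def __init__(self, frame_dim=3 * 64 * 64, app_dim=256, motion_dim=64):
        super().__init__()
        self.appearance_enc = nn.Sequential(nn.Flatten(), nn.Linear(frame_dim, app_dim))
        self.motion_enc = nn.Sequential(nn.Flatten(), nn.Linear(frame_dim, motion_dim))
        self.decoder = nn.Linear(app_dim + motion_dim, frame_dim)

    def forward(self, frame):
        appearance = self.appearance_enc(frame)   # identity / texture latent
        motion = self.motion_enc(frame)           # pose / expression latent
        recon = self.decoder(torch.cat([appearance, motion], dim=-1)).view(frame.shape)
        return recon, appearance, motion


class MotionDenoiser(nn.Module):
    """Stage 2: stand-in for the diffusion model's denoising network, which
    predicts a motion-latent sequence from speech features, conditioned on the
    motion latent of a randomly sampled reference frame."""

    def __init__(self, speech_dim=80, motion_dim=64, hidden=256):
        super().__init__()
        self.net = nn.GRU(speech_dim + 2 * motion_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, motion_dim)

    def forward(self, noisy_motion, speech, ref_motion):
        # Concatenate the noisy motion sequence, per-frame speech features, and
        # the broadcast reference motion latent, then predict the denoised motion.
        ref = ref_motion.unsqueeze(1).expand(-1, speech.shape[1], -1)
        h, _ = self.net(torch.cat([noisy_motion, speech, ref], dim=-1))
        return self.head(h)


if __name__ == "__main__":
    vae = FrameVAE()
    denoiser = MotionDenoiser()
    frames = torch.randn(2, 3, 64, 64)        # reference portrait frames
    speech = torch.randn(2, 25, 80)           # 25 steps of speech features
    _, _, ref_motion = vae(frames)
    noisy = torch.randn(2, 25, 64)            # noised motion sequence
    print(denoiser(noisy, speech, ref_motion).shape)  # torch.Size([2, 25, 64])
```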
Target Users:
Researchers and developers working on AI/ML technologies who need to generate natural conversational video avatars.
Use Cases
Voice-Driven Conversational Avatar Generation
Video-Driven Conversational Avatar Generation
Text-Guided Avatar Generation
Features
Voice-Driven Conversational Avatar Generation
Video-Driven Conversational Avatar Generation
Pose-Controllable Conversational Avatar Generation
Fully Controllable Conversational Avatar Generation
Text-Guided Avatar Generation (see the usage sketch below)
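The features above differ mainly in which control signal drives the generated motion. The following is a hedged, purely illustrative wrapper showing how those signals might be supplied to a single entry point; AvatarRequest and generate_avatar_video are invented names and not part of any released GAIA API.

```python
# Hypothetical dispatch over the control signals listed in the features above.
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AvatarRequest:
    portrait: str                                    # path to the reference portrait image
    audio: Optional[str] = None                      # voice-driven generation
    driving_video: Optional[str] = None              # video-driven generation
    pose_sequence: Optional[Sequence[float]] = None  # pose-controllable generation
    text_prompt: Optional[str] = None                # text-guided generation


def generate_avatar_video(req: AvatarRequest) -> str:
    """Pick a generation mode from the supplied signals and return an output path."""
    if not (req.audio or req.driving_video or req.text_prompt):
        raise ValueError("Provide at least one driving signal: audio, video, or text.")
    # In a real pipeline each branch would produce a motion-latent sequence that a
    # shared renderer turns into frames; here we only report the chosen mode.
    mode = (
        "fully-controllable" if req.audio and req.pose_sequence
        else "voice-driven" if req.audio
        else "video-driven" if req.driving_video
        else "text-guided"
    )
    return f"output_{mode}.mp4"


print(generate_avatar_video(AvatarRequest(portrait="face.png", audio="speech.wav")))
```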