

Audio To Photoreal Embodiment
Overview
Audio to Photoreal Embodiment is a framework for generating full-body photorealistic avatars. It produces diverse poses and motions of the face, body, and hands from conversational dynamics. The key to the method lies in combining the sample diversity of vector quantization with the high-frequency detail obtained from diffusion, resulting in more dynamic and expressive motion. The photorealistic avatars used to visualize the motion can express subtle nuances in gesture (e.g., a sneer or a smirk). To advance this research direction, we introduce a novel multi-view conversational dataset that enables photorealistic reconstruction. Experiments demonstrate that our model generates appropriate and diverse gestures, outperforming both diffusion-only and vector-quantization-only methods. Furthermore, our perceptual evaluation highlights the importance of photorealism (compared to meshes) for accurately assessing subtle gestural details in conversational motion. Code and dataset are available online.
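The method described above can be pictured as a coarse-to-fine, two-stage generator: a vector-quantized stage proposes diverse coarse poses from the audio, and a diffusion stage refines them into high-frequency motion. The PyTorch sketch below illustrates only this idea under assumed interfaces; the class names (AudioToCoarsePoseVQ, PoseRefinementDenoiser), the feature dimensions (e.g., pose_dim=104), and the conditioning scheme are illustrative assumptions, not the released implementation.

```python
# Illustrative coarse-to-fine sketch (assumed design, not the released code):
# a VQ stage proposes diverse coarse poses from audio; a diffusion denoiser
# refines them, conditioned on the quantized guidance plus the raw audio.
import torch
import torch.nn as nn


class AudioToCoarsePoseVQ(nn.Module):
    """Map audio features to discrete coarse-pose codes (dimensions assumed)."""

    def __init__(self, audio_dim=128, codebook_size=256, code_dim=64):
        super().__init__()
        self.encoder = nn.GRU(audio_dim, code_dim, batch_first=True)
        self.codebook = nn.Embedding(codebook_size, code_dim)

    def forward(self, audio_feats):                      # (B, T, audio_dim)
        z, _ = self.encoder(audio_feats)                 # (B, T, code_dim)
        # Nearest-codebook lookup yields discrete, diverse coarse poses.
        dists = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        codes = dists.argmin(dim=-1)                     # (B, T)
        return self.codebook(codes)                      # (B, T, code_dim)


class PoseRefinementDenoiser(nn.Module):
    """Predict diffusion noise for a pose sequence given audio + VQ guidance."""

    def __init__(self, pose_dim=104, cond_dim=64 + 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim + cond_dim + 1, 512),
            nn.SiLU(),
            nn.Linear(512, pose_dim),
        )

    def forward(self, noisy_pose, t, cond):
        # noisy_pose: (B, T, pose_dim), t: (B,), cond: (B, T, cond_dim)
        B, T, _ = noisy_pose.shape
        t_emb = t.float().view(B, 1, 1).expand(B, T, 1)  # per-frame timestep
        return self.net(torch.cat([noisy_pose, cond, t_emb], dim=-1))


if __name__ == "__main__":
    B, T = 2, 100
    audio = torch.randn(B, T, 128)                       # e.g. audio features
    guide = AudioToCoarsePoseVQ()(audio)                 # coarse, diverse poses
    cond = torch.cat([guide, audio], dim=-1)             # guidance + raw audio
    noisy = torch.randn(B, T, 104)                       # noised pose sequence
    t = torch.randint(0, 1000, (B,))                     # diffusion timesteps
    eps_hat = PoseRefinementDenoiser()(noisy, t, cond)
    print(eps_hat.shape)                                 # torch.Size([2, 100, 104])
```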
Target Users
Researchers and developers building voice chat, virtual reality, and online education applications that need photorealistic, full-body conversational avatars.
Use Cases
Generate realistic avatars for voice chat applications
Generate realistic avatars for virtual reality environments
Generate realistic avatars for online education platforms
Features
Generate diverse poses and movements of full-body avatars based on audio input
Utilize vector quantization and diffusion techniques to create dynamic and expressive movements
Visualize generated movements using highly realistic avatars
Featured AI Tools

Sora
AI video generation
17.0M

Animate Anyone
Animate Anyone aims to generate character videos from static images driven by driving signals. Leveraging the power of diffusion models, we propose a novel framework tailored for character animation. To preserve the consistency of intricate appearance features from the reference image, we design ReferenceNet to merge detailed features via spatial attention. To ensure controllability and continuity, we introduce an efficient pose guidance module to direct character movements and adopt an effective temporal modeling approach for smooth transitions between video frames. By expanding the training data, our method can animate arbitrary characters, achieving superior results in character animation compared with other image-to-video approaches. Moreover, we evaluate our method on benchmarks for fashion video and human dance synthesis, achieving state-of-the-art results.
AI video generation
11.4M