

Audio To Photoreal Embodiment
Overview
Audio to Photoreal Embodiment is a framework for generating full-body photorealistic avatars. It produces diverse poses and motions of the face, body, and hands from conversational dynamics. The key to the method lies in combining the sample diversity of vector quantization with the high-frequency detail obtained from diffusion, resulting in more dynamic and expressive motion. The photorealistic avatars used to visualize the motion can express subtle nuances in gesture (e.g., a sneer or a smirk). To advance this research direction, we introduce a novel multi-view conversational dataset that enables photorealistic reconstruction. Experiments demonstrate that our model generates appropriate and diverse gestures, outperforming both diffusion-only and vector-quantization-only methods. Furthermore, our perceptual evaluation highlights the importance of photorealism (compared to meshes) for accurately assessing subtle gestural details in conversational motion. Code and dataset are available online.
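The method described above can be pictured as a coarse-to-fine, two-stage generator: a vector-quantized stage proposes diverse coarse poses from the audio, and a diffusion stage refines them into high-frequency motion. The PyTorch sketch below illustrates only this idea under assumed interfaces; the class names (AudioToCoarsePoseVQ, PoseRefinementDenoiser), the feature dimensions (e.g., pose_dim=104), and the conditioning scheme are illustrative assumptions, not the released implementation.

```python
# Illustrative coarse-to-fine sketch (assumed design, not the released code):
# a VQ stage proposes diverse coarse poses from audio; a diffusion denoiser
# refines them, conditioned on the quantized guidance plus the raw audio.
import torch
import torch.nn as nn


class AudioToCoarsePoseVQ(nn.Module):
    """Map audio features to discrete coarse-pose codes (dimensions assumed)."""

    def __init__(self, audio_dim=128, codebook_size=256, code_dim=64):
        super().__init__()
        self.encoder = nn.GRU(audio_dim, code_dim, batch_first=True)
        self.codebook = nn.Embedding(codebook_size, code_dim)

    def forward(self, audio_feats):                      # (B, T, audio_dim)
        z, _ = self.encoder(audio_feats)                 # (B, T, code_dim)
        # Nearest-codebook lookup yields discrete, diverse coarse poses.
        dists = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        codes = dists.argmin(dim=-1)                     # (B, T)
        return self.codebook(codes)                      # (B, T, code_dim)


class PoseRefinementDenoiser(nn.Module):
    """Predict diffusion noise for a pose sequence given audio + VQ guidance."""

    def __init__(self, pose_dim=104, cond_dim=64 + 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim + cond_dim + 1, 512),
            nn.SiLU(),
            nn.Linear(512, pose_dim),
        )

    def forward(self, noisy_pose, t, cond):
        # noisy_pose: (B, T, pose_dim), t: (B,), cond: (B, T, cond_dim)
        B, T, _ = noisy_pose.shape
        t_emb = t.float().view(B, 1, 1).expand(B, T, 1)  # per-frame timestep
        return self.net(torch.cat([noisy_pose, cond, t_emb], dim=-1))


if __name__ == "__main__":
    B, T = 2, 100
    audio = torch.randn(B, T, 128)                       # e.g. audio features
    guide = AudioToCoarsePoseVQ()(audio)                 # coarse, diverse poses
    cond = torch.cat([guide, audio], dim=-1)             # guidance + raw audio
    noisy = torch.randn(B, T, 104)                       # noised pose sequence
    t = torch.randint(0, 1000, (B,))                     # diffusion timesteps
    eps_hat = PoseRefinementDenoiser()(noisy, t, cond)
    print(eps_hat.shape)                                 # torch.Size([2, 100, 104])
```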
Target Users
Researchers and developers building voice chat, virtual reality, and online education applications that need photorealistic, full-body conversational avatars.
Use Cases
Generate realistic avatars for voice chat applications
Generate realistic avatars for virtual reality environments
Generate realistic avatars for online education platforms
Features
Generate diverse poses and movements of full-body avatars based on audio input
Utilize vector quantization and diffusion techniques to create dynamic and expressive movements
Visualize generated movements using highly realistic avatars
Featured AI Tools

Sora
AI video generation
17.0M

Animate Anyone
Animate Anyone aims to generate character videos from static images driven by driving signals. Leveraging the power of diffusion models, we propose a novel framework tailored for character animation. To preserve the consistency of intricate appearance features from the reference image, we design ReferenceNet to merge detailed features via spatial attention. To ensure controllability and continuity, we introduce an efficient pose guidance module to direct character movements and adopt an effective temporal modeling approach for smooth transitions between video frames. By expanding the training data, our method can animate arbitrary characters, achieving superior results in character animation compared with other image-to-video approaches. Moreover, we evaluate our method on benchmarks for fashion video and human dance synthesis, achieving state-of-the-art results.
AI video generation
11.4M