

GAIA
Overview:
GAIA (Generative AI for Avatar) synthesizes natural conversational avatar videos from speech audio and a single portrait image, eliminating domain priors from conversational avatar generation. GAIA consists of two stages: 1) decomposing each video frame into separate motion and appearance representations with a variational autoencoder (VAE); 2) generating a motion sequence conditioned on the speech and a reference portrait image with a diffusion model. The diffusion model is optimized to generate motion sequences conditioned on a speech sequence and randomly sampled frames from a video clip, so the generated motion follows the audio while the reference frame supplies identity and appearance. The model was trained at several scales on a large-scale, high-quality conversational avatar dataset collected for this work, and experimental results validate GAIA's superiority, scalability, and flexibility. GAIA supports applications such as controllable conversational avatar generation and text-guided avatar generation.
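The two-stage design above can be pictured roughly as follows. This is a minimal, hypothetical sketch assuming PyTorch, invented tensor sizes, and placeholder module names (FrameVAE, MotionDenoiser); it is not the released GAIA implementation, only an illustration of the data flow: a VAE splits frames into appearance and motion latents, and a denoising network predicts a motion sequence from speech features and the motion latent of a reference frame.

```python
# Hypothetical sketch of GAIA's two-stage structure (assumed shapes and names).
import torch
import torch.nn as nn


class FrameVAE(nn.Module):
    """Stage 1: encode a frame into separate appearance and motion latents."""

    def __init__(self, frame_dim=3 * 64 * 64, app_dim=256, motion_dim=64):
        super().__init__()
        self.appearance_enc = nn.Sequential(nn.Flatten(), nn.Linear(frame_dim, app_dim))
        self.motion_enc = nn.Sequential(nn.Flatten(), nn.Linear(frame_dim, motion_dim))
        self.decoder = nn.Linear(app_dim + motion_dim, frame_dim)

    def forward(self, frame):
        appearance = self.appearance_enc(frame)   # identity / texture latent
        motion = self.motion_enc(frame)           # pose / expression latent
        recon = self.decoder(torch.cat([appearance, motion], dim=-1)).view(frame.shape)
        return recon, appearance, motion


class MotionDenoiser(nn.Module):
    """Stage 2: stand-in for the diffusion model's denoising network, which
    predicts a motion-latent sequence from speech features, conditioned on the
    motion latent of a randomly sampled reference frame."""

    def __init__(self, speech_dim=80, motion_dim=64, hidden=256):
        super().__init__()
        self.net = nn.GRU(speech_dim + 2 * motion_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, motion_dim)

    def forward(self, noisy_motion, speech, ref_motion):
        # Concatenate the noisy motion sequence, per-frame speech features, and
        # the broadcast reference motion latent, then predict the denoised motion.
        ref = ref_motion.unsqueeze(1).expand(-1, speech.shape[1], -1)
        h, _ = self.net(torch.cat([noisy_motion, speech, ref], dim=-1))
        return self.head(h)


if __name__ == "__main__":
    vae = FrameVAE()
    denoiser = MotionDenoiser()
    frames = torch.randn(2, 3, 64, 64)        # reference portrait frames
    speech = torch.randn(2, 25, 80)           # 25 steps of speech features
    _, _, ref_motion = vae(frames)
    noisy = torch.randn(2, 25, 64)            # noised motion sequence
    print(denoiser(noisy, speech, ref_motion).shape)  # torch.Size([2, 25, 64])
```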
Target Users:
Researchers and developers working on AI/ML technologies who need to generate natural conversational video avatars.
Use Cases
Voice-Driven Conversational Avatar Generation
Video-Driven Conversational Avatar Generation
Text-Guided Avatar Generation
Features
Voice-Driven Conversational Avatar Generation
Video-Driven Conversational Avatar Generation
Pose-Controllable Conversational Avatar Generation
Fully Controllable Conversational Avatar Generation
Text-Guided Avatar Generation (see the usage sketch below)
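The features above differ mainly in which control signal drives the generated motion. The following is a hedged, purely illustrative wrapper showing how those signals might be supplied to a single entry point; AvatarRequest and generate_avatar_video are invented names and not part of any released GAIA API.

```python
# Hypothetical dispatch over the control signals listed in the features above.
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class AvatarRequest:
    portrait: str                                    # path to the reference portrait image
    audio: Optional[str] = None                      # voice-driven generation
    driving_video: Optional[str] = None              # video-driven generation
    pose_sequence: Optional[Sequence[float]] = None  # pose-controllable generation
    text_prompt: Optional[str] = None                # text-guided generation


def generate_avatar_video(req: AvatarRequest) -> str:
    """Pick a generation mode from the supplied signals and return an output path."""
    if not (req.audio or req.driving_video or req.text_prompt):
        raise ValueError("Provide at least one driving signal: audio, video, or text.")
    # In a real pipeline each branch would produce a motion-latent sequence that a
    # shared renderer turns into frames; here we only report the chosen mode.
    mode = (
        "fully-controllable" if req.audio and req.pose_sequence
        else "voice-driven" if req.audio
        else "video-driven" if req.driving_video
        else "text-guided"
    )
    return f"output_{mode}.mp4"


print(generate_avatar_video(AvatarRequest(portrait="face.png", audio="speech.wav")))
```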