

LiteAvatar
Overview
LiteAvatar is an audio-driven, real-time 2D avatar generation model designed primarily for live chat scenarios. By combining efficient speech recognition and viseme parameter prediction with a lightweight 2D face generation model, it achieves 30 fps real-time inference on CPU alone. Key advantages include efficient audio feature extraction, a lightweight model design, and mobile-device-friendly deployment. The technology suits real-time interactive virtual avatar scenarios such as online meetings and virtual live streaming, and was developed to meet the need for real-time interaction on modest hardware. It is currently open source and free, positioned as an efficient, low-resource real-time avatar generation solution.
Target Users
LiteAvatar is aimed at application developers who need real-time virtual avatar generation, virtual live streaming platforms, and businesses that require real-time interaction. It suits scenarios that demand efficient real-time interaction at low hardware cost, such as online education, virtual meetings, and virtual social platforms, helping users enhance interaction experiences while lowering technical barriers.
Use Cases
Online education platforms use this model to provide real-time virtual teacher avatars to students, enhancing interactivity.
Virtual live streaming platforms use LiteAvatar to generate real-time virtual avatars for streamers, reducing hardware costs.
Enterprise internal video conferencing systems integrate this technology to enable virtual avatar participation, enhancing privacy protection.
Features
Audio Feature Extraction: Extracts features from audio using an efficient ASR model.
Viseme Parameter Prediction: Generates lip-sync parameters synchronized with speech based on audio features.
2D Avatar Generation: Real-time rendering of lip movements, supporting lightweight deployment.
Real-time Interaction Support: Achieves 30fps real-time inference on CPU-only devices.
Open-source & Easy-to-use: Provides complete code and documentation for easy integration and expansion by developers.
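The features above describe a three-stage pipeline: audio features are extracted per video frame, mapped to viseme (lip-sync) parameters, and rendered into 2D frames at 30 fps. The sketch below illustrates that data flow only; every class, function, and parameter name is hypothetical and does not reflect LiteAvatar's actual API, and the feature extractor and renderer are trivial stand-ins for the real ASR and face-generation models.

```python
# Illustrative sketch of a 30 fps audio-to-avatar pipeline.
# All names here are hypothetical, not LiteAvatar's real API.
from dataclasses import dataclass
from typing import List

FPS = 30
SAMPLE_RATE = 16000
SAMPLES_PER_FRAME = SAMPLE_RATE // FPS  # audio samples backing one video frame

@dataclass
class VisemeParams:
    """Lip-sync parameters predicted for a single video frame (illustrative)."""
    frame_index: int
    mouth_open: float
    mouth_wide: float

def extract_audio_features(samples: List[float]) -> List[float]:
    # Stand-in for the ASR front end: one feature per frame's worth of audio.
    # A real system would emit phonetic posteriors or embeddings instead.
    n_frames = len(samples) // SAMPLES_PER_FRAME
    return [
        sum(abs(s) for s in samples[i * SAMPLES_PER_FRAME:(i + 1) * SAMPLES_PER_FRAME])
        / SAMPLES_PER_FRAME
        for i in range(n_frames)
    ]

def predict_visemes(features: List[float]) -> List[VisemeParams]:
    # Stand-in for the viseme predictor: map each per-frame audio feature
    # to mouth-shape parameters, clamped to [0, 1].
    return [
        VisemeParams(frame_index=i, mouth_open=min(f, 1.0), mouth_wide=min(f / 2, 1.0))
        for i, f in enumerate(features)
    ]

def render_frames(params: List[VisemeParams]) -> List[str]:
    # Stand-in for the lightweight 2D face generator: one output frame per
    # viseme-parameter set, so frame count matches the 30 fps timing.
    return [f"frame_{p.frame_index:04d}" for p in params]

# One second of 16 kHz audio should yield exactly 30 rendered frames.
audio = [0.0] * SAMPLE_RATE
frames = render_frames(predict_visemes(extract_audio_features(audio)))
print(len(frames))  # 30
```

The key structural point is that each stage operates per video frame, so a one-second audio clip flows through the pipeline as 30 independent feature/viseme/frame units, which is what makes frame-by-frame real-time streaming on CPU feasible.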
How to Use
1. Prepare sample data and extract it to the specified path.
2. Set up a Python environment (Python 3.10 is recommended) and run `pip install -r requirements.txt` to install dependencies.
3. Run inference using `python lite_avatar.py --data_dir /path/to/sample_data --audio_file /path/to/audio.wav --result_dir /path/to/result`.
4. The inference result will be saved as an MP4 video file.
5. Refer to the `OpenAvatarChat` project to implement real-time interactive video chat functionality.