Llama Omni : A low-latency, high-quality end-to-end speech interaction model

Llama Omni

AI Model AI Speech Synthesis #Speech Interaction #End-to-End Model #Low Latency #High Quality #Multimodal Standard Picks Open Source

Overview :

LLaMA-Omni is a low-latency, high-quality end-to-end speech interaction model built on the Llama-3.1-8B-Instruct architecture, aimed at achieving speech capabilities comparable to GPT-4o. The model supports low-latency speech interactions, generating text and speech responses simultaneously. It completed training in less than 3 days using only 4 GPUs, demonstrating its efficient training capabilities.

Target Users :

The LLaMA-Omni model is designed for researchers and developers in the fields of speech recognition, speech synthesis, and natural language processing. It aids in building low-latency, high-quality speech interaction systems, driving the development of intelligent voice assistants and related applications.

Total Visits： 474.6M

Top Region： US(19.34%)

Website Views ： 54.6K

Use Cases

Used for developing intelligent voice assistants to provide a seamless conversational experience.

Integrated into smart home systems for voice control of household devices.

Applied in customer service bots to deliver quick and accurate speech services.

Features

Built on Llama-3.1-8B-Instruct to ensure high-quality responses.

Low-latency speech interaction, with delays as low as 226 milliseconds.

Simultaneous generation of text and speech responses.

Training completed in under 3 days using 4 GPUs.

Supports Gradio demos for enhanced user interaction experience.

Provides local inference scripts for easy local testing.

How to Use

Clone the LLaMA-Omni repository to your local machine.

Navigate to the LLaMA-Omni directory and install the required packages.

Install fairseq and flash-attention.

Download the Llama-3.1-8B-Omni model and the Whisper-large-v3 model.

Download the unit-based HiFi-GAN vocoder.

Launch the Gradio demo and access the local server for interaction.

For local inference, organize speech command files according to the format in the omni_speech/infer/examples directory, then refer to the provided scripts for operation.