Hibiki: a model designed for streaming voice translation (i.e., simultaneous interpretation) that generates accurate translations in real time, chunk by chunk.

Hibiki

Tags: Translation, Speech Recognition, Voice Translation, Real-time Translation, Multi-stream Architecture, Open-source Model, Low Latency

Overview:

Hibiki is an advanced model for streaming voice translation. Instead of waiting for the speaker to finish, it accumulates just enough context to produce an accurate translation in real time, chunk by chunk. It outputs both translated speech and translated text, and its voice conversion preserves the original speaker's vocal characteristics. The model is built on a multi-stream architecture that processes source and target speech simultaneously, producing a continuous audio stream together with timestamped text translations. Its main advantages are high-fidelity voice conversion, low-latency real-time translation, and compatibility with complex inference strategies. Hibiki currently supports French-to-English translation and suits real-time settings such as international conferences and multilingual live events. The model is open source and free to use, which makes it well suited to developers and researchers.
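As a rough illustration of what "chunk by chunk" means on the input side, the Python sketch below slices a buffer of audio samples into fixed-size frames that a streaming translator could consume one at a time. The 24 kHz sample rate and 80 ms frame length are assumptions chosen for illustration, and `stream_frames` is a hypothetical helper, not part of Hibiki's code. The model's learned decision about when it has enough context to start speaking is not modeled here.

```python
from typing import Iterator, List

SAMPLE_RATE = 24_000  # assumed input sample rate in Hz (illustrative)
FRAME_MS = 80         # assumed frame duration in milliseconds (illustrative)
FRAME_SIZE = SAMPLE_RATE * FRAME_MS // 1000  # 1920 samples per frame


def stream_frames(samples: List[float], frame_size: int = FRAME_SIZE) -> Iterator[List[float]]:
    """Yield consecutive fixed-size frames, zero-padding the final partial frame.

    Hypothetical helper: a live system would read from a microphone or
    network buffer instead of a pre-loaded list.
    """
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if len(frame) < frame_size:
            # Pad the last, shorter frame so every chunk has the same length.
            frame = frame + [0.0] * (frame_size - len(frame))
        yield frame


if __name__ == "__main__":
    one_second = [0.0] * SAMPLE_RATE  # one second of silence
    frames = list(stream_frames(one_second))
    print(len(frames), len(frames[0]))
```

Running it on one second of silence yields 13 frames of 1,920 samples each, with the last frame zero-padded, since 24,000 samples do not divide evenly into 80 ms chunks.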

Target Users:

Hibiki suits real-time voice translation scenarios such as international conferences, multilingual live broadcasts, and online education. It is especially useful for developers and researchers, who can build applications on top of it or use it for academic research.

Use Cases

At an international conference, French speeches are translated into English in real time, giving the audience instant interpretation.

On a multilingual live-broadcast platform, the host's French speech is translated into English in real time, broadening the potential audience.

On an online education platform, a teacher's French lectures are translated into English in real time, helping students from different language backgrounds follow along.

Features

Supports streaming voice translation, generating translation results in real time, chunk by chunk.

Produces translated speech and translated text simultaneously, covering different user needs.

Utilizes a multi-stream architecture, jointly modeling source and target speech.

Supports voice conversion, preserving the original speaker's vocal characteristics.

Offers multiple backend implementations (PyTorch, Rust, and MLX) for different hardware platforms.
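The multi-stream architecture and the timestamped text output can be pictured with a toy data structure: at each time step the sequence holds one source-audio token, one target-audio token, and optionally a text token, so each translated word inherits a timestamp from the step at which it is emitted. This is a hypothetical Python sketch with placeholder integer tokens and an assumed 80 ms step; `StreamStep` and `text_with_timestamps` are illustrative names, and Hibiki's real token layout is more involved.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

STEP_SECONDS = 0.08  # assumed duration of one time step (80 ms), illustrative only


@dataclass
class StreamStep:
    """One time step of a hypothetical multi-stream sequence."""
    source_audio: int           # placeholder token for the incoming (French) audio
    target_audio: int           # placeholder token for the generated (English) audio
    text: Optional[str] = None  # text token emitted at this step, if any


def text_with_timestamps(steps: List[StreamStep]) -> List[Tuple[float, str]]:
    """Return (start_time_in_seconds, word) pairs for every step that emitted text."""
    return [
        (round(i * STEP_SECONDS, 2), step.text)
        for i, step in enumerate(steps)
        if step.text is not None
    ]


if __name__ == "__main__":
    steps = [
        StreamStep(source_audio=11, target_audio=0),  # no text yet: still gathering context
        StreamStep(source_audio=12, target_audio=0),
        StreamStep(source_audio=13, target_audio=7, text="Hello"),
        StreamStep(source_audio=14, target_audio=8),
        StreamStep(source_audio=15, target_audio=9, text="everyone"),
    ]
    print(text_with_timestamps(steps))  # [(0.16, 'Hello'), (0.32, 'everyone')]
```

Keeping all streams aligned on a shared time axis is what lets the text timestamps fall out of the step index without a separate alignment pass.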

How to Use

1. Install the dependencies for your chosen backend (e.g., PyTorch or the Rust toolchain).

2. Download the Hibiki model files in the version that matches your backend (e.g., PyTorch or MLX).

3. Prepare the French audio files to be translated.

4. Run the translation script from the command line, specifying the input audio file and the output path.

5. Adjust parameters as needed, such as the classifier-free guidance coefficient, to optimize translation quality.

6. Review the generated translated audio files and text results.
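Step 5 mentions classifier-free guidance (CFG). In its standard form, CFG computes the model's output twice, with and without the conditioning signal, and extrapolates from the unconditional prediction toward the conditional one: `logits = uncond + gamma * (cond - uncond)`. A coefficient `gamma` of 1 recovers the plain conditional output, and larger values follow the condition more strongly. The stdlib-only Python sketch below shows that formula on made-up logits; `apply_cfg` is a hypothetical helper, and how Hibiki's own scripts expose and apply the coefficient should be checked in its repository.

```python
import math
from typing import List


def apply_cfg(cond: List[float], uncond: List[float], gamma: float) -> List[float]:
    """Classifier-free guidance on logits: uncond + gamma * (cond - uncond)."""
    if len(cond) != len(uncond):
        raise ValueError("logit vectors must have the same length")
    return [u + gamma * (c - u) for c, u in zip(cond, uncond)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    cond = [2.0, 1.0, 0.0]    # hypothetical logits computed with the conditioning signal
    uncond = [1.0, 1.0, 0.0]  # hypothetical logits computed without it
    for gamma in (1.0, 3.0):
        guided = apply_cfg(cond, uncond, gamma)
        print(gamma, guided, [round(p, 3) for p in softmax(guided)])
```

With `gamma = 3.0` the guided logits become `[4.0, 1.0, 0.0]`, so the gap between the first two entries widens from 1 to 3 and the resulting distribution is more peaked. This is why raising the coefficient trades diversity for closer adherence to the condition.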
