Bailing-TTS: A large-scale text-to-speech model for generating high-quality Chinese dialect speech.

Overview:

Bailing-TTS is a series of large-scale text-to-speech (TTS) models developed by Giant Network's AI Lab for generating high-quality Chinese dialect speech. The models use continual semi-supervised learning to align text tokens with speech tokens, and combine a specific Transformer architecture with a multi-stage training process to learn dialect representations. The resulting synthesized dialect speech closely resembles natural human expression.

Target Users:

Bailing-TTS is primarily aimed at developers and enterprises that need high-quality Chinese dialect speech synthesis, for example in speech synthesis applications, smart assistants, and educational software. It is particularly suited to voice interactions that call for natural, authentic-sounding dialect speech.

Features

Uses continual semi-supervised learning to align text tokens and speech tokens.

Utilizes a specific Transformer architecture for Chinese dialect representation learning.

Applies a multi-stage training process to improve dialect speech synthesis quality.

Generates dialect speech that closely matches natural human expression.

Supports multiple Chinese dialects, such as the Henan dialect.

Achieves zero-shot in-context learning for Mandarin.

Facilitates fine-tuning for Mandarin speakers.

How to Use

1. Visit the Bailing-TTS model webpage.