MOSS-TTSD

Overview:

MOSS-TTSD is an open-source dialogue synthesis model for Chinese and English that generates natural, expressive speech. It converts dialogue scripts into high-quality audio, which makes it suitable for podcast production and AI dialogue applications. Its features include zero-shot voice cloning and long-duration speech generation with a high level of expressiveness and realism. The model is trained on large-scale language and speech data, which supports the naturalness and accuracy of the generated speech. It is completely open source and suitable for commercial use.

Target Users:

MOSS-TTSD is aimed at developers working on speech synthesis, podcast production, and dialogue AI applications, as well as content creators and researchers who need high-quality voice generation. It gives users a flexible way to generate natural, fluent dialogue audio for commercial and educational purposes.

Use Cases

Generate podcast audio with MOSS-TTSD to make content more accessible.
Power interactive voice response systems on online education platforms.
Add realistic voice performances to character dialogue in entertainment applications.

Features

Supports dialogue speech generation in both Chinese and English.
Enables zero-shot two-person voice cloning, accurately switching between speakers.
Supports long-duration speech generation, suitable for AI podcast production.
Produces highly expressive dialogue speech that sounds close to natural human conversation.
Provides both local and API inference methods for user convenience.
Supports batch processing tools to handle multiple generation requests simultaneously.
Includes a podcast generation tool that can convert long texts or web content into audio.
Provides simple fine-tuning scripts for users to customize the model.

How to Use

Set up a Python environment and install the required dependencies.
Download and prepare the XY Tokenizer model weights.
Prepare a JSONL-formatted input file containing dialogue scripts and speaker audio references.
Run the inference script, specifying the input file path and output directory.
View the generated audio files for further processing or publication.
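
The input-preparation step above can be sketched in Python. This is a minimal illustration of writing a JSONL file, with one dialogue entry per line. The field names (`text`, `prompt_audio_speaker1`, `prompt_audio_speaker2`), the `[S1]`/`[S2]` speaker tags, and the helper functions are assumptions made for this example, not a confirmed schema; check the MOSS-TTSD repository for the keys the inference script actually expects.

```python
import json
import tempfile
from pathlib import Path


def make_record(script, speaker1_audio, speaker2_audio):
    """Build one dialogue entry: a script plus one reference clip per speaker.

    The key names here are hypothetical placeholders, not the confirmed
    MOSS-TTSD input schema.
    """
    return {
        "text": script,
        "prompt_audio_speaker1": speaker1_audio,
        "prompt_audio_speaker2": speaker2_audio,
    }


def write_jsonl(records, path):
    """Write one JSON object per line, which is the JSONL convention."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps Chinese text readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def read_jsonl(path):
    """Read the file back, skipping blank lines, to check it parses."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Assumed speaker-tag convention; the audio paths are placeholders.
    script = "[S1] Welcome to the show. [S2] Thanks for having me."
    out_dir = Path(tempfile.mkdtemp())
    jsonl_path = write_jsonl(
        [make_record(script, "speaker1.wav", "speaker2.wav")],
        out_dir / "dialogue.jsonl",
    )
    print(f"Wrote {len(read_jsonl(jsonl_path))} record(s) to {jsonl_path}")
```

Reading the file back before running inference is a cheap way to catch malformed lines early, since a single invalid line can make a JSONL file unreadable for downstream tools.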
