

SpeechGPT
Overview
SpeechGPT is a multimodal large language model with intrinsic cross-modal conversational abilities: it can perceive and generate multimodal content and follow multimodal human instructions. SpeechGPT-Gen is a speech generation model that scales up Chain-of-Information generation. SpeechAgents is a multimodal multi-agent system for simulating human communication. SpeechTokenizer is a unified speech tokenizer for speech language models. Release dates and related information for these models and datasets are available on the official website.
Target Users
Suited to scenarios such as speech content generation and multimodal human-computer interaction.
Use Cases
Use SpeechGPT for multimodal dialogue generation
Use SpeechGPT-Gen for Chain-of-Information speech generation
Use SpeechTokenizer for speech tokenization
Features
Multimodal content perception and generation
Chain-of-Information speech generation
Multimodal multi-agent system
Unified speech tokenizer
Featured AI Tools

OpenVoice
OpenVoice is an open-source voice cloning technology that accurately replicates a reference voice and generates speech in multiple languages and accents. It offers flexible control over voice style, including emotion, accent, rhythm, pauses, and intonation. It also achieves zero-shot cross-lingual voice cloning: neither the language of the generated speech nor that of the reference speech needs to be present in the training data.
AI speech recognition

ChatTTS
ChatTTS is an open-source text-to-speech (TTS) model that converts text into speech. It is intended primarily for academic research and education and is not intended for commercial or legal use. It uses deep learning to generate natural, fluent speech, which makes it a good fit for people working on speech synthesis research and development.
AI speech synthesis