SpeechGPT
Overview:
SpeechGPT is a multimodal language model with intrinsic cross-modal conversational abilities: it can perceive and generate multimodal content and follow multimodal human instructions. SpeechGPT-Gen is a scaled-up Chain-of-Information (information chain) speech generation model. SpeechAgents is a multimodal multi-agent system that simulates human communication. SpeechTokenizer is a unified speech tokenizer designed for speech language models. Release dates and related details for these models and datasets are listed on the official website.
Target Users:
Suited to scenarios such as voice content generation and multimodal human-computer interaction.
Total Visits: 474.6M
Top Region: US (19.34%)
Website Views: 112.3K
Use Cases
Use SpeechGPT for multimodal dialogue generation
Use SpeechGPT-Gen for information chain speech generation
Use SpeechTokenizer for speech tokenization
Features
Multimodal content perception and generation
Information chain speech generation
Multimodal multi-agent system
Unified speech tokenizer
© 2025 AIbase