VoiceCraft: Zero-shot voice editing and text-to-speech technology

AI Speech Synthesis · AI Speech Recognition · #Voice Editing #Text-to-Speech #Voice Cloning #Recording Editing · Standard Picks · Open Source

Overview:

VoiceCraft is a token-infilling neural codec language model that achieves leading performance on both speech editing and zero-shot text-to-speech (TTS). For unseen voices, it needs only a few seconds of sample audio to clone the voice or edit a recording. The model works on in-the-wild data such as audiobooks, online videos, and podcasts.
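To make "token infilling" concrete: one common way to let a left-to-right language model fill a gap in the middle of a sequence is to replace the span with a mask token, then move the span to the end so the model generates it after seeing both sides. The sketch below shows only that rearrangement on toy string tokens. The function names `rearrange_for_infilling` and `splice_span` and the `<M0>` mask token are hypothetical illustrations, not VoiceCraft's actual code or API.

```python
from typing import List, Sequence, Tuple

# Hypothetical mask token; VoiceCraft's real vocabulary is not shown here.
MASK = "<M0>"


def rearrange_for_infilling(
    tokens: Sequence[str], start: int, end: int, mask: str = MASK
) -> Tuple[List[str], List[str]]:
    """Move tokens[start:end] to the end so a left-to-right model can fill it.

    Returns (context, target): the context has the span replaced by a mask
    token plus a trailing mask token that cues generation; the target is
    the span the model would learn to predict after the context.
    """
    if not 0 <= start <= end <= len(tokens):
        raise ValueError(f"invalid span [{start}, {end}) for {len(tokens)} tokens")
    context = list(tokens[:start]) + [mask] + list(tokens[end:]) + [mask]
    target = list(tokens[start:end])
    return context, target


def splice_span(
    tokens: Sequence[str], start: int, end: int, new_span: Sequence[str]
) -> List[str]:
    """Put a generated span back in place of tokens[start:end]."""
    return list(tokens[:start]) + list(new_span) + list(tokens[end:])


if __name__ == "__main__":
    tokens = ["a", "b", "c", "d", "e"]  # stand-ins for audio codec tokens
    context, target = rearrange_for_infilling(tokens, 1, 3)
    print(context)  # ['a', '<M0>', 'd', 'e', '<M0>']
    print(target)   # ['b', 'c']
    # Pretend the model generated a longer replacement span.
    print(splice_span(tokens, 1, 3, ["x", "y", "z"]))  # ['a', 'x', 'y', 'z', 'd', 'e']
```

The replacement span may be longer or shorter than the original, which is what lets an edit change the spoken content. This toy version omits the text transcript conditioning and the multiple codec codebooks that a real speech system would have to handle.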

Target Users:

Creators who generate and edit voice content for audiobooks, online videos, podcasts, and more.

Total Visits: 1.8K

Top Region: US (75.65%)

Website Views: 141.6K

Use Cases

Use VoiceCraft to generate natural-sounding voices for audiobooks or podcast episodes.

Edit existing recordings to modify content or change the speaker's voice.

Clone someone's voice from a short voice sample to generate customized voice content.

Features

Voice Editing

Zero-Shot Text-to-Speech

Clone Unseen Voices

Edit Recordings

Featured AI Tools

GPT-SoVITS

GPT-SoVITS-WebUI is a powerful WebUI for zero-shot voice conversion and text-to-speech. It offers zero-shot TTS, few-shot TTS, cross-lingual support, and a WebUI toolkit. It supports English, Japanese, and Chinese, and bundles tools such as vocal/accompaniment separation, automatic training-set segmentation, Chinese ASR, and text labeling to help beginners build training datasets and GPT/SoVITS models. Given a 5-second voice sample, users can try real-time text-to-speech immediately, and fine-tuning on just 1 minute of training data improves voice similarity and naturalness. The project documentation covers environment setup (Python and PyTorch versions), quick and manual installation, pretrained models, dataset formats, pending tasks, and acknowledgments.

AI Speech Synthesis

Clone-Voice

Clone-Voice is a web-based voice cloning tool: given any human voice, it can synthesize speech from text in that voice or convert another voice into it. It supports 16 languages, including Chinese, English, Japanese, Korean, French, German, and Italian, and you can record voice samples online directly from your microphone. Its functions are text-to-speech and voice-to-voice conversion. Its advantages are simplicity and ease of use, no need for an NVIDIA GPU, multilingual support, and flexible voice recording. It is currently free to use.

AI Speech Synthesis

AIbase

Empowering the Future, Your AI Solution Knowledge Base

© 2025 AIbase