

Sherpa-onnx
Overview:
Sherpa-onnx is a speech recognition and speech synthesis project built on the next-generation Kaldi, using onnxruntime for inference. It supports a wide range of speech tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speaker recognition, speaker verification, language identification, and keyword detection. It runs on a variety of platforms and operating systems, including embedded systems, Android, iOS, Raspberry Pi, RISC-V, and servers.
Target Users:
Sherpa-onnx is suitable for developers and researchers, especially those who need speech recognition and speech synthesis on different platforms. It provides APIs for C++, C, Python, Go, C#, Java, Kotlin, JavaScript, and Swift, so developers from diverse backgrounds can work with it.
Use Cases
Use sherpa-onnx to implement real-time speech-to-text on Android devices.
Leverage sherpa-onnx for batch speech recognition tasks on servers (a batch decoding sketch follows this list).
Employ sherpa-onnx for keyword detection on embedded systems.
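For the server-side batch use case, the sketch below decodes several WAV files with the non-streaming Python API. It is a rough illustration rather than a drop-in recipe: the model file names and WAV paths are placeholders for whatever pre-trained transducer model and audio you actually use, and keyword arguments can vary slightly between sherpa-onnx releases.

# Rough sketch: batch (non-streaming) decoding with the sherpa-onnx Python API.
import wave

import numpy as np
import sherpa_onnx


def read_wave(path):
    # Read a 16-bit mono WAV file; return float32 samples in [-1, 1] and the sample rate.
    with wave.open(path) as f:
        samples = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)
        return samples.astype(np.float32) / 32768.0, f.getframerate()


recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
    encoder="encoder.onnx",   # placeholder paths to a downloaded non-streaming model
    decoder="decoder.onnx",
    joiner="joiner.onnx",
    tokens="tokens.txt",
    num_threads=4,
)

wav_files = ["a.wav", "b.wav", "c.wav"]  # placeholder input files
streams = []
for path in wav_files:
    samples, sample_rate = read_wave(path)
    s = recognizer.create_stream()
    s.accept_waveform(sample_rate, samples)
    streams.append(s)

# Decode all files in one batch call, then read each result.
recognizer.decode_streams(streams)
for path, s in zip(wav_files, streams):
    print(path, s.result.text)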
Features
Supports streaming and non-streaming speech recognition (ASR); a streaming sketch follows this feature list.
Supports text-to-speech conversion (TTS).
Supports speaker recognition.
Supports speaker verification.
Supports language identification.
Supports audio tagging and keyword detection.
Supports multiple platforms and operating systems.
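To illustrate the streaming side, here is a minimal sketch of online recognition with the Python API, feeding audio in small chunks as it would arrive from a microphone or a socket. The model paths are placeholders, the silent chunks only stand in for a real capture device, and exact keyword arguments may differ across sherpa-onnx versions.

# Minimal sketch: streaming (online) recognition with the sherpa-onnx Python API.
import numpy as np
import sherpa_onnx

recognizer = sherpa_onnx.OnlineRecognizer.from_transducer(
    tokens="tokens.txt",
    encoder="encoder.onnx",   # placeholder paths to a downloaded streaming model
    decoder="decoder.onnx",
    joiner="joiner.onnx",
    num_threads=1,
    sample_rate=16000,
    feature_dim=80,
    decoding_method="greedy_search",
)

sample_rate = 16000
stream = recognizer.create_stream()

# Stand-in audio source: 100 ms chunks of silence. Replace with real float32
# chunks from your capture device or network connection.
chunks = [np.zeros(1600, dtype=np.float32) for _ in range(50)]
for chunk in chunks:
    stream.accept_waveform(sample_rate, chunk)
    # Decode whatever frames are ready and show the partial result so far.
    while recognizer.is_ready(stream):
        recognizer.decode_stream(stream)
    print("partial:", recognizer.get_result(stream))

# Signal end of input and flush the remaining frames.
stream.input_finished()
while recognizer.is_ready(stream):
    recognizer.decode_stream(stream)
print("final:", recognizer.get_result(stream))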
How to Use
1. Clone or download the sherpa-onnx project to your local machine.
2. Select the appropriate API and platform based on your required functionalities.
3. Configure the environment and dependencies according to the documentation.
4. Load a pre-trained model and run a quick test (a text-to-speech sketch follows this list).
5. Adjust parameters according to your specific requirements to optimize performance.
6. Integrate into your application to implement speech recognition or speech synthesis functionalities.
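As an example of steps 4 and 5 on the text-to-speech path, the sketch below loads a downloaded VITS voice and synthesizes one sentence. The class and argument names follow the project's Python TTS examples but may differ between releases, and the model, lexicon, and tokens paths are placeholders for your own downloaded files.

# Sketch: load a pre-trained VITS voice and synthesize one sentence to a WAV file.
import sherpa_onnx
import soundfile as sf

config = sherpa_onnx.OfflineTtsConfig(
    model=sherpa_onnx.OfflineTtsModelConfig(
        vits=sherpa_onnx.OfflineTtsVitsModelConfig(
            model="vits-model.onnx",   # placeholder paths to a downloaded voice
            lexicon="lexicon.txt",
            tokens="tokens.txt",
        ),
        num_threads=2,
    ),
)
tts = sherpa_onnx.OfflineTts(config)

# sid selects the speaker for multi-speaker models; speed scales the speaking rate.
audio = tts.generate("Hello from sherpa-onnx.", sid=0, speed=1.0)
sf.write("output.wav", audio.samples, samplerate=audio.sample_rate)

Tuning step 5 usually means adjusting values such as num_threads, the decoding method, or the speaking speed above until latency and quality fit your application.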