

Kokoro Onnx
Overview
kokoro-onnx is a text-to-speech (TTS) project built on the Kokoro model and ONNX Runtime. It currently supports English, with French, Japanese, Korean, and Chinese planned. It runs at near real-time speed on macOS M1 and offers a variety of voices, including whispering. The model is lightweight: approximately 300MB, or around 80MB when quantized. The project is open source on GitHub under the MIT license, so developers can integrate it freely.
Target Users
The primary audience is developers and researchers who want to add text-to-speech to an application or work on speech synthesis. Because the model is open source and lightweight, it suits developers who want high-quality TTS in their projects without building a model from scratch.
Use Cases
Add voice prompt functionality to mobile applications
Integrate into smart assistant devices for natural language interactions
Conduct research on speech synthesis to explore new voice generation technologies
Features
Supports English (soon to support French, Japanese, Korean, and Chinese)
Offers near real-time performance on macOS M1
Provides a variety of voice options, including whispering
Lightweight model, approximately 300MB (around 80MB when quantized)
Based on ONNX Runtime, easy to deploy and integrate
Includes example scripts for quick user onboarding
How to Use
1. Install uv (recommended) or use a regular Python environment
2. Create a new project folder and run 'uv init -p 3.12' to initialize the project within it
3. Run 'uv add kokoro-onnx soundfile' to add the kokoro-onnx and soundfile dependencies
4. Copy the contents of examples/save.py into hello.py
5. Download the kokoro-v0_19.onnx and voices.json files and place them in the project directory
6. Run 'uv run hello.py' to generate the audio file
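The steps above can be sketched as a small script along the lines of hello.py. This is a minimal sketch, not the project's own examples/save.py: the `Kokoro` class, the `create()` call and its `voice`, `speed`, and `lang` parameters, and the voice name "af_sarah" are assumptions about the kokoro-onnx API and may differ between releases, so check them against the example script in the repository. The helper names `missing_files` and `synthesize` and the output name audio.wav are hypothetical.

```python
from pathlib import Path

# File names taken from step 5 above.
MODEL_FILE = "kokoro-v0_19.onnx"
VOICES_FILE = "voices.json"


def missing_files(project_dir="."):
    """Return the required model files that are not present in project_dir."""
    root = Path(project_dir)
    return [name for name in (MODEL_FILE, VOICES_FILE) if not (root / name).is_file()]


def synthesize(text, out_path="audio.wav", voice="af_sarah"):
    """Generate speech for `text` and write it to `out_path` as a WAV file."""
    # Imported lazily so the file check works before the dependencies are installed.
    import soundfile as sf
    from kokoro_onnx import Kokoro  # assumed import path

    kokoro = Kokoro(MODEL_FILE, VOICES_FILE)
    # Assumed call shape: create() returns (samples, sample_rate).
    samples, sample_rate = kokoro.create(text, voice=voice, speed=1.0, lang="en-us")
    sf.write(out_path, samples, sample_rate)
    return out_path


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        # Step 5 has not been completed yet; report which files to download.
        print("Missing files:", ", ".join(missing))
    else:
        print("Wrote", synthesize("Hello from kokoro-onnx."))
```

Checking for the model files first gives a clear message when step 5 was skipped, instead of a loader error from the runtime.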