

ABox
Overview
ABox is a versatile voice changer app that helps users protect their privacy, rest strained voices, and express themselves freely regardless of gender. Whatever the need, ABox offers tools for free vocal expression.
Target Users
People who change their voice while chatting, gaming, pranking, and more
Use Cases
Chatting with friends and family
Voice communication in games
Pranks for April Fools' Day or Halloween
Features
Real-time voice changing
Switch voices anytime
Convenient and easy to use
Privacy protection
Traffic Sources
Direct Visits | 0.00%
External Links | 0.00%
Organic Search | 0.00%
Social Media | 0.00%
Display Ads | 0.00%
Latest Traffic Overview
Monthly Visits | 58.54k |
Average Visit Duration | 9.91 |
Pages Per Visit | 1.47 |
Bounce Rate | 55.03% |
Total Traffic Trend Chart
Similar Open Source Products

Parakeet Tdt 0.6b V2
parakeet-tdt-0.6b-v2 is a 600 million parameter automatic speech recognition (ASR) model designed to achieve high-quality English transcription with accurate timestamp prediction and automatic punctuation and capitalization support. The model is based on the FastConformer architecture, capable of efficiently processing audio clips up to 24 minutes long, making it suitable for developers, researchers, and various industry applications.
Speech Recognition

Kimi Audio
Kimi-Audio is an advanced open-source audio foundation model designed to handle a variety of audio processing tasks, such as speech recognition and audio dialogue. The model has been extensively pre-trained on over 13 million hours of diverse audio and text data, giving it strong audio reasoning and language understanding capabilities. Its key advantages include excellent performance and flexibility, making it suitable for researchers and developers to conduct audio-related research and development.
Speech Recognition

Megatts 3
MegaTTS 3 is a highly efficient speech synthesis model based on PyTorch, developed by ByteDance, with ultra-high-quality speech cloning capabilities. Its lightweight architecture contains only 0.45B parameters, supports Chinese, English, and code switching, and can generate natural and fluent speech from input text. It is widely used in academic research and technological development.
Speech Synthesis

CSM 1B
CSM 1B is a speech generation model based on the Llama architecture, capable of generating RVQ audio codes from text and audio input. The model is primarily used in speech synthesis and boasts high-quality speech generation capabilities. Its advantages include the ability to handle multi-speaker dialogue scenarios and generate natural and fluent speech through contextual information. This open-source model is intended to support research and educational purposes but is explicitly prohibited from being used for impersonation, fraud, or illegal activities.
Speech Synthesis

Sesame CSM
CSM is a conversational speech generation model developed by Sesame. It can generate high-quality speech from text and audio input. The model is based on the Llama architecture and uses the Mimi audio encoder. It is mainly used for speech synthesis and interactive voice applications, such as voice assistants and educational tools. The main advantages of CSM are its ability to generate natural and fluent speech and its ability to optimize speech output through contextual information. The model is currently open-source and suitable for research and educational purposes.
Speech Synthesis

Step Audio
Step-Audio is the first production-level open-source intelligent voice interaction framework, integrating voice understanding and generation capabilities. It supports multilingual dialogue, emotional intonation, dialects, speech rate, and prosodic style control. Its core technologies include a 130B parameter multimodal model, a generative data engine, fine-grained voice control, and enhanced intelligence. This framework promotes the development of intelligent voice interaction technology through open-source models and tools, and is suitable for a variety of voice application scenarios.
Speech Recognition

Fireredasr AED L
FireRedASR-AED-L is an open-source, industrial-grade automatic speech recognition model designed to meet the needs for high efficiency and performance in speech recognition. This model utilizes an attention-based encoder-decoder architecture and supports multiple languages including Mandarin, Chinese dialects, and English. It achieved new record levels in public Mandarin speech recognition benchmarks and has shown exceptional performance in singing lyric recognition. Key advantages of the model include high performance, low latency, and broad applicability across various speech interaction scenarios. Its open-source feature allows developers the freedom to use and modify the code, further advancing the development of speech recognition technology.
Speech Recognition

Fireredasr
FireRedASR is an open-source industrial-grade Mandarin automatic speech recognition model, utilizing an Encoder-Decoder and LLM integrated architecture. It includes two variants: FireRedASR-LLM and FireRedASR-AED, designed for high-performance and efficient needs respectively. The model excels in Mandarin benchmarking tests and also performs well in recognizing dialects and English speech. It is suitable for industrial applications requiring efficient speech-to-text conversion, such as smart assistants and video subtitle generation. The open-source model is easy for developers to integrate and optimize.
Speech Recognition

Pengchengstarling
PengChengStarling is an open-source toolkit focused on multilingual automatic speech recognition (ASR), developed based on the icefall project. It supports the entire ASR process, including data processing, model training, inference, fine-tuning, and deployment. By optimizing parameter configurations and integrating language identifiers into the RNN-Transducer architecture, it significantly enhances the performance of multilingual ASR systems. Its main advantages include efficient multilingual support, a flexible configuration design, and robust inference performance. The models in PengChengStarling perform exceptionally well across various languages, require relatively small model sizes, and offer extremely fast inference speeds, making it suitable for scenarios that demand efficient speech recognition.
Speech Recognition
Alternatives

Amazon Nova Sonic
Amazon Nova Sonic is a cutting-edge foundational model that integrates speech understanding and generation, enhancing the natural fluency of human-computer dialogue. This model overcomes the complexities of traditional voice applications, achieving a deeper level of communication understanding through a unified architecture. It is suitable for AI applications across multiple industries and holds significant commercial value. As AI technology continues to develop, Nova Sonic will provide customers with better voice interaction experiences and improved service efficiency.
Speech Recognition

Durt
DuRT is a speech recognition and translation tool for macOS. It uses local AI models and system services to perform real-time speech recognition and translation, and supports multiple recognition methods to improve accuracy and language coverage. Results are displayed in a floating window for easy access during use. Its main advantages are high accuracy, privacy protection (no user information is collected), and a convenient user experience. DuRT is positioned as a productivity tool that helps users communicate and work efficiently in multilingual environments, and is available on the Mac App Store.
Speech Recognition

Elevenlabs Scribe
Scribe is a high-accuracy speech-to-text model developed by ElevenLabs, designed to handle the unpredictability of real-world audio. It supports 99 languages and provides features such as word-level timestamps, speaker diarization, and audio event labeling. Scribe demonstrates superior performance on the FLEURS and Common Voice benchmarks, surpassing leading models like Gemini 2.0 Flash, Whisper Large V3, and Deepgram Nova-3. It significantly reduces error rates for traditionally underserved languages (such as Serbian, Cantonese, and Malayalam), where error rates often exceed 40% in competing models. Scribe offers an API for developer integration and will launch a low-latency version to support real-time applications.
Speech Recognition

Supertone Play
Supertone Play is a platform dedicated to voice cloning and AI voice content creation. Leveraging advanced AI technology, it empowers users to create personalized voice content through simple voice inputs. This technology has wide-ranging applications in entertainment, education, business, and more, providing users with a novel means of expression and creation. The platform's voice cloning feature allows users to rapidly create unique voice models, while AI voice content creation generates high-quality voice content based on user requirements. The key advantages of this technology are its efficiency, personalization, and innovative nature, catering to diverse user needs in voice creation.
Speech Synthesis
Featured AI Tools

Resemble
Resemble AI is an AI voice generator that can create realistic human voices in seconds. It supports voice cloning: record or upload voice data to generate your own AI voice. It also provides real-time voice-to-voice and text-to-speech conversion for creating custom voices, along with voice editing and language localization features that make it easy to edit and localize voice content. With API and mobile support, Resemble AI runs natively on Android and iOS. For pricing and commercial positioning, refer to the official website.
Speech Synthesis
1.1M

Lugs.ai
Speech Recognition
599.2K