OmniSenseVoice: Ultra-fast speech recognition with precise timestamps

Overview:

OmniSenseVoice is an optimized speech recognition model built on SenseVoice. It is designed for fast inference and accurate timestamps when transcribing audio.

Target Users:

OmniSenseVoice is aimed at businesses and developers who need voice transcription, audio analysis, or real-time speech recognition. Its fast processing and precise timestamps make it especially suitable for handling large volumes of voice data quickly, for example meeting notes, lecture transcriptions, and real-time translation.

Use Cases

Real-time speech transcription for meetings, generating timestamped meeting notes.

Transcription of online course content, providing students with timestamped lecture notes.

Real-time translation applications delivering fast and accurate speech translation services.
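
As an illustration of what timestamped notes could look like, the following minimal Python sketch renders transcript segments as note lines. The Segment type and its fields are hypothetical: the output format of OmniSenseVoice is not described here, so the input side would need to be adapted to whatever the tool actually returns.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical transcript segment: start/end offsets in seconds plus text."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_notes(segments: list[Segment]) -> str:
    """Produce one note line per segment, ordered by start time."""
    ordered = sorted(segments, key=lambda s: s.start)
    return "\n".join(f"[{format_timestamp(s.start)}] {s.text}" for s in ordered)


if __name__ == "__main__":
    demo = [
        Segment(65.2, 68.0, "Action items for next week."),
        Segment(0.0, 3.5, "Welcome, everyone."),
    ]
    print(to_notes(demo))
```

Sorting by start time keeps the notes in order even if segments arrive out of sequence, which can happen when audio is processed in parallel batches.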

Features

Detects the language automatically or accepts an explicit choice (auto, Chinese, English, Cantonese, Japanese, Korean).

Offers a text normalization option that controls whether inverse text normalization (ITN) is applied.

Runs on a specified GPU; the CPU is the default device.

Offers a quantized model for faster processing.

Provides detailed help output describing the available options.

Includes benchmarking features to assess model performance.

Claims up to 50 times faster processing without sacrificing accuracy.
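
The language and normalization options listed above can be checked before the tool is invoked. The sketch below is illustrative only. The codes zh and en and the value woitn appear in the usage examples on this page; yue, ja, and ko are assumed here to follow SenseVoice's language codes, and withitn is assumed as the counterpart of woitn. The helper names are hypothetical.

```python
# Language names from the feature list mapped to CLI codes.
# "zh" and "en" appear in the usage examples; the rest are assumptions.
LANGUAGE_CODES = {
    "auto": "auto",
    "Chinese": "zh",
    "English": "en",
    "Cantonese": "yue",  # assumed code
    "Japanese": "ja",    # assumed code
    "Korean": "ko",      # assumed code
}

# "woitn" (without inverse text normalization) comes from the usage examples;
# "withitn" is an assumed counterpart that enables it.
TEXTNORM_CHOICES = ("withitn", "woitn")


def language_code(name: str) -> str:
    """Return the CLI code for a language name, or raise ValueError."""
    if name not in LANGUAGE_CODES:
        supported = ", ".join(LANGUAGE_CODES)
        raise ValueError(f"unsupported language {name!r}; expected one of: {supported}")
    return LANGUAGE_CODES[name]


def check_textnorm(value: str) -> str:
    """Return the value unchanged if it is a known --textnorm choice."""
    if value not in TEXTNORM_CHOICES:
        raise ValueError(f"unknown textnorm {value!r}; expected one of: {TEXTNORM_CHOICES}")
    return value


if __name__ == "__main__":
    print(language_code("Chinese"), check_textnorm("woitn"))
```

Validating these values up front gives a clear error message instead of a failure partway through a long transcription run.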

How to Use

1. Install OmniSenseVoice.

2. Set the language parameter as needed, for example: --language zh.

3. Choose whether to apply inverse text normalization, for example: --textnorm woitn (without ITN).

4. Specify the GPU device ID to run on, for example: --device-id 0.

5. Optionally, use the quantized model, for example: --quantize.

6. Run a benchmark to evaluate model performance, for example: omnisense benchmark -s -d --num-workers 2 --device-id 0 --batch-size 10 --textnorm woitn --language en benchmark/data/manifests/libritts/libritts_cuts_dev-clean.jsonl

7. Refer to the README file for more details and configuration options.

8. Adjust parameters according to specific requirements to perform speech recognition tasks.
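
The benchmark invocation from step 6 can also be assembled programmatically, which helps when sweeping over parameters. This is a minimal sketch: the flags are the ones listed in the steps above, the helper names are hypothetical, and passing --quantize to the benchmark command is an assumption. The command is only executed if an omnisense executable is found on the PATH.

```python
import shlex
import shutil
import subprocess


def build_benchmark_command(
    manifest: str,
    language: str = "en",
    textnorm: str = "woitn",
    device_id: int = 0,
    num_workers: int = 2,
    batch_size: int = 10,
    quantize: bool = False,
) -> list[str]:
    """Assemble the argv list for `omnisense benchmark` using the step 6 flags."""
    cmd = [
        "omnisense", "benchmark", "-s", "-d",
        "--num-workers", str(num_workers),
        "--device-id", str(device_id),
        "--batch-size", str(batch_size),
        "--textnorm", textnorm,
        "--language", language,
    ]
    if quantize:
        # Assumption: the benchmark subcommand accepts --quantize.
        cmd.append("--quantize")
    cmd.append(manifest)
    return cmd


def run_if_available(cmd: list[str]):
    """Run the command only when its executable is installed; else return None."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=False)


if __name__ == "__main__":
    manifest = "benchmark/data/manifests/libritts/libritts_cuts_dev-clean.jsonl"
    # Print the command for inspection; call run_if_available(cmd) to execute it.
    print(shlex.join(build_benchmark_command(manifest)))
```

Building the command as a list rather than a single string avoids shell quoting problems with paths, and shlex.join produces a copy-pasteable form for logs.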