parakeet-tdt-0.6b-v2
Overview
parakeet-tdt-0.6b-v2 is a 600-million-parameter automatic speech recognition (ASR) model for high-quality English transcription, with accurate word-level timestamps and automatic punctuation and capitalization. Built on the FastConformer architecture, it can efficiently process audio clips up to 24 minutes long, making it suitable for developers, researchers, and a wide range of industry applications.
Target Users
The model is aimed at developers, researchers, and industry professionals, especially teams building speech-to-text applications. Its high accuracy and flexibility make parakeet-tdt-0.6b-v2 a strong choice for adding speech recognition to a product.
Use Cases
Used for real-time transcription in voice assistants.
Implemented for classroom lecture transcription in educational applications.
Used as an automatic transcription tool for meeting records and summary generation.
Features
Accurate word-level timestamp prediction: Provides detailed timestamp information for each word.
Automatic punctuation and capitalization: Enhances readability of the transcribed text.
Strong performance on spoken numbers and lyrics: Can accurately transcribe number and lyric content.
Supports 16kHz audio input: Compatible with mainstream audio formats such as .wav and .flac.
Can handle audio up to 24 minutes long: Transcribes long audio at once, improving efficiency.
Runs on a range of NVIDIA GPUs: Optimized for fast training and inference on NVIDIA hardware.
Suitable for various applications: Ideal for conversational AI, voice assistants, transcription services, subtitle generation, etc.
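Since the model expects 16 kHz input (see the feature list above), it can help to validate audio files before transcription. Below is a minimal sketch using Python's standard wave module; the file name and helper are illustrative, not part of the model's API, and this check only covers .wav files (checking .flac would need a third-party library):

```python
import wave

EXPECTED_RATE = 16000  # parakeet-tdt-0.6b-v2 expects 16 kHz input

def is_valid_input(path: str) -> bool:
    """Return True if the .wav file is 16 kHz mono, as the model expects."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate() == EXPECTED_RATE and wf.getnchannels() == 1

# Example: create a short 16 kHz mono clip and validate it.
with wave.open("clip.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)                    # 16-bit samples
    wf.setframerate(EXPECTED_RATE)
    wf.writeframes(b"\x00\x00" * 16000)   # one second of silence

print(is_valid_input("clip.wav"))  # True for this generated clip
```

Files that fail the check can be resampled to 16 kHz mono with a tool such as ffmpeg or sox before being passed to the model.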
How to Use
Install the NVIDIA NeMo toolkit and ensure the latest version of PyTorch is installed.
Load the model in Python (the weights are downloaded automatically on first use): import nemo.collections.asr as nemo_asr; asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name='nvidia/parakeet-tdt-0.6b-v2')
Prepare 16kHz audio files in .wav or .flac format.
Invoke the model for transcription using: output = asr_model.transcribe(['audio file path']).
Add parameters for timestamps if needed: output = asr_model.transcribe(['audio file path'], timestamps=True).
Process the transcription output as required for text analysis or storage.
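Taken together, the steps above can be sketched as follows. The commented NeMo calls mirror the commands quoted in the steps; the format_words helper and the assumed output shape (a list of dicts with 'word', 'start', and 'end' keys under output[0].timestamp['word']) are illustrative assumptions based on NeMo conventions, not guaranteed API details:

```python
def format_words(words):
    """Render word-level timestamp dicts as 'start-end: word' lines.
    Assumes entries like {'word': 'hello', 'start': 0.0, 'end': 0.4}."""
    return ["{:.2f}-{:.2f}: {}".format(w["start"], w["end"], w["word"])
            for w in words]

# Model loading and transcription (requires NeMo and PyTorch; downloads ~600M weights):
# import nemo.collections.asr as nemo_asr
# asr_model = nemo_asr.models.ASRModel.from_pretrained(
#     model_name='nvidia/parakeet-tdt-0.6b-v2')
# output = asr_model.transcribe(['audio.wav'], timestamps=True)
# word_timestamps = output[0].timestamp['word']  # assumed per-word dicts
# print("\n".join(format_words(word_timestamps)))

# Stand-in data so the helper can be demonstrated without downloading the model:
sample = [{"word": "hello", "start": 0.00, "end": 0.40},
          {"word": "world", "start": 0.45, "end": 0.90}]
print("\n".join(format_words(sample)))  # 0.00-0.40: hello / 0.45-0.90: world
```

From here, the formatted lines can be written to a file, fed into a summarizer, or converted into a subtitle format such as SRT.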