LookOnceToHear: Real-Time Speech Extraction Smart Earphone Interaction System


Overview:

LookOnceToHear is a smart earphone interaction system that lets users choose the speaker they want to hear simply by looking at them. The work was nominated for Best Paper at CHI 2024. It performs real-time speech extraction and relies on synthetic audio mixtures, head-related transfer functions (HRTFs), and binaural room impulse responses (BRIRs), giving users a new way to interact with the sound around them.
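To make the role of the impulse responses concrete, here is a minimal sketch of binaural rendering: a mono source is convolved with a left-ear and a right-ear impulse response, which yields a two-channel signal. The function names (`convolve`, `render_binaural`) and the two-tap impulse responses are illustrative assumptions, not taken from the LookOnceToHear code; measured HRTFs and BRIRs are much longer filters.

```python
from typing import List, Tuple


def convolve(signal: List[float], impulse_response: List[float]) -> List[float]:
    """Full linear convolution of two finite sequences (pure Python)."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(
    mono: List[float], ir_left: List[float], ir_right: List[float]
) -> Tuple[List[float], List[float]]:
    """Filter one mono source with a per-ear impulse response (hypothetical helper)."""
    return convolve(mono, ir_left), convolve(mono, ir_right)


# Toy data (assumed values): the right ear hears the source one sample
# later and quieter than the left ear, as if the source sat to the left.
mono = [1.0, 0.5, 0.25]
ir_left = [1.0, 0.0]
ir_right = [0.0, 0.6]

left, right = render_binaural(mono, ir_left, ir_right)
print("left: ", left)
print("right:", right)
```

A mixture with several speakers would be the sample-wise sum of such renderings, one per source, each with its own pair of impulse responses.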

Target Users:

This product suits researchers and developers who need speech recognition and extraction in noisy environments. For example, it can help people with hearing impairments follow conversations in noisy settings, or support speech analysis and processing when several people are talking at once.


Use Cases

In meetings, use LookOnceToHear to select and listen to the voice of a specific speaker

Help people with hearing impairments concentrate on conversations in noisy public places

In audio analysis research, distinguish and extract multiple sound sources

Features

Users select the desired voice by looking at the target speaker for a few seconds

Utilizes the Scaper toolkit to synthesize and generate audio mixtures

Provides a self-contained dataset with .jams specification files for training

Supports real-time speech extraction and evaluation of target speech listening models

Offers model checkpoints for easy training and evaluation by users

Suitable for speech recognition and extraction in noisy environments
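As an illustration of what evaluating a target speech listening model can involve, the sketch below computes scale-invariant signal-to-noise ratio (SI-SNR), a metric commonly used for speech extraction. This listing does not say which metric the project reports, so treat the choice of SI-SNR, the function name `si_snr_db`, and the toy signals as assumptions.

```python
import math
from typing import Sequence


def si_snr_db(estimate: Sequence[float], target: Sequence[float],
              eps: float = 1e-12) -> float:
    """Scale-invariant SNR in dB between an estimate and a reference signal."""
    if len(estimate) != len(target) or len(target) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(target)
    # Remove the mean of each signal.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]
    # Project the estimate onto the target; the residual counts as noise.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt)
    scale = dot / (tgt_energy + eps)
    projection = [scale * t for t in tgt]
    noise = [e - p for e, p in zip(est, projection)]
    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


# Toy signals (assumed values): a clean reference plus a small disturbance.
target = [0.0, 1.0, 0.0, -1.0] * 4
noisy = [t + d for t, d in zip(target, [0.1, -0.1] * 8)]
louder = [2.0 * x for x in noisy]

print(round(si_snr_db(noisy, target), 2))
print(round(si_snr_db(louder, target), 2))  # same score: gain does not matter
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the model output does not change the score, which is why this metric is often preferred over plain SNR when comparing extraction models.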

How to Use

Download and unzip the provided .zip file to the 'data/' directory

Run the project's training command to start the training process

Use Scaper's 'generate_from_jams' function to generate audio mixtures from the .jams specification files

Download and load the target speech listening model checkpoint for evaluation

Adjust model parameters as needed to optimize performance

In practical applications, users simply need to look at the target speaker to start speech extraction
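The data preparation steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the archive name, the foreground and background folder names, and the helper functions are hypothetical, and the Scaper call assumes a recent Scaper release in which `generate_from_jams` accepts `fg_path` and `bg_path` and can be called without an output file. The project's actual training and evaluation commands are not reproduced here.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_dataset(zip_path: str, data_dir: str = "data") -> Path:
    """Unzip the downloaded archive into data_dir and return that directory."""
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


def find_jams_files(data_dir: str = "data") -> List[Path]:
    """Return every .jams specification file under data_dir, sorted by path."""
    return sorted(Path(data_dir).rglob("*.jams"))


def generate_mixture(jams_file: Path, fg_path: str, bg_path: str):
    """Render one mixture from a .jams file with Scaper.

    Scaper is a third-party package and is imported lazily, so the helpers
    above work without it. The return value is passed through unchanged
    because its exact shape depends on the installed Scaper version.
    """
    import scaper  # assumption: installed separately, e.g. `pip install scaper`

    return scaper.generate_from_jams(
        str(jams_file), fg_path=fg_path, bg_path=bg_path
    )


if __name__ == "__main__":
    # Hypothetical file and folder names; substitute the ones from the download.
    data_root = extract_dataset("dataset.zip", "data")
    for spec in find_jams_files(str(data_root)):
        print("found specification:", spec)
        # generate_mixture(spec, "data/foreground", "data/background")
```

Keeping the Scaper call in its own function makes it straightforward to generate mixtures on demand during training or evaluation rather than rendering them all up front.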

