SALMONN: Speech Audio Language Music Open Neural Network


AI speech recognition AI speech synthesis #Speech #Audio #Language #Music #Large Language Model Standard Picks Open Source

Overview:

SALMONN is a large language model (LLM) developed by the Department of Electronic Engineering at Tsinghua University and ByteDance that accepts speech, audio-event, and music inputs. Unlike models that support only speech input or only audio-event input, SALMONN can perceive and understand all of these kinds of audio, which enables emerging capabilities such as multilingual speech recognition and translation and audio-speech co-reasoning. This can be regarded as giving the LLM 'ears' and cognitive hearing abilities, making SALMONN a step towards hearing-enabled artificial general intelligence.

Target Users:

SALMONN can be applied to fields such as speech recognition, speech translation, and audio processing.

Total Visits: 474.6M

Top Region: US (19.34%)

Website Views: 92.2K

Use Cases

Input: gunshots.wav, Output: ...

Input: duck.wav, Output: ...

Input: music.wav, Output: ...
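Each use case above pairs a WAV file with a text response. As a rough illustration only, the sketch below shows one way a clip and an instruction could be bundled before being handed to an audio-capable model. The `AudioQuery` structure, the `build_query` helper, and the prompt text are hypothetical names introduced for this example and are not part of SALMONN. Only Python's standard `wave` module is used.

```python
import wave
from dataclasses import dataclass


@dataclass
class AudioQuery:
    """Hypothetical container: one audio clip plus a text instruction."""
    path: str
    prompt: str
    sample_rate: int
    duration_s: float


def build_query(path: str, prompt: str) -> AudioQuery:
    """Read basic properties from the WAV header and bundle them with the prompt."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()   # samples per second
        frames = wav.getnframes()   # number of audio frames in the file
    return AudioQuery(path=path, prompt=prompt,
                      sample_rate=rate, duration_s=frames / rate)
```

A call such as `build_query("duck.wav", "What animal is making this sound?")` would return the metadata and prompt together. How the query is then passed to a model is left open here, since the listing does not describe an interface.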

Features

Multilingual Speech Recognition

Multilingual Speech Translation