AudioSep
Overview:
AudioSep is an open-domain audio source separation model driven by natural language queries. It consists of two key components: a text encoder and a separation model. We trained AudioSep on a large-scale multimodal dataset and evaluated it extensively on many tasks, including audio event separation, instrument separation, and voice enhancement. Using audio captions or text labels as queries, AudioSep demonstrates strong separation performance and impressive zero-shot generalization, significantly outperforming previous audio-queried and language-queried sound separation models. To ensure the reproducibility of this work, we will release the source code, evaluation benchmark, and pre-trained models.
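To make the two-component design concrete, here is a minimal toy sketch of how a text query can condition a separation step. It is not AudioSep's implementation or API: `encode_text`, `separate`, `EMBED_DIM`, the hashing-based embedding, and the placeholder weights are all hypothetical stand-ins chosen for illustration, and a real system would use trained neural networks for both components.

```python
import hashlib
import math

EMBED_DIM = 8  # toy embedding size (assumption, not a real model setting)


def encode_text(query):
    """Toy stand-in for a text encoder: hash each word into a fixed-size vector."""
    vec = [0.0] * EMBED_DIM
    for word in query.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        for i in range(EMBED_DIM):
            vec[i] += digest[i] / 255.0
    # Normalize to unit length; an empty query stays the zero vector.
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def separate(mixture, query_embedding, weights):
    """Toy stand-in for a separation model: one query-conditioned mask per frequency bin.

    mixture: list of frames, each a list of non-negative magnitudes per bin.
    weights: one weight vector of length EMBED_DIM per frequency bin
             (untrained placeholders here; a real model would learn them).
    """
    masks = []
    for w in weights:
        score = sum(wi * qi for wi, qi in zip(w, query_embedding))
        masks.append(1.0 / (1.0 + math.exp(-score)))  # sigmoid keeps the mask in (0, 1)
    # Apply the same per-bin mask to every frame of the mixture.
    return [[m * x for m, x in zip(masks, frame)] for frame in mixture]


if __name__ == "__main__":
    query = encode_text("acoustic guitar")
    mixture = [[1.0, 2.0, 3.0], [0.5, 0.0, 4.0]]  # 2 frames x 3 frequency bins
    weights = [[0.5] * EMBED_DIM, [-0.5] * EMBED_DIM, [0.0] * EMBED_DIM]
    print(separate(mixture, query, weights))
```

The point of the sketch is the data flow: the query is turned into an embedding once, and that embedding decides how much of each part of the mixture is kept. Changing the query text changes the mask without changing the mixture or the separation code.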
Target Users:
AudioSep is aimed at users who work on audio separation, and it can be applied in audio processing and audio editing.
Total Visits: 20.4M
Top Region: US (29.22%)
Website Views: 88.0K
Use Cases
Use AudioSep to separate the guitar sound from the audio
Use AudioSep to separate the human voice from the audio
Use AudioSep to separate the piano sound from the audio
Features
Audio source separation based on natural language queries
Support for open-domain audio concept separation
Support for audio event separation, instrument separation, and voice enhancement
Strong separation performance and zero-shot generalization ability