

AudioSep
Overview:
AudioSep is an open-domain audio source separation model driven by natural language queries. It consists of two key components: a text encoder and a separation model. We trained AudioSep on a large-scale multimodal dataset and extensively evaluated it on a range of tasks, including audio event separation, musical instrument separation, and voice enhancement. AudioSep demonstrates strong separation performance and impressive zero-shot generalization, significantly outperforming previous audio-queried and language-queried sound separation models when audio captions or text labels are used as queries. To ensure the reproducibility of this work, we will release the source code, evaluation benchmark, and pre-trained models.
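The two-component design described above (a text encoder whose output conditions a separation model) can be illustrated with a toy sketch. Everything in it is hypothetical and is not AudioSep's actual architecture or API: `encode_text` is a fixed keyword table standing in for a learned text encoder, the "spectrogram" is a tiny list of per-band magnitudes, and the sigmoid mask is an assumed mask-based formulation chosen only to show how a query can select one source from a mixture.

```python
import math
from typing import Dict, List

# Hypothetical frequency bands for a toy magnitude "spectrogram".
BANDS = ("low", "mid", "high")

# Hypothetical stand-in for a learned text encoder: a fixed keyword table.
# Each entry marks which band the named source occupies in this toy example.
KEYWORD_EMBEDDINGS: Dict[str, List[float]] = {
    "piano": [1.0, 0.0, 0.0],
    "guitar": [0.0, 1.0, 0.0],
    "voice": [0.0, 0.0, 1.0],
}


def encode_text(query: str) -> List[float]:
    """Toy text encoder: sum the embeddings of known keywords in the query."""
    embedding = [0.0] * len(BANDS)
    for word in query.lower().split():
        vector = KEYWORD_EMBEDDINGS.get(word.strip(".,!?"))
        if vector is not None:
            embedding = [e + v for e, v in zip(embedding, vector)]
    return embedding


def query_to_mask(embedding: List[float], sharpness: float = 8.0) -> List[float]:
    """Map the query embedding to a soft per-band mask in (0, 1) via a sigmoid."""
    return [1.0 / (1.0 + math.exp(-sharpness * (e - 0.5))) for e in embedding]


def separate(mixture: List[List[float]], query: str) -> List[List[float]]:
    """Toy query-conditioned separation: apply the same band mask to every frame."""
    mask = query_to_mask(encode_text(query))
    return [[m * x for m, x in zip(mask, frame)] for frame in mixture]


# Two frames x three bands of a toy mixture magnitude spectrogram.
mixture = [
    [0.8, 0.5, 0.3],
    [0.6, 0.7, 0.4],
]
guitar = separate(mixture, "separate the guitar sound")
```

Changing the query to "voice" or "piano" keeps a different band and suppresses the rest, mirroring the use cases listed for AudioSep; a real model learns the mapping from query to mask from data rather than from a fixed table.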
Target Users:
Users working in audio source separation; the model can be applied to audio processing and audio editing.
Use Cases
Use AudioSep to separate the guitar sound from the audio
Use AudioSep to separate the human voice from the audio
Use AudioSep to separate the piano sound from the audio
Features
Audio source separation based on natural language queries
Support for open-domain audio concept separation
Support for audio event separation, instrument separation, and voice enhancement
Strong separation performance and zero-shot generalization ability