CLaMP 3
Overview
CLaMP 3 is an advanced music information retrieval model that aligns features of musical scores, performance signals, audio recordings, and multilingual text in a shared embedding space through contrastive learning, supporting cross-modal and cross-lingual music retrieval. It generalizes well, handling modality pairs that were never aligned during training as well as languages unseen in training. Trained on the large-scale M4-RAG dataset, which covers music traditions from around the world, the model supports a variety of retrieval tasks, such as text-to-music and image-to-music retrieval.
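The overview does not spell out the training objective, so as a rough illustration only: contrastive alignment of two modalities is commonly implemented as a CLIP-style symmetric InfoNCE loss, sketched below in PyTorch. All tensor names and the temperature value are illustrative, not CLaMP 3's actual code.

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(text_emb, music_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired (text, music) embeddings.

    text_emb, music_emb: (batch, dim) tensors from the two encoders.
    Matched pairs share a batch index; every other pairing is a negative.
    """
    # L2-normalize so the dot product is cosine similarity.
    text_emb = F.normalize(text_emb, dim=-1)
    music_emb = F.normalize(music_emb, dim=-1)

    # (batch, batch) similarity matrix, scaled by the temperature.
    logits = text_emb @ music_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: text->music and music->text.
    loss_t2m = F.cross_entropy(logits, targets)
    loss_m2t = F.cross_entropy(logits.t(), targets)
    return (loss_t2m + loss_m2t) / 2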
Target Users
CLaMP 3 is designed for music researchers, developers of music recommendation systems, music educators, and anyone interested in cross-modal music retrieval. It helps users quickly find music that matches a text description or the scene depicted in an image, improving both the efficiency and the accuracy of music retrieval.
Use Cases
Retrieve music via text description: Input keywords such as 'big band, major key, swing' to retrieve matching music.
Retrieve music via image: Input an image of a wedding scene, and the model retrieves a wedding march based on the generated description.
Zero-shot music classification: Input an unlabeled piece of music, and the model assigns it to the closest category by semantic similarity (see the sketch after this list).
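CLaMP 3's released API is not reproduced here; the sketch below shows the generic zero-shot recipe the last item describes: embed each candidate label as text, then pick the label whose embedding is closest to the music embedding. The `encode_text` callable and the array shapes are hypothetical stand-ins for the model's actual encoders.

```python
import numpy as np

def zero_shot_classify(music_emb: np.ndarray, label_names: list[str],
                       encode_text) -> str:
    """Pick the label whose text embedding is most similar to the music.

    music_emb: embedding of the unlabeled piece, shape (dim,).
    encode_text: callable mapping a string to a (dim,) embedding in the
        same shared space (stand-in for the model's text encoder).
    """
    label_embs = np.stack([encode_text(name) for name in label_names])
    # Cosine similarity between the music and each label description.
    label_embs /= np.linalg.norm(label_embs, axis=1, keepdims=True)
    music_emb = music_emb / np.linalg.norm(music_emb)
    scores = label_embs @ music_emb
    return label_names[int(np.argmax(scores))]
```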
Features
Supports cross-modal music retrieval, such as retrieving audio from musical scores.
Supports multilingual text-to-music retrieval, including unseen languages.
Supports image-to-music retrieval by matching image descriptions to music.
Supports zero-shot music classification through semantic similarity computation.
Supports music semantic similarity assessment, highly consistent with human perception.
Provides a large-scale music-text pair dataset M4-RAG and a benchmark dataset WikiMT-X.
Visualizes music modalities and semantic distributions through t-SNE (a minimal plotting recipe follows this list).
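The project ships its own visualization tools; as a rough, library-level equivalent, the t-SNE view can be reproduced with scikit-learn and matplotlib, assuming you already have an `(n, dim)` matrix of shared-space embeddings and a per-item modality label:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_embeddings(embeddings: np.ndarray, labels: list[str]) -> None:
    """Project shared-space embeddings to 2-D with t-SNE and scatter-plot
    them, colored by modality (e.g. 'score', 'audio', 'text')."""
    coords = TSNE(n_components=2, perplexity=30, init="pca",
                  random_state=0).fit_transform(embeddings)
    for modality in sorted(set(labels)):
        mask = np.array([lab == modality for lab in labels])
        plt.scatter(coords[mask, 0], coords[mask, 1], s=8, label=modality)
    plt.legend()
    plt.title("t-SNE of the shared embedding space")
    plt.show()
```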
How to Use
1. Access the online demo page of CLaMP 3 or download the model weights.
2. Input a query using text descriptions, images, or other modalities.
3. The model encodes the query into the shared embedding space it learned through contrastive training.
4. Rank candidate music by similarity to the query embedding and retrieve the best matches (see the retrieval sketch after these steps).
5. Visualize music modalities and semantic distributions using the provided tools.
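Once the query and the candidates live in the same embedding space, step 4 reduces to nearest-neighbor search. A minimal sketch, assuming a precomputed query embedding and a matrix of music embeddings; the shapes and the `track_ids` list are illustrative:

```python
import numpy as np

def retrieve_top_k(query_emb: np.ndarray, music_embs: np.ndarray,
                   track_ids: list[str], k: int = 5) -> list[str]:
    """Return the ids of the k tracks whose embeddings are closest to the
    query embedding, measured by cosine similarity in the shared space."""
    query_emb = query_emb / np.linalg.norm(query_emb)
    music_embs = music_embs / np.linalg.norm(music_embs, axis=1, keepdims=True)
    scores = music_embs @ query_emb          # (n_tracks,)
    top = np.argsort(-scores)[:k]            # indices of the highest scores
    return [track_ids[i] for i in top]
```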