

CLaMP 3
Overview:
CLaMP 3 is a music information retrieval model that uses contrastive learning to align features of musical scores, performance signals, audio recordings, and multilingual text in a shared representation space, enabling cross-modal and cross-lingual music retrieval. It generalizes well, handling unaligned modalities and languages unseen during training. Trained on M4-RAG, a large-scale dataset that covers a wide range of global music traditions, the model supports retrieval tasks such as text-to-music and image-to-music.
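To make "aligns features through contrastive learning" concrete, here is a generic CLIP-style symmetric InfoNCE loss. It is a sketch under stated assumptions, not CLaMP 3's actual training code: the function names, the temperature of 0.07, and the random vectors standing in for encoder outputs are all illustrative choices. Matching text/music pairs sit on the diagonal of the similarity matrix, and the loss pushes each diagonal entry above the other entries in its row and column.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so that dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(text_emb: np.ndarray, music_emb: np.ndarray,
                               temperature: float = 0.07) -> float:
    """Generic CLIP-style symmetric InfoNCE loss (illustrative, not CLaMP 3's code).

    Row i of text_emb and row i of music_emb are assumed to describe the same
    piece; every other row in the batch acts as a negative example.
    """
    logits = l2_normalize(text_emb) @ l2_normalize(music_emb).T / temperature
    idx = np.arange(logits.shape[0])
    # Text -> music: each text row should pick out its own music column.
    loss_text_to_music = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Music -> text: each music column should pick out its own text row.
    loss_music_to_text = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((loss_text_to_music + loss_music_to_text) / 2)


# Random vectors stand in for encoder outputs (batch of 4, dimension 16).
rng = np.random.default_rng(0)
music = rng.normal(size=(4, 16))
aligned_text = music + 0.01 * rng.normal(size=(4, 16))  # near-duplicates: well aligned
random_text = rng.normal(size=(4, 16))                  # unrelated: poorly aligned
print(symmetric_contrastive_loss(aligned_text, music))
print(symmetric_contrastive_loss(random_text, music))
```

Well-aligned pairs should give a much lower loss than unrelated pairs. Averaging both directions treats text-to-music and music-to-text retrieval symmetrically.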
Target Users:
CLaMP 3 is designed for music researchers, developers of music recommendation systems, music educators, and anyone interested in cross-modal music retrieval. It helps users quickly find music that matches a text description or the scene in an image, making music retrieval more efficient and accurate.
Use Cases
Retrieve music via text description: Input keywords such as 'big band, major key, swing' to retrieve matching music.
Retrieve music via image: Input an image of a wedding scene; a text description is generated from the image, and the model uses it to retrieve matching music, such as a wedding march.
Zero-shot music classification: Input an unlabeled piece of music, and the model assigns it to a category by measuring the semantic similarity between the music and each category's text label.
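Zero-shot classification can be framed as retrieval over a small set of label descriptions: embed each category name as text, embed the piece of music, and pick the most similar label. The sketch below assumes both kinds of embedding already exist; `zero_shot_classify` and the 3-dimensional toy vectors are hypothetical stand-ins for the outputs of CLaMP 3's text and music encoders.

```python
import numpy as np


def zero_shot_classify(music_emb, label_embs, labels):
    """Return the label whose text embedding is most similar to the music embedding.

    Hypothetical helper: cosine similarity between one music vector and each label vector.
    """
    music = music_emb / np.linalg.norm(music_emb)
    label_mat = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    scores = label_mat @ music  # one cosine similarity per label
    return labels[int(np.argmax(scores))], scores


# Toy embeddings; real ones would come from the text and music encoders.
labels = ["jazz", "classical", "folk"]
label_embs = np.array([[1.0, 0.1, 0.0],
                       [0.0, 1.0, 0.2],
                       [0.1, 0.0, 1.0]])
unlabeled_piece = np.array([0.9, 0.2, 0.1])

predicted, scores = zero_shot_classify(unlabeled_piece, label_embs, labels)
print(predicted)  # jazz
```

No classifier is trained here: adding a new category only requires embedding one more label.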
Features
Supports cross-modal music retrieval, such as retrieving audio from musical scores.
Supports multilingual text-to-music retrieval, including languages unseen during training.
Supports image-to-music retrieval by matching image descriptions to music.
Supports zero-shot music classification through semantic similarity computation.
Supports semantic similarity assessment between pieces of music, with results highly consistent with human perception.
Provides a large-scale music-text pair dataset M4-RAG and a benchmark dataset WikiMT-X.
Visualizes music modalities and semantic distributions through t-SNE.
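Image-to-music retrieval, as listed above, works in two stages: describe the image in text, then match that description against music. The toy sketch below shows only the flow. `toy_caption`, `toy_embed_text`, the five-word vocabulary, and the music embeddings are all hypothetical stand-ins for an image-captioning model and CLaMP 3's encoders, which this page does not specify.

```python
import numpy as np

# Hypothetical vocabulary for a toy bag-of-words "text encoder".
VOCAB = ["wedding", "church", "dance", "night", "rain"]


def toy_caption(image_name: str) -> str:
    """Hypothetical captioner: returns a fixed description per image file."""
    captions = {
        "wedding.jpg": "a bride and groom at a wedding in a church",
        "club.jpg": "people dance at night",
    }
    return captions[image_name]


def toy_embed_text(text: str) -> np.ndarray:
    """Hypothetical text encoder: word counts over VOCAB, scaled to unit length."""
    words = text.lower().split()
    vec = np.array([words.count(w) for w in VOCAB], dtype=float)
    return vec / (np.linalg.norm(vec) or 1.0)


def image_to_music(image_name, music_embs, titles):
    """Stage 1: caption the image. Stage 2: rank music by cosine similarity to the caption."""
    query = toy_embed_text(toy_caption(image_name))
    normed = music_embs / np.linalg.norm(music_embs, axis=1, keepdims=True)
    scores = normed @ query
    return titles[int(np.argmax(scores))]


# Toy music embeddings in the same 5-dimensional space as the toy text encoder.
titles = ["Wedding March", "Dance Floor Anthem", "Rainy Day Blues"]
music_embs = np.array([[1.0, 0.8, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0, 0.7, 0.0],
                       [0.0, 0.0, 0.0, 0.2, 1.0]])
print(image_to_music("wedding.jpg", music_embs, titles))  # Wedding March
```

Because the image only enters through its text description, the same text-to-music machinery serves both query types.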
How to Use
1. Access the online demo page of CLaMP 3 or download the model weights.
2. Input a query using text descriptions, images, or other modalities.
3. The model encodes the query and the candidate music into a shared embedding space learned through contrastive training.
4. The music whose embedding is most similar to the query is returned as the best match.
5. Visualize music modalities and semantic distributions using the provided tools.
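Steps 2 to 4 boil down to nearest-neighbour search in the shared embedding space. The sketch assumes the query and the music collection have already been encoded (toy 3-dimensional vectors here) and ranks the collection by cosine similarity; `top_k_retrieval` is a hypothetical helper, not part of any released CLaMP 3 API.

```python
import numpy as np


def top_k_retrieval(query_emb, music_embs, k=2):
    """Rank music items by cosine similarity to the query; return top-k indices and scores."""
    query = query_emb / np.linalg.norm(query_emb)
    database = music_embs / np.linalg.norm(music_embs, axis=1, keepdims=True)
    scores = database @ query        # one cosine score per music item
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order.tolist(), scores[order].tolist()


# Toy precomputed music embeddings (3 items, dimension 3).
music_embs = np.array([[0.9, 0.1, 0.0],
                       [0.0, 1.0, 0.1],
                       [0.7, 0.0, 0.7]])
# Toy stand-in for the embedding of a query such as "big band, major key, swing".
query_emb = np.array([1.0, 0.0, 0.2])

indices, scores = top_k_retrieval(query_emb, music_embs, k=2)
print(indices)  # [0, 2]
```

For large collections, a brute-force matrix product like this is usually replaced by an approximate nearest-neighbour index, but the ranking criterion stays the same.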