AnyGPT
Overview:
AnyGPT is an any-to-any multimodal language model that uses discrete representations to process speech, text, images, and music in a uniform way. It can be trained stably without changing the architecture or training paradigm of existing large language models: all modality handling happens in data-level preprocessing, so a new modality can be integrated much like adding a new language. For multimodal alignment pre-training, the authors built a text-centric multimodal dataset. Using generative models, they also synthesized what they describe as the first large-scale any-to-any multimodal instruction dataset. It consists of 108,000 multi-turn dialogue samples that interleave different modalities, so the model can handle arbitrary combinations of input and output modalities. Experimental results indicate that AnyGPT can carry out any-to-any multimodal dialogue while achieving performance comparable to dedicated models across all modalities, which suggests that discrete representations are an effective and convenient way to unify multiple modalities within a language model.
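The "new modality as a new language" idea can be illustrated with a minimal sketch: each modality's discrete codes get their own ID range appended after the text vocabulary, and each non-text segment is wrapped in start and end markers. The codebook sizes, the marker layout, and the function names below are hypothetical and chosen only for illustration; they are not taken from AnyGPT itself.

```python
from typing import Dict, List, Tuple

# Hypothetical sizes, for illustration only (not AnyGPT's actual values).
TEXT_VOCAB_SIZE = 32000
CODEBOOK_SIZES = {"image": 8192, "speech": 1024, "music": 2048}


def build_layout(text_vocab_size: int, codebook_sizes: Dict[str, int]):
    """Give each modality a start marker, an end marker, and a contiguous
    block of IDs placed after the text vocabulary."""
    layout = {}
    next_id = text_vocab_size
    for name, size in codebook_sizes.items():
        layout[name] = {
            "start": next_id,       # segment-start marker ID
            "end": next_id + 1,     # segment-end marker ID
            "offset": next_id + 2,  # ID of codebook entry 0
            "size": size,
        }
        next_id += size + 2
    return layout, next_id  # next_id is the extended vocabulary size


def encode_segment(layout, modality: str, codes: List[int]) -> List[int]:
    """Map raw codebook indices to shared-vocabulary IDs, wrapped in markers."""
    slot = layout[modality]
    for code in codes:
        if not 0 <= code < slot["size"]:
            raise ValueError(f"code {code} outside {modality} codebook")
    return [slot["start"]] + [slot["offset"] + c for c in codes] + [slot["end"]]


def split_segments(layout, ids: List[int]) -> List[Tuple[str, List[int]]]:
    """Split a mixed sequence back into (modality, codes) segments.
    Text IDs are returned unchanged; other IDs become codebook indices."""
    starts = {slot["start"]: name for name, slot in layout.items()}
    segments: List[Tuple[str, List[int]]] = []
    current, buffer = "text", []
    for token in ids:
        if token in starts:
            if buffer:
                segments.append((current, buffer))
            current, buffer = starts[token], []
        elif current != "text" and token == layout[current]["end"]:
            segments.append((current, buffer))
            current, buffer = "text", []
        elif current == "text":
            buffer.append(token)
        else:
            buffer.append(token - layout[current]["offset"])
    if buffer:
        segments.append((current, buffer))
    return segments


LAYOUT, EXTENDED_VOCAB_SIZE = build_layout(TEXT_VOCAB_SIZE, CODEBOOK_SIZES)
```

Under these assumptions, adding another modality only means adding an entry to `CODEBOOK_SIZES` and enlarging the model's embedding table to `EXTENDED_VOCAB_SIZE`; the sequence model itself sees nothing but a longer vocabulary.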
Target Users:
Engaging in multi-modal conversations
Supporting voice assistants and other applications
Creating multi-modal content
Total Visits: 423
Top Region: TH (100.00%)
Website Views: 98.0K
Features
Supporting input and output in multiple modalities, including speech, text, images, and music
Conducting multi-turn conversations that interleave multiple modalities
Achieving performance comparable to dedicated models across all modalities