

EVE
Overview
EVE is an encoder-free vision-language model jointly developed by researchers from Dalian University of Technology, the Beijing Academy of Artificial Intelligence (BAAI), and Peking University. It handles images of arbitrary aspect ratio, outperforming Fuyu-8B and approaching the performance of modular, encoder-based vision-language models. EVE is notable for its data and training efficiency: it is pre-trained on 33M publicly available image-text samples, with the EVE-7B model trained on 665K LLaVA SFT samples and the EVE-7B (HD) model on an additional 1.2M SFT samples. EVE's development follows efficient, transparent, and practical strategies, paving the way for a new paradigm of pure decoder-only architectures across modalities.
Target Users
The EVE model is aimed primarily at researchers and developers in artificial intelligence, especially those working on vision-language tasks and natural language processing. Thanks to its data and training efficiency, EVE is well suited to scenarios that involve large-scale visual data and language models, and it can play a significant role in advancing research in the field.
Use Cases
Researchers utilize the EVE model for image captioning tasks.
Developers leverage EVE for the development of visual question answering systems.
Educational institutions employ the EVE model to teach the construction and application of vision-language models.
Features
Vision-language modeling that supports arbitrary image aspect ratios.
Efficient pre-training using a small amount of public data.
Further fine-tuning with additional SFT data.
Training efficiency achieved in approximately 9 days using two 8-A100 (40G) nodes.
Encoder-free (decoder-only) architecture, reducing model complexity and enhancing transparency.
Superior performance exhibited across multiple vision-language tasks.
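For a rough sense of the training budget implied by the feature list above (two 8-A100 nodes for about 9 days), the total GPU-hours work out as follows. This is a back-of-the-envelope estimate derived from the quoted figures, not an official number:

```python
# Back-of-the-envelope GPU-hour estimate for the quoted setup:
# two nodes x 8 A100 (40G) GPUs, running for roughly 9 days.
gpus = 2 * 8       # total GPUs across both nodes
hours = 9 * 24     # ~9 days expressed in hours
gpu_hours = gpus * hours
print(gpu_hours)   # -> 3456 GPU-hours
```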
How to Use
Visit the EVE GitHub page to access project information and code.
Read the README file to understand the model's installation and configuration requirements.
Clone or download the EVE repository to your local environment.
Download and install the necessary dependencies as instructed.
Follow the steps outlined in the documentation to train or test the model.
Adjust model parameters as needed to suit different vision-language tasks.
Engage in community discussions to seek assistance or contribute code.
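The setup steps above can be sketched roughly as follows. The repository path and dependency file name here are assumptions based on common GitHub conventions and should be checked against the actual EVE README:

```shell
# Hypothetical setup sketch -- verify the URL and file names against the EVE README.
git clone https://github.com/baaivision/EVE.git   # assumed repository path
cd EVE
pip install -r requirements.txt                   # assumed dependency file name
```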