InternVL2-8B-MPO
Overview
InternVL2-8B-MPO is a multimodal large language model (MLLM) whose multimodal reasoning is strengthened through a Mixed Preference Optimization (MPO) process. The work introduces an automated pipeline for constructing preference data and uses it to build MMPR, a large-scale multimodal reasoning preference dataset. Starting from the InternVL2-8B model, InternVL2-8B-MPO is fine-tuned on MMPR and demonstrates stronger multimodal reasoning with fewer hallucinations. It reaches 67.0% accuracy on MathVista, surpassing InternVL2-8B by 8.7 points and performing comparably to the much larger InternVL2-76B model.
Target Users
The target audience includes researchers, developers, and enterprise users, especially those who work with multimodal data (such as images and text) and want to strengthen their models' reasoning capabilities. InternVL2-8B-MPO delivers more accurate data analysis and more reliable outputs, making it suitable for improving product intelligence and supporting decision-making.
Use Cases
Mathematical visual reasoning, e.g. achieving 67.0% accuracy on the MathVista benchmark.
Image description generation, producing detailed descriptions of image content.
Multimodal reasoning tasks, such as comparing similarities and differences between images.
Features
- Enhanced multimodal reasoning: Boosted by Mixed Preference Optimization (MPO).
- High accuracy: Achieves 67.0% accuracy on MathVista, significantly outperforming InternVL2-8B.
- Reduced hallucinations: Exhibits fewer hallucinations than InternVL2-8B.
- Multiple deployment options: Including model serving with LMDeploy.
- Multilingual: Supports understanding and generation in multiple languages.
- Versatile: Handles image-text-to-text tasks, processing images and generating related text.
- Fine-tunable: Supports fine-tuning on various platforms to adapt to specific tasks.
- User-friendly: Provides detailed quick-start guides and APIs for easy access.
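The LMDeploy deployment mentioned above can be sketched as follows. This is a minimal sketch, assuming the `OpenGVLab/InternVL2-8B-MPO` Hugging Face repo id, an installed `lmdeploy` package, and a CUDA GPU; the image path is a hypothetical example.

```python
# Minimal LMDeploy serving sketch (assumes: pip install lmdeploy, a CUDA GPU,
# and the "OpenGVLab/InternVL2-8B-MPO" Hugging Face repo id).
MODEL_ID = "OpenGVLab/InternVL2-8B-MPO"

if __name__ == "__main__":
    from lmdeploy import pipeline, TurbomindEngineConfig
    from lmdeploy.vl import load_image

    # Build an inference pipeline; session_len bounds the total context length.
    pipe = pipeline(MODEL_ID,
                    backend_config=TurbomindEngineConfig(session_len=8192))

    # Run a single image-text query: LMDeploy accepts a (prompt, image) tuple.
    image = load_image("example.jpg")  # hypothetical local image path
    response = pipe(("Describe this image in detail.", image))
    print(response.text)
```

LMDeploy's `pipeline` API hides tokenization, image preprocessing, and batching, so the same entry point also serves text-only prompts.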
How to Use
1. Install the necessary libraries, such as transformers and torch.
2. Load the InternVL2-8B-MPO model with AutoModel.from_pretrained; the repository ships custom modeling code, so trust_remote_code=True is required.
3. Prepare your input data, including text and images.
4. Perform inference using the model to generate outputs related to the input.
5. Post-process the outputs as needed, such as text formatting or image display.
6. If necessary, fine-tune the model to adapt it to specific applications.
7. Deploy the model to a production environment; LMDeploy can be used for model deployment.
© 2025 AIbase