MMStar
Overview
MMStar is a benchmark dataset designed to assess the multimodal capabilities of large visual language models. It comprises 1,500 carefully selected visual language samples covering 6 core abilities and 18 sub-dimensions. Every sample has undergone human review to ensure visual dependency, minimize data leakage, and require advanced multimodal capabilities to solve. In addition to traditional accuracy metrics, MMStar proposes two new metrics that measure data leakage and the actual performance gain from multimodal training. Researchers can use MMStar to evaluate the multimodal capabilities of visual language models across multiple tasks, and use the new metrics to uncover potential issues within models.
Target Users
MMStar is primarily used to evaluate and analyze the performance of large visual language models on multimodal tasks. It helps identify potential issues within models and guides future improvements.
Use Cases
Researchers can use MMStar to evaluate the performance of their own trained visual language models on different visual language tasks.
Model developers can use MMStar to identify potential data leakage issues in their models and take appropriate measures.
Benchmark results can provide guidance and inspiration for further improvement of existing visual language models.
Features
Contains 1,500 high-quality visual language samples
Covers 6 core abilities and 18 sub-dimensions
Human review ensures visual dependency and minimizes data leakage
Proposes two new metrics: multimodal gain and data leakage
Benchmarks 16 leading visual language models
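The two proposed metrics can be illustrated with a short sketch. This is an assumption-laden illustration, not MMStar's official formulation: it treats "multimodal gain" as the accuracy a model gains when the image is actually shown, and "data leakage" as the accuracy a model retains without the image beyond what its text-only base LLM achieves. The function names and signatures are hypothetical.

```python
# Hedged sketch of MMStar's two new metrics; the exact formulas are
# assumptions inferred from the description above, not the paper's code.

def multimodal_gain(acc_with_image: float, acc_without_image: float) -> float:
    """Accuracy gained by a visual language model when the image is provided."""
    return acc_with_image - acc_without_image

def multimodal_leakage(acc_without_image: float, acc_text_base: float) -> float:
    """Accuracy achievable without the image, beyond the text-only base LLM,
    suggesting the sample (or its answer) leaked into training data."""
    return max(0.0, acc_without_image - acc_text_base)
```

For example, a model scoring 60% with images, 45% without, whose base LLM scores 40%, would have a multimodal gain of 15 points and a leakage of 5 points under this sketch.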