CogVLM
Overview:
CogVLM is a powerful open-source visual language model. CogVLM-17B has 10 billion visual parameters and 7 billion language parameters. It achieves state-of-the-art performance on 10 classic cross-modal benchmarks: NoCaps, Flickr30k Captions, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA, and TDIUC. On VQAv2, OKVQA, TextVQA, and COCO Captions it ranks second or matches PaLI-X 55B. CogVLM can also hold conversations about images.
Target Users:
Users who need image description, question answering about images, and visual localization
Total Visits: 474.6M
Top Region: US (19.34%)
Website Views: 102.7K
Use Cases
Use CogVLM to accurately describe image details
Use CogVLM to answer various types of questions about an image
Use CogVLM for visual localization
Features
Accurately describe image details
Answer various types of questions about an image
Visual localization