Vary Toy: A miniature language model combined with an enhanced visual vocabulary

Vary Toy

AI Model Inference Training #Miniature Model #Visual Vocabulary #LVLMs #Ordinary GPU Standard Picks Open Source

Overview:

Vary-toy is a miniature version of Vary that uses Qwen-1.8B as its underlying "large" language model. It incorporates an improved visual vocabulary, so it retains all of Vary's features while generalizing more broadly. Specifically, when generating the visual vocabulary, the authors replace the negative samples drawn from natural images with positive samples driven by object detection data. This makes fuller use of the vocabulary network's capacity and lets it efficiently encode visual information about natural objects. In experiments, Vary-toy achieves 65.6% ANLS on DocVQA, 59.1% accuracy on ChartQA, 88.1% accuracy on RefCOCO, and 29% on MMVet.

Pricing: Free trial available; paid version price to be determined.

Positioning: A solution for researchers who need to train and deploy LVLMs on ordinary GPUs with limited resources.
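The DocVQA result for Vary-toy is reported in ANLS (Average Normalized Levenshtein Similarity). The Python below is a minimal sketch of how that metric is commonly defined. Each prediction is scored against its best-matching ground-truth answer as 1 minus (edit distance / length of the longer string). A score is zeroed when the normalized distance reaches the usual threshold τ = 0.5, and the scores are then averaged over questions. The lowercasing and whitespace stripping, and the names `levenshtein`, `nls`, and `anls`, are illustrative assumptions; this is not Vary-toy's own evaluation code.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def nls(gt: str, pred: str, tau: float = 0.5) -> float:
    """Normalized Levenshtein similarity, set to 0 when the normalized distance >= tau."""
    # Assumption: case-insensitive, whitespace-stripped comparison.
    gt, pred = gt.strip().lower(), pred.strip().lower()
    if not gt and not pred:
        return 1.0
    dist = levenshtein(gt, pred) / max(len(gt), len(pred))
    return 1.0 - dist if dist < tau else 0.0


def anls(gts: Sequence[List[str]], preds: Sequence[str], tau: float = 0.5) -> float:
    """Mean over questions of the best NLS against any accepted ground-truth answer."""
    if not preds or len(gts) != len(preds):
        raise ValueError("need equally long, non-empty answer and prediction lists")
    scores = [max(nls(g, p, tau) for g in answers) for answers, p in zip(gts, preds)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Exact match scores 1.0; "acme corp." is one edit from "acme corp" over
    # 10 characters, so it scores 0.9. The mean is therefore about 0.95.
    print(anls([["1998"], ["Acme Corp", "ACME Corporation"]], ["1998", "acme corp."]))
```

The threshold keeps near-misses (OCR typos, stray punctuation) from being scored as complete failures while still giving zero credit to answers that are mostly wrong.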

Target Users:

Researchers who need to train and deploy LVLMs on ordinary GPUs with limited resources

Total Visits: 29.7M

Top Region: US (17.58%)

Website Views: 75.3K

Use Cases

Researchers conduct document visual question answering experiments on ordinary GPUs using Vary-toy

Researchers conduct chart question answering experiments on ordinary GPUs using Vary-toy

Researchers conduct referring expression comprehension (visual grounding) experiments on ordinary GPUs using Vary-toy
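Referring expression results such as Vary-toy's 88.1% on RefCOCO are conventionally reported as the fraction of expressions whose predicted box overlaps the ground-truth box with an intersection-over-union (IoU) of at least 0.5. The sketch below assumes boxes given as `(x1, y1, x2, y2)` corners in a shared coordinate frame. The function names and the `>=` comparison are illustrative assumptions (some evaluations use a strict `>`), and how Vary-toy formats or rescales its box outputs is not covered here.

```python
from typing import Sequence, Tuple

# (x1, y1, x2, y2): top-left and bottom-right corners. Assumption: predictions and
# ground truth already share one coordinate frame; any rescaling is left to the caller.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 when boxes do not overlap
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(preds: Sequence[Box], gts: Sequence[Box], thresh: float = 0.5) -> float:
    """Fraction of predictions whose IoU with the paired ground-truth box is >= thresh."""
    if len(preds) != len(gts) or not gts:
        raise ValueError("need equally long, non-empty prediction and ground-truth lists")
    hits = sum(iou(p, g) >= thresh for p, g in zip(preds, gts))
    return hits / len(gts)


if __name__ == "__main__":
    # First prediction matches exactly (IoU 1.0); the second overlaps only half
    # of a same-sized box (IoU 1/3), so accuracy is 1 hit out of 2.
    print(grounding_accuracy([(0, 0, 10, 10), (0, 0, 10, 10)],
                             [(0, 0, 10, 10), (5, 0, 15, 10)]))
```

Because each expression refers to a single object, the metric pairs one predicted box with one ground-truth box rather than matching sets of detections.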

Features

Miniature Vary model based on Qwen-1.8B

Enhanced visual vocabulary