

Vary-toy
Overview
Vary-toy is a small Vary model that uses Qwen-1.8B as its base "large" language model. It introduces an improved visual vocabulary, so the model retains all the features of Vary while generalizing more broadly. Specifically, when generating the visual vocabulary, the authors replace the negative samples drawn from natural images with positive samples driven by object detection. This makes fuller use of the vocabulary network's capacity and lets it efficiently encode visual information about natural objects. In experiments, Vary-toy achieved 65.6% ANLS on DocVQA, 59.1% accuracy on ChartQA, 88.1% accuracy on RefCOCO, and 29% accuracy on MMVet.
Pricing: Free trial available; the price of the paid version is to be determined.
Positioning: A solution that lets researchers train and deploy LVLMs (large vision-language models) on ordinary GPUs with limited resources.
Target Users
Researchers who need to train and deploy LVLMs on ordinary GPUs under resource constraints
Use Cases
Researchers conduct document visual question answering experiments on ordinary GPUs using Vary-toy
Researchers conduct chart question answering experiments on ordinary GPUs using Vary-toy
Researchers conduct referring expression comprehension experiments (RefCOCO) on ordinary GPUs using Vary-toy
Features
Miniature Vary model based on Qwen-1.8B
Enhanced visual vocabulary
Replaces negative samples from natural images with positive samples driven by object detection
Efficient encoding of visual information corresponding to natural objects
Achieves 65.6% ANLS on DocVQA, 59.1% accuracy on ChartQA, 88.1% accuracy on RefCOCO, and 29% accuracy on MMVet
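
The DocVQA result is reported in ANLS (Average Normalized Levenshtein Similarity), which gives partial credit to answers that are close to a reference string. The following is a minimal sketch of how such a score can be computed, not the evaluation code used for Vary-toy. The function names, the lowercasing and whitespace stripping, and the 0.5 threshold (the value commonly used for DocVQA) are assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions, references, threshold=0.5):
    """Average Normalized Levenshtein Similarity.

    predictions: one predicted answer string per question.
    references:  one list of acceptable answer strings per question.
    threshold:   assumed cutoff; answers whose normalized distance is at
                 or above it score 0.
    """
    if not predictions:
        return 0.0
    total = 0.0
    for pred, answers in zip(predictions, references):
        p = pred.strip().lower()  # assumed normalization
        best = 0.0
        for ans in answers:
            g = ans.strip().lower()
            longest = max(len(p), len(g))
            nl = levenshtein(p, g) / longest if longest else 0.0
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)  # keep the best-matching reference
        total += best
    return total / len(predictions)


if __name__ == "__main__":
    preds = ["paris", "abcd"]
    refs = [["Paris"], ["abce"]]
    print(anls(preds, refs))
```

In this sketch an exact match (ignoring case) scores 1.0, a one-character error in a four-character answer scores 0.75, and an answer that differs in half or more of its characters scores 0.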