

Aya Vision 32B
Overview :
Aya Vision 32B is an advanced vision-language model developed by Cohere For AI, boasting 32 billion parameters and supporting 23 languages, including English, Chinese, and Arabic. This model combines the latest multilingual language model Aya Expanse 32B and the SigLIP2 vision encoder, achieving visual and language understanding integration through a multimodal adapter. It excels in the vision-language field, capable of handling complex image and text tasks such as OCR, image captioning, and visual reasoning. The release of this model aims to promote the popularization of multimodal research, providing a powerful tool for global researchers with its open-source weights. The model is licensed under CC-BY-NC and is subject to Cohere For AI's fair use policy.
Target Users :
This model is suitable for researchers, developers, and enterprises that need to handle vision-language tasks, especially those requiring multilingual support and high-performance models.
Use Cases
Use Aya Vision 32B for image captioning in Cohere Playground
Interact with the model through an interactive conversation using Hugging Face Space
Use the model for multilingual OCR tasks
Features
Supports 23 languages, covering various language scenarios
Can process image input and generate text output
Supports 16K context length, suitable for complex tasks
Provides interactive experiences, such as Cohere Playground and Hugging Face Space
Allows chat interaction with the model via WhatsApp
How to Use
Install the necessary transformers library: `pip install 'git+https://github.com/huggingface/transformers.git@v4.49.0-AyaVision'`
Load the model and processor: `AutoProcessor.from_pretrained(model_id)` and `AutoModelForImageTextToText.from_pretrained(model_id)`
Prepare input data, including images and text content
Format the input data using the `processor.apply_chat_template` method
Call the model's `generate` method to generate output text
Decode the generated tokens and get the final result
Featured AI Tools

Gemini
Gemini is the latest generation of AI system developed by Google DeepMind. It excels in multimodal reasoning, enabling seamless interaction between text, images, videos, audio, and code. Gemini surpasses previous models in language understanding, reasoning, mathematics, programming, and other fields, becoming one of the most powerful AI systems to date. It comes in three different scales to meet various needs from edge computing to cloud computing. Gemini can be widely applied in creative design, writing assistance, question answering, code generation, and more.
AI Model
11.4M
Chinese Picks

Liblibai
LiblibAI is a leading Chinese AI creative platform offering powerful AI creative tools to help creators bring their imagination to life. The platform provides a vast library of free AI creative models, allowing users to search and utilize these models for image, text, and audio creations. Users can also train their own AI models on the platform. Focused on the diverse needs of creators, LiblibAI is committed to creating inclusive conditions and serving the creative industry, ensuring that everyone can enjoy the joy of creation.
AI Model
6.9M