Aya Vision 32B : Aya Vision 32B is a multilingual vision-language model suitable for various applications, including OCR, image captioning, and visual reasoning.

Aya Vision 32B

AI Model Image Generation #Multilingual #Vision-Language #OCR #Image Captioning #Visual Reasoning #Open Source Standard Picks Open Source

Overview :

Aya Vision 32B is an advanced vision-language model developed by Cohere For AI, boasting 32 billion parameters and supporting 23 languages, including English, Chinese, and Arabic. This model combines the latest multilingual language model Aya Expanse 32B and the SigLIP2 vision encoder, achieving visual and language understanding integration through a multimodal adapter. It excels in the vision-language field, capable of handling complex image and text tasks such as OCR, image captioning, and visual reasoning. The release of this model aims to promote the popularization of multimodal research, providing a powerful tool for global researchers with its open-source weights. The model is licensed under CC-BY-NC and is subject to Cohere For AI's fair use policy.

Target Users :

This model is suitable for researchers, developers, and enterprises that need to handle vision-language tasks, especially those requiring multilingual support and high-performance models.

Total Visits： 25.3M

Top Region： US(17.94%)

Website Views ： 67.1K

Use Cases

Use Aya Vision 32B for image captioning in Cohere Playground

Interact with the model through an interactive conversation using Hugging Face Space

Use the model for multilingual OCR tasks

Features

Supports 23 languages, covering various language scenarios

Can process image input and generate text output

Supports 16K context length, suitable for complex tasks

Provides interactive experiences, such as Cohere Playground and Hugging Face Space