Aria-Base-64K
Overview
Aria-Base-64K is one of the base models in the Aria series, released for research and continued training. It is the checkpoint produced after the long-context pre-training stage, trained on 33 billion tokens (21 billion multimodal and 12 billion language tokens, 69% of which are long sequences). It is well suited to continued pre-training or fine-tuning on long-video or long-document question answering datasets, and can be adapted even in resource-constrained settings by post-training with short instruction-tuning datasets tailored to long-context scenarios. The model can comprehend up to 250 high-resolution images or up to 500 medium-resolution images while maintaining strong base performance on both language and multimodal tasks.
Target Users
The target audience is researchers and developers, especially those working with long-context and multimodal datasets. Aria-Base-64K provides a strong pre-trained base for scenarios such as video question answering and long-document question answering, improving both processing efficiency and accuracy.
Use Cases
- Develop a video question answering system using Aria-Base-64K to enhance understanding of video content.
- Apply Aria-Base-64K to long document question answering, improving efficiency in document retrieval and comprehension.
- Utilize Aria-Base-64K for joint inference of images and text to develop new multimodal applications.
Features
- Long-context pre-training: Trained on 33 billion tokens, making it well suited to continued pre-training or fine-tuning on long-video and long-document question answering datasets.
- Multimodal understanding: Capable of understanding up to 250 high-resolution images or up to 500 medium-resolution images.
- Strong foundational performance: Maintains the same robust foundational performance as Aria-Base-8K across language and multimodal scenarios.
- Low chat-template ratio: Only about 3% of the training data used a chat-template format, so the base model is not recommended for direct use with chat templates without further tuning.
- Quick start support: Provides quick installation and inference code examples for users to rapidly begin using the model.
- Advanced inference and fine-tuning: Offers a code repository that supports more advanced inference, examples, and fine-tuning on custom datasets.
How to Use
1. Install necessary libraries: Use pip to install transformers, accelerate, sentencepiece, and other libraries.
2. Load the model: Load the Aria-Base-64K model using AutoModelForCausalLM.from_pretrained.
3. Process input: Use AutoProcessor.from_pretrained to process text and image inputs.
4. Perform inference: Pass the processed inputs to the model to execute generation tasks.
5. Decode output: Use the processor to decode the tokens output by the model to obtain the final results.
6. Advanced usage: For more advanced inference and fine-tuning, access the code repository on GitHub.
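The numbered steps above can be sketched in code. This is a hedged illustration, not the official quickstart: the repository ID `rhymes-ai/Aria-Base-64K`, the `trust_remote_code` flag, and the chat-template call follow common Hugging Face conventions rather than a confirmed API, and should be checked against the official model card (note also the low chat-template ratio mentioned under Features).

```python
def build_messages(question: str) -> list[dict]:
    """Step 3 helper: a single-turn message pairing one image with a question.

    The message schema here is the generic Hugging Face multimodal chat
    format; the exact schema Aria expects may differ (an assumption).
    """
    return [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": question},
        ],
    }]


def run_inference(image_path: str, question: str) -> str:
    # Heavy dependencies are imported here so the pure helper above can be
    # used without torch/transformers installed.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = "rhymes-ai/Aria-Base-64K"  # assumed repository name

    # Steps 1-2: load the model and processor (trust_remote_code is
    # typically required for custom multimodal architectures).
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
    )
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    # Step 3: process text and image inputs together.
    image = Image.open(image_path)
    text = processor.apply_chat_template(
        build_messages(question), add_generation_prompt=True
    )
    inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)

    # Step 4: run generation.
    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=256)

    # Step 5: decode only the newly generated tokens.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return processor.decode(new_tokens, skip_special_tokens=True)


# Example usage (requires a GPU and the model weights):
# print(run_inference("example.jpg", "Describe the image."))
```

Step 1 corresponds to something like `pip install transformers accelerate sentencepiece torch pillow`; step 6 (advanced inference and fine-tuning) is covered by the GitHub repository rather than this sketch.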
AIbase
Empowering the Future, Your AI Solution Knowledge Base
© 2025 AIbase