AIM
Overview:
This paper introduces AIM, a family of vision models pre-trained with an autoregressive objective. Inspired by their textual counterparts, large language models (LLMs), these models exhibit similar scaling properties. Specifically, we highlight two key findings: (1) the quality of the visual features improves with both model capacity and data quantity, and (2) the value of the objective function correlates with model performance on downstream tasks. By pre-training a 7-billion-parameter AIM on 2 billion images, we achieve 84.0% accuracy on ImageNet-1k with a frozen backbone. Notably, even at this scale, we observe no sign of performance saturation, suggesting that AIM may represent a new frontier for training large-scale vision models. AIM's pre-training is similar to that of LLMs and does not require any image-specific strategy to stabilize training at scale.
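Since the listing centers on AIM's autoregressive pre-training objective, here is a minimal PyTorch sketch of the core idea: an image is split into a raster-ordered sequence of patches, and a causally masked Transformer regresses each next patch from the ones before it. This is an illustrative toy, not Apple's released implementation; all class, parameter, and dimension names (AutoregressiveImageModel, patch_size=14, etc.) are assumptions, and details of the actual method such as prefix attention and per-patch pixel normalization are omitted.

```python
import torch
import torch.nn as nn

class AutoregressiveImageModel(nn.Module):
    """Toy AIM-style model: patchify an image into a raster-ordered
    sequence, run a causally masked Transformer over it, and regress
    each next patch's raw pixels from the preceding patches."""

    def __init__(self, image_size=224, patch_size=14, dim=512, depth=8, heads=8):
        super().__init__()
        self.patch_size = patch_size
        num_patches = (image_size // patch_size) ** 2
        patch_dim = 3 * patch_size * patch_size
        self.embed = nn.Linear(patch_dim, dim)
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))  # learned positions
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, patch_dim)  # pixel-regression head

    def patchify(self, imgs):
        # (B, 3, H, W) -> (B, N, 3*p*p) in raster order
        p = self.patch_size
        x = imgs.unfold(2, p, p).unfold(3, p, p)    # (B, 3, H/p, W/p, p, p)
        x = x.permute(0, 2, 3, 1, 4, 5).flatten(3)  # (B, H/p, W/p, 3*p*p)
        return x.flatten(1, 2)                      # (B, N, 3*p*p)

    def forward(self, imgs):
        patches = self.patchify(imgs)
        x = self.embed(patches) + self.pos
        # Causal mask: patch i may only attend to patches 0..i.
        n = x.size(1)
        mask = torch.triu(torch.ones(n, n, device=x.device), diagonal=1).bool()
        h = self.encoder(x, mask=mask)
        pred = self.head(h[:, :-1])   # predictions for patches 1..N-1
        target = patches[:, 1:]       # next-patch regression targets
        return nn.functional.mse_loss(pred, target)

if __name__ == "__main__":
    model = AutoregressiveImageModel()
    loss = model(torch.randn(2, 3, 224, 224))  # random images, just to exercise the graph
    loss.backward()
    print(loss.item())
```

As in language modeling, the only training signal is next-token (here, next-patch) prediction; there is no contrastive pairing, masking schedule, or other image-specific stabilization, which is the property the overview highlights.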
Target Users:
Suitable for autoregressive pre-training on large-scale image datasets, and for any scenario that requires training large-scale vision models.
Total Visits: 3.1M
Top Region: US (14.90%)
Website Views: 61.3K
Use Cases
Large-scale image recognition in autonomous driving systems
Pre-training on large-scale datasets for medical image analysis
Training large-scale visual models for smart surveillance systems
Features
Autoregressive Image Model Pre-training
Large-Scale Visual Model Training
Performance Optimization and Scaling