Google Vision Transformer
Overview:
Google Vision Transformer (ViT) is an image recognition model built on the Transformer encoder. It is pre-trained on the ImageNet-21k dataset and fine-tuned on the ImageNet dataset, which gives it strong image feature extraction capabilities. The model splits each image into fixed-size patches and linearly embeds them. It then adds position embeddings to the resulting sequence before passing it to the Transformer encoder, so the encoder retains information about where each patch sits in the image. Users can perform image classification and other downstream tasks by placing a linear layer on top of the pre-trained encoder. Its main advantages are strong image feature learning and broad applicability. The model is freely available.
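The input pipeline described above can be sketched in a few lines of NumPy. This is only an illustration, not the model's actual implementation: the 224x224 image size, 16x16 patch size, and 768-dimensional embedding are assumed values, the weights are random rather than pre-trained, and the helper names patchify and embed are hypothetical. The sketch also prepends a classification token, as in the original ViT design, which the overview does not mention.

```python
import numpy as np

def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # (gh, P, gw, P, C) -> (gh, gw, P, P, C) -> (gh * gw, P * P * C)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)

def embed(image, patch_size, w_proj, cls_token, pos_embed):
    """Linearly embed patches, prepend a class token, add position embeddings."""
    patches = patchify(image, patch_size)                  # (N, P*P*C)
    tokens = patches @ w_proj                              # (N, D)
    tokens = np.concatenate([cls_token, tokens], axis=0)   # (N + 1, D)
    return tokens + pos_embed                              # (N + 1, D)

# Assumed sizes for illustration only.
image_size, patch_size, channels, dim = 224, 16, 3, 768
num_patches = (image_size // patch_size) ** 2  # 14 * 14 = 196

rng = np.random.default_rng(0)
image = rng.random((image_size, image_size, channels))

# Random stand-ins for parameters that would normally be learned.
w_proj = rng.normal(scale=0.02, size=(patch_size * patch_size * channels, dim))
cls_token = np.zeros((1, dim))
pos_embed = rng.normal(scale=0.02, size=(num_patches + 1, dim))

sequence = embed(image, patch_size, w_proj, cls_token, pos_embed)
print(sequence.shape)  # (197, 768)
```

The resulting sequence of 197 vectors (196 patches plus one classification token) is what the Transformer encoder would consume under these assumptions.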
Target Users:
Developers and researchers working on image classification, object detection, and image segmentation.
Total Visits: 437.9M
Top Region: US (19.34%)
Website Views: 62.7K
Features
Transformer-based image feature extraction
Supports image classification tasks
Pre-trained model suitable for transfer learning
Suitable for large-scale image data
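As a rough illustration of the transfer-learning feature, the sketch below places a linear classification layer on top of an encoder output. It makes several assumptions: the 768-dimensional feature and the 10 target classes are arbitrary, the feature vector is random rather than produced by the real pre-trained encoder, and the function name linear_head is hypothetical. Zero-initializing the new layer is one common choice when fine-tuning, not a requirement.

```python
import numpy as np

def linear_head(feature, weight, bias):
    """Map an encoder feature vector to class probabilities."""
    logits = feature @ weight + bias
    shifted = logits - logits.max()      # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()               # softmax over classes

# Assumed sizes for illustration only.
hidden_dim, num_classes = 768, 10

rng = np.random.default_rng(0)
# Stand-in for the feature the pre-trained encoder would produce for one image.
feature = rng.normal(size=hidden_dim)

# New task-specific layer; only these parameters are new for the target task.
weight = np.zeros((hidden_dim, num_classes))
bias = np.zeros(num_classes)

probs = linear_head(feature, weight, bias)
print(probs.shape)  # (10,)
```

With a zero-initialized head every class starts at the same probability; training on the target dataset then updates the head, and optionally the encoder as well.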