An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
by Google Brain · free · Last verified 2026-03-17
Introduced the Vision Transformer (ViT), demonstrating that a pure transformer applied directly to sequences of image patches achieves state-of-the-art performance on image classification when pretrained on large datasets. The paper challenged the dominance of convolutional neural networks in computer vision.
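The "sequence of image patches" idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `image_to_patch_sequence` is hypothetical, and a 224x224 RGB input with 16x16 patches is assumed. It shows only how an image becomes a sequence of flattened patches. The learned linear projection, class token, and position embeddings that a full ViT applies afterwards are omitted.

```python
import numpy as np


def image_to_patch_sequence(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened, non-overlapping patches."""
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = height // patch_size, width // patch_size
    # Cut both spatial axes into (block index, offset within block).
    patches = image.reshape(rows, patch_size, cols, patch_size, channels)
    # Bring the two block indices together: (rows, cols, patch, patch, C).
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, each flattened to patch_size * patch_size * C values.
    return patches.reshape(rows * cols, patch_size * patch_size * channels)


# Assumed example input: a 224x224 RGB image.
image = np.zeros((224, 224, 3), dtype=np.float32)
tokens = image_to_patch_sequence(image)
print(tokens.shape)  # (196, 768)
```

Each of the 196 rows plays the role of a "word" in the sequence that the transformer encoder consumes.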
https://arxiv.org/abs/2010.11929
Grade: A (Great)
Adoption: A+ · Quality: A+ · Freshness: B+ · Citations: A+ · Engagement: F
Specifications
- License: Open Access
- Pricing: free
- Capabilities: image-classification, feature-extraction, transfer-learning
- Integrations: none listed
- Use Cases: image-classification, vision-pretraining, feature-extraction
- API Available: No
- Tags: vision-transformer, image-classification, attention, self-supervised, pretraining
- Added: 2026-03-17
- Completeness: 100%
Index Score: 81.9
- Adoption: 95
- Quality: 97
- Freshness: 72
- Citations: 98
- Engagement: 0