Paper · Computer Vision · v1.0

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

by Google Brain · free · Last verified 2026-03-17

Introduced the Vision Transformer (ViT), demonstrating that a pure transformer applied directly to sequences of image patches achieves state-of-the-art performance on image classification when pretrained on large datasets. The paper challenged the dominance of convolutional neural networks in computer vision.

https://arxiv.org/abs/2010.11929
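
As a rough illustration of what "sequences of image patches" means in the summary above, the sketch below cuts an image into non-overlapping 16x16 patches and flattens each one into a vector, giving one "word" per patch. It assumes a 224x224 RGB input and uses NumPy; the function name `image_to_patches` is hypothetical and not taken from the paper's code. The learned linear projection, class token, and position embeddings that the model applies afterwards are left out.

```python
import numpy as np


def image_to_patches(image, patch_size=16):
    """Split an (H, W, C) image into flattened, non-overlapping square patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row (left to right, top to bottom).
    """
    h, w, c = image.shape
    if h % patch_size != 0 or w % patch_size != 0:
        raise ValueError("image height and width must be divisible by patch_size")
    grid_h, grid_w = h // patch_size, w // patch_size
    # Separate each spatial axis into (grid index, offset inside the patch).
    patches = image.reshape(grid_h, patch_size, grid_w, patch_size, c)
    # Bring the two grid axes to the front: (grid_h, grid_w, p, p, C).
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, pixels of the patch flattened into a single vector.
    return patches.reshape(grid_h * grid_w, patch_size * patch_size * c)


# Assumed example input: a blank 224x224 RGB image.
image = np.zeros((224, 224, 3), dtype=np.float32)
tokens = image_to_patches(image)
print(tokens.shape)  # (196, 768)
```

With these assumed sizes the image becomes a sequence of 14 x 14 = 196 patch vectors, each of length 16 x 16 x 3 = 768, which is the kind of token sequence a standard transformer encoder consumes.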
A · Great
Adoption: A+ · Quality: A+ · Freshness: B+ · Citations: A+ · Engagement: F

Specifications

License: Open Access
Pricing: free
Capabilities: image-classification, feature-extraction, transfer-learning
Integrations: (none listed)
Use Cases: image-classification, vision-pretraining, feature-extraction
API Available: No
Tags: vision-transformer, image-classification, attention, self-supervised, pretraining
Added: 2026-03-17
Completeness: 100%

Index Score

81.9
Adoption: 95
Quality: 97
Freshness: 72
Citations: 98
Engagement: 0
