RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
by Google DeepMind · free · Last verified 2026-03-17
RT-2 co-fine-tunes vision-language models on web and robotics data to produce action tokens directly, enabling robots to reason about novel objects and tasks never seen during robot training. The work demonstrates that internet-scale pretraining can be transferred to physical manipulation with minimal robot-specific data.
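The key mechanism is representing robot actions as discrete tokens in the language model's vocabulary: the paper discretizes each continuous action dimension into 256 uniform bins, so an action becomes a short token sequence the VLM can emit like text. A minimal sketch of that encode/decode step (the function names and the symmetric [-1, 1] range here are illustrative, not from the paper's code):

```python
import numpy as np

def actions_to_tokens(action, low, high, n_bins=256):
    """Map a continuous action vector to per-dimension bin indices (0..n_bins-1)."""
    a = np.clip(action, low, high)
    frac = (a - low) / (high - low)          # normalize each dim to [0, 1]
    return np.minimum((frac * n_bins).astype(int), n_bins - 1)

def tokens_to_actions(tokens, low, high, n_bins=256):
    """Decode bin indices back to continuous values at each bin's center."""
    return low + (tokens + 0.5) / n_bins * (high - low)

# Round-trip example: a 7-DoF action (e.g. 6D end-effector delta + gripper)
low, high = np.full(7, -1.0), np.full(7, 1.0)
action = np.linspace(-1.0, 1.0, 7)
tokens = actions_to_tokens(action, low, high)
decoded = tokens_to_actions(tokens, low, high)
```

With 256 bins the round-trip quantization error per dimension is at most half a bin width, which is the trade-off that lets actions share the model's text-token interface.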
https://arxiv.org/abs/2307.15818
Overall grade: B (Above Average)
Adoption B+ · Quality A+ · Freshness B+ · Citations B+ · Engagement F
Specifications
- License: Open Access
- Pricing: free
- Capabilities: robotic-control, visual-reasoning, action-generation, zero-shot-generalization
- Integrations:
- Use Cases: robotic-manipulation, household-automation, industrial-robotics
- API Available: No
- Tags: robotics, vision-language, action-models, transfer-learning, google
- Added: 2026-03-17
- Completeness: 100%
Index Score: 65.5
- Adoption: 72
- Quality: 90
- Freshness: 78
- Citations: 75
- Engagement: 0