
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

by Google DeepMind · free · Last verified 2026-03-17

RT-2 co-fine-tunes vision-language models on web and robotics data to produce action tokens directly, enabling robots to reason about novel objects and tasks never seen during robot training. The work demonstrates that internet-scale pretraining can be transferred to physical manipulation with minimal robot-specific data.

https://arxiv.org/abs/2307.15818
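The core mechanism — emitting actions as ordinary text tokens — comes down to discretizing each continuous action dimension into a fixed number of bins so the vision-language model can output them like any other vocabulary item. A minimal sketch of that round-trip, assuming an illustrative 256-bin scheme and a symmetric [-1, 1] action range (the exact binning and ranges are the paper's implementation details, not reproduced here):

```python
import numpy as np

def action_to_tokens(action, low, high, num_bins=256):
    """Discretize a continuous robot action into integer token ids.

    Each action dimension (e.g. end-effector displacement, rotation,
    gripper) is mapped to one of `num_bins` bins, so the whole action
    can be emitted as a short sequence of text tokens. Bin count and
    ranges here are illustrative assumptions, not the paper's exact values.
    """
    action = np.asarray(action, dtype=np.float64)
    clipped = np.clip(action, low, high)
    # Scale each dimension linearly onto [0, num_bins - 1].
    scaled = (clipped - low) / (high - low)
    return np.minimum((scaled * num_bins).astype(int), num_bins - 1)

def tokens_to_action(tokens, low, high, num_bins=256):
    """Inverse mapping: token ids back to bin-center continuous values."""
    tokens = np.asarray(tokens, dtype=np.float64)
    return low + (tokens + 0.5) / num_bins * (high - low)

# Example: a hypothetical 7-DoF action (xyz delta, rpy delta, gripper).
low, high = -1.0, 1.0
action = np.array([0.1, -0.5, 0.0, 0.25, -1.0, 1.0, 0.9])
tokens = action_to_tokens(action, low, high)
recovered = tokens_to_action(tokens, low, high)
# Round-trip error is bounded by half a bin width per dimension.
assert np.all(np.abs(recovered - action) <= (high - low) / 256)
```

Because actions share the model's existing token vocabulary, the same co-fine-tuning objective used for web-scale vision-language data applies unchanged to robot trajectories — that shared interface is what lets web knowledge transfer to control.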
Overall grade: B (Above Average)
Adoption: B+ · Quality: A+ · Freshness: B+ · Citations: B+ · Engagement: F

Specifications

License
Open Access
Pricing
free
Capabilities
robotic-control, visual-reasoning, action-generation, zero-shot-generalization
Integrations
Use Cases
robotic-manipulation, household-automation, industrial-robotics
API Available
No
Tags
robotics, vision-language, action-models, transfer-learning, google
Added
2026-03-17
Completeness
100%

Index Score

65.5
Adoption
72
Quality
90
Freshness
78
Citations
75
Engagement
0
