Paper · robotics · v2.0

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

by Google DeepMind · Last verified 2026-03-17

RT-2 is a Vision-Language-Action (VLA) model that translates visual and language inputs directly into robotic actions. By co-fine-tuning large vision-language models on both web-scale and robotics data, it transfers knowledge from the internet to physical control, enabling robots to reason about and execute tasks involving novel objects and scenarios that never appeared in their robot training data.
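The key mechanism behind this direct translation is that RT-2 expresses robot actions as text tokens: each continuous action dimension is discretized into 256 bins, so an action vector becomes a short token sequence the language model can emit like any other text. A minimal sketch of that discretization (the 256-bin count follows the paper; the [-1, 1] action range and 7-DoF layout here are illustrative assumptions):

```python
import numpy as np

# Each continuous action dimension is mapped to one of N_BINS integer
# tokens; the model emits these tokens and a de-tokenizer recovers a
# continuous command. Range [-1, 1] is an assumed normalization.
N_BINS = 256
LOW, HIGH = -1.0, 1.0

def action_to_tokens(action: np.ndarray) -> list[int]:
    """Map each action dimension to an integer bin in [0, N_BINS - 1]."""
    clipped = np.clip(action, LOW, HIGH)
    bins = np.round((clipped - LOW) / (HIGH - LOW) * (N_BINS - 1))
    return bins.astype(int).tolist()

def tokens_to_action(tokens: list[int]) -> np.ndarray:
    """Invert the discretization (exact up to one bin of quantization error)."""
    bins = np.asarray(tokens, dtype=float)
    return bins / (N_BINS - 1) * (HIGH - LOW) + LOW

# Example: a hypothetical 7-DoF action (6 end-effector deltas + gripper)
# round-trips within one bin width, 2 / 255 ≈ 0.0078.
a = np.array([0.1, -0.25, 0.0, 0.5, -1.0, 1.0, 0.3])
tokens = action_to_tokens(a)
recovered = tokens_to_action(tokens)
```

Because actions share the model's output vocabulary, web-derived semantic knowledge and robot control are learned by the same next-token objective, which is what makes the co-fine-tuning transfer possible.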

https://arxiv.org/abs/2307.15818
Overall Grade
B (Above Average)
Adoption: B+ · Quality: A+ · Freshness: B+ · Citations: B+ · Engagement: F

Specifications

License
Open Access
Pricing
unknown
Capabilities
end-to-end robotic control, visual-reasoning, action-generation, zero-shot generalization to new tasks, emergent reasoning capabilities, symbolic understanding, multi-stage semantic reasoning, transfer of web-scale knowledge, natural language instruction following
Integrations
Use Cases
API Available
No
Tags
robotics, vision-language-models, action-models, transfer-learning, google-research, foundation-models, embodied-ai, zero-shot-learning, generalist-robots, vla
Added
2026-03-17
Completeness
0.9%

Index Score

65.5
Adoption: 72 · Quality: 90 · Freshness: 78 · Citations: 75 · Engagement: 0
