Model · multimodal · v2

RT-2

by Google DeepMind · paid · Last verified 2026-03-17

RT-2 (Robotics Transformer 2) is Google DeepMind's vision-language-action (VLA) model. It maps visual observations and language instructions directly to robot actions, and generalization from web-scale pretraining lets robots attempt tasks they were not explicitly trained on. It combines the capabilities of vision-language foundation models with physical robot control.
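To illustrate what "maps ... to robot actions" can look like in practice, here is a minimal sketch of decoding a model's text output into a robot command. It assumes the layout described in the RT-2 paper: eight integers (terminate flag, three position deltas, three rotation deltas, gripper extension), with continuous dimensions discretized into 256 uniform bins. The names `RobotAction`, `bin_to_value`, and `decode_action`, and all numeric ranges, are hypothetical and chosen for illustration only; RT-2 has no public API, so this is not the model's actual interface.

```python
from dataclasses import dataclass
from typing import Tuple

NUM_BINS = 256  # assumption: uniform bins per continuous action dimension


@dataclass
class RobotAction:
    terminate: bool
    delta_pos: Tuple[float, float, float]  # end-effector displacement (x, y, z)
    delta_rot: Tuple[float, float, float]  # end-effector rotation deltas
    gripper: float                         # gripper extension level


def bin_to_value(token: int, low: float, high: float) -> float:
    """Map an integer bin index in [0, NUM_BINS - 1] onto [low, high]."""
    if not 0 <= token < NUM_BINS:
        raise ValueError(f"token {token} outside [0, {NUM_BINS - 1}]")
    return low + (high - low) * token / (NUM_BINS - 1)


def decode_action(
    text: str,
    pos_range: Tuple[float, float] = (-0.1, 0.1),   # hypothetical range
    rot_range: Tuple[float, float] = (-0.5, 0.5),   # hypothetical range
    grip_range: Tuple[float, float] = (0.0, 1.0),   # hypothetical range
) -> RobotAction:
    """Parse a space-separated string of 8 integers into a RobotAction."""
    tokens = [int(t) for t in text.split()]
    if len(tokens) != 8:
        raise ValueError(f"expected 8 action tokens, got {len(tokens)}")
    terminate, *rest = tokens
    pos = tuple(bin_to_value(t, *pos_range) for t in rest[0:3])
    rot = tuple(bin_to_value(t, *rot_range) for t in rest[3:6])
    grip = bin_to_value(rest[6], *grip_range)
    return RobotAction(bool(terminate), pos, rot, grip)
```

Treating actions as ordinary text tokens is what lets a vision-language model emit them with its existing output vocabulary; the decoding step above is the only robot-specific piece in this sketch.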

https://robotics-transformer2.github.io/
Grade: C+ (Average)
Adoption: C · Quality: A+ · Freshness: B+ · Citations: A · Engagement: F

Specifications

License: Proprietary
Pricing: paid
Capabilities: visual-instruction-following, robot-action-generation, zero-shot-task-generalization, natural-language-robot-control
Integrations: Google Robot Platforms
Use Cases: tabletop manipulation, household task automation, novel object interaction, instruction-following robotics
API Available: No
Parameters: ~55B
Context Window: N/A
Modalities: vision, text, action
Training Cutoff: 2023
Tags: robotics, google, vision-language-action, embodied-ai, manipulation
Added: 2026-03-17
Completeness: 100%

Index Score

Overall: 54
Adoption: 40
Quality: 90
Freshness: 72
Citations: 80
Engagement: 0
