Model · multimodal · v2

RT-2

by Google DeepMind · paid · Last verified 2026-03-17

RT-2 (Robotics Transformer 2) is Google DeepMind's vision-language-action (VLA) model. It maps visual observations and language instructions directly to robot actions, which lets robots attempt novel tasks by generalizing from web-scale pretraining. It combines the capabilities of a vision-language foundation model with physical robot control.
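One way to picture the mapping from model output to robot actions is action tokenization: the RT-2 paper describes discretizing each action dimension into 256 bins so that an action can be written as a short string of integers. The sketch below illustrates that idea only and is not RT-2's actual code. The function names `encode_action` and `decode_action`, the 7-dimensional layout, and the `LOW`/`HIGH` bounds are all hypothetical choices made for the example.

```python
import numpy as np

NUM_BINS = 256  # bins per action dimension, as described in the RT-2 paper

# Hypothetical bounds (assumption, not from RT-2): 3 translation deltas,
# 3 rotation deltas, and 1 gripper value.
LOW = np.array([-0.1, -0.1, -0.1, -0.5, -0.5, -0.5, 0.0])
HIGH = np.array([0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 1.0])


def encode_action(action):
    """Map a continuous action vector to a string of integer bin indices."""
    action = np.clip(np.asarray(action, dtype=float), LOW, HIGH)
    scaled = (action - LOW) / (HIGH - LOW)  # normalise each dimension to [0, 1]
    bins = np.rint(scaled * (NUM_BINS - 1)).astype(int)
    return " ".join(str(b) for b in bins)


def decode_action(text):
    """Map a string of bin indices back to a continuous action vector."""
    bins = np.array([int(tok) for tok in text.split()])
    if bins.shape != LOW.shape or bins.min() < 0 or bins.max() >= NUM_BINS:
        raise ValueError("expected 7 bin indices in [0, 255]")
    return LOW + bins / (NUM_BINS - 1) * (HIGH - LOW)
```

Under these assumptions, a language model that emits a string of seven integers can be decoded into a continuous command, with a quantization error of at most half a bin width per dimension.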

https://robotics-transformer2.github.io/ ↗

D—Poor

Adoption: C · Quality: A+ · Freshness: B+ · Citations: F · Engagement: F

Specifications

License: Proprietary
Pricing: paid
Capabilities: visual-instruction-following, robot-action-generation, zero-shot-task-generalization, natural-language-robot-control
Integrations: Google Robot Platforms
Use Cases: tabletop manipulation, household task automation, novel object interaction, instruction-following robotics
API Available: No
Parameters: up to ~55B (largest reported variant)
Context Window: N/A
Modalities: vision, text, action
Training Cutoff: 2023
Tags: robotics, google, vision-language-action, embodied-ai, manipulation
Added: 2026-03-17
Completeness: 87%

