Flamingo: a Visual Language Model for Few-Shot Learning
by DeepMind · free · Last verified 2026-03-17
Introduces Flamingo, a family of visual language models that bridge powerful pretrained vision and language models. Training on arbitrarily interleaved sequences of images, video, and text enables few-shot learning on a diverse range of multimodal tasks. Flamingo set a new few-shot state of the art on 16 benchmarks.
https://arxiv.org/abs/2204.14198
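To make "arbitrarily interleaved sequences" concrete, the sketch below assembles a few-shot prompt in which each support image is followed by its text and the query image comes last. It is an illustration only, not DeepMind's code: the tag strings `<image>` and `<EOC>`, the `Example` class, the `build_few_shot_prompt` helper, and the file names are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

IMAGE_TAG = "<image>"   # assumed placeholder marking where an image sits in the text
END_OF_CHUNK = "<EOC>"  # assumed separator closing each image/text example


@dataclass
class Example:
    image_path: str  # hypothetical path; any image handle would do
    text: str        # caption or answer paired with the image


def build_few_shot_prompt(
    shots: List[Example], query_image: str, query_prefix: str = ""
) -> Tuple[str, List[str]]:
    """Interleave support examples with a final query image.

    Returns the prompt string and the images in the order their tags appear.
    """
    parts: List[str] = []
    images: List[str] = []
    for shot in shots:
        # Each support example is one image tag, its text, and a chunk separator.
        parts.append(f"{IMAGE_TAG}{shot.text}{END_OF_CHUNK}")
        images.append(shot.image_path)
    # The query is left open (no separator) so a model could complete it.
    parts.append(f"{IMAGE_TAG}{query_prefix}")
    images.append(query_image)
    return "".join(parts), images


shots = [
    Example("cat.jpg", "Output: A cat sleeping on a sofa."),
    Example("dog.jpg", "Output: A dog catching a frisbee."),
]
prompt, images = build_few_shot_prompt(shots, "bird.jpg", "Output:")
print(prompt)
print(images)
```

A model consuming such a prompt would need the image list aligned with the tag positions, which is why the helper returns both together.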
Grade: B+ (Good)
Adoption: B+ · Quality: A+ · Freshness: B+ · Citations: A · Engagement: F
Specifications
- License: Open Access
- Pricing: free
- Capabilities: visual-question-answering, image-captioning, few-shot-learning, video-understanding
- Integrations: (none listed)
- Use Cases: few-shot-multimodal-tasks, vqa, image-captioning
- API Available: No
- Tags: flamingo, multimodal, few-shot, vision-language, deepmind
- Added: 2026-03-17
- Completeness: 100%
Index Score: 71.8
- Adoption: 78
- Quality: 93
- Freshness: 74
- Citations: 88
- Engagement: 0