The Llama 4 Herd: The Beginning of a New Era of Natively Multimodal AI
by Meta AI · open-source · Last verified 2026-03-17
Llama 4 introduces a family of natively multimodal mixture-of-experts models—Scout (17B/16 experts), Maverick (17B/128 experts), and Behemoth (288B/16 experts)—pretrained jointly on text, image, and video data. Maverick achieves top scores on vision-language benchmarks while Scout offers 10M-token context at a fraction of the compute of comparable models.
https://arxiv.org/abs/2504.07557
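The expert counts above (17B active parameters out of a much larger total) come from top-k expert routing: each token's router picks a few experts, so only a fraction of the weights run per token. Below is a minimal NumPy sketch of that idea; all names and shapes are illustrative, not Meta's implementation.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Toy mixture-of-experts layer: route each token to its top-k experts.

    x       : (tokens, d) input activations
    gate_w  : (d, n_experts) router weights (hypothetical)
    experts : list of (d, d) expert weight matrices
    k       : experts activated per token; a small k is why only
              ~17B of the total parameters are active per token
    """
    logits = x @ gate_w                         # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]  # indices of the top-k experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        # softmax over the selected experts only
        w = np.exp(logits[t, sel] - logits[t, sel].max())
        w /= w.sum()
        for weight, e in zip(w, sel):
            out[t] += weight * (x[t] @ experts[e])
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 3
x = rng.normal(size=(tokens, d))
gate_w = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
y = moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (3, 8)
```

Only the k selected expert matrices are multiplied for each token, which is how compute scales with active parameters rather than total parameters.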
Specifications
- License: Llama 4 Community License
- Pricing: open-source
- Capabilities: text-generation, image-understanding, video-understanding, long-context, mixture-of-experts
- Integrations: Hugging Face, Ollama, Together AI, Groq, AWS Bedrock
- Use Cases: multimodal-reasoning, document-understanding, video-analysis, code-generation, long-document-qa
- API Available: Yes
- Tags: llm, multimodal, mixture-of-experts, meta, open-source, 2025
- Added: 2026-03-17
- Completeness: 100%
Index Score: 67.5
- Adoption: 82
- Quality: 92
- Freshness: 98
- Citations: 65
- Engagement: 0