
Llama 3.2 11B Vision

by Meta · open-source · Last verified 2026-03-17

Meta's first multimodal Llama model, with native image understanding at a compact 11B parameter size. It bridges text and vision tasks in a single open-source model suitable for diverse deployments.

https://llama.meta.com
Overall grade: B (Above Average)
Adoption: B+ · Quality: B+ · Freshness: B+ · Citations: B · Engagement: F

Specifications

License: Llama 3.2 Community License
Pricing: open-source
Capabilities: text-generation, image-understanding, visual-qa, instruction-following, multimodal-reasoning
Integrations: huggingface, ollama, vllm, together-ai (see the loading sketch below)
Use Cases: visual-qa, image-captioning, document-understanding, multimodal-chatbots, accessibility
API Available: No
Parameters: 11B
Context Window: 128K tokens
Modalities: text, image
Training Cutoff: December 2023
Tags: llm, open-source, multimodal, vision, compact, meta
Added: 2026-03-17
Completeness: 100%
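
Since the Integrations entry lists huggingface, the sketch below shows one way to run a single visual-QA prompt in-process with the Hugging Face transformers library. The Hub model ID meta-llama/Llama-3.2-11B-Vision-Instruct, the placeholder image URL, and the requirement of a transformers release that ships the Mllama classes (roughly 4.45 or later) are assumptions, not details taken from this listing.

```python
# Minimal visual-QA sketch for Llama 3.2 11B Vision via Hugging Face transformers.
# Assumptions: access to the gated meta-llama repository, a GPU with enough
# memory for bf16 weights, and a transformers version that includes Mllama.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed Hub ID

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Any reachable image URL works; this one is a placeholder.
image_url = "https://example.com/sample.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)

# Chat-style prompt that pairs one image with a text question.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(
    image, prompt, add_special_tokens=False, return_tensors="pt"
).to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```

For serving rather than in-process loading, the listed vllm and ollama integrations can host the same weights locally, and together-ai offers hosted inference.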

Index Score: 60.8
Adoption: 72
Quality: 75
Freshness: 70
Citations: 68
Engagement: 0
