Llama 3.2 11B Vision
by Meta · free · Last verified 2026-03-17
Llama 3.2 11B Vision is Meta's first openly available multimodal Llama model, pairing native image understanding with text generation. At a compact 11B parameters it targets efficient deployment, supporting visual question answering, image captioning, and reasoning across text and images in a single model.
https://llama.meta.com ↗
B (Above Average)
Adoption: B+ · Quality: B+ · Freshness: B+ · Citations: B · Engagement: F
Specifications
- License
- Llama 3.2 Community License
- Pricing
- free
- Capabilities
- image-understanding, visual-question-answering, text-generation-from-images, multimodal-reasoning, object-detection-via-prompting, image-captioning, chart-and-graph-interpretation, instruction-following, optical-character-recognition-ocr
- Integrations
- Hugging Face Transformers, PyTorch, LangChain, LlamaIndex, Ollama
- API Available
- No
- Parameters
- 11B
- Context Window
- 128K tokens
- Modalities
- text, image
- Training Cutoff
- December 2023
- Tags
- llm, open-source, multimodal, vision, compact, meta, vlm, visual-language-model, image-to-text, llama-3-2, computer-vision
- Added
- 2026-03-17
- Completeness
- 0.95
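Since the model is listed with a Hugging Face Transformers integration but no hosted API, local use follows the library's multimodal chat format, which interleaves an image placeholder with the text prompt. A minimal sketch of that message structure, assuming the gated `meta-llama/Llama-3.2-11B-Vision-Instruct` checkpoint (model id and access requirements are not stated on this page):

```python
# Sketch of the multimodal chat payload for visual question answering
# with Llama 3.2 11B Vision via Hugging Face Transformers.
# The {"type": "image"} entry marks where the image is injected.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What does this chart show?"},
        ],
    }
]

# With transformers installed and model access granted, the call
# pattern (not executed here; weights are ~11B parameters) is roughly:
#   from transformers import MllamaForConditionalGeneration, AutoProcessor
#   model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # assumed id
#   model = MllamaForConditionalGeneration.from_pretrained(model_id)
#   processor = AutoProcessor.from_pretrained(model_id)
#   prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
#   inputs = processor(image, prompt, return_tensors="pt")
#   output = model.generate(**inputs, max_new_tokens=128)
```

The same checkpoint can also be pulled through Ollama for a lighter-weight local setup, per the integrations list above.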
Index Score: 60.8
- Adoption: 72
- Quality: 75
- Freshness: 70
- Citations: 68
- Engagement: 0