LLaVA 1.6
by LLaVA Team (UW-Madison / Microsoft) · open-source · Last verified 2026-03-17
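Large Language and Vision Assistant (LLaVA) 1.6, also released under the name LLaVA-NeXT, improves visual reasoning and OCR over LLaVA 1.5 through dynamic high-resolution image encoding: each input image is split into a grid of tiles, which are encoded alongside a downscaled view of the whole image. The LLaVA team reported that this open-source multimodal model matches or exceeds some proprietary models on selected vision benchmarks.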
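The dynamic high-resolution step can be sketched as a resolution-selection problem. Pick the candidate grid that keeps the most original pixels after aspect-preserving resizing, break ties by choosing the one with the least padding, then count the encoder views. This is a minimal illustration under stated assumptions, not the project's code. The 336-px tile, the `CANDIDATE_GRIDS` list, and the 576 tokens per view (a 24×24 patch grid from a vision encoder with 14-px patches at 336 px) are assumptions. The sketch also ignores any token unpadding or separator tokens the real model may add.

```python
"""Sketch of AnyRes-style grid selection and visual-token budgeting.

Assumptions (illustrative, not taken from the LLaVA codebase):
- each tile is TILE x TILE pixels;
- CANDIDATE_GRIDS lists the allowed (width, height) canvas sizes;
- each encoder view yields TOKENS_PER_VIEW tokens (24 x 24 patches).
"""

TILE = 336
TOKENS_PER_VIEW = (TILE // 14) ** 2  # 24 * 24 = 576 patch tokens per view
CANDIDATE_GRIDS = [(336, 672), (672, 336), (672, 672), (336, 1344), (1344, 336)]


def select_best_resolution(size, candidates=CANDIDATE_GRIDS):
    """Return the candidate (width, height) that keeps the most original
    pixels after aspect-preserving resizing, breaking ties by least padding."""
    orig_w, orig_h = size
    best, best_effective, best_wasted = None, -1, float("inf")
    for cand_w, cand_h in candidates:
        # Scale the image to fit inside the candidate canvas without distortion.
        scale = min(cand_w / orig_w, cand_h / orig_h)
        fit_w, fit_h = int(orig_w * scale), int(orig_h * scale)
        # Upscaling adds no real detail, so cap at the original pixel count.
        effective = min(fit_w * fit_h, orig_w * orig_h)
        wasted = cand_w * cand_h - effective
        if effective > best_effective or (
            effective == best_effective and wasted < best_wasted
        ):
            best, best_effective, best_wasted = (cand_w, cand_h), effective, wasted
    return best


def visual_token_count(size):
    """Tokens for the tiles of the chosen canvas plus one downscaled global view."""
    cand_w, cand_h = select_best_resolution(size)
    tiles = (cand_w // TILE) * (cand_h // TILE)
    return (tiles + 1) * TOKENS_PER_VIEW


if __name__ == "__main__":
    for size in [(800, 600), (2000, 500), (100, 100)]:
        print(size, select_best_resolution(size), visual_token_count(size))
```

Under these assumptions, an 800×600 photo maps to a 2×2 grid of tiles plus the global view, or 5 × 576 = 2,880 visual tokens before any text. This shows how quickly images consume a 4K-token context window like the one listed in the specifications.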
https://llava-vl.github.io
Grade: D (Poor)
Adoption: C+ · Quality: B+ · Freshness: C+ · Citations: F · Engagement: F
Specifications
- License: Apache 2.0 (code); released checkpoints are also subject to the licenses of their base language models
- Pricing: Free (open-source)
- Capabilities: image-understanding, visual-reasoning, ocr, visual-instruction-following, multi-image-understanding
- Integrations: huggingface, ollama, vllm, langchain
- Use Cases: visual-qa, image-captioning, document-understanding, multimodal-chatbot
- API Available: No
- Parameters: 34B (largest variant; 7B and 13B variants were also released)
- Context Window: 4K tokens
- Modalities: text, image
- Training Cutoff: Late 2023
- Tags: multimodal, vision, open-source, visual-instruction-tuning, research
- Added: 2026-03-17
- Completeness: 87%
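Since no hosted API is listed, the model is typically self-hosted through one of the listed integrations. The sketch below shows the Ollama route. It builds, but does not send, a request for Ollama's `/api/generate` endpoint, which accepts base64-encoded images in an `images` field for multimodal models. It assumes a local server on Ollama's default port 11434 and a model pulled under the tag `llava:34b`. Both are assumptions to check against your own installation.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL_TAG = "llava:34b"  # assumed tag; smaller variants use other tags


def build_request(prompt: str, image_bytes: bytes) -> urllib.request.Request:
    """Build a non-streaming /api/generate request with one attached image."""
    payload = {
        "model": MODEL_TAG,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON response instead of a token stream.
        "stream": False,
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("Describe this image.", b"\x89PNG placeholder bytes")
    print(req.get_method(), req.full_url)
    # To query a running server (not executed here):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.loads(resp.read())["response"])
```

The request is only built here, not sent, so the payload can be inspected without a running server. Passing it to `urllib.request.urlopen` sends it.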
Index Score: 38
- Adoption: 58
- Quality: 72
- Freshness: 52
- Citations: 2
- Engagement: 0