LLaVA 1.6
by LLaVA Team (UW-Madison / Microsoft) · open-source · Last verified 2026-03-17
Large Language and Vision Assistant with improved visual reasoning and OCR capabilities through dynamic high-resolution image encoding. An open-source multimodal model that rivals proprietary alternatives on vision benchmarks.
https://llava-vl.github.io
Overall rating: C+ (Average)
Adoption: C+ · Quality: B+ · Freshness: C+ · Citations: B · Engagement: F
Specifications
- License: Apache 2.0
- Pricing: open-source
- Capabilities: image-understanding, visual-reasoning, ocr, visual-instruction-following, multi-image-understanding
- Integrations: huggingface, ollama, vllm, langchain
- Use Cases: visual-qa, image-captioning, document-understanding, multimodal-chatbot
- API Available: No
- Parameters: 34B
- Context Window: 4K tokens
- Modalities: text, image
- Training Cutoff: Late 2023
- Tags: multimodal, vision, open-source, visual-instruction-tuning, research
- Added: 2026-03-17
- Completeness: 92%
Index Score: 53.85
- Adoption: 58
- Quality: 72
- Freshness: 52
- Citations: 65
- Engagement: 0