
LLaVA 1.6

by LLaVA Team (UW-Madison / Microsoft) · open-source · Last verified 2026-03-17

Large Language and Vision Assistant (LLaVA) 1.6, also released as LLaVA-NeXT, improves visual reasoning and OCR over earlier versions through dynamic high-resolution image encoding. It is an open-source multimodal model that rivals proprietary alternatives on common vision benchmarks.

https://llava-vl.github.io
Overall grade: C+ (Adoption: C+ · Quality: B+ · Freshness: C+ · Citations: B · Engagement: F)

Specifications

License
Apache 2.0
Pricing
open-source
Capabilities
image-understanding, visual-reasoning, ocr, visual-instruction-following, multi-image-understanding
Integrations
huggingface, ollama, vllm, langchain
Use Cases
visual-qa, image-captioning, document-understanding, multimodal-chatbot
API Available
No
Parameters
34B
Context Window
4K tokens
Modalities
text, image
Training Cutoff
Late 2023
Tags
multimodal, vision, open-source, visual-instruction-tuning, research
Added
2026-03-17
Completeness
92%
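
Of the integrations listed above, Ollama exposes LLaVA locally over a REST API. The sketch below builds a visual-QA request payload for Ollama's documented `/api/generate` endpoint, which accepts base64-encoded images in an `images` field; the `llava:34b` model tag and the sample prompt are assumptions for illustration, not taken from this page.

```python
import base64
import json

# Default local Ollama endpoint (assumption: a stock Ollama install).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_llava_request(prompt: str, image_bytes: bytes,
                        model: str = "llava:34b") -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint.

    Ollama expects images as base64 strings in the "images" list.
    The "llava:34b" tag is an assumption; check `ollama list` for
    the tags actually pulled on your machine.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # request a single JSON response, not a stream
    }

if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file's contents.
    payload = build_llava_request("Describe this image.", b"fake-image-bytes")
    print(json.dumps(payload, indent=2))
```

Sending this payload as the POST body to `OLLAMA_URL` (e.g. with `urllib.request` or `requests`) returns the model's answer in the `response` field of the reply.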

Index Score

53.85
Adoption
58
Quality
72
Freshness
52
Citations
65
Engagement
0
