
LLaVA 1.6

by LLaVA Team (UW-Madison / Microsoft) · open-source · Last verified 2026-03-17

Large Language and Vision Assistant (LLaVA) 1.6, also released as LLaVA-NeXT, improves visual reasoning and OCR over earlier versions through dynamic high-resolution image encoding. It is an open-source multimodal model that rivals proprietary alternatives on common vision benchmarks.

https://llava-vl.github.io
Overall grade: C+ (Adoption: C+ · Quality: B+ · Freshness: C+ · Citations: B · Engagement: F)

Specifications

License
Apache 2.0
Pricing
open-source
Capabilities
image-understanding, visual-reasoning, ocr, visual-instruction-following, multi-image-understanding
Integrations
huggingface, ollama, vllm, langchain
Use Cases
visual-qa, image-captioning, document-understanding, multimodal-chatbot
API Available
No
Parameters
34B
Context Window
4K tokens
Modalities
text, image
Training Cutoff
Late 2023
Tags
multimodal, vision, open-source, visual-instruction-tuning, research
Added
2026-03-17
Completeness
92%
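
Of the integrations listed above, Ollama exposes LLaVA locally over a REST API. The sketch below builds a visual-QA request payload for Ollama's documented `/api/generate` endpoint, which accepts base64-encoded images in an `images` field; the `llava:34b` model tag and the sample prompt are assumptions for illustration, not taken from this page.

```python
import base64
import json

# Default local Ollama endpoint (assumption: a stock Ollama install).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_llava_request(prompt: str, image_bytes: bytes,
                        model: str = "llava:34b") -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint.

    Ollama expects images as base64 strings in the "images" list.
    The "llava:34b" tag is an assumption; check `ollama list` for
    the tags actually pulled on your machine.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # request a single JSON response, not a stream
    }

if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file's contents.
    payload = build_llava_request("Describe this image.", b"fake-image-bytes")
    print(json.dumps(payload, indent=2))
```

Sending this payload as the POST body to `OLLAMA_URL` (e.g. with `urllib.request` or `requests`) returns the model's answer in the `response` field of the reply.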

Index Score

53.85
Adoption
58
Quality
72
Freshness
52
Citations
65
Engagement
0
