Modelmultimodalv2.5

Qwen2.5-VL-72B

by Alibaba Cloud (Qwen Team) · free · Last verified 2026-03-17

Qwen2.5-VL-72B is Alibaba's flagship open vision-language model at 72 billion parameters, achieving top-tier performance on visual understanding benchmarks including chart analysis, document parsing, and fine-grained image understanding. It supports dynamic resolution image inputs and video understanding with native high-resolution processing.

https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct ↗

C—Below Average

Adoption: B+Quality: A+Freshness: A+Citations: FEngagement: F

Specifications

License: Qwen License
Pricing: free
Capabilities: vision, visual-question-answering, document-understanding, video-understanding, ocr, agentic-vision
Integrations: Hugging Face, Qwen API, vLLM, Ollama
Use Cases: document-analysis, visual-qa, video-understanding, chart-interpretation, agentic-visual-tasks
API Available: Yes
Parameters: 72B
Context Window: 128K
Modalities: text, image, video
Training Cutoff: 2024
Tags: alibaba, qwen, vision-language, open-source, frontier, large
Added: 2026-03-17
Completeness: 80%

Index Score

Adoption

Quality

Freshness

Citations

Engagement

Need help choosing the right model?

Get Expert Guidance

Explore the full AI ecosystem on Agents as a Service