
CogVLM 2

by Tsinghua University / Zhipu AI · open-source · Last verified 2026-03-17

Tsinghua University's second-generation vision-language model with deep fusion of visual and linguistic features via a visual expert module. Excels at visual grounding, OCR, and detailed image understanding tasks.

https://github.com/THUDM/CogVLM2
Overall Grade: C (Below Average)
Adoption: C · Quality: B+ · Freshness: C+ · Citations: C · Engagement: F

Specifications

License
Apache 2.0
Pricing
open-source
Capabilities
image-understanding, visual-grounding, ocr, visual-reasoning, referring-expression-comprehension
Integrations
huggingface, vllm, transformers
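Given the vLLM integration listed above, serving the model locally can be sketched as follows. This is an illustrative assumption, not an official recipe: the checkpoint ID and flag values are taken from common CogVLM2 releases, and you should verify that your vLLM build actually supports this architecture before relying on it.

```shell
# Hypothetical sketch: serve a CogVLM2 checkpoint through vLLM's
# OpenAI-compatible server. Model ID and flags are assumptions.
# --trust-remote-code: CogVLM2 ships custom modeling code on the Hub.
# --max-model-len 8192: matches the 8K-token context window listed above.
vllm serve THUDM/cogvlm2-llama3-chat-19B --trust-remote-code --max-model-len 8192
```

Once running, the server exposes an OpenAI-compatible endpoint (by default on port 8000), so existing OpenAI-style clients can send image-plus-text requests to it.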
Use Cases
visual-grounding, image-qa, document-analysis, scene-understanding
API Available
No
Parameters
19B
Context Window
8K tokens
Modalities
text, image
Training Cutoff
Mid 2024
Tags
multimodal, vision, open-source, tsinghua, visual-grounding
Added
2026-03-17
Completeness
88%

Index Score

42
Adoption
40
Quality
70
Freshness
50
Citations
48
Engagement
0
