CogVLM2
by Tsinghua University / Zhipu AI · open-source · Last verified 2026-03-17
Second-generation vision-language model from Tsinghua University and Zhipu AI, featuring deep fusion of visual and linguistic features via a visual expert module. Excels at visual grounding, OCR, and detailed image understanding tasks.
https://github.com/THUDM/CogVLM2
Overall Grade: C (Below Average)
Adoption: C · Quality: B+ · Freshness: C+ · Citations: C · Engagement: F
Specifications
- License: Apache 2.0
- Pricing: open-source
- Capabilities: image-understanding, visual-grounding, ocr, visual-reasoning, referring-expression-comprehension
- Integrations: huggingface, vllm, transformers
- Use Cases: visual-grounding, image-qa, document-analysis, scene-understanding
- API Available: No
- Parameters: 19B
- Context Window: 8K tokens
- Modalities: text, image
- Training Cutoff: Mid 2024
- Tags: multimodal, vision, open-source, tsinghua, visual-grounding
- Added: 2026-03-17
- Completeness: 88%
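As a sketch of how the huggingface/transformers integration listed above is typically used: the checkpoint name below is an assumption (check the GitHub repo for actual model IDs), and `trust_remote_code=True` is assumed because THUDM releases generally ship custom modeling code. The context-window check reflects the 8K-token limit from the spec card.

```python
# Hypothetical usage sketch for CogVLM2 via Hugging Face transformers.
# ASSUMED_MODEL_ID is an assumption, not confirmed by this card.
ASSUMED_MODEL_ID = "THUDM/cogvlm2-llama3-chat-19B"
CONTEXT_WINDOW = 8192  # 8K-token context window per the spec card


def fits_context(prompt_tokens: int, max_new_tokens: int) -> bool:
    """Check that prompt plus generation budget stays within the 8K window."""
    return prompt_tokens + max_new_tokens <= CONTEXT_WINDOW


def load_cogvlm2(model_id: str = ASSUMED_MODEL_ID):
    """Lazily import heavy dependencies and load tokenizer plus model.

    trust_remote_code=True is assumed to be required, since the model's
    custom architecture (visual expert module) is not in core transformers.
    Requires torch and transformers to be installed.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
    )
    return tokenizer, model
```

With no API available (per the card), local loading like this, or serving through vLLM, are the expected deployment paths.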
Index Score: 42
- Adoption: 40
- Quality: 70
- Freshness: 50
- Citations: 48
- Engagement: 0