PaperLLMs v1.0

CogVLM: Visual Expert for Pretrained Language Models

by Tsinghua University / Zhipu AI · open-source · Last verified 2026-03-17

CogVLM adds a trainable visual expert module to each layer of a pretrained LLM, enabling deep fusion of visual and language features without compromising the language model's capabilities. It achieves state-of-the-art performance on 17 benchmarks while keeping the original LLM weights frozen.

https://arxiv.org/abs/2311.03079
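The core idea above (per-layer routing of image tokens through trainable expert weights while text tokens keep the frozen LLM weights) can be sketched as a single attention layer. This is a minimal illustrative sketch, not CogVLM's actual code: the class name, shapes, and masking convention are assumptions, and the real model also gives the visual expert its own FFN.

```python
import torch
import torch.nn as nn

class VisualExpertAttention(nn.Module):
    """Sketch of CogVLM-style visual-expert routing (illustrative, not official).

    Text tokens go through the frozen, pretrained QKV projection; image tokens
    go through a parallel, trainable "visual expert" QKV of the same shape.
    All tokens then share one attention over the mixed sequence, which is how
    the deep fusion happens without touching the language weights.
    """

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Stands in for the pretrained LLM's QKV projection; kept frozen.
        self.qkv_text = nn.Linear(d_model, 3 * d_model)
        for p in self.qkv_text.parameters():
            p.requires_grad = False
        # Trainable visual expert, initialized from the language weights.
        self.qkv_image = nn.Linear(d_model, 3 * d_model)
        self.qkv_image.load_state_dict(self.qkv_text.state_dict())
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, image_mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); image_mask: (batch, seq) bool, True = image token.
        qkv = torch.where(
            image_mask.unsqueeze(-1), self.qkv_image(x), self.qkv_text(x)
        )
        q, k, v = qkv.chunk(3, dim=-1)
        B, S, _ = x.shape
        shape = (B, S, self.n_heads, self.d_head)
        q, k, v = (t.view(shape).transpose(1, 2) for t in (q, k, v))
        # Shared attention over the mixed image+text sequence.
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head**0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, S, -1)
        return self.out(y)
```

Because the expert is initialized from the frozen language projection, a text-only sequence (all-False mask) behaves exactly like the original layer at step zero, which is why the LLM's capabilities are preserved.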
Overall Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: A · Citations: B · Engagement: F

Specifications

License
Apache 2.0
Pricing
open-source
Capabilities
visual-question-answering, image-captioning, grounding, visual-reasoning
Integrations
huggingface
Use Cases
multimodal-qa, visual-grounding, image-understanding
API Available
No
Tags
cogvlm, multimodal, visual-expert, deep-fusion, vision-language
Added
2026-03-17
Completeness
100%

Index Score

63.4
Adoption
72
Quality
88
Freshness
83
Citations
68
Engagement
0
