ShareGPT4V
by Shanghai AI Lab · free · Last verified 2026-03-17
ShareGPT4V is a large-scale, high-quality dataset containing 100,000 image-text pairs generated by GPT-4V. It is specifically designed for the instruction-tuning of open-source large vision-language models (LVLMs). The dataset's detailed captions and conversational QA pairs significantly enhance a model's ability to perform complex scene understanding, OCR, and visual reasoning.
https://sharegpt4v.github.io ↗B
B—Above Average
Adoption: B+Quality: A+Freshness: B+Citations: B+Engagement: F
Specifications
- License
- Apache-2.0
- Pricing
- free
- Capabilities
- Instruction-tuning for LVLMs, Complex scene understanding, Visual question answering (VQA), Detailed image captioning, Optical Character Recognition (OCR) in context, Visual reasoning, Multimodal conversation generation, Object attribute recognition
- Integrations
- [object Object], [object Object], [object Object], [object Object]
- Use Cases
- [object Object], [object Object], [object Object], [object Object], [object Object]
- API Available
- Yes
- Tags
- dataset, multimodal, instruction-tuning, vision-language, gpt-4v, llava, computer-vision, image-captioning, visual-question-answering, synthetic-data, ocr
- Added
- 2026-03-17
- Completeness
- 1%
Index Score
65.1Adoption
70
Quality
93
Freshness
78
Citations
74
Engagement
0