Dataset · Computer Vision · v1.0

ShareGPT4V

by Shanghai AI Lab · free · Last verified 2026-03-17

ShareGPT4V is a large-scale, high-quality dataset containing 100,000 image-text pairs generated by GPT-4V. It is designed for instruction-tuning open-source large vision-language models (LVLMs). Its detailed captions and conversational QA pairs are intended to improve a model's performance on complex scene understanding, OCR, and visual reasoning.

https://sharegpt4v.github.io

C: Below Average

Adoption: B+ · Quality: A+ · Freshness: B+ · Citations: F · Engagement: F

Specifications

License: Apache-2.0
Pricing: free
Capabilities: Instruction-tuning for LVLMs, Complex scene understanding, Visual question answering (VQA), Detailed image captioning, Optical Character Recognition (OCR) in context, Visual reasoning, Multimodal conversation generation, Object attribute recognition
API Available: Yes
Tags: dataset, multimodal, instruction-tuning, vision-language, gpt-4v, llava, computer-vision, image-captioning, visual-question-answering, synthetic-data, ocr
Added: 2026-03-17
Completeness: 1%