VQA v2
by Georgia Tech / VT · open-source · Last verified 2026-03-01
Visual Question Answering benchmark requiring models to answer open-ended questions about images. Version 2 balances the dataset to reduce language biases, ensuring models must genuinely understand image content rather than relying on question-type priors.
https://visualqa.org
Grade: B+ (Good)
Adoption: A · Quality: A · Freshness: B · Citations: A · Engagement: F
Specifications
- License: CC-BY-4.0
- Pricing: open-source
- Capabilities: model-evaluation, visual-qa-testing, image-understanding-assessment
- Integrations: lm-eval-harness
- Use Cases: visual-understanding-evaluation, image-qa-testing, multimodal-comparison
- API Available: No
- Evaluated Models: claude-4, gpt-5, gemini-2.5-pro
- Metrics: accuracy, yes-no-accuracy, number-accuracy, other-accuracy
- Methodology: Open-ended questions about COCO images. Each answer is scored against 10 human annotations using a soft accuracy metric: an answer is fully correct if it matches at least 3 of the 10 annotators, with partial credit for fewer matches.
- Last Run: 2026-01-15
- Tags: benchmark, evaluation, multimodal, visual-qa, understanding
- Added: 2026-03-17
- Completeness: 100%
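The soft accuracy rule in the Methodology entry can be sketched in Python. This is a minimal illustration, not the official scorer: the function name and the lightweight normalization are assumptions, and the real VQA evaluation script additionally normalizes articles, punctuation, and number words, and averages the score over subsets of 9 of the 10 annotations.

```python
def vqa_soft_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQA soft accuracy: min(#matching annotators / 3, 1).

    `human_answers` is the list of 10 annotator answers for one question.
    """
    # Minimal normalization for illustration; the official script does more.
    pred = predicted.strip().lower()
    matches = sum(1 for a in human_answers if a.strip().lower() == pred)
    # Fully correct once at least 3 annotators gave the same answer;
    # 1 or 2 matches earn partial credit.
    return min(matches / 3.0, 1.0)
```

For example, a prediction matching only one of the ten annotations scores 1/3, while one matching three or more scores 1.0.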
Index Score: 70.3
- Adoption: 82
- Quality: 80
- Freshness: 68
- Citations: 86
- Engagement: 0