Benchmark · Computer Vision · v2.0

VQA v2

by Georgia Tech / VT · open-source · Last verified 2026-03-01

Visual Question Answering benchmark requiring models to answer open-ended questions about images. Version 2 balances the dataset to reduce language biases, ensuring models must genuinely understand image content rather than relying on question-type priors.

https://visualqa.org
Overall grade: B+ (Good)
Adoption: A · Quality: A · Freshness: B · Citations: A · Engagement: F

Specifications

License
CC-BY-4.0
Pricing
open-source
Capabilities
model-evaluation, visual-qa-testing, image-understanding-assessment
Integrations
lm-eval-harness
Use Cases
visual-understanding-evaluation, image-qa-testing, multimodal-comparison
API Available
No
Evaluated Models
claude-4, gpt-5, gemini-2.5-pro
Metrics
accuracy, yes-no-accuracy, number-accuracy, other-accuracy
Methodology
Open-ended questions about COCO images. Each answer is evaluated against 10 human annotations using a soft accuracy metric: an answer receives full credit if it matches at least 3 of the annotators, and partial credit (matches / 3) otherwise.
Last Run
2026-01-15
Tags
benchmark, evaluation, multimodal, visual-qa, understanding
Added
2026-03-17
Completeness
100%
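The soft accuracy metric from the methodology above can be sketched as follows. This is a minimal illustration, not the official evaluator: the function name is illustrative, and the real VQA evaluation also normalizes answers (lowercasing, stripping articles and punctuation) before matching, which is omitted here. The averaging over leave-one-out subsets of the 10 annotations follows the standard VQA protocol.

```python
def vqa_accuracy(answer: str, human_answers: list[str]) -> float:
    """Soft VQA accuracy for one question.

    For each leave-one-out subset of 9 human annotations, the answer
    scores min(matches / 3, 1); the final score averages these subsets.
    Answer-string normalization is omitted for brevity.
    """
    assert len(human_answers) == 10, "VQA v2 provides 10 annotations per question"
    scores = []
    for i in range(len(human_answers)):
        subset = human_answers[:i] + human_answers[i + 1:]
        matches = sum(a == answer for a in subset)
        scores.append(min(matches / 3.0, 1.0))
    return sum(scores) / len(scores)


# Example: 3 of 10 annotators said "cat", so the answer earns roughly
# full credit, discounted slightly by the leave-one-out averaging.
print(vqa_accuracy("cat", ["cat"] * 3 + ["dog"] * 7))  # 0.9
```

An answer matching all 10 annotators scores 1.0; one matching none scores 0.0.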

Index Score

70.3
Adoption
82
Quality
80
Freshness
68
Citations
86
Engagement
0