VQA v2
by Georgia Tech / VT · open-source · Last verified 2026-03-01
Visual Question Answering benchmark requiring models to answer open-ended questions about images. Version 2 balances the dataset to reduce language biases, ensuring models must genuinely understand image content rather than relying on question-type priors.
https://visualqa.org
Grade: B+ (Good)
Adoption: A · Quality: A · Freshness: B · Citations: A · Engagement: F
Specifications
- License: CC-BY-4.0
- Pricing: open-source
- Capabilities: model-evaluation, visual-qa-testing, image-understanding-assessment
- Integrations: lm-eval-harness
- Use Cases: visual-understanding-evaluation, image-qa-testing, multimodal-comparison
- API Available: No
- Evaluated Models: claude-4, gpt-5, gemini-2.5-pro
- Metrics: accuracy, yes-no-accuracy, number-accuracy, other-accuracy
- Methodology: Open-ended questions about COCO images. Each answer is scored against 10 human annotations using a soft accuracy metric: an answer is fully correct if it matches at least 3 of the 10 annotators, with partial credit for fewer matches.
- Last Run: 2026-01-15
- Tags: benchmark, evaluation, multimodal, visual-qa, understanding
- Added: 2026-03-17
- Completeness: 100%
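The soft accuracy rule in the Methodology entry can be sketched in Python. This is a minimal illustration, not the official scorer: the function name and the lightweight normalization are assumptions, and the real VQA evaluation script additionally normalizes articles, punctuation, and number words, and averages the score over subsets of 9 of the 10 annotations.

```python
def vqa_soft_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Simplified VQA soft accuracy: min(#matching annotators / 3, 1).

    `human_answers` is the list of 10 annotator answers for one question.
    """
    # Minimal normalization for illustration; the official script does more.
    pred = predicted.strip().lower()
    matches = sum(1 for a in human_answers if a.strip().lower() == pred)
    # Fully correct once at least 3 annotators gave the same answer;
    # 1 or 2 matches earn partial credit.
    return min(matches / 3.0, 1.0)
```

For example, a prediction matching only one of the ten annotations scores 1/3, while one matching three or more scores 1.0.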
Index Score: 70.3
- Adoption: 82
- Quality: 80
- Freshness: 68
- Citations: 86
- Engagement: 0