
ScienceQA

by Lu et al. / UCLA · free · Last verified 2026-03-17

ScienceQA is a large-scale multimodal benchmark of 21,208 science questions spanning grades 3-12. It pairs visual diagrams with textual contexts, requiring models to reason across both modalities. Each question includes multiple-choice options, a supporting lecture, and a step-by-step explanation of the correct answer.

https://scienceqa.github.io
Index grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: A · Engagement: F

Specifications

License
CC BY-NC-SA 4.0
Pricing
free
Capabilities
multimodal question answering, visual reasoning and diagram understanding, scientific knowledge retrieval, natural language understanding, chain-of-thought reasoning evaluation, explanation generation, evaluating models on K-12 science topics
Integrations
Use Cases
API Available
No
Evaluated Models
gpt-4o, claude-opus-4, gemini-2.5-pro, llava-1.6
Metrics
accuracy
Methodology
21,208 multiple-choice questions across 26 topics. Models answer with or without context images; performance averaged across image/text-only subsets. Chain-of-thought explanations optionally evaluated for rationale quality.
Last Run
2026-02-02
Tags
benchmark, science-qa, multimodal-reasoning, visual-question-answering, vqa, k12-education, chain-of-thought, explanation-generation, natural-language-processing, evaluation
Added
2026-03-17
Completeness
0.7%
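The scoring scheme described in the Methodology field (per-subset accuracy, then an average across the image-context and text-only subsets) can be sketched as follows. This is a minimal illustration, not the official ScienceQA harness; the record fields (`subset`, `predicted`, `answer`) are assumed names for this sketch.

```python
# Hypothetical sketch of averaging accuracy across ScienceQA subsets.
# Record keys ("subset", "predicted", "answer") are illustrative assumptions.

def subset_accuracy(records):
    """Fraction of records whose predicted choice matches the gold answer."""
    correct = sum(1 for r in records if r["predicted"] == r["answer"])
    return correct / len(records)

def scienceqa_score(records):
    """Per-subset accuracy plus the macro-average across subsets."""
    by_subset = {}
    for r in records:
        by_subset.setdefault(r["subset"], []).append(r)
    per_subset = {name: subset_accuracy(rs) for name, rs in by_subset.items()}
    overall = sum(per_subset.values()) / len(per_subset)
    return per_subset, overall

# Toy data: two image-context items, two text-only items.
records = [
    {"subset": "image", "predicted": "B", "answer": "B"},
    {"subset": "image", "predicted": "A", "answer": "C"},
    {"subset": "text", "predicted": "D", "answer": "D"},
    {"subset": "text", "predicted": "A", "answer": "A"},
]
per_subset, overall = scienceqa_score(records)
print(per_subset, overall)  # image: 0.5, text: 1.0, overall: 0.75
```

Note the macro-average weights each subset equally regardless of its size, which matters when the image and text-only splits differ in question count.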

Index Score: 68
Adoption: 76
Quality: 88
Freshness: 73
Citations: 80
Engagement: 0
