BenchmarkLLMs v1.0

ScienceQA

by Lu et al. / UCLA · open-source · Last verified 2026-03-17

ScienceQA is a multimodal benchmark of 21,208 science questions spanning natural science, language science, and social science at grade-school through high-school levels. Each question includes a natural-language and/or image context, multiple-choice answers, and an annotated lecture and explanation for rationale evaluation.

https://scienceqa.github.io
Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: A · Engagement: F

Specifications

License
CC BY-NC-SA 4.0
Pricing
open-source
Capabilities
evaluation, multimodal-reasoning, science-qa
Integrations
Use Cases
model-evaluation, multimodal-ai, educational-ai
API Available
No
Evaluated Models
gpt-4o, claude-opus-4, gemini-2-5-pro, llava-1.6
Metrics
accuracy
Methodology
21,208 multiple-choice questions across 26 topics. Models answer with or without context images; performance averaged across image/text-only subsets. Chain-of-thought explanations optionally evaluated for rationale quality.
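The averaging described above can be sketched as follows. This is a minimal illustration of per-subset accuracy followed by a mean across the image and text-only subsets; the example predictions and subset names are assumptions, not the actual dataset schema or evaluation harness.

```python
# Hypothetical sketch of "performance averaged across image/text-only
# subsets": compute accuracy per subset, then take the mean.
def accuracy(preds, golds):
    """Fraction of predictions matching the gold answer indices."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

# Illustrative predictions and gold answers, split by context type.
subsets = {
    "image":     ([1, 2, 0, 3], [1, 2, 1, 3]),   # 3/4 correct
    "text_only": ([0, 0, 2],    [0, 1, 2]),      # 2/3 correct
}

per_subset = {name: accuracy(p, g) for name, (p, g) in subsets.items()}
overall = sum(per_subset.values()) / len(per_subset)
print(round(overall, 3))  # → 0.708
```

Note that this macro-averages the two subsets equally; a micro-average over all questions would weight the larger subset more heavily, and the source does not say which the benchmark uses.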
Last Run
2026-02-02
Tags
science, multimodal, k12, explanation, multiple-choice
Added
2026-03-17
Completeness
100%

Index Score

68
Adoption
76
Quality
88
Freshness
73
Citations
80
Engagement
0
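One plausible way the composite Index Score could relate to the sub-scores above is a weighted average. The site does not publish its formula, so the weights below are purely illustrative assumptions and do not reproduce the listed score of 68.

```python
# Hypothetical composite-score sketch. The actual weighting used by
# the site is unpublished; these weights are assumptions for illustration.
def index_score(scores, weights):
    """Weighted average of sub-scores, rounded to an integer."""
    total = sum(weights.values())
    return round(sum(scores[k] * weights[k] for k in scores) / total)

subscores = {"adoption": 76, "quality": 88, "freshness": 73,
             "citations": 80, "engagement": 0}
weights = {"adoption": 0.25, "quality": 0.25, "freshness": 0.2,
           "citations": 0.2, "engagement": 0.1}  # illustrative only

print(index_score(subscores, weights))  # → 72 under these assumed weights
```

Under these assumed weights the result is 72, not the listed 68, which shows the real formula must weight (or penalize) the sub-scores differently.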
