GPQA
by NYU · open-source · Last verified 2026-03-01
Graduate-level Google-Proof Question Answering benchmark featuring questions written by domain experts in physics, chemistry, and biology. Questions are designed to be "Google-proof": hard to answer even with unrestricted web search, so solving them requires genuine domain reasoning rather than retrieval or memorization.
https://github.com/idavidrein/gpqa
B+ (Good)
Adoption: A · Quality: A+ · Freshness: A · Citations: A · Engagement: F
Specifications
- License
- MIT
- Pricing
- open-source
- Capabilities
- model-evaluation, expert-knowledge-testing, reasoning-assessment
- Integrations
- lm-eval-harness
- Use Cases
- frontier-model-evaluation, reasoning-benchmarking, expert-level-assessment
- API Available
- No
- Evaluated Models
- claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
- Metrics
- accuracy, diamond-accuracy
- Methodology
- Expert-written multiple-choice questions in graduate-level physics, chemistry, and biology. The Diamond subset contains the hardest, highest-quality questions, validated by multiple domain experts.
- Last Run
- 2026-02-25
- Tags
- benchmark, evaluation, graduate-level, reasoning, expert
- Added
- 2026-03-17
- Completeness
- 100%
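The two metrics above (accuracy and diamond-accuracy) are plain exact-match accuracy over multiple-choice answers, with diamond-accuracy restricted to the Diamond subset. A minimal sketch of that scoring, using an illustrative record layout rather than the benchmark's actual schema:

```python
# Minimal sketch of GPQA-style multiple-choice scoring.
# The record layout (id, predicted letter, gold letter, in_diamond)
# is an illustrative assumption, not the dataset's real schema.

def accuracy(predictions, answers):
    """Fraction of predictions that exactly match the gold answer letter."""
    assert len(predictions) == len(answers) and answers
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)

# Hypothetical model outputs for four questions, two of them in Diamond.
results = [
    ("q1", "A", "A", True),
    ("q2", "C", "B", False),
    ("q3", "D", "D", True),
    ("q4", "B", "B", False),
]

overall = accuracy([p for _, p, _, _ in results],
                   [a for _, _, a, _ in results])
diamond = accuracy([p for _, p, _, d in results if d],
                   [a for _, _, a, d in results if d])
print(f"accuracy={overall:.2f} diamond-accuracy={diamond:.2f}")
```

In practice the listed lm-eval-harness integration handles this bookkeeping; the sketch only shows what the two reported numbers measure.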
Index Score: 71.6
- Adoption: 82
- Quality: 94
- Freshness: 86
- Citations: 80
- Engagement: 0