GPQA Diamond
Last verified 2026-03-26T21:47:59.230Z
Graduate-level science benchmark with PhD-level questions for evaluating deep reasoning in LLMs.
https://arxiv.org/abs/2311.12022
Grade: D (Poor)
Adoption: F, Quality: A+, Freshness: A+, Citations: F, Engagement: F
Specifications
- API Available: No
- Tags: science, reasoning, PhD
- Added: 2026-03-26T21:47:59.230Z
- Completeness: 0%
Index Score: 38
- Adoption: 0
- Quality: 95
- Freshness: 100
- Citations: 0
- Engagement: 0