Benchmark

GPQA Diamond

Last verified: 2026-03-26T21:47:59.230Z

A graduate-level science benchmark of PhD-level multiple-choice questions in biology, physics, and chemistry, used to evaluate deep reasoning in large language models (LLMs).

https://arxiv.org/abs/2311.12022
Overall grade: D (Poor)
Adoption: F · Quality: A+ · Freshness: A+ · Citations: F · Engagement: F

Specifications

API available: No
Tags: science, reasoning, PhD
Added: 2026-03-26T21:47:59.230Z
Completeness: 0%

Index Score: 38

Adoption: 0
Quality: 95
Freshness: 100
Citations: 0
Engagement: 0
