Benchmark · benchmarks-evaluation · v1.0

GPQA Diamond

by NYU / Cohere · free · Last verified 2026-04-24

GPQA Diamond is a curated subset of GPQA (Graduate-Level Google-Proof Q&A), a challenging multiple-choice benchmark requiring expert-level knowledge in biology, chemistry, and physics. Questions are written by domain experts and are designed to be answerable by people with PhD-level training in the relevant field, but not by skilled non-experts, even with unrestricted web search. GPQA Diamond is widely used to measure the scientific reasoning capability of frontier models.
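
As a rough illustration of how a multiple-choice benchmark of this kind is usually scored, the sketch below shuffles the answer options for one question and computes plain accuracy over a set of predictions. It is a minimal sketch, not the evaluation code from the GPQA repository: the function names `shuffle_options` and `accuracy` are hypothetical, and the four-option A-D layout (one correct answer plus three distractors) is an assumption of this example.

```python
import random

LETTERS = "ABCD"  # assumption: four answer options per question


def shuffle_options(correct, distractors, rng):
    """Shuffle one correct answer with its distractors.

    Returns (mapping of letter -> option text, letter of the correct option).
    Hypothetical helper for illustration only.
    """
    options = [correct, *distractors]
    if len(options) != len(LETTERS):
        raise ValueError("expected one correct answer and three distractors")
    rng.shuffle(options)
    gold = LETTERS[options.index(correct)]
    return dict(zip(LETTERS, options)), gold


def accuracy(predictions, gold_letters):
    """Fraction of questions whose predicted letter matches the gold letter."""
    if not gold_letters or len(predictions) != len(gold_letters):
        raise ValueError("predictions and gold_letters must be non-empty and equal in length")
    correct = sum(
        p.strip().upper() == g.strip().upper()
        for p, g in zip(predictions, gold_letters)
    )
    return correct / len(gold_letters)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the shuffle is reproducible
    # Placeholder question content, not taken from the benchmark.
    choices, gold = shuffle_options("right", ["wrong 1", "wrong 2", "wrong 3"], rng)
    print(choices, gold)
    print(accuracy(["A", "b", "C", "D"], ["A", "B", "D", "D"]))
```

With four options, uniform random guessing gives an expected accuracy of 25%, which is the baseline any reported score should be read against. Shuffling the option order per question guards against a model that favors a particular answer position.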

https://github.com/idavidrein/gpqa ↗

D—Poor

Adoption: C+ · Quality: B+ · Freshness: A · Citations: F · Engagement: F

Specifications

License: CC BY 4.0 (dataset)
Pricing: free
Capabilities
Integrations
Use Cases
API Available: No
Tags: benchmark, science, reasoning, graduate-level, biology, chemistry, physics
Added: 2026-04-24
Completeness: 73%

