LiveBench
by LiveBench Team · open-source · Last verified 2026-03-01
A continuously updated benchmark whose question sets are refreshed monthly to prevent data contamination. It covers math, coding, reasoning, language, data analysis, and instruction following, with automatically verifiable answers that require no LLM judge.
https://livebench.ai
Overall Grade: B (Above Average)
Adoption: B · Quality: A · Freshness: A+ · Citations: B · Engagement: F
Specifications
- License
- Apache-2.0
- Pricing
- open-source
- Capabilities
- model-evaluation, contamination-free-testing, dynamic-assessment
- Integrations
- livebench-api
- Use Cases
- contamination-free-evaluation, ongoing-model-comparison, research
- API Available
- No
- Evaluated Models
- claude-4, gpt-5, gemini-2.5-pro, deepseek-v3, llama-4-405b
- Metrics
- global-average, math-score, coding-score, reasoning-score
- Methodology
- Monthly-refreshed question sets across 6 categories. All answers programmatically verifiable without LLM judges. Questions sourced from recent events to prevent training contamination.
- Last Run
- 2026-03-15
- Tags
- benchmark, evaluation, live, contamination-free, dynamic
- Added
- 2026-03-17
- Completeness
- 100%
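The methodology above (programmatic verification, per-category scores, a global average) can be sketched as a minimal exact-match grader. The question fields (`category`, `ground_truth`, `model_answer`) and the equal-weight averaging are illustrative assumptions, not LiveBench's actual release format or scoring scheme.

```python
def grade(questions):
    """Score model answers by exact match, per category and overall.

    `questions` is a list of dicts with keys category, ground_truth,
    and model_answer (assumed field names). Returns
    (per_category_scores, global_average), each on a 0-100 scale.
    """
    totals, correct = {}, {}
    for q in questions:
        cat = q["category"]
        totals[cat] = totals.get(cat, 0) + 1
        # Programmatic verification: normalized exact match, no LLM judge.
        if q["model_answer"].strip().lower() == q["ground_truth"].strip().lower():
            correct[cat] = correct.get(cat, 0) + 1
    per_cat = {c: 100.0 * correct.get(c, 0) / n for c, n in totals.items()}
    # The global average weights each category equally -- an assumption here.
    global_avg = sum(per_cat.values()) / len(per_cat)
    return per_cat, global_avg

# Illustrative toy run with made-up questions:
sample = [
    {"category": "math", "ground_truth": "42", "model_answer": "42"},
    {"category": "math", "ground_truth": "7", "model_answer": "8"},
    {"category": "coding", "ground_truth": "true", "model_answer": "True"},
]
scores, avg = grade(sample)
print(scores, avg)  # math 50.0, coding 100.0, average 75.0
```

Because every answer is checked mechanically, results are reproducible and cheap to recompute when a new monthly question set lands.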
Index Score
- Overall: 60.3
- Adoption: 68
- Quality: 88
- Freshness: 96
- Citations: 62
- Engagement: 0
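One plausible way the numeric sub-scores above map to the letter grades in the header is a simple threshold table. The cutoffs below are assumptions chosen only to be consistent with the displayed pairs (96 → A+, 88 → A, 68 → B, 62 → B, 0 → F); the index's real thresholds are not published here.

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 sub-score to a letter grade (assumed cutoffs)."""
    cutoffs = [(95, "A+"), (85, "A"), (60, "B"), (40, "C")]  # assumption
    for floor, grade in cutoffs:
        if score >= floor:
            return grade
    return "F"

for s in (96, 88, 68, 62, 0):
    print(s, letter_grade(s))  # A+, A, B, B, F
```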