AIME 2024
by MAA · free · Last verified 2026-03-01
A highly challenging benchmark for evaluating the mathematical reasoning of frontier AI models. It uses 30 problems from the 2024 American Invitational Mathematics Examination (AIME), which are designed to test creative problem-solving, multi-step deduction, and knowledge across number theory, geometry, algebra, and combinatorics.
https://artofproblemsolving.com/wiki/index.php/2024_AIME
Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: A · Citations: B+ · Engagement: F
Specifications
- License
- Public Domain
- Pricing
- free
- Capabilities
- evaluating advanced mathematical problem-solving, benchmarking multi-step logical reasoning chains, assessing creative and non-standard solution strategies, testing proficiency in number theory, geometry, and combinatorics, measuring performance on pre-olympiad level mathematics, gauging model ability for abstract thinking and symbolic manipulation
- Integrations
- Use Cases
- API Available
- No
- Evaluated Models
- claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
- Metrics
- solve-rate, average-score
- Methodology
- 30 AIME problems with integer answers 0-999. Each model solves every problem and reports a final integer answer, which is scored by exact match against the official answer (see the scoring sketch after this list).
- Last Run
- 2026-02-15
- Tags
- benchmark, model-evaluation, mathematics, reasoning, llm-benchmark, competition-math, problem-solving, number-theory, geometry, combinatorics
- Added
- 2026-03-17
- Completeness
- 90%
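The methodology above reduces to an exact-match check on integer answers in the range 0-999, aggregated into the listed metrics. The sketch below illustrates that scoring loop under stated assumptions: the dictionary schema, function names, and the reading of average-score as the solve rate on a 0-100 scale are illustrative, not the benchmark's actual harness.

```python
# Minimal sketch of exact-match scoring for AIME-style integer answers.
# Schema (problem_id -> integer answer) and metric names are assumptions.

def is_valid_aime_answer(value: int) -> bool:
    """AIME answers are integers in the range 0-999."""
    return 0 <= value <= 999

def score_run(predictions: dict[int, int], answers: dict[int, int]) -> dict[str, float]:
    """Score one model run: exact match per problem, then aggregate."""
    solved = 0
    for problem_id, reference in answers.items():
        predicted = predictions.get(problem_id)
        if predicted is not None and is_valid_aime_answer(predicted) and predicted == reference:
            solved += 1
    total = len(answers)
    return {
        "solved": solved,
        "total": total,
        "solve_rate": solved / total,            # fraction of problems answered exactly
        "average_score": 100 * solved / total,   # same quantity on a 0-100 scale (assumed convention)
    }

if __name__ == "__main__":
    # Toy example with 3 of the 30 problems; answers are made up for illustration.
    gold = {1: 204, 2: 25, 3: 809}
    preds = {1: 204, 2: 23, 3: 809}
    print(score_run(preds, gold))  # solved=2, total=3, solve_rate≈0.667, average_score≈66.7
```

Because every answer is a single integer, exact match needs no partial credit or rubric; a run's solve-rate over the 30 problems is the entire result.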
Index Score: 64.9
Adoption: 72
Quality: 88
Freshness: 82
Citations: 74
Engagement: 0