BenchmarkAI for Code v1.0

MLE-bench

by OpenAI · open-source · Last verified 2026-03-01

Benchmark that evaluates AI agents on real Kaggle machine-learning competitions. It tests the full ML engineering pipeline, including data exploration, feature engineering, model selection, training, and submission formatting, with results scored against actual competition leaderboards.
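To make the pipeline stages concrete, here is a minimal sketch of the end-to-end workflow an agent is expected to complete. The synthetic data, column names, and `submission.csv` path are hypothetical stand-ins; in a real MLE-bench run the agent reads each competition's own files from its sandbox.

```python
# Minimal sketch of a Kaggle-style pipeline: explore, engineer, select,
# train, and format a submission. All assets here are hypothetical.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# 1. Data exploration: a real agent would read the competition's
#    train/test CSVs; here we synthesize a stand-in dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)

# 2-3. Feature engineering and model selection (kept trivial here).
model = GradientBoostingClassifier(random_state=0)

# 4. Training.
model.fit(X_train, y_train)

# 5. Submission formatting: the grader scores a submission file against
#    the competition's own metric, so the format must match exactly.
submission = pd.DataFrame({
    "id": range(len(X_test)),
    "target": model.predict(X_test),
})
submission.to_csv("submission.csv", index=False)
```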

https://github.com/openai/mle-bench
Index Grade: C+ (Average)

Adoption: C+ · Quality: A · Freshness: A+ · Citations: C+ · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: agent-evaluation, ml-pipeline-testing, competition-benchmarking
Integrations: docker, kaggle
Use Cases: ml-agent-evaluation, data-science-capability-testing, research
API Available: No
Evaluated Models: claude-4, gpt-5, gemini-2.5-pro
Metrics: medal-rate, above-median-rate, competition-score
Methodology: 75 real Kaggle competitions. Agents work in sandboxed environments with dataset access, generating submissions scored against the actual competition metrics.
Last Run: 2026-03-05
Tags: benchmark, evaluation, machine-learning, kaggle, data-science
Added: 2026-03-17
Completeness: 100%
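The listed metrics aggregate per-competition outcomes across the suite. The sketch below shows one plausible way to compute medal-rate and above-median-rate; the field names and flags are assumptions, since the real grader in openai/mle-bench derives medal and median thresholds from each leaderboard's rules.

```python
# Hypothetical sketch of aggregating per-competition results into the
# medal-rate and above-median-rate metrics listed above.
from dataclasses import dataclass

@dataclass
class CompetitionResult:
    competition: str
    score: float          # raw competition-score on the official metric
    won_medal: bool       # would this score earn any Kaggle medal?
    above_median: bool    # does it beat the leaderboard median?

def medal_rate(results: list[CompetitionResult]) -> float:
    """Fraction of competitions where the agent's submission medals."""
    return sum(r.won_medal for r in results) / len(results)

def above_median_rate(results: list[CompetitionResult]) -> float:
    """Fraction of competitions where the agent beats the median entrant."""
    return sum(r.above_median for r in results) / len(results)

# Example data for two competitions (values invented for illustration).
results = [
    CompetitionResult("spaceship-titanic", 0.79,
                      won_medal=False, above_median=True),
    CompetitionResult("denoising-dirty-documents", 0.12,
                      won_medal=True, above_median=True),
]
print(f"medal-rate: {medal_rate(results):.2f}")              # 0.50
print(f"above-median-rate: {above_median_rate(results):.2f}")  # 1.00
```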

Index Score: 54.8

Adoption: 58
Quality: 88
Freshness: 90
Citations: 56
Engagement: 0
