BenchmarkAI for Code v1.0

MLE-bench

by OpenAI · open-source · Last verified 2026-03-01

A benchmark that evaluates AI agents on real Kaggle machine learning competitions. It tests the full ML engineering pipeline, including data exploration, feature engineering, model selection, training, and submission formatting, and compares results against the actual competition leaderboards.

https://github.com/openai/mle-bench
Grade: C (Below Average)
Adoption: C+ · Quality: A · Freshness: A+ · Citations: F · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: agent-evaluation, ml-pipeline-testing, competition-benchmarking
Integrations: docker, kaggle
Use Cases: ml-agent-evaluation, data-science-capability-testing, research
API Available: No
Evaluated Models: claude-4, gpt-5, gemini-2.5-pro
Metrics: medal-rate, above-median-rate, competition-score
Methodology: Uses 75 real Kaggle competitions. Agents work in sandboxed environments with access to each competition's dataset and generate submissions that are scored with the competition's actual evaluation metric.
Last Run: 2026-03-05
Tags: benchmark, evaluation, machine-learning, kaggle, data-science
Added: 2026-03-17
Completeness: 80%
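The Methodology and Metrics entries above can be made concrete with a small scoring sketch. It assumes that medal-rate is the fraction of competitions where the agent's score reaches at least the bronze cutoff, and that above-median-rate is the fraction where the agent strictly beats the median human leaderboard score; a missing or invalid submission counts as a miss. The `CompetitionResult` record, its field names, and the helper functions are hypothetical and are not the mle-bench API. The bronze cutoff is taken as a given input rather than derived from Kaggle's medal rules.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional, Sequence


@dataclass
class CompetitionResult:
    """Hypothetical per-competition record (field names are illustrative)."""
    name: str
    agent_score: Optional[float]  # None when the agent produced no valid submission
    leaderboard: Sequence[float]  # human leaderboard scores for the competition
    higher_is_better: bool        # direction of the competition's metric
    bronze_threshold: float       # assumed given: score needed for the lowest medal


def _better(a: float, b: float, higher_is_better: bool) -> bool:
    """True if score a is strictly better than score b."""
    return a > b if higher_is_better else a < b


def _at_least(a: float, b: float, higher_is_better: bool) -> bool:
    """True if score a is as good as or better than score b."""
    return a >= b if higher_is_better else a <= b


def above_median(r: CompetitionResult) -> bool:
    """Agent strictly beats the median leaderboard score."""
    if r.agent_score is None:
        return False
    return _better(r.agent_score, median(r.leaderboard), r.higher_is_better)


def any_medal(r: CompetitionResult) -> bool:
    """Bronze is the lowest medal, so reaching it means some medal was earned."""
    if r.agent_score is None:
        return False
    return _at_least(r.agent_score, r.bronze_threshold, r.higher_is_better)


def rates(results: Sequence[CompetitionResult]) -> dict[str, float]:
    """Aggregate per-competition outcomes into suite-level rates."""
    if not results:
        raise ValueError("need at least one competition result")
    n = len(results)
    return {
        "medal-rate": sum(any_medal(r) for r in results) / n,
        "above-median-rate": sum(above_median(r) for r in results) / n,
    }


# Toy inputs for illustration only; these are not real MLE-bench results.
toy_results = [
    CompetitionResult("toy-accuracy", 0.91, [0.80, 0.85, 0.90, 0.95], True, 0.90),
    CompetitionResult("toy-rmse", 3.2, [2.0, 2.5, 3.0, 3.5], False, 2.5),
    CompetitionResult("toy-no-submission", None, [1.0, 2.0, 3.0], True, 2.5),
]

if __name__ == "__main__":
    print(rates(toy_results))  # both rates are 1/3 for these toy inputs
```

Carrying `higher_is_better` per competition matters because Kaggle metrics differ in direction (accuracy versus RMSE, for example), so a single comparison operator would misgrade every lower-is-better competition.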

Index Score: 41
Adoption: 58
Quality: 88
Freshness: 90
Citations: 0
Engagement: 0
