MLE-bench
by OpenAI · open-source · Last verified 2026-03-01
Benchmark evaluating AI agents on real Kaggle machine learning competitions. Tests the full ML engineering pipeline including data exploration, feature engineering, model selection, training, and submission formatting against actual competition leaderboards.
https://github.com/openai/mle-bench
Overall grade: C+ (Average)
Adoption: C+ · Quality: A · Freshness: A+ · Citations: C+ · Engagement: F
Specifications
- License
- MIT
- Pricing
- open-source
- Capabilities
- agent-evaluation, ml-pipeline-testing, competition-benchmarking
- Integrations
- docker, kaggle
- Use Cases
- ml-agent-evaluation, data-science-capability-testing, research
- API Available
- No
- Evaluated Models
- claude-4, gpt-5, gemini-2.5-pro
- Metrics
- medal-rate, above-median-rate, competition-score
- Methodology
- 75 real Kaggle competitions. Agents work in sandboxed environments with access to the competition datasets and generate submissions that are scored against the actual competition metrics.
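As a rough illustration of the metrics listed above, the following sketch shows one plausible way medal-rate and above-median-rate could be computed from per-competition results. The function names, the assumption that higher scores are better, and the simplified top-10% medal cutoff are all illustrative assumptions, not the benchmark's actual implementation (Kaggle's real medal rules vary with field size).

```python
import statistics


def above_median_rate(agent_scores, leaderboards):
    """Fraction of competitions where the agent beats the median entry.

    agent_scores: {competition: score}, higher assumed better.
    leaderboards: {competition: list of human leaderboard scores}.
    """
    hits = sum(
        1 for comp, score in agent_scores.items()
        if score > statistics.median(leaderboards[comp])
    )
    return hits / len(agent_scores)


def medal_rate(agent_ranks, field_sizes):
    """Fraction of competitions where the agent's rank earns a medal.

    Uses a simplified bronze cutoff of top 10% of the field; the real
    Kaggle thresholds depend on the number of teams.
    """
    medals = sum(
        1 for comp, rank in agent_ranks.items()
        if rank <= 0.10 * field_sizes[comp]
    )
    return medals / len(agent_ranks)
```

For example, an agent that beats the median in one of two competitions would score an above-median-rate of 0.5 under these assumptions.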
- Last Run
- 2026-03-05
- Tags
- benchmark, evaluation, machine-learning, kaggle, data-science
- Added
- 2026-03-17
- Completeness
- 100%
Index Score
- Overall
- 54.8
- Adoption
- 58
- Quality
- 88
- Freshness
- 90
- Citations
- 56
- Engagement
- 0