Codeforces Benchmark
by Codeforces / Community · open-source · Last verified 2026-03-01
Evaluates models on competitive programming problems from the Codeforces platform across difficulty ratings. Tests algorithmic thinking, data structure knowledge, and the ability to produce correct and efficient solutions under competitive constraints.
https://codeforces.com
Overall grade: C+ (Average)
Adoption: B · Quality: A · Freshness: A · Citations: C+ · Engagement: F
Specifications
- License: CC-BY-4.0
- Pricing: open-source
- Capabilities: model-evaluation, algorithmic-testing, competitive-programming-assessment
- Integrations: codeforces-api
- Use Cases: algorithmic-ability-testing, competitive-programming-evaluation, research
- API Available: No
- Evaluated Models: claude-4, gpt-5, gemini-2.5-pro, deepseek-v3
- Metrics: pass-rate, elo-rating
- Methodology: Problems from Codeforces rated 800-3000. Models generate solutions judged by the online judge for correctness and time/memory limits.
- Last Run: 2026-03-01
- Tags: benchmark, evaluation, competitive-programming, algorithms, problem-solving
- Added: 2026-03-17
- Completeness: 100%
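The two listed metrics, pass-rate and elo-rating, could be derived from per-problem verdicts roughly as sketched below. This is only an illustration under stated assumptions: the listing does not say how the benchmark aggregates results, so the `(problem_rating, accepted)` input shape, the function names `pass_rate_by_band` and `estimate_elo`, and the 400-point band width are all hypothetical. The Elo estimate uses the standard logistic expected-score formula and searches for the rating at which the expected number of solves matches the observed number; it is not confirmed to be the benchmark's own procedure.

```python
from collections import defaultdict


def pass_rate_by_band(results, band_width=400, min_rating=800):
    """Group (problem_rating, accepted) pairs into rating bands and
    return the fraction of accepted solutions per band.

    The band width and the 800 floor are illustrative assumptions.
    """
    totals = defaultdict(lambda: [0, 0])  # band start -> [accepted, attempted]
    for rating, accepted in results:
        band = min_rating + ((rating - min_rating) // band_width) * band_width
        totals[band][0] += int(accepted)
        totals[band][1] += 1
    return {band: acc / n for band, (acc, n) in sorted(totals.items())}


def estimate_elo(results, lo=0.0, hi=4000.0, iterations=60):
    """Estimate a rating R such that the expected number of solved
    problems equals the observed number.

    Uses the standard Elo expectation 1 / (1 + 10 ** ((d - R) / 400))
    for a problem of rating d. This is an assumed scoring model,
    not necessarily the one the benchmark uses.
    """
    results = list(results)
    solved = sum(int(accepted) for _, accepted in results)

    def expected(r):
        return sum(1.0 / (1.0 + 10 ** ((d - r) / 400.0)) for d, _ in results)

    # expected(r) increases monotonically with r, so bisection converges.
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if expected(mid) < solved:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical verdicts: (problem rating, accepted by the online judge?)
    sample = [(800, True), (1000, True), (1200, False), (1600, False), (2400, False)]
    print(pass_rate_by_band(sample))
    print(round(estimate_elo(sample)))
```

If a model solves every problem or none, the estimate simply saturates at the search bounds, so a real evaluation would need enough problems on both sides of the model's ability for the number to be meaningful.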
Index Score: 55.7
- Adoption: 62
- Quality: 82
- Freshness: 86
- Citations: 58
- Engagement: 0