EleutherAI lm-evaluation-harness
by EleutherAI · open-source · Last verified 2026-04-24
The EleutherAI lm-evaluation-harness is the standard open-source framework for evaluating language models on hundreds of benchmark tasks, including MMLU, HellaSwag, ARC, and TruthfulQA. It powers Hugging Face's Open LLM Leaderboard and is used in many open-source model releases to produce reproducible benchmark results.
https://github.com/EleutherAI/lm-evaluation-harness
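For context on how leaderboard-style numbers are produced, below is a minimal sketch of a zero-shot evaluation run through the harness's Python API (`lm_eval.simple_evaluate`). The checkpoint `EleutherAI/pythia-160m` is only an illustrative small model, and exact argument names can shift between harness versions.

```python
# Minimal sketch: zero-shot HellaSwag evaluation via the harness's Python API.
# Assumes `pip install lm-eval`; the checkpoint below is an arbitrary small
# example model, not a recommendation.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["hellaswag"],
    num_fewshot=0,  # zero-shot
    batch_size=8,
)

# Per-task metrics (e.g. accuracy) are keyed by task name.
print(results["results"]["hellaswag"])
```

The equivalent command-line invocation is `lm_eval --model hf --model_args pretrained=EleutherAI/pythia-160m --tasks hellaswag --batch_size 8`, which is the form most model releases cite when reporting reproducible scores.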
Overall grade: C (Below Average)
Adoption: C+ · Quality: B+ · Freshness: A · Citations: C · Engagement: F
Specifications
- License: Open Source
- Pricing: open-source
- Capabilities: (none listed)
- Integrations: (none listed)
- Use Cases: (none listed)
- API Available: No
- SDK Languages: (none listed)
- Tags: evaluation, benchmark, open-source, eleutherai, mmlu, reproducible, leaderboard
- Added: 2026-04-24
- Completeness: 60%
Index Score: 44
- Adoption: 50
- Quality: 70
- Freshness: 80
- Citations: 40
- Engagement: 0