
EleutherAI lm-evaluation-harness

by EleutherAI · open-source · Last verified 2026-04-24

The EleutherAI lm-evaluation-harness is the standard open-source framework for evaluating language models on hundreds of benchmarks including MMLU, HellaSwag, ARC, and TruthfulQA. It is used by Hugging Face's Open LLM Leaderboard and most open-source model releases to produce reproducible benchmark results.

https://github.com/EleutherAI/lm-evaluation-harness
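
For teams scripting evaluations rather than invoking the command line, a minimal sketch of driving the harness from Python follows. It assumes the package is installed (pip install lm-eval) and uses the project's documented simple_evaluate entry point; the model checkpoint and task names are illustrative placeholders, not recommendations.

    # Minimal sketch: running lm-evaluation-harness from Python.
    # Assumes `pip install lm-eval`; the model ID and task list below are placeholders.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",                                      # Hugging Face transformers backend
        model_args="pretrained=EleutherAI/pythia-160m",  # any HF checkpoint that fits in memory
        tasks=["hellaswag", "arc_easy"],                 # registered task names
        num_fewshot=0,
        batch_size=8,
    )

    # Per-task metrics (e.g. acc, acc_norm) are keyed by task name under "results".
    for task, metrics in results["results"].items():
        print(task, metrics)

The same run can be reproduced from the shell via the lm_eval command-line entry point with --model, --model_args, and --tasks flags, which is how leaderboard-style results are typically generated.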
Overall grade: C (Below Average)
Adoption: C+ · Quality: B+ · Freshness: A · Citations: C · Engagement: F

Specifications

License: Open Source
Pricing: open-source
Capabilities:
Integrations:
Use Cases:
API Available: No
SDK Languages:
Tags: evaluation, benchmark, open-source, eleutherai, mmlu, reproducible, leaderboard
Added: 2026-04-24
Completeness: 60%

Index Score: 44
Adoption: 50 · Quality: 70 · Freshness: 80 · Citations: 40 · Engagement: 0
