
EleutherAI lm-evaluation-harness

by EleutherAI · open-source · Last verified 2026-04-24

The EleutherAI lm-evaluation-harness is the standard open-source framework for evaluating language models on hundreds of benchmarks including MMLU, HellaSwag, ARC, and TruthfulQA. It is used by Hugging Face's Open LLM Leaderboard and most open-source model releases to produce reproducible benchmark results.

https://github.com/EleutherAI/lm-evaluation-harness
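
For teams scripting evaluations rather than invoking the command line, a minimal sketch of driving the harness from Python follows. It assumes the package is installed (pip install lm-eval) and uses the project's documented simple_evaluate entry point; the model checkpoint and task names are illustrative placeholders, not recommendations.

    # Minimal sketch: running lm-evaluation-harness from Python.
    # Assumes `pip install lm-eval`; the model ID and task list below are placeholders.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",                                      # Hugging Face transformers backend
        model_args="pretrained=EleutherAI/pythia-160m",  # any HF checkpoint that fits in memory
        tasks=["hellaswag", "arc_easy"],                 # registered task names
        num_fewshot=0,
        batch_size=8,
    )

    # Per-task metrics (e.g. acc, acc_norm) are keyed by task name under "results".
    for task, metrics in results["results"].items():
        print(task, metrics)

The same run can be reproduced from the shell via the lm_eval command-line entry point with --model, --model_args, and --tasks flags, which is how leaderboard-style results are typically generated.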
Overall grade: C (Below Average)
Adoption: C+ · Quality: B+ · Freshness: A · Citations: C · Engagement: F

Specifications

License: Open Source
Pricing: open-source
Capabilities:
Integrations:
Use Cases:
API Available: No
SDK Languages:
Tags: evaluation, benchmark, open-source, eleutherai, mmlu, reproducible, leaderboard
Added: 2026-04-24
Completeness: 60%

Index Score: 44
Adoption: 50 · Quality: 70 · Freshness: 80 · Citations: 40 · Engagement: 0
