HELM
by Stanford CRFM · free · Last verified 2026-04-24
HELM (Holistic Evaluation of Language Models) from Stanford CRFM is a multi-dimensional evaluation framework that measures LLMs across seven metric categories: accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency. It evaluates models on 42 scenarios and 59 metrics, making it one of the broadest public assessments of LLM capabilities and risks.
https://crfm.stanford.edu/helm/
C (Below Average)
Adoption: C+ · Quality: B+ · Freshness: A · Citations: C · Engagement: F
Specifications
- License
- Apache-2.0 (open source)
- Pricing
- free
- Capabilities
- Integrations
- Use Cases
- API Available
- No
- Tags
- benchmark, holistic, fairness, robustness, calibration, stanford, comprehensive
- Added
- 2026-04-24
- Completeness
- 60%
Index Score
44
Adoption
50
Quality
70
Freshness
80
Citations
40
Engagement
0