HELM
by Stanford CRFM · free · Last verified 2026-04-24
HELM (Holistic Evaluation of Language Models), from Stanford CRFM, is a multi-dimensional evaluation framework that measures LLMs on accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency. It evaluates models on 42 scenarios and 59 metrics, making it one of the most comprehensive public assessments of LLM capabilities and risks.
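Calibration, one of the dimensions listed above, asks whether a model's stated confidence matches how often it is actually right. The Python sketch below shows one common way to quantify this: an equal-width, 10-bin expected calibration error (ECE). It is a generic illustration, not code from HELM. The function name `expected_calibration_error` and its inputs (one confidence value and one correct/incorrect flag per prediction) are hypothetical.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    num_bins: int = 10,
) -> float:
    """Generic equal-width binned ECE (illustrative sketch, not HELM's implementation).

    ECE = sum over bins of (bin_size / n) * |bin_accuracy - bin_mean_confidence|.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Per-bin accumulators: number of predictions, summed confidence, number correct.
    counts = [0] * num_bins
    conf_sums = [0.0] * num_bins
    hits = [0] * num_bins

    for p, ok in zip(confidences, correct):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"confidence out of range: {p}")
        # Map p to a bin index in [0, num_bins - 1]; p == 1.0 falls in the last bin.
        b = min(int(p * num_bins), num_bins - 1)
        counts[b] += 1
        conf_sums[b] += p
        hits[b] += int(ok)

    # Weight each non-empty bin's accuracy/confidence gap by its share of predictions.
    ece = 0.0
    for c, s, h in zip(counts, conf_sums, hits):
        if c:
            ece += (c / n) * abs(h / c - s / c)
    return ece
```

For example, two predictions made at 0.95 confidence, of which only one is correct, give an ECE of about 0.45: the model claims 95% confidence but is right only 50% of the time. A perfectly calibrated set of predictions scores 0.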
https://crfm.stanford.edu/helm/
Grade: D (Poor)
Adoption: C+ · Quality: B+ · Freshness: A · Citations: F · Engagement: F
Specifications
- License: Apache 2.0
- Pricing: free
- Capabilities: not specified
- Integrations: not specified
- Use Cases: not specified
- API Available: No
- Tags: benchmark, holistic, fairness, robustness, calibration, stanford, comprehensive
- Added: 2026-04-24
- Completeness: 73%
Index Score: 34
- Adoption: 50
- Quality: 70
- Freshness: 80
- Citations: 0
- Engagement: 0