LegalBench
by Guha et al. / Stanford CodeX · open-source · Last verified 2026-03-17
LegalBench is a collaboratively built benchmark measuring the legal reasoning ability of large language models across 162 tasks spanning six types of legal reasoning, including issue spotting, rule recall, rule application, and interpretation. It evaluates whether models can perform the kinds of reasoning lawyers use.
https://hazyresearch.stanford.edu/legalbench/
Overall grade: B (Above Average)
Adoption: B+ · Quality: A+ · Freshness: B+ · Citations: A · Engagement: F
Specifications
- License
- Apache-2.0
- Pricing
- open-source
- Capabilities
- evaluation, legal-reasoning, multi-task-evaluation
- Integrations
- Use Cases
- model-evaluation, legal-ai, contract-analysis
- API Available
- No
- Evaluated Models
- gpt-4o, claude-opus-4, gemini-2-5-pro, llama-3-70b
- Metrics
- accuracy, macro-f1
- Methodology
- 162 legal tasks partitioned into six categories of legal reasoning. Tasks range from binary classification to short-form generation. Models are evaluated zero-shot and few-shot; results are macro-averaged across task categories.
- Last Run
- 2026-02-20
- Tags
- legal, reasoning, nlp, law, multi-task
- Added
- 2026-03-17
- Completeness
- 100%
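The methodology above macro-averages results across task categories rather than across individual tasks, so a category with few tasks weighs as much as one with many. A minimal sketch of that aggregation, assuming per-task accuracies are already computed (the category names and scores below are illustrative, not real LegalBench results):

```python
# Sketch: macro-averaging per-task accuracy across reasoning categories.
# First average tasks within each category, then average the category means.
from collections import defaultdict


def macro_average(results):
    """results: iterable of (category, task_accuracy) pairs.
    Returns the mean of per-category mean accuracies."""
    by_category = defaultdict(list)
    for category, accuracy in results:
        by_category[category].append(accuracy)
    category_means = [sum(v) / len(v) for v in by_category.values()]
    return sum(category_means) / len(category_means)


# Hypothetical scores: "issue-spotting" averages to 0.70, so the
# macro average is (0.70 + 0.70 + 0.90) / 3, not the mean of 4 tasks.
scores = [
    ("issue-spotting", 0.80),
    ("issue-spotting", 0.60),
    ("rule-recall", 0.70),
    ("rule-application", 0.90),
]
print(round(macro_average(scores), 4))  # → 0.7667
```

The two-level average is what makes the metric "macro": it keeps small categories from being drowned out by large ones.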
Index Score: 68.3
- Adoption: 74
- Quality: 91
- Freshness: 76
- Citations: 82
- Engagement: 0