
LegalBench

by Guha et al. / Stanford CodeX · open-source · Last verified 2026-03-17

LegalBench is a collaboratively built benchmark measuring the legal reasoning ability of large language models across 162 tasks spanning six types of legal reasoning: issue-spotting, rule-recall, rule-application, rule-conclusion, interpretation, and rhetorical-understanding. It is designed to evaluate whether models can perform the kinds of reasoning lawyers do.

https://hazyresearch.stanford.edu/legalbench/
Overall Grade: B (Above Average)

Adoption: B+ · Quality: A+ · Freshness: B+ · Citations: A · Engagement: F

Specifications

License
Apache-2.0
Pricing
open-source
Capabilities
evaluation, legal-reasoning, multi-task-evaluation
Integrations
None listed
Use Cases
model-evaluation, legal-ai, contract-analysis
API Available
No
Evaluated Models
gpt-4o, claude-opus-4, gemini-2-5-pro, llama-3-70b
Metrics
accuracy, macro-f1
Methodology
162 legal tasks partitioned into six categories of legal reasoning. Tasks range from binary classification to short-form generation. Models are evaluated zero-shot and few-shot; results are macro-averaged across task categories.
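The macro-averaging step above can be sketched as follows. This is a minimal illustration, not the official LegalBench harness: the category names match the benchmark's reasoning types, but the task scores are invented for demonstration.

```python
from statistics import mean

# Hypothetical per-task accuracies grouped by reasoning category.
# Scores are illustrative only, not real leaderboard numbers.
results = {
    "issue-spotting": [0.82, 0.74],
    "rule-recall": [0.61, 0.58, 0.70],
    "rule-application": [0.66],
}

# Macro-average: average within each category first, then across
# categories, so categories with many tasks don't dominate the score.
category_means = {cat: mean(scores) for cat, scores in results.items()}
macro_avg = mean(category_means.values())
print(round(macro_avg, 4))
```

Averaging per category first gives each reasoning type equal weight, which matters because the 162 tasks are unevenly distributed across the six categories.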
Last Run
2026-02-20
Tags
legal, reasoning, nlp, law, multi-task
Added
2026-03-17
Completeness
100%

Index Score

68.3
Adoption
74
Quality
91
Freshness
76
Citations
82
Engagement
0
