
LegalBench

by Guha et al. / Stanford CodeX · open-source · Last verified 2026-03-17

LegalBench is a collaboratively built benchmark measuring the legal reasoning ability of large language models across 162 tasks spanning six types of legal reasoning: issue-spotting, rule-recall, rule-application, rule-conclusion, interpretation, and rhetorical-understanding. It is designed to evaluate whether models can perform the kinds of reasoning lawyers do.

https://hazyresearch.stanford.edu/legalbench/
Overall Grade: B (Above Average)

Adoption: B+ · Quality: A+ · Freshness: B+ · Citations: A · Engagement: F

Specifications

License
Apache-2.0
Pricing
open-source
Capabilities
evaluation, legal-reasoning, multi-task-evaluation
Integrations
None listed
Use Cases
model-evaluation, legal-ai, contract-analysis
API Available
No
Evaluated Models
gpt-4o, claude-opus-4, gemini-2-5-pro, llama-3-70b
Metrics
accuracy, macro-f1
Methodology
162 legal tasks partitioned into six categories of legal reasoning. Tasks range from binary classification to short-form generation. Models are evaluated zero-shot and few-shot; results are macro-averaged across task categories.
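The macro-averaging step above can be sketched as follows. This is a minimal illustration, not the official LegalBench harness: the category names match the benchmark's reasoning types, but the task scores are invented for demonstration.

```python
from statistics import mean

# Hypothetical per-task accuracies grouped by reasoning category.
# Scores are illustrative only, not real leaderboard numbers.
results = {
    "issue-spotting": [0.82, 0.74],
    "rule-recall": [0.61, 0.58, 0.70],
    "rule-application": [0.66],
}

# Macro-average: average within each category first, then across
# categories, so categories with many tasks don't dominate the score.
category_means = {cat: mean(scores) for cat, scores in results.items()}
macro_avg = mean(category_means.values())
print(round(macro_avg, 4))
```

Averaging per category first gives each reasoning type equal weight, which matters because the 162 tasks are unevenly distributed across the six categories.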
Last Run
2026-02-20
Tags
legal, reasoning, nlp, law, multi-task
Added
2026-03-17
Completeness
100%

Index Score

68.3
Adoption
74
Quality
91
Freshness
76
Citations
82
Engagement
0
