BenchmarkLLMs v1.0

CaseHOLD

by Zheng et al. / Berkeley Law / LexGLUE · open-source · Last verified 2026-03-17

CaseHOLD challenges models to identify the correct holding statement for a US court case given its citing context. Part of the LexGLUE legal NLP benchmark suite, it tests legal reasoning over 53,000+ case holdings drawn from the Harvard Law Library case law corpus.

https://huggingface.co/datasets/lex_glue
Overall Grade: C+ (Average)
Adoption: B · Quality: A · Freshness: B · Citations: B+ · Engagement: F

Specifications

License
CC BY 4.0
Pricing
open-source
Capabilities
evaluation, legal-reasoning, case-law-analysis
Integrations
huggingface
Use Cases
model-evaluation, legal-ai, judicial-research
API Available
No
Evaluated Models
gpt-4o, claude-opus-4, legal-bert, roberta-large
Metrics
accuracy, macro-f1
Methodology
Five-way multiple-choice classification: given a legal context with a masked holding, models select the correct holding from five candidates. Evaluated on a 3,900-example test split. Macro-F1 is the primary metric.
Last Run
2025-12-01
Tags
legal, case-law, holding-statements, multiple-choice, lex-glue
Added
2026-03-17
Completeness
100%
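The scoring described under Methodology (five-way multiple choice, macro-F1 as the primary metric alongside accuracy) can be sketched with plain Python. The labels below are toy data, where each label is the index (0-4) of the correct holding among the five candidates; the helper is an illustrative scorer, not the benchmark's official evaluation code.

```python
def macro_f1(gold, pred, n_classes=5):
    """Macro-averaged F1 over the five candidate-holding labels (0-4):
    compute F1 per class, then take the unweighted mean."""
    f1s = []
    for c in range(n_classes):
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / n_classes

# Toy run: 8 questions; each label is the index of the correct holding.
gold = [0, 1, 2, 3, 4, 0, 1, 2]
pred = [0, 1, 2, 3, 3, 0, 2, 2]
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)  # 0.75
score = macro_f1(gold, pred)  # lower than accuracy: class 4 is never predicted
```

Macro-F1 weights every class equally, so a model that systematically misses one holding position (class 4 here) is penalized more than plain accuracy would suggest.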

Index Score

58.8
Adoption
61
Quality
83
Freshness
62
Citations
71
Engagement
0
