Benchmark · LLMs · v1.0

CaseHOLD

by Zheng et al. / Stanford / LexGLUE · free · Last verified 2026-03-17

CaseHOLD is a legal NLP benchmark that evaluates a model's ability to identify the correct holding statement for a US court case. Given a citing context in which the holding has been masked, the model must choose the correct holding from five candidates. The dataset contains over 53,000 multiple-choice questions and is one of the tasks in the LexGLUE benchmark suite for legal AI.

https://huggingface.co/datasets/lex_glue
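
To make the task format concrete, here is a minimal sketch of a single five-way example and a prediction step. The field names (`context`, `endings`, `label`), the `<HOLDING>` placeholder, and the example text are assumptions chosen for illustration, not records from the dataset, and the token-overlap scorer is only a stand-in for a real model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class HoldingExample:
    # Hypothetical field names for one multiple-choice item.
    context: str        # citing text with the holding masked out
    endings: List[str]  # five candidate holding statements
    label: int          # index (0-4) of the correct candidate


def token_overlap(context: str, candidate: str) -> float:
    """Toy scorer: fraction of candidate tokens that also occur in the context."""
    ctx_tokens = set(context.lower().split())
    cand_tokens = candidate.lower().split()
    return sum(tok in ctx_tokens for tok in cand_tokens) / max(len(cand_tokens), 1)


def predict(example: HoldingExample,
            score: Callable[[str, str], float] = token_overlap) -> int:
    """Return the index of the highest-scoring candidate holding."""
    scores = [score(example.context, ending) for ending in example.endings]
    return max(range(len(scores)), key=scores.__getitem__)


# Invented example text, for illustration only.
example = HoldingExample(
    context=("Officers conducted a warrantless search of the vehicle "
             "after the arrest. See [citation] (<HOLDING>)."),
    endings=[
        "holding that the contract was void for lack of consideration",
        "holding that a warrantless search of the vehicle was reasonable",
        "holding that the statute of limitations had expired",
        "holding that the defendant waived the right to counsel",
        "holding that venue was improper in the district",
    ],
    label=1,
)

predicted = predict(example)
print(f"predicted={predicted} gold={example.label}")
```

In practice the `score` argument would be replaced by a model that rates each context and candidate pair; the selection step stays the same.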
Grade: C+ (Average)
Adoption: B · Quality: A · Freshness: B · Citations: B+ · Engagement: F

Specifications

License
CC BY 4.0
Pricing
free
Capabilities
Legal Reasoning Evaluation, Case Law Analysis, Contextual Understanding of Legal Texts, Precedent Identification, Distinguishing Nuanced Legal Statements, Multiple-Choice Question Answering, Information Retrieval from Legal Documents
API Available
No
Evaluated Models
gpt-4o, claude-opus-4, legal-bert, roberta-large
Metrics
accuracy, macro-f1
Methodology
Five-way multiple-choice classification: given a legal context with a masked holding, the model selects the correct holding from five candidates. Evaluation uses the 3,900-example test split, with macro-F1 as the primary metric and accuracy reported alongside it.
Last Run
2025-12-01
Tags
legal-nlp, benchmark, case-law, legal-reasoning, multiple-choice, text-classification, lex-glue, us-law, information-retrieval, ai-evaluation
Added
2026-03-17
Completeness
0.9%
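
The two metrics named under Metrics and Methodology can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the benchmark's official scoring script: labels are assumed to be integers 0 to 4, and macro-F1 is averaged over all five classes, with a class that is never predicted or never present contributing an F1 of 0. The gold and predicted labels are toy values.

```python
from typing import Sequence


def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of examples where the predicted index matches the gold index."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: Sequence[int], pred: Sequence[int], num_classes: int = 5) -> float:
    """Unweighted mean of per-class F1 over all candidate positions."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


# Toy labels, for illustration only.
gold = [0, 1, 2, 3, 4, 1]
pred = [0, 1, 2, 3, 0, 1]

print(f"accuracy={accuracy(gold, pred):.3f} macro_f1={macro_f1(gold, pred):.3f}")
```

Because macro-F1 weights every class equally, a model that ignores one answer position is penalized more heavily than plain accuracy would suggest.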

Index Score

58.8
Adoption
61
Quality
83
Freshness
62
Citations
71
Engagement
0
