Dataset · legal · v1.0

Legal-BERT Training Data

by Ilias Chalkidis et al. / Athens University of Economics and Business (nlpaueb) · open-source · Last verified 2026-03-17

The Legal-BERT training corpus is a collection of roughly 12 GB of English legal text drawn from UK legislation, EU legislation, European court decisions (ECJ and ECHR), US court cases, and US contracts, curated to pretrain domain-adapted BERT models. The resulting family of Legal-BERT models has been reported to outperform general-domain BERT on several legal NLP tasks, such as legal text classification and named-entity recognition in contracts.

https://huggingface.co/nlpaueb/legal-bert-base-uncased
B (Above Average)
Adoption: B+ · Quality: A · Freshness: B · Citations: A · Engagement: F

Specifications

License: CC-BY-4.0
Pricing: open-source
Capabilities: legal-text-pretraining, contract-analysis, legal-classification, ner-legal
Integrations: HuggingFace Transformers
Use Cases: language-model-pretraining, legal-nlp-research, contract-ai
API Available: No
Tags: legal-nlp, pretraining, contracts, court-decisions, legislation, BERT
Added: 2026-03-17
Completeness: 100%
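
The listing names HuggingFace Transformers as the integration, and the URL above points at a pretrained Legal-BERT checkpoint rather than at the raw corpus. A minimal sketch of loading that checkpoint follows, assuming the `transformers` package is installed and the Hub is reachable. The helper names `model_id_from_url` and `load_legal_bert` are hypothetical and used only for illustration; `AutoTokenizer` and `AutoModel` are the standard Transformers entry points.

```python
from urllib.parse import urlparse

# URL as listed on this page; it points at a Legal-BERT model checkpoint.
LISTED_URL = "https://huggingface.co/nlpaueb/legal-bert-base-uncased"


def model_id_from_url(url: str) -> str:
    """Return the Hub repository id ("namespace/name") from a huggingface.co URL.

    Hypothetical helper, not part of any library.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {url!r}")
    return "/".join(parts[:2])


def load_legal_bert(url: str = LISTED_URL):
    """Fetch (or read from the local cache) the tokenizer and encoder.

    Hypothetical helper; it needs network access on first use.
    """
    # Imported lazily so the URL helper works without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    model_id = model_id_from_url(url)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return tokenizer, model
```

Calling `tokenizer, model = load_legal_bert()` and then `model(**tokenizer("The lessee shall pay rent monthly.", return_tensors="pt"))` would return contextual embeddings for the sentence. Since API Available is listed as No, everything here runs locally once the weights are cached.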

Index Score

65.9
Adoption: 71
Quality: 85
Freshness: 62
Citations: 82
Engagement: 0
