Legal-BERT Training Data
by Gerasimos Spanakis / Maastricht University · open-source · Last verified 2026-03-17
The Legal-BERT training corpus is a large collection of English legal text assembled from UK legislation, EU legislation, ECHR/ECLI court decisions, and US contracts, curated for pretraining domain-adapted BERT models. It underpins the Legal-BERT family of models, which are reported to outperform general-domain BERT models on legal NLP tasks.
https://huggingface.co/nlpaueb/legal-bert-base-uncased
B (Above Average)
Adoption: B+ · Quality: A · Freshness: B · Citations: A · Engagement: F
Specifications
- License: CC-BY-4.0
- Pricing: open-source
- Capabilities: legal-text-pretraining, contract-analysis, legal-classification, ner-legal
- Integrations: HuggingFace Transformers
- Use Cases: language-model-pretraining, legal-nlp-research, contract-ai
- API Available: No
- Tags: legal-nlp, pretraining, contracts, court-decisions, legislation, BERT
- Added: 2026-03-17
- Completeness: 100%
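Since the listing names HuggingFace Transformers as the integration, the following is a minimal sketch of how the linked checkpoint might be queried with a fill-mask pipeline. It assumes the `transformers` package and a backend such as PyTorch are installed and that the checkpoint can be downloaded. The helper names `mask_word` and `top_predictions` are hypothetical and used only for illustration.

```python
# Sketch only: assumes `pip install transformers torch` and network access
# to download the checkpoint on first use.
MODEL_ID = "nlpaueb/legal-bert-base-uncased"  # taken from the URL in this listing


def mask_word(sentence: str, word: str, mask_token: str = "[MASK]") -> str:
    """Replace the first occurrence of `word` with the mask token (hypothetical helper)."""
    if word not in sentence:
        raise ValueError(f"{word!r} not found in sentence")
    return sentence.replace(word, mask_token, 1)


def top_predictions(masked_sentence: str, k: int = 3):
    """Return (token, score) pairs for the single [MASK] in the sentence."""
    # Imported lazily so mask_word stays usable without transformers installed.
    from transformers import pipeline

    fill = pipeline("fill-mask", model=MODEL_ID)
    return [(r["token_str"], r["score"]) for r in fill(masked_sentence, top_k=k)]
```

A call such as `top_predictions(mask_word("The applicant lodged an appeal with the court.", "appeal"))` would return the model's top candidates for the masked position. The actual tokens and scores depend on the downloaded checkpoint.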
Index Score: 65.9
- Adoption: 71
- Quality: 85
- Freshness: 62
- Citations: 82
- Engagement: 0