LibriSpeech
by OpenSLR / Johns Hopkins University · free · Last verified 2026-03-17
LibriSpeech is a corpus of approximately 1,000 hours of 16 kHz read English speech derived from LibriVox audiobooks. The training data is divided into two "clean" subsets of 100 and 360 hours and a 500-hour "other" subset of more challenging recordings, with dedicated development and test sets for each condition. It has become the de facto standard benchmark for English ASR systems.
https://www.openslr.org/12
A—Great
Adoption: A+ · Quality: A+ · Freshness: C+ · Citations: A+ · Engagement: F
Specifications
- License: CC-BY-4.0
- Pricing: free
- Capabilities: speech-recognition, speech-synthesis, speaker-identification
- Integrations: HuggingFace Datasets, torchaudio, ESPnet
- Use Cases: model-training, benchmark, speech-research
- API Available: No
- Tags: automatic-speech-recognition, ASR, english, audiobooks, benchmark
- Added: 2026-03-17
- Completeness: 100%
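The integrations listed above provide ready-made loaders. For readers working directly with the archives from OpenSLR, the following is a minimal sketch of reading the transcripts by hand. It assumes the usual extracted layout, in which each chapter directory holds FLAC audio files plus one `*.trans.txt` file whose lines look like `SPEAKER-CHAPTER-UTTERANCE TEXT`. The names `Utterance`, `parse_transcript_line`, and `read_transcripts` are hypothetical helpers for illustration, not part of any library.

```python
from pathlib import Path
from typing import Dict, NamedTuple


class Utterance(NamedTuple):
    """One transcript entry (hypothetical helper type)."""
    speaker_id: int
    chapter_id: int
    utterance_id: int
    text: str


def parse_transcript_line(line: str) -> Utterance:
    """Parse a line such as '1089-134686-0000 HE HOPED ...'.

    Assumes the key has exactly three dash-separated numeric fields.
    """
    key, _, text = line.strip().partition(" ")
    speaker, chapter, utterance = key.split("-")
    return Utterance(int(speaker), int(chapter), int(utterance), text)


def read_transcripts(subset_dir: str) -> Dict[str, Utterance]:
    """Collect every transcript under a subset directory, keyed by utterance key.

    The key (e.g. '1089-134686-0000') is assumed to match the stem of the
    corresponding .flac file in the same chapter directory.
    """
    transcripts: Dict[str, Utterance] = {}
    for trans_file in sorted(Path(subset_dir).rglob("*.trans.txt")):
        with open(trans_file, encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue  # skip blank lines
                key = line.strip().split(" ", 1)[0]
                transcripts[key] = parse_transcript_line(line)
    return transcripts
```

A call such as `read_transcripts("LibriSpeech/test-clean")` would return a dictionary mapping utterance keys to parsed entries, assuming the archive was extracted to that path.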
Index Score: 80.2
- Adoption: 95
- Quality: 92
- Freshness: 55
- Citations: 95
- Engagement: 0