Dataset · Speech & Audio AI · v2015

LibriSpeech

by OpenSLR / Johns Hopkins University · free · Last verified 2026-03-17

LibriSpeech is a corpus of approximately 1,000 hours of read English speech, sampled at 16 kHz and derived from LibriVox audiobooks. The training data is split into two "clean" subsets (100 h and 360 h) and one "other" subset (500 h), and each condition has its own development and test sets. It has become the de facto standard benchmark for English ASR systems.
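
The split layout can be sketched as a small lookup table. The subset names below are the ones used by the OpenSLR distribution. The hour figures are nominal, rounded values (the development and test sets are each roughly 5 hours), and the helper names `subsets` and `total_hours` are hypothetical, introduced only for illustration.

```python
# Nominal hours per LibriSpeech subset. Values are rounded assumptions:
# the dev and test sets are each roughly 5 hours, not exactly 5.
NOMINAL_HOURS = {
    "train-clean-100": 100,
    "train-clean-360": 360,
    "train-other-500": 500,
    "dev-clean": 5,
    "dev-other": 5,
    "test-clean": 5,
    "test-other": 5,
}


def subsets(split=None, condition=None):
    """Return subset names, optionally filtered.

    split:     'train', 'dev' or 'test' (None keeps all splits)
    condition: 'clean' or 'other' (None keeps both conditions)
    """
    names = []
    for name in NOMINAL_HOURS:
        parts = name.split("-")  # e.g. ['train', 'clean', '100']
        if split is not None and parts[0] != split:
            continue
        if condition is not None and parts[1] != condition:
            continue
        names.append(name)
    return names


def total_hours(names):
    """Sum the nominal hours of the given subsets."""
    return sum(NOMINAL_HOURS[name] for name in names)


if __name__ == "__main__":
    print("training hours:", total_hours(subsets(split="train")))
    print("all subsets:", total_hours(NOMINAL_HOURS))
```

Summing the three training subsets gives the commonly quoted 960 hours of training audio. Adding the development and test sets brings the nominal total close to the "approximately 1,000 hours" figure.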

https://www.openslr.org/12
Grade: A (Great)
Adoption: A+ · Quality: A+ · Freshness: C+ · Citations: A+ · Engagement: F

Specifications

License: CC-BY-4.0
Pricing: free
Capabilities: speech-recognition, speech-synthesis, speaker-identification
Integrations: HuggingFace Datasets, torchaudio, ESPnet
Use Cases: model-training, benchmark, speech-research
API Available: No
Tags: automatic-speech-recognition, ASR, english, audiobooks, benchmark
Added: 2026-03-17
Completeness: 100%

Index Score

80.2
Adoption: 95
Quality: 92
Freshness: 55
Citations: 95
Engagement: 0
