Dataset · Speech & Audio AI · v2015

LibriSpeech

by OpenSLR / Johns Hopkins University · free · Last verified 2026-03-17

LibriSpeech is a corpus of approximately 1,000 hours of 16 kHz read English speech derived from LibriVox audiobooks. The training data is split into three subsets, train-clean-100, train-clean-360, and train-other-500 (roughly 100, 360, and 500 hours), with dedicated development and test sets for both the clean and other conditions. It has become the de facto standard benchmark for English ASR systems.

https://www.openslr.org/12

Overall grade: C+ (Average)

Adoption: A+ · Quality: A+ · Freshness: C+ · Citations: F · Engagement: F

Specifications

License: CC-BY-4.0
Pricing: free
Capabilities: speech-recognition, speech-synthesis, speaker-identification
Integrations: HuggingFace Datasets, torchaudio, ESPnet
Use Cases: model-training, benchmark, speech-research
API Available: No
Tags: automatic-speech-recognition, ASR, english, audiobooks, benchmark
Added: 2026-03-17
Completeness: 100%
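
As a rough illustration of how the corpus is usually handled in practice, the sketch below names the standard subsets, parses the utterance IDs and per-chapter transcript files found in the public distribution, and wraps the torchaudio integration listed above. The helper names (`parse_utterance_id`, `read_transcripts`, `load_subset`) are hypothetical and chosen for this example, the hour figures are the nominal values encoded in the subset names rather than exact durations, and the file layout is an assumption based on the public release, not something this listing specifies.

```python
from __future__ import annotations

from pathlib import Path

# Nominal training subset sizes in hours (rounded; taken from the subset names).
TRAIN_SUBSETS = {
    "train-clean-100": 100,
    "train-clean-360": 360,
    "train-other-500": 500,
}
# Evaluation subsets, each only a few hours long.
EVAL_SUBSETS = ("dev-clean", "dev-other", "test-clean", "test-other")


def parse_utterance_id(utt_id: str) -> tuple[int, int, int]:
    """Split an ID such as '1089-134686-0000' into (speaker, chapter, index)."""
    speaker, chapter, index = utt_id.split("-")
    return int(speaker), int(chapter), int(index)


def read_transcripts(trans_file: Path) -> dict[str, str]:
    """Parse a '<speaker>-<chapter>.trans.txt' file with one 'ID TEXT' pair per line."""
    transcripts: dict[str, str] = {}
    for line in trans_file.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        utt_id, _, text = line.partition(" ")
        transcripts[utt_id] = text
    return transcripts


def load_subset(root: str, subset: str = "test-clean"):
    """Load one subset via torchaudio (optional dependency, imported lazily).

    With download=True this fetches the archive into `root` if it is missing.
    """
    import torchaudio

    return torchaudio.datasets.LIBRISPEECH(root, url=subset, download=True)
```

Calling `load_subset("./data", "test-clean")` would download the smallest clean evaluation set; the parsing helpers work on an already extracted copy without torchaudio installed.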

