Dataset · Speech & Audio AI · v1.0

GigaSpeech

by Seasalt.ai / SpeechColab · open-source · Last verified 2026-03-17

GigaSpeech is a multi-domain English speech corpus with 10,000 hours of high-quality labeled audio for ASR, sourced from audiobooks, podcasts, and YouTube across a broad range of topics and recording conditions. Its scale and diversity make it particularly valuable for training robust, domain-generalizable speech recognition models.

https://github.com/SpeechColab/GigaSpeech
B (Above Average)
Adoption: B+ · Quality: A · Freshness: B+ · Citations: B+ · Engagement: F

Specifications

License
Apache-2.0
Pricing
open-source
Capabilities
automatic-speech-recognition, multi-domain-asr, robust-asr
Integrations
HuggingFace Datasets, ESPnet, Kaldi
Use Cases
model-training, benchmark, domain-robust-asr
API Available
No
Tags
ASR, large-scale, english, multi-domain, podcasts, audiobooks, youtube
Added
2026-03-17
Completeness
100%
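The Integrations entry lists HuggingFace Datasets. The Python sketch below shows one way to load a subset and clean its transcripts before training. It relies on several assumptions that are not stated on this page: the Hub ID `speechcolab/gigaspeech`, subset names such as `xs`, the fact that the dataset is gated (you must accept its terms on the Hub and log in first), and the transcript tag conventions (`<COMMA>`, `<PERIOD>`, `<MUSIC>` and similar) described in the project README. The helper names `normalize_transcript` and `load_subset` are hypothetical. Check these details against the release you download.

```python
from typing import Optional

# Assumed tag conventions from the GigaSpeech README; verify for your release.
PUNCTUATION_TAGS = {"<COMMA>", "<PERIOD>", "<QUESTIONMARK>", "<EXCLAMATIONPOINT>"}
GARBAGE_TAGS = {"<SIL>", "<MUSIC>", "<NOISE>", "<OTHER>"}


def normalize_transcript(text: str) -> Optional[str]:
    """Hypothetical helper: strip punctuation and garbage tags.

    Returns None when nothing speech-like remains (e.g. a "<MUSIC>" segment),
    so callers can filter such segments out of the training set.
    """
    kept = [
        tok for tok in text.split()
        if tok not in PUNCTUATION_TAGS and tok not in GARBAGE_TAGS
    ]
    return " ".join(kept) if kept else None


def load_subset(subset: str = "xs", split: str = "train"):
    """Hypothetical helper: load one GigaSpeech subset from the Hugging Face Hub."""
    # Imported lazily so normalize_transcript works without `datasets` installed.
    from datasets import load_dataset

    # Assumed Hub ID; the dataset is gated, so token=True uses your saved login.
    # Depending on your `datasets` version, extra arguments may be required.
    return load_dataset("speechcolab/gigaspeech", subset, split=split, token=True)
```

Typical use would be `load_subset("xs")` followed by a `map` that applies `normalize_transcript` to each example's `text` field and a `filter` that drops examples where it returns `None`. Keeping the normalizer independent of the loader makes it easy to reuse for scoring hypotheses against references.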

Index Score

67.7
Adoption
76
Quality
89
Freshness
78
Citations
78
Engagement
0