Dataset · Speech & Audio AI · v1.0

GigaSpeech

by Seasalt.ai / SpeechColab · open-source · Last verified 2026-03-17

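GigaSpeech is a multi-domain English speech corpus for automatic speech recognition (ASR). It provides 10,000 hours of high-quality transcribed audio for supervised training, out of roughly 40,000 hours of total audio suitable for semi-supervised and unsupervised training. The audio is sourced from audiobooks, podcasts, and YouTube and covers a broad range of topics and recording conditions. This mix of domains and acoustic conditions makes it well suited to training speech recognition models that generalize beyond a single domain.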

https://github.com/SpeechColab/GigaSpeech
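The labels are intended for supervised ASR training, and the upstream README describes transcripts written in upper case, with punctuation spelled out as tags (`<COMMA>`, `<PERIOD>`, `<QUESTIONMARK>`, `<EXCLAMATIONPOINT>`) and non-speech segments marked with garbage tags (`<SIL>`, `<MUSIC>`, `<NOISE>`, `<OTHER>`). The sketch below shows one way to clean those transcripts before training. The tag sets, the Hub dataset ID `speechcolab/gigaspeech`, and the config names are assumptions taken from upstream documentation rather than this page, and `normalize_transcript` and `load_subset` are hypothetical helpers; check them against the release you download.

```python
from typing import Optional

# Assumption: tag conventions as described in the upstream GigaSpeech README.
PUNCTUATION_TAGS = {
    "<COMMA>": ",",
    "<PERIOD>": ".",
    "<QUESTIONMARK>": "?",
    "<EXCLAMATIONPOINT>": "!",
}
GARBAGE_TAGS = {"<SIL>", "<MUSIC>", "<NOISE>", "<OTHER>"}


def normalize_transcript(text: str, keep_punctuation: bool = False) -> Optional[str]:
    """Hypothetical helper: clean one transcript.

    Returns None when the segment contains no spoken words (e.g. only <SIL>),
    so such segments can be filtered out of a training set.
    """
    words = []
    for token in text.split():
        if token in GARBAGE_TAGS:
            continue  # non-speech marker: drop it
        if token in PUNCTUATION_TAGS:
            if keep_punctuation and words:
                # attach the punctuation mark to the preceding word
                words[-1] += PUNCTUATION_TAGS[token]
            continue
        words.append(token)
    return " ".join(words) if words else None


def load_subset(config: str = "xs", split: str = "train"):
    """Hypothetical helper: load a GigaSpeech subset from the Hugging Face Hub.

    Assumes the `datasets` package is installed, the dataset terms have been
    accepted on the Hub, and you are authenticated. The ID and the config
    names ("xs", "s", "m", "l", "xl", "dev", "test") are assumptions.
    """
    from datasets import load_dataset  # imported lazily; needs network access

    return load_dataset("speechcolab/gigaspeech", config, split=split)
```

For example, `normalize_transcript("AS THEY APPROACHED <COMMA> THE CITY <PERIOD>", keep_punctuation=True)` returns `"AS THEY APPROACHED, THE CITY."`, while `normalize_transcript("<SIL>")` returns `None`.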

Overall grade: C (Below Average)

Adoption: B+
Quality: A
Freshness: B+
Citations: F
Engagement: F

Specifications

License: Apache-2.0
Pricing: open-source
Capabilities: automatic-speech-recognition, multi-domain-asr, robust-asr
Integrations: HuggingFace Datasets, ESPnet, Kaldi
Use Cases: model-training, benchmark, domain-robust-asr
API Available: No
Tags: ASR, large-scale, english, multi-domain, podcasts, audiobooks, youtube
Added: 2026-03-17
Completeness: 100%
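Benchmarking is listed among the use cases, and ASR results on corpora like GigaSpeech are conventionally reported as word error rate (WER): total word-level edits (substitutions, deletions, insertions) divided by the number of reference words. The sketch below computes corpus-level WER with a word-level Levenshtein distance. It assumes reference and hypothesis transcripts are already normalized (tags and fillers removed); the upstream scoring tools may apply additional normalization, so numbers from this sketch are not guaranteed to match published results. `edit_distance` and `corpus_wer` are hypothetical names.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    # prev holds the distances for the previous reference prefix
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (free if words match)
            )
        prev = curr
    return prev[-1]


def corpus_wer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus WER = total word errors / total reference words."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = 0
    words = 0
    for ref, hyp in zip(refs, hyps):
        # assumption: transcripts are already normalized apart from case
        r, h = ref.upper().split(), hyp.upper().split()
        errors += edit_distance(r, h)
        words += len(r)
    if words == 0:
        raise ValueError("reference transcripts contain no words")
    return errors / words
```

Summing errors and reference words across the whole set, rather than averaging per-utterance WER, keeps short utterances from dominating the score.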

