Speech-to-Text Pipeline
by OpenAI · open-source · Last verified 2026-03-17
Production-grade ASR pipeline using OpenAI Whisper or faster-whisper with VAD-based chunking, speaker timestamp alignment, and SRT/VTT subtitle export. Handles long-form audio via sliding window segmentation and automatic language detection.
https://github.com/openai/whisper
B+ (Good)
Adoption: A · Quality: A · Freshness: A+ · Citations: B+ · Engagement: F
Specifications
- License: MIT
- Pricing: open-source
- Capabilities: vad-chunking, long-form-audio, language-detection, srt-export, timestamp-alignment
- Integrations: whisper, faster-whisper, pyannote-audio, ffmpeg
- Use Cases: meeting-transcription, podcast-captioning, call-center-analytics
- API Available: No
- Language: python
- Dependencies: openai-whisper, faster-whisper, pyannote.audio, ffmpeg-python, torch
- Environment: Python 3.10+, CUDA optional
- Est. Runtime: ~2x faster than real-time with GPU
- Tags: speech-to-text, whisper, transcription, asr, audio
- Added: 2026-03-17
- Completeness: 100%
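The long-form handling via sliding windows can be approximated with a boundary calculator like the one below. The 30 s window matches Whisper's fixed input length; the 5 s overlap value is an assumption chosen so adjacent chunks can be merged on matching text at the seam.

```python
def sliding_windows(
    duration: float, window: float = 30.0, overlap: float = 5.0
) -> list[tuple[float, float]]:
    """Split `duration` seconds of audio into overlapping (start, end) windows.

    30 s matches Whisper's input window; the overlap (an illustrative
    default, not the pipeline's actual setting) leaves room to reconcile
    transcripts across chunk boundaries.
    """
    if duration <= window:
        return [(0.0, duration)]
    step = window - overlap
    windows = []
    start = 0.0
    while start + window < duration:
        windows.append((start, start + window))
        start += step
    windows.append((start, duration))  # final, possibly shorter, window
    return windows
```

In a VAD-based variant, the fixed `step` would instead snap each boundary to the nearest detected silence so no window cuts through speech.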
Index Score: 71.4
- Adoption: 88
- Quality: 87
- Freshness: 90
- Citations: 75
- Engagement: 0