
Speech-to-Text Pipeline

by OpenAI · open-source · Last verified 2026-03-17

Production-grade ASR pipeline using OpenAI Whisper or faster-whisper, with VAD-based chunking, speaker diarization and timestamp alignment (via pyannote-audio), and SRT/VTT subtitle export. Handles long-form audio via sliding-window segmentation and automatic language detection.
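The sliding-window segmentation step can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual implementation; the 30 s window reflects Whisper's fixed chunk length, while the 5 s overlap and the function name are illustrative assumptions:

```python
def sliding_windows(duration, window=30.0, overlap=5.0):
    """Split `duration` seconds of audio into overlapping (start, end) windows.

    Whisper models consume audio in 30 s chunks, so long-form input is
    covered by windows that overlap slightly to avoid cutting words at
    chunk boundaries. The 5 s overlap here is an illustrative assumption.
    """
    step = window - overlap
    spans = []
    start = 0.0
    while start < duration:
        # Clamp the final window to the end of the audio.
        spans.append((start, min(start + window, duration)))
        if start + window >= duration:
            break
        start += step
    return spans
```

In practice, faster-whisper's `WhisperModel.transcribe()` can handle chunking and silence-skipping internally (e.g. via its `vad_filter` option), so an explicit window function like this is mainly useful when aligning chunks with externally computed VAD or diarization boundaries.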

https://github.com/openai/whisper
Overall: B+ (Good)
Adoption: A · Quality: A · Freshness: A+ · Citations: B+ · Engagement: F

Specifications

License
MIT
Pricing
open-source
Capabilities
vad-chunking, long-form-audio, language-detection, srt-export, timestamp-alignment
Integrations
whisper, faster-whisper, pyannote-audio, ffmpeg
Use Cases
meeting-transcription, podcast-captioning, call-center-analytics
API Available
No
Language
python
Dependencies
openai-whisper, faster-whisper, pyannote.audio, ffmpeg-python, torch
Environment
Python 3.10+, CUDA optional
Est. Runtime
~2x faster than real-time with GPU
Tags
speech-to-text, whisper, transcription, asr, audio
Added
2026-03-17
Completeness
100%
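The srt-export capability listed above amounts to formatting transcription segments into the SubRip format. A stdlib-only sketch, where the helper names and the `(start, end, text)` segment shape are illustrative assumptions rather than the pipeline's real API:

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render (start, end, text) segments as an SRT subtitle document.

    SRT blocks are: a 1-based index, a `start --> end` timestamp line,
    the caption text, and a blank separator line.
    """
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```

Segments from Whisper or faster-whisper expose `start`, `end`, and `text` attributes, so adapting them to this shape is a one-line comprehension; VTT export differs mainly in using `.` instead of `,` in timestamps and a `WEBVTT` header.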

Index Score

71.4
Adoption
88
Quality
87
Freshness
90
Citations
75
Engagement
0
