
Speech-to-Text Pipeline

by OpenAI · open-source · Last verified 2026-03-17

Production-grade ASR pipeline using OpenAI Whisper or faster-whisper, with VAD-based chunking, speaker diarization and timestamp alignment (via pyannote-audio), and SRT/VTT subtitle export. Handles long-form audio via sliding-window segmentation and automatic language detection.
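The sliding-window segmentation step can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual implementation; the 30 s window reflects Whisper's fixed chunk length, while the 5 s overlap and the function name are illustrative assumptions:

```python
def sliding_windows(duration, window=30.0, overlap=5.0):
    """Split `duration` seconds of audio into overlapping (start, end) windows.

    Whisper models consume audio in 30 s chunks, so long-form input is
    covered by windows that overlap slightly to avoid cutting words at
    chunk boundaries. The 5 s overlap here is an illustrative assumption.
    """
    step = window - overlap
    spans = []
    start = 0.0
    while start < duration:
        # Clamp the final window to the end of the audio.
        spans.append((start, min(start + window, duration)))
        if start + window >= duration:
            break
        start += step
    return spans
```

In practice, faster-whisper's `WhisperModel.transcribe()` can handle chunking and silence-skipping internally (e.g. via its `vad_filter` option), so an explicit window function like this is mainly useful when aligning chunks with externally computed VAD or diarization boundaries.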

https://github.com/openai/whisper
Overall: B+ (Good)
Adoption: A · Quality: A · Freshness: A+ · Citations: B+ · Engagement: F

Specifications

License
MIT
Pricing
open-source
Capabilities
vad-chunking, long-form-audio, language-detection, srt-export, timestamp-alignment
Integrations
whisper, faster-whisper, pyannote-audio, ffmpeg
Use Cases
meeting-transcription, podcast-captioning, call-center-analytics
API Available
No
Language
python
Dependencies
openai-whisper, faster-whisper, pyannote.audio, ffmpeg-python, torch
Environment
Python 3.10+, CUDA optional
Est. Runtime
~2x faster than real-time with GPU
Tags
speech-to-text, whisper, transcription, asr, audio
Added
2026-03-17
Completeness
100%
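The srt-export capability listed above amounts to formatting transcription segments into the SubRip format. A stdlib-only sketch, where the helper names and the `(start, end, text)` segment shape are illustrative assumptions rather than the pipeline's real API:

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render (start, end, text) segments as an SRT subtitle document.

    SRT blocks are: a 1-based index, a `start --> end` timestamp line,
    the caption text, and a blank separator line.
    """
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```

Segments from Whisper or faster-whisper expose `start`, `end`, and `text` attributes, so adapting them to this shape is a one-line comprehension; VTT export differs mainly in using `.` instead of `,` in timestamps and a `WEBVTT` header.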

Index Score

71.4
Adoption
88
Quality
87
Freshness
90
Citations
75
Engagement
0
