
Voicebox

by Meta AI · not publicly released (research only) · Last verified 2026-03-17

Voicebox is Meta AI's non-autoregressive generative speech model. It is trained with flow matching to infill masked segments of speech from the surrounding audio and a transcript. This single in-context infilling objective lets one model perform zero-shot text-to-speech, noise removal, content editing, and cross-lingual style transfer, and Meta reported state-of-the-art results on these tasks at its 2023 release. New voices and styles are supplied as audio context rather than learned in training, so the model handles them without task-specific fine-tuning.
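Flow matching, as described for Voicebox, trains a network to predict the velocity that carries a Gaussian noise sample toward clean speech features along a straight-line (optimal-transport) path. The loss is applied only to the masked frames being infilled. Generation then integrates the learned velocity field with an ODE solver. The sketch below shows the objective and a sampler under simplifying assumptions:

- Each frame is one float rather than a spectrogram vector.
- The learned network is replaced by a toy "oracle" velocity.
- Sampling uses fixed-step Euler integration, a stand-in for whatever solver the real system uses.
- `SIGMA_MIN` is an assumed value.

All function names are hypothetical.

```python
# Minimal sketch of conditional flow matching with an optimal-transport path.
# Assumptions: scalar frames, no neural network, Euler sampling, assumed SIGMA_MIN.

SIGMA_MIN = 1e-5  # assumed small noise floor; not taken from this page


def ot_path(x0, x1, t, sigma_min=SIGMA_MIN):
    """Return the point x_t on the noise-to-data path and its target velocity u_t."""
    xt = [(1 - (1 - sigma_min) * t) * a + t * b for a, b in zip(x0, x1)]
    ut = [b - (1 - sigma_min) * a for a, b in zip(x0, x1)]
    return xt, ut


def masked_fm_loss(pred_v, target_v, mask):
    """Mean squared velocity error over masked frames (mask == 1) only."""
    errs = [(p - u) ** 2 for p, u, m in zip(pred_v, target_v, mask) if m]
    return sum(errs) / len(errs) if errs else 0.0


def euler_sample(velocity_fn, x0, steps=32):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (speech) with fixed Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    import random

    random.seed(0)
    x1 = [0.5, -1.0, 2.0, 0.0]                 # toy "clean speech" frames
    x0 = [random.gauss(0.0, 1.0) for _ in x1]  # Gaussian noise sample
    mask = [0, 1, 1, 0]                        # frames 1-2 are to be generated

    # Training signal at a random time t: regress the velocity on masked frames only.
    t = random.random()
    xt, ut = ot_path(x0, x1, t)
    print("loss for a perfect predictor:", masked_fm_loss(ut, ut, mask))

    # With sigma_min = 0 the exact velocity is the constant x1 - x0,
    # so integrating it from the noise sample lands on x1.
    oracle = lambda x, t: [b - a for a, b in zip(x0, x1)]
    print(euler_sample(oracle, x0))
```

With a real model, `velocity_fn` would be the trained network conditioned on the transcript and the unmasked audio context. The oracle only demonstrates that integrating the correct velocity field recovers the data.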

https://voicebox.metademolab.com

Index Score: D (Poor)

Adoption: D
Quality: A
Freshness: B
Citations: F
Engagement: F

Specifications

License: Research Only
Pricing: N/A (model weights and code not publicly released)
Capabilities: text-to-speech, speech-editing, noise-removal, zero-shot-voice-cloning, cross-lingual-synthesis
Integrations: pytorch
Use Cases: research, speech-editing, voice-cloning, accessibility, multilingual-tts
API Available: No
Parameters: ~330M
Context Window: N/A
Modalities: text, audio
Training Cutoff: 2023
Tags: text-to-speech, speech-editing, in-context-learning, meta-ai, flow-matching
Added: 2026-03-17
Completeness: 80%
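The Voicebox paper frames the capabilities listed above as text-guided speech infilling. The model receives a transcript plus audio context with some frames masked, and it generates the masked frames. Each task therefore reduces to a choice of mask:

- Zero-shot TTS keeps a reference prompt and generates what follows it.
- Editing masks the span to be replaced.
- Noise removal masks the corrupted segment and regenerates it.

The sketch below uses hypothetical helper names, scalar frames instead of spectrogram frames, and omits transcript conditioning. It is not a released Voicebox API.

```python
# Hypothetical helpers: how infilling turns different tasks into one masked-generation problem.
# Convention: mask value 1 = frame to generate, 0 = frame kept as audio context.


def tts_mask(prompt_frames, target_frames):
    """Zero-shot TTS: keep the reference prompt as context, generate the new speech after it."""
    return [0] * prompt_frames + [1] * target_frames


def span_mask(num_frames, start, end):
    """Editing or noise removal: regenerate only frames in [start, end), keep the rest."""
    if not 0 <= start < end <= num_frames:
        raise ValueError("span must satisfy 0 <= start < end <= num_frames")
    return [1 if start <= i < end else 0 for i in range(num_frames)]


def audio_context(features, mask):
    """Build x_ctx = (1 - m) * x: hide the frames the model must generate."""
    return [0.0 if m else f for f, m in zip(features, mask)]


if __name__ == "__main__":
    frames = [0.3, 0.1, 0.9, 0.4, 0.2]   # toy per-frame features
    edit = span_mask(len(frames), 1, 3)  # e.g. replace a misspoken word in frames 1-2
    print(edit, audio_context(frames, edit))
    print(tts_mask(prompt_frames=3, target_frames=2))
```

Zeroing the masked frames in the context ensures the generator cannot copy the audio it is supposed to replace. The transcript for the whole utterance still tells it what should be said there.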
