Speech & Audio AI · v2.0

StyleTTS2

by Columbia University (Li et al.) · open-source · Last verified 2026-03-17

StyleTTS2 is an open-source text-to-speech model from Columbia University (Li et al.) that models speech styles as latent diffusion variables, enabling zero-shot voice cloning and expressive synthesis. It achieves human-level naturalness on the LJSpeech and VCTK benchmarks and surpassed commercial systems such as ElevenLabs in several blind listening evaluations, setting the highest quality bar of any open-source TTS system at the time of publication.
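The "latent diffusion variable" framing above can be illustrated with a toy denoising loop: a fixed-size style vector is sampled by iteratively refining Gaussian noise. This is a minimal numpy sketch of the general idea, not the model's actual sampler; `toy_denoiser` is a hypothetical stand-in for the learned style denoiser.

```python
import numpy as np

def toy_denoiser(x, t):
    # Hypothetical stand-in for the learned denoiser: nudges the
    # sample toward a fixed "target style" vector as t -> 0.
    target = np.full_like(x, 0.5)
    return x + (target - x) * (1.0 - t)

def sample_style(dim=128, steps=10, seed=0):
    """Iteratively denoise Gaussian noise into a style vector."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)   # start from pure noise
    for i in range(steps, 0, -1):
        t = i / steps              # timestep in (0, 1]
        x = toy_denoiser(x, t)
    return x

style = sample_style()
print(style.shape)  # (128,)
```

In the real model the denoiser is conditioned on the input text and a reference utterance, so each sampled style vector yields a different but plausible prosody for the same sentence.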

https://styletts2.github.io
Overall: C+ (Average)
Adoption: C · Quality: A+ · Freshness: B+ · Citations: B+ · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: text-to-speech, zero-shot-voice-cloning, style-diffusion, human-level-naturalness, prosody-control
Integrations: huggingface, local-inference
Use Cases: high-quality-tts, voice-cloning, research, audiobook-narration, creative-audio
API Available: Yes
Parameters: ~150M
Context Window: N/A
Modalities: text, audio
Training Cutoff: 2023
Tags: text-to-speech, style-diffusion, zero-shot, open-source, human-level
Added: 2026-03-17
Completeness: 100%

Index Score: 56.6
Adoption: 48
Quality: 93
Freshness: 70
Citations: 75
Engagement: 0
