
XTTS-v2

by Coqui AI · freemium · Last verified 2026-03-17

XTTS-v2 is a cross-lingual text-to-speech model from Coqui AI, with code and weights openly available under the Coqui Public Model License. It clones a voice from a reference clip of about 6 seconds and synthesizes speech in 17 languages. It also supports real-time streaming inference, which suits applications that need custom voices and low-latency output.
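
As a rough illustration of the voice-cloning workflow described above, the following sketch calls XTTS-v2 through the Coqui `TTS` Python package. It assumes that package is installed and that the model is published under the name `tts_models/multilingual/multi-dataset/xtts_v2`. The language codes are taken from the public model card as best recalled, and `build_request`, `clone_voice`, and `reference.wav` are hypothetical names introduced here for illustration, not part of this listing.

```python
"""Sketch: voice cloning with XTTS-v2 via the Coqui `TTS` Python package."""

# Assumption: the 17 language codes as listed on the XTTS-v2 model card.
XTTS_V2_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
})

# Assumption: the model identifier used by the Coqui TTS model manager.
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def build_request(text, speaker_wav, language, file_path="output.wav"):
    """Validate inputs and return keyword arguments for `TTS.tts_to_file`."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if language not in XTTS_V2_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    return {
        "text": text,
        "speaker_wav": speaker_wav,  # path to a short reference clip
        "language": language,
        "file_path": file_path,
    }


def clone_voice(text, speaker_wav, language, file_path="output.wav"):
    """Synthesize `text` in the voice of `speaker_wav` and write a WAV file."""
    # Imported lazily so build_request works without the package installed.
    from TTS.api import TTS

    tts = TTS(MODEL_NAME)  # downloads the model weights on first use
    tts.tts_to_file(**build_request(text, speaker_wav, language, file_path))
    return file_path


# Example call (not run here; needs the TTS package and a reference clip):
# clone_voice("Hello from a cloned voice.", "reference.wav", "en")
```

Validating the language code before loading the model keeps a typo from triggering a large download. Streaming output goes through a lower-level interface that this sketch does not cover.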

https://coqui.ai/blog/tts/open_xtts
Grade: C+ (Average)
Adoption: B · Quality: A · Freshness: B+ · Citations: B · Engagement: F

Specifications

License
Coqui Public Model License
Pricing
freemium
Capabilities
Cross-Lingual Text-to-Speech, Few-Shot Voice Cloning (from 6s audio), Multilingual Synthesis (17 languages), Real-time Streaming Inference, Emotion and Style Control, Zero-Shot Voice Cloning for supported languages, Open-Source Model and Code
API Available
Yes
Parameters
~500M
Context Window
N/A
Modalities
text, audio
Training Cutoff
2023
Tags
text-to-speech, voice-cloning, multilingual-tts, open-source, coqui-ai, speech-synthesis, ai-model, deep-learning, real-time-audio, cross-lingual
Added
2026-03-17
Completeness
0.95%

Index Score

59.6
Adoption
65
Quality
83
Freshness
70
Citations
68
Engagement
0
