XTTS-v2
by Coqui AI · freemium · Last verified 2026-03-17
XTTS-v2 is an open-source, cross-lingual text-to-speech model from Coqui AI. It clones a voice from a reference clip of about 6 seconds and can speak in that voice in any of its 17 supported languages. Streaming inference lets it emit audio while generation is still in progress, which suits applications that need custom voices and low-latency output.
https://coqui.ai/blog/tts/open_xtts
Grade: C+ (Average)
Adoption: B · Quality: A · Freshness: B+ · Citations: B · Engagement: F
Specifications
- License: Coqui Public Model License
- Pricing: freemium
- Capabilities: Cross-Lingual Text-to-Speech, Few-Shot Voice Cloning (from 6s audio), Multilingual Synthesis (17 languages), Real-time Streaming Inference, Emotion and Style Control, Zero-Shot Voice Cloning for supported languages, Open-Source Model and Code
- API Available: Yes
- Parameters: ~500M
- Context Window: N/A
- Modalities: text, audio
- Training Cutoff: 2023
- Tags: text-to-speech, voice-cloning, multilingual-tts, open-source, coqui-ai, speech-synthesis, ai-model, deep-learning, real-time-audio, cross-lingual
- Added: 2026-03-17
- Completeness: 0.95%
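The voice-cloning and multilingual capabilities listed above can be sketched in a few lines of Python. This is a minimal, non-authoritative sketch that assumes the Coqui `TTS` package and `torch` are installed; the helper names `validate_language` and `clone_voice`, and the file paths, are hypothetical and chosen only for illustration. The language codes are the 17 given on the XTTS-v2 model card.

```python
# Language codes listed on the XTTS-v2 model card (17 in total).
SUPPORTED_LANGUAGES = (
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
)

# Model identifier used by the Coqui TTS package for XTTS-v2.
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def validate_language(language: str) -> str:
    """Normalize a language code and reject ones XTTS-v2 does not list."""
    code = language.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported language {language!r}; expected one of {SUPPORTED_LANGUAGES}"
        )
    return code


def clone_voice(text: str, speaker_wav: str, language: str,
                out_path: str = "output.wav") -> str:
    """Synthesize `text` in the voice of the reference clip `speaker_wav`.

    Hypothetical helper: the reference clip should be a short recording
    (around 6 seconds) of the target speaker.
    """
    code = validate_language(language)

    # Imported lazily so the validation helper works without TTS installed.
    import torch
    from TTS.api import TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tts = TTS(MODEL_NAME).to(device)
    tts.tts_to_file(text=text, speaker_wav=speaker_wav,
                    language=code, file_path=out_path)
    return out_path
```

A call such as `clone_voice("Hello there.", "reference.wav", "en")` would write the synthesized speech to `output.wav`. The first call downloads the model weights, so it needs network access and may take a while.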
Index Score: 59.6
- Adoption: 65
- Quality: 83
- Freshness: 70
- Citations: 68
- Engagement: 0