Model · Speech & Audio AI · v2

XTTS-v2

by Coqui AI · freemium · Last verified 2026-03-17

XTTS-v2 is an open-source, cross-lingual text-to-speech model from Coqui AI. It clones a voice from a reference clip of roughly 6 seconds and synthesizes speech in 17 languages. It also supports real-time streaming inference, which suits applications that need custom voices and low-latency output.

https://coqui.ai/blog/tts/open_xtts
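
The voice cloning described above can be sketched in Python. This is a minimal sketch and not an official example. It assumes the Coqui `TTS` Python package is installed, that the model is registered as `tts_models/multilingual/multi-dataset/xtts_v2`, and that the 17 language codes match those on the public model card. The helper names `build_clone_request` and `clone_voice` are hypothetical and exist only for illustration.

```python
"""Minimal XTTS-v2 voice-cloning sketch (assumes the Coqui `TTS` package)."""

# Assumption: language codes as listed on the public XTTS-v2 model card.
XTTS_V2_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
})

# Assumption: model identifier used by the Coqui TTS model registry.
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def build_clone_request(text, speaker_wav, language, out_path="output.wav"):
    """Validate inputs and return keyword arguments for TTS.tts_to_file."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if language not in XTTS_V2_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    return {
        "text": text,
        "speaker_wav": speaker_wav,  # path to the short reference clip
        "language": language,
        "file_path": out_path,
    }


def clone_voice(text, speaker_wav, language, out_path="output.wav"):
    """Synthesize `text` in the voice of the reference clip `speaker_wav`."""
    # Imported lazily so the validation helper works without the package.
    from TTS.api import TTS

    request = build_clone_request(text, speaker_wav, language, out_path)
    tts = TTS(MODEL_NAME)  # downloads the model weights on first use
    tts.tts_to_file(**request)
    return request["file_path"]
```

A call such as `clone_voice("Hello from a cloned voice.", "reference.wav", "en")` would write the synthesized audio to `output.wav`, where `reference.wav` is a placeholder for your own reference recording. Validating the language code before loading the model keeps an unsupported request from triggering a large model download.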

C (Below Average)

Adoption: B · Quality: A · Freshness: B+ · Citations: F · Engagement: F

Specifications

License: Coqui Public Model License
Pricing: freemium
Capabilities: Cross-Lingual Text-to-Speech, Few-Shot Voice Cloning (from 6s audio), Multilingual Synthesis (17 languages), Real-time Streaming Inference, Emotion and Style Control, Zero-Shot Voice Cloning for supported languages, Open-Source Model and Code
API Available: Yes
Parameters: ~500M
Context Window: N/A
Modalities: text, audio
Training Cutoff: 2023
Tags: text-to-speech, voice-cloning, multilingual-tts, open-source, coqui-ai, speech-synthesis, ai-model, deep-learning, real-time-audio, cross-lingual
Added: 2026-03-17
Completeness: 87%

