TGI
by Hugging Face · open-source · Last verified 2026-03-17
Hugging Face's production-ready, Rust-based inference server for large language models. It provides tensor parallelism, quantization, continuous batching, and token streaming for efficient LLM deployment.
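A minimal self-hosted quick-start sketch with Docker, following the pattern in the TGI documentation. The model id and port mapping are example assumptions; substitute your own.

```shell
# Launch TGI on a CUDA GPU (assumes Docker with GPU support is installed).
# The model id below is an example assumption; pick any supported model.
model=HuggingFaceH4/zephyr-7b-beta
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v "$PWD/data:/data" \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id "$model"

# Once the server is up, it exposes /generate and /generate_stream:
curl http://localhost:8080/generate \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is continuous batching?", "parameters": {"max_new_tokens": 64}}'
```

Continuous batching means concurrent requests like the `curl` call above are merged into the running batch on the fly rather than queued behind it.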
https://huggingface.co/docs/text-generation-inference
Overall Grade: B (Above Average)
Adoption: B+ · Quality: A · Freshness: A · Citations: B · Engagement: F
Specifications
- License: Apache-2.0
- Pricing: open-source
- Capabilities: model-serving, tensor-parallelism, quantization, continuous-batching, streaming
- Integrations: hugging-face, langchain
- Use Cases: production-serving, model-deployment, api-hosting, inference-optimization
- API Available: Yes
- SDK Languages: python, rust
- Deployment: self-hosted, docker, hugging-face-inference-endpoints
- Rate Limits: N/A (self-hosted)
- Data Privacy: self-hosted, user-managed
- Tags: inference, model-serving, hugging-face, rust
- Added: 2026-03-17
- Completeness: 100%
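Since the server exposes an HTTP API, a client can call it from Python with only the standard library. A minimal sketch against TGI's `/generate` route; the base URL is an assumption for a local self-hosted deployment, and `build_generate_payload` is a hypothetical helper name.

```python
# Sketch of a stdlib-only client for a self-hosted TGI server's /generate
# endpoint. The localhost URL is an assumption; point it at your deployment.
import json
import urllib.request


def build_generate_payload(prompt: str, max_new_tokens: int = 64) -> dict:
    """Build the JSON body TGI's /generate route expects:
    an "inputs" string plus a "parameters" object."""
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }


def generate(base_url: str, prompt: str, max_new_tokens: int = 64) -> str:
    """POST a prompt to a running TGI server and return the generated text."""
    req = urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(build_generate_payload(prompt, max_new_tokens)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["generated_text"]


# Usage (requires a running server, e.g. the Docker quick-start):
#   generate("http://localhost:8080", "What is tensor parallelism?")
```

For streaming output, the same server also serves `/generate_stream`, which emits server-sent events token by token instead of one final JSON object.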
Index Score: 63
- Adoption: 72
- Quality: 86
- Freshness: 88
- Citations: 68
- Engagement: 0