TensorRT-LLM
by NVIDIA · open-source · Last verified 2026-03-17
NVIDIA's library for optimizing LLM inference on NVIDIA GPUs with TensorRT acceleration. It provides quantization, kernel fusion, and in-flight batching for high-throughput serving on data center GPUs (see the usage sketch after the specifications below).
https://github.com/NVIDIA/TensorRT-LLM
Overall grade: C+ (Average)
Adoption: B · Quality: A+ · Freshness: A · Citations: B · Engagement: F
Specifications
- License: Apache-2.0
- Pricing: open-source
- Capabilities: gpu-optimization, quantization, kernel-fusion, in-flight-batching, multi-gpu-support
- Integrations: triton-inference, hugging-face
- Use Cases: production-inference, high-throughput-serving, enterprise-deployment, latency-optimization
- API Available: Yes
- SDK Languages: python, cpp
- Deployment: self-hosted, docker, nvidia-cloud
- Rate Limits: N/A (self-hosted, hardware-limited)
- Data Privacy: Self-hosted, user-managed
- Tags: inference, nvidia, optimization, gpu, enterprise
- Added: 2026-03-17
- Completeness: 100%
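
The python SDK listed above exposes a high-level `LLM` class, following the project's quick-start examples. The sketch below assumes the `tensorrt_llm` package is installed on a machine with a supported NVIDIA GPU; the model checkpoint and sampling values are illustrative, not prescribed by this listing.

```python
# Minimal sketch of the high-level Python API; assumes tensorrt_llm
# is installed and a supported NVIDIA GPU is present.
from tensorrt_llm import LLM, SamplingParams

# Illustrative checkpoint; any supported Hugging Face model works.
# Engine optimization (including kernel fusion) happens at model load.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

sampling = SamplingParams(max_tokens=64, temperature=0.8)

# Submitting several prompts lets the runtime's in-flight batcher
# interleave them, which is where the throughput gains come from.
prompts = [
    "What does kernel fusion do?",
    "Explain in-flight batching in one sentence.",
]

for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```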
Index Score: 57.8
- Adoption: 62
- Quality: 90
- Freshness: 88
- Citations: 60
- Engagement: 0
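
Quantization, listed under Capabilities, is exposed through the same API. The following is a sketch under the assumption that the `QuantConfig` and `QuantAlgo` helpers live in `tensorrt_llm.llmapi`, as in recent documented examples; exact module paths and supported algorithms vary by release, so verify against the version you install.

```python
# Hedged sketch: quantizing a model through the LLM API. Module
# paths (tensorrt_llm.llmapi) and the FP8 algorithm name follow
# documented examples but should be checked per version.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import QuantConfig, QuantAlgo

# FP8 quantization; W4A16_AWQ is another commonly documented choice.
quant = QuantConfig(quant_algo=QuantAlgo.FP8)

llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # illustrative checkpoint
    quant_config=quant,
)

# generate() falls back to default sampling parameters when none are given.
for output in llm.generate(["Hello!"]):
    print(output.outputs[0].text)
```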