TensorRT-LLM
by NVIDIA · open-source · Last verified 2026-04-24
TensorRT-LLM is NVIDIA's optimized inference library that compiles LLMs into highly efficient TensorRT engines for maximum GPU utilization. It supports INT4, INT8, and FP8 quantization, in-flight batching, and KV-cache optimization, delivering some of the highest raw throughput available on NVIDIA hardware for production deployments.
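To give a feel for the INT8 path mentioned above, here is a minimal sketch of symmetric per-tensor quantization in plain NumPy. This is an illustration of the general scheme, not TensorRT-LLM's actual kernels, and the function names are hypothetical:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor INT8: map [-max|w|, +max|w|] onto [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original weights.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.031, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()  # rounding error bounded by scale / 2
```

Each weight is stored as one signed byte plus a shared float scale, which is where the memory and bandwidth savings over FP16 come from; TensorRT-LLM applies calibrated variants of this idea (per-channel scales, INT4/FP8 formats) when building an engine.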
https://github.com/NVIDIA/TensorRT-LLM
Overall grade: C (Below Average)
- Adoption: C+
- Quality: B+
- Freshness: A
- Citations: C
- Engagement: F
Specifications
- License: Open Source
- Pricing: open-source
- Capabilities
- Integrations
- Use Cases
- API Available: No
- SDK Languages: python, cpp
- Deployment: self-hosted, docker, nvidia-cloud
- Rate Limits: N/A (self-hosted, hardware-limited)
- Data Privacy: Self-hosted, user-managed
- Tags: inference, nvidia, tensorrt, quantization, throughput, production
- Added: 2026-04-24
- Completeness: 60%
Index Score: 44
- Adoption: 50
- Quality: 70
- Freshness: 80
- Citations: 40
- Engagement: 0