Tool · model-serving · v1.0

TensorRT-LLM

by NVIDIA · open-source · Last verified 2026-04-24

TensorRT-LLM is NVIDIA's open-source inference library that compiles large language models into optimized TensorRT engines for high GPU utilization. It supports INT4, INT8, and FP8 quantization, in-flight batching, and KV-cache optimization, making it one of the highest raw-throughput options for production deployments on NVIDIA hardware.
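The quantization modes listed above all trade precision for memory bandwidth in the same basic way. As a rough illustration only (not TensorRT-LLM's actual kernels or API), symmetric per-tensor INT8 quantization maps floats to 8-bit integers through a single scale factor:

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: scale = max|x| / 127."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the INT8 codes."""
    return [v * scale for v in q]

weights = [0.02, -1.27, 0.5, 0.9994]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Per-element round-trip error is bounded by scale / 2.
max_err = max(abs(a - b) for a, b in zip(weights, approx))
```

Storing weights as INT8 roughly quarters memory traffic versus FP32 at the cost of this bounded rounding error; TensorRT-LLM's calibrated INT4/INT8/FP8 paths refine the same idea with per-channel scales and hardware-native kernels.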

https://github.com/NVIDIA/TensorRT-LLM
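In-flight batching, mentioned in the description, means finished sequences leave the GPU batch immediately and queued requests join mid-flight, instead of the whole batch draining before new work starts (static batching). A toy scheduler sketch of that idea, purely illustrative and unrelated to TensorRT-LLM's real KV-cache-block scheduler:

```python
from collections import deque

def run_inflight_batching(requests, max_batch=4):
    """Toy continuous-batching loop.

    `requests` maps request id -> number of decode steps it needs.
    Returns the step at which each request finished.
    """
    waiting = deque(requests)
    active = {}            # request id -> remaining decode steps
    finished_at = {}
    step = 0
    while waiting or active:
        # Admit queued requests into free batch slots (the "in-flight" part).
        while waiting and len(active) < max_batch:
            rid = waiting.popleft()
            active[rid] = requests[rid]
        step += 1
        # One decode step for every active sequence.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                finished_at[rid] = step
                del active[rid]   # slot frees up immediately
    return finished_at

done = run_inflight_batching({"a": 1, "b": 5, "c": 2, "d": 3, "e": 1})
```

Here request "e" slips into the slot freed by "a" after step 1, so all five requests finish in 5 steps; a static batch of the first four followed by "e" alone would take 6. Short requests no longer wait on the longest sequence in their batch, which is where the throughput gain comes from.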
Index grade: C (Below Average)
Adoption: C+ · Quality: B+ · Freshness: A · Citations: C · Engagement: F

Specifications

License: Open Source
Pricing: open-source
Capabilities: —
Integrations: —
Use Cases: —
API Available: No
SDK Languages: Python, C++
Deployment: self-hosted, docker, nvidia-cloud
Rate Limits: N/A (self-hosted, hardware-limited)
Data Privacy: self-hosted, user-managed
Tags: inference, nvidia, tensorrt, quantization, throughput, production
Added: 2026-04-24
Completeness: 60%

Index Score: 44
Adoption: 50
Quality: 70
Freshness: 80
Citations: 40
Engagement: 0
