
TensorRT-LLM

by NVIDIA · open-source · Last verified 2026-03-17

NVIDIA's library for optimizing LLM inference on NVIDIA GPUs with TensorRT acceleration. It provides quantization, kernel fusion, and in-flight batching to maximize throughput on data-center GPUs.
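
The typical entry point is the library's high-level Python LLM API. Below is a minimal sketch of offline inference with that API; the model checkpoint and sampling values are illustrative assumptions, not recommendations, and details may vary across releases.

```python
# Minimal offline-inference sketch using TensorRT-LLM's high-level LLM API.
# Assumes a recent tensorrt_llm release; the checkpoint and sampling values
# below are illustrative.
from tensorrt_llm import LLM, SamplingParams

prompts = [
    "The capital of France is",
    "In-flight batching improves GPU utilization because",
]
# max_tokens caps generation length per request.
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# The constructor downloads the checkpoint and compiles a TensorRT engine
# for the local GPU; the runtime then serves requests with in-flight batching.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

for out in llm.generate(prompts, sampling):
    print(out.prompt, "->", out.outputs[0].text)
```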

https://github.com/NVIDIA/TensorRT-LLM
Overall Grade: C+ (Average)
Adoption: B · Quality: A+ · Freshness: A · Citations: B · Engagement: F

Specifications

License
Apache-2.0
Pricing
open-source
Capabilities
gpu-optimization, quantization, kernel-fusion, in-flight-batching, multi-gpu-support (see the sketch after these specifications)
Integrations
triton-inference, hugging-face
Use Cases
production-inference, high-throughput-serving, enterprise-deployment, latency-optimization
API Available
Yes
SDK Languages
python, cpp
Deployment
self-hosted, docker, nvidia-cloud
Rate Limits
N/A (self-hosted, hardware-limited)
Data Privacy
Self-hosted, user-managed
Tags
inference, nvidia, optimization, gpu, enterprise
Added
2026-03-17
Completeness
100%
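
The quantization and multi-gpu-support capabilities listed above map to constructor options of the same Python LLM API. A hedged sketch follows, assuming the llmapi QuantConfig interface and an FP8-capable GPU; the exact module path, enum names, and checkpoint are assumptions to verify against the installed version.

```python
# Hedged sketch: FP8 quantization plus 2-way tensor parallelism via the
# LLM API. QuantConfig/QuantAlgo live under tensorrt_llm.llmapi in recent
# releases; confirm against your installed version.
from tensorrt_llm import LLM
from tensorrt_llm.llmapi import QuantConfig, QuantAlgo

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",   # illustrative checkpoint
    quant_config=QuantConfig(quant_algo=QuantAlgo.FP8),
    tensor_parallel_size=2,                     # shard weights across 2 GPUs
)
print(llm.generate(["Quantization reduces memory because"])[0].outputs[0].text)
```

Tensor parallelism here is a design choice for models too large for one GPU's memory; for models that fit on a single device, leaving tensor_parallel_size at its default avoids inter-GPU communication overhead.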

Index Score

Overall
57.8
Adoption
62
Quality
90
Freshness
88
Citations
60
Engagement
0
