ToolAI Infrastructure v0.2

ExLlamaV2

by turboderp · open-source · Last verified 2026-03-17

High-performance inference library optimized for quantized LLMs on consumer GPUs. Provides the EXL2 quantization format and fast generation with paged attention, enabling large models to run on limited VRAM.
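The low-VRAM claim is easy to sanity-check with back-of-envelope arithmetic. The sketch below estimates weight storage alone (KV cache and activations excluded); the 70B parameter count and bit-widths are illustrative assumptions, not figures from this listing:

```python
# Rough VRAM needed to hold model weights (KV cache and activations
# excluded). fp16 stores 16 bits per weight; EXL2 supports fractional
# average bit-widths such as 4.0 bits per weight.
def weight_vram_gb(n_params: float, bits_per_weight: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

fp16_gb = weight_vram_gb(70e9, 16.0)  # ~140 GB: beyond any consumer card
exl2_gb = weight_vram_gb(70e9, 4.0)   # ~35 GB: roughly in reach of two 24 GB GPUs
print(f"fp16: {fp16_gb:.0f} GB, EXL2 4.0 bpw: {exl2_gb:.0f} GB")
```

The 4x reduction is the weights-only upper bound; actual headroom also depends on context length, since the paged KV cache grows with the number of cached tokens.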

https://github.com/turboderp/exllamav2
Overall grade: C (Below Average)
Adoption: C · Quality: A · Freshness: A · Citations: C · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: gpu-optimized-inference, exl2-quantization, paged-attention, fast-generation, low-vram
Integrations: (none listed)
Use Cases: consumer-gpu-inference, quantized-model-serving, local-ai
API Available: Yes
SDK Languages: python
Deployment: self-hosted
Rate Limits: N/A (local, hardware-limited)
Data Privacy: fully local; no data sent externally
Tags: inference, quantization, gpu-optimized, exl2
Added: 2026-03-17
Completeness: 100%

Index Score: 47.3

Adoption: 48 · Quality: 84 · Freshness: 82 · Citations: 45 · Engagement: 0
