ExLlamaV2
by turboderp · open-source · Last verified 2026-03-17
High-performance inference library optimized for quantized LLMs on consumer GPUs. It provides the EXL2 quantization format and fast paged-attention inference for running large models on limited VRAM.
https://github.com/turboderp/exllamav2
Overall grade: C (Below Average)
Adoption: C · Quality: A · Freshness: A · Citations: C · Engagement: F
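For orientation, a minimal single-prompt generation sketch patterned on the project's README examples; the model directory is a hypothetical placeholder and the sequence length is an arbitrary choice, not values taken from this listing.

```python
# Minimal generation sketch, modeled on the upstream README.
# The model path below is a placeholder; replace it with a real EXL2 checkpoint.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

model_dir = "/models/llama3-8b-exl2-4.0bpw"  # hypothetical EXL2 model directory
config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)

# A lazy cache plus autosplit loading spreads weights across available GPUs,
# which is how the library fits large models into limited VRAM.
cache = ExLlamaV2Cache(model, max_seq_len=8192, lazy=True)
model.load_autosplit(cache, progress=True)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

output = generator.generate(prompt="Once upon a time,", max_new_tokens=200, add_bos=True)
print(output)
```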
Specifications
- License: MIT
- Pricing: open-source
- Capabilities: gpu-optimized-inference, exl2-quantization, paged-attention, fast-generation, low-vram (paged attention and low-VRAM usage are sketched in code after this list)
- Integrations: —
- Use Cases: consumer-gpu-inference, quantized-model-serving, local-ai
- API Available: Yes
- SDK Languages: python
- Deployment: self-hosted
- Rate Limits: N/A (local, hardware-limited)
- Data Privacy: Fully local; no data sent externally
- Tags: inference, quantization, gpu-optimized, exl2
- Added: 2026-03-17
- Completeness: 100%
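Two of the capabilities listed above, paged-attention and low-vram, show up directly in the generator API: the dynamic generator can batch several prompts over a shared cache, and quantized cache variants shrink KV memory. A minimal sketch, reusing the same hypothetical model directory as the earlier example; note that paged attention relies on flash-attn on supported GPUs, and all paths and parameters here are illustrative assumptions.

```python
# Sketch: batched generation over a 4-bit-quantized KV cache.
# Model path is a placeholder; flash-attn enables the paged-attention path.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("/models/llama3-8b-exl2-4.0bpw")  # hypothetical path
model = ExLlamaV2(config)

# The Q4 cache stores keys/values in 4 bits, cutting cache VRAM roughly
# 4x versus FP16 at some quality cost.
cache = ExLlamaV2Cache_Q4(model, max_seq_len=8192, lazy=True)
model.load_autosplit(cache, progress=True)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# A list of prompts is scheduled concurrently over the shared cache.
prompts = ["The capital of France is", "Quantization works by"]
outputs = generator.generate(prompt=prompts, max_new_tokens=64, add_bos=True)
for o in outputs:
    print(o)
```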
Index Score: 47.3
- Adoption: 48
- Quality: 84
- Freshness: 82
- Citations: 45
- Engagement: 0