ExLlamaV2
by turboderp · open-source · Last verified 2026-03-17
High-performance inference library optimized for quantized LLMs on consumer GPUs. It provides the EXL2 quantization format and fast paged-attention inference for running large models on limited VRAM.
https://github.com/turboderp/exllamav2
Overall grade: C (Below Average)
Adoption: C · Quality: A · Freshness: A · Citations: C · Engagement: F
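For orientation, a minimal single-prompt generation sketch patterned on the project's README examples; the model directory is a hypothetical placeholder and the sequence length is an arbitrary choice, not values taken from this listing.

```python
# Minimal generation sketch, modeled on the upstream README.
# The model path below is a placeholder; replace it with a real EXL2 checkpoint.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

model_dir = "/models/llama3-8b-exl2-4.0bpw"  # hypothetical EXL2 model directory
config = ExLlamaV2Config(model_dir)
model = ExLlamaV2(config)

# A lazy cache plus autosplit loading spreads weights across available GPUs,
# which is how the library fits large models into limited VRAM.
cache = ExLlamaV2Cache(model, max_seq_len=8192, lazy=True)
model.load_autosplit(cache, progress=True)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

output = generator.generate(prompt="Once upon a time,", max_new_tokens=200, add_bos=True)
print(output)
```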
Specifications
- License: MIT
- Pricing: open-source
- Capabilities: gpu-optimized-inference, exl2-quantization, paged-attention, fast-generation, low-vram (paged attention and low-VRAM usage are sketched in code after this list)
- Integrations: —
- Use Cases: consumer-gpu-inference, quantized-model-serving, local-ai
- API Available: Yes
- SDK Languages: python
- Deployment: self-hosted
- Rate Limits: N/A (local, hardware-limited)
- Data Privacy: Fully local; no data sent externally
- Tags: inference, quantization, gpu-optimized, exl2
- Added: 2026-03-17
- Completeness: 100%
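Two of the capabilities listed above, paged-attention and low-vram, show up directly in the generator API: the dynamic generator can batch several prompts over a shared cache, and quantized cache variants shrink KV memory. A minimal sketch, reusing the same hypothetical model directory as the earlier example; note that paged attention relies on flash-attn on supported GPUs, and all paths and parameters here are illustrative assumptions.

```python
# Sketch: batched generation over a 4-bit-quantized KV cache.
# Model path is a placeholder; flash-attn enables the paged-attention path.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

config = ExLlamaV2Config("/models/llama3-8b-exl2-4.0bpw")  # hypothetical path
model = ExLlamaV2(config)

# The Q4 cache stores keys/values in 4 bits, cutting cache VRAM roughly
# 4x versus FP16 at some quality cost.
cache = ExLlamaV2Cache_Q4(model, max_seq_len=8192, lazy=True)
model.load_autosplit(cache, progress=True)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# A list of prompts is scheduled concurrently over the shared cache.
prompts = ["The capital of France is", "Quantization works by"]
outputs = generator.generate(prompt=prompts, max_new_tokens=64, add_bos=True)
for o in outputs:
    print(o)
```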
Index Score: 47.3
- Adoption: 48
- Quality: 84
- Freshness: 82
- Citations: 45
- Engagement: 0