AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

by MIT / MIT-IBM Watson AI Lab · open-source · Last verified 2026-03-17

Introduces AWQ (Activation-aware Weight Quantization), a hardware-friendly low-bit weight-quantization method. AWQ identifies a small fraction (roughly 1%) of salient weight channels from activation magnitudes and protects them via per-channel scaling rather than mixed precision, achieving better performance than GPTQ at 4-bit while quantizing faster and generalizing more broadly across model architectures.

https://arxiv.org/abs/2306.00978
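The core trick can be sketched in a few lines. Below is a minimal, self-contained simulation of AWQ-style activation-aware scaling; the function names (`pseudo_quantize`, `awq_quantize_layer`) and the fixed scaling exponent are hypothetical simplifications, since the actual method grid-searches the exponent per layer against a reconstruction loss on calibration data and stores true INT4 weights.

```python
import torch

def pseudo_quantize(w, n_bits=4, group_size=128):
    """Group-wise asymmetric round-to-nearest quantize-dequantize.
    Assumes w has shape (out_features, in_features) and group_size
    divides in_features."""
    out_f, in_f = w.shape
    w = w.reshape(out_f, in_f // group_size, group_size)
    w_max = w.amax(dim=-1, keepdim=True)
    w_min = w.amin(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-5) / (2 ** n_bits - 1)
    zero = (-w_min / scale).round()
    w_q = (torch.clamp((w / scale).round() + zero,
                       0, 2 ** n_bits - 1) - zero) * scale
    return w_q.reshape(out_f, in_f)

def awq_quantize_layer(w, x_calib, n_bits=4, alpha=0.5, group_size=128):
    """Activation-aware scaling: scale salient input channels up before
    quantization, fold the inverse scale back after. x_calib is a batch
    of calibration activations with shape (..., in_features)."""
    act_mag = x_calib.abs().reshape(-1, x_calib.shape[-1]).mean(dim=0)
    s = act_mag.clamp(min=1e-5) ** alpha  # per-input-channel scale
    return pseudo_quantize(w * s.unsqueeze(0), n_bits, group_size) / s.unsqueeze(0)
```

Scaling the salient input channels up before round-to-nearest quantization shrinks their relative quantization error; at inference the inverse scale is folded into the preceding operator, so no mixed-precision storage is needed.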
Overall Grade: B+ (Good)
Adoption: A · Quality: A+ · Freshness: A · Citations: B+ · Engagement: F

Specifications

License: MIT
Pricing: open-source
Capabilities: weight-quantization, activation-aware-compression, 4-bit-inference, hardware-efficient
Integrations: autoawq, huggingface, vllm, llm-compressor (see the AutoAWQ sketch below)
Use Cases: model-compression, edge-deployment, efficient-llm-serving
API Available: No
Tags: awq, quantization, activation-aware, weight-quantization, efficiency
Added: 2026-03-17
Completeness: 100%
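Per the Integrations row, the autoawq package can quantize a Hugging Face checkpoint in a few lines. A minimal sketch, assuming the AutoAWQ quickstart API (`AutoAWQForCausalLM.from_pretrained`, `.quantize`, `.save_quantized`) and a hypothetical model choice; consult the autoawq README for the current signatures and config keys:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "facebook/opt-1.3b"  # hypothetical example model
quant_config = {"zero_point": True, "q_group_size": 128,
                "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Run AWQ calibration + quantization, then save the 4-bit checkpoint
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("opt-1.3b-awq")
tokenizer.save_pretrained("opt-1.3b-awq")
```

The resulting checkpoint can then be served by runtimes that support the AWQ format, such as vLLM.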

Index Score

Overall: 73.3
Adoption: 88 · Quality: 93 · Freshness: 81 · Citations: 78 · Engagement: 0
