PaperLLMs v1.0

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

by Institute of Science and Technology Austria (IST Austria) · open-source · Last verified 2026-03-17

The paper presents GPTQ, a one-shot weight quantization method based on approximate second-order information that can quantize GPT models with 175 billion parameters to 4-bit or 3-bit precision in roughly four GPU-hours with negligible accuracy loss. GPTQ made large-model inference practical on consumer hardware.
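To make the idea concrete, here is a minimal sketch of GPTQ-style column-by-column quantization: the layer Hessian is approximated from calibration inputs, each weight column is rounded to the quantized grid, and the rounding error is propagated onto the not-yet-quantized columns via the inverse Hessian. The function names, the single global scale, and the symmetric 4-bit grid are simplifications for illustration, not the paper's exact implementation.

```python
import numpy as np

def quantize_rtn(x, scale):
    # Round-to-nearest onto a symmetric signed 4-bit grid (assumption: symmetric quantization).
    return np.clip(np.round(x / scale), -8, 7) * scale

def gptq_quantize(W, X, bits=4, damp=0.01):
    """GPTQ-style sketch. W: (rows, d) weight matrix; X: (d, n) calibration inputs.
    Quantizes columns left to right, compensating the remaining columns
    using the inverse of the layer-wise Hessian proxy H = 2 X X^T."""
    d = W.shape[1]
    H = 2.0 * X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(d)   # dampening for numerical stability
    Hinv = np.linalg.inv(H)
    W = W.copy()
    Q = np.zeros_like(W)
    # One scale for the whole matrix (simplification; real code uses per-group scales).
    scale = np.max(np.abs(W)) / (2 ** (bits - 1) - 1)
    for q in range(d):
        w = W[:, q]
        Q[:, q] = quantize_rtn(w, scale)
        err = (w - Q[:, q]) / Hinv[q, q]
        # Spread this column's quantization error onto the unquantized columns.
        W[:, q + 1:] -= np.outer(err, Hinv[q, q + 1:])
    return Q
```

The error-compensation update is what distinguishes GPTQ from plain round-to-nearest: later columns absorb earlier rounding errors, which is why accuracy survives at 3-4 bits.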

https://arxiv.org/abs/2210.17323
Overall: B+ (Good)
Adoption: A+ · Quality: A+ · Freshness: B+ · Citations: A · Engagement: F

Specifications

License
Apache 2.0
Pricing
open-source
Capabilities
model-quantization, weight-compression, inference-efficiency, 4-bit-inference
Integrations
auto-gptq, huggingface, llama-cpp
Use Cases
consumer-hardware-inference, model-compression, edge-deployment
API Available
No
Tags
gptq, quantization, post-training-quantization, 4-bit, efficiency
Added
2026-03-17
Completeness
100%

Index Score

74.4
Adoption
90
Quality
92
Freshness
73
Citations
80
Engagement
0
