PaperLLMs v1.0

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

by Institute of Science and Technology Austria (IST Austria) · open-source · Last verified 2026-03-17

The paper presents GPTQ, a one-shot weight quantization method based on approximate second-order information that can quantize GPT models with 175 billion parameters to 4-bit or 3-bit precision in roughly four GPU-hours with negligible accuracy loss. GPTQ made large-model inference practical on consumer hardware.
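To make the idea concrete, here is a minimal sketch of GPTQ-style column-by-column quantization: the layer Hessian is approximated from calibration inputs, each weight column is rounded to the quantized grid, and the rounding error is propagated onto the not-yet-quantized columns via the inverse Hessian. The function names, the single global scale, and the symmetric 4-bit grid are simplifications for illustration, not the paper's exact implementation.

```python
import numpy as np

def quantize_rtn(x, scale):
    # Round-to-nearest onto a symmetric signed 4-bit grid (assumption: symmetric quantization).
    return np.clip(np.round(x / scale), -8, 7) * scale

def gptq_quantize(W, X, bits=4, damp=0.01):
    """GPTQ-style sketch. W: (rows, d) weight matrix; X: (d, n) calibration inputs.
    Quantizes columns left to right, compensating the remaining columns
    using the inverse of the layer-wise Hessian proxy H = 2 X X^T."""
    d = W.shape[1]
    H = 2.0 * X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(d)   # dampening for numerical stability
    Hinv = np.linalg.inv(H)
    W = W.copy()
    Q = np.zeros_like(W)
    # One scale for the whole matrix (simplification; real code uses per-group scales).
    scale = np.max(np.abs(W)) / (2 ** (bits - 1) - 1)
    for q in range(d):
        w = W[:, q]
        Q[:, q] = quantize_rtn(w, scale)
        err = (w - Q[:, q]) / Hinv[q, q]
        # Spread this column's quantization error onto the unquantized columns.
        W[:, q + 1:] -= np.outer(err, Hinv[q, q + 1:])
    return Q
```

The error-compensation update is what distinguishes GPTQ from plain round-to-nearest: later columns absorb earlier rounding errors, which is why accuracy survives at 3-4 bits.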

https://arxiv.org/abs/2210.17323
Overall: B+ (Good)
Adoption: A+ · Quality: A+ · Freshness: B+ · Citations: A · Engagement: F

Specifications

License
Apache 2.0
Pricing
open-source
Capabilities
model-quantization, weight-compression, inference-efficiency, 4-bit-inference
Integrations
auto-gptq, huggingface, llama-cpp
Use Cases
consumer-hardware-inference, model-compression, edge-deployment
API Available
No
Tags
gptq, quantization, post-training-quantization, 4-bit, efficiency
Added
2026-03-17
Completeness
100%

Index Score

74.4
Adoption
90
Quality
92
Freshness
73
Citations
80
Engagement
0
