Model Quantization (GPTQ)
by AaaS · open-source · Last verified 2026-03-01
Quantizes language models using GPTQ for efficient inference on consumer hardware. Performs calibration-based quantization, quality evaluation against the original model, and exports in formats compatible with vLLM, llama.cpp, and other inference engines.
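At the weight level, GPTQ-style quantization stores each group of weights as low-bit integers plus a per-group scale factor, which is where the memory savings come from. A minimal pure-Python sketch of that storage idea follows; it is illustrative only, showing simple round-to-nearest per-group quantization rather than the script's implementation or GPTQ's actual error-compensating algorithm:

```python
def quantize_group(weights, bits=4):
    # Symmetric min-max quantization of one weight group to signed integers.
    # For 4-bit, quantized values land in [-7, 7].
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0  # guard all-zero groups
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    # Reconstruct approximate float weights from integers + scale.
    return [v * scale for v in q]

# Toy weight group (illustrative values, not from a real model)
weights = [0.12, -0.07, 0.33, -0.21]
q, scale = quantize_group(weights)
restored = dequantize_group(q, scale)
err = max(abs(a - b) for a, b in zip(weights, restored))
```

GPTQ improves on this naive rounding by quantizing columns one at a time and updating the remaining weights to compensate for the error just introduced, which is why it needs the calibration data the script collects.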
https://aaas.blog/script/model-quantization-gptq

Overall grade: C+ (Average)
Adoption: C+ · Quality: A · Freshness: B+ · Citations: C+ · Engagement: F

Specifications
- License: MIT
- Pricing: open-source
- Capabilities: gptq-quantization, calibration, quality-evaluation, format-export, benchmarking
- Integrations: auto-gptq, transformers, datasets, torch
- Use Cases: model-compression, edge-deployment, cost-reduction, consumer-gpu-inference
- API Available: No
- Language: python
- Dependencies: auto-gptq, transformers, datasets, torch, safetensors
- Environment: Python 3.11+ with CUDA 12 and 16GB+ VRAM
- Est. Runtime: 30-120 minutes depending on model size
- Tags: script, automation, quantization, gptq, optimization
- Added: 2026-03-17
- Completeness: 100%
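The quality-evaluation step typically compares the quantized model against the original by perplexity on a held-out text set. A hedged sketch of that comparison, given per-token negative log-likelihoods (the NLL values and the `perplexity` helper here are illustrative, not taken from the script):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean per-token negative log-likelihood)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Illustrative NLLs; in practice these come from running the original and
# quantized models over the same evaluation corpus.
base_ppl = perplexity([2.10, 1.85, 2.40, 2.05])
quant_ppl = perplexity([2.18, 1.92, 2.47, 2.11])

# Small positive degradation is expected after 4-bit quantization.
relative_degradation = (quant_ppl - base_ppl) / base_ppl
```

A modest relative perplexity increase (a few percent) is the usual signal that quantization preserved model quality; large jumps suggest the calibration set or group size needs revisiting.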
Index Score: 52.2
- Adoption: 58
- Quality: 80
- Freshness: 78
- Citations: 52
- Engagement: 0