llama.cpp
by Georgi Gerganov · open-source · Last verified 2026-03-17
C/C++ implementation of LLM inference with minimal dependencies and broad hardware support. Uses the GGUF model format and quantization to enable efficient inference on CPUs and GPUs, including consumer hardware.
https://github.com/ggerganov/llama.cpp
Overall grade: B+ (Good)
Adoption: A · Quality: A · Freshness: A+ · Citations: A · Engagement: F
Specifications
- License: MIT
- Pricing: open-source
- Capabilities: cpu-inference, gpu-inference, quantization, gguf-format, server-mode (see the sketches after this list)
- Integrations: ollama, lm-studio, langchain
- Use Cases: edge-deployment, local-inference, embedded-ai, resource-constrained-inference
- API Available: Yes
- SDK Languages: cpp, python (Python usage sketched below)
- Deployment: self-hosted, embedded
- Rate Limits: N/A (local, hardware-limited)
- Data Privacy: Fully local; no data sent externally
- Tags: inference, cpp, quantization, local, gguf
- Added: 2026-03-17
- Completeness: 100%
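The Python entry in the SDK list most likely refers to the community llama-cpp-python binding rather than the C/C++ library itself; a minimal sketch of local GGUF inference with that binding follows. The model path, context size, and GPU offload values are illustrative assumptions, not values taken from this listing.

```python
# Minimal sketch: local inference on a quantized GGUF model via llama-cpp-python
# (pip install llama-cpp-python). Paths and parameters below are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.Q4_K_M.gguf",  # any quantized GGUF file on disk
    n_ctx=4096,        # context window size
    n_gpu_layers=-1,   # offload all layers to GPU if available; 0 keeps inference on CPU
)

out = llm(
    "Q: What is GGUF?\nA:",
    max_tokens=64,
    stop=["\n"],
)
print(out["choices"][0]["text"])
```

Nothing leaves the machine: the model file is read locally and generation runs on local CPU/GPU, which is what the "Fully local; no data sent externally" entry above refers to.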
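Server mode exposes the locally loaded model over an OpenAI-compatible HTTP API (llama-server). The sketch below assumes a server already running on localhost, e.g. started with `llama-server -m ./models/model.Q4_K_M.gguf --port 8080`; the file name, port, and sampling parameters are placeholders.

```python
# Minimal sketch: querying llama.cpp server mode through its
# OpenAI-compatible chat completions endpoint. Assumes a local server on port 8080.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local",  # placeholder; a single-model server serves whatever model it loaded
        "messages": [
            {"role": "user", "content": "Summarize GGUF quantization in one sentence."}
        ],
        "max_tokens": 128,
        "temperature": 0.7,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Because the endpoint mirrors the OpenAI schema, existing OpenAI-style clients (including the langchain integration listed above) can usually be pointed at the local server by changing the base URL.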
Index Score
- Overall: 72.1
- Adoption: 85
- Quality: 88
- Freshness: 92
- Citations: 82
- Engagement: 0