
llama.cpp

by Georgi Gerganov · open-source · Last verified 2026-03-17

A C/C++ implementation of LLM inference with minimal dependencies and broad hardware support. It delivers efficient inference on CPUs and GPUs, using the GGUF quantization format to fit models onto consumer hardware.

https://github.com/ggerganov/llama.cpp
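
A minimal local-inference sketch using the llama-cpp-python binding (one of the SDK languages listed below); the model path, context size, and generation settings are illustrative assumptions, and the GGUF file would typically be a quantized build:

```python
# Minimal sketch: local inference via the llama-cpp-python binding.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model.Q4_K_M.gguf",  # hypothetical quantized GGUF file
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm(
    "Q: What is the GGUF format used for? A:",
    max_tokens=64,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```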
Overall: B+ (Good)
Adoption: A · Quality: A · Freshness: A+ · Citations: A · Engagement: F

Specifications

License
MIT
Pricing
open-source
Capabilities
cpu-inference, gpu-inference, quantization, gguf-format, server-mode (see the server sketch after this list)
Integrations
ollama, lm-studio, langchain
Use Cases
edge-deployment, local-inference, embedded-ai, resource-constrained-inference
API Available
Yes
SDK Languages
cpp, python
Deployment
self-hosted, embedded
Rate Limits
N/A (local, hardware-limited)
Data Privacy
Fully local; no data sent externally
Tags
inference, cpp, quantization, local, gguf
Added
2026-03-17
Completeness
100%
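
The server-mode capability and API listed above refer to llama.cpp's llama-server binary, which exposes an OpenAI-compatible HTTP endpoint. A hedged client-side sketch, assuming a server is already running on the default localhost:8080:

```python
# Sketch of querying llama.cpp's server mode. Assumes a server was started
# separately, e.g.: llama-server -m model.Q4_K_M.gguf --port 8080
# The server exposes an OpenAI-compatible chat completions endpoint.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # default host/port assumed
    json={
        "messages": [
            {"role": "user", "content": "Summarize the GGUF format in one sentence."}
        ],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, existing OpenAI client libraries can also be pointed at the local server by overriding the base URL, which keeps everything fully local as noted under Data Privacy.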

Index Score

72.1
Adoption
85
Quality
88
Freshness
92
Citations
82
Engagement
0
