Integration · AI Infrastructure · v1.30

Cerebras + LiteLLM

by LiteLLM · free · Last verified 2026-03-17

LiteLLM proxy integration for Cerebras Inference, exposing the throughput of Cerebras's wafer-scale chips through a unified OpenAI-compatible gateway. It lets developers route requests from any existing OpenAI SDK integration to Cerebras CS-3 hardware, which delivers over 2,000 tokens/second on Llama 3.1 70B, using LiteLLM's model aliases.
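The key point above is OpenAI compatibility: a client's request shape does not change when it is pointed at the LiteLLM proxy instead of OpenAI. A minimal sketch of that idea (the endpoint URLs and the `cerebras-llama-70b` alias are illustrative assumptions, not names taken from this listing):

```python
import json

def chat_request(base_url: str, model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible /chat/completions request payload."""
    return {
        "url": f"{base_url}/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

# Direct-to-OpenAI and via-LiteLLM requests share an identical shape;
# switching providers is only a change of base URL and model alias.
direct = chat_request("https://api.openai.com/v1", "gpt-4o", "hi")
via_proxy = chat_request("http://localhost:4000/v1", "cerebras-llama-70b", "hi")
assert direct["body"].keys() == via_proxy["body"].keys()
print(json.dumps(via_proxy, indent=2))
```

Because the payload is identical, existing OpenAI SDK code needs no structural changes to run against Cerebras through the proxy.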

https://docs.litellm.ai/docs/providers/cerebras
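Model-alias routing and fallback routing are configured in the proxy's config file. A hedged sketch following LiteLLM's documented `model_list` convention; the alias, fallback target, and environment-variable name here are illustrative assumptions:

```yaml
model_list:
  - model_name: cerebras-llama-70b        # alias that clients request
    litellm_params:
      model: cerebras/llama3.1-70b        # provider-prefixed model ID
      api_key: os.environ/CEREBRAS_API_KEY

litellm_settings:
  # fallback-routing: retry on another deployment if Cerebras errors
  fallbacks:
    - cerebras-llama-70b: ["gpt-4o"]
```

A fallback target such as `gpt-4o` would need its own `model_list` entry with valid credentials for the fallback to actually fire.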
Grade: D (Poor)
Adoption: D · Quality: A · Freshness: A+ · Citations: D · Engagement: F

Specifications

License
MIT
Pricing
free
Capabilities
model-routing, openai-compatible-proxy, wafer-scale-inference, high-throughput, fallback-routing
Integrations
cerebras, litellm
Use Cases
high-throughput-inference, model-gateway, fastest-llm-inference, batch-summarization
API Available
Yes
Tags
cerebras, litellm, wafer-scale, fast-inference, model-gateway
Added
2026-03-17
Completeness
100%

Index Score

37.8
Adoption
35
Quality
84
Freshness
92
Citations
28
Engagement
0
