Integration · AI Infrastructure · v1.30

Cerebras + LiteLLM

by LiteLLM · free · Last verified 2026-03-17

LiteLLM proxy integration for Cerebras Inference, exposing the throughput of Cerebras's wafer-scale chips through a unified OpenAI-compatible gateway. It lets developers route requests from any existing OpenAI SDK integration to Cerebras CS-3 hardware, which delivers over 2,000 tokens/second on Llama 3.1 70B, using LiteLLM's model aliases.
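The key point above is OpenAI compatibility: a client's request shape does not change when it is pointed at the LiteLLM proxy instead of OpenAI. A minimal sketch of that idea (the endpoint URLs and the `cerebras-llama-70b` alias are illustrative assumptions, not names taken from this listing):

```python
import json

def chat_request(base_url: str, model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible /chat/completions request payload."""
    return {
        "url": f"{base_url}/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

# Direct-to-OpenAI and via-LiteLLM requests share an identical shape;
# switching providers is only a change of base URL and model alias.
direct = chat_request("https://api.openai.com/v1", "gpt-4o", "hi")
via_proxy = chat_request("http://localhost:4000/v1", "cerebras-llama-70b", "hi")
assert direct["body"].keys() == via_proxy["body"].keys()
print(json.dumps(via_proxy, indent=2))
```

Because the payload is identical, existing OpenAI SDK code needs no structural changes to run against Cerebras through the proxy.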

https://docs.litellm.ai/docs/providers/cerebras
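Model-alias routing and fallback routing are configured in the proxy's config file. A hedged sketch following LiteLLM's documented `model_list` convention; the alias, fallback target, and environment-variable name here are illustrative assumptions:

```yaml
model_list:
  - model_name: cerebras-llama-70b        # alias that clients request
    litellm_params:
      model: cerebras/llama3.1-70b        # provider-prefixed model ID
      api_key: os.environ/CEREBRAS_API_KEY

litellm_settings:
  # fallback-routing: retry on another deployment if Cerebras errors
  fallbacks:
    - cerebras-llama-70b: ["gpt-4o"]
```

A fallback target such as `gpt-4o` would need its own `model_list` entry with valid credentials for the fallback to actually fire.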
Grade: D (Poor)
Adoption: D · Quality: A · Freshness: A+ · Citations: D · Engagement: F

Specifications

License
MIT
Pricing
free
Capabilities
model-routing, openai-compatible-proxy, wafer-scale-inference, high-throughput, fallback-routing
Integrations
cerebras, litellm
Use Cases
high-throughput-inference, model-gateway, fastest-llm-inference, batch-summarization
API Available
Yes
Tags
cerebras, litellm, wafer-scale, fast-inference, model-gateway
Added
2026-03-17
Completeness
100%

Index Score

37.8
Adoption
35
Quality
84
Freshness
92
Citations
28
Engagement
0
