Breakthrough Serverless Compute for AI, at Scale.
Powering Trillions of Tokens per Month, Chutes is the leading open-source, decentralized compute provider for deploying, scaling and running open-source models in production.

SOTA Open-Source LLMs, Available here first.
Our team works around the clock to provide the latest SOTA open-source models minutes after release. When a new model is released, Chutes is where you'll get it first.
View Top LLMs
How to use
Python
import os
import requests

# Request body: model ID, chat messages, and streaming enabled.
payload = {
    "model": "moonshotai/Kimi-K2.6-TEE",
    "messages": [{"role": "user", "content": "Ship a concise launch checklist."}],
    "stream": True,
}
response = requests.post(
    "https://llm.chutes.ai/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {os.environ['CHUTES_API_KEY']}",
        "Content-Type": "application/json",
    },
    json=payload,
    stream=True,
)
# Fail early on HTTP errors instead of iterating over an error body.
response.raise_for_status()
# Print the payload of each server-sent "data: " line as it arrives.
for line in response.iter_lines():
    if line and line.startswith(b"data: "):
        print(line.decode()[6:])

There's a Chute for Everything
Not just the LLMs you'd expect — Chutes runs image, video, speech, music and more. Every open-source modality, always on and ready to scale.
Made for hyperscaling AI-powered products
High-performance inference for SOTA open-source models, ephemeral jobs, batch processing jobs, and much more. With Chutes, just bring the code and let us do the rest.
Get Started in seconds
Purpose-built for AI Developers

Designed to be flexible, but fast

Decentralized, Open-source Infrastructure
AI Model Inference
Permanently Hot Models, Stable, Ready for Scale.
TEE/Secure Compute
Secure, Private, and Isolated Compute.
Consumer Apps
Chutes Chat and Chutes Search for Consumers.
Pricing
Choose a plan that fits your needs.
Plus
- 5X the value of pay-as-you-go
- 6% off PAYG pricing
- PAYG requests beyond limit
Pro
- 5X the value of pay-as-you-go
- 10% off PAYG pricing
- PAYG requests beyond limit
Enterprise
Custom billing only
- Volume discounts
- Custom rate limits
- Dedicated support
Don't want a subscription? With pay-as-you-go you only pay for the LLM tokens you actually consume — no monthly commitment. See per-token model rates, a live cost estimator, and private GPU pricing on the pricing page.
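As a rough illustration of how the listed discounts (6% on Plus, 10% on Pro) would change a pay-as-you-go bill, here is a minimal sketch. The per-million-token rate is a hypothetical placeholder, not a real Chutes price, and the function name `discounted_cost` is made up for this example. It also assumes the discount is applied as a simple percentage off the PAYG cost; check the pricing page for how it is actually billed.

```python
from decimal import Decimal

# Discount fractions taken from the plan list above (PAYG itself has none).
PLAN_DISCOUNT = {
    "payg": Decimal("0"),
    "plus": Decimal("0.06"),
    "pro": Decimal("0.10"),
}


def discounted_cost(tokens: int, rate_per_million: Decimal, plan: str) -> Decimal:
    """Cost of `tokens` at a PAYG rate after the plan's percentage discount.

    ASSUMPTION: the discount is a flat percentage off the PAYG cost.
    """
    base = Decimal(tokens) / Decimal(1_000_000) * rate_per_million
    return (base * (Decimal("1") - PLAN_DISCOUNT[plan])).quantize(Decimal("0.0001"))


# HYPOTHETICAL rate of 1.00 per million tokens, for illustration only.
example_rate = Decimal("1.00")
for plan in PLAN_DISCOUNT:
    print(plan, discounted_cost(5_000_000, example_rate, plan))
```

Using `Decimal` rather than floats keeps the currency arithmetic exact, which matters once small per-token rates are multiplied across large volumes.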
Explore pay-as-you-go pricing
Top Models Available Always On, Ready to Scale.
Bring your own code. On Chutes, you can run any model without worrying about cold starts or capacity constraints.

Qwen3 32B
Qwen/Qwen3-32B-TEE
No description yet.

Gemma 4 31B turbo
google/gemma-4-31B-turbo-TEE
No description yet.

Qwen3.5 397B A17B
Qwen/Qwen3.5-397B-A17B-TEE
No description yet.

GLM 5.1
zai-org/GLM-5.1-TEE
GLM-5.1 is a large language model optimized for agentic tasks and coding that excels at sustained problem-solving over long horizons through iterative reasoning and tool use.

Qwen3.6 27B
Qwen/Qwen3.6-27B-TEE
No description yet.

DeepSeek V3.2
deepseek-ai/DeepSeek-V3.2-TEE
DeepSeek-V3.2 is an open-source LLM optimized for efficient reasoning and agent tasks through sparse attention and reinforcement learning, useful for complex problem-solving and tool-use applications.

GLM 5
zai-org/GLM-5-TEE
No description yet.

Kimi K2.6
moonshotai/Kimi-K2.6-TEE
No description yet.

MiniMax M2.5
MiniMaxAI/MiniMax-M2.5-TEE
MiniMax-M2.5 is a frontier-class LLM excelling at coding, agentic tool use, and office automation tasks, with state-of-the-art performance on benchmarks like SWE-Bench while being dramatically more cost-effective than competitors.

GLM 5.2
zai-org/GLM-5.2-TEE
No description yet.
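The "How to use" snippet prints each raw `data:` payload from the stream. To turn those payloads into plain text for any of the model IDs listed above, something like the following sketch may help. It assumes the stream uses the OpenAI-style chunk layout (`choices[0].delta.content`) and ends with a `[DONE]` sentinel, which the `/v1/chat/completions` path suggests but this page does not confirm. The helper name `iter_stream_text` and the sample lines are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from server-sent-event lines.

    ASSUMPTION: each "data: " line carries a JSON chunk shaped like
    {"choices": [{"delta": {"content": "..."}}]}, and the stream ends
    with "data: [DONE]".
    """
    prefix = b"data: "
    for raw in lines:
        if not raw or not raw.startswith(prefix):
            continue  # skip blank separators and comment/keep-alive lines
        data = raw[len(prefix):].decode("utf-8")
        if data.strip() == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text


# Hand-written sample lines standing in for response.iter_lines().
sample = [
    b'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "lo"}}]}',
    b"data: [DONE]",
]
print("".join(iter_stream_text(sample)))
```

With a live request, `response.iter_lines()` from the snippet can be passed in place of `sample`, and the `"model"` field in the payload can be set to whichever identifier is needed.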