Breakthrough Serverless Compute for AI, at Scale.

Powering trillions of tokens per month, Chutes is the leading open-source, decentralized compute provider for deploying, scaling, and running open-source models in production.


Chutes Global

SOTA Open-Source LLMs, Available here first.

Our team works around the clock to make the latest SOTA open-source models available within minutes of release. When a new model is released, Chutes is where you'll find it. Get access to what's next here first.

Just landed: Kimi K2.6, GLM 5.2, Gemma 4 31B


How to use

Python

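import os
import requests

# Chat request; "stream": True asks the server to send the reply incrementally.
payload = {
    "model": "moonshotai/Kimi-K2.6-TEE",
    "messages": [{"role": "user", "content": "Ship a concise launch checklist."}],
    "stream": True,
}

response = requests.post(
    "https://llm.chutes.ai/v1/chat/completions",
    headers={
        # The API key is read from the CHUTES_API_KEY environment variable.
        "Authorization": f"Bearer {os.environ['CHUTES_API_KEY']}",
        "Content-Type": "application/json",
    },
    json=payload,
    stream=True,  # read the body as it arrives instead of buffering it
    timeout=60,   # avoid hanging forever if the server stops responding
)
response.raise_for_status()  # fail fast on 4xx/5xx responses

# Print the payload of each server-sent "data: " line.
for line in response.iter_lines():
    if line and line.startswith(b"data: "):
        print(line.decode()[6:])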
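
The loop above prints each raw "data: " payload. If the endpoint follows the OpenAI-style streaming chunk format (an assumption suggested by the /v1/chat/completions path, not something this page confirms), the generated text could be pulled out of each chunk roughly as sketched below. The helper name iter_text_deltas and the sample lines are hypothetical and exist only for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from server-sent-event lines.

    Assumes OpenAI-style chunks: each "data: " line holds JSON with
    choices[i].delta.content, and the stream ends with "data: [DONE]".
    """
    prefix = b"data: "
    for line in lines:
        if not line or not line.startswith(prefix):
            continue  # skip blank keep-alive lines and non-data fields
        data = line[len(prefix):].decode("utf-8").strip()
        if data == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hypothetical sample lines standing in for response.iter_lines()
sample = [
    b'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    b'data: {"choices": [{"delta": {"content": ", world"}}]}',
    b"data: [DONE]",
]
print("".join(iter_text_deltas(sample)))
```

With a live request, iter_text_deltas(response.iter_lines()) would take the place of the sample list, provided the chunk format assumption holds.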

There's a Chute for Everything

Not just the LLMs you'd expect — Chutes runs image, video, speech, music and more. Every open-source modality, always on and ready to scale.


Chutes Serverless Compute

Made for hyperscaling AI-powered products

High-performance inference for top SOTA open-source models, ephemeral jobs, batch processing jobs, and much more. With Chutes, just bring the code and let us do the rest.

Get Started in seconds

Purpose-built for AI Developers

Designed to be flexible, but fast

Decentralized, Open-source Infrastructure

AI Model Inference

Permanently Hot Models, Stable, Ready for Scale.

TEE/Secure Compute

Secure, Private, and Isolated Compute.

Consumer Apps

Chutes Chat and Chutes Search for Consumers.

Embedding Models

Embed your data and use it for search, recommendation, and more.

Content Moderation

Hate speech detection, NSFW moderation, and more.

Your Model

Bring your own code. On Chutes, you can run any open model with just a few lines of code.
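
The Embedding Models card above mentions using embeddings for search and recommendation. As a rough illustration of that idea, the sketch below ranks documents by cosine similarity to a query vector. It does not call any Chutes API: the three-dimensional vectors, the document names, and the function names are all made up for the example, and real embeddings would come from whichever embedding model is deployed.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: List[float], doc_vecs: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors invented for illustration; real ones have hundreds of dimensions.
docs = {
    "pricing-faq": [0.9, 0.1, 0.0],
    "gpu-guide": [0.1, 0.9, 0.2],
    "release-notes": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

for doc_id, score in rank_documents(query, docs):
    print(f"{doc_id}: {score:.3f}")
```

A linear scan like this is fine for small collections; larger ones would typically use a vector index instead.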

Pricing

Choose a plan that fits your needs.

Plus

$10 per month

5X the value of pay-as-you-go
6% off PAYG pricing
PAYG requests beyond limit


Best Value

Pro

$20 per month

5X the value of pay-as-you-go
10% off PAYG pricing
PAYG requests beyond limit


Enterprise

Custom billing only

Volume discounts
Custom rate limits
Dedicated support


Don't want a subscription? With pay-as-you-go you only pay for the LLM tokens you actually consume — no monthly commitment. See per-token model rates, a live cost estimator, and private GPU pricing on the pricing page.


Chutes Explore

Top Models Available, Always On, Ready to Scale.

Bring your own code. On Chutes, you can run any model without worrying about cold starts or capacity constraints.

Qwen3 32B

Qwen/Qwen3-32B-TEE

Small and fast model with great intelligence per dollar

Tags: chutes, TEE, Hot, LLM

Pricing: $0.10 in / $0.42 out per 1M tokens

Qwen3.5 397B A17B

Qwen/Qwen3.5-397B-A17B-TEE

No description yet.

Tags: chutes, TEE, Hot, LLM

Pricing: $0.45 in / $3.00 out per 1M tokens

Gemma 4 31B turbo

google/gemma-4-31B-turbo-TEE

No description yet.

Tags: chutes, TEE, Hot, LLM

Pricing: $0.12 in / $0.37 out per 1M tokens

DeepSeek V3.2

deepseek-ai/DeepSeek-V3.2-TEE

DeepSeek-V3.2 is an open-source LLM optimized for efficient reasoning and agent tasks through sparse attention and reinforcement learning, useful for complex problem-solving and tool-use applications.

Tags: chutes, TEE, Hot, LLM

Pricing: $1.00 in / $1.00 out per 1M tokens

GLM 5.1

zai-org/GLM-5.1-TEE

GLM-5.1 is a large language model optimized for agentic tasks and coding that excels at sustained problem-solving over long horizons through iterative reasoning and tool use.

Tags: chutes, TEE, Hot, LLM

Pricing: $0.98 in / $3.08 out per 1M tokens

GLM 5.2

zai-org/GLM-5.2-TEE

No description yet.

Tags: chutes, TEE, Hot, LLM

Pricing: $1.25 in / $3.95 out per 1M tokens

Qwen3.6 27B

Qwen/Qwen3.6-27B-TEE

Small and fast model with great intelligence per dollar

Tags: chutes, TEE, Hot, LLM

Pricing: $0.30 in / $2.00 out per 1M tokens

Kimi K2.6

moonshotai/Kimi-K2.6-TEE

No description yet.

Tags: chutes, TEE, Hot, LLM

Pricing: $0.66 in / $3.50 out per 1M tokens

Qwen3 235B A22B Thinking 2507

Qwen/Qwen3-235B-A22B-Thinking-2507-TEE

No description yet.

Tags: chutes, TEE, Hot, LLM

Pricing: $0.30 in / $1.20 out per 1M tokens

Mistral Nemo Instruct 2407

unsloth/Mistral-Nemo-Instruct-2407-TEE

No description yet.

Tags: chutes, TEE, Hot, LLM

Pricing: $0.02 in / $0.10 out per 1M tokens
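
To make the listed rates concrete, here is a minimal cost-estimate sketch. It assumes the "/M" in each listing means per one million tokens, and it models a plan discount (such as the 6% or 10% off PAYG pricing mentioned under Pricing) as a flat percentage off the token cost. Both are assumptions, and the function name estimate_cost is hypothetical. The pricing page is the authoritative source for actual billing.

```python
from decimal import Decimal

# (input, output) USD rates copied from the listings above,
# assumed to be per 1,000,000 tokens.
RATES = {
    "Qwen/Qwen3-32B-TEE": (Decimal("0.10"), Decimal("0.42")),
    "moonshotai/Kimi-K2.6-TEE": (Decimal("0.66"), Decimal("3.50")),
    "unsloth/Mistral-Nemo-Instruct-2407-TEE": (Decimal("0.02"), Decimal("0.10")),
}
MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  discount_pct: int = 0) -> Decimal:
    """Estimate the USD cost of a workload at the listed per-token rates.

    discount_pct is an assumed flat percentage off the pay-as-you-go cost.
    """
    rate_in, rate_out = RATES[model]
    base = (Decimal(input_tokens) * rate_in
            + Decimal(output_tokens) * rate_out) / MILLION
    return base * (Decimal(100) - Decimal(discount_pct)) / Decimal(100)


# Example workload: 2M input tokens and 500k output tokens.
for name in RATES:
    cost = estimate_cost(name, 2_000_000, 500_000)
    print(f"{name}: ${cost:.4f}")
```

Decimal is used rather than float so that small per-token amounts add up without rounding drift.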


AI Compute for Everyone.

