Agent Hub
Your agent already speaks our API.
Chutes serves leading open-source AI models from Z.ai, DeepSeek, Moonshot AI, MiniMax AI, Google, and many more through both an OpenAI-compatible and an Anthropic-compatible endpoint. Point any agent at it, send the key as Bearer auth, and choose models from the live catalog instead of baking IDs into your release.
Why agent builders pick Chutes
Zero integration cost
Chutes follows the OpenAI API shape for chat completions, streaming, tool calling, JSON mode, structured outputs, and vision-capable models. Frameworks that accept a base URL can keep their existing client code.
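A minimal sketch of what "keep their existing client code" can look like: the request body below uses the standard OpenAI tool-calling fields (tools, tool_choice, stream). The get_weather tool and its schema are hypothetical, made up for illustration, and the model ID is one row from the catalog table on this page.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling shape.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool name, for illustration only
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_call_payload(model: str, prompt: str, stream: bool = False) -> dict:
    """Build an OpenAI-shaped chat completion body with one tool attached."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "tool_choice": "auto",
        "stream": stream,
    }


payload = build_tool_call_payload("Qwen/Qwen3-32B-TEE", "What is the weather in Lisbon?")
print(json.dumps(payload, indent=2))
```

Any client that already posts this body to an OpenAI endpoint should only need a different base URL and key. Check the feature flags in the catalog before assuming a given model accepts tools.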
Privacy you can verify
The 13 models returned by the catalog all report confidential_compute=true as of Jun 25, 2026. For sensitive work, filter on that boolean and use attestation evidence before making stronger claims.
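Filtering on the boolean could look like the sketch below. It assumes each catalog entry is a dict with an id field and a top-level confidential_compute field; the exact position of that field in the /v1/models response is an assumption, and the two non-TEE entries are invented to show the filter working.

```python
def confidential_only(models: list[dict]) -> list[str]:
    """Return IDs of entries whose confidential_compute flag is exactly True."""
    # A missing flag is treated as "not confidential" rather than guessed.
    return [m["id"] for m in models if m.get("confidential_compute") is True]


# Stand-in catalog entries. Only the first ID comes from this page.
sample = [
    {"id": "Qwen/Qwen3-32B-TEE", "confidential_compute": True},
    {"id": "example/non-tee-model", "confidential_compute": False},  # hypothetical
    {"id": "example/flag-missing"},  # hypothetical
]

print(confidential_only(sample))
```

The filter only narrows the candidate list. It does not replace checking attestation evidence.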
Routing without new infrastructure
Pass a concrete model ID for zero setup, or pass an inline comma-separated pool and append :latency or :throughput. Saved default aliases require a one-time pool configuration in the dashboard.
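The inline routing string is plain text, so a small helper can build it. This is a sketch, not the toolkit's builder: the function name and the validation rules are assumptions, and only the comma-separated pool and the :latency and :throughput suffixes come from this page.

```python
from typing import Optional

VALID_STRATEGIES = {"latency", "throughput"}


def build_routing_model(pool: list[str], strategy: Optional[str] = None) -> str:
    """Join model IDs into an inline pool and optionally append a strategy suffix."""
    ids = [model_id.strip() for model_id in pool if model_id.strip()]
    if not ids:
        raise ValueError("pool must contain at least one model ID")
    joined = ",".join(ids)
    if strategy is None:
        return joined  # plain list: sequential failover
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return f"{joined}:{strategy}"


POOL = [
    "Qwen/Qwen3-32B-TEE",
    "google/gemma-4-31B-turbo-TEE",
    "MiniMaxAI/MiniMax-M2.5-TEE",
]
print(build_routing_model(POOL, "latency"))
# -> Qwen/Qwen3-32B-TEE,google/gemma-4-31B-turbo-TEE,MiniMaxAI/MiniMax-M2.5-TEE:latency
```

Pass the returned string as the model field of a chat completion request.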
From zero to first completion
Start with the same three facts every OpenAI-compatible agent needs: the Chutes base URL, Bearer auth, and a current model ID from GET /v1/models.
Get a key
Create a cpk_ key at chutes.ai/auth/start and keep it in your secret store.
Read the catalog
GET /v1/models returns the model IDs, prices, context, modalities, and feature flags.
Call completions
Use https://llm.chutes.ai/v1 and send Authorization: Bearer $CHUTES_API_KEY.
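The three steps above can be sketched in Python with only the standard library. The sketch builds the request without sending it. It assumes the catalog follows the OpenAI list shape ({"data": [{"id": ...}]}), and cpk_example is a placeholder, not a real key.

```python
import json
import urllib.request

BASE_URL = "https://llm.chutes.ai/v1"


def first_model_id(catalog: dict) -> str:
    """Pick the first model ID, assuming the OpenAI list shape {"data": [...]}."""
    return catalog["data"][0]["id"]


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with Bearer auth."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Stand-in for the GET /v1/models response; a real run would fetch it first.
catalog = {"data": [{"id": "Qwen/Qwen3-32B-TEE"}]}
req = build_chat_request("cpk_example", first_model_id(catalog), "Say hello in one sentence.")
print(req.get_method(), req.full_url)
```

To send it, pass req to urllib.request.urlopen and read the key from your secret store instead of a literal.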
# Read the public catalog first.
curl https://llm.chutes.ai/v1/models
# Use a current model ID from that response.
export CHUTES_API_KEY=cpk_...
export CHUTES_MODEL="Qwen/Qwen3-32B-TEE"
curl https://llm.chutes.ai/v1/chat/completions \
  -H "Authorization: Bearer $CHUTES_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "'"$CHUTES_MODEL"'", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'
Built for agent failure modes
One model going down should not take your agent with it. The picker selected this inline pool from the current catalog using the toolkit's agentic task preset.
model="Qwen/Qwen3-32B-TEE,google/gemma-4-31B-turbo-TEE,MiniMaxAI/MiniMax-M2.5-TEE:latency"
# Source: chutes-agent-toolkit/cookbook/python/05_routing_failover.py
"""Inline model routing: failover pool + latency strategy. [VERIFIED 2026-06-11: ran live against the paid API]
Pass several model IDs comma-separated in the `model` field:
- plain list -> sequential failover (try in order)
- list + ":latency" -> fastest first token right now
- list + ":throughput" -> most tokens/sec right now
A single concrete model ID bypasses routing entirely.
Run: CHUTES_API_KEY=cpk_... python 05_routing_failover.py
"""
import os

from openai import OpenAI

POOL = os.environ.get(
    "CHUTES_POOL",
    "MiniMaxAI/MiniMax-M2.5-TEE,deepseek-ai/DeepSeek-V3.2-TEE,zai-org/GLM-5-TEE",
)
client = OpenAI(base_url="https://llm.chutes.ai/v1", api_key=os.environ["CHUTES_API_KEY"])

# Sequential failover: if the first model is busy or down, the next one answers.
resp = client.chat.completions.create(
    model=POOL,
    messages=[{"role": "user", "content": "Which model are you? One sentence."}],
)
print("failover ->", resp.model, "|", resp.choices[0].message.content)

# Latency strategy: route to whichever pool member has the lowest TTFT right now.
resp = client.chat.completions.create(
    model=f"{POOL}:latency",
    messages=[{"role": "user", "content": "Reply with the single word: pong"}],
)
print("latency ->", resp.model, "|", resp.choices[0].message.content)
Agent lanes
Pick the lane that matches where your agent already runs. Each card keeps status labels from the toolkit, including guide-only and [BETA] boundaries.
Claude
Use Chutes from Claude skills
Turn Chutes operations into agent-callable skills while keeping endpoint facts and credential handling centralized.
Add this repo as a plugin marketplace or copy plugins/chutes-ai/skills into the active Claude skills directory.
Hermes
Run Hermes on Chutes
Use Chutes for Hermes chat, delegation, background work, and account-aware tools without waiting for a built-in provider.
Store CHUTES_API_KEY in the Hermes environment, add a providers.
Cursor, Cline, Aider, and MCP clients
Bring Chutes to editor agents with MCP
Make Chutes discoverable and callable inside local coding agents while keeping secrets outside checked-in config.
Run generate_agent_config.
OpenClaw
Run channel agents on Chutes with OpenClaw
Put open-source TEE inference behind Discord, Slack, Matrix, and other channel agents through a self-hosted gateway.
Export CHUTES_API_KEY, add a chutes provider under models.
Codex
Run Codex on Chutes
Give Codex-style coding agents open-source, TEE-backed inference through the OpenAI API shape they already understand.
Set CHUTES_API_KEY in the agent runtime, point the OpenAI-compatible base URL at https://llm.
Generic OpenAI SDK agents
Use one OpenAI-compatible endpoint for many agent stacks
Reuse existing OpenAI-compatible client code while moving inference to Chutes-hosted open-source models.
Set the SDK base URL to https://llm.
What is on the menu
Discovery is public at GET /v1/models. The table renders IDs, context, prices, modalities, and feature flags from that response and falls back to the vendored snapshot when the API is unreachable.
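The live-then-snapshot behavior can be sketched as below. This is an illustration of the fallback pattern, not the page's actual renderer: load_catalog, the injected fetch_live callable, and the snapshot contents are all assumptions.

```python
import json
from typing import Callable


def load_catalog(fetch_live: Callable[[], dict], snapshot_json: str) -> tuple[dict, str]:
    """Return (catalog, source), where source is "live" or "snapshot"."""
    try:
        return fetch_live(), "live"
    except OSError:
        # urllib.error.URLError and socket timeouts are OSError subclasses,
        # so an unreachable API lands here.
        return json.loads(snapshot_json), "snapshot"


def unreachable() -> dict:
    """Simulate a network failure so the example runs offline."""
    raise OSError("simulated network failure")


# Stand-in for a vendored snapshot file shipped with the app.
SNAPSHOT = '{"data": [{"id": "Qwen/Qwen3-32B-TEE"}]}'

catalog, source = load_catalog(unreachable, SNAPSHOT)
print(source, catalog["data"][0]["id"])
# -> snapshot Qwen/Qwen3-32B-TEE
```

In a real client, fetch_live would wrap a GET to https://llm.chutes.ai/v1/models with a short timeout. Surfacing the source label lets the UI say whether the data is live.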
Live model menu
13 models. All report confidential_compute=true, verified Jun 25, 2026.
| Model | Context | $/1M input tokens | $/1M output tokens | Features | Good at |
|---|---|---|---|---|---|
| Qwen/Qwen3-32B-TEE | 40K | $0.104 | $0.416 | JSON mode, Tool calling, Structured output, Reasoning | cost-aware tool loops |
| google/gemma-4-31B-turbo-TEE | 128K | $0.12 | $0.37 | JSON mode, Tool calling, Structured output, Reasoning | vision-capable agents |
| Qwen/Qwen3.5-397B-A17B-TEE | 256K | $0.45 | $3.00 | JSON mode, Tool calling, Structured output, Reasoning | vision-capable agents |
| zai-org/GLM-5.1-TEE | 198K | $0.98 | $3.08 | JSON mode, Structured output, Tool calling, Reasoning | long-context work |
| Qwen/Qwen3.6-27B-TEE | 256K | $0.30 | $2.00 | JSON mode, Tool calling, Structured output, Reasoning | vision-capable agents |
| deepseek-ai/DeepSeek-V3.2-TEE | 128K | $1.00 | $1.00 | JSON mode, Tool calling, Reasoning, Structured output | reasoning and chat |
| zai-org/GLM-5-TEE | 198K | $0.95 | $2.55 | JSON mode, Structured output, Tool calling, Reasoning | long-context work |
| moonshotai/Kimi-K2.6-TEE | 256K | $0.66 | $3.50 | JSON mode, Structured output, Tool calling, Reasoning | vision and video agents |
| MiniMaxAI/MiniMax-M2.5-TEE | 192K | $0.15 | $1.20 | JSON mode, Tool calling, Structured output, Reasoning | cost-aware tool loops |
| zai-org/GLM-5.2-TEE | 1.0M | $1.40 | $4.40 | JSON mode, Structured output, Tool calling, Reasoning | long-context work |
| unsloth/Mistral-Nemo-Instruct-2407-TEE | 128K | $0.0245 | $0.0978 | No advertised feature flags | general chat |
| moonshotai/Kimi-K2.5-TEE | 256K | $0.44 | $2.00 | JSON mode, Structured output, Tool calling, Reasoning | vision and video agents |
| Qwen/Qwen3-235B-A22B-Thinking-2507-TEE | 256K | $0.2989 | $1.1957 | JSON mode, Structured output, Tool calling, Reasoning | long-context work |
Let the catalog pick the pool
The picker is a TypeScript port of the toolkit's scripts/pick_model.py: same task presets, filters, blended-price sort, and routing-string builder.
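A rough Python sketch of the filter, blended-price sort, and routing-string steps. It is not the toolkit's pick_model.py: the Row type, the pick_pool signature, and the 75/25 input/output price blend are assumptions chosen for illustration. The rows are copied from the catalog table on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    model_id: str
    context_k: int      # context window in thousands of tokens, as listed
    price_in: float     # $ per 1M input tokens
    price_out: float    # $ per 1M output tokens
    tool_calling: bool


ROWS = [
    Row("Qwen/Qwen3-32B-TEE", 40, 0.104, 0.416, True),
    Row("google/gemma-4-31B-turbo-TEE", 128, 0.12, 0.37, True),
    Row("deepseek-ai/DeepSeek-V3.2-TEE", 128, 1.00, 1.00, True),
    Row("MiniMaxAI/MiniMax-M2.5-TEE", 192, 0.15, 1.20, True),
    Row("unsloth/Mistral-Nemo-Instruct-2407-TEE", 128, 0.0245, 0.0978, False),
]


def blended_price(row: Row, input_weight: float = 0.75) -> float:
    """Assumed blend: 75% input price, 25% output price (not the toolkit's formula)."""
    return input_weight * row.price_in + (1.0 - input_weight) * row.price_out


def pick_pool(rows: list[Row], min_context_k: int, need_tools: bool,
              size: int, strategy: Optional[str] = None) -> str:
    """Filter rows, sort by blended price, and build an inline routing string."""
    eligible = [
        r for r in rows
        if r.context_k >= min_context_k and (r.tool_calling or not need_tools)
    ]
    if not eligible:
        raise ValueError("no model matches the filters")
    cheapest = sorted(eligible, key=blended_price)[:size]
    pool = ",".join(r.model_id for r in cheapest)
    return f"{pool}:{strategy}" if strategy else pool


print(pick_pool(ROWS, min_context_k=100, need_tools=True, size=2, strategy="latency"))
# -> google/gemma-4-31B-turbo-TEE,MiniMaxAI/MiniMax-M2.5-TEE:latency
```

A different blend weight changes the ordering, so treat the weight as a tuning knob that should match your agent's real input-to-output token ratio.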
Pick a live model pool
Filters use the same task presets as the toolkit picker. Catalog source: live, Jun 25, 2026.
model="google/gemma-4-31B-turbo-TEE:latency"
Inline pools work with no dashboard setup. Saved aliases such as default:latency require a pool configured once under Model Routing at chutes.ai/app.
Ship your agent on Chutes.
Start with a concrete model ID from the live /v1/models response, then add routing once the workload needs failover, latency, or throughput behavior.