Agent Hub

Your agent already speaks our API.

Chutes serves leading open-source AI models from Z.ai, DeepSeek, Moonshot AI, MiniMax AI, Google, and others through both an OpenAI-compatible and an Anthropic-compatible endpoint. Point any agent at it, send your API key as a Bearer token, and choose models from the live catalog instead of hard-coding IDs into your release.

Base URL: https://llm.chutes.ai/v1
Auth: Authorization: Bearer cpk_...
Catalog: 13 models verified Jun 25, 2026
TEE flag: Every listed model reports confidential_compute=true

Why agent builders pick Chutes

Zero integration cost

Chutes follows the OpenAI API shape for chat completions, streaming, tool calling, JSON mode, structured outputs, and vision-capable models. Frameworks that accept a base URL can keep their existing client code.

Privacy you can verify

All 13 models returned by the catalog report confidential_compute=true as of Jun 25, 2026. For sensitive work, filter on that boolean and check attestation evidence before making stronger privacy claims.
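The filter described above can be sketched in a few lines of Python. This is a minimal sketch, not the toolkit's own code: it assumes the catalog response follows the OpenAI list shape (a top-level `data` array) and that each entry exposes `confidential_compute` as a top-level boolean. Both are assumptions about the payload, and `tee_model_ids` is a hypothetical helper name.

```python
from typing import Any


def tee_model_ids(catalog: dict[str, Any]) -> list[str]:
    """Return the IDs of catalog entries that report confidential_compute=true.

    Assumption: the catalog looks like
    {"data": [{"id": "...", "confidential_compute": true, ...}, ...]}.
    Entries that omit the flag are treated as not confidential.
    """
    return [
        entry["id"]
        for entry in catalog.get("data", [])
        if entry.get("confidential_compute") is True
    ]
```

Treating a missing flag as false is the conservative choice here: a model only enters the sensitive-work pool when the catalog says so explicitly.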

Routing without new infrastructure

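Pass a single concrete model ID for zero setup, or pass an inline comma-separated pool of model IDs. A plain pool fails over in order; append :latency or :throughput to route by the chosen strategy instead. Saved aliases such as default:latency require a one-time pool configuration in the dashboard.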
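The routing-string format described above is simple enough to build by hand, but a small helper keeps typos out of the `model` field. The sketch below is illustrative: `build_routing_string` and `STRATEGIES` are hypothetical names, and the only format rules it encodes are the ones stated on this page (comma-separated IDs, optional `:latency` or `:throughput` suffix).

```python
from typing import Optional, Sequence

# The two strategy suffixes named on this page.
STRATEGIES = ("latency", "throughput")


def build_routing_string(model_ids: Sequence[str], strategy: Optional[str] = None) -> str:
    """Join model IDs into an inline pool, optionally with a strategy suffix."""
    if not model_ids:
        raise ValueError("pool needs at least one model ID")
    if strategy is not None and strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    pool = ",".join(model_ids)
    # No suffix means a plain list, which the page describes as sequential failover.
    return f"{pool}:{strategy}" if strategy else pool
```

The returned string goes straight into the `model` parameter of a chat completion request.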

From zero to first completion

Start with the same three facts every OpenAI-compatible agent needs: the Chutes base URL, Bearer auth, and a current model ID from GET /v1/models.

01

Get a key

Create a cpk_ key at chutes.ai/auth/start and keep it in your secret store.

02

Read the catalog

GET /v1/models returns the model IDs, prices, context, modalities, and feature flags.

03

Call completions

Use https://llm.chutes.ai/v1 and send Authorization: Bearer $CHUTES_API_KEY.
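The three steps above can be sketched with only the Python standard library. This is a minimal sketch under stated assumptions: `build_chat_request` is a hypothetical helper, it only builds the request and does not send it, and the body uses the OpenAI chat completions shape that this page says Chutes follows.

```python
import json
import urllib.request

BASE_URL = "https://llm.chutes.ai/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request with Bearer auth."""
    body = {
        "model": model,  # use a current ID from GET /v1/models
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen` and decode the JSON response. Keeping construction separate from sending makes the auth header and payload easy to check in a unit test without a live key.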

Curl smoke test
bash
# Read the public catalog first.
curl https://llm.chutes.ai/v1/models

# Use a current model ID from that response.
export CHUTES_API_KEY=cpk_...
export CHUTES_MODEL="Qwen/Qwen3-32B-TEE"

curl https://llm.chutes.ai/v1/chat/completions \
  -H "Authorization: Bearer $CHUTES_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "'"$CHUTES_MODEL"'", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'

Built for agent failure modes

One model going down should not take your agent with it. The picker selected this inline pool from the current catalog using the toolkit's agentic task preset.

Live-driven routing string
python
model="Qwen/Qwen3-32B-TEE,google/gemma-4-31B-turbo-TEE,MiniMaxAI/MiniMax-M2.5-TEE:latency"
Routing and failover
python
# Source: chutes-agent-toolkit/cookbook/python/05_routing_failover.py
"""Inline model routing: failover pool + latency strategy. [VERIFIED 2026-06-11: ran live against the paid API]

Pass several model IDs comma-separated in the `model` field:
  - plain list            -> sequential failover (try in order)
  - list + ":latency"     -> fastest first token right now
  - list + ":throughput"  -> most tokens/sec right now
A single concrete model ID bypasses routing entirely.
Run: CHUTES_API_KEY=cpk_... python 05_routing_failover.py
"""

import os

from openai import OpenAI

POOL = os.environ.get(
    "CHUTES_POOL",
    "MiniMaxAI/MiniMax-M2.5-TEE,deepseek-ai/DeepSeek-V3.2-TEE,zai-org/GLM-5-TEE",
)

client = OpenAI(base_url="https://llm.chutes.ai/v1", api_key=os.environ["CHUTES_API_KEY"])

# Sequential failover: if the first model is busy or down, the next one answers.
resp = client.chat.completions.create(
    model=POOL,
    messages=[{"role": "user", "content": "Which model are you? One sentence."}],
)
print("failover  ->", resp.model, "|", resp.choices[0].message.content)

# Latency strategy: route to whichever pool member has the lowest TTFT right now.
resp = client.chat.completions.create(
    model=f"{POOL}:latency",
    messages=[{"role": "user", "content": "Reply with the single word: pong"}],
)
print("latency   ->", resp.model, "|", resp.choices[0].message.content)

Agent lanes

Pick the lane that matches where your agent already runs. Each card keeps status labels from the toolkit, including guide-only and [BETA] boundaries.

Claude

Use Chutes from Claude skills

Active

Turn Chutes operations into agent-callable skills while keeping endpoint facts and credential handling centralized.

Setup

Add this repo as a plugin marketplace or copy plugins/chutes-ai/skills into the active Claude skills directory.

Hermes

Run Hermes on Chutes

Active

Use Chutes for Hermes chat, delegation, background work, and account-aware tools without waiting for a built-in provider.

Setup

Store CHUTES_API_KEY in the Hermes environment, add a providers.

Cursor, Cline, Aider, and MCP clients

Bring Chutes to editor agents with MCP

Write tools [BETA]

Make Chutes discoverable and callable inside local coding agents while keeping secrets outside checked-in config.

Setup

Run generate_agent_config.

OpenClaw

Run channel agents on Chutes with OpenClaw

[BETA] docs

Put open-source TEE inference behind Discord, Slack, Matrix, and other channel agents through a self-hosted gateway.

Setup

Export CHUTES_API_KEY, add a chutes provider under models.

Codex

Run Codex on Chutes

Guide only

Give Codex-style coding agents open-source, TEE-backed inference through the OpenAI API shape they already understand.

Setup

Set CHUTES_API_KEY in the agent runtime, point the OpenAI-compatible base URL at https://llm.

Generic OpenAI SDK agents

Use one OpenAI-compatible endpoint for many agent stacks

Active

Reuse existing OpenAI-compatible client code while moving inference to Chutes-hosted open-source models.

Setup

Set the SDK base URL to https://llm.

What is on the menu

Discovery is public at GET /v1/models. The table renders IDs, context, prices, modalities, and feature flags from that response, then falls back to the vendored snapshot when the API is unreachable.
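The live-then-snapshot behavior described above can be sketched as follows. This is not the page's actual implementation: `fetch_live_catalog` and `load_catalog` are hypothetical names, the 5-second timeout is an arbitrary choice, and the snapshot is whatever catalog dictionary you vendored earlier.

```python
import json
import urllib.request
from typing import Any, Callable

CATALOG_URL = "https://llm.chutes.ai/v1/models"


def fetch_live_catalog(timeout: float = 5.0) -> dict[str, Any]:
    """GET the public catalog; raises on network, HTTP, or JSON errors."""
    with urllib.request.urlopen(CATALOG_URL, timeout=timeout) as resp:
        return json.load(resp)


def load_catalog(
    snapshot: dict[str, Any],
    fetch: Callable[[], dict[str, Any]] = fetch_live_catalog,
) -> tuple[dict[str, Any], str]:
    """Return (catalog, source), where source is "live" or "snapshot"."""
    try:
        return fetch(), "live"
    except (OSError, ValueError):
        # URLError, HTTPError, and timeouts are OSError subclasses;
        # a malformed JSON body raises a ValueError subclass.
        return snapshot, "snapshot"
```

Returning the source alongside the catalog lets the caller label stale data, much as the menu below labels its own source. Injecting `fetch` keeps the fallback path testable offline.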

Live model menu

13 models. All report confidential_compute=true, verified Jun 25, 2026.

Live API
| Model | Context | $/1M in | $/1M out | Features | Good at |
|---|---|---|---|---|---|
| Qwen/Qwen3-32B-TEE | 40K | $0.104 | $0.416 | JSON mode, Tool calling, Structured output, Reasoning | cost-aware tool loops |
| google/gemma-4-31B-turbo-TEE | 128K | $0.12 | $0.37 | JSON mode, Tool calling, Structured output, Reasoning | vision-capable agents |
| Qwen/Qwen3.5-397B-A17B-TEE | 256K | $0.45 | $3.00 | JSON mode, Tool calling, Structured output, Reasoning | vision-capable agents |
| zai-org/GLM-5.1-TEE | 198K | $0.98 | $3.08 | JSON mode, Structured output, Tool calling, Reasoning | long-context work |
| Qwen/Qwen3.6-27B-TEE | 256K | $0.30 | $2.00 | JSON mode, Tool calling, Structured output, Reasoning | vision-capable agents |
| deepseek-ai/DeepSeek-V3.2-TEE | 128K | $1.00 | $1.00 | JSON mode, Tool calling, Reasoning, Structured output | reasoning and chat |
| zai-org/GLM-5-TEE | 198K | $0.95 | $2.55 | JSON mode, Structured output, Tool calling, Reasoning | long-context work |
| moonshotai/Kimi-K2.6-TEE | 256K | $0.66 | $3.50 | JSON mode, Structured output, Tool calling, Reasoning | vision and video agents |
| MiniMaxAI/MiniMax-M2.5-TEE | 192K | $0.15 | $1.20 | JSON mode, Tool calling, Structured output, Reasoning | cost-aware tool loops |
| zai-org/GLM-5.2-TEE | 1.0M | $1.40 | $4.40 | JSON mode, Structured output, Tool calling, Reasoning | long-context work |
| unsloth/Mistral-Nemo-Instruct-2407-TEE | 128K | $0.0245 | $0.0978 | No advertised feature flags | general chat |
| moonshotai/Kimi-K2.5-TEE | 256K | $0.44 | $2.00 | JSON mode, Structured output, Tool calling, Reasoning | vision and video agents |
| Qwen/Qwen3-235B-A22B-Thinking-2507-TEE | 256K | $0.2989 | $1.1957 | JSON mode, Structured output, Tool calling, Reasoning | long-context work |

Let the catalog pick the pool

The picker is a TypeScript port of the toolkit's scripts/pick_model.py: same task presets, filters, blended-price sort, and routing-string builder.
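A filter plus blended-price sort of the kind mentioned above might look like the sketch below. It is not a copy of scripts/pick_model.py: the `Model` tuple, `blended_price`, `pick_pool`, and the 50/50 input/output weighting are all assumptions made for illustration, and the real presets and blend formula may differ. The sample rows are copied from the model menu on this page.

```python
from typing import NamedTuple, Sequence


class Model(NamedTuple):
    id: str
    context_k: int    # context window, in thousands of tokens
    price_in: float   # USD per 1M input tokens
    price_out: float  # USD per 1M output tokens


# Three rows taken from the menu on this page.
SAMPLE = [
    Model("Qwen/Qwen3-32B-TEE", 40, 0.104, 0.416),
    Model("google/gemma-4-31B-turbo-TEE", 128, 0.12, 0.37),
    Model("MiniMaxAI/MiniMax-M2.5-TEE", 192, 0.15, 1.20),
]


def blended_price(m: Model, input_share: float = 0.5) -> float:
    """Weighted mix of input and output price (the 50/50 default is an assumption)."""
    return input_share * m.price_in + (1.0 - input_share) * m.price_out


def pick_pool(
    models: Sequence[Model],
    min_context_k: int = 0,
    size: int = 3,
    input_share: float = 0.5,
) -> list[str]:
    """Filter by minimum context, sort by blended price, return the cheapest IDs."""
    eligible = [m for m in models if m.context_k >= min_context_k]
    ranked = sorted(eligible, key=lambda m: blended_price(m, input_share))
    return [m.id for m in ranked[:size]]
```

Joining the returned IDs with commas and appending a strategy suffix yields an inline routing string. Raising `input_share` favors models with cheap input tokens, which suits prompt-heavy agent loops.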

Pick a live model pool

Filters use the same task presets as the toolkit picker. Catalog source: live, Jun 25, 2026.

Pick 1: google/gemma-4-31B-turbo-TEE (TEE)
$0.12/M in, $0.37/M out, 128K ctx
Routing string
python
model="google/gemma-4-31B-turbo-TEE:latency"

Inline pools work with no dashboard setup. Saved aliases such as default:latency require a pool configured once under Model Routing at chutes.ai/app.

Ship your agent on Chutes.

Start with a concrete model ID from the live /v1/models response, then add routing once the workload needs failover, latency, or throughput behavior.