What base URL do OpenAI-compatible agents use for Chutes?

Use https://llm.chutes.ai/v1 for OpenAI-compatible inference.

How should Chutes inference requests authenticate?

Use Authorization: Bearer with a cpk_ API key. Do not use X-API-Key for inference.
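
A minimal sketch of the headers an agent would send for inference. It assumes the key is stored in a CHUTES_API_KEY environment variable, as in the curl example on this page. The chutes_auth_headers helper is hypothetical. Its cpk_ prefix check is only a local sanity check, not an API requirement.

```python
from __future__ import annotations

import os


def chutes_auth_headers(api_key: str | None = None) -> dict[str, str]:
    """Hypothetical helper: Bearer auth with a cpk_ key (never X-API-Key)."""
    key = api_key if api_key is not None else os.environ.get("CHUTES_API_KEY", "")
    # Local sanity check only: Chutes inference keys start with "cpk_".
    if not key.startswith("cpk_"):
        raise ValueError("expected a cpk_ API key")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```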

Agent Hub

Your agent already speaks our API.

Chutes serves leading open-source AI models from Z.ai, DeepSeek, Moonshot AI, MiniMax AI, Google, and many more through both an OpenAI-compatible and an Anthropic-compatible endpoint. Point any agent at it, send the key as Bearer auth, and choose models from the live catalog instead of hard-coding IDs into your release.


Base URL

https://llm.chutes.ai/v1

Auth

Authorization: Bearer cpk_...

Catalog

13 models, verified Jun 25, 2026

TEE flag

Every listed model reports confidential_compute=true

Why agent builders pick Chutes

Zero integration cost

Chutes follows the OpenAI API shape for chat completions, streaming, tool calling, JSON mode, structured outputs, and vision-capable models. Frameworks that accept a base URL can keep their existing client code.
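
Because the body follows the OpenAI chat completions shape, tool calling and JSON mode are ordinary fields in the request payload. This is a sketch only. The get_weather tool and the build_chat_payload helper are hypothetical, and feature support varies per model, so check the feature flags in the catalog first.

```python
from __future__ import annotations

import json

# Hypothetical tool definition in the OpenAI function-calling shape.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_chat_payload(model: str, prompt: str, tools: list[dict] | None = None,
                       json_mode: bool = False) -> dict:
    """Build an OpenAI-style /chat/completions body; extra features are opt-in."""
    payload: dict = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if tools:
        payload["tools"] = tools  # tool calling: a list of function definitions
    if json_mode:
        payload["response_format"] = {"type": "json_object"}  # JSON mode
    return payload


body = json.dumps(build_chat_payload("Qwen/Qwen3-32B-TEE", "Weather in Oslo?", tools=[WEATHER_TOOL]))
```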

Privacy you can verify

The 13 models returned by the catalog all report confidential_compute=true as of Jun 25, 2026. For sensitive work, filter on that boolean and use attestation evidence before making stronger claims.
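
Once the catalog is parsed, filtering on the flag is short. This sketch assumes each entry in the /v1/models "data" array carries an "id" and a boolean "confidential_compute" field. A missing or non-boolean flag is treated as false. The non-TEE sample entry is hypothetical.

```python
from __future__ import annotations


def confidential_model_ids(catalog: dict) -> list[str]:
    """Return IDs of models whose confidential_compute flag is exactly True."""
    return [
        m["id"]
        for m in catalog.get("data", [])
        # "is True" rejects missing, null, or string values such as "true".
        if m.get("confidential_compute") is True
    ]


sample = {"data": [
    {"id": "Qwen/Qwen3-32B-TEE", "confidential_compute": True},
    {"id": "example/non-tee-model", "confidential_compute": False},  # hypothetical entry
]}
print(confidential_model_ids(sample))
```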

Routing without new infrastructure

Pass a concrete model ID for zero setup, or pass an inline comma-separated pool for sequential failover, optionally appending :latency or :throughput to choose a routing strategy. Saved default aliases require a one-time pool configuration in the dashboard.
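
Here is a sketch of building the routing string described above. The routing_string helper is hypothetical. It assumes only the two strategy suffixes named on this page. Without a suffix, the joined pool means sequential failover.

```python
from __future__ import annotations

STRATEGIES = (None, "latency", "throughput")


def routing_string(models: list[str], strategy: str | None = None) -> str:
    """Join model IDs into an inline pool, optionally suffixed with a strategy."""
    if not models:
        raise ValueError("need at least one model ID")
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    pool = ",".join(models)  # plain list -> try in order
    return f"{pool}:{strategy}" if strategy else pool


print(routing_string(["MiniMaxAI/MiniMax-M2.5-TEE", "zai-org/GLM-5-TEE"], "latency"))
```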

From zero to first completion

Start with the same three facts every OpenAI-compatible agent needs: the Chutes base URL, Bearer auth, and a current model ID from GET /v1/models.

Get a key

Create a cpk_ key at chutes.ai/auth/start and keep it in your secret store.

Read the catalog

GET /v1/models returns the model IDs, prices, context, modalities, and feature flags.
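
A stdlib-only sketch of reading the catalog. It assumes the OpenAI-style list shape ({"data": [{"id": ...}, ...]}). This page does not spell out the field names for prices, context, or modalities, so inspect a live response before relying on them. The helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request

MODELS_URL = "https://llm.chutes.ai/v1/models"


def fetch_catalog(url: str = MODELS_URL, timeout: float = 10.0) -> dict:
    """GET the public model catalog; discovery needs no auth header."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def model_ids(catalog: dict) -> list[str]:
    """Extract model IDs, assuming an OpenAI-style {"data": [...]} body."""
    return [entry["id"] for entry in catalog.get("data", [])]
```

With network access, model_ids(fetch_catalog()) returns the IDs that are current at call time.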

Call completions

Use https://llm.chutes.ai/v1 and send Authorization: Bearer $CHUTES_API_KEY.

Curl smoke test

bash

# Read the public catalog first.
curl https://llm.chutes.ai/v1/models

# Use a current model ID from that response.
export CHUTES_API_KEY=cpk_...
export CHUTES_MODEL="Qwen/Qwen3-32B-TEE"

curl https://llm.chutes.ai/v1/chat/completions \
  -H "Authorization: Bearer $CHUTES_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "'"$CHUTES_MODEL"'", "messages": [{"role": "user", "content": "Say hello in one sentence."}]}'
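
The same smoke test with only the Python standard library, for runtimes without curl. The function names are hypothetical. build_chat_request only constructs the request. smoke_test actually sends it and needs network access plus a real key in CHUTES_API_KEY.

```python
import json
import os
import urllib.request

BASE_URL = "https://llm.chutes.ai/v1"


def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Mirror the curl smoke test: POST /chat/completions with Bearer auth."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def smoke_test() -> str:
    """Send the request and return the first choice's text."""
    req = build_chat_request(
        os.environ.get("CHUTES_MODEL", "Qwen/Qwen3-32B-TEE"),
        "Say hello in one sentence.",
        os.environ["CHUTES_API_KEY"],
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```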


Built for agent failure modes

One model going down should not take your agent with it. The picker selected this inline pool from the current catalog using the toolkit's agentic task preset.

Live-driven routing string

python

model="Qwen/Qwen3-32B-TEE,google/gemma-4-31B-turbo-TEE,MiniMaxAI/MiniMax-M2.5-TEE:latency"
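
Chutes interprets routing strings on the server. A client can still split one apart to log or validate a configured pool. This is a sketch with a hypothetical parse_routing_string helper. It assumes that a trailing :latency or :throughput is the only suffix syntax, as described on this page.

```python
from __future__ import annotations

STRATEGY_SUFFIXES = ("latency", "throughput")


def parse_routing_string(model: str) -> tuple[list[str], str | None]:
    """Split 'a,b,c:latency' into (['a', 'b', 'c'], 'latency')."""
    pool, sep, suffix = model.rpartition(":")
    if sep and suffix in STRATEGY_SUFFIXES:
        strategy = suffix
    else:
        # No recognized suffix: the whole string is a pool or one concrete ID.
        pool, strategy = model, None
    return [m.strip() for m in pool.split(",") if m.strip()], strategy


print(parse_routing_string(
    "Qwen/Qwen3-32B-TEE,google/gemma-4-31B-turbo-TEE,MiniMaxAI/MiniMax-M2.5-TEE:latency"
))
```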

Routing and failover

python

# Source: chutes-agent-toolkit/cookbook/python/05_routing_failover.py
"""Inline model routing: failover pool + latency strategy. [VERIFIED 2026-06-11: ran live against the paid API]

Pass several model IDs comma-separated in the `model` field:
  - plain list            -> sequential failover (try in order)
  - list + ":latency"     -> fastest first token right now
  - list + ":throughput"  -> most tokens/sec right now
A single concrete model ID bypasses routing entirely.
Run: CHUTES_API_KEY=cpk_... python 05_routing_failover.py
"""

import os

from openai import OpenAI

POOL = os.environ.get(
    "CHUTES_POOL",
    "MiniMaxAI/MiniMax-M2.5-TEE,deepseek-ai/DeepSeek-V3.2-TEE,zai-org/GLM-5-TEE",
)

client = OpenAI(base_url="https://llm.chutes.ai/v1", api_key=os.environ["CHUTES_API_KEY"])

# Sequential failover: if the first model is busy or down, the next one answers.
resp = client.chat.completions.create(
    model=POOL,
    messages=[{"role": "user", "content": "Which model are you? One sentence."}],
)
print("failover  ->", resp.model, "|", resp.choices[0].message.content)

# Latency strategy: route to whichever pool member has the lowest TTFT right now.
resp = client.chat.completions.create(
    model=f"{POOL}:latency",
    messages=[{"role": "user", "content": "Reply with the single word: pong"}],
)
print("latency   ->", resp.model, "|", resp.choices[0].message.content)


Agent lanes

Pick the lane that matches where your agent already runs. Each card keeps status labels from the toolkit, including guide-only and [BETA] boundaries.

Claude

Use Chutes from Claude skills

Active

Turn Chutes operations into agent-callable skills while keeping endpoint facts and credential handling centralized.

Setup

Add this repo as a plugin marketplace or copy plugins/chutes-ai/skills into the active Claude skills directory.


Hermes

Run Hermes on Chutes

Active

Use Chutes for Hermes chat, delegation, background work, and account-aware tools without waiting for a built-in provider.

Setup

Store CHUTES_API_KEY in the Hermes environment, add a providers.


Cursor, Cline, Aider, and MCP clients

Bring Chutes to editor agents with MCP

Write tools [BETA]

Make Chutes discoverable and callable inside local coding agents while keeping secrets outside checked-in config.

Setup

Run generate_agent_config.


OpenClaw

Run channel agents on Chutes with OpenClaw

[BETA] docs

Put open-source TEE inference behind Discord, Slack, Matrix, and other channel agents through a self-hosted gateway.

Setup

Export CHUTES_API_KEY, add a chutes provider under models.


Codex

Run Codex on Chutes

Guide only

Give Codex-style coding agents open-source, TEE-backed inference through the OpenAI API shape they already understand.

Setup

Set CHUTES_API_KEY in the agent runtime, point the OpenAI-compatible base URL at https://llm.


Generic OpenAI SDK agents

Use one OpenAI-compatible endpoint for many agent stacks

Active

Reuse existing OpenAI-compatible client code while moving inference to Chutes-hosted open-source models.

Setup

Set the SDK base URL to https://llm.


What is on the menu

Discovery is public at GET /v1/models. The table renders IDs, context, prices, modalities, and feature flags from that response, then falls back to the vendored snapshot when the API is unreachable.
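
The live-then-snapshot behaviour described above can be sketched like this. Here, fetch is any zero-argument callable that returns the parsed /v1/models body. The snapshot path and the load_catalog name are hypothetical, and the sketch falls back only on network-level errors.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Callable


def load_catalog(fetch: Callable[[], dict], snapshot: Path) -> tuple[dict, str]:
    """Prefer the live catalog; fall back to a vendored JSON snapshot on failure."""
    try:
        return fetch(), "live"
    except OSError:
        # urllib's URLError/HTTPError and socket timeouts are OSError subclasses.
        return json.loads(snapshot.read_text(encoding="utf-8")), "snapshot"
```

Returning the source label along with the data lets the UI show whether it is rendering the live API or the snapshot.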

Live model menu

13 models. All report confidential_compute=true, verified Jun 25, 2026.

Live API

Model	Context (tokens)	Input ($/1M tokens)	Output ($/1M tokens)	Features	Good at
Qwen/Qwen3-32B-TEE	40K	$0.104	$0.416	JSON mode, Tool calling, Structured output, Reasoning	cost-aware tool loops
google/gemma-4-31B-turbo-TEE	128K	$0.12	$0.37	JSON mode, Tool calling, Structured output, Reasoning	vision-capable agents
Qwen/Qwen3.5-397B-A17B-TEE	256K	$0.45	$3.00	JSON mode, Tool calling, Structured output, Reasoning	vision-capable agents
zai-org/GLM-5.1-TEE	198K	$0.98	$3.08	JSON mode, Tool calling, Structured output, Reasoning	long-context work
Qwen/Qwen3.6-27B-TEE	256K	$0.30	$2.00	JSON mode, Tool calling, Structured output, Reasoning	vision-capable agents
deepseek-ai/DeepSeek-V3.2-TEE	128K	$1.00	$1.00	JSON mode, Tool calling, Structured output, Reasoning	reasoning and chat
zai-org/GLM-5-TEE	198K	$0.95	$2.55	JSON mode, Tool calling, Structured output, Reasoning	long-context work
moonshotai/Kimi-K2.6-TEE	256K	$0.66	$3.50	JSON mode, Tool calling, Structured output, Reasoning	vision and video agents
MiniMaxAI/MiniMax-M2.5-TEE	192K	$0.15	$1.20	JSON mode, Tool calling, Structured output, Reasoning	cost-aware tool loops
zai-org/GLM-5.2-TEE	1.0M	$1.40	$4.40	JSON mode, Tool calling, Structured output, Reasoning	long-context work
unsloth/Mistral-Nemo-Instruct-2407-TEE	128K	$0.0245	$0.0978	No advertised feature flags	general chat
moonshotai/Kimi-K2.5-TEE	256K	$0.44	$2.00	JSON mode, Tool calling, Structured output, Reasoning	vision and video agents
Qwen/Qwen3-235B-A22B-Thinking-2507-TEE	256K	$0.2989	$1.1957	JSON mode, Tool calling, Structured output, Reasoning	long-context work
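
The listed prices are US dollars per million tokens, so a request costs (input tokens x input price + output tokens x output price) / 1,000,000. The worked example below uses the listed prices for Qwen/Qwen3-32B-TEE. The token counts are made up for illustration, and actual billing may involve details that this arithmetic ignores.

```python
def request_cost_usd(tokens_in: int, tokens_out: int,
                     usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Cost of one request from per-million-token prices."""
    return (tokens_in * usd_per_m_in + tokens_out * usd_per_m_out) / 1_000_000


# Qwen/Qwen3-32B-TEE: $0.104/1M input, $0.416/1M output (from the table).
cost = request_cost_usd(10_000, 2_000, 0.104, 0.416)
print(f"${cost:.6f}")  # $0.001872
```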

Let the catalog pick the pool

The picker is a TypeScript port of the toolkit's scripts/pick_model.py: same task presets, filters, blended-price sort, and routing-string builder.
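
This page does not give the toolkit's exact blending formula. The sketch below assumes a common convention that weights input price 3:1 over output price. It uses a few prices copied from the table and skips the task-preset filtering, so its ordering is not necessarily what the picker would choose.

```python
from __future__ import annotations

# (model ID, $/1M input tokens, $/1M output tokens), copied from the catalog table.
PRICES = [
    ("google/gemma-4-31B-turbo-TEE", 0.12, 0.37),
    ("MiniMaxAI/MiniMax-M2.5-TEE", 0.15, 1.20),
    ("Qwen/Qwen3-32B-TEE", 0.104, 0.416),
    ("unsloth/Mistral-Nemo-Instruct-2407-TEE", 0.0245, 0.0978),
]


def blended_price(usd_in: float, usd_out: float,
                  in_weight: float = 3.0, out_weight: float = 1.0) -> float:
    """Weighted average $/1M tokens; the 3:1 input:output weighting is an assumption."""
    return (in_weight * usd_in + out_weight * usd_out) / (in_weight + out_weight)


def cheapest_first(rows: list[tuple[str, float, float]]) -> list[str]:
    """Sort model IDs by ascending blended price."""
    return [model_id for model_id, _, _ in sorted(rows, key=lambda r: blended_price(r[1], r[2]))]


print(cheapest_first(PRICES))
```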

Pick a live model pool

Filters use the same task presets as the toolkit picker. Catalog source: live, Jun 25, 2026.

Need: Task preset, Routing

Require confidential_compute

Pick 1

google/gemma-4-31B-turbo-TEE (TEE)

$0.12/M in, $0.37/M out, 128K context

Routing string

python

model="google/gemma-4-31B-turbo-TEE:latency"

Inline pools work with no dashboard setup. Saved aliases such as default:latency require a pool configured once under Model Routing at chutes.ai/app.

Ship your agent on Chutes.

Start with a concrete model ID from the live /v1/models response, then add routing once the workload needs failover, latency, or throughput behavior.
