12/23/2025

Confidential Compute for AI Inference: How Chutes Delivers Verifiable Privacy with Trusted Execution Environments

Your prompts. Your data. Hardware-protected from everyone—even us.

Chutes TEEs: Confidential AI you can verify, not just trust

The Chutes network is designed for an adversarial world: miners are anonymous and permissionless. 

Given this new reality, security can no longer be “just trust me”; it must become “don’t trust, verify.”

That’s the core idea behind Chutes’ security model: a defense‑in‑depth stack that verifies software integrity, attests to hardware properties, restricts runtime behavior, and encrypts communications end-to-end.

With Trusted Execution Environments (TEEs) deployed via sek8s, Chutes extends those guarantees into the hardest part of the threat model: protecting data while it’s actively being processed, even from a compromised host or the machine owner themselves.

The baseline: defense-in-depth for every chute

Chutes applies standard security measures to all chutes, whether or not they run in a TEE. The internal security model describes five pillars:

  • End‑to‑end encryption
  • Code and filesystem integrity
  • Environment attestation
  • Containment (capability restriction, including network)
  • TEE mode for the highest assurance workloads

Integrity and verification components

Some core security components are closed-source but can be described at a functional level.

Key examples include:

  • cfsv: challenge-based filesystem validation against a build-time source of truth.
  • inspecto: Python bytecode inspection with hashing to detect tampering and “logic bombs” that naive file hashing can miss.
  • cllmv: per-token verification hashes bound to the exact Hugging Face model name and revision, making it infeasible to spoof outputs from an unapproved or lower-capability model.
  • envdump: environment snapshots for validator-side verification of the miner’s runtime context.
  • chutes-net-nanny: runtime containment, including outbound network control, DNS verification, and deliberate failure on common circumvention attempts (for example, exec-ing into a pod to reach a sidecar or a local service that is not in the process tree).
  • graval-priv: a proprietary chutes GPU attestation scheme (“Proof of Consecutive VRAM Work”) used to prevent miners from misrepresenting GPU capabilities. This enables creation of a unique AES‑256 key tied to verified physical GPU properties.
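The inspecto idea (hashing what is actually loaded, not just what sits on disk) can be sketched in a few lines. This is an illustrative mock, not the real inspecto; the function name and the exact fields hashed are assumptions:

```python
import hashlib
import types

def bytecode_digest(func: types.FunctionType) -> str:
    """Digest the compiled bytecode of a function, not its source text,
    so a patched in-memory code object is caught even when the on-disk
    .py file still matches its baseline hash."""
    code = func.__code__
    material = (
        code.co_code
        + repr(code.co_consts).encode()
        + repr(code.co_names).encode()
    )
    return hashlib.sha256(material).hexdigest()
```

A validator holding baseline digests of approved code objects can then spot a "logic bomb" that naive file hashing would miss.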

On the validator side, a solution known as forge is treated as the “source of truth”: it builds chute images in controlled stages, generates baselines, scans for vulnerabilities, and cryptographically signs each image using Sigstore's Cosign.

Continuous monitoring, not just startup checks

Chutes’ watchtower service continuously monitors active miners and can issue randomized integrity challenges at any time. These include software integrity checks and model-weight verification (hashing a random slice at a random offset) to defeat “bait-and-switch” attacks. If a miner fails these checks or doesn’t respond, it’s automatically removed from the network.
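The weight-slice check can be sketched as a simple challenge-response: the validator picks a random offset, length, and nonce, and the miner must hash the slice of the weights it is actually serving. This is a conceptual mock, not watchtower's actual protocol, and all names here are hypothetical:

```python
import hashlib
import secrets

def weight_slice_challenge(weights: bytes, max_len: int = 4096):
    """Validator side: choose a random slice plus a fresh nonce, and
    precompute the expected digest from the known-good weights."""
    offset = secrets.randbelow(max(1, len(weights) - max_len))
    length = min(max_len, len(weights) - offset)
    nonce = secrets.token_bytes(16)
    expected = hashlib.sha256(nonce + weights[offset:offset + length]).hexdigest()
    return offset, length, nonce, expected

def answer_weight_challenge(weights: bytes, offset: int, length: int,
                            nonce: bytes) -> str:
    """Miner side: hash the requested slice of whatever weights are loaded.
    The nonce prevents precomputing or replaying old answers."""
    return hashlib.sha256(nonce + weights[offset:offset + length]).hexdigest()
```

Because the offset and nonce are unpredictable, a miner that swapped in different weights after startup cannot answer correctly.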

Why TEEs matter: protecting data-in-use from the host

Traditional container isolation protects against some threats, but it does not protect you from a hostile host OS or hypervisor. That is exactly the scenario you must assume when miners are anonymous yet operational security still matters.

Chutes’ TEE mode is powered by sek8s, a security-hardened Kubernetes distribution designed to run workloads inside Intel® TDX confidential VMs. So what exactly does TDX provide?

Intel® TDX: confidential VMs and an untrusted hypervisor

TDX creates a “Trust Domain” (TD) whose memory is encrypted with keys known only to the CPU; the hypervisor/VMM is removed from the trust boundary and cannot inspect the TD’s registers or memory.

The point is simple: even if the host is root‑compromised, reading the chute’s data-in-use is blocked in hardware.

NVIDIA Protected PCIe: extending confidentiality to GPU workflows

AI workloads don’t stop at CPU memory; they rely heavily on GPUs. Chutes uses NVIDIA GPUs that support confidential computing with Protected PCIe (PPCIE), creating an encrypted CPU↔GPU channel so data/code in transit across PCIe can’t be snooped on.

Proving trust before execution: remote attestation with measured boot

A true TEE is only useful if you can prove you’re really running inside the expected confidential environment.

The sek8s attestation flow includes:

  • Measurements stored in RTMRs (Runtime Measurement Registers) taken during boot.
  • A TD (Trust Domain) Quote signed by a CPU-fused key, bound to a validator-provided nonce (anti-replay).
  • Validator verification using Intel public keys, GPU attestation checks, and comparison against a known-good “golden” configuration.
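A toy version of the nonce-bound quote check, with HMAC standing in for the CPU-fused ECDSA key and Intel's certificate chain. This is purely illustrative; none of these names come from sek8s or the TDX specification:

```python
import hashlib
import hmac
import secrets

# Stand-in for the CPU-fused attestation key. In real TDX, quotes are
# signed with a key provisioned in hardware and verified against Intel's
# public certificate chain; HMAC here only models "a key the host cannot forge".
CPU_KEY = secrets.token_bytes(32)

def make_quote(rtmrs, nonce):
    """TD side (mock): bind the boot measurements and the validator's
    nonce into one signed blob, so the quote cannot be replayed later."""
    body = b"|".join(r.encode() for r in rtmrs) + b"|" + nonce.hex().encode()
    sig = hmac.new(CPU_KEY, body, hashlib.sha256).digest()
    return body, sig

def verify_quote(body, sig, expected_nonce, golden_rtmrs):
    """Validator side (mock): check the signature, the nonce we issued,
    and that measurements match the known-good 'golden' configuration."""
    if not hmac.compare_digest(sig, hmac.new(CPU_KEY, body, hashlib.sha256).digest()):
        return False
    *rtmrs, nonce_hex = body.decode().split("|")
    return nonce_hex == expected_nonce.hex() and rtmrs == golden_rtmrs
```

All three checks must hold at once: a valid signature over stale measurements, or fresh measurements without the validator's nonce, are both rejected.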

Critically, sek8s ties attestation directly to disk access: the guest root filesystem is encrypted with LUKS (Linux Unified Key Setup), and the decryption key is only released after successful attestation. This stops a modified environment from booting into a usable state.

Ensuring only approved code runs: cosign admission control

Even a perfect TEE won’t save you if the code running inside it is malicious.

To close that gap, sek8s runs a strict admission controller: pods can only be scheduled if their images are signed with Chutes’ cosign key, linking back to the signed output of the validator-side forge.

TEEs aren’t enough on their own (and Chutes says so)

There is an important reality that always needs to be considered: a TEE protects you from the host, but it doesn’t inherently protect you from malicious code inside the TEE (e.g., a chute that logs prompts).

Chutes’ answer is to combine TEEs with:

  • Verified builds (forge baselines + cosign signatures)
  • Continuous integrity checks (watchtower random challenges)
  • Strong egress control (net-nanny is optional but typically enabled; when enabled it blocks outbound traffic by default except the validator proxy)
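The egress rule reduces to a default-deny allowlist. A minimal sketch, with a hypothetical allowlist entry standing in for the validator proxy:

```python
# Hypothetical allowlist: by default, only the validator proxy is reachable.
ALLOWED_EGRESS = {("validator-proxy.internal", 8443)}

def egress_allowed(host: str, port: int, allowlist=ALLOWED_EGRESS) -> bool:
    """Default-deny outbound policy: a connection is permitted only if
    the exact (host, port) pair appears on the allowlist."""
    return (host, port) in allowlist
```

The design choice matters: a chute that tries to exfiltrate prompts must name a destination, and every destination except the proxy is rejected by default.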

That combination is what turns “confidential computing” into something you can actually rely on in an adversarial network environment.

End-to-end: what a TEE-backed LLM request looks like

Let’s walk through an SGLang LLM request inside a sek8s TEE:

  1. Build: forge builds the image, produces baselines, signs with cosign.
  2. Deploy: validator attests sek8s; job is admitted only if signature checks pass.
  3. Startup handshake: the entrypoint validates cfsv/inspecto; validator issues an ephemeral symmetric key (GraValMiddleware).
  4. SGLang hardening: server runs with a password and binds only to 127.0.0.1 so only authenticated, proxied calls can reach it.
  5. Inference: request is decrypted inside the Intel TDX trust domain; GPU compute is protected by TDX + NVIDIA PPCIE; watchtower may issue a live weight-slice challenge; cllmv emits per-token proofs; response is encrypted inside the TD.
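The per-token proofs in step 5 can be modeled as a hash chain seeded by the model identity. This is a conceptual sketch, not the real cllmv scheme:

```python
import hashlib

def token_proof_chain(model: str, revision: str, token_ids) -> list:
    """Emit a running hash per generated token, seeded by the exact model
    name and revision. Proofs from a different (e.g. lower-capability)
    model or revision diverge from the very first token."""
    state = hashlib.sha256(f"{model}@{revision}".encode()).digest()
    proofs = []
    for tok in token_ids:
        state = hashlib.sha256(state + int(tok).to_bytes(4, "big")).digest()
        proofs.append(state.hex())
    return proofs
```

A validator running its own copy of the approved model can recompute the chain and reject any response whose proofs do not match token for token.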

Verifiability: the end goal is independent proof

A core theme is “radical verifiability”: cryptographic reports only matter if you can compare them to a known-good baseline.

Chutes is building toward real-time, public access to TD Quotes, NVIDIA attestations, and full IMA software manifests so third parties can independently validate the environment and running code.