This document provides a comprehensive overview of the security measures implemented within the Chutes serverless compute platform. Our security model is built on a defense-in-depth strategy, with multiple layers of verification and protection to ensure the integrity of the compute environment and the privacy of user data.
Guiding Principles
The Chutes network is designed for an adversarial environment where miners are anonymous and permissionless. Our security posture is therefore built on the principle of "don't trust, verify." We employ a multi-faceted approach to security, including:
End-to-End Encryption: All communication between the user, the validator, and the miner is encrypted.
Code and Filesystem Integrity: We continuously verify that the code running on miners' machines has not been tampered with.
Environment Attestation: We collect and verify detailed information about the miner's hardware and software environment.
Containment: We strictly limit the capabilities of the running code, including network access and access to the host system.
Trusted Execution Environments (TEE): For the highest level of assurance, we leverage Intel TDX and NVIDIA GPUs to create a fully isolated and verifiable compute environment.
Security Layers
The following sections detail the different layers of our security model, from the base-level protections applied to all chutes to the advanced TEE-based measures.
2. Standard Security Measures (Non-TEE)
These security measures are applied to all chutes running on the network, regardless of whether they are in a TEE or not. They form the baseline of trust and verification for the entire platform.
Private Security Components (High-Level Overview)
The Chutes platform utilizes several closed-source security components to protect against various attack vectors. While the source code for these components is not public, their functionality is described below.
cfsv (Chutes Secure Filesystem Validation): Responsible for ensuring the integrity of the container's filesystem. It works by building an index of all files and generating secure cryptographic digests based on random challenge seeds provided by the validator. This prevents unauthorized modifications to the filesystem. The source-of-truth for these digests is generated during the image build process.
cllmv (Chutes Large Language Model Verification): This component integrates with the SGLang inference engine to provide per-token verification hashes. Crucially, the specified Hugging Face model name and exact revision hash are cryptographically bound into the per-token proofs. This allows for cryptographic verification that every single token of output was generated by the exact model and revision claimed by the miner, making it impossible to spoof results from a cheaper or different model.
envdump (Environment Dump): Securely collects a comprehensive snapshot of the miner's environment. This includes environment variables, filesystem information, kernel details, and loaded Python modules. This data is sent to the validator to ensure the miner's environment conforms to the expected configuration.
inspecto (Python Code Inspection): This tool performs static analysis of Python bytecode for all loaded modules. It detects and prevents attempts by a miner to override standard library paths or insert malicious "logic bombs" that a simple file hash might miss. It generates a secure hash of the bytecode, which is compared against a source-of-truth hash generated at image build time.
chutes-net-nanny (Network and Process Nanny): A critical component for runtime security and containment. Its responsibilities include:
Network Access Control: Limits outbound network connections to a predefined set of hosts.
Filesystem Encryption: Encrypts the main "chute" source file to protect intellectual property.
Integrity Verification: Uses self-referencing hashes to ensure its own integrity.
DNS Verification: Prevents DNS spoofing attacks.
Pod Access Prevention: Intentionally causes a segmentation fault if any attempt is made to exec into the pod, run a sidecar container, or connect to a local service not in the process tree. This defeats a huge class of common container-based attacks.
graval-priv (GPU Attestation): This component provides "Proof of Consecutive VRAM Work" to cryptographically attest to the physical properties of the GPU. It uses OpenCL and the clBLAS library for broad compatibility with GPUs from different manufacturers, including NVIDIA and AMD. The process involves performing a series of consecutive matrix multiplications on the GPU. To create a verifiable yet efficient benchmark, it takes diagonal memory slices from the matrices, drastically reducing data transfer overhead while retaining a cryptographic proof that the full multiplication occurred. The time taken to complete these operations, combined with the memory access patterns, provides a hardware-level signature of the GPU's processing speed and available VRAM. This prevents miners from fraudulently claiming to have a more powerful GPU than they actually possess. This attestation process also enables the creation of a unique AES-256 encryption key based on the specific GPU's UUID and a random challenge, tying the secure communication channel to the verified physical hardware.
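The general shape of a "consecutive work" proof can be illustrated with a toy sketch. This is not the graval-priv algorithm, which is closed source and runs on the GPU through OpenCL; it is a CPU-only illustration under stated assumptions. The function names, the matrix size, the modulus, and the use of SHA-256 are all hypothetical. The sketch only shows how each multiplication can depend on the previous result, and how a digest can be built from the diagonal alone instead of the full matrices.

```python
import hashlib
import random
import time


def seeded_matrix(seed: bytes, label: str, n: int, modulus: int) -> list[list[int]]:
    """Derive an n x n integer matrix deterministically from the challenge seed."""
    rng = random.Random(hashlib.sha256(seed + label.encode()).digest())
    return [[rng.randrange(modulus) for _ in range(n)] for _ in range(n)]


def matmul_mod(a: list[list[int]], b: list[list[int]], modulus: int) -> list[list[int]]:
    """Plain matrix product with every entry reduced modulo `modulus`."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) % modulus for col in cols] for row in a]


def consecutive_work_proof(seed: bytes, n: int = 16, rounds: int = 8,
                           modulus: int = 65521) -> tuple[str, float]:
    """Chain `rounds` multiplications and hash only the diagonal of each result.

    Returns the hex digest and the elapsed wall-clock time in seconds.
    """
    state = seeded_matrix(seed, "state", n, modulus)
    digest = hashlib.sha256(seed)
    start = time.perf_counter()
    for i in range(rounds):
        factor = seeded_matrix(seed, f"round-{i}", n, modulus)
        # Each round consumes the previous round's output, so the rounds
        # cannot be reordered or computed independently of each other.
        state = matmul_mod(state, factor, modulus)
        diagonal = [state[j][j] for j in range(n)]
        digest.update(b"".join(v.to_bytes(2, "big") for v in diagonal))
    elapsed = time.perf_counter() - start
    return digest.hexdigest(), elapsed
```

In this toy model a verifier holding the same seed recomputes the chain, compares the digest, and checks that the reported time is plausible for the claimed hardware.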
Public Security Components (Detailed Description)
The following open-source components are key to the Chutes security model.
chutes (Miner-side Library): The core library that is injected into every chute container. It orchestrates the entire startup and validation process from the miner's perspective. The main logic is in chutes/chutes/entrypoint/run.py, which executes a multi-stage security handshake to ensure the integrity of the environment before any user code is run. For specific applications like SGLang LLMs, the Chutes library wrapper implements additional hardening: it launches the SGLang process with a password and strictly binds it only to the loopback interface (127.0.0.1). This ensures that nothing can directly access the inference server on the miner node except authenticated, validated, and signed calls originating from the validator, which are securely proxied through the Chutes library wrapper itself.
chutes-api (Validator and API): The central validator and API server for the Chutes network. It is responsible for creating the trusted environment that miners must adhere to, validating miners against that baseline, and securely relaying requests. Its key security functions are distributed across several components:
**api/image/forge.py: The Source of Truth** This is arguably the most critical security component on the validator side. The forge is responsible for building all chute images that run on the network. It establishes the "source of truth" that all miners are subsequently validated against. It performs controlled, multi-stage builds, generates filesystem and bytecode baselines, scans for vulnerabilities, and cryptographically signs the final image with cosign.
**api/graval_worker.py and api/instance/router.py: Miner Validation and Activation** These components handle the other side of the conversation with the miner's entrypoint/run.py, verifying the initial handshake, performing hardware attestation, and issuing the symmetric encryption key only upon successful validation of all proofs.
**watchtower.py: Continuous Monitoring and Active Defense** The watchtower is an active defense system that continuously monitors the health and integrity of all active miners on the network. It goes beyond simple liveness checks and performs deep, randomized validation:
Software Integrity Checks: It can issue random challenges to miners at any time, instructing them to perform on-demand cfsv, inspecto, or envdump checks and return the results.
Model Weight Verification: To ensure the correct model is loaded and to defeat "bait-and-switch" attacks (where a miner loads the correct model at startup but swaps it for a cheaper one later), the watchtower can command a chute to read its model files at a random offset and return a SHA256 hash of that data slice. The validator compares this against the correct hash for the specified model, making it computationally infeasible for a miner to use a different or modified set of model weights.
If a miner fails any of these checks or does not respond, it is immediately removed from the network.
chutes-miner (Miner Management): This repository contains the tools for miners to manage their chute deployments. It acts as the local enforcement layer, translating the validator's desired state into actual running pods on the miner's Kubernetes cluster, using a JWT-based authorization flow to ensure no chute can launch without explicit permission.
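The watchtower's random-offset model weight check described above can be sketched as follows. This is a minimal illustration, not the watchtower's actual code: the function names, the default slice length, and the idea that the validator holds a reference copy of the file are assumptions made for the example.

```python
import hashlib
import secrets


def hash_slice(path: str, offset: int, length: int) -> str:
    """Miner side: SHA-256 of `length` bytes of the file starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return hashlib.sha256(f.read(length)).hexdigest()


def make_challenge(file_size: int, length: int = 4096) -> tuple[int, int]:
    """Validator side: pick an unpredictable slice that fits inside the file."""
    length = min(length, file_size)
    offset = secrets.randbelow(file_size - length + 1)
    return offset, length


def verify(reference_path: str, miner_answer: str, offset: int, length: int) -> bool:
    """Validator side: compare the miner's answer with the reference slice hash."""
    expected = hash_slice(reference_path, offset, length)
    return secrets.compare_digest(expected, miner_answer)
```

Because the offset is chosen at challenge time, a miner cannot precompute answers without keeping the full, correct weight file available.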
3. TEE Security Measures: The sek8s Environment
While the standard security measures provide a robust defense-in-depth strategy, for users who require the highest possible level of assurance and data confidentiality, Chutes offers deployment in a Trusted Execution Environment (TEE). This is powered by our custom, security-hardened Kubernetes distribution, sek8s.
The sek8s environment, located in the public sek8s repository, is designed from the ground up to run workloads within Intel® Trust Domain Extensions (TDX) confidential virtual machines. When a chute runs in a sek8s environment, it is not just protected by our standard validation mechanisms; it is further isolated by hardware-level security guarantees. This provides a verifiable, hardware-isolated black box for your data and code.
Here are the key security features of sek8s, which work in concert with all the previously mentioned security layers:
Intel® TDX Deep Dive: Creating the Confidential VM
Intel® TDX is the cornerstone of our TEE offering. It allows us to create a special type of virtual machine called a Trust Domain (TD) that is isolated from almost everything else on the system.
Secure Arbitration Mode (SEAM): TDX introduces a new CPU mode called SEAM. This is a hardware-enforced layer that sits alongside the standard VMX modes used by hypervisors. A special, Intel-signed and hardware-resident module called the "TDX module" operates within SEAM. This module is responsible for creating, managing, and tearing down Trust Domains. The key is that the host's hypervisor (or Virtual Machine Monitor, VMM) is no longer fully in control; it must make requests to the TDX module to interact with a TD, and the TDX module will refuse any request that would violate the TD's confidentiality or integrity.
Memory Encryption and Integrity: The primary guarantee of TDX is that the memory used by a TD is encrypted using a key known only to the CPU. If the hypervisor, or an attacker with root access on the host, tries to read the memory of a running chute, they will only see ciphertext. Furthermore, TDX provides memory integrity protection, which prevents attackers from replaying or tampering with the encrypted memory pages.
Data Isolation: Because of SEAM and memory encryption, the VMM/hypervisor is removed from the trust boundary. It is treated as untrusted. It can no longer inspect the CPU registers or memory of the TD. This means the host operator, and any malware on the host, is physically prevented by the CPU from seeing a user's data-in-use inside the chute.
NVIDIA Confidential Computing with Protected PCIe (PPCIE)
Modern AI workloads are not confined to the CPU. To provide a true end-to-end TEE, the trust boundary must be extended to the GPU.
The Problem: The PCIe bus, which connects the CPU and GPU, is traditionally unencrypted. An attacker with physical access or sufficient host compromise could potentially snoop this bus to intercept data as it travels to and from the GPU.
The Solution: We use NVIDIA GPUs (such as the H100) that support Confidential Computing mode with Protected PCIe. In this mode, the GPU and CPU establish a secure, encrypted channel over the PCIe bus. All data and code sent to the GPU for processing are encrypted, protecting them from bus snooping attacks. This ensures that your data remains confidential even as it's being used for high-speed training or inference on the GPU.
Full System Attestation: Proving Trust Before Execution
Before a TEE-enabled chute is even started, the validator performs a full remote attestation of the sek8s environment to prove that it is genuine and untampered.
The Measurement (RTMR): During the boot process of the Trust Domain, a series of cryptographic measurements is taken of the firmware, the bootloader, the kernel, and other critical software components. These measurements are accumulated in special registers maintained by the TDX module, called Runtime Measurement Registers (RTMRs). Any change to the measured software, no matter how small, will result in a different RTMR value.
The Quote: The sek8s node can request that the TDX module generate a "TD Quote." This is a data structure that is cryptographically signed by a private key fused into the CPU itself. The Quote contains the RTMR values, a nonce provided by the validator (to prevent replay attacks), and other important metadata.
The Verification: The attestation process is as follows:
The validator generates a random nonce and sends it to the miner's sek8s node.
The sek8s node requests a TD Quote from the CPU, including the nonce. It also gathers an attestation report from the NVIDIA GPU.
The node sends both the CPU's TD Quote and the NVIDIA attestation report to the validator.
The validator first checks the cryptographic signature on the TD Quote using Intel's public keys to confirm it came from a genuine Intel CPU with TDX enabled. It then checks the NVIDIA report.
Finally, it compares the RTMR measurements inside the Quote with a known-good "golden" configuration for sek8s.
Only if every single measurement matches does attestation pass. This proves, with cryptographic certainty, that the hardware and software stack on the miner's machine is exactly what it is supposed to be.
Encrypted and Measured Root Filesystem: This attestation is tied directly to the filesystem's accessibility. The root filesystem of the sek8s guest environment is encrypted with LUKS. The decryption key is only released by a secure service after a successful attestation. This means the node cannot even boot into a usable state if its underlying software has been modified in any way. Any change to the filesystem would alter the measurements, cause attestation to fail, and prevent the decryption key from being released, rendering the node inoperable.
cosign Image Admission Controller
The final link in the chain of trust is ensuring that only authorized code runs within the attested, confidential environment.
The sek8s Kubernetes API server is configured with a strict admission controller that intercepts all pod creation requests. This controller will only allow a pod to be scheduled if its container image has been cryptographically signed by Chutes' cosign key. This connects back to the chutes-api forge, which signs every image it builds. As a result, a malicious or tampered image cannot be admitted into the sek8s TEE.
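The admission decision can be sketched as a function over a Kubernetes AdmissionReview request. This assumes a validating-webhook style controller, which the text above does not confirm. `review_pod` and the `is_signed` callback are hypothetical names, and the callback is where the actual cosign signature verification would take place.

```python
from typing import Callable


def review_pod(admission_review: dict, is_signed: Callable[[str], bool]) -> dict:
    """Allow a pod only if every container image passes the signature check."""
    request = admission_review["request"]
    spec = request["object"]["spec"]
    # Init containers run code too, so they are checked alongside regular ones.
    containers = spec.get("initContainers", []) + spec.get("containers", [])
    unsigned = [c["image"] for c in containers if not is_signed(c["image"])]
    response = {"uid": request["uid"], "allowed": not unsigned}
    if unsigned:
        response["status"] = {"message": "unsigned images: " + ", ".join(unsigned)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Keeping the verifier as an injected callback makes the deny-by-default decision logic easy to test separately from the signature check itself.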
Hardened Environment & No Backdoors
The sek8s environment is stripped down to the bare essentials. There are no SSH daemons, remote access tools, or unnecessary services running. Deployment and management are handled exclusively through the locked-down Kubernetes API, which itself is subject to strict authentication and authorization controls.
The TEE Guarantee
When you run a chute in TEE mode, you are not just trusting our software validation stack; you are relying on hardware-enforced cryptographic guarantees from Intel and NVIDIA. The combination of remote attestation, encrypted memory, and a locked-down, measured environment means you can be confident that:
Your code is running on genuine, untampered hardware.
The software environment is exactly what Chutes has defined, with no modifications.
No one, not even the machine's owner, can access or view your data while it is being processed.
This provides the strongest possible protection against data exfiltration and intellectual property theft, making Chutes a uniquely secure platform for sensitive AI workloads.
4. Verifiability and Trust
The previous sections detailed the "how" of Chutes' security model. This section details the "why," explaining how these features combine to create a platform that is not just secure, but transparently and verifiably so.
Model and Configuration Transparency
A cornerstone of the Chutes platform is eliminating the ambiguity common in other compute networks. When you use a Chutes model, you know exactly what you are getting and how it's running.
For any public chute, you can visit its page on the chutes.ai website and click the "Source" tab to inspect its complete, reproducible configuration. This includes:
Full Source Code: The exact Python code for the chute is visible.
Inference Engine Arguments: The precise engine_args used to launch the inference server (e.g., SGLang) are listed, showing every flag and setting.
GPU Requirements: The specific GPU models the chute is designed and validated to run on.
Hugging Face Model & Revision: The exact model_name and, most importantly, the locked revision (commit hash) from Hugging Face are clearly defined. We virtually never use quantized models; if we did, the quantization configuration would also be explicitly defined here.
Open Source SGLang Fork: The version of our SGLang fork used is open source and can be inspected on GitHub, and is generally kept in sync with the main upstream sglang project.
This transparency means there is no "black box" when it comes to the model itself. You can verify the exact, non-quantized, revision-locked model you are paying for before you ever make an API call.
The Chutes Difference: A Comparison with Opaque AI Platforms
The verifiability of the Chutes platform stands in stark contrast to the "trust me, bro" model of typical closed, centralized AI platforms. When considering security and integrity, the difference is fundamental.
| Question a Skeptic Would Ask | Typical Opaque Platform (e.g., "ACME LLM, Inc.") | The Chutes Verifiable Answer |
| --- | --- | --- |
| Which model am I really using? | You are told you're using ACME-Chat-v3-Turbo, but you have no way to verify if it's the latest version or an older, cheaper one. | You can see the exact Hugging Face model_name and revision hash on chutes.ai for the specific chute you are using. |
| Is the model quantized or modified? | You don't know. They might be using a heavily quantized (e.g., 4-bit) or "lobotomized" version of the model to save on costs, delivering lower quality results. | You can see the exact engine_args and source code. Chutes almost never uses quantized models, and if so, it would be explicitly declared. The watchtower's random hash checks of the model files ensure the weights on disk are the ones you expect. |
| What code is processing my prompt? | It's a proprietary secret, running in their data center. You are trusting that their internal code has no bugs, no malicious logic, and does what the privacy policy says. | The code for the chute, the chutes library, and the SGLang fork are all open source. inspecto verifies the bytecode at runtime. |
| How is my data protected while in use? | You have to trust their internal security practices and their privacy policy. A single rogue employee or host-level vulnerability could expose your data. | Verifiable hardware isolation. With sek8s, your data is protected by Intel TDX memory encryption and NVIDIA PPCIE. Not even the owner of the machine can see your data in memory. This is a physical guarantee, not a policy promise. |
| Is my prompt being logged or used for training? | Their privacy policy says no, but you have no way to prove it. Malicious or accidental logging is a significant risk. | The code is open and auditable. More importantly, chutes-net-nanny blocks all outbound network traffic by default, so even if the code tried to exfiltrate your data, it would be blocked by a lower-level security layer. |
| How do I know the environment is secure? | You don't. You are trusting their infrastructure security, which is completely opaque to you. | You can verify it yourself, in real time. You can fetch the hardware attestation quote (TD Quote) and the full software manifest (IMA report) for the node running your workload at any time. |
| What is the basis of trust? | Trust in a brand, its marketing, and its legal documents (privacy policy). | Cryptographic proof. The entire system is built on the principle of "don't trust, verify," from the hardware up to the application code. |
Why TEEs Alone Are Not Enough: Chutes' Holistic Security Philosophy
While Trusted Execution Environments (TEEs) provide groundbreaking hardware-level isolation, it is crucial to understand that they are not a silver bullet. Relying solely on TEEs can create a false sense of security, as several attack vectors remain unaddressed. Chutes' approach is built on the understanding that true security requires a holistic, multi-layered strategy that integrates hardware TEEs with robust software validation, continuous monitoring, and radical transparency.
Here's why TEEs alone are insufficient and how Chutes addresses these gaps:
The Insider Threat: What Good is a Black Box if the Code Inside is Malicious? A TEE's primary function is to protect a workload from a compromised host. It creates a "black box" where the CPU prevents the host OS from snooping on the code's memory. However, the TEE itself does not know if the code it is executing is malicious. For example, a malicious operator could create a chute that perfectly mimics a legitimate LLM service, but adds one extra line of code: log_file.write(user_prompt). The TEE will dutifully run this code and protect it from the host, but it will also faithfully execute the instruction to log the user's private data. Without a mechanism to verify the integrity of the code inside the TEE, the user has no guarantee against this kind of insider attack.
Chutes' Mitigation: This is precisely why our software validation stack is not just an add-on, but an essential component of TEE security. A TEE's job is to protect data in use; Chutes' job is to verify the code that uses it.
Verified Code: Our rigorous forge process (inspecto, cfsv, trivy) and cosign image signing guarantee that the code running inside the TEE is the exact, untampered code the user expects. The malicious prompt-logging chute would never be deployed because its inspecto hash would not match the source-of-truth, and its image would not have a valid signature.
Continuous Checks: Even if an attacker found a novel way to modify the code after launch (a hypothetical scenario, as this is blocked by multiple layers), the watchtower's continuous and random inspecto and cfsv challenges would immediately detect the modification.
Configurable Egress Control: As a final defense, the chutes-net-nanny, while optional, is typically enabled to block all outbound network traffic, preventing a malicious chute from "phoning home" with stolen data.
TEEs Are Not Immune to Vulnerabilities: Hardware is not perfect, and TEE implementations have historically had, and will likely continue to have, their own vulnerabilities and zero-days. Exploiting a TEE vulnerability could potentially allow an attacker to break isolation or extract keys.
Chutes' Mitigation: Our multi-layered approach means that even if a TEE vulnerability were to be discovered, the attacker would still face significant hurdles. The external network lockdown by chutes-net-nanny and sek8s network policies would prevent command-and-control communication or data exfiltration. The continuous cfsv and inspecto checks would detect tampering. The IMA manifests provide a real-time audit trail. These redundant layers reduce the blast radius of any single point of failure.
Lack of Visibility and Trust: While TEEs provide a "black box," this can ironically lead to a lack of verifiable trust for external observers. How can a user be sure that the code inside the black box is indeed what it claims to be, or that the attestation process itself isn't being spoofed?
Chutes' Mitigation: Our commitment to "Radical Verifiability" addresses this head-on. By providing real-time, public access to hardware attestation reports (TD Quotes, NVIDIA attestations) and full software manifests (IMA), Chutes enables any third-party observer to independently verify the integrity of the environment and the running code. This transparency transforms the "black box" into a cryptographically transparent, auditable compute environment.
In summary, while Intel TDX and NVIDIA PPCIE provide essential hardware roots of trust, Chutes understands that a truly secure confidential computing platform must go further. By combining these advanced hardware technologies with a comprehensive, open-source-auditable software stack and a commitment to radical verifiability, Chutes delivers a level of integrity and confidence that far exceeds what TEEs alone can offer.
Openness and Radical Verifiability
A core tenet of the Chutes security model is that you should not have to trust us blindly. We believe that verifiability means nothing unless you have something to verify against. Cryptographic reports are only meaningful if you can compare them to a known-good, publicly auditable baseline.
Open Source as the Foundation of Trust: The core logic for the validator (chutes-api), the miner deployment engine (chutes-miner), the client library (chutes), and the entire TEE environment (sek8s) are publicly available on GitHub. This is not just a philosophical choice; it is a security necessity. Our open-source repositories define the "golden state"—the exact configuration, software components, and measurements that our attestation reports should reflect. Without this public baseline, our claims of verifiability would be empty.
Real-time, Public Attestation: We are building on this foundation to provide radical transparency. For any chute running on the network, at any time, anyone will be able to query:
The Full Attestation Report: You can request the latest TD Quote and NVIDIA attestation report directly from the node the chute is running on. You can then independently verify the hardware signatures and, most importantly, compare the software measurements (RTMRs) against the configuration defined in the open-source sek8s repository.
The Full Software Manifest: We use the Integrity Measurement Architecture (IMA) of the Linux kernel to generate a signed manifest of every single file, library, and package on the filesystem. This manifest is included in the attestation report's measurements. You can fetch this manifest and compare it against the public sek8s build to prove that not a single file has been added, removed, or altered.
This ability for any third party to independently and cryptographically verify the integrity of any node on the network against a public, open-source codebase is the ultimate expression of our "don't trust, verify" principle. It provides a level of provable security that is unparalleled in public compute platforms.
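The manifest comparison described above reduces to a diff between two path-to-digest mappings. The sketch below assumes the IMA manifest and the public build manifest have already been parsed into plain dictionaries. That parsing step, the function name, and the dictionary layout are assumptions made for illustration.

```python
def diff_manifests(reference: dict[str, str], reported: dict[str, str]) -> dict[str, list[str]]:
    """Compare a node's reported manifest with the public reference manifest.

    Both arguments map a file path to its content digest.
    """
    added = sorted(p for p in reported if p not in reference)
    removed = sorted(p for p in reference if p not in reported)
    altered = sorted(
        p for p in reference if p in reported and reference[p] != reported[p]
    )
    return {"added": added, "removed": removed, "altered": altered}


def manifests_match(reference: dict[str, str], reported: dict[str, str]) -> bool:
    """True only if no file was added, removed, or altered."""
    return not any(diff_manifests(reference, reported).values())
```

A third-party auditor would build the reference mapping from the open-source sek8s build and the reported mapping from the manifest fetched from the node.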
5. Attack Vectors and Mitigations
To make the security guarantees of the Chutes platform more concrete, this section enumerates common attack vectors and details how they are mitigated by the platform's security layers.
Attack Vector
Description
Standard Mitigation (All Chutes)
TEE (sek8s) Mitigation (Enhanced Protection)
Code Tampering
A malicious miner modifies the chute's source code to steal data, alter results, or introduce a backdoor.
inspecto: At startup, generates a hash of all Python bytecode, which is validated against a source-of-truth hash from the image build. Any modification is immediately detected.
cosign Admission Controller: The Kubernetes API server flatly refuses to run any image that does not have a valid cryptographic signature from the Chutes build system (forge).
Immutable Filesystem: The container's root filesystem is read-only.
Filesystem Tampering
The miner modifies system libraries, Python packages, or other files within the container to compromise the environment.
cfsv: At startup and on-demand, performs a challenge-response protocol to verify the integrity of the entire filesystem against a source-of-truth index created at build time.
Measured & Encrypted Root FS: The entire host filesystem for the confidential VM is measured at boot and encrypted. Attestation will fail if a single byte is changed, and the disk decryption key will not be released, rendering the node inert.
Model Substitution / Weight Tampering
A miner uses a cheaper, quantized, or "lobotomized" model while claiming to run the full-precision version specified by the user.
watchtower: Can issue a random challenge at any time, requiring the miner to hash a specific slice of the model weight files on disk. This defeats "bait-and-switch" attacks.
cllmv: Cryptographically binds the model name and revision hash to the per-token output proofs.
These software-level checks are the primary defense and are augmented by the TEE's general isolation, which prevents an attacker from tampering with the validation tools themselves.
Data-in-Transit Interception
An attacker on the same network as the miner or validator attempts to read or modify API requests.
End-to-End Encryption: All communication between the validator and the miner is encrypted using a symmetric AES-256 key negotiated during the graval-priv hardware attestation handshake.
Hardware-Enforced Isolation: The TLS session for communication is terminated inside the confidential VM. An attacker on the host cannot intercept the unencrypted traffic, as the host OS has no access to the TD's memory or network stack.
Data-in-Use / Memory Snooping
The miner (or an attacker who has compromised the host OS) attempts to read the memory of the running chute to steal user data, prompts, or model weights.
Process Isolation: Standard OS-level process isolation is used. This does not protect against a root-level attacker on the host.
Intel® TDX Memory Encryption: The entire memory space of the confidential VM is encrypted by the CPU. The host OS and hypervisor only see ciphertext. It is physically impossible for the host to read the chute's memory.
GPU Bus Snooping
An attacker with physical access or high-level host compromise uses specialized tools to read data as it travels over the PCIe bus between the CPU and the GPU.
(No specific mitigation for this advanced attack).
NVIDIA Protected PCIe (PPCIE): The link between the CPU and GPU is fully encrypted. All data and models sent to GPU VRAM are protected from snooping attacks on the PCIe bus.
Pod Breakout / Host Compromise
A process inside the chute container attempts to escape its container and gain access to the host operating system.
chutes-net-nanny: Intercepts system calls and intentionally segfaults any process that attempts to exec into the pod, attach a debugger, or otherwise interact with processes outside its own tree.
Restrictive K8s Config: Pods are run with a restrictive securityContext, as non-root users, and with privilege escalation disabled.
Hypervisor Isolation: The chute runs inside a completely separate, hardware-isolated confidential VM (the Trust Domain). A pod breakout would only grant access to the inside of the TD, which has no access to the host system or other TDs.
Malicious Network Activity / Data Exfiltration
A compromised or malicious chute attempts to send user data to an attacker-controlled server on the internet.
chutes-net-nanny: By default, all outbound network traffic is blocked, except to the Chutes validator proxy. Egress can only be enabled on a per-chute basis by the chute's owner.
sek8s Network Policies: In addition to net-nanny, sek8s enforces strict, default-deny Kubernetes network policies at the infrastructure level, providing a second layer of egress control.
Attestation Forgery / Impersonation
A malicious miner tries to fake its hardware or software environment to trick the validator into accepting it.
graval-priv: Uses a GPU-specific, hardware-based challenge-response mechanism that is difficult to fake without access to the specific GPU hardware.
Continuous Monitoring: The watchtower performs random, on-demand checks.
Hardware-Signed Quotes: Attestation is not a software-only proof; it is a cryptographic report (TD Quote) signed by a private key fused into the CPU hardware. The validator can verify this signature, and it cannot be forged without access to that key. A nonce supplied by the validator prevents replay attacks.
GPU Fraud / Misrepresentation
A miner claims to have a powerful, expensive GPU (e.g., an H100) to attract high-value workloads but actually runs the computation on a cheaper, slower GPU (e.g., a T4).
graval-priv: This non-TEE attestation serves as a hardware benchmark. The "Proof of Consecutive VRAM Work" (consecutive matrix multiplications) cryptographically proves the GPU's actual processing speed and VRAM amount. The time taken to return the proof is a key part of the validation, so a slow GPU cannot plausibly fake the performance of a fast one.
NVIDIA Hardware Attestation: In a TEE, this is augmented by the signed attestation report from the GPU itself. This report, verifiable by the validator, contains the true identity of the GPU (e.g., "NVIDIA H100"), providing a second, hardware-rooted proof that prevents misrepresentation.
Rollback Attacks
An attacker tries to force a miner to run an old, known-vulnerable version of a chute image.
Validator State: The validator (chutes-api) is the source of truth for which chute versions are valid. gepetto will refuse to run any version not explicitly approved by the validator.
cosign Verification: The admission controller verifies the image signature against the latest trusted keys. An older image may still carry a valid signature, but it can be blocked by other policy if vulnerabilities are discovered in it.
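The timing argument behind the GPU fraud mitigation can be illustrated with a small sketch. This is not the graval-priv algorithm, which is closed-source; a SHA-256 hash chain stands in for the consecutive matrix multiplications, and the names chained_work and verify_timed_proof are hypothetical. The point is only that each round depends on the previous one, so the work cannot be parallelized away, and a correct answer that arrives after the deadline is still rejected.

```python
import hashlib
import hmac


def chained_work(seed: bytes, rounds: int) -> bytes:
    """Stand-in for sequential GPU work: every round consumes the previous output."""
    state = seed
    for _ in range(rounds):
        state = hashlib.sha256(state).digest()
    return state


def verify_timed_proof(seed: bytes, rounds: int, proof: bytes,
                       elapsed_s: float, deadline_s: float) -> bool:
    """Accept a proof only if it is both correct and returned within the deadline."""
    if elapsed_s > deadline_s:
        return False  # a correct but late answer suggests slower hardware
    # Assumption: the verifier can recompute the expected result itself.
    return hmac.compare_digest(proof, chained_work(seed, rounds))
```

In this sketch the verifier recomputes the full result, which is only workable because the stand-in work is cheap; a real scheme would need a verification path that is much cheaper than producing the proof.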
6. Case Study: End-to-End SGLang LLM Request in a TEE
To demonstrate how these layers work together, let's walk through the entire lifecycle of a request to a Large Language Model (LLM) running in an SGLang chute, deployed inside a sek8s TEE.
Stage 0: Pre-Flight Verification
A skeptical user, before spending any money, visits the chute's public page on chutes.ai. In the "Source" tab, they verify the exact configuration: the Hugging Face model (meta-llama/Llama-2-70b-chat-hf) and revision (a1b2c3d...), the precise engine_args used to launch SGLang, the lack of any quantization flags, and the open-source chute code itself. This provides a verifiable baseline for what to expect.
Stage 1: The Build - Creating Verifiable Truth
Image Creation: The chutes-apiforge service picks up the chute definition. It builds a new container image.
Baseline Hashes: During the build, cfsv and inspecto are run inside the container to generate the "source-of-truth" hashes for the filesystem and Python bytecode.
Signing: The final image is pushed to the registry and cryptographically signed with cosign.
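The idea of a seeded filesystem digest, as described for cfsv, can be sketched as follows. This is an illustrative approximation only: the real cfsv is closed-source, and the function names, the SHA-256 choice, and the way the seed is mixed in are all assumptions. The sketch indexes every file under a root directory in a stable order and folds the paths and contents into one digest that depends on a challenge seed, so a precomputed answer for one seed is useless for another.

```python
import hashlib
import os


def build_index(root: str) -> list[str]:
    """Return every file path under root, relative to root, in a stable order."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(paths)


def seeded_digest(root: str, seed: bytes) -> str:
    """Digest the whole tree, keyed by a challenge seed (hypothetical scheme)."""
    h = hashlib.sha256(seed)
    for rel in build_index(root):
        h.update(rel.encode("utf-8"))
        h.update(b"\0")  # separator so path and content cannot run together
        with open(os.path.join(root, rel), "rb") as f:
            h.update(hashlib.sha256(f.read()).digest())
    return h.hexdigest()
```

At build time the platform would hold the reference copy of the image, so it can compute the expected digest for any seed it later issues; a container whose files were modified after the build would return a different value.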
Stage 2: The Deployment - Attestation Before Execution
Deployment Request: The user decides to run the chute. gepetto identifies a TEE-capable server running sek8s.
Hardware Attestation: Before deploying, the chutes-api validator initiates remote attestation. The sek8s node returns a TD Quote signed by the CPU's hardware key and an NVIDIA GPU report.
Verification: The validator verifies the signatures and compares the measurements in the quotes to the "golden" sek8s configuration. Attestation passes, proving the node is genuine and untampered.
Launch Authorization: gepetto receives a single-use JWT launch token from the validator.
Kubernetes Deployment: gepetto creates the Kubernetes Job object. The sek8s admission controller verifies the cosign signature on the image and allows the pod to be scheduled.
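The verification logic in this stage (check the signature, check the nonce, compare measurements to a golden value) can be sketched as below. This is a simplified model, not the TDX quote format or the validator's actual code: an HMAC with a shared key stands in for the hardware signature, and GOLDEN_MEASUREMENT, make_quote, and verify_quote are hypothetical names. A real verifier would validate an asymmetric signature and its certificate chain instead.

```python
import hashlib
import hmac

# Hypothetical golden value; in practice this would come from the known-good sek8s build.
GOLDEN_MEASUREMENT = hashlib.sha256(b"golden sek8s image").hexdigest()


def make_quote(key: bytes, measurement: str, nonce: bytes) -> dict:
    """Node side: bind the measurement to the validator's nonce and sign both."""
    payload = measurement.encode("utf-8") + nonce
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return {"measurement": measurement, "nonce": nonce, "signature": signature}


def verify_quote(key: bytes, quote: dict, expected_nonce: bytes, seen_nonces: set) -> bool:
    """Validator side: signature, freshness, then measurement."""
    payload = quote["measurement"].encode("utf-8") + quote["nonce"]
    expected_sig = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(quote["signature"], expected_sig):
        return False  # not signed by the expected key (HMAC stand-in)
    if quote["nonce"] != expected_nonce or expected_nonce in seen_nonces:
        return False  # stale or replayed quote
    if quote["measurement"] != GOLDEN_MEASUREMENT:
        return False  # node is not running the expected configuration
    seen_nonces.add(expected_nonce)
    return True
```

Because the nonce is covered by the signature and recorded once used, a quote captured from an earlier, honest attestation cannot be replayed against a later challenge.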
Stage 3: The Launch - A Chained Sequence of Checks
Secure Startup: The pod starts, and the chutes/entrypoint/run.py script executes.
Validation Handshake: The entrypoint uses its JWT to open a dialogue with the validator, sending its cfsv and inspecto hashes. The validator confirms they match the build-time hashes.
Symmetric Key: With all checks passed, the validator sends the ephemeral AES symmetric key to the chute. The GraValMiddleware is now active.
SGLang Initialization: The chute's @chute.on_startup() function is called. The script downloads the specific, revision-locked Llama-2-70B model and starts the sglang.launch_server process. Importantly, the sglang server is launched with a password and strictly binds only to the loopback interface (127.0.0.1). This means no external process can directly connect to the SGLang server; all communication must be securely routed through the Chutes library's proxy.
Activation & Lockdown: The SGLang server is ready. The entrypoint calls the activation_url, and netnanny permanently disables external network access (if configured).
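The chained checks in this stage can be summarized in a short sketch: a launch token that works exactly once, a comparison of the reported hashes against the build-time values, and release of a symmetric key only when everything matches. The Validator class and its methods are hypothetical, and an opaque random string stands in for the signed JWT; the actual protocol between the entrypoint and chutes-api is not reproduced here.

```python
import hmac
import secrets


class Validator:
    """Toy validator: issues single-use launch tokens and gates key release."""

    def __init__(self, build_hashes: dict):
        self.build_hashes = build_hashes  # e.g. {"cfsv": "<hex>", "inspecto": "<hex>"}
        self.unused_tokens = set()

    def issue_launch_token(self) -> str:
        token = secrets.token_urlsafe(32)  # stand-in for a signed JWT
        self.unused_tokens.add(token)
        return token

    def handshake(self, token: str, reported: dict):
        """Return a fresh 32-byte key if every check passes, otherwise None."""
        if token not in self.unused_tokens:
            return None
        self.unused_tokens.discard(token)  # consumed even if later checks fail
        for name, expected in self.build_hashes.items():
            if not hmac.compare_digest(reported.get(name, ""), expected):
                return None
        return secrets.token_bytes(32)  # ephemeral symmetric key for this instance
```

Consuming the token before the hash comparison is a deliberate choice in this sketch: a failed launch cannot be retried with the same token, so each attempt requires a fresh authorization from the validator.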
Stage 4: The Inference Request & Continuous Verification
User Request: A user sends a prompt: POST /v1/chat/completions.
Encrypted Forward & Decryption: The request is encrypted, sent to the miner, and decrypted inside the Intel TDX Trust Domain. The host OS sees only ciphertext.
Secure Inference: The prompt is processed by the LLM on the GPU. The data is protected by TDX on the CPU and by NVIDIA PPCIE on the PCIe bus.
Runtime Check (Optional): At this very moment, the watchtower could issue a random challenge, demanding the chute hash a slice of the Llama-2 model weights on disk to prove they haven't been swapped post-launch.
Verified Output: As the LLM generates tokens, cllmv generates verification hashes for the output, cryptographically binding the response to the meta-llama/Llama-2-70b-chat-hf model and revision a1b2c3d... that the user originally inspected.
Encrypted Response: The final response is encrypted by the GraValMiddleware inside the TD and sent back to the user.
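The optional runtime check on model weights can be pictured as a random-slice challenge. The sketch below is an assumption about how such a check could work, not the watchtower's actual protocol: make_challenge and answer_challenge are hypothetical names, and the validator is assumed to hold a reference copy of the weights so it can compute the expected answer itself.

```python
import hashlib
import secrets


def make_challenge(file_size: int, length: int) -> dict:
    """Pick a random slice of the file and a fresh nonce."""
    offset = secrets.randbelow(file_size - length + 1)
    return {"offset": offset, "length": length, "nonce": secrets.token_bytes(16)}


def answer_challenge(path: str, challenge: dict) -> str:
    """Chute side: hash the requested slice together with the nonce."""
    with open(path, "rb") as f:
        f.seek(challenge["offset"])
        chunk = f.read(challenge["length"])
    return hashlib.sha256(challenge["nonce"] + chunk).hexdigest()
```

Because the offset and nonce are unpredictable, the chute cannot precompute answers; it needs the genuine weights on disk at the time of the challenge. A single slice only samples the file, so confidence in this scheme would come from repeating the check over time.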
From start to finish, the user's data has been protected by multiple, overlapping layers of hardware and software security. At the most critical stage—when the data is in use—it is inside a hardware-enforced black box, invisible even to the owner of the machine it's running on.