Chute API Reference

The Chute class is the core component of the Chutes framework, representing a deployable AI application unit. It extends FastAPI, so you can use all FastAPI features. This reference covers all methods, properties, and configuration options.

Class Definition

from chutes.chute import Chute

chute = Chute(
    username: str,
    name: str,
    image: str | Image,
    tagline: str = "",
    readme: str = "",
    standard_template: str = None,
    revision: str = None,
    node_selector: NodeSelector = None,
    concurrency: int = 1,
    max_instances: int = 1,
    shutdown_after_seconds: int = 300,
    scaling_threshold: float = 0.75,
    allow_external_egress: bool = False,
    encrypted_fs: bool = False,
    passthrough_headers: dict = {},
    tee: bool = False,
    **kwargs
)

Constructor Parameters

Required Parameters

username (str)

The username or organization name for the chute deployment.

Example:

chute = Chute(username="mycompany", name="ai-service", image="parachutes/python:3.12")

name (str)

The name of the chute application.

Example:

chute = Chute(username="mycompany", name="text-generator", image="parachutes/python:3.12")

image (str | Image)

Docker image for the chute runtime environment (required).

Example:

# Using a string reference to a pre-built image
chute = Chute(
    username="mycompany",
    name="text-generator",
    image="parachutes/python:3.12"
)

# Using a custom Image object
from chutes.image import Image
custom_image = Image(username="mycompany", name="custom-ai", tag="1.0")
chute = Chute(
    username="mycompany",
    name="text-generator",
    image=custom_image
)

Optional Parameters

tagline (str)

A brief description of what the chute does.

Example:

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="parachutes/python:3.12",
    tagline="Advanced text generation with GPT models"
)

readme (str)

Detailed documentation for the chute in Markdown format.

Example:

readme = """
# Text Generation API

This chute provides advanced text generation capabilities.

## Features
- Multiple model support
- Customizable parameters
- Real-time streaming
"""

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="parachutes/python:3.12",
    readme=readme
)

standard_template (str)

Reference to a standard template (e.g., "vllm", "sglang", "diffusion").
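
Example (a minimal sketch, passing one of the template names above directly to the constructor):

chute = Chute(
    username="mycompany",
    name="llm-service",
    image="parachutes/python:3.12",
    standard_template="vllm"  # one of the standard template names listed above
)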

revision (str)

Specific revision or version identifier for the chute.
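
Example (the revision identifier shown is illustrative):

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="parachutes/python:3.12",
    revision="v1.2.0"  # illustrative version identifier
)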

node_selector (NodeSelector)

Hardware requirements and preferences for the chute.

Example:

from chutes.chute import NodeSelector

node_selector = NodeSelector(
    gpu_count=2,
    min_vram_gb_per_gpu=24,
    include=["h100", "a100"],
    exclude=["t4"]
)

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="parachutes/python:3.12",
    node_selector=node_selector
)

concurrency (int)

Maximum number of concurrent requests the chute can handle per instance.

Example:

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="parachutes/python:3.12",
    concurrency=8  # Handle up to 8 concurrent requests
)

Guidelines:

  • For vLLM/SGLang with continuous batching: 64-256
  • For single-request models (diffusion): 1
  • For models with some parallelism: 4-16

max_instances (int)

Maximum number of instances the chute can scale up to.

Example:

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="parachutes/python:3.12",
    max_instances=10  # Scale up to 10 instances
)

shutdown_after_seconds (int)

Time in seconds to wait before shutting down an idle instance. The default is 300 seconds (5 minutes).

Example:

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="parachutes/python:3.12",
    shutdown_after_seconds=600  # Shutdown after 10 minutes idle
)

scaling_threshold (float)

Utilization threshold at which to trigger scaling (0.0 to 1.0). The default is 0.75.
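
Example (the threshold value here is illustrative):

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="parachutes/python:3.12",
    max_instances=10,
    scaling_threshold=0.8  # scale up once average utilization exceeds 80%
)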

allow_external_egress (bool)

Whether to allow external network connections after startup.

Important: By default, external network access is blocked after initialization. Set to True if your chute needs to fetch external resources at runtime (e.g., image URLs for vision models).

Example:

# For vision language models that need to fetch images
chute = Chute(
    username="mycompany",
    name="vision-model",
    image="parachutes/python:3.12",
    allow_external_egress=True
)

encrypted_fs (bool)

Whether to use an encrypted filesystem for the chute.

passthrough_headers (dict)

Headers to pass through to passthrough cord endpoints.
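
Example (the header name and value are illustrative; the signature indicates a mapping of header names to values):

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="parachutes/python:3.12",
    passthrough_headers={"X-Custom-Header": "value"}  # forwarded to passthrough cords
)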

tee (bool)

Whether this chute runs in a Trusted Execution Environment (TEE).
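
Both encrypted_fs and tee are simple booleans; an illustrative example enabling them together:

chute = Chute(
    username="mycompany",
    name="secure-service",
    image="parachutes/python:3.12",
    encrypted_fs=True,  # encrypt the instance filesystem
    tee=True            # require a Trusted Execution Environment
)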

**kwargs

Additional keyword arguments passed to the underlying FastAPI application.
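
Example (a sketch assuming the extra arguments are forwarded unchanged; title and version are standard FastAPI constructor arguments):

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="parachutes/python:3.12",
    title="Text Generator API",  # forwarded to FastAPI
    version="1.0.0"              # forwarded to FastAPI
)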

Decorators

Lifecycle Decorators

@chute.on_startup()

Decorator for functions to run during chute startup.

Signature:

@chute.on_startup(priority: int = 50)
async def initialization_function(self) -> None:
    """Function to run on startup."""
    pass

Parameters:

  • priority: Execution order (lower values execute first; default 50)
    • 0-20: Early initialization
    • 30-70: Normal operations
    • 80-100: Late initialization

Example:

@chute.on_startup(priority=10)  # Runs early
async def load_model(self):
    """Load the AI model during startup."""
    from transformers import AutoTokenizer, AutoModelForCausalLM

    self.tokenizer = AutoTokenizer.from_pretrained("gpt2")
    self.model = AutoModelForCausalLM.from_pretrained("gpt2")
    print("Model loaded successfully")

@chute.on_startup(priority=90)  # Runs late
async def log_startup(self):
    print("All initialization complete")

Use Cases:

  • Load AI models
  • Initialize databases
  • Set up caches
  • Configure services

@chute.on_shutdown()

Decorator for functions to run during chute shutdown.

Signature:

@chute.on_shutdown(priority: int = 50)
async def cleanup_function(self) -> None:
    """Function to run on shutdown."""
    pass

Example:

@chute.on_shutdown(priority=10)
async def cleanup_resources(self):
    """Clean up resources during shutdown."""
    if hasattr(self, 'model'):
        del self.model
    print("Resources cleaned up")

API Endpoint Decorator

@chute.cord()

Decorator to create HTTP API endpoints. See the Cord Decorator Reference for detailed documentation.

Basic Example:

@chute.cord(public_api_path="/generate", public_api_method="POST")
async def generate_text(self, prompt: str) -> str:
    """Generate text from a prompt."""
    return await self.model.generate(prompt)

Job Decorator

@chute.job()

Decorator to create long-running jobs or server rentals. See the Job Decorator Reference for detailed documentation.

Basic Example:

from chutes.chute.job import Port

@chute.job(ports=[Port(name="web", port=8080, proto="http")], timeout=3600)
async def training_job(self, **job_data):
    """Long-running training job."""
    output_dir = job_data["output_dir"]
    # Perform training...
    return {"status": "completed"}

Properties

name

The name of the chute.

Type: str

uid

The unique identifier for the chute.

Type: str

readme

The readme/documentation for the chute.

Type: str

tagline

The tagline for the chute.

Type: str

image

The image configuration for the chute.

Type: str | Image

node_selector

The hardware requirements for the chute.

Type: NodeSelector

standard_template

The standard template name, if the chute uses a template.

Type: str | None

cords

List of cord endpoints registered with the chute.

Type: list

jobs

List of jobs registered with the chute.

Type: list
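
A short sketch of reading these properties at build time (uid, cords, and jobs are assumed attribute names matching the descriptions above):

print(chute.name)        # "text-generator"
print(chute.tagline)     # "Generate text with transformers"
print(chute.uid)         # unique identifier assigned to the chute (assumed name)
print(len(chute.cords))  # number of registered cord endpoints
print(len(chute.jobs))   # number of registered jobs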

Methods

initialize()

Initialize the chute by running all startup hooks. Called automatically when the chute starts in a remote context.

await chute.initialize()

FastAPI Integration

Since Chute extends FastAPI, you can use all FastAPI features directly:

Adding Middleware

from fastapi.middleware.cors import CORSMiddleware

@chute.on_startup()
async def setup_middleware(self):
    self.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_methods=["*"],
        allow_headers=["*"]
    )

Adding Custom Routes

@chute.on_startup()
async def add_custom_routes(self):
    @self.get("/custom")
    async def custom_endpoint():
        return {"message": "Custom endpoint"}

Using Dependencies

from fastapi import Depends, HTTPException

async def verify_token(token: str):
    if token != "secret":
        raise HTTPException(401, "Invalid token")
    return token

@chute.cord(public_api_path="/protected")
async def protected_endpoint(self, token: str = Depends(verify_token)):
    return {"message": "Protected content"}

Complete Example

from chutes.chute import Chute, NodeSelector
from chutes.image import Image
from pydantic import BaseModel, Field

# Define custom image
image = (
    Image(username="myuser", name="my-chute", tag="1.0")
    .from_base("parachutes/python:3.12")
    .run_command("pip install transformers torch")
)

# Define input/output schemas
class GenerationInput(BaseModel):
    prompt: str = Field(..., description="Input prompt")
    max_tokens: int = Field(100, ge=1, le=1000)

class GenerationOutput(BaseModel):
    text: str
    tokens_used: int

# Create chute
chute = Chute(
    username="myuser",
    name="text-generator",
    tagline="Generate text with transformers",
    readme="## Text Generator\n\nGenerates text from prompts.",
    image=image,
    node_selector=NodeSelector(
        gpu_count=1,
        min_vram_gb_per_gpu=16
    ),
    concurrency=4,
    max_instances=3,
    shutdown_after_seconds=300,
    allow_external_egress=False
)

@chute.on_startup()
async def load_model(self):
    """Load model during startup."""
    from transformers import pipeline
    self.generator = pipeline("text-generation", model="gpt2", device=0)
    print("Model loaded!")

@chute.cord(
    public_api_path="/generate",
    public_api_method="POST",
    minimal_input_schema=GenerationInput
)
async def generate(self, input_data: GenerationInput) -> GenerationOutput:
    """Generate text from a prompt."""
    result = self.generator(
        input_data.prompt,
        max_new_tokens=input_data.max_tokens  # cap newly generated tokens, independent of prompt length
    )[0]["generated_text"]
    
    return GenerationOutput(
        text=result,
        tokens_used=len(result.split())
    )

@chute.cord(public_api_path="/health", public_api_method="GET")
async def health(self) -> dict:
    """Health check endpoint."""
    return {
        "status": "healthy",
        "model_loaded": hasattr(self, "generator")
    }

Best Practices

1. Use Appropriate Concurrency

# For LLMs with continuous batching
chute = Chute(..., concurrency=64)

# For single-request models
chute = Chute(..., concurrency=1)

2. Set Reasonable Shutdown Timers

# Development - short timeout
chute = Chute(..., shutdown_after_seconds=60)

# Production - longer timeout to avoid cold starts
chute = Chute(..., shutdown_after_seconds=300)

3. Use Type Hints and Schemas

from pydantic import BaseModel

class MyInput(BaseModel):
    text: str

@chute.cord(
    public_api_path="/process",
    minimal_input_schema=MyInput
)
async def process(self, data: MyInput) -> dict:
    return {"result": data.text.upper()}

4. Handle Errors Gracefully

from fastapi import HTTPException

@chute.cord(public_api_path="/generate")
async def generate(self, prompt: str):
    if not prompt.strip():
        raise HTTPException(400, "Prompt cannot be empty")
    
    try:
        return await self.model.generate(prompt)
    except Exception as e:
        raise HTTPException(500, f"Generation failed: {e}")

See Also