
Chute API Reference

The Chute class is the core component of the Chutes framework, representing a deployable AI application unit. This reference covers all methods, properties, and configuration options.

Class Definition

from chutes.chute import Chute

chute = Chute(
    username: str,
    name: str,
    image: str | Image,
    tagline: str = "",
    readme: str = "",
    standard_template: str | None = None,
    revision: str | None = None,
    node_selector: NodeSelector | None = None,
    concurrency: int = 1,
    **kwargs
)

Constructor Parameters

Required Parameters

username

The username or organization name for the chute deployment.

Example:

chute = Chute(username="mycompany", name="ai-service", image="python:3.11")

Rules:

  • Must be lowercase alphanumeric with hyphens
  • Cannot start or end with hyphen
  • Maximum 63 characters
  • Must be unique within the platform
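
These rules follow the DNS-label convention, so they are easy to check locally before deploying. A minimal validation sketch (validate_name is a hypothetical helper, not part of the SDK):

import re

# DNS-label style: lowercase alphanumerics and hyphens,
# no leading/trailing hyphen, at most 63 characters.
NAME_RE = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")

def validate_name(value: str) -> bool:
    """Hypothetical helper: True if value satisfies the naming rules."""
    return bool(NAME_RE.match(value))

assert validate_name("mycompany")
assert not validate_name("-bad-")

The same rules apply to the name parameter below, so one check covers both.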

name

The name of the chute application.

Example:

chute = Chute(username="mycompany", name="text-generator", image="python:3.11")

Rules:

  • Must be lowercase alphanumeric with hyphens
  • Cannot start or end with hyphen
  • Maximum 63 characters
  • Must be unique within the username namespace

image

Docker image for the chute runtime environment (required).

Example:

# Using a string reference
chute = Chute(
    username="mycompany",
    name="text-generator",
    image="python:3.11"
)

# Using a custom Image object
from chutes.image import Image
custom_image = Image(username="mycompany", name="custom-ai", tag="1.0")
chute = Chute(
    username="mycompany",
    name="text-generator",
    image=custom_image
)

Optional Parameters

tagline

A brief description of what the chute does.

Example:

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="python:3.11",
    tagline="Advanced text generation with GPT models"
)

Best Practices:

  • Keep under 100 characters
  • Use present tense
  • Be descriptive but concise

readme

Detailed documentation for the chute in Markdown format.

Example:

readme = """
# Text Generation API

This chute provides advanced text generation capabilities using state-of-the-art language models.

## Features
- Multiple model support
- Customizable parameters
- High-performance inference
- Real-time streaming

## Usage
Send a POST request to `/generate` with your prompt.
"""

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="python:3.11",
    readme=readme
)

standard_template

Reference to a standard template to use as a base for the chute.

Example:

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="python:3.11",
    standard_template="vllm"
)

revision

Specific revision or version identifier for the chute.

Example:

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="python:3.11",
    revision="v1.2.0"
)

node_selector

Hardware requirements and preferences for the chute.

Example:

from chutes.chute import NodeSelector

node_selector = NodeSelector(
    gpu_count=2,
    min_vram_gb_per_gpu=24,
    include=["h100", "a100"],
    exclude=["t4"]
)

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="python:3.11",
    node_selector=node_selector
)

concurrency

Maximum number of concurrent requests the chute can handle.

Example:

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="python:3.11",
    concurrency=8  # Handle up to 8 concurrent requests
)

Guidelines:

  • Higher concurrency requires more memory
  • Consider model size and GPU memory
  • Typical values: 1-16 for most applications
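
If a single model instance can only safely serve a few simultaneous generations, an application-level semaphore can cap in-flight GPU work below the configured concurrency. A sketch (the limit of 2 is illustrative):

import asyncio

@chute.on_startup()
async def setup_limits(self):
    # Accept up to `concurrency` requests, but run at most
    # 2 generations on the GPU at any one time.
    self.gpu_semaphore = asyncio.Semaphore(2)

@chute.cord(public_api_path="/generate", method="POST")
async def generate(self, prompt: str) -> str:
    async with self.gpu_semaphore:
        return await self.model.generate(prompt)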

**kwargs

Additional keyword arguments passed to the underlying FastAPI application.

Example:

chute = Chute(
    username="mycompany",
    name="text-generator",
    image="python:3.11",
    title="My AI API",  # FastAPI title
    description="Custom AI service",  # FastAPI description
    version="1.0.0"  # API version
)

Decorators and Methods

Lifecycle Decorators

@chute.on_startup()

Decorator for functions to run during chute startup.

Signature:

@chute.on_startup()
async def initialization_function(self) -> None:
    """Function to run on startup."""
    pass

Example:

@chute.on_startup()
async def load_model(self):
    """Load the AI model during startup."""
    from transformers import AutoTokenizer, AutoModelForCausalLM

    self.tokenizer = AutoTokenizer.from_pretrained("gpt2")
    self.model = AutoModelForCausalLM.from_pretrained("gpt2")

    print("Model loaded successfully")

Use Cases:

  • Load AI models
  • Initialize databases
  • Set up caches
  • Configure services
  • Load configuration

Best Practices:

  • Use async functions for I/O operations
  • Add error handling
  • Log initialization steps
  • Keep startup time reasonable
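
Combining the last two points, a startup hook with basic error handling and logging might look like this (a sketch; the model name is illustrative):

@chute.on_startup()
async def load_model_safely(self):
    """Load the model, logging progress and failing fast on errors."""
    self.logger.info("Loading model...")
    try:
        from transformers import AutoModelForCausalLM
        self.model = AutoModelForCausalLM.from_pretrained("gpt2")
        self.logger.info("Model loaded")
    except Exception as e:
        self.logger.error(f"Model load failed: {e}")
        raise  # Fail fast so the instance is not marked ready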

@chute.on_shutdown()

Decorator for functions to run during chute shutdown.

Signature:

@chute.on_shutdown()
async def cleanup_function(self) -> None:
    """Function to run on shutdown."""
    pass

Example:

@chute.on_shutdown()
async def cleanup_resources(self):
    """Clean up resources during shutdown."""
    if hasattr(self, 'model'):
        del self.model

    if hasattr(self, 'database_connection'):
        await self.database_connection.close()

    print("Resources cleaned up")

Use Cases:

  • Free memory (unload models)
  • Close database connections
  • Save state
  • Clean temporary files
  • Log shutdown events
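
For the "save state" case, a shutdown hook can persist simple counters before the process exits. A sketch (the file path and attribute names are illustrative):

import json

@chute.on_shutdown()
async def save_state(self):
    """Persist request counters so they survive a restart."""
    state = {"request_count": getattr(self, "request_count", 0)}
    with open("/tmp/chute_state.json", "w") as f:
        json.dump(state, f)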

API Endpoint Decorators

@chute.cord()

Decorator to create HTTP API endpoints.

Signature:

@chute.cord(
    public_api_path: str,
    method: str = "POST",
    input_schema: Optional[Type[BaseModel]] = None,
    output_schema: Optional[Type[BaseModel]] = None,
    minimal_input_schema: Optional[Type[BaseModel]] = None,
    output_content_type: str = "application/json",
    stream: bool = False,
    public_api_method: Optional[str] = None  # Deprecated: use 'method'
)

Parameters:

  • public_api_path - URL path for the endpoint (required)
  • method - HTTP method (GET, POST, PUT, DELETE, etc.)
  • input_schema - Pydantic schema for input validation
  • output_schema - Pydantic schema for output validation
  • minimal_input_schema - Simplified schema for documentation
  • output_content_type - Response content type
  • stream - Enable streaming responses

Basic Example:

@chute.cord(public_api_path="/generate", method="POST")
async def generate_text(self, prompt: str) -> str:
    """Generate text from a prompt."""
    return await self.model.generate(prompt)

Advanced Example with Schemas:

from pydantic import BaseModel, Field

class GenerationInput(BaseModel):
    prompt: str = Field(..., description="Input prompt")
    max_tokens: int = Field(100, ge=1, le=1000)
    temperature: float = Field(0.7, ge=0.0, le=2.0)

class GenerationOutput(BaseModel):
    generated_text: str = Field(..., description="Generated text")
    tokens_used: int = Field(..., description="Number of tokens used")

@chute.cord(
    public_api_path="/generate",
    method="POST",
    input_schema=GenerationInput,
    output_schema=GenerationOutput
)
async def generate_text(self, params: GenerationInput) -> GenerationOutput:
    """Generate text with parameters."""
    result = await self.model.generate(
        params.prompt,
        max_tokens=params.max_tokens,
        temperature=params.temperature
    )

    return GenerationOutput(
        generated_text=result,
        tokens_used=len(result.split())
    )

Non-JSON Responses:

from fastapi import Response

@chute.cord(
    public_api_path="/generate_image",
    method="POST",
    output_content_type="image/png"
)
async def generate_image(self, prompt: str) -> Response:
    """Generate an image and return as PNG."""
    image_data = await self.model.generate_image(prompt)

    return Response(
        content=image_data,
        media_type="image/png",
        headers={"Content-Disposition": "inline; filename=generated.png"}
    )

Streaming Example:

import json

from fastapi.responses import StreamingResponse

@chute.cord(
    public_api_path="/stream_generate",
    method="POST",
    stream=True
)
async def stream_generate(self, prompt: str):
    """Stream text generation token by token."""

    async def generate_tokens():
        async for token in self.model.stream_generate(prompt):
            yield f"data: {json.dumps({'token': token})}\n\n"

    return StreamingResponse(
        generate_tokens(),
        media_type="text/event-stream"
    )
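
On the client side, the server-sent events stream can be consumed line by line. A sketch using the third-party httpx library (the URL and JSON payload shape assume the example above is running locally):

import json
import httpx

with httpx.stream(
    "POST",
    "http://127.0.0.1:8000/stream_generate",
    json={"prompt": "Hello"},
) as response:
    for line in response.iter_lines():
        if line.startswith("data: "):
            token = json.loads(line[len("data: "):])["token"]
            print(token, end="", flush=True)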

@chute.job()

Decorator to create background job handlers.

Signature:

@chute.job(
    name: Optional[str] = None,
    schedule: Optional[str] = None,
    retry_count: int = 3,
    timeout: Optional[int] = None
)

Parameters:

  • name - Job name (defaults to function name)
  • schedule - Cron schedule for recurring jobs
  • retry_count - Number of retry attempts on failure
  • timeout - Job timeout in seconds

Basic Job Example:

import asyncio

@chute.job(name="data_processing")
async def process_data(self, data: dict) -> dict:
    """Process data in the background."""

    # Simulate processing
    await asyncio.sleep(5)

    return {
        "status": "completed",
        "processed_items": len(data.get("items", []))
    }

Scheduled Job Example:

@chute.job(
    name="daily_cleanup",
    schedule="0 2 * * *"  # Run daily at 2 AM
)
async def daily_cleanup(self):
    """Daily cleanup job."""

    # Clean up temporary files
    temp_files_cleaned = await self.cleanup_temp_files()

    # Clear old cache entries
    cache_entries_cleared = await self.clear_old_cache()

    return {
        "temp_files_cleaned": temp_files_cleaned,
        "cache_entries_cleared": cache_entries_cleared
    }

Job with Retry:

@chute.job(
    name="external_api_call",
    retry_count=5,
    timeout=30
)
async def call_external_api(self, endpoint: str, data: dict):
    """Call external API with retry logic."""

    import aiohttp

    async with aiohttp.ClientSession() as session:
        async with session.post(endpoint, json=data) as response:
            if response.status != 200:
                raise Exception(f"API call failed: {response.status}")

            return await response.json()

@chute.websocket()

Decorator to create WebSocket endpoints.

Signature:

@chute.websocket(path: str)

Example:

from fastapi import WebSocket

@chute.websocket("/chat")
async def websocket_chat(self, websocket: WebSocket):
    """WebSocket endpoint for real-time chat."""

    await websocket.accept()

    try:
        while True:
            # Receive message
            message = await websocket.receive_text()

            # Process with AI model
            response = await self.model.generate(message)

            # Send response
            await websocket.send_text(response)

    except Exception as e:
        print(f"WebSocket error: {e}")
    finally:
        await websocket.close()
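
A matching client for this endpoint, sketched with the third-party websockets package (the URL assumes a local run on port 8000):

import asyncio
import websockets

async def chat():
    async with websockets.connect("ws://127.0.0.1:8000/chat") as ws:
        await ws.send("Hello, model!")
        reply = await ws.recv()
        print(reply)

asyncio.run(chat())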

Properties and Attributes

Runtime Properties

app

The underlying FastAPI application instance.

Type: fastapi.FastAPI

Example:

# Access the FastAPI app directly
@chute.on_startup()
async def configure_app(self):
    # Add custom middleware
    from fastapi.middleware.cors import CORSMiddleware

    self.app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_methods=["*"],
        allow_headers=["*"]
    )

    # Add custom route
    @self.app.get("/custom")
    async def custom_endpoint():
        return {"message": "Custom endpoint"}

logger

Built-in logger for the chute.

Example:

@chute.on_startup()
async def setup_logging(self):
    self.logger.info("Chute startup initiated")

@chute.cord(public_api_path="/generate")
async def generate(self, prompt: str):
    self.logger.info(f"Generating text for prompt: {prompt[:50]}...")

    try:
        result = await self.model.generate(prompt)
        self.logger.info("Text generation completed successfully")
        return result
    except Exception as e:
        self.logger.error(f"Text generation failed: {e}")
        raise

Configuration Properties

config

Access to chute configuration.

Example:

@chute.on_startup()
async def log_config(self):
    self.logger.info(f"Chute name: {self.config.name}")
    self.logger.info(f"Concurrency: {self.config.concurrency}")
    self.logger.info(f"Username: {self.config.username}")

Method Reference

Utility Methods

run_local()

Run the chute locally for development and testing.

Signature:

def run_local(
    host: str = "0.0.0.0",
    port: int = 8000,
    reload: bool = True,
    log_level: str = "info"
) -> None

Example:

if __name__ == "__main__":
    chute.run_local(
        host="127.0.0.1",
        port=8080,
        reload=True,
        log_level="debug"
    )

Parameters:

  • host - Host to bind to
  • port - Port to listen on
  • reload - Enable auto-reload on code changes
  • log_level - Logging level (debug, info, warning, error)
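
Once the server is up, endpoints can be exercised with any HTTP client. A quick check against the /generate cord from earlier examples (the JSON payload shape is an assumption):

import httpx

response = httpx.post(
    "http://127.0.0.1:8080/generate",
    json={"prompt": "Write a haiku about GPUs"},
)
print(response.status_code, response.json())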

health_check()

Built-in health check method.

Signature:

async def health_check(self) -> dict

Example:

# Access health check programmatically
health_status = await chute.health_check()
print(health_status)
# Output: {"status": "healthy", "timestamp": "2024-01-01T12:00:00Z"}

Custom Health Checks:

@chute.on_startup()
async def setup_custom_health_check(self):
    """Override default health check."""

    async def custom_health_check():
        try:
            # Check model availability
            test_result = await self.model.generate("test", max_tokens=1)

            # Check external dependencies
            # ... additional checks ...

            return {
                "status": "healthy",
                "model_loaded": True,
                "timestamp": datetime.now().isoformat()
            }
        except Exception as e:
            return {
                "status": "unhealthy",
                "error": str(e),
                "timestamp": datetime.now().isoformat()
            }

    # Replace default health check
    self.health_check = custom_health_check

Advanced Usage Patterns

State Management

@chute.on_startup()
async def initialize_state(self):
    """Initialize application state."""

    # Model state
    self.model = None
    self.model_loaded = False

    # Cache state
    self.cache = {}
    self.cache_hits = 0
    self.cache_misses = 0

    # Request tracking
    self.request_count = 0
    self.error_count = 0

@chute.cord(public_api_path="/generate")
async def generate_with_tracking(self, prompt: str):
    """Generate text with state tracking."""

    self.request_count += 1

    try:
        # Check cache
        cache_key = hash(prompt)
        if cache_key in self.cache:
            self.cache_hits += 1
            return self.cache[cache_key]

        # Generate new result
        result = await self.model.generate(prompt)

        # Update cache
        self.cache[cache_key] = result
        self.cache_misses += 1

        return result

    except Exception as e:
        self.error_count += 1
        raise

@chute.cord(public_api_path="/stats", method="GET")
async def get_stats(self):
    """Get application statistics."""
    return {
        "requests": self.request_count,
        "errors": self.error_count,
        "cache_hits": self.cache_hits,
        "cache_misses": self.cache_misses,
        "cache_hit_rate": self.cache_hits / (self.cache_hits + self.cache_misses) if (self.cache_hits + self.cache_misses) > 0 else 0
    }

Dependency Injection

from typing import Optional, Protocol

class ModelInterface(Protocol):
    async def generate(self, prompt: str) -> str: ...

class CacheInterface(Protocol):
    async def get(self, key: str) -> Optional[str]: ...
    async def set(self, key: str, value: str) -> None: ...

@chute.on_startup()
async def setup_dependencies(self):
    """Set up dependency injection."""

    # Create implementations
    from my_models import GPTModel
    from my_cache import RedisCache

    self.model: ModelInterface = GPTModel()
    self.cache: CacheInterface = RedisCache()

    # Initialize
    await self.model.load()
    await self.cache.connect()

@chute.cord(public_api_path="/generate")
async def generate_with_deps(self, prompt: str):
    """Generate using injected dependencies."""

    # Check cache
    cached = await self.cache.get(prompt)
    if cached:
        return cached

    # Generate
    result = await self.model.generate(prompt)

    # Cache result
    await self.cache.set(prompt, result)

    return result

Error Handling

from fastapi import HTTPException
from fastapi.responses import JSONResponse

@chute.on_startup()
async def setup_error_handling(self):
    """Set up centralized error handling."""

    @self.app.exception_handler(Exception)
    async def global_exception_handler(request, exc):
        self.logger.error(f"Unhandled exception: {exc}")

        return JSONResponse(
            status_code=500,
            content={
                "error": "Internal server error",
                "request_id": getattr(request.state, "request_id", None)
            }
        )

@chute.cord(public_api_path="/generate")
async def generate_with_error_handling(self, prompt: str):
    """Generate with comprehensive error handling."""

    try:
        # Validate input
        if not prompt or len(prompt.strip()) == 0:
            raise HTTPException(
                status_code=400,
                detail="Prompt cannot be empty"
            )

        if len(prompt) > 10000:
            raise HTTPException(
                status_code=400,
                detail="Prompt too long (max 10,000 characters)"
            )

        # Generate
        result = await self.model.generate(prompt)

        if not result:
            raise HTTPException(
                status_code=500,
                detail="Model failed to generate output"
            )

        return {"generated_text": result}

    except HTTPException:
        raise  # Re-raise HTTP exceptions
    except Exception as e:
        self.logger.error(f"Generation failed: {e}")
        raise HTTPException(
            status_code=500,
            detail="Text generation failed"
        )

Best Practices

Performance Optimization

import asyncio
from functools import lru_cache

import aiohttp

@chute.on_startup()
async def optimize_performance(self):
    """Set up performance optimizations."""

    # Connection pooling
    self.session = aiohttp.ClientSession()

    # Model batching
    self.batch_queue = asyncio.Queue()
    self.batch_size = 8

    # Start batch processor
    asyncio.create_task(process_batches(self))

@lru_cache(maxsize=1000)
def expensive_computation(input_data: str) -> str:
    """Cache expensive computations (module-level so `self` is not held by the cache)."""
    result = input_data  # placeholder for the actual expensive operation
    return result

async def process_batches(self):
    """Process requests in batches for efficiency."""
    while True:
        batch = []

        # Collect batch
        try:
            for _ in range(self.batch_size):
                item = await asyncio.wait_for(
                    self.batch_queue.get(),
                    timeout=0.1
                )
                batch.append(item)
        except asyncio.TimeoutError:
            pass

        if batch:
            # Process batch
            results = await self.model.generate_batch([item['prompt'] for item in batch])

            # Return results
            for item, result in zip(batch, results):
                item['future'].set_result(result)

@chute.cord(public_api_path="/generate_batched")
async def generate_batched(self, prompt: str):
    """Generate using batched processing."""

    future = asyncio.Future()

    await self.batch_queue.put({
        'prompt': prompt,
        'future': future
    })

    result = await future
    return {"generated_text": result}

Security Best Practices

import time

from fastapi import Request, HTTPException

@chute.on_startup()
async def setup_security(self):
    """Set up security measures."""

    # API key validation (in production, load keys from a secret store, not source)
    self.valid_api_keys = {"your-secret-api-key"}

    # Rate limiting
    self.rate_limits = {}
    self.rate_limit_window = 60  # seconds
    self.rate_limit_max = 100   # requests per window

async def validate_api_key(self, request: Request):
    """Validate API key from request."""

    api_key = request.headers.get("X-API-Key")
    if not api_key or api_key not in self.valid_api_keys:
        raise HTTPException(
            status_code=401,
            detail="Invalid or missing API key"
        )

async def check_rate_limit(self, request: Request):
    """Check rate limiting."""

    client_ip = request.client.host
    now = time.time()

    # Clean old entries
    self.rate_limits = {
        ip: times for ip, times in self.rate_limits.items()
        if any(t > now - self.rate_limit_window for t in times)
    }

    # Check current IP
    if client_ip not in self.rate_limits:
        self.rate_limits[client_ip] = []

    recent_requests = [
        t for t in self.rate_limits[client_ip]
        if t > now - self.rate_limit_window
    ]

    if len(recent_requests) >= self.rate_limit_max:
        raise HTTPException(
            status_code=429,
            detail="Rate limit exceeded"
        )

    # Record current request
    self.rate_limits[client_ip].append(now)

@chute.cord(public_api_path="/secure_generate")
async def secure_generate(self, prompt: str, request: Request):
    """Secure generation endpoint."""

    # Security checks
    await self.validate_api_key(request)
    await self.check_rate_limit(request)

    # Input sanitization
    prompt = prompt.strip()
    if len(prompt) > 5000:
        raise HTTPException(400, "Prompt too long")

    # Generate
    result = await self.model.generate(prompt)

    return {"generated_text": result}

Common Patterns and Examples

Multi-Model Chute

import asyncio

# TextModel, ImageModel, and EmbeddingModel are illustrative placeholders
@chute.on_startup()
async def load_multiple_models(self):
    """Load multiple models for different tasks."""

    self.text_model = TextModel()
    self.image_model = ImageModel()
    self.embedding_model = EmbeddingModel()

    await asyncio.gather(
        self.text_model.load(),
        self.image_model.load(),
        self.embedding_model.load()
    )

@chute.cord(public_api_path="/generate_text")
async def generate_text(self, prompt: str):
    return await self.text_model.generate(prompt)

@chute.cord(public_api_path="/generate_image")
async def generate_image(self, prompt: str):
    return await self.image_model.generate(prompt)

@chute.cord(public_api_path="/embed_text")
async def embed_text(self, text: str):
    return await self.embedding_model.embed(text)

Template-Based Chute

from chutes.templates import build_vllm_chute

# Use template as base
chute = build_vllm_chute(
    username="mycompany",
    name="custom-llm",
    model_name="gpt2"
)

# Add custom functionality
@chute.cord(public_api_path="/custom_generate")
async def custom_generate(self, prompt: str, style: str = "formal"):
    """Custom generation with style control."""

    style_prompts = {
        "formal": "Please respond in a formal tone: ",
        "casual": "Please respond casually: ",
        "technical": "Please provide a technical explanation: "
    }

    styled_prompt = style_prompts.get(style, "") + prompt

    result = await self.generate(styled_prompt)
    return {"generated_text": result, "style": style}

This comprehensive API reference covers all aspects of the Chute class. For specific implementation examples, see the Examples section and Templates Guide.