The Chute class is the core component of the Chutes framework, representing a deployable AI application unit. This reference covers all methods, properties, and configuration options.
Required Parameters
image: str | Image
Docker image for the chute runtime environment (required). Accepts either a string reference or a custom Image object.
Example:
# Using a string reference
chute = Chute(
    username="mycompany",
    name="text-generator",
    image="python:3.11"
)
# Using a custom Image object
from chutes.image import Image
custom_image = Image(username="mycompany", name="custom-ai", tag="1.0")
chute = Chute(
    username="mycompany",
    name="text-generator",
    image=custom_image
)
Optional Parameters
tagline: str = ""
A brief description of what the chute does.
Example:
chute = Chute(
    username="mycompany",
    name="text-generator",
    image="python:3.11",
    tagline="Advanced text generation with GPT models"
)
Best Practices:
Keep under 100 characters
Use present tense
Be descriptive but concise
readme: str = ""
Detailed documentation for the chute in Markdown format.
Example:
readme = """
# Text Generation API
This chute provides advanced text generation capabilities using state-of-the-art language models.
## Features
- Multiple model support
- Customizable parameters
- High-performance inference
- Real-time streaming
## Usage
Send a POST request to `/generate` with your prompt.
"""
chute = Chute(
    username="mycompany",
    name="text-generator",
    image="python:3.11",
    readme=readme
)
standard_template: str = None
Reference to a standard template to use as a base for the chute.
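A minimal sketch of passing the parameter; the "vllm" template name here is an assumption for illustration, not a documented value:
# Hypothetical: "vllm" stands in for whichever standard template you build on
chute = Chute(
    username="mycompany",
    name="custom-llm",
    image="python:3.11",
    standard_template="vllm"
)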
concurrency: int
Maximum number of concurrent requests the chute can handle.
Example:
chute = Chute(
    username="mycompany",
    name="text-generator",
    image="python:3.11",
    concurrency=8  # Handle up to 8 concurrent requests
)
Guidelines:
Higher concurrency requires more memory
Consider model size and GPU memory
Typical values: 1-16 for most applications
**kwargs
Additional keyword arguments passed to the underlying FastAPI application.
Example:
chute = Chute(
    username="mycompany",
    name="text-generator",
    image="python:3.11",
    title="My AI API",  # FastAPI title
    description="Custom AI service",  # FastAPI description
    version="1.0.0"  # API version
)
Decorators and Methods
Lifecycle Decorators
@chute.on_startup()
Decorator for functions to run during chute startup.
Signature:
@chute.on_startup()
async def initialization_function(self) -> None:
    """Function to run on startup."""
    pass
Example:
@chute.on_startup()
async def load_model(self):
    """Load the AI model during startup."""
    from transformers import AutoTokenizer, AutoModelForCausalLM
    self.tokenizer = AutoTokenizer.from_pretrained("gpt2")
    self.model = AutoModelForCausalLM.from_pretrained("gpt2")
    print("Model loaded successfully")
Use Cases:
Load AI models
Initialize databases
Set up caches
Configure services
Load configuration
Best Practices:
Use async functions for I/O operations
Add error handling
Log initialization steps
Keep startup time reasonable
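As a sketch of the error-handling and logging practices above (the model name and logger setup are illustrative assumptions):
import logging

logger = logging.getLogger(__name__)

@chute.on_startup()
async def load_model_safely(self):
    """Load the model with logging and basic error handling (illustrative)."""
    logger.info("Loading model...")
    try:
        from transformers import AutoModelForCausalLM
        self.model = AutoModelForCausalLM.from_pretrained("gpt2")
    except Exception:
        logger.exception("Model loading failed")
        raise  # fail startup loudly rather than serving without a model
    logger.info("Model loaded successfully")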
@chute.on_shutdown()
Decorator for functions to run during chute shutdown.
Signature:
@chute.on_shutdown()
async def cleanup_function(self) -> None:
    """Function to run on shutdown."""
    pass
Example:
@chute.on_shutdown()
async def cleanup_resources(self):
    """Clean up resources during shutdown."""
    if hasattr(self, 'model'):
        del self.model
    if hasattr(self, 'database_connection'):
        await self.database_connection.close()
    print("Resources cleaned up")
@chute.cord()
Decorator that exposes a function as a public API endpoint (a "cord") on the chute.
Parameters:
public_api_path: str - URL path where the endpoint is exposed
method: str - HTTP method for the endpoint (e.g., "POST")
input_schema: Optional[Type[BaseModel]] - Pydantic schema for input validation
output_schema: Optional[Type[BaseModel]] - Pydantic schema for output validation
minimal_input_schema: Optional[Type[BaseModel]] - Simplified schema for documentation
output_content_type: str - Response content type
stream: bool = False - Enable streaming responses
Basic Example:
@chute.cord(public_api_path="/generate", method="POST")
async def generate_text(self, prompt: str) -> str:
    """Generate text from a prompt."""
    return await self.model.generate(prompt)
Advanced Example with Schemas:
from pydantic import BaseModel, Field

class GenerationInput(BaseModel):
    prompt: str = Field(..., description="Input prompt")
    max_tokens: int = Field(100, ge=1, le=1000)
    temperature: float = Field(0.7, ge=0.0, le=2.0)

class GenerationOutput(BaseModel):
    generated_text: str = Field(..., description="Generated text")
    tokens_used: int = Field(..., description="Number of tokens used")

@chute.cord(
    public_api_path="/generate",
    method="POST",
    input_schema=GenerationInput,
    output_schema=GenerationOutput
)
async def generate_text(self, params: GenerationInput) -> GenerationOutput:
    """Generate text with parameters."""
    result = await self.model.generate(
        params.prompt,
        max_tokens=params.max_tokens,
        temperature=params.temperature
    )
    return GenerationOutput(
        generated_text=result,
        tokens_used=len(result.split())
    )
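Once deployed, the endpoint accepts a JSON body matching GenerationInput. A minimal client sketch; the base URL is a placeholder and any authentication headers your deployment requires are omitted:
import requests

# Placeholder URL; substitute your chute's actual endpoint
BASE_URL = "https://your-chute-endpoint.example"

response = requests.post(
    f"{BASE_URL}/generate",
    json={"prompt": "Write a haiku about rivers", "max_tokens": 50, "temperature": 0.7},
)
print(response.json())  # e.g. {"generated_text": "...", "tokens_used": 12}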
Non-JSON Responses:
from fastapi import Response

@chute.cord(
    public_api_path="/generate_image",
    method="POST",
    output_content_type="image/png"
)
async def generate_image(self, prompt: str) -> Response:
    """Generate an image and return as PNG."""
    image_data = await self.model.generate_image(prompt)
    return Response(
        content=image_data,
        media_type="image/png",
        headers={"Content-Disposition": "inline; filename=generated.png"}
    )
Streaming Example:
import json
from fastapi.responses import StreamingResponse

@chute.cord(
    public_api_path="/stream_generate",
    method="POST",
    stream=True
)
async def stream_generate(self, prompt: str):
    """Stream text generation token by token."""
    async def generate_tokens():
        async for token in self.model.stream_generate(prompt):
            yield f"data: {json.dumps({'token': token})}\n\n"
    return StreamingResponse(
        generate_tokens(),
        media_type="text/event-stream"
    )
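Because the endpoint above emits server-sent events, a client reads the response line by line. A consumer sketch using httpx; the base URL and request body shape are placeholder assumptions:
import asyncio
import json
import httpx

async def consume_stream(prompt: str):
    """Print tokens from the streaming endpoint as they arrive (illustrative)."""
    async with httpx.AsyncClient(base_url="https://your-chute-endpoint.example") as client:
        async with client.stream("POST", "/stream_generate", json={"prompt": prompt}) as response:
            async for line in response.aiter_lines():
                if line.startswith("data: "):
                    event = json.loads(line[len("data: "):])
                    print(event["token"], end="", flush=True)

# asyncio.run(consume_stream("Tell me a story"))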
Using Templates
Chutes built from a standard template can be extended with custom endpoints.
Example:
from chutes.templates import build_vllm_chute

# Use template as base
chute = build_vllm_chute(
    username="mycompany",
    name="custom-llm",
    model_name="gpt2"
)
# Add custom functionality
@chute.cord(public_api_path="/custom_generate")
async def custom_generate(self, prompt: str, style: str = "formal"):
    """Custom generation with style control."""
    style_prompts = {
        "formal": "Please respond in a formal tone: ",
        "casual": "Please respond casually: ",
        "technical": "Please provide a technical explanation: "
    }
    styled_prompt = style_prompts.get(style, "") + prompt
    result = await self.generate(styled_prompt)
    return {"generated_text": result, "style": style}
This comprehensive API reference covers all aspects of the Chute class. For specific implementation examples, see the Examples section and Templates Guide.