Frequently Asked Questions (FAQ)

Common questions and answers about the Chutes SDK and platform.

General Questions

What is Chutes?

Chutes is a serverless AI compute platform that lets you deploy and scale AI models on GPU infrastructure without managing servers. You write Python code using our SDK, and we handle the infrastructure, scaling, and deployment.

Key benefits:

  • Deploy AI models in minutes, not hours
  • Pay only for actual compute time used
  • Automatic scaling from 0 to hundreds of instances
  • Access to latest GPU hardware (H100, A100, etc.)
  • No DevOps or Kubernetes knowledge required

How is Chutes different from other platforms?

| Feature | Chutes | Traditional Cloud | Other AI Platforms |
|---|---|---|---|
| Setup Time | Minutes | Hours/Days | Hours |
| Scaling | Automatic (0→∞) | Manual | Limited |
| Pricing | Pay-per-use | Always-on | Subscription |
| GPU Access | Latest hardware | Limited selection | Restricted |
| Code Style | Simple Python | Complex configs | Platform-specific |

Who should use Chutes?

Perfect for:

  • AI/ML engineers building production applications
  • Startups needing scalable AI infrastructure
  • Researchers requiring powerful GPU compute
  • Companies wanting serverless AI deployment

Use cases:

  • LLM chat applications
  • Image/video generation services
  • Real-time AI APIs
  • Batch processing workflows
  • Model inference at scale

Is Chutes suitable for production?

Yes! Chutes is designed for production workloads with:

  • 99.9% uptime SLA
  • Enterprise security and compliance
  • Global edge deployment
  • Automatic failover and recovery
  • 24/7 monitoring and support

Getting Started

How do I get started with Chutes?

  1. Install the SDK
    pip install chutes
  2. Create an account and authenticate
    chutes auth login
  3. Write your first chute
    from chutes.chute import Chute

    chute = Chute(username="myuser", name="hello-world")

    @chute.cord(public_api_path="/hello")
    async def hello():
        return {"message": "Hello, World!"}
  4. Deploy it
    chutes deploy

Do I need Docker experience?

No! Chutes handles containerization automatically. However, if you need custom dependencies, you can optionally use our Image class:

from chutes.image import Image

# Simple dependency installation
image = Image(
    base_image="nvidia/cuda:12.1-runtime-ubuntu22.04"
).run([
    "pip install transformers torch"
])

chute = Chute(
    username="myuser",
    name="my-app",
    image=image
)

What programming languages are supported?

Currently, Chutes supports Python only. We're considering other languages based on user demand.

Python versions supported:

  • Python 3.8+
  • Recommended: Python 3.10 or 3.11

Can I use my existing Python code?

Yes! Chutes is designed to work with existing Python codebases. You typically just need to:

  1. Wrap your functions with Chutes decorators (e.g., @chute.cord())
  2. Add any dependencies to an Image if needed
  3. Deploy with chutes deploy
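
For example, turning an existing function into an HTTP endpoint is usually just a decorator away (a sketch; summarize stands in for your own code):

from chutes.chute import Chute

chute = Chute(username="myuser", name="existing-app")

# Your existing function, unchanged apart from the decorator
@chute.cord(public_api_path="/summarize")
async def summarize(self, text: str):
    # ... your existing logic here ...
    return {"summary": text[:100]}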

Deployment & Usage

How long does deployment take?

  • First deployment: 5-15 minutes (includes image building)
  • Code-only updates: 1-3 minutes
  • No-code config updates: 30 seconds

Can I deploy multiple versions?

Yes! Each deployment creates a new version:

# Deploy new version
chutes deploy

# List versions
chutes chutes versions <chute-name>

# Rollback to previous version
chutes chutes rollback <chute-name> --version v1.2.3

How does scaling work?

Chutes automatically scales based on traffic:

  • Scale to zero: No requests = no costs
  • Auto-scaling: Handles traffic spikes automatically
  • Global load balancing: Requests routed to optimal regions
  • Cold start optimization: Fast instance startup

# Configure scaling behavior
chute = Chute(
    username="myuser",
    name="my-app",
    min_replicas=0,    # Scale to zero
    max_replicas=100   # Scale up to 100 instances
)

Can I deploy the same model multiple times?

Yes! You can have multiple deployments:

from chutes.chute import Chute, NodeSelector

# Production deployment
prod_chute = Chute(
    username="myuser",
    name="llm-prod",
    node_selector=NodeSelector()  # pin production-grade hardware here
)

# Development deployment
dev_chute = Chute(
    username="myuser",
    name="llm-dev",
    node_selector=NodeSelector()  # cheaper hardware is fine for dev
)

How do I handle different environments?

Use environment variables and different chute names:

import os

environment = os.getenv("ENVIRONMENT", "dev")
chute_name = f"my-app-{environment}"

chute = Chute(username="myuser", name=chute_name)

Performance & Optimization

How can I optimize performance?

Model optimization:

# Use optimized engines
from chutes.templates import build_vllm_chute

chute = build_vllm_chute(
    username="myuser",
    name="fast-llm",
    model_name="microsoft/DialoGPT-medium",
    engine_args={
        "gpu_memory_utilization": 0.9,
        "enable_chunked_prefill": True,
        "use_v2_block_manager": True
    }
)

Hardware selection:

# Choose appropriate hardware
from chutes.chute import NodeSelector

node_selector = NodeSelector(
    gpu_count=1,
    min_vram_gb_per_gpu=24,
    include=["h100", "a100"]  # High-performance GPUs
)

Caching strategies:

from functools import lru_cache

@lru_cache(maxsize=1000)
def expensive_computation(input_hash):
    return compute_result(input_hash)

What's the latency for API calls?

Typical latencies (see the measurement sketch below):

  • Warm instances: 50-200ms
  • Cold start: 5-30 seconds (depending on model size)
  • Global edge: <100ms additional routing overhead
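
If you want to check these numbers against your own deployment, you can time requests directly (a sketch using httpx; the endpoint URL is hypothetical):

import asyncio
import time

import httpx

async def measure_latency(url: str, payload: dict) -> float:
    # Time one round trip; the first call after idle includes any
    # cold-start delay, later calls reflect warm latency.
    async with httpx.AsyncClient(timeout=60.0) as client:
        start = time.perf_counter()
        await client.post(url, json=payload)
        return time.perf_counter() - start

async def main():
    url = "https://your-chute.chutes.ai/predict"  # hypothetical endpoint
    for i in range(3):
        seconds = await measure_latency(url, {"text": "ping"})
        print(f"request {i}: {seconds * 1000:.0f} ms")

asyncio.run(main())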

How do I minimize cold starts?

# Keep minimum replicas warm
chute = Chute(
    username="myuser",
    name="low-latency-app",
    min_replicas=1  # Always keep 1 instance warm
)

# Optimize startup time
@chute.on_startup()
async def setup(self):
    # Load models efficiently
    self.model = load_model_optimized()

Can I use multiple GPUs?

Yes! Specify multiple GPUs in your node selector:

# Multi-GPU setup
node_selector = NodeSelector(
    gpu_count=4,  # Use 4 GPUs
    min_vram_gb_per_gpu=40
)

# Distribute model across GPUs
@chute.on_startup()
async def setup(self):
    self.model = load_model_distributed(device_map="auto")

Pricing & Billing

How does pricing work?

Chutes uses pay-per-use pricing (a rough cost sketch follows the lists below):

  • Compute: Per GPU-second of actual usage
  • Memory: Per GB-second of RAM usage
  • Network: Per GB of data transfer
  • Storage: Per GB of persistent storage

No charges for:

  • Idle time (scaled to zero)
  • Failed requests
  • Development/testing (generous free tier)
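
As a back-of-the-envelope example of how these dimensions combine (the rates below are illustrative placeholders, not actual Chutes prices):

# Hypothetical rates -- substitute the real ones from your billing page
GPU_RATE_PER_SECOND = 0.0005       # $ per GPU-second
RAM_RATE_PER_GB_SECOND = 0.000001  # $ per GB-second of RAM
EGRESS_RATE_PER_GB = 0.05          # $ per GB transferred out

def estimate_cost(gpu_seconds, ram_gb_seconds, egress_gb):
    # Sum the per-dimension charges
    return (gpu_seconds * GPU_RATE_PER_SECOND
            + ram_gb_seconds * RAM_RATE_PER_GB_SECOND
            + egress_gb * EGRESS_RATE_PER_GB)

# e.g. 2 hours on one GPU with 16GB of RAM in use and 1GB of egress
print(f"${estimate_cost(7200, 7200 * 16, 1):.2f}")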

What's included in the free tier?

  • Compute: 100 GPU-hours per month
  • Memory: 1TB-hours of RAM
  • Network: 100GB egress
  • Storage: 50GB persistent storage
  • Support: Community support

How can I control costs?

Use spot instances:

node_selector = NodeSelector()  # configure your spot/preemptible preference here

Scale to zero:

chute = Chute(
    username="myuser",
    name="cost-optimized",
    min_replicas=0  # No idle costs
)

Choose appropriate hardware:

# Cost-effective GPUs for development
node_selector = NodeSelector(
    include=["l40", "a6000"],  # Less expensive than H100
    exclude=["h100", "h200"]
)

Monitor usage:

# Check current usage
chutes account usage

# Set billing alerts
chutes account alerts --threshold 100

Do you offer volume discounts?

Yes! We offer:

  • Startup credits: Up to $10,000 for qualifying startups
  • Enterprise pricing: Custom rates for large usage
  • Research grants: Free credits for academic research
  • Volume discounts: Automatic discounts at usage tiers

Contact our sales team to learn more about any of these programs.

Features & Capabilities

What AI frameworks are supported?

Officially supported:

  • PyTorch: Full support with CUDA optimization
  • Transformers: Hugging Face models and pipelines
  • VLLM: High-performance LLM inference
  • SGLang: Structured generation for LLMs
  • Diffusers: Image/video generation models

Community supported:

  • TensorFlow/Keras
  • JAX/Flax
  • ONNX Runtime
  • OpenCV
  • scikit-learn

Can I use custom models?

Absolutely! You can bring your own models in several ways:

# From Hugging Face Hub
model_name = "your-username/custom-model"

# From local files
image = Image().copy("./my-model/", "/opt/model/")

# From cloud storage
image = Image().run([
    "wget https://storage.example.com/model.bin -O /opt/model.bin"
])

Do you support streaming responses?

Yes! Perfect for LLM chat applications:

from typing import AsyncGenerator

@chute.cord(public_api_path="/stream")
async def stream_generate(self, prompt: str) -> AsyncGenerator[str, None]:
    async for token in self.model.stream_generate(prompt):
        yield f"data: {token}\n\n"

Can I run background jobs?

Yes! Use the @chute.job() decorator:

from typing import List

@chute.job()
async def process_batch(self, batch_data: List[str]):
    results = []
    for item in batch_data:
        result = await self.process_item(item)
        results.append(result)
    return results

# Trigger job
@chute.cord(public_api_path="/submit_batch")
async def submit_batch(self, data: List[str]):
    job_id = await self.process_batch(data)  # assumes @chute.job() returns a job handle rather than blocking
    return {"job_id": job_id}

Is there a Python client library?

Yes! Use the generated client or standard HTTP:

# Generated client (coming soon)
from chutes.client import ChuteClient

client = ChuteClient("https://your-chute.chutes.ai")
result = await client.predict(text="Hello world")

# Standard HTTP requests
import httpx

async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://your-chute.chutes.ai/predict",
        json={"text": "Hello world"}
    )
    result = response.json()

Technical Details

What regions are available?

Current regions:

  • US: us-west-2 (Oregon), us-east-1 (Virginia)
  • Europe: eu-west-1 (Ireland), eu-central-1 (Frankfurt)
  • Asia: ap-southeast-1 (Singapore), ap-northeast-1 (Tokyo)

Coming soon:

  • us-central-1, eu-west-2, ap-south-1

What GPU types are available?

| GPU | VRAM | Best For | Pricing Tier |
|---|---|---|---|
| T4 | 16GB | Small models, dev | $ |
| V100 | 16GB/32GB | Training, medium models | $$ |
| A6000 | 48GB | Production inference | $$$ |
| L40 | 48GB | Cost-effective inference | $$$ |
| A100 | 40GB/80GB | Large models, training | $$$$ |
| H100 | 80GB | Latest generation | $$$$$ |
| H200 | 141GB | Massive models | $$$$$ |

How does networking work?

  • Public endpoints: HTTPS with automatic SSL certificates
  • Private endpoints: VPC peering for enterprise customers
  • Load balancing: Automatic traffic distribution
  • CDN: Global content delivery for static assets

What about data persistence?

Temporary storage (included):

  • Container filesystem
  • Cleared on restart/redeploy

Persistent storage (optional):

chute = Chute(
    username="myuser",
    name="persistent-app",
    storage_gb=100  # 100GB persistent disk
)

# Access at /opt/storage/
@chute.cord(public_api_path="/save")
async def save_data(self, data: str):
    with open("/opt/storage/data.txt", "w") as f:
        f.write(data)
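
A matching read endpoint might look like this (a sketch reusing the same persistent path):

@chute.cord(public_api_path="/load")
async def load_data(self):
    # Read back whatever /save wrote to the persistent disk
    with open("/opt/storage/data.txt") as f:
        return {"data": f.read()}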

Can I access the underlying infrastructure?

Chutes is serverless, so direct infrastructure access isn't available. However, you get:

  • System info: CPU, memory, GPU details via APIs
  • Metrics: Performance monitoring and alerts
  • Logs: Comprehensive application and system logs
  • Debug endpoints: Custom debugging interfaces

Troubleshooting

My deployment is failing. What should I check?

  1. Validate configuration:
    chutes chutes validate --file chute.py
  2. Check build logs:
    chutes chutes logs --build-logs <chute-name>
  3. Verify resource availability:
    chutes nodes list --available
  4. Common fixes:
    • Reduce GPU requirements
    • Enable spot instances
    • Use a more flexible node selector (see the sketch below)
    • Check dependency versions
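
A more flexible node selector widens the pool of machines your deployment can schedule onto (a sketch using the NodeSelector parameters shown elsewhere in this FAQ):

from chutes.chute import NodeSelector

# Fewer constraints = more candidate nodes = fewer scheduling failures
node_selector = NodeSelector(
    gpu_count=1,
    min_vram_gb_per_gpu=16  # lower the VRAM floor if your model allows it
    # avoid include/exclude lists unless you truly need specific GPUs
)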

I'm getting out of memory errors. How do I fix this?

Immediate fixes:

# Request more VRAM
node_selector = NodeSelector(min_vram_gb_per_gpu=48)

# Or reduce batch size
engine_args = {"max_num_batched_tokens": 1024}

# Enable memory optimization
engine_args = {"gpu_memory_utilization": 0.85}

See the Troubleshooting Guide for more details.

How do I debug performance issues?

# Add performance monitoring
import time

import psutil
import torch

@chute.cord(public_api_path="/predict")
async def predict(self, input_data):
    start_time = time.time()
    result = await self.model.predict(input_data)
    duration = time.time() - start_time
    self.logger.info(f"Prediction took {duration:.2f}s")
    return result

# Check resource usage
@chute.cord(public_api_path="/stats")
async def get_stats(self):
    return {
        "gpu_memory": torch.cuda.memory_allocated(),
        "cpu_percent": psutil.cpu_percent()
    }

Integrations

Can I integrate with my existing CI/CD?

Yes! Chutes works with any CI/CD system:

GitHub Actions:

name: Deploy to Chutes
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install Chutes
        run: pip install chutes
      - name: Deploy
        run: chutes deploy --name my-app-prod
        env:
          CHUTES_API_KEY: ${{ secrets.CHUTES_API_KEY }}

Does it work with monitoring tools?

Yes! Export metrics to your preferred tools:

# Prometheus metrics
@chute.cord(public_api_path="/metrics")
async def metrics(self):
    return generate_prometheus_metrics()

# Custom webhooks
@chute.cord(public_api_path="/predict")
async def predict(self, input_data):
    result = await self.model.predict(input_data)

    # Send to monitoring
    await send_to_datadog(metric="prediction_count", value=1)

    return result

Can I use it with databases?

Absolutely! Connect to any database:

# PostgreSQL example
import os

import asyncpg

@chute.on_startup()
async def setup(self):
    self.db = await asyncpg.connect(
        host=os.getenv("DB_HOST"),
        user=os.getenv("DB_USER"),
        password=os.getenv("DB_PASSWORD"),
        database=os.getenv("DB_NAME")
    )

@chute.cord(public_api_path="/query")
async def query_data(self, query: str):
    rows = await self.db.fetch("SELECT * FROM table WHERE condition = $1", query)
    return [dict(row) for row in rows]

Security & Privacy

How secure is my data?

Infrastructure security:

  • SOC 2 Type II compliance
  • End-to-end encryption (TLS 1.3)
  • Network isolation between deployments
  • Regular security audits and penetration testing

Data handling:

  • No persistent storage of request/response data
  • Optional data encryption at rest
  • GDPR and CCPA compliant
  • Customer data never used for training

Can I use private models?

Yes! Several options for private models:

# Private Hugging Face models (requires token)
import os

os.environ["HUGGINGFACE_HUB_TOKEN"] = "your_token"

# Upload during build
image = Image().copy("./private-model/", "/opt/model/")

# Download from private S3 (set credentials before the run step so
# they're available to the download command)
image = Image().env(
    "AWS_ACCESS_KEY_ID", "your_key"
).run([
    "aws s3 cp s3://private-bucket/model.bin /opt/model.bin"
])

Still have questions?

We're constantly updating this FAQ based on user feedback. If you have a question not covered here, please let us know!