
Deploying Chutes

The chutes deploy command takes your built images and deploys them as live, scalable AI applications on the Chutes platform.

Basic Deploy Command

Deploy a chute to the platform.

chutes deploy <chute_ref> [OPTIONS]

Arguments:

  • <chute_ref>: Chute reference in the format <module>:<chute_variable>, e.g. my_chute:chute

Options:

  • --config-path: Custom config path
  • --logo: Path to logo image for the chute
  • --debug: Enable debug logging
  • --public: Mark chute as public/available to anyone
  • --accept-fee: Acknowledge and accept the deployment fee
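
For example, several of these options can be combined in one invocation:

chutes deploy my_chute:chute \
  --logo ./assets/logo.png \
  --debug \
  --accept-fee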

Deployment Examples

Basic Deployment

# Deploy with fee acknowledgment
chutes deploy my_chute:chute --accept-fee

What happens:

  • ✅ Validates image exists and is built
  • ✅ Creates deployment configuration
  • ✅ Registers chute with the platform
  • ✅ Returns chute ID and version

Production Deployment

# Deploy with logo
chutes deploy my_chute:chute \
  --logo ./assets/logo.png \
  --accept-fee

Private vs Public Deployments

# Private deployment (default) - only you can access
chutes deploy my_chute:chute --accept-fee

# Public deployment (requires special permissions)
chutes deploy my_chute:chute --public --accept-fee

Note: Public chutes require special permissions. If you need to share your chute, use the chutes share command instead.

Deployment Process

Deployment Stages

# Example deployment output
Deploying chute: my_chute:chute
You are about to upload my_chute.py and deploy my-chute, confirm? (y/n) y
Successfully deployed chute my-chute chute_id=abc123 version=1

What Gets Deployed

When you deploy, the following is sent to the platform:

  • Chute Configuration: Name, readme, tagline
  • Node Selector: GPU requirements
  • Cords: API endpoints your chute exposes
  • Code Reference: Your chute's Python code
  • Image Reference: The built image to use
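
For orientation, here is a minimal chute definition annotated with which deployed component each field supplies. This is a sketch: my_image stands for the Image object built earlier, and the field names follow the examples later on this page.

from chutes.chute import Chute, NodeSelector

chute = Chute(
    username="myuser",
    name="my-chute",              # Chute Configuration: name
    tagline="Short description",  # Chute Configuration: tagline
    readme="## My Chute",         # Chute Configuration: readme
    image=my_image,               # Image Reference: the built image to use
    node_selector=NodeSelector(   # Node Selector: GPU requirements
        gpu_count=1,
        min_vram_gb_per_gpu=16,
    ),
)

# Cords: each @chute.cord()-decorated function in this module becomes an API
# endpoint, and the module's source is uploaded as the Code Reference.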

Deployment Fees

Deployment incurs a one-time fee based on your NodeSelector configuration:

# Deploy and acknowledge the fee
chutes deploy my_chute:chute --accept-fee

If you don't include --accept-fee, you may receive a 402 error indicating that the deployment fee needs to be acknowledged.

Fee Structure

Deployment fees are calculated based on:

  • GPU Type: Higher-end GPUs cost more
  • GPU Count: More GPUs = higher fee
  • VRAM Requirements: Higher minimum VRAM per GPU costs more

Example fee calculation:

  • Single RTX 3090 at $0.12/hr = $0.36 deployment fee
  • Multiple GPUs or premium GPUs will have higher fees
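
The numbers above suggest the fee is roughly three times the hourly rate. The sketch below works that arithmetic through; the multiplier and the linear scaling with GPU count are inferences from this example, not a documented formula.

# Inferred from the example: $0.12/hr RTX 3090 -> $0.36 fee, i.e. 3x hourly.
HOURLY_MULTIPLIER = 3  # assumption, not an official constant

def estimate_deploy_fee(hourly_rate_usd: float, gpu_count: int = 1) -> float:
    # Assumes the fee scales linearly with GPU count ("more GPUs = higher fee").
    return round(hourly_rate_usd * gpu_count * HOURLY_MULTIPLIER, 2)

print(estimate_deploy_fee(0.12))               # 0.36 -> matches the example
print(estimate_deploy_fee(0.12, gpu_count=4))  # 1.44 -> more GPUs, higher fee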

Pre-Deployment Checklist

Before deploying, ensure:

1. Image is Built and Ready

# Check image status
chutes images list --name my-image
chutes images get my-image

# Should show status: "built and pushed"

2. Chute Configuration is Correct

# Verify your chute definition
from chutes.chute import Chute, NodeSelector

chute = Chute(
    username="myuser",
    name="my-chute",
    tagline="My awesome AI chute",
    readme="## My Chute\n\nDescription here...",
    image=my_image,
    concurrency=4,
    node_selector=NodeSelector(
        gpu_count=1,
        min_vram_gb_per_gpu=16,
    ),
)

3. Cords are Defined

@chute.cord()
async def my_function(self, input_data: str) -> str:
    return f"Processed: {input_data}"

@chute.cord(
    public_api_path="/generate",
    public_api_method="POST",
)
async def generate(self, prompt: str) -> str:
    # Your logic here
    result = f"Generated: {prompt}"  # placeholder implementation
    return result

Chute Configuration Options

NodeSelector

Control which GPUs your chute runs on:

from chutes.chute import NodeSelector

node_selector = NodeSelector(
    gpu_count=1,              # Number of GPUs (1-8)
    min_vram_gb_per_gpu=16,   # Minimum VRAM per GPU (16-80)
    include=["rtx4090"],      # Only use these GPU types
    exclude=["rtx3090"],      # Don't use these GPU types
)

Concurrency

Set how many concurrent requests your chute can handle:

chute = Chute(
    ...
    concurrency=4,  # Handle 4 concurrent requests per instance
)

Auto-Scaling

Configure automatic scaling behavior:

chute = Chute(
    ...
    max_instances=10,           # Maximum number of instances
    scaling_threshold=0.8,      # Scale up threshold
    shutdown_after_seconds=300, # Shutdown idle instances after 5 minutes
)

Network Egress

Control external network access:

chute = Chute(
    ...
    allow_external_egress=True,  # Allow external network access
)

Note: By default, allow_external_egress is True for custom chutes but False for vllm/sglang templates. Set it to True if your chute needs to fetch external resources (e.g., image URLs for vision models).

Sharing Chutes

After deployment, you can share your chute with other users:

# Share with another user
chutes share --chute-id my-chute --user-id colleague

# Remove sharing
chutes share --chute-id my-chute --user-id colleague --remove

Billing When Sharing

When you share a chute:

  • You (chute owner) pay the hourly rate while instances are running
  • The user you shared with pays the standard usage rate (per token, per step, etc.)
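
A back-of-the-envelope illustration of the split, using placeholder rates (real prices vary by GPU and model):

# Placeholder numbers for illustration only, not real platform prices.
hourly_rate = 0.12          # owner pays this while instances are running
hours_running = 8
per_million_tokens = 0.50   # shared user pays usage-based pricing
tokens_used = 4_000_000

owner_cost = hourly_rate * hours_running                  # 0.96, billed to the owner
user_cost = per_million_tokens * tokens_used / 1_000_000  # 2.00, billed to the shared user
print(f"owner: ${owner_cost:.2f}, shared user: ${user_cost:.2f}")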

Troubleshooting Deployments

Common Deployment Issues

"Image is not available to be used (yet)!"

# Image hasn't finished building - check status
chutes images get my-image

# Wait for status: "built and pushed"

"Unable to create public chutes from non-public images"

# If deploying public chute, image must also be public
# Rebuild image with --public flag
chutes build my_chute:chute --public --wait

402 Payment Required

# Include --accept-fee flag
chutes deploy my_chute:chute --accept-fee

409 Conflict

# Chute with this name already exists
# Delete existing chute first
chutes chutes delete my-chute

# Or use a different name in your chute definition

Debug Commands

# Enable debug logging
chutes deploy my_chute:chute --debug --accept-fee

# Check existing chutes
chutes chutes list
chutes chutes get my-chute

# Check image status
chutes images get my-image

CI/CD Integration

GitHub Actions

name: Deploy to Chutes
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install Chutes
        run: pip install chutes

      - name: Configure Chutes
        env:
          CHUTES_CONFIG: ${{ secrets.CHUTES_CONFIG }}
        run: |
          mkdir -p ~/.chutes
          echo "$CHUTES_CONFIG" > ~/.chutes/config.ini

      - name: Build and Deploy
        run: |
          chutes build my_app:chute --wait
          chutes deploy my_app:chute --accept-fee

GitLab CI

deploy:
  stage: deploy
  script:
    - pip install chutes
    - mkdir -p ~/.chutes
    - echo "$CHUTES_CONFIG" > ~/.chutes/config.ini
    - chutes build my_app:chute --wait
    - chutes deploy my_app:chute --accept-fee
  only:
    - main

Production Deployment Checklist

Pre-Deployment

# ✅ Run tests locally
python -m pytest tests/

# ✅ Build image and verify
chutes build my_chute:chute --wait
chutes images get my-chute

# ✅ Test locally if possible
docker run --rm -it -p 8000:8000 my_chute:tag chutes run my_chute:chute --dev

Deployment

# ✅ Deploy with fee acknowledgment
chutes deploy my_chute:chute --accept-fee

# ✅ Note the chute_id and version from output

Post-Deployment

# ✅ Verify deployment
chutes chutes get my-chute

# ✅ Warm up the chute
chutes warmup my-chute

# ✅ Test the endpoint
curl -X POST https://your-chute-url/your-endpoint \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"input": "test"}'
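
The same smoke test in Python, using the requests package; the URL and API key are placeholders, exactly as in the curl example:

import requests

resp = requests.post(
    "https://your-chute-url/your-endpoint",  # placeholder URL
    headers={"Authorization": "Bearer your-api-key"},
    json={"input": "test"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())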

Best Practices

1. Use Meaningful Names

chute = Chute(
    name="sentiment-analyzer-v2",  # Clear, versioned name
    tagline="Analyze sentiment in text using BERT",
    readme="## Sentiment Analyzer\n\n...",
)

2. Set Appropriate Concurrency

# For LLMs with continuous batching (vllm/sglang)
concurrency=64

# For single-request models (diffusion, custom)
concurrency=1

# For models with some parallelism
concurrency=4

3. Configure Shutdown Timer

# For development/testing - short timeout
shutdown_after_seconds=60

# For production - longer timeout to avoid cold starts
shutdown_after_seconds=300

4. Right-Size GPU Requirements

# Match your model's actual requirements
NodeSelector(
    gpu_count=1,
    min_vram_gb_per_gpu=24,  # For ~13B parameter models
)

# Don't over-provision
NodeSelector(
    gpu_count=1,
    min_vram_gb_per_gpu=80,  # Only if you actually need A100
)
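
A common rule of thumb for sizing is weights times bytes-per-parameter, plus headroom for activations and KV cache. The sketch below uses assumed defaults (an approximation only; real needs depend on quantization, context length, and batch size) and shows why a 13B model typically needs quantization to fit in 24 GB:

def rough_vram_gb(params_billions: float, bytes_per_param: float = 2.0,
                  overhead: float = 1.2) -> float:
    # Weights x dtype size, plus ~20% headroom for activations/KV cache.
    return params_billions * bytes_per_param * overhead

print(rough_vram_gb(13))                      # ~31 GB at fp16 -> exceeds 24 GB
print(rough_vram_gb(13, bytes_per_param=1))   # ~16 GB at int8 -> fits in 24 GB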

Next Steps