Quick Start Guide

Get your first chute deployed in under 10 minutes! This guide will walk you through creating, building, and deploying a simple AI application.

Just want to call existing models, not deploy your own? Point any OpenAI-compatible client at the Chutes API base URL with a Chutes API key. That gives you instant access to every model already running on the network, with no deployment required. See Authentication for key setup and the LLM Chat guide for full examples. The rest of this page covers deploying your own chute.
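As a sketch, any HTTP client can speak this OpenAI-compatible API. Below is a minimal Python example using only the standard library; the base URL and API key are placeholders, so substitute the real values from the Authentication guide.

```python
import json
import urllib.request

# Placeholders: substitute the real base URL and API key from the
# Authentication guide before sending anything.
BASE_URL = "https://example.invalid/v1"
API_KEY = "YOUR_CHUTES_API_KEY"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Assemble an OpenAI-compatible chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 100,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_chat_request("unsloth/Llama-3.2-1B-Instruct", "Hello!")
# With real credentials, urllib.request.urlopen(req) would return the completion.
```

Any OpenAI SDK or compatible client works the same way: point its base URL at the Chutes endpoint and pass your key as the bearer token.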

Prerequisites

Make sure you've completed the Installation & Setup guide first.

Step 1: Create Your First Chute

Let's build a simple text generation chute using a pre-built template.

Create a new file called my_first_chute.py:

from chutes.chute import NodeSelector
from chutes.chute.template.vllm import build_vllm_chute

# Build a chute using the VLLM template
chute = build_vllm_chute(
    username="your-username",  # Replace with your Chutes username
    model_name="unsloth/Llama-3.2-1B-Instruct",
    node_selector=NodeSelector(
        gpu_count=1,
    ),
    concurrency=4,
    readme="""
    # My First Chute
    A simple conversational AI powered by Llama 3.2.

    ## Usage
    Send a POST request to `/v1/chat/completions` with your message.
    """
)

That's it! You've just defined a complete AI application with:

  • āœ… A pre-configured VLLM server
  • āœ… Automatic model downloading
  • āœ… OpenAI-compatible API endpoints
  • āœ… GPU resource requirements
  • āœ… Auto-scaling configuration

Step 2: Build Your Image

Build the Docker image for your chute:

chutes build my_first_chute:chute --wait

This will:

  • šŸ“¦ Create a Docker image with all dependencies
  • šŸ”§ Install VLLM and required libraries
  • ā¬‡ļø Pre-download your model
  • āœ… Validate the configuration

The --wait flag streams the build logs to your terminal so you can monitor progress.

Step 3: Deploy Your Chute

Deploy your chute to the Chutes platform:

chutes deploy my_first_chute:chute

After deployment, you'll see output like:

āœ… Chute deployed successfully!
🌐 Public API: https://your-username-my-first-chute.chutes.ai
šŸ“‹ Chute ID: 12345678-1234-5678-9abc-123456789012

Step 4: Test Your Chute

Your chute is now live! Test it with a simple chat completion:

Option 1: Using curl

curl -X POST https://your-username-my-first-chute.chutes.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/Llama-3.2-1B-Instruct",
    "messages": [
      {"role": "user", "content": "Hello! How are you today?"}
    ],
    "max_tokens": 100,
    "temperature": 0.7
  }'

Option 2: Using Python

import asyncio
import aiohttp
import json

async def chat_with_chute():
    url = "https://your-username-my-first-chute.chutes.ai/v1/chat/completions"

    payload = {
        "model": "unsloth/Llama-3.2-1B-Instruct",
        "messages": [
            {"role": "user", "content": "Hello! How are you today?"}
        ],
        "max_tokens": 100,
        "temperature": 0.7
    }

    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=payload) as response:
            result = await response.json()
            print(json.dumps(result, indent=2))

# Run the test
asyncio.run(chat_with_chute())

Option 3: Test Locally

You can also test your chute locally before deploying using the CLI:

# Run your chute locally
chutes run my_first_chute:chute --dev

# Then in another terminal, test with curl
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/Llama-3.2-1B-Instruct",
    "messages": [
      {"role": "user", "content": "Hello! How are you today?"}
    ],
    "max_tokens": 100,
    "temperature": 0.7
  }'

Step 5: Monitor and Manage

View Your Chutes

chutes chutes list

Get Detailed Information

chutes chutes get my-first-chute

Check Logs

Visit the Chutes Dashboard to view real-time logs and metrics.

Deleting Resources

When you're done with a chute, it's good practice to clean up your resources.

  • Note: You must remove a chute before you can delete its image. Images tied to running chutes cannot be deleted.
# 1. Delete the chute
chutes chutes delete <chute_id>

# 2. Delete the image (after chute is removed)
chutes images delete <image_id>

What Just Happened?

Congratulations! You just:

  1. šŸŽÆ Defined an AI application with just a few lines of Python
  2. šŸ—ļø Built a production-ready Docker image
  3. šŸš€ Deployed to GPU-accelerated infrastructure
  4. 🌐 Exposed OpenAI-compatible API endpoints
  5. šŸ’° Enabled pay-per-use billing: you're only charged when your chute receives requests

Next Steps

Now that you have a working chute, explore more advanced features:

šŸŽØ Try Different Models

Replace the model_name with any other VLLM-compatible model from HuggingFace. Keep in mind that larger models require more VRAM, so adjust your hardware selection accordingly.

šŸ”§ Customize Hardware

Adjust your NodeSelector:

NodeSelector(
    gpu_count=1,           # Use 1 GPU
    min_vram_gb_per_gpu=24, # Require 24GB VRAM per GPU
    include=["a100", "h100"], # Prefer specific GPU types
    exclude=["k80"]        # Avoid older GPUs
)

šŸŽ›ļø Tune Performance

Modify engine arguments:

chute = build_vllm_chute(
    # ... other parameters ...
    engine_args={
        "max_model_len": 4096,
        "gpu_memory_utilization": 0.9,
        "max_num_seqs": 32
    }
)

šŸ“š Learn Core Concepts

šŸ—ļø Build Custom Applications

šŸ”— Integrations

  • Vercel AI SDK - Use Chutes with the Vercel AI SDK for streaming, tool calling, and more

Common Questions

Q: How much does this cost? A: You only pay for GPU time when your chute is processing requests. Idle time is free!

Q: Can I use my own models? A: Yes! Upload models to HuggingFace or use the custom image building features.
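For example, assuming you've pushed a fine-tuned model to HuggingFace, you would point model_name at that repository. This is a fragment of the Step 1 definition, not runnable on its own, and the repo name below is a placeholder:

```python
chute = build_vllm_chute(
    # ... same parameters as in Step 1 ...
    model_name="your-hf-username/your-fine-tuned-model",  # placeholder HF repo
)
```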

Q: What about scaling? A: Chutes automatically scales based on demand. Configure concurrency to control how many requests each instance handles.
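The concurrency knob lives in the chute definition from Step 1; a fragment, not runnable on its own:

```python
chute = build_vllm_chute(
    # ... same parameters as in Step 1 ...
    concurrency=4,  # each instance serves up to 4 requests at once;
                    # demand beyond that scales out additional instances
)
```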

Q: How do I debug issues? A: Check the logs in the Chutes Dashboard, or inspect your chute from the CLI with chutes chutes get.

Troubleshooting

Build failed?

  • Check that your model name is correct
  • Try with a smaller model first

Deployment failed?

  • Verify your image built successfully
  • Check your username and chute name are valid
  • Ensure you have proper permissions

Can't access your chute?

  • Wait a few minutes for DNS propagation
  • Check the exact URL from the deployment output
  • Verify the chute is in "running" status

Get Help


Ready to build something more advanced? Check out Your First Custom Chute to learn how to build completely custom applications!