Quick Start Guide
Get your first chute deployed in under 10 minutes! This guide will walk you through creating, building, and deploying a simple AI application.
Just want to call existing models, not deploy your own? Point any OpenAI-compatible client at the Chutes API endpoint with a Chutes API key. That gives you instant access to every model already running on the network, no deployment required. See Authentication for key setup and the LLM Chat guide for full examples. The rest of this page is for deploying your own chute.
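As a sketch of what "OpenAI-compatible" means in practice, the request below is built with only the Python standard library. The base URL is a placeholder (substitute the endpoint from the Authentication guide), and the API key is read from a hypothetical `CHUTES_API_KEY` environment variable:

```python
import json
import os
import urllib.request

# Placeholder: substitute the OpenAI-compatible endpoint from the
# Authentication guide.
BASE_URL = "https://example-chutes-endpoint/v1"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Bearer auth, as with any OpenAI-compatible API
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request(
    "unsloth/Llama-3.2-1B-Instruct",
    "Hello!",
    os.environ.get("CHUTES_API_KEY", "YOUR_API_KEY"),
)
# urllib.request.urlopen(req) would send it; omitted here.
```

Any OpenAI SDK or client library that lets you override the base URL works the same way.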
Prerequisites
Make sure you've completed the Installation & Setup guide first.
Step 1: Create Your First Chute
Let's build a simple text generation chute using a pre-built template.
Create a new file called `my_first_chute.py`:
```python
from chutes.chute import NodeSelector
from chutes.chute.template.vllm import build_vllm_chute

# Build a chute using the VLLM template
chute = build_vllm_chute(
    username="your-username",  # Replace with your Chutes username
    model_name="unsloth/Llama-3.2-1B-Instruct",
    node_selector=NodeSelector(
        gpu_count=1,
    ),
    concurrency=4,
    readme="""
# My First Chute

A simple conversational AI powered by Llama 3.2.

## Usage

Send a POST request to `/v1/chat/completions` with your message.
""",
)
```

That's it! You've just defined a complete AI application with:
- ✅ A pre-configured VLLM server
- ✅ Automatic model downloading
- ✅ OpenAI-compatible API endpoints
- ✅ GPU resource requirements
- ✅ Auto-scaling configuration
Step 2: Build Your Image
Build the Docker image for your chute:
```bash
chutes build my_first_chute:chute --wait
```

This will:
- 📦 Create a Docker image with all dependencies
- 🔧 Install VLLM and required libraries
- ⬇️ Pre-download your model
- ✅ Validate the configuration
The `--wait` flag streams the build logs to your terminal so you can monitor progress.
Step 3: Deploy Your Chute
Deploy your chute to the Chutes platform:
```bash
chutes deploy my_first_chute:chute
```

After deployment, you'll see output like:

```
✅ Chute deployed successfully!
🌐 Public API: https://your-username-my-first-chute.chutes.ai
🆔 Chute ID: 12345678-1234-5678-9abc-123456789012
```

Step 4: Test Your Chute
Your chute is now live! Test it with a simple chat completion:
Option 1: Using curl
```bash
curl -X POST https://your-username-my-first-chute.chutes.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/Llama-3.2-1B-Instruct",
    "messages": [
      {"role": "user", "content": "Hello! How are you today?"}
    ],
    "max_tokens": 100,
    "temperature": 0.7
  }'
```

Option 2: Using Python
```python
import asyncio
import json

import aiohttp

async def chat_with_chute():
    url = "https://your-username-my-first-chute.chutes.ai/v1/chat/completions"
    payload = {
        "model": "unsloth/Llama-3.2-1B-Instruct",
        "messages": [
            {"role": "user", "content": "Hello! How are you today?"}
        ],
        "max_tokens": 100,
        "temperature": 0.7,
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=payload) as response:
            result = await response.json()
            print(json.dumps(result, indent=2))

# Run the test
asyncio.run(chat_with_chute())
```

Option 3: Test Locally
You can also test your chute locally before deploying using the CLI:
```bash
# Run your chute locally
chutes run my_first_chute:chute --dev

# Then in another terminal, test with curl
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/Llama-3.2-1B-Instruct",
    "messages": [
      {"role": "user", "content": "Hello! How are you today?"}
    ],
    "max_tokens": 100,
    "temperature": 0.7
  }'
```

Step 5: Monitor and Manage
View Your Chutes
```bash
chutes chutes list
```

Get Detailed Information
```bash
chutes chutes get my-first-chute
```

Check Logs
Visit the Chutes Dashboard to view real-time logs and metrics.
Deleting Resources
When you're done with a chute, it's good practice to clean up your resources.
- Note: You must remove a chute before you can delete its image. Images tied to running chutes cannot be deleted.
```bash
# 1. Delete the chute
chutes chutes delete <chute_id>

# 2. Delete the image (after chute is removed)
chutes images delete <image_id>
```

What Just Happened?
Congratulations! You just:
- 🎯 Defined an AI application with just a few lines of Python
- 🏗️ Built a production-ready Docker image
- 🚀 Deployed to GPU-accelerated infrastructure
- 🌐 Exposed OpenAI-compatible API endpoints
- 💰 Pay-per-use: only charged when your chute receives requests
Next Steps
Now that you have a working chute, explore more advanced features:
🎨 Try Different Models
Swap `model_name` for a different HuggingFace model (larger models require more VRAM).
🔧 Customize Hardware
Adjust your `NodeSelector`:
```python
NodeSelector(
    gpu_count=1,               # Use 1 GPU
    min_vram_gb_per_gpu=24,    # Require 24GB VRAM per GPU
    include=["a100", "h100"],  # Prefer specific GPU types
    exclude=["k80"],           # Avoid older GPUs
)
```

🎛️ Tune Performance
Modify engine arguments:
```python
chute = build_vllm_chute(
    # ... other parameters ...
    engine_args={
        "max_model_len": 4096,
        "gpu_memory_utilization": 0.9,
        "max_num_seqs": 32,
    },
)
```

📚 Learn Core Concepts
- Understanding Chutes - Deep dive into the Chute class
- Security Architecture - Learn about our TEE and hardware attestation security
- Cords (API Endpoints) - Custom API endpoints
- Custom Images - Build your own Docker images
🏗️ Build Custom Applications
- Your First Custom Chute - Build from scratch
- Custom Image Building - Advanced Docker setups
- Input/Output Schemas - Type-safe APIs
🔌 Integrations
- Vercel AI SDK - Use Chutes with the Vercel AI SDK for streaming, tool calling, and more
Common Questions
Q: How much does this cost? A: You only pay for GPU time when your chute is processing requests. Idle time is free!
Q: Can I use my own models? A: Yes! Upload models to HuggingFace or use the custom image building features.
Q: What about scaling? A: Chutes automatically scales based on demand. Configure `concurrency` to control how many requests each instance handles.
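As a rough mental model (this is arithmetic, not a Chutes API), total parallel capacity is the number of running instances times the per-instance `concurrency`; with `concurrency=4` from the example above:

```python
def total_capacity(instances: int, concurrency: int) -> int:
    """Rough upper bound on simultaneous requests: each instance
    handles `concurrency` requests at once."""
    return instances * concurrency

print(total_capacity(instances=3, concurrency=4))  # -> 12
```

Requests beyond this bound queue until a slot frees up or the platform scales out.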
Q: How do I debug issues? A: Check the logs in the Chutes Dashboard, or inspect your chute from the CLI with `chutes chutes get`.
Troubleshooting
Build failed?
- Check that your model name is correct
- Try with a smaller model first
Deployment failed?
- Verify your image built successfully
- Check your username and chute name are valid
- Ensure you have proper permissions
Can't access your chute?
- Wait a few minutes for DNS propagation
- Check the exact URL from the deployment output
- Verify the chute is in "running" status
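While waiting for DNS and the first instance to come up, a small poll-with-backoff loop saves manual retrying. This is a generic sketch, not part of the Chutes SDK; the URL is a placeholder and the backoff schedule is illustrative:

```python
import time
import urllib.request

def backoff_schedule(base: float = 2.0, factor: float = 2.0, retries: int = 5) -> list[float]:
    """Exponential backoff delays in seconds: 2, 4, 8, 16, 32."""
    return [base * factor**i for i in range(retries)]

def wait_until_live(url: str) -> bool:
    """Poll a URL until it answers, sleeping between attempts."""
    for delay in backoff_schedule():
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except OSError:
            time.sleep(delay)
    return False

# wait_until_live("https://your-username-my-first-chute.chutes.ai")  # placeholder URL
```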
Get Help
- 📚 Detailed Guides: Continue with Your First Custom Chute
- 💬 Community: Join our Discord
- 🐛 Issues: GitHub Issues
- 📧 Support:
Ready to build something more advanced? Check out Your First Custom Chute to learn how to build completely custom applications!