Let's build a simple text-generation chute using a pre-built template.
Create a new file called my_first_chute.py:
from chutes.chute import NodeSelector
from chutes.chute.template.vllm import build_vllm_chute

# Build a chute using the vLLM template
chute = build_vllm_chute(
    username="your-username",  # Replace with your Chutes username
    model_name="microsoft/DialoGPT-medium",
    revision="main",  # Required: locks the model to a specific version
    node_selector=NodeSelector(
        gpu_count=1,
        min_vram_gb_per_gpu=16,
    ),
    concurrency=4,
    readme="""
# My First Chute

A simple conversational AI powered by DialoGPT.

## Usage

Send a POST request to `/v1/chat/completions` with your message.
""",
)
That's it! You've just defined a complete AI application with:
✅ A pre-configured vLLM server
✅ Automatic model downloading
✅ OpenAI-compatible API endpoints
✅ GPU resource requirements
✅ Auto-scaling configuration
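The node_selector is where you declare those GPU requirements. If you later swap in a larger model, you can raise the limits using the same two fields shown above. A minimal sketch; the values here are illustrative, not a recommendation for any particular model:

from chutes.chute import NodeSelector

# Illustrative only: request two GPUs with at least 24 GB VRAM each.
# Size this to what the model you actually serve needs.
bigger_node_selector = NodeSelector(
    gpu_count=2,
    min_vram_gb_per_gpu=24,
)

Pass this in place of the node_selector above; everything else in the template stays the same.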
Step 2: Build Your Image
Build the Docker image for your chute:
chutes build my_first_chute:chute --wait
This will:
📦 Create a Docker image with all dependencies
🔧 Install vLLM and required libraries
⬇️ Pre-download your model
✅ Validate the configuration
The --wait flag streams the build logs to your terminal so you can monitor progress.
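Depending on your account setup, you may need to explicitly deploy the freshly built image before it starts serving traffic. Assuming the deploy command mirrors the build invocation (check chutes --help if yours differs):

chutes deploy my_first_chute:chute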
Your chute is now live! Test it with a simple chat completion:
Option 1: Using curl
curl -X POST https://your-username-my-first-chute.chutes.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/DialoGPT-medium",
    "messages": [
      {"role": "user", "content": "Hello! How are you today?"}
    ],
    "max_tokens": 100,
    "temperature": 0.7
  }'
Option 2: Using Python
import asyncio
import aiohttp
import json

async def chat_with_chute():
    url = "https://your-username-my-first-chute.chutes.ai/v1/chat/completions"
    payload = {
        "model": "microsoft/DialoGPT-medium",
        "messages": [
            {"role": "user", "content": "Hello! How are you today?"}
        ],
        "max_tokens": 100,
        "temperature": 0.7,
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=payload) as response:
            result = await response.json()
            print(json.dumps(result, indent=2))

# Run the test
asyncio.run(chat_with_chute())
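Because the endpoint is OpenAI-compatible, the official openai Python client can stand in for raw HTTP calls. A minimal sketch, assuming the same base URL as above; the placeholder API key is an assumption, so substitute your real Chutes API key if your deployment enforces authentication:

from openai import OpenAI

# Point the standard OpenAI client at the chute's base URL.
client = OpenAI(
    base_url="https://your-username-my-first-chute.chutes.ai/v1",
    api_key="YOUR_CHUTES_API_KEY",  # placeholder; use your real key if required
)

response = client.chat.completions.create(
    model="microsoft/DialoGPT-medium",
    messages=[{"role": "user", "content": "Hello! How are you today?"}],
    max_tokens=100,
    temperature=0.7,
)

# The reply text lives in the standard chat-completions location.
print(response.choices[0].message.content)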
Option 3: Test Locally
You can also test your chute locally before deploying using the CLI:
# Run your chute locally
chutes run my_first_chute:chute --dev

# Then in another terminal, test with curl
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "microsoft/DialoGPT-medium",
    "messages": [
      {"role": "user", "content": "Hello! How are you today?"}
    ],
    "max_tokens": 100,
    "temperature": 0.7
  }'
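If you'd rather script the local check in Python, here is the same request using the requests library, assuming the dev server is listening on port 8000 as shown above:

import requests

payload = {
    "model": "microsoft/DialoGPT-medium",
    "messages": [{"role": "user", "content": "Hello! How are you today?"}],
    "max_tokens": 100,
    "temperature": 0.7,
}

# Hit the locally running chute started by `chutes run ... --dev`.
response = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])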