Simple Text Analysis Chute
This example shows how to build a basic text analysis service using transformers and custom API endpoints. Perfect for getting started with custom Chutes.
What We'll Build
A simple text sentiment analysis service that:
- 📊 Analyzes sentiment using a pre-trained model
- 🔍 Validates input with Pydantic schemas
- 🚀 Provides REST API for easy integration
- 📦 Uses custom Docker image with optimized dependencies
Complete Example
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from pydantic import BaseModel, Field
from fastapi import HTTPException
from chutes.chute import Chute, NodeSelector
from chutes.image import Image
# === INPUT/OUTPUT SCHEMAS ===
class TextInput(BaseModel):
text: str = Field(..., min_length=5, max_length=1000, description="Text to analyze")
class Config:
schema_extra = {
"example": {
"text": "I love using this new AI service!"
}
}
class SentimentResult(BaseModel):
text: str
sentiment: str # POSITIVE, NEGATIVE, NEUTRAL
confidence: float
processing_time: float
# === CUSTOM IMAGE ===
image = (
Image(username="myuser", name="sentiment-analyzer", tag="1.0")
.from_base("nvidia/cuda:12.4.1-runtime-ubuntu22.04")
.with_python("3.11")
.run_command("pip install torch>=2.4.0 transformers>=4.44.0 accelerate>=0.33.0")
.with_env("TRANSFORMERS_CACHE", "/app/models")
.run_command("mkdir -p /app/models")
)
# === CHUTE DEFINITION ===
chute = Chute(
username="myuser",
name="sentiment-analyzer",
image=image,
tagline="Simple sentiment analysis with transformers",
readme="""
# Sentiment Analyzer
A simple sentiment analysis service using DistilBERT.
## Usage
Send a POST request to `/analyze`:
```bash
curl -X POST https://myuser-sentiment-analyzer.chutes.ai/analyze \\
-H "Content-Type: application/json" \\
-d '{"text": "I love this product!"}'Response
{
"text": "I love this product!",
"sentiment": "POSITIVE",
"confidence": 0.99,
"processing_time": 0.05
}""",
node_selector=NodeSelector(
gpu_count=1,
min_vram_gb_per_gpu=8
))
=== MODEL LOADING ===
@chute.on_startup() async def load_model(self): """Load the sentiment analysis model on startup.""" print("Loading sentiment analysis model...")
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
# Load tokenizer and model
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Move to GPU if available
self.device = "cuda" if torch.cuda.is_available() else "cpu"
self.model.to(self.device)
self.model.eval() # Set to evaluation mode
print(f"Model loaded on device: {self.device}")=== API ENDPOINTS ===
@chute.cord( public_api_path="/analyze", method="POST", input_schema=TextInput, output_content_type="application/json" ) async def analyze_sentiment(self, data: TextInput) -> SentimentResult: """Analyze the sentiment of the input text.""" import time
start_time = time.time()
try:
# Tokenize input
inputs = self.tokenizer(
data.text,
return_tensors="pt",
truncation=True,
padding=True,
max_length=512
).to(self.device)
# Run inference
with torch.no_grad():
outputs = self.model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
# Get results
labels = ["NEGATIVE", "POSITIVE"] # DistilBERT SST-2 labels
predicted_class = predictions.argmax(dim=-1).item()
confidence = predictions[0][predicted_class].item()
processing_time = time.time() - start_time
return SentimentResult(
text=data.text,
sentiment=labels[predicted_class],
confidence=confidence,
processing_time=processing_time
)
except Exception as e:
raise HTTPException(status_code=500, detail=f"Analysis failed: {str(e)}")@chute.cord( public_api_path="/health", method="GET", output_content_type="application/json" ) async def health_check(self) -> dict: """Simple health check endpoint.""" return { "status": "healthy", "model_loaded": hasattr(self, 'model'), "device": getattr(self, 'device', 'unknown') }
Test the chute locally (optional)
if name == "main": import asyncio
async def test_locally():
# Simulate startup
await load_model(chute)
# Test analysis
test_input = TextInput(text="I love this new AI service!")
result = await analyze_sentiment(chute, test_input)
print(f"Result: {result}")
asyncio.run(test_locally())
## Step-by-Step Breakdown
### 1. Define Input/Output Schemas
```python
class TextInput(BaseModel):
text: str = Field(..., min_length=5, max_length=1000)- Validation: Ensures text is between 5-1000 characters
- Documentation: Provides clear API documentation
- Type Safety: Automatic parsing and validation
2. Build Custom Image
image = (
Image(username="myuser", name="sentiment-analyzer", tag="1.0")
.from_base("nvidia/cuda:12.4.1-runtime-ubuntu22.04")
.with_python("3.11")
.run_command("pip install torch>=2.4.0 transformers>=4.44.0")
)- Base Image: CUDA-enabled Ubuntu for GPU support
- Dependencies: Only what we need for sentiment analysis
- Optimization: Runtime image (smaller than devel)
3. Model Loading
@chute.on_startup()
async def load_model(self):
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
self.model.to(self.device)- Startup Hook: Load model once when chute starts
- GPU Support: Automatically use GPU if available
- State Management: Store in chute instance
4. API Endpoint
@chute.cord(public_api_path="/analyze", input_schema=TextInput)
async def analyze_sentiment(self, data: TextInput) -> SentimentResult:
# Process the input
return SentimentResult(...)- Path Mapping: Creates endpoint
- Input Validation: Automatic validation using schema
- Typed Response: Structured output with SentimentResult
Building and Deploying
1. Build the Image
chutes build sentiment_analyzer:chute --wait2. Deploy the Chute
chutes deploy sentiment_analyzer:chute3. Test Your Deployment
curl -X POST https://myuser-sentiment-analyzer.chutes.ai/analyze \
-H "Content-Type: application/json" \
-d '{"text": "This is amazing!"}'Expected response:
{
"text": "This is amazing!",
"sentiment": "POSITIVE",
"confidence": 0.99,
"processing_time": 0.05
}Testing Different Texts
import requests
texts = [
"I love this product!", # Should be POSITIVE
"This is terrible.", # Should be NEGATIVE
"It's okay, nothing special.", # Could be NEGATIVE or POSITIVE
"Amazing technology!", # Should be POSITIVE
"Poor quality." # Should be NEGATIVE
]
for text in texts:
response = requests.post(
"https://myuser-sentiment-analyzer.chutes.ai/analyze",
json={"text": text}
)
result = response.json()
print(f"'{text}' -> {result['sentiment']} ({result['confidence']:.2f})")Key Concepts Learned
1. Custom Images
- How to build optimized Docker environments
- Installing Python packages efficiently
- Setting environment variables
2. Model Management
- Loading models at startup (not per request)
- GPU detection and utilization
- Memory optimization
3. API Design
- Input validation with Pydantic
- Structured responses
- Error handling
4. Performance
- Model reuse across requests
- Efficient tokenization
- GPU acceleration
Next Steps
Now that you understand the basics, try:
- Streaming Responses - Real-time analysis
- Batch Processing - Process multiple texts
- Multi-Model Setup - Combine multiple models
- Custom Image Building - Advanced Docker
Common Issues & Solutions
Model not loading?
- Check GPU requirements in NodeSelector
- Verify model name is correct
- Ensure sufficient VRAM
Slow responses?
- Model loads on first request (normal)
- Consider warming up with health check
- Check GPU utilization
Out of memory?
- Reduce max_length in tokenizer
- Use smaller model variant
- Increase VRAM requirements