This guide covers "Day 2" operations for Chutes miners: monitoring, troubleshooting, updating, and maintaining your mining infrastructure.
Routine Maintenance
1. Updating Components
The Chutes ecosystem evolves rapidly. Keep your miner up to date to ensure compatibility and maximize rewards.
Updating Charts:
Use the provided Ansible playbooks to update your Helm charts. This pulls the latest miner and GPU agent images.
# From your ansible/k3s directory
ansible-playbook -i inventory.yml playbooks/deploy-charts.yml
Updating OS & Drivers:
Periodically update your base OS and NVIDIA drivers. Caution: before rebooting, drain the node or mark it unschedulable in Kubernetes. Dropping active chutes can incur slashing or other penalties.
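The caution above can be sketched as a small shell helper. This is a minimal sketch, not the official Chutes procedure: the node name gpu-node-0 and the DRY_RUN switch are hypothetical, and the drain flags shown are common choices that may need adjusting for your workloads.

```shell
# Sketch only: cordon and drain a node before a reboot, uncordon afterwards.
# DRY_RUN and the node name are illustrative assumptions, not Chutes settings.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop new pods from landing on the node, then evict the running ones.
drain_node() {
  run kubectl cordon "$1" &&
    run kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# Make the node schedulable again once it is back up.
uncordon_node() {
  run kubectl uncordon "$1"
}
```

For example, `DRY_RUN=1 drain_node gpu-node-0` prints the two kubectl commands without running them. Run `drain_node gpu-node-0` for real, reboot, then run `uncordon_node gpu-node-0` once the node is healthy again.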
2. Cleaning Disk Space
HuggingFace models and Docker images can consume significant disk space. The chutes-cacheclean service usually handles this, but you can run manual cleanups if needed.
Prune Docker Images:
# On a GPU node
docker system prune -a -f --filter "until=24h"
Clear HuggingFace Cache:
Model weights are stored in the configured cache directory (default /var/snap). You can manually delete old models if space is critical, but this will force re-downloads for new deployments.
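To decide whether a manual cleanup is warranted, it can help to check how full the disk is and which cache entries are largest before deleting anything. The following is a minimal read-only sketch. The function names and the 85% threshold in the usage note are illustrative assumptions, and you should substitute the cache directory your miner is actually configured with.

```shell
# Sketch only: read-only helpers for deciding whether a cleanup is needed.
# Function names and thresholds are illustrative, not part of Chutes.

# Print the used-space percentage (0-100) of the filesystem holding PATH.
disk_used_pct() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Succeed when usage of PATH is at or above THRESHOLD percent.
needs_cleanup() {
  [ "$(disk_used_pct "$1")" -ge "$2" ]
}

# List the N largest entries directly under DIR, sizes in KiB, largest first.
largest_entries() {
  du -sk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

For example, `needs_cleanup /var/snap 85 && largest_entries /var/snap 5` lists the five largest entries only when the disk is at least 85% full. Nothing is deleted, so the output is safe to review before removing any model.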
Troubleshooting
Common Issues
1. Node Not Joining Cluster
Check WireGuard: Ensure the wg0 interface is up and has the correct IP.
ip addr show wg0
systemctl status wg-quick@wg0
Check K3s Agent:
systemctl status k3s-agent
Logs: journalctl -u k3s-agent -f
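The checks above can be combined into a quick diagnostic. This is a rough sketch under stated assumptions: the helper names are hypothetical, the interface name wg0 and unit name k3s-agent come from the steps above, and the address parsing assumes the one-line output format of `ip -o -4 addr show`.

```shell
# Sketch only: quick checks for a node that will not join the cluster.
# Helper names are illustrative, not part of Chutes or K3s.

# Read `ip -o -4 addr show` output on stdin and print the address/prefix column.
extract_ipv4() {
  awk '$3 == "inet" { print $4 }'
}

# Report the IPv4 address on the WireGuard interface (default wg0).
# The optional second argument is the address you expect the node to have.
check_wireguard() {
  local iface expected addr
  iface="${1:-wg0}"
  expected="${2:-}"
  addr="$(ip -o -4 addr show dev "$iface" 2>/dev/null | extract_ipv4 | head -n 1)"
  if [ -z "$addr" ]; then
    echo "FAIL: $iface has no IPv4 address (is wg-quick@$iface running?)"
    return 1
  fi
  if [ -n "$expected" ] && [ "${addr%/*}" != "$expected" ]; then
    echo "FAIL: $iface has $addr, expected $expected"
    return 1
  fi
  echo "OK: $iface has $addr"
}

# Report whether the k3s-agent unit is active; on failure show recent log lines.
check_k3s_agent() {
  if systemctl is-active --quiet k3s-agent; then
    echo "OK: k3s-agent is active"
  else
    echo "FAIL: k3s-agent is not active; recent logs:"
    journalctl -u k3s-agent -n 50 --no-pager
    return 1
  fi
}
```

For example, `check_wireguard wg0 10.0.0.2 && check_k3s_agent` runs both checks in order, where 10.0.0.2 is a placeholder for the address you assigned to the node.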
2. GPU Not Detected
NVIDIA SMI: Run nvidia-smi on the node. If the command fails, reinstall the NVIDIA drivers.
K8s Detection: Check if the node advertises GPU resources: