Please excuse our absence from Proof of Talk in Paris

Timon Agar

Engineering and product team.

TL;DR

We've been focusing on significant, fully quantifiable improvements to chutes' operating budget, security, intellectual property, and more. We very much have a GSD (GET SHIT DONE) philosophy, and sometimes that requires "all-hands-on-deck" work that rules out attending conferences.

Model consolidation / long-tail cost reduction

For a while now, our goal has been to migrate the platform and all active chutes to run fully in TEEs. We have now finished that migration, but it required us to trim down and consolidate many of the models/chutes we offered publicly. This change had the following effects:

Substantial reduction in the amount of idle/wasted inventory
Much higher security thanks to TEE workload attestation, admission controllers, signed images, etc.
Elimination of all GPU infra for the validator, which is a significant expense reduction.
Many of the models, such as image models and voice tools (text-to-speech, speech-to-text, etc.), now ship in model "packs" that run together on a single GPU instead of each requiring a separate chute/GPU.
Removal of more publicly offered models, so that hardware goes to the most used/useful models and uptime, throughput, etc. improve.
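
To make the "pack" idea concrete, here is a minimal sketch of grouping small models onto shared GPUs with a first-fit-decreasing heuristic. This is an illustration only, not a description of how the packs are actually assembled: the function name, the model names, and the VRAM figures are all hypothetical.

```python
def pack_models(model_vram_gb, gpu_vram_gb):
    """First-fit decreasing: place each model on the first GPU with room.

    model_vram_gb: dict of model name -> VRAM needed in GB (hypothetical values)
    gpu_vram_gb:   VRAM available on one GPU in GB
    Returns a list of packs, each a list of model names sharing one GPU.
    """
    gpus = []  # each entry: {"free": remaining GB, "models": [names]}
    largest_first = sorted(model_vram_gb.items(), key=lambda kv: kv[1], reverse=True)
    for name, need in largest_first:
        if need > gpu_vram_gb:
            raise ValueError(f"{name} does not fit on a single GPU")
        for gpu in gpus:
            if gpu["free"] >= need:
                gpu["free"] -= need
                gpu["models"].append(name)
                break
        else:
            # No existing GPU had room, so start a new pack.
            gpus.append({"free": gpu_vram_gb - need, "models": [name]})
    return [gpu["models"] for gpu in gpus]


# Hypothetical example: five small models, 48 GB per GPU.
example_models = {"tts": 6, "stt": 10, "image-a": 24, "image-b": 20, "embed": 4}
example_packs = pack_models(example_models, gpu_vram_gb=48)
```

With these made-up numbers, five models that would otherwise occupy five GPUs fit on two, which is the kind of idle-inventory reduction described above.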

Validator infra optimization / cost reduction

Another aspect that needed an update was our validator. The previous version worked great but carried significant overhead that we could cut without a loss of performance. The changes include:

Autoscaling

All validator nodes and services now autoscale via either standard HPA or custom metrics (via KEDA):

Previously we were a bit over-provisioned with fixed inventory to handle bursty traffic. When the former provider tried to massively increase prices, the decision to leave that provider and modify the system became that much easier to make.
Our infrastructure can now scale down quite dramatically during off-peak hours, with near-instantaneous upscaling when traffic increases, saving significant resources and money.
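
For readers unfamiliar with how HPA-style autoscaling picks a replica count, the Kubernetes documentation gives the rule desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with a default tolerance of 0.1 around the target. The sketch below reproduces that rule in Python. The function name, the replica bounds, and the sample numbers are illustrative assumptions, not values from the validator.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """HPA-style rule: scale in proportion to the current/target metric ratio."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Within the tolerance band around the target, keep the current count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds (illustrative defaults).
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical burst: 4 replicas seeing twice the target load.
burst = desired_replicas(4, current_metric=200, target_metric=100, max_replicas=20)
# Hypothetical off-peak: 10 replicas at a tenth of the target load.
off_peak = desired_replicas(10, current_metric=10, target_metric=100,
                            min_replicas=2, max_replicas=20)
```

The same proportional rule applies whether the metric is CPU utilization or a custom metric such as queue depth; only the source of the metric changes.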

Observability

Much better visibility/monitoring tooling, all consolidated into a unified metrics/observability framework.
By making these changes and building our own monitoring tooling we were also able to eliminate a fairly large monthly Datadog bill.

Security & privacy

The entire prod validator stack and services are now more secure and private:

Every component is protected via TEE (AMD SEV-SNP)
Better, more thorough use of network policies to restrict traffic to only what is absolutely required.
More extensive use of private services and RBAC to eliminate any potential external access and remove credentials.
Significant cost reduction from eliminating all WAN traffic to internal services (AlloyDB, managed Redis, etc.)
Optional private-network-only infrastructure with NAT egress to reduce infrastructure fingerprint and visibility

High availability & global resiliency

Path to higher global presence and resiliency on the validator side:

Much easier cross-region replication and HA via AlloyDB (vs. the previous manual stack: daily incremental and weekly full backups, continuous WAL archiving, Patroni with Keepalived and etcd, etc.)
Unlocks the option of regional partitioning for users who may require it, e.g. under EU regulations requiring that data never leave Europe

Remote builds

Remote chute builds via depot.dev make for much faster build times and much higher security, because builds with arbitrary user code never occur inside the core validator infra/network.

Blackwell B200/B300 support

Kyle, with the support and efforts of legendary miner Pierre, has updated our VM/code/etc. and successfully onboarded our first B200/B300 nodes in TEE. The first model making use of this hardware is NVIDIA Nemotron-3 Ultra 550B.

The importance of this milestone cannot be overstated. We have crossed over from only supporting Hopper and RTX Pro 6000s into supporting the more powerful flagship GPUs, which unlock:

Higher performance and concurrency
Larger model support
Elimination of many KV cache bottlenecks on some models
Native NVFP4 compute capabilities for significantly higher concurrency/throughput and reduced TTFT
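
To give a rough sense of why a 4-bit format matters for larger models, the sketch below estimates weight storage at different precisions. It rests on stated assumptions: the "550B" in the model name is read as a parameter count, NVFP4 is counted as 4-bit values plus one 8-bit scale per 16-element block (about 4.5 bits per weight), and only weights are counted, ignoring KV cache, activations, and any layers kept at higher precision.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: "550B" in the model name is taken as the parameter count.
PARAMS = 550e9

# Effective bits per weight for each format.
FORMATS = {
    "BF16": 16.0,
    "FP8": 8.0,
    # 4-bit values plus one 8-bit scale shared by each 16-element block.
    "NVFP4": 4.0 + 8.0 / 16.0,
}

for name, bits in FORMATS.items():
    print(f"{name}: {weight_memory_gb(PARAMS, bits):.1f} GB")
```

These are back-of-the-envelope figures for orientation, not measurements of the deployed model.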

CPU-only TEE infrastructure

cxmplex has been tackling support for CPU/RAM-only compute offerings, with the same TEE attestation and attested workloads we have today for GPU infra. This will enable secure, verifiable VMs that don't need GPUs (e.g. sandboxes, agent runners, etc.).

This is not as simple as checking a quote for validity. Even with very strict measurements including software stacks, there are still many attack vectors to address before this can be used securely. For example:

A miner could try to proxy network traffic through a transparent gateway, perform the real attestation, then intercept and redirect SSH traffic to an insecure or different box. Defending against this requires the VM's SSH host key to be part of the attestation, and requires the client to enforce verification against that host key to prevent any MITM or redirect.
Logs present a surprisingly significant attack vector. Even if memory is encrypted and all network traffic is over secure channels with attested certs, some attack vectors are much simpler. For CPU-only chutes with VMs, there is no k8s/RBAC option, and sensitive data could theoretically leak into system logs visible from the host even without access to the secure TEE guest VM.
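
The host-key binding described in the first attack vector can be sketched as follows. This is a minimal illustration under assumptions, not the actual implementation: it assumes the guest places a SHA-256 digest of its SSH host public key in the first 32 bytes of the quote's report data (a hypothetical layout), and that the quote's signature and measurements have already been verified elsewhere. The function names are made up for this example.

```python
import base64
import hashlib
import hmac


def host_key_digest(openssh_public_key: str) -> bytes:
    """SHA-256 over the raw key blob of an OpenSSH public key line."""
    # Expected format: "<key-type> <base64-blob> [comment]"
    parts = openssh_public_key.split()
    if len(parts) < 2:
        raise ValueError("not an OpenSSH public key line")
    blob = base64.b64decode(parts[1], validate=True)
    return hashlib.sha256(blob).digest()


def host_key_matches_attestation(presented_key: str, report_data: bytes) -> bool:
    """Return True only if the key the server presented is the attested one.

    report_data is assumed to come from an already-verified quote, with the
    host key digest in its first 32 bytes (hypothetical layout).
    """
    if len(report_data) < 32:
        return False
    # Constant-time comparison of the presented key's digest and the attested digest.
    return hmac.compare_digest(host_key_digest(presented_key), report_data[:32])
```

A client would run this check against the key offered during the SSH handshake and refuse to connect on a mismatch, so a redirect to a different box fails even when the attestation itself was genuine.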

There is more work to be done here, but we have made great progress toward CPU-only infra.

Parallax

The draft of the Parallax tech report is available to review. Parallax enables training significantly larger sparse MoE models across heterogeneous, disconnected nodes in a highly performant, compute- and comms-efficient manner.

We think this is a fairly important technical innovation that removes barriers to entry in previous decentralized training techniques.

Additionally, we have made significant progress towards building out a truly monstrous data pipeline to collect and synthetically augment coding and agentic workloads.

Training a model that nobody uses is a bit of a useless endeavor in most cases, regardless of how cool the underlying tech is. We want to build real, useful, and used models, and that requires extremely high-quality, comprehensive data. Data has been and always will be king.

Upcoming exploit conference

While we did not attend Proof of Talk, we will have a strong presence at the exploit conference later this year. We sponsored this event at the platinum tier and we very much look forward to meeting with all of you in person!

Closing thoughts

Conferences are great, and face-to-face meetings are certainly valuable. But alas, we are simple creatures who operate mostly on data. It can be very difficult to quantify the usefulness or benefit of attending a conference.

By contrast, migrating our entire production validator infrastructure has precise, measurable dollar amounts associated with it, which directly reduce sell pressure on the token, as does eliminating much of the long tail of public models we offer.

This time around, we opted to skip the conference to focus on these measurable, quantifiable changes that will have lasting and important impact on chutes. There were simply too many things in flight converging at this exact moment in time to take a pause and attend. Apologies, but again, GSD.
