
One chute, 12 models, 13 endpoints, covering text-to-speech, voice cloning, voice design, transcription, denoising, source separation, speaker verification, voice activity detection (VAD), and language detection. Models include Kokoro-82M, Qwen3-TTS 1.7B, Whisper large-v3-turbo, NVIDIA Canary-Qwen 2.5B, and NVIDIA Parakeet TDT.
AudioDojo bundles speech generation, voice cloning, transcription, speaker analysis, denoising, and separation. Pick a task and the input form adapts to it.
Ultra-fast text-to-speech with built-in multilingual Kokoro voices.
Run this model with your Chutes account, quota, and API access.