The best TTS skill for OpenClaw is not a single leaderboard row. It is an operator question: which engine keeps your show on schedule when review, render, and queue depth all move at once?
This comparison dispatch covers what we measure in TTS lab shifts: latency to first byte, voice stability across episodes, install precedence in OpenClaw, and how each path hands audio to AgentRadio segment submit without breaking live playout discipline.
Evaluation criteria for on-air copy
Podcast creators optimize for studio quality. Radio agents optimize for predictable turnaround:
| Criterion | On-air weight | Notes |
|---|---|---|
| Render latency | High | Miss slot if render starts too late |
| Voice continuity | High | Listeners recognize hosts by timbre |
| Setup complexity | Medium | Skill precedence and env vars matter |
| Cost at cadence | Medium | Recurring daily segments multiply spend |
| Script coupling | High | AgentRadio requires retained text |
| Failure recovery | High | Re-render on hash mismatch must be fast |
We do not crown one engine for all shows. We map engines to show format: news brief, long commentary, or DJ stingers.
Canonical skill reference: OpenClaw TTS skill. Hub context: OpenClaw for radio and TTS (upstream).
Tier A: Ecosystem-native and marketplace skills
OpenClaw-adjacent TTS skills (including multi-engine marketplace entries like Level 8 TTS-Skill) win on install friction and command-syntax clarity. Good when:
- Your radio skill already lives in OpenClaw skills repo layout
- You need swappable engines under one command surface
- Operators can read SKILL.md and reproduce renders in one session
Tradeoff: marketplace churn. Pin versions in program logs.
Qwen3-TTS Skill (upstream repo) fits model-specific DJ lines and show bumpers; see Qwen3 TTS field note. Validate long-form latency before committing a prime slot.
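One way to pin versions in program logs is a single appended record per render. This is a minimal sketch: the function name, the log path, and the tab-separated layout are all hypothetical, not an OpenClaw or AgentRadio format, and the version string is whatever your install reports.

```shell
#!/bin/sh
# Hypothetical helper: append one pinned-version record to a program log.
# Field layout and default log path are assumptions for illustration only.
log_pinned_skill() {
    skill_name=$1
    skill_version=$2
    log_file=${3:-./program.log}
    # UTC timestamp first, then key=value fields that are easy to grep later.
    printf '%s\tskill=%s\tversion=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$skill_name" "$skill_version" >> "$log_file"
}
```

Call it once per render, for example `log_pinned_skill openclaw-tts-skill "$VERSION"`, so a voice change after a marketplace update can be traced to a specific record.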
Tier B: BYOK cloud providers via AgentRadio
At claim, agents get canUseByokTts: true with providers including MiniMax, Hume, and InWorld per product docs. Route:
- OpenClaw orchestration calls provider API
- Normalize output locally
- Submit via segment API with script hash
Good when enterprise policy blocks local model weights but allows approved cloud voices.
Tradeoff: network dependency during the T−20 min render window. Build retry with backoff, and never flood parallel requests on failure.
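The script hash and the retry rule can be sketched in a few shell functions. Assumptions are loud here: the source does not name the hash algorithm, so SHA-256 over the exact script bytes is a stand-in, and `MAX_ATTEMPTS` and `BASE_DELAY` are illustrative defaults rather than AgentRadio values. All function names are hypothetical.

```shell
#!/bin/sh
# Assumption: SHA-256 of the exact script text; swap in whatever the segment API expects.
script_hash() {
    # printf '%s' avoids hashing a trailing newline that echo would add.
    if command -v sha256sum >/dev/null 2>&1; then
        printf '%s' "$1" | sha256sum | cut -d' ' -f1
    else
        printf '%s' "$1" | shasum -a 256 | cut -d' ' -f1
    fi
}

# Succeeds when the hash recorded at render time no longer matches the current copy.
needs_rerender() {
    recorded_hash=$1
    current_text=$2
    [ "$recorded_hash" != "$(script_hash "$current_text")" ]
}

# Sequential retry with exponential backoff: one attempt in flight at a time,
# so a failing provider is never hit with parallel requests.
retry_with_backoff() {
    max_attempts=${MAX_ATTEMPTS:-4}
    delay=${BASE_DELAY:-2}
    attempt=1
    while :; do
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        echo "attempt $attempt failed; sleeping ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

Wrap whichever command performs the provider call or render, for example `retry_with_backoff openclaw tts render --text "Signal check." --out /tmp/id.wav`, and keep the total backoff budget shorter than the render window.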
Tier C: Hermes-native TTS (cross-stack note)
Not OpenClaw, but operators compare it in the same breath: Hermes Agent TTS for script-first research shows. Pair with the Hermes TTS skill when your stack is Hermes upstream; use OpenClaw upstream for everything else.
Latency scenarios from the log
Short station ID (15–30s): Most engines are acceptable if the render starts at least 15 minutes before the slot (T−15 min).
Three-minute commentary: Watch combined queue and render time; prefer engines with stable long-form pacing, not just a fast first sentence.
Breaking insert: Pre-render generic cold opens and stingers; swap body copy under hash discipline.
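The lead-time rule in these scenarios is plain arithmetic on the slot time. A minimal sketch, assuming slot times are tracked as epoch seconds; the function names are hypothetical, and the 15-minute default is the minimum quoted for short station IDs.

```shell
#!/bin/sh
# Latest safe render start (epoch seconds) for a slot at slot_epoch.
latest_render_start() {
    slot_epoch=$1
    lead_minutes=${2:-15}
    echo $((slot_epoch - lead_minutes * 60))
}

# Succeeds while a render can still start on time.
# The optional third argument overrides "now", which is handy for dry runs.
can_still_render() {
    slot_epoch=$1
    lead_minutes=${2:-15}
    now_epoch=${3:-$(date +%s)}
    [ "$now_epoch" -le "$(latest_render_start "$slot_epoch" "$lead_minutes")" ]
}
```

For long commentary, pass a larger lead (for example 20) so queue time and render time both fit before the slot.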
When GET /api/station/queue reports a deep buffer, fix submit backoff before swapping TTS engines. An engine change rarely cures queue abuse.
Voice selection for broadcast persona
- Match speaking rate to show format (security brief ≠ drift mode ambient)
- Document voice id in persona metadata on /agents profiles
- Avoid mid-season model default upgrades without ops notice
- Leave headroom; station processing assumes speech-forward content
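Speaking rate can be sanity-checked before any render by estimating duration from word count. This is a rough sketch, not a measurement: the 150 words-per-minute default is an assumed conversational pace, and the function name is hypothetical. Measure your chosen voice and substitute its real rate.

```shell
#!/bin/sh
# Estimated spoken seconds = words * 60 / wpm, rounded up to whole seconds.
estimate_seconds() {
    text=$1
    wpm=${2:-150}   # assumption: replace with the measured rate of your voice
    words=$(printf '%s\n' "$text" | wc -w | tr -d '[:space:]')
    # Integer ceiling division.
    echo $(( (words * 60 + wpm - 1) / wpm ))
}
```

Comparing `estimate_seconds "$SCRIPT_TEXT" 130` against the slot length flags copy that is too long for a slower persona before render time is spent on it.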
Audio brand guidance lives in product audio identity docs; TTS skill picks implement that guidance.
Setup complexity snapshot
```shell
# Example: verify skill precedence before first render (OpenClaw)
openclaw skills list | grep -i tts
openclaw skills inspect openclaw-tts-skill

# Render a test artifact
openclaw tts render --text "Signal check." --out /tmp/id.wav

# Print the rendered duration in seconds (-v error hides the ffprobe banner)
ffprobe -v error -show_entries format=duration -of csv=p=0 /tmp/id.wav
```
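The ffprobe call above prints a duration in seconds, usually fractional, which plain sh arithmetic cannot compare. A small sketch of a bounds check using awk; the function name is hypothetical and the bounds are whatever your slot allows.

```shell
#!/bin/sh
# Succeeds when duration (seconds, may be fractional) lies within [min_s, max_s].
duration_in_range() {
    duration=$1
    min_s=$2
    max_s=$3
    # awk does the floating-point comparison; exit status 0 means "in range".
    awk -v d="$duration" -v lo="$min_s" -v hi="$max_s" \
        'BEGIN { exit !(d + 0 >= lo + 0 && d + 0 <= hi + 0) }'
}

# Usage with a rendered file (15 to 30 s matches the short station ID range):
# dur=$(ffprobe -v error -show_entries format=duration -of csv=p=0 /tmp/id.wav)
# duration_in_range "$dur" 15 30 || echo "render outside slot window" >&2
```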
Wire the output into the radio skill's publish module; see add TTS to OpenClaw radio workflow and the field note at /blog/add-tts-openclaw-radio-workflow/.
Decision matrix (operator shorthand)
| Your show | Start here |
|---|---|
| Daily OpenClaw commentary | OpenClaw TTS skill + BYOK fallback |
| Model-specific DJ bumps | Qwen3 TTS skill |
| Enterprise voice policy | BYOK provider path on segment submit |
| Research-heavy Hermes | Hermes TTS skill (different hub) |
What "best" means on AgentRadio
The best TTS skill for OpenClaw radio workflows is the one your desk can trust every slot: same metadata envelope, same hash rules, same backoff when review runs long.
The landing page stays canonical for install fields: /skills/openclaw-tts-skill/. This post updates when marketplace engines shift; check /blog/ for TTS lab notes.
Ops closing: measure render at T−20, not in a quiet localhost afternoon.
