Summary
Add a focused text-to-speech capability for the Windows node via a new tts.speak command.
This should extract the useful text-to-speech slice from the broader voice work prototyped in #120, while deliberately avoiding microphone/STT, Talk Mode, wake word, overlay UI, and broader voice-mode state machine work.
Credit: #120 by @NichUK is the reference prototype/source of the initial Windows voice/TTS exploration.
Goals
- Add a standalone Windows node command: tts.speak.
- Support local Windows speech synthesis as the default provider.
- Support ElevenLabs as an optional cloud TTS provider.
- Expose settings to enable/disable TTS and configure provider/API key/voice/model.
- Register the capability for gateway node mode and local MCP only when enabled.
- Treat tts.speak as privacy-sensitive / dangerous for Command Center and gateway allowlist purposes.
- Document setup, allowlist requirements, and manual testing flow.
Proposed command shape
Command: tts.speak
Arguments:
{
  "text": "hello from OpenClaw",
  "provider": "windows",
  "voiceId": null,
  "model": null,
  "interrupt": false
}
Response:
{
  "spoken": true,
  "provider": "windows",
  "contentType": "audio/wav",
  "durationMs": 2447
}
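To make the argument shape above concrete, here is a minimal Python sketch of how a caller or test harness might validate tts.speak params before sending them. The field names and the two provider names come from this issue. The TtsSpeakArgs type, the parse_tts_speak_args helper, and the choice to default a missing provider to "windows" are assumptions made for illustration, not the node's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Provider names are taken from the issue; everything else here is illustrative.
KNOWN_PROVIDERS = ("windows", "elevenlabs")


@dataclass(frozen=True)
class TtsSpeakArgs:
    """Hypothetical typed view of the tts.speak arguments."""
    text: str
    provider: str = "windows"  # assumed default, following the "default provider" goal
    voice_id: Optional[str] = None
    model: Optional[str] = None
    interrupt: bool = False


def parse_tts_speak_args(params: Mapping[str, Any]) -> TtsSpeakArgs:
    """Validate raw params and return a typed value; raise ValueError on bad input."""
    text = params.get("text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")

    provider = params.get("provider")
    if provider is None:
        provider = "windows"
    if provider not in KNOWN_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")

    # voiceId and model are optional; null means "use the configured default".
    voice_id = params.get("voiceId")
    model = params.get("model")
    for name, value in (("voiceId", voice_id), ("model", model)):
        if value is not None and not isinstance(value, str):
            raise ValueError(f"{name} must be a string or null")

    interrupt = params.get("interrupt", False)
    if not isinstance(interrupt, bool):
        raise ValueError("interrupt must be a boolean")

    return TtsSpeakArgs(text, provider, voice_id, model, interrupt)
```

With this sketch, parse_tts_speak_args({"text": "hello from OpenClaw"}) yields provider "windows" and interrupt False, while an unknown provider or a non-string text raises ValueError before anything is sent.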
Scope
In scope
- Shared, platform-neutral TtsCapability.
- Tray-side Windows speech synthesis playback.
- Optional ElevenLabs synthesis client.
- Settings persistence and Settings UI.
- DPAPI protection for stored ElevenLabs API keys.
- Gateway/node registration and local MCP description.
- Command Center dangerous-command classification.
- Gateway allowlist documentation.
- Text length guardrails and HTTP timeout for cloud TTS.
- Tests for capability behavior, settings, provider guardrails, Command Center grouping, and docs-adjacent command descriptions.
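The provider guardrails listed above could be checked in one place before any provider runs. The Python sketch below shows one possible shape. MAX_TEXT_CHARS is a placeholder value (this issue does not specify a limit), and TtsGuardrailError and check_guardrails are hypothetical names, not existing project APIs.

```python
from typing import Optional

MAX_TEXT_CHARS = 4000  # placeholder; the issue does not state the actual limit


class TtsGuardrailError(ValueError):
    """Raised before any provider runs, so rejected requests cost nothing."""


def check_guardrails(text: str, provider: str,
                     api_key: Optional[str] = None,
                     voice_id: Optional[str] = None) -> None:
    """Reject empty or overly long text and incomplete ElevenLabs configuration."""
    if not text.strip():
        raise TtsGuardrailError("text is empty")
    if len(text) > MAX_TEXT_CHARS:
        raise TtsGuardrailError(
            f"text is {len(text)} characters; the limit is {MAX_TEXT_CHARS}")
    if provider == "elevenlabs":
        # Name exactly which settings are missing so the failure is actionable.
        missing = [name for name, value in (("API key", api_key), ("voice", voice_id))
                   if not value]
        if missing:
            raise TtsGuardrailError(
                "ElevenLabs is selected but not configured: missing " + ", ".join(missing))
```

Running the check first means an over-length request or a missing API key never reaches the synthesizer or the network, which matches the intent that cloud requests should not consume quota on input that was going to fail anyway.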
Out of scope
- Speech-to-text / microphone capture.
- Talk Mode loop.
- Wake word.
- Voice overlay or repeater UI.
- MiniMax provider.
- Streaming playback optimizations.
- Shared gateway voice-mode contract changes.
Security and privacy notes
tts.speak should be treated as privacy-sensitive because a remote caller can cause audio output on the user's machine. When ElevenLabs is selected, request text also leaves the device and can consume paid API quota.
The command should therefore require explicit gateway allowlisting:
openclaw config set gateway.nodes.allowCommands '["tts.speak"]'
openclaw gateway restart
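The two-sided gate described here (a node setting plus a gateway allowlist) can be pictured with a small Python sketch. It only models the rule for tts.speak. The function name is hypothetical, and the sketch assumes the allowlist value is the JSON array shown in the config command above. How the gateway actually evaluates gateway.nodes.allowCommands is not specified in this issue.

```python
import json

TTS_COMMAND = "tts.speak"


def tts_speak_available(tts_enabled: bool, allow_commands_json: str) -> bool:
    """True only when the node setting and the gateway allowlist both permit tts.speak."""
    allow_commands = json.loads(allow_commands_json)
    if not isinstance(allow_commands, list):
        raise ValueError("allowCommands must be a JSON array")
    return tts_enabled and TTS_COMMAND in allow_commands
```

Under these assumptions, enabling TTS in settings without allowlisting the command, or allowlisting it while TTS is disabled, both leave tts.speak unavailable.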
Acceptance criteria
- TTS is disabled by default.
- When enabled with provider windows, invoking tts.speak speaks text through Windows speech synthesis.
- When provider elevenlabs is selected, missing API key/voice configuration fails clearly.
- ElevenLabs API key is protected at rest.
- Overly long requests are rejected before provider execution.
- interrupt: true interrupts active playback and the interrupted request does not report false success.
- tts.speak appears as an allowed node command only after settings and gateway policy allow it.
- Command Center classifies tts.speak as dangerous/privacy-sensitive.
- Mac parity diagnostics do not report tts.speak as missing until Mac implements it.
- Required build/tests pass:
./build.ps1
dotnet test ./tests/OpenClaw.Shared.Tests/OpenClaw.Shared.Tests.csproj --no-restore
dotnet test ./tests/OpenClaw.Tray.Tests/OpenClaw.Tray.Tests.csproj --no-restore
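The interrupt criterion above is the subtlest one, so here is a minimal Python sketch of the intended semantics: a request that gets cut off must come back with spoken set to false. Playback is simulated by waiting on an event, and PlaybackSlot and SpeakResult are hypothetical names. The choice to refuse a non-interrupting request while audio is playing is also an assumption, since this issue does not say whether such requests queue, wait, or fail.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeakResult:
    spoken: bool
    interrupted: bool


class PlaybackSlot:
    """Tracks at most one active playback; illustrative only, not the tray's player."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active_cancel: Optional[threading.Event] = None

    def speak(self, duration_s: float, interrupt: bool = False) -> SpeakResult:
        cancel = threading.Event()
        with self._lock:
            if self._active_cancel is not None:
                if not interrupt:
                    # Assumption: refuse rather than queue while audio is playing.
                    return SpeakResult(spoken=False, interrupted=False)
                self._active_cancel.set()  # signal the in-progress playback to stop
            self._active_cancel = cancel

        # Stand-in for real audio output: block until finished or cancelled.
        was_cancelled = cancel.wait(timeout=duration_s)

        with self._lock:
            # Only clear the slot if a newer request has not already replaced us.
            if self._active_cancel is cancel:
                self._active_cancel = None
        return SpeakResult(spoken=not was_cancelled, interrupted=was_cancelled)
```

The key design point is that success is derived from how playback actually ended, not from the fact that it started, so an interrupted request cannot report spoken as true.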
Manual validation notes
Validate manually with:
openclaw nodes invoke --node <windows-node-id> --command tts.speak --params '{"text":"hello from OpenClaw","provider":"windows"}'
Observe successful response:
{
  "ok": true,
  "command": "tts.speak",
  "payload": {
    "spoken": true,
    "provider": "windows",
    "contentType": "audio/wav",
    "durationMs": 2447
  }
}