feat: Add Windows node text-to-speech command (tts.speak) #252

@RBrid

Summary

Add a focused text-to-speech capability for the Windows node via a new tts.speak command.

This should extract the useful text-to-speech slice from the broader voice work prototyped in #120, while deliberately avoiding microphone/STT, Talk Mode, wake word, overlay UI, and broader voice-mode state machine work.

Credit: #120 by @NichUK is the reference prototype/source of the initial Windows voice/TTS exploration.

Goals

  • Add a standalone Windows node command: tts.speak.
  • Support local Windows speech synthesis as the default provider.
  • Support ElevenLabs as an optional cloud TTS provider.
  • Expose settings to enable/disable TTS and configure provider/API key/voice/model.
  • Register the capability for gateway node mode and local MCP only when enabled.
  • Treat tts.speak as privacy-sensitive / dangerous for Command Center and gateway allowlist purposes.
  • Document setup, allowlist requirements, and manual testing flow.
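
As a rough illustration of the settings goals above, the sketch below models a settings object that is disabled by default and reports configuration problems before any provider runs. It is written in Python only to stay language-neutral; the field names (`elevenlabs_api_key`, `elevenlabs_voice_id`, `elevenlabs_model`) and the `config_errors` helper are hypothetical and not taken from the codebase. DPAPI protection of the stored key is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TtsSettings:
    # Hypothetical settings shape; real property names may differ.
    enabled: bool = False            # TTS is off unless the user opts in
    provider: str = "windows"        # local Windows synthesis is the default
    elevenlabs_api_key: Optional[str] = None   # would be DPAPI-protected at rest
    elevenlabs_voice_id: Optional[str] = None
    elevenlabs_model: Optional[str] = None


def config_errors(settings: TtsSettings) -> List[str]:
    """Return human-readable problems; an empty list means the config is usable."""
    errors: List[str] = []
    if not settings.enabled:
        errors.append("TTS is disabled in settings")
    if settings.provider == "elevenlabs":
        if not settings.elevenlabs_api_key:
            errors.append("ElevenLabs API key is not configured")
        if not settings.elevenlabs_voice_id:
            errors.append("ElevenLabs voice is not configured")
    elif settings.provider != "windows":
        errors.append(f"unsupported provider: {settings.provider!r}")
    return errors
```

Returning a list of problems instead of raising on the first one lets a Settings UI show every missing field at once.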

Proposed command shape

Command:

tts.speak

Arguments:

{
  "text": "hello from OpenClaw",
  "provider": "windows",
  "voiceId": null,
  "model": null,
  "interrupt": false
}

Response:

{
  "spoken": true,
  "provider": "windows",
  "contentType": "audio/wav",
  "durationMs": 2447
}
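
To make the proposed shape concrete, here is a minimal sketch of parsing the arguments above into a typed value. It assumes that `provider` falls back to `windows` when omitted (consistent with Windows being the default provider) and that only `windows` and `elevenlabs` are accepted. The names `TtsSpeakArgs` and `parse_tts_speak_args` are hypothetical, and Python is used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

SUPPORTED_PROVIDERS = ("windows", "elevenlabs")


@dataclass(frozen=True)
class TtsSpeakArgs:
    text: str
    provider: str = "windows"
    voice_id: Optional[str] = None
    model: Optional[str] = None
    interrupt: bool = False


def parse_tts_speak_args(raw: Dict[str, Any]) -> TtsSpeakArgs:
    """Validate a decoded JSON argument object and apply defaults."""
    text = raw.get("text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("tts.speak requires a non-empty 'text' string")

    # Assumption: a missing or null provider means the local Windows provider.
    provider = raw.get("provider") or "windows"
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")

    interrupt = raw.get("interrupt", False)
    if not isinstance(interrupt, bool):
        raise ValueError("'interrupt' must be a boolean")

    return TtsSpeakArgs(
        text=text,
        provider=provider,
        voice_id=raw.get("voiceId"),
        model=raw.get("model"),
        interrupt=interrupt,
    )
```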

Scope

In scope

  • Shared, platform-neutral TtsCapability.
  • Tray-side Windows speech synthesis playback.
  • Optional ElevenLabs synthesis client.
  • Settings persistence and Settings UI.
  • DPAPI protection for stored ElevenLabs API keys.
  • Gateway/node registration and local MCP description.
  • Command Center dangerous-command classification.
  • Gateway allowlist documentation.
  • Text length guardrails and HTTP timeout for cloud TTS.
  • Tests for capability behavior, settings, provider guardrails, Command Center grouping, and docs-adjacent command descriptions.
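
The text length guardrail and cloud timeout could look roughly like the sketch below. The limit of 4000 characters and the 30 second timeout are placeholder values chosen for illustration, not numbers from this issue. The `synthesize` callable stands in for whichever provider is selected and is a hypothetical interface.

```python
from typing import Callable

MAX_TEXT_CHARS = 4000          # placeholder limit, not specified in this issue
CLOUD_TIMEOUT_SECONDS = 30.0   # placeholder timeout, not specified in this issue


class TtsGuardrailError(ValueError):
    """Raised when a request is rejected before reaching a provider."""


def speak_with_guardrails(
    text: str,
    synthesize: Callable[[str, float], bytes],
    max_chars: int = MAX_TEXT_CHARS,
    timeout_seconds: float = CLOUD_TIMEOUT_SECONDS,
) -> bytes:
    # Reject oversized input first so no provider call (and no paid quota)
    # is spent on a request that will be refused anyway.
    if len(text) > max_chars:
        raise TtsGuardrailError(
            f"text is {len(text)} characters; the limit is {max_chars}"
        )
    # The provider receives the timeout so a stalled HTTP call cannot hang
    # the command indefinitely.
    return synthesize(text, timeout_seconds)
```

Checking the length ahead of the provider call is what makes the "rejected before provider execution" behavior testable with a fake provider.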

Out of scope

  • Speech-to-text / microphone capture.
  • Talk Mode loop.
  • Wake word.
  • Voice overlay or repeater UI.
  • MiniMax provider.
  • Streaming playback optimizations.
  • Shared gateway voice-mode contract changes.

Security and privacy notes

tts.speak should be treated as privacy-sensitive because a remote caller can cause audio output on the user's machine. When ElevenLabs is selected, the request text also leaves the device and can consume paid API quota.

The command should therefore require explicit gateway allowlisting:

openclaw config set gateway.nodes.allowCommands '["tts.speak"]'
openclaw gateway restart
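
The two-sided gating this implies (the local setting and the gateway allowlist must both permit the command) can be sketched as below. The function name is hypothetical, and the sketch assumes the allowlist value is the JSON array passed to `gateway.nodes.allowCommands` as in the command above. How the gateway actually stores and evaluates that value is not covered here.

```python
import json


def tts_speak_is_invocable(tts_enabled: bool, allow_commands_json: str) -> bool:
    """True only when the node setting and the gateway allowlist both allow it."""
    allow = json.loads(allow_commands_json)
    if not isinstance(allow, list):
        # Anything other than a JSON array is treated as "nothing allowed".
        return False
    return tts_enabled and "tts.speak" in allow
```

For example, `tts_speak_is_invocable(True, '["tts.speak"]')` is true, while disabling TTS in settings or omitting the entry from the allowlist makes it false.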

Acceptance criteria

  • TTS is disabled by default.
  • When enabled with provider windows, invoking tts.speak speaks text through Windows speech synthesis.
  • When provider elevenlabs is selected, invoking tts.speak with a missing API key or voice configuration fails with a clear error.
  • ElevenLabs API key is protected at rest.
  • Overly long requests are rejected before provider execution.
  • interrupt: true stops any active playback, and the interrupted request does not report success.
  • tts.speak appears as an allowed node command only after settings and gateway policy allow it.
  • Command Center classifies tts.speak as dangerous/privacy-sensitive.
  • Mac parity diagnostics do not report tts.speak as missing until Mac implements it.
  • Required build/tests pass:
    • ./build.ps1
    • dotnet test ./tests/OpenClaw.Shared.Tests/OpenClaw.Shared.Tests.csproj --no-restore
    • dotnet test ./tests/OpenClaw.Tray.Tests/OpenClaw.Tray.Tests.csproj --no-restore
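
The interrupt criterion is the subtlest one, so here is a small sketch of one way to keep an interrupted request from reporting success. It tracks a single active playback session with a cancellation flag. The class names are hypothetical, and rejecting a second request when `interrupt` is false is an assumption of this sketch; the issue does not say whether such requests should queue, fail, or overlap.

```python
import threading
from typing import Dict, Optional


class PlaybackSession:
    def __init__(self) -> None:
        self.cancelled = threading.Event()


class Player:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: Optional[PlaybackSession] = None

    def begin(self, interrupt: bool) -> PlaybackSession:
        """Start a session, cancelling the active one only when interrupt is set."""
        with self._lock:
            if self._active is not None:
                if not interrupt:
                    # Assumption: without interrupt, a concurrent request fails.
                    raise RuntimeError("playback already active")
                self._active.cancelled.set()
            session = PlaybackSession()
            self._active = session
            return session

    def finish(self, session: PlaybackSession) -> Dict[str, bool]:
        """Build the result; a cancelled session never reports spoken=True."""
        with self._lock:
            if self._active is session:
                self._active = None
        return {"spoken": not session.cancelled.is_set()}
```

In this model the first request's result carries `spoken: false` once a later `interrupt: true` request takes over, while the interrupting request reports `spoken: true` when it completes.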

Manual validation notes

Validate manually with:

openclaw nodes invoke --node <windows-node-id> --command tts.speak --params '{"text":"hello from OpenClaw","provider":"windows"}'

Observe a successful response:

{
  "ok": true,
  "command": "tts.speak",
  "payload": {
    "spoken": true,
    "provider": "windows",
    "contentType": "audio/wav",
    "durationMs": 2447
  }
}
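
If the manual check is repeated often, the response can be verified with a short script. The sketch below assumes the envelope shown above (`ok`, `command`, `payload`) and checks only fields that appear in it. The helper name is hypothetical.

```python
import json
from typing import List


def check_tts_invoke_response(raw: str, expected_provider: str) -> List[str]:
    """Return a list of problems found in an invoke response; empty means OK."""
    problems: List[str] = []
    resp = json.loads(raw)
    if not isinstance(resp, dict):
        return ["response is not a JSON object"]
    if resp.get("ok") is not True:
        problems.append("ok is not true")
    if resp.get("command") != "tts.speak":
        problems.append("command is not tts.speak")

    payload = resp.get("payload")
    if not isinstance(payload, dict):
        problems.append("payload is missing or not an object")
        return problems
    if payload.get("spoken") is not True:
        problems.append("payload.spoken is not true")
    if payload.get("provider") != expected_provider:
        problems.append("payload.provider does not match the requested provider")
    duration = payload.get("durationMs")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, (int, float)) or duration <= 0:
        problems.append("payload.durationMs is not a positive number")
    return problems
```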

Labels: enhancement (New feature or request)