openab can automatically transcribe Discord voice message attachments and forward the transcript to your ACP agent as text.
Add an `[stt]` section to your `config.toml`:

```toml
[stt]
enabled = true
```

If `GROQ_API_KEY` is set in your environment, that's all you need: openab will auto-detect it and use Groq's free tier. You can also set the key explicitly:

```toml
[stt]
enabled = true
api_key = "${GROQ_API_KEY}"
```

The processing pipeline looks like this:

```
Discord voice message (.ogg)
        │
        ▼
openab downloads the audio file
        │
        ▼
POST /audio/transcriptions → STT provider
        │
        ▼
transcript injected as:
"[Voice message transcript]: <transcribed text>"
        │
        ▼
ACP agent receives plain text
```

The transcript is prepended to the prompt as a ContentBlock::Text, so the downstream agent (Kiro CLI, Claude Code, etc.) sees it as regular text input.
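
The injection step can be pictured with a small sketch. It is written in Python purely for illustration and is not openab's own code: the `inject_transcript` helper and the dictionary shape standing in for a text content block are assumptions, and only the `[Voice message transcript]: ` prefix comes from the description above.

```python
from typing import Dict, List

# Prefix taken from the pipeline description; everything else is illustrative.
TRANSCRIPT_PREFIX = "[Voice message transcript]: "


def inject_transcript(
    transcript: str, prompt_blocks: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Return a new prompt with the transcript as the first text block.

    Hypothetical helper: the dict layout stands in for a text content block.
    """
    voice_block = {"type": "text", "text": TRANSCRIPT_PREFIX + transcript.strip()}
    # Prepend, leaving the caller's list untouched.
    return [voice_block] + list(prompt_blocks)


prompt = inject_transcript(
    "deploy the staging branch", [{"type": "text", "text": "@bot"}]
)
print(prompt[0]["text"])
```

Because the transcript arrives as ordinary text, the downstream agent needs no audio support of its own.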
All `[stt]` fields, with their defaults:

```toml
[stt]
enabled = true                                # default: false
api_key = "${GROQ_API_KEY}"                   # required for cloud providers
model = "whisper-large-v3-turbo"              # default
base_url = "https://api.groq.com/openai/v1"   # default
```

| Field | Required | Default | Description |
|---|---|---|---|
| `enabled` | no | `false` | Enable/disable STT. When disabled, audio attachments are silently skipped. |
| `api_key` | no* | (none) | API key for the STT provider. *Auto-detected from the `GROQ_API_KEY` env var if not set. For local servers, use any non-empty string (e.g. `"not-needed"`). |
| `model` | no | `whisper-large-v3-turbo` | Whisper model name. Varies by provider. |
| `base_url` | no | `https://api.groq.com/openai/v1` | OpenAI-compatible API base URL. |

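
As a rough illustration of how these defaults combine, the sketch below models the `[stt]` table as a Python dataclass. The `SttConfig` and `load_stt_config` names are hypothetical and do not come from openab; only the field names and default values are taken from the table.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SttConfig:
    """Mirror of the [stt] fields, with the documented defaults."""

    enabled: bool = False
    api_key: Optional[str] = None
    model: str = "whisper-large-v3-turbo"
    base_url: str = "https://api.groq.com/openai/v1"


def load_stt_config(section: Optional[Dict[str, Any]]) -> SttConfig:
    """Build a config from a parsed [stt] table; a missing section means disabled."""
    return SttConfig(**(section or {}))


# A local-server setup only overrides what differs from the defaults.
local = load_stt_config(
    {"enabled": True, "api_key": "not-needed", "base_url": "http://localhost:8080/v1"}
)
print(local.model)
```

Any field left out of the section falls back to its default, so a minimal config only needs `enabled = true`.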
openab uses the standard OpenAI-compatible `/audio/transcriptions` endpoint. Any provider that implements this API works; just change `base_url`.
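
To make the endpoint contract concrete, here is a minimal Python sketch of the kind of request an OpenAI-compatible transcription API expects: a multipart form with `file`, `model`, and `response_format` fields plus a bearer token. The helper names `build_transcription_request` and `parse_transcript` are assumptions for illustration; this is not openab's implementation, and the request is only built, never sent.

```python
import json
import urllib.request
import uuid


def build_transcription_request(base_url, api_key, model, audio, filename, mime_type):
    """Build (but do not send) a multipart POST to {base_url}/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields: the model name and the requested response format.
    for name, value in (("model", model), ("response_format", "json")):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # The audio file part carries the attachment's own MIME type.
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: {mime_type}\r\n\r\n'
        ).encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def parse_transcript(response_body: bytes) -> str:
    """Extract the transcript from a JSON response of the form {"text": "..."}."""
    return json.loads(response_body)["text"]


req = build_transcription_request(
    "http://localhost:8080/v1", "not-needed", "large-v3-turbo",
    b"OggS", "voice.ogg", "audio/ogg",
)
print(req.full_url)
```

Switching providers only changes the arguments (`base_url`, `api_key`, `model`); the shape of the request stays the same.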
Groq (default):

```toml
[stt]
enabled = true
api_key = "${GROQ_API_KEY}"
```

- Free tier with rate limits
- Model: `whisper-large-v3-turbo` (default)
- Sign up at https://console.groq.com

OpenAI:

```toml
[stt]
enabled = true
api_key = "${OPENAI_API_KEY}"
model = "whisper-1"
base_url = "https://api.openai.com/v1"
```

- ~$0.006 per minute of audio
- Model: `whisper-1`

For users running openab on a Mac Mini, home lab, or any machine with a local whisper server:

```toml
[stt]
enabled = true
api_key = "not-needed"
model = "large-v3-turbo"
base_url = "http://localhost:8080/v1"
```

- Audio stays local and never leaves your machine
- No real API key or cloud account needed (`api_key` only has to be a non-empty placeholder)
- Apple Silicon users get hardware acceleration

Compatible local whisper servers:

| Server | Install | Apple Silicon |
|---|---|---|
| faster-whisper-server | `pip install faster-whisper-server` | ✅ CoreML |
| whisper.cpp server | `brew install whisper-cpp` | ✅ Metal |
| LocalAI | Docker or binary | ✅ |

Point to a whisper server running on another machine in your network:

```toml
[stt]
enabled = true
api_key = "not-needed"
base_url = "http://192.168.1.100:8080/v1"
```

Not supported:

- Ollama: does not expose an `/audio/transcriptions` endpoint.

When deploying via the openab Helm chart, STT is a first-class config block — no manual configmap patching needed:

```bash
helm upgrade openab openab/openab \
  --set agents.kiro.stt.enabled=true \
  --set agents.kiro.stt.apiKey=gsk_xxx
```

The API key is stored in a Kubernetes Secret and injected as an env var (never in plaintext in the configmap). You can also customize the model and endpoint:

```bash
helm upgrade openab openab/openab \
  --set agents.kiro.stt.enabled=true \
  --set agents.kiro.stt.apiKey=gsk_xxx \
  --set agents.kiro.stt.model=whisper-large-v3-turbo \
  --set agents.kiro.stt.baseUrl=https://api.groq.com/openai/v1
```

To disable STT, omit the `[stt]` section entirely, or set:

```toml
[stt]
enabled = false
```

When disabled, audio attachments are silently skipped with no impact on existing functionality.

- openab sends `response_format=json` in the transcription request to ensure the response is always parseable JSON. Some local whisper servers default to plain text output without this parameter.
- The actual MIME type from the Discord attachment is passed through to the STT API (e.g. `audio/ogg`, `audio/mp4`, `audio/wav`).
- Environment variables in config values are expanded via `${VAR}` syntax (e.g. `api_key = "${GROQ_API_KEY}"`).
- The `api_key` field is auto-detected from the `GROQ_API_KEY` environment variable when using the default Groq endpoint. If you set a custom `base_url` (e.g. a local server), auto-detect is disabled to avoid leaking the Groq key to unrelated endpoints, so you must set `api_key` explicitly.
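
The last two notes can be read as a small decision rule, sketched below in Python. This is an illustration under stated assumptions rather than openab's actual logic: `expand_env` and `resolve_api_key` are hypothetical names, and treating an unset `${VAR}` as an empty string is an assumption the notes above do not confirm.

```python
import os
import re
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://api.groq.com/openai/v1"
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Mapping[str, str]) -> str:
    """Replace each ${VAR} with its value; unset variables become "" (an assumption)."""
    return _VAR.sub(lambda m: env.get(m.group(1), ""), value)


def resolve_api_key(
    api_key: Optional[str], base_url: str, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Pick the key to send, following the auto-detect rule described above."""
    if api_key:
        # An explicit key always wins, after ${VAR} expansion.
        return expand_env(api_key, env)
    if base_url == DEFAULT_BASE_URL:
        # Auto-detect applies only to the default Groq endpoint.
        return env.get("GROQ_API_KEY")
    # Custom endpoint and no explicit key: never fall back to the Groq key.
    return None


demo_env = {"GROQ_API_KEY": "gsk_example"}
print(resolve_api_key(None, DEFAULT_BASE_URL, demo_env))
print(resolve_api_key(None, "http://localhost:8080/v1", demo_env))
```

The final branch is the safety property: with a custom `base_url` and no explicit key, the Groq key is never sent to the other endpoint.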