diff --git a/recipes/cli/speech-to-text/v2/streaming/README.md b/recipes/cli/speech-to-text/v2/streaming/README.md
new file mode 100644
index 00000000..a5ee8ca0
--- /dev/null
+++ b/recipes/cli/speech-to-text/v2/streaming/README.md
@@ -0,0 +1,38 @@
+# Live Streaming Transcription (STT v2)
+
+WebSocket streaming with the v2 listen endpoint using flux-general-en.
+
+## What it does
+
+Streams raw audio over a WebSocket connection for real-time transcription using the v2 API. This example uses `ffmpeg` to convert a WAV file into raw linear16 audio and pipes it to `dg listen`. The CLI automatically routes to the v2 WebSocket endpoint when a `flux-*` model is specified. Unlike v1's interim/final result pairs, v2 uses contextual turn detection to deliver turn-based transcription results.
+
+## Key parameters
+
+| Parameter | Value | Description |
+|-----------|-------|-------------|
+| `--model` | `flux-general-en` | V2 English-optimized model with turn detection |
+| `--encoding` | `linear16` | Raw audio encoding format |
+
+## Example output
+
+```
+Yeah, as much as, it's funny when I think of anything that's related to outer space...
+```
+
+## Prerequisites
+
+- Deepgram CLI installed (`curl -fsSL https://deepgram.com/install.sh | sh`)
+- `ffmpeg` installed
+- `DEEPGRAM_API_KEY` environment variable set
+
+## Run
+
+```bash
+bash example.sh
+```
+
+## Test
+
+```bash
+bash example_test.sh
+```
diff --git a/recipes/cli/speech-to-text/v2/streaming/example.sh b/recipes/cli/speech-to-text/v2/streaming/example.sh
new file mode 100644
index 00000000..7fee0d39
--- /dev/null
+++ b/recipes/cli/speech-to-text/v2/streaming/example.sh
@@ -0,0 +1,15 @@
+#!/usr/bin/env bash
+# Recipe: Live Streaming Transcription (STT v2)
+#
+# Streams audio via ffmpeg to dg listen using the flux-general-en model.
+# The CLI automatically routes to the v2 WebSocket endpoint when a
+# flux-* model is specified. V2 provides turn-based transcription
+# with contextual turn detection.
+
+AUDIO_URL="https://dpgr.am/spacewalk.wav"
+AUDIO_FILE="/tmp/spacewalk.wav"
+
+curl -sL "$AUDIO_URL" -o "$AUDIO_FILE"
+
+ffmpeg -i "$AUDIO_FILE" -f s16le -ar 16000 -ac 1 -loglevel quiet - \
+  | dg listen --encoding linear16 --model flux-general-en
diff --git a/recipes/cli/speech-to-text/v2/streaming/example_test.sh b/recipes/cli/speech-to-text/v2/streaming/example_test.sh
new file mode 100644
index 00000000..6fb5dd45
--- /dev/null
+++ b/recipes/cli/speech-to-text/v2/streaming/example_test.sh
@@ -0,0 +1,22 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+set +e
+output=$(bash "$SCRIPT_DIR/example.sh" 2>&1)
+status=$?
+set -e
+
+if [ $status -ne 0 ]; then
+  echo "FAIL: example.sh exited with status $status"
+  echo "$output"
+  exit 1
+fi
+
+if [ -z "$output" ]; then
+  echo "FAIL: example.sh produced no output"
+  exit 1
+fi
+
+echo "PASS"
+echo "$output" | head -3
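
Review note on the status check in `example_test.sh`: `example.sh` does not enable `pipefail`, so the status the test inspects is that of the last command in the `ffmpeg ... | dg listen` pipeline. A failing `ffmpeg`, or an earlier failing `curl`, would not by itself make the script exit non-zero. The sketch below illustrates that bash behavior using stand-in commands (`false` and `cat` are placeholders, not part of the recipe). Whether the recipe should adopt `set -o pipefail` is a call for the author.

```bash
#!/usr/bin/env bash
# Stand-ins: `false` plays a failing upstream command (e.g. ffmpeg),
# `cat` plays a downstream consumer that still exits 0.

# Default behavior: the pipeline's status is that of its last command.
no_pipefail_status=$(bash -c 'false | cat; echo $?')

# With pipefail: the pipeline fails if any command in it fails.
with_pipefail_status=$(bash -c 'set -o pipefail; false | cat; echo $?')

echo "without pipefail: $no_pipefail_status"
echo "with pipefail:    $with_pipefail_status"
```

If the recipe did enable `pipefail`, the existing `status` check in the test would then also catch upstream failures in the pipeline, assuming the downstream command does not itself mask them.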