A Rust client for ESP32-S3 that connects to a Streamcore Voice Agent server over WebRTC using WHIP signaling.
Uses the esp_peer C library from Espressif's esp-webrtc-solution for the on-device WebRTC stack (ICE, DTLS, SRTP, SCTP).
Device-specific implementation: This code targets the ESP32-S3 WROOM-1 module. The GPIO pin assignments below are for this specific hardware only. You must change the pins in src/main.rs to match your board.
Tested target: ESP32-S3 WROOM-1 with 8MB PSRAM (N8R8 variant)
Audio (I2S simplex mode - separate TX/RX ports):
| Function | I2S Port | GPIO | Purpose |
|---|---|---|---|
| Speaker BCLK | I2S0 TX | GPIO 46 | MAX98357A BCLK |
| Speaker DOUT | I2S0 TX | GPIO 3 | MAX98357A DIN |
| Speaker WS | I2S0 TX | GPIO 1 | MAX98357A LRC |
| Mic BCLK | I2S1 RX | GPIO 41 | INMP441 SCK |
| Mic DIN | I2S1 RX | GPIO 2 | INMP441 SD |
| Mic WS | I2S1 RX | GPIO 42 | INMP441 WS |
Important: ESP32-S3 (I2S_HW_VERSION_2) requires left_align=true for I2S. The HAL's default config sets this incorrectly; this is handled in src/main.rs.
Display (ST7789 240x280 TFT):
| Function | GPIO | Purpose |
|---|---|---|
| MOSI | GPIO 47 | SPI data |
| SCLK | GPIO 21 | SPI clock |
| CS | GPIO 14 | Chip select |
| DC | GPIO 45 | Data/command |
| BL | GPIO 48 | Backlight |
Controls:
| Function | GPIO | Purpose |
|---|---|---|
| Boot button | GPIO 0 | Push-to-talk (hold to unmute) |
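The push-to-talk behaviour in the table above can be sketched as a small state machine. This is a minimal sketch, not the code in src/main.rs: it assumes the boot button is active-low (pressed pulls GPIO 0 to ground), and it adds a simple debounce that the README does not mention. `Level` and `PushToTalk` are hypothetical names used only for illustration.

```rust
/// Logic level sampled from the button pin.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Level {
    Low,
    High,
}

/// Debounced push-to-talk state (hypothetical helper).
/// Assumption: the button is active-low, so `Low` means "held".
pub struct PushToTalk {
    stable: Level,  // last debounced level
    differing: u8,  // consecutive samples that disagree with `stable`
    threshold: u8,  // samples required before accepting a change
}

impl PushToTalk {
    /// Starts in the released (muted) state.
    pub fn new(threshold: u8) -> Self {
        Self { stable: Level::High, differing: 0, threshold }
    }

    /// Feeds one pin sample and returns whether the mic is muted.
    pub fn update(&mut self, sample: Level) -> bool {
        if sample == self.stable {
            self.differing = 0;
        } else {
            self.differing += 1;
            if self.differing >= self.threshold {
                self.stable = sample;
                self.differing = 0;
            }
        }
        self.muted()
    }

    /// Muted whenever the button is not held.
    pub fn muted(&self) -> bool {
        self.stable == Level::High
    }
}
```

Calling `update` once per audio frame with the current pin level would give a mute flag that only changes after `threshold` consistent samples.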
Edit src/main.rs to change:
- Display SPI pins (lines 91-96)
- Speaker I2S pins (lines 120-127)
- Mic I2S pins (lines 141-148)
- Boot button GPIO (line 154)
This client connects to the Streamcore Voice Agent server. Set it up first:
👉 streamcoreai/streamcore-server — follow the README there to get the server running.
Once the server is up, set WHIP_ENDPOINT below to point at it (e.g. http://<server-ip>:8080/whip).
- Rust ESP toolchain, installed via espup:
      cargo install espup
      espup install
      # Source the export file (added to your shell profile)
      . $HOME/export-esp.sh
- ESP-IDF v5.4: pulled automatically by esp-idf-sys during the first build into .embuild/. This directory is auto-generated and should not be committed.
  Note: The first build downloads ~2GB of ESP-IDF tooling. Subsequent builds reuse the cached .embuild/ directory.
- ldproxy and espflash:
      cargo install ldproxy espflash
- esp-webrtc-solution: included as a git submodule. After cloning, init it:
      git submodule update --init
Set these environment variables before building (or edit the constants in src/main.rs):
export WIFI_SSID="your-wifi-ssid"
export WIFI_PASSWORD="your-wifi-password"
export WHIP_ENDPOINT="http://192.168.1.100:8080/whip"
STUN_SERVER is currently hardcoded to empty in src/main.rs. Edit the constant directly if you need STUN.
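One way build-time environment variables can feed constants like these is `option_env!` with a fallback. This is a sketch under assumptions: the README does not say which mechanism src/main.rs uses, `env_or` is a hypothetical helper, and the fallback values are placeholders.

```rust
/// Picks the build-time value if present, otherwise the default.
/// Hypothetical helper; usable in `const` context.
const fn env_or(value: Option<&'static str>, default: &'static str) -> &'static str {
    match value {
        Some(v) => v,
        None => default,
    }
}

// `option_env!` is evaluated when the crate is compiled, so these
// values are baked into the firmware image at build time.
pub const WIFI_SSID: &str = env_or(option_env!("WIFI_SSID"), "");
pub const WIFI_PASSWORD: &str = env_or(option_env!("WIFI_PASSWORD"), "");
pub const WHIP_ENDPOINT: &str = env_or(
    option_env!("WHIP_ENDPOINT"),
    "http://192.168.1.100:8080/whip", // placeholder address from the example above
);

// Hardcoded to empty, as the README describes; edit directly to enable STUN.
pub const STUN_SERVER: &str = "";
```

Because the values are read at compile time, changing a variable requires a rebuild before flashing.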
# Build
cargo build --release
# Flash and monitor (connect your ESP32-S3 via USB)
espflash flash target/xtensa-esp32s3-espidf/release/voiceagent-esp32 --monitor
Or in one step:
cargo run --release
(requires uncommenting the runner line in .cargo/config.toml)
src/
├── main.rs # Entry point: WiFi → WHIP → WebRTC → audio loop
├── afe_pipeline.rs # ESP-SR AFE wrapper (AGC + noise suppression)
├── audio.rs # I2S audio driver (speaker TX + mic RX, simplex mode)
├── audio_processing.rs # Half-duplex tracking + fallback software gain
├── display.rs # ST7789 TFT display driver (240x280) + UI rendering
├── esp_peer_ffi.rs # Raw FFI bindings to the esp_peer C API
├── opus_codec.rs # Opus encoder/decoder wrapper
├── webrtc.rs # Safe Rust wrapper around esp_peer C library
├── whip.rs # WHIP signaling (HTTP POST/DELETE) via ESP-IDF HTTP client
└── wifi.rs # WiFi STA connection helper
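The tree above describes whip.rs as handling WHIP signaling with HTTP POST and DELETE. The shape of that exchange can be sketched without any HTTP client: the offer is POSTed with `Content-Type: application/sdp`, a successful reply is `201 Created` with the SDP answer in the body and a `Location` header naming the session resource, and a DELETE on that resource ends the session. All type and function names below are hypothetical. The actual whip.rs uses the ESP-IDF HTTP client, which is not shown here, and resolving a relative `Location` against the endpoint is left out.

```rust
/// Transport-agnostic description of one HTTP request (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct HttpRequest {
    pub method: &'static str,
    pub url: String,
    pub headers: Vec<(&'static str, String)>,
    pub body: String,
}

/// Builds the WHIP offer request: POST the SDP offer as application/sdp.
pub fn whip_offer(endpoint: &str, sdp_offer: &str) -> HttpRequest {
    HttpRequest {
        method: "POST",
        url: endpoint.to_string(),
        headers: vec![("Content-Type", "application/sdp".to_string())],
        body: sdp_offer.to_string(),
    }
}

/// Builds the teardown request: DELETE on the session resource.
pub fn whip_teardown(session_url: &str) -> HttpRequest {
    HttpRequest {
        method: "DELETE",
        url: session_url.to_string(),
        headers: Vec::new(),
        body: String::new(),
    }
}

/// Result of a successful offer/answer exchange.
#[derive(Debug, PartialEq)]
pub struct WhipSession {
    pub session_url: String,
    pub sdp_answer: String,
}

#[derive(Debug, PartialEq)]
pub enum WhipError {
    UnexpectedStatus(u16),
    MissingLocation,
}

/// Interprets the server reply to the offer.
pub fn whip_answer(
    status: u16,
    location: Option<&str>,
    body: &str,
) -> Result<WhipSession, WhipError> {
    if status != 201 {
        return Err(WhipError::UnexpectedStatus(status));
    }
    let session_url = location.ok_or(WhipError::MissingLocation)?;
    Ok(WhipSession {
        session_url: session_url.to_string(),
        sdp_answer: body.to_string(),
    })
}
```

Keeping request construction separate from the transport makes the signaling logic testable on a host machine, independent of the device HTTP stack.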
- Boot → connect to WiFi → start display thread (ST7789 @ 10 FPS)
- Open esp_peer WebRTC connection (Opus audio + "events" data channel)
- Create SDP offer → POST to WHIP endpoint → receive SDP answer
- Set remote description → ICE/DTLS handshake completes
- Audio loop:
  - Mic path: I2S RX 16kHz mono → 32-to-16-bit conversion → AFE (AGC + noise suppression) → Opus encode @ 16kHz → send to server
  - Speaker path: receive Opus → decode @ 16kHz → upsample 16→24kHz (linear interpolation) → I2S TX playback
- Controls: GPIO0 boot button = push-to-talk (hold to unmute)
- Display: Shows connection status, VU meter, speaking indicator, last user/AI transcript
- Data channel: delivers transcript/response/error JSON (same format as other SDKs)
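Two steps of the audio loop above are plain sample arithmetic and can be sketched on their own: the 32-to-16-bit conversion on the mic path and the 16→24kHz linear-interpolation upsample on the speaker path. This is a sketch, not the code in audio.rs. It assumes the microphone data sits in the upper bits of each 32-bit slot, so a 16-bit arithmetic shift keeps the most significant bits, and it holds the last sample at the end of each buffer instead of carrying state between buffers. The function names are hypothetical.

```rust
/// Converts 32-bit I2S slots to 16-bit PCM by keeping the top 16 bits.
/// Assumption: the sample is left-aligned within the 32-bit slot.
pub fn i32_to_i16(raw: &[i32]) -> Vec<i16> {
    raw.iter().map(|&s| (s >> 16) as i16).collect()
}

/// Upsamples 16 kHz PCM to 24 kHz (ratio 3:2) by linear interpolation.
/// Every 2 input samples produce 3 output samples.
pub fn upsample_16k_to_24k(input: &[i16]) -> Vec<i16> {
    let out_len = input.len() * 3 / 2;
    let mut out = Vec::with_capacity(out_len);
    for j in 0..out_len {
        // Source position of output sample j, measured in thirds of an input sample.
        let pos = j * 2;
        let idx = pos / 3;
        let frac = (pos % 3) as i32; // 0, 1 or 2 thirds past input[idx]
        let a = input[idx] as i32;
        // Hold the last sample when the interpolation window runs off the buffer.
        let b = input[(idx + 1).min(input.len() - 1)] as i32;
        out.push((a + (b - a) * frac / 3) as i16);
    }
    out
}
```

For example, the input `[0, 300, 600, 900]` yields `[0, 200, 400, 600, 800, 900]`. Integer arithmetic keeps the per-sample cost low, and the result always lies between the two neighbouring input samples, so it cannot overflow `i16`.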
The server sends JSON messages on the events data channel:
{"type": "transcript", "text": "hello", "final": true}
{"type": "response", "text": "Hi there! How can I help?"}
{"type": "error", "message": "something went wrong"}
To target a plain ESP32 instead of the ESP32-S3:
- sdkconfig.defaults: set CONFIG_IDF_TARGET="esp32", remove CONFIG_SPIRAM_MODE_OCT and CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240
- .cargo/config.toml: set target = "xtensa-esp32-espidf" and MCU = "esp32"
- src/main.rs: change all GPIO pin assignments to match your board (the current pins are S3-specific and many don't exist on plain ESP32)
Note: The ESP-SR AFE pipeline is memory-hungry. Plain ESP32 with limited PSRAM may not have enough RAM to run it — the fallback software-gain path will be used instead.
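The fallback software-gain path mentioned in the note could look roughly like the following. This is a minimal sketch, assuming a fixed gain factor applied per sample with saturation. The README does not describe how audio_processing.rs chooses or adapts the gain, and `apply_gain` is a hypothetical name.

```rust
/// Multiplies each 16-bit sample by `gain`, saturating at the i16 range
/// so loud input clips instead of wrapping around.
pub fn apply_gain(samples: &mut [i16], gain: f32) {
    for s in samples.iter_mut() {
        let scaled = (*s as f32 * gain).round();
        *s = scaled.clamp(i16::MIN as f32, i16::MAX as f32) as i16;
    }
}
```

Unlike the AFE, a fixed gain does no noise suppression and amplifies background noise along with speech, which is why it only serves as a fallback.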
The existing rust-sdk/ uses the webrtc crate (Pion-based, requires tokio + full OS networking), reqwest, and audiopus — none of which can run on ESP32's FreeRTOS-based ESP-IDF environment. This project uses Espressif's native esp_peer C library instead, called from Rust via FFI.
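For reference, the three message shapes sent on the events data channel (transcript, response, error) map naturally onto a Rust enum. The sketch below only covers the classification step and assumes the JSON text has already been decoded into its fields by whatever JSON library the project uses. `AgentEvent`, `RawEvent` and `classify` are hypothetical names, and treating a missing `final` field as `false` is an assumption.

```rust
/// One event from the "events" data channel.
#[derive(Debug, Clone, PartialEq)]
pub enum AgentEvent {
    Transcript { text: String, is_final: bool },
    Response { text: String },
    Error { message: String },
}

/// Fields of one decoded JSON message (hypothetical intermediate type).
/// `kind` holds the value of the JSON "type" field.
#[derive(Debug, Default)]
pub struct RawEvent<'a> {
    pub kind: &'a str,
    pub text: Option<&'a str>,
    pub message: Option<&'a str>,
    pub is_final: Option<bool>,
}

/// Maps decoded fields to a typed event.
/// Returns `None` for unknown types or missing required fields.
pub fn classify(raw: &RawEvent) -> Option<AgentEvent> {
    match raw.kind {
        "transcript" => Some(AgentEvent::Transcript {
            text: raw.text?.to_string(),
            // Assumption: an absent "final" flag means a partial transcript.
            is_final: raw.is_final.unwrap_or(false),
        }),
        "response" => Some(AgentEvent::Response { text: raw.text?.to_string() }),
        "error" => Some(AgentEvent::Error { message: raw.message?.to_string() }),
        _ => None,
    }
}
```

Ignoring unknown types rather than failing keeps the device tolerant of message kinds a newer server might add.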