ESP32 Voice Agent Client

A Rust client for ESP32-S3 that connects to a Streamcore Voice Agent server over WebRTC using WHIP signaling.

Uses the esp_peer C library from Espressif's esp-webrtc-solution for the on-device WebRTC stack (ICE, DTLS, SRTP, SCTP).

Hardware

Device-specific implementation: This code targets the ESP32-S3 WROOM-1 module. The GPIO pin assignments below are for this specific hardware only. You must change the pins in src/main.rs to match your board.

Tested target: ESP32-S3 WROOM-1 with 8MB PSRAM (N8R8 variant)

Audio (I2S simplex mode - separate TX/RX ports):

| Function | I2S Port | GPIO | Purpose |
|---|---|---|---|
| Speaker BCLK | I2S0 TX | GPIO 46 | MAX98357A BCLK |
| Speaker DOUT | I2S0 TX | GPIO 3 | MAX98357A DIN |
| Speaker WS | I2S0 TX | GPIO 1 | MAX98357A LRC |
| Mic BCLK | I2S1 RX | GPIO 41 | INMP441 SCK |
| Mic DIN | I2S1 RX | GPIO 2 | INMP441 SD |
| Mic WS | I2S1 RX | GPIO 42 | INMP441 WS |

Important: ESP32-S3 (I2S_HW_VERSION_2) requires left_align=true for I2S. The HAL's default config sets this incorrectly — this is handled in src/main.rs.

Display (ST7789 240x280 TFT):

| Function | GPIO | Purpose |
|---|---|---|
| MOSI | GPIO 47 | SPI data |
| SCLK | GPIO 21 | SPI clock |
| CS | GPIO 14 | Chip select |
| DC | GPIO 45 | Data/command |
| BL | GPIO 48 | Backlight |

Controls:

| Function | GPIO | Purpose |
|---|---|---|
| Boot button | GPIO 0 | Push-to-talk (hold to unmute) |
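
The push-to-talk behaviour can be sketched as a small debounced state machine. This is an illustration only, not the code in src/main.rs: the `PushToTalk` type and its `threshold` parameter are hypothetical, and it assumes the boot button is wired active-low (the pin reads low while pressed), as is usual for GPIO 0 on ESP32 dev boards.

```rust
/// Debounced push-to-talk state (hypothetical helper, not from src/main.rs).
/// `threshold` is how many consecutive differing readings are needed
/// before the stable button state flips.
struct PushToTalk {
    pressed: bool,
    run: u8,
    threshold: u8,
}

impl PushToTalk {
    fn new(threshold: u8) -> Self {
        Self { pressed: false, run: 0, threshold }
    }

    /// Feed one GPIO reading (`true` = pin high).
    /// Returns `true` while the microphone should stay muted.
    fn update(&mut self, level_high: bool) -> bool {
        let reading_pressed = !level_high; // assumption: active-low button
        if reading_pressed == self.pressed {
            // Reading agrees with the stable state: discard any pending change.
            self.run = 0;
        } else {
            self.run += 1;
            if self.run >= self.threshold {
                self.pressed = reading_pressed;
                self.run = 0;
            }
        }
        !self.pressed
    }
}
```

Calling `update` once per audio frame with the current pin level is enough; a single glitchy reading shorter than `threshold` frames does not toggle the mute state.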

Using a different ESP32 board?

Edit src/main.rs to change:

Display SPI pins (lines 91-96)
Speaker I2S pins (lines 120-127)
Mic I2S pins (lines 141-148)
Boot button GPIO (line 154)

Server

This client connects to the Streamcore Voice Agent server. Set it up first:

👉 streamcoreai/streamcore-server — follow the README there to get the server running.

Once the server is up, set WHIP_ENDPOINT below to point at it (e.g. http://<server-ip>:8080/whip).

Prerequisites

Rust ESP toolchain — install via espup:

cargo install espup
espup install
# Source the export file in every new shell (or add this line to your shell profile)
. $HOME/export-esp.sh

ESP-IDF v5.4 — pulled automatically by esp-idf-sys during the first build into .embuild/. This directory is auto-generated and should not be committed.

Note: The first build downloads ~2GB of ESP-IDF tooling. Subsequent builds reuse the cached .embuild/ directory.

ldproxy and espflash:
```
cargo install ldproxy espflash
```
esp-webrtc-solution — included as a git submodule. After cloning, init it:
```
git submodule update --init
```

Configuration

Set these environment variables before building (or edit the constants in src/main.rs):

export WIFI_SSID="your-wifi-ssid"
export WIFI_PASSWORD="your-wifi-password"
export WHIP_ENDPOINT="http://192.168.1.100:8080/whip"

STUN_SERVER is currently hardcoded to empty in src/main.rs. Edit the constant directly if you need STUN.
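
Because the variables are read when the firmware is built, changing them means rebuilding and reflashing. One way such build-time configuration could be wired up in Rust is with `option_env!`. The sketch below is an assumption about the pattern, not a copy of src/main.rs; the `resolve` helper and the fallback values are hypothetical.

```rust
/// Use the build-time value when it is set and non-empty, else the fallback.
/// `const fn` so it can initialise constants. Hypothetical helper.
const fn resolve(env: Option<&'static str>, fallback: &'static str) -> &'static str {
    match env {
        Some(v) => {
            if v.is_empty() {
                fallback
            } else {
                v
            }
        }
        None => fallback,
    }
}

// `option_env!` is evaluated by the compiler, so these values are baked
// into the binary at build time.
pub const WIFI_SSID: &str = resolve(option_env!("WIFI_SSID"), "");
pub const WIFI_PASSWORD: &str = resolve(option_env!("WIFI_PASSWORD"), "");
pub const WHIP_ENDPOINT: &str = resolve(
    option_env!("WHIP_ENDPOINT"),
    "http://192.168.1.100:8080/whip", // example address from this README
);

// Not read from the environment; edit directly if STUN is needed.
pub const STUN_SERVER: &str = "";
```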

Build & Flash

# Build
cargo build --release

# Flash and monitor (connect your ESP32-S3 via USB)
espflash flash target/xtensa-esp32s3-espidf/release/voiceagent-esp32 --monitor

Or in one step:

cargo run --release

(requires uncommenting the runner line in .cargo/config.toml)

Architecture

src/
├── main.rs              # Entry point: WiFi → WHIP → WebRTC → audio loop
├── afe_pipeline.rs      # ESP-SR AFE wrapper (AGC + noise suppression)
├── audio.rs             # I2S audio driver (speaker TX + mic RX, simplex mode)
├── audio_processing.rs  # Half-duplex tracking + fallback software gain
├── display.rs           # ST7789 TFT display driver (240x280) + UI rendering
├── esp_peer_ffi.rs      # Raw FFI bindings to the esp_peer C API
├── opus_codec.rs        # Opus encoder/decoder wrapper
├── webrtc.rs            # Safe Rust wrapper around esp_peer C library
├── whip.rs              # WHIP signaling (HTTP POST/DELETE) via ESP-IDF HTTP client
└── wifi.rs              # WiFi STA connection helper
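
The WHIP exchange handled by whip.rs is small enough to show as raw HTTP. In WHIP, the client POSTs its SDP offer with `Content-Type: application/sdp`; the server replies `201 Created` with the SDP answer in the body and a `Location` header naming the session resource, which the client later DELETEs to end the session. The sketch below only builds the request text to make that shape concrete. The real code uses the ESP-IDF HTTP client, and the function names here are hypothetical.

```rust
/// Split "http://host:port/path" into (host_with_port, path).
/// Hypothetical helper; handles plain http URLs only.
fn split_http_url(url: &str) -> Option<(&str, &str)> {
    let rest = url.strip_prefix("http://")?;
    Some(match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, "/"),
    })
}

/// Build the POST that carries the SDP offer to the WHIP endpoint.
fn whip_offer_request(endpoint: &str, sdp_offer: &str) -> Option<String> {
    let (host, path) = split_http_url(endpoint)?;
    Some(format!(
        "POST {} HTTP/1.1\r\nHost: {}\r\nContent-Type: application/sdp\r\n\
         Content-Length: {}\r\nConnection: close\r\n\r\n{}",
        path,
        host,
        sdp_offer.len(), // length in bytes, as HTTP requires
        sdp_offer
    ))
}

/// Build the DELETE that tears the session down. `resource_path` is the
/// path taken from the `Location` header of the 201 response.
fn whip_delete_request(host: &str, resource_path: &str) -> String {
    format!(
        "DELETE {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        resource_path, host
    )
}
```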

Flow

Boot → connect to WiFi → start display thread (ST7789 @ 10 FPS)
Open esp_peer WebRTC connection (Opus audio + "events" data channel)
Create SDP offer → POST to WHIP endpoint → receive SDP answer
Set remote description → ICE/DTLS handshake completes
Audio loop:
- Mic path: I2S RX 16kHz mono → 32-to-16-bit conversion → AFE (AGC + noise suppression) → Opus encode @ 16kHz → send to server
- Speaker path: receive Opus → decode @ 16kHz → upsample 16→24kHz (linear interpolation) → I2S TX playback
Controls: GPIO0 boot button = push-to-talk (hold to unmute)
Display: Shows connection status, VU meter, speaking indicator, last user/AI transcript
Data channel: delivers transcript/response/error JSON (same format as other SDKs)
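
Two of the sample-format steps in the audio loop can be sketched in a few lines. This is a minimal illustration, not the code in audio.rs: the function names are hypothetical, and the 32-to-16-bit step assumes each microphone sample arrives left-justified in a 32-bit I2S slot, so the upper 16 bits carry the signal. The upsampler implements the 16 kHz to 24 kHz linear interpolation (a 3:2 ratio) using integer arithmetic.

```rust
/// Keep the upper 16 bits of each 32-bit I2S sample.
/// Assumption: samples are left-justified in the 32-bit slot.
fn i2s32_to_i16(raw: &[i32]) -> Vec<i16> {
    raw.iter().map(|&s| (s >> 16) as i16).collect()
}

/// Upsample 16 kHz mono PCM to 24 kHz by linear interpolation.
/// Every 2 input samples yield 3 output samples.
fn upsample_16k_to_24k(input: &[i16]) -> Vec<i16> {
    if input.is_empty() {
        return Vec::new();
    }
    let out_len = input.len() * 3 / 2;
    let last = input.len() - 1;
    let mut out = Vec::with_capacity(out_len);
    for j in 0..out_len {
        // Output sample j sits at input position j * 2/3.
        // Work in thirds of an input sample to stay in integers.
        let pos = j * 2;
        let i = pos / 3;
        let frac = (pos % 3) as i32; // 0, 1 or 2 thirds past input[i]
        let a = input[i] as i32;
        // Past the final sample, hold the last value.
        let b = input[(i + 1).min(last)] as i32;
        out.push((a + (b - a) * frac / 3) as i16);
    }
    out
}
```

With this ratio a 20 ms frame of 320 samples at 16 kHz becomes 480 samples at 24 kHz. Linear interpolation is cheap but does not filter out imaging artifacts; that trade-off is usually acceptable for speech playback on a small speaker.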

Data Channel Messages

The server sends JSON messages on the events data channel:

{"type": "transcript", "text": "hello", "final": true}
{"type": "response", "text": "Hi there! How can I help?"}
{"type": "error", "message": "something went wrong"}
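
A client could map these three message shapes onto a Rust enum. The sketch below is dependency-free so it stands alone; the `Event` type and the helper functions are hypothetical, and the field extraction only handles flat objects like the ones above (no nesting, no `\uXXXX` escapes). Where a JSON crate such as serde_json is available, that would be the more robust choice.

```rust
#[derive(Debug, PartialEq)]
enum Event {
    Transcript { text: String, is_final: bool },
    Response { text: String },
    Error { message: String },
}

/// Return the raw text following `"key":` in a flat JSON object.
/// Occurrences of `"key"` not followed by a colon (i.e. values) are skipped.
fn raw_value<'a>(json: &'a str, key: &str) -> Option<&'a str> {
    let pat = format!("\"{}\"", key);
    let mut rest = json;
    while let Some(pos) = rest.find(&pat) {
        rest = &rest[pos + pat.len()..];
        if let Some(v) = rest.trim_start().strip_prefix(':') {
            return Some(v.trim_start());
        }
    }
    None
}

/// Read a string field, decoding the common backslash escapes.
fn string_field(json: &str, key: &str) -> Option<String> {
    let mut chars = raw_value(json, key)?.strip_prefix('"')?.chars();
    let mut out = String::new();
    loop {
        match chars.next()? {
            '"' => return Some(out),
            '\\' => match chars.next()? {
                'n' => out.push('\n'),
                't' => out.push('\t'),
                c => out.push(c), // covers \" \\ and \/
            },
            c => out.push(c),
        }
    }
}

/// Read a boolean field.
fn bool_field(json: &str, key: &str) -> Option<bool> {
    let v = raw_value(json, key)?;
    if v.starts_with("true") {
        Some(true)
    } else if v.starts_with("false") {
        Some(false)
    } else {
        None
    }
}

/// Parse one data-channel message; unknown types yield `None`.
fn parse_event(json: &str) -> Option<Event> {
    match string_field(json, "type")?.as_str() {
        "transcript" => Some(Event::Transcript {
            text: string_field(json, "text")?,
            // Assumption: a missing "final" flag is treated as not final.
            is_final: bool_field(json, "final").unwrap_or(false),
        }),
        "response" => Some(Event::Response { text: string_field(json, "text")? }),
        "error" => Some(Event::Error { message: string_field(json, "message")? }),
        _ => None,
    }
}
```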

Targeting plain ESP32

sdkconfig.defaults: set CONFIG_IDF_TARGET="esp32" and remove the S3-specific options CONFIG_SPIRAM_MODE_OCT and CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240
.cargo/config.toml: set target = "xtensa-esp32-espidf" and MCU = "esp32"
src/main.rs: change all GPIO pin assignments to match your board (the current pins are S3-specific and many don't exist on plain ESP32)

Note: The ESP-SR AFE pipeline is memory-hungry. A plain ESP32 with limited PSRAM may not have enough RAM to run it, in which case the fallback software-gain path will be used instead.
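
For a sense of what a software-gain fallback involves, here is a minimal sketch of fixed gain with saturation. It is not the code in audio_processing.rs; the function name and the gain value used in the test are hypothetical. The clamp matters because letting 16-bit samples wrap around on overflow produces loud clicks.

```rust
/// Multiply each sample by `gain`, saturating at the i16 limits instead of
/// wrapping. Hypothetical helper for illustration.
fn apply_gain(samples: &mut [i16], gain: f32) {
    for s in samples.iter_mut() {
        let scaled = (*s as f32 * gain).round();
        *s = scaled.clamp(i16::MIN as f32, i16::MAX as f32) as i16;
    }
}
```

Unlike the AFE's AGC, a fixed gain does not adapt to how loudly the user speaks, so the factor has to be tuned for the microphone and enclosure.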

Why not reuse the rust-sdk?

The existing rust-sdk/ uses the webrtc crate (Pion-based, requires tokio + full OS networking), reqwest, and audiopus — none of which can run on ESP32's FreeRTOS-based ESP-IDF environment. This project uses Espressif's native esp_peer C library instead, called from Rust via FFI.
