
Agent Voice Response - Deepgram Speech-to-Speech Integration


This repository showcases the integration between Agent Voice Response and Deepgram's Speech-to-Speech API. The application leverages Deepgram's speech processing capabilities to provide intelligent, context-aware responses as real-time audio.

Prerequisites

To set up and run this project, you will need:

  1. Node.js and npm installed
  2. A Deepgram API key with access to the Speech-to-Speech API
  3. WebSocket support in your environment

Setup

1. Clone the Repository

```bash
git clone https://github.com/agentvoiceresponse/avr-sts-deepgram.git
cd avr-sts-deepgram
```

2. Install Dependencies

```bash
npm install
```

3. Configure Environment Variables

Create a .env file in the root of the project (see .env.example). The following variables are available:

Required:

| Variable | Description |
| --- | --- |
| `DEEPGRAM_API_KEY` | Your Deepgram API key |
| `AGENT_PROMPT` | System prompt that defines the AI agent's behavior and personality |

Optional -- Server:

| Variable | Description | Default |
| --- | --- | --- |
| `PORT` | WebSocket server port | `6033` |

Optional -- Audio Input:

| Variable | Description | Default |
| --- | --- | --- |
| `DEEPGRAM_SAMPLE_RATE` | Fallback sample rate (Hz) used when input/output-specific rates are not set | `8000` |
| `DEEPGRAM_INPUT_ENCODING` | Audio encoding for the input stream (`linear16`, `mulaw`, `alaw`) | `linear16` |
| `DEEPGRAM_INPUT_SAMPLE_RATE` | Sample rate in Hz for the input stream | value of `DEEPGRAM_SAMPLE_RATE` |

Optional -- Audio Output:

| Variable | Description | Default |
| --- | --- | --- |
| `DEEPGRAM_OUTPUT_ENCODING` | Audio encoding for the output stream (`linear16`, `mulaw`, `alaw`) | `linear16` |
| `DEEPGRAM_OUTPUT_SAMPLE_RATE` | Sample rate in Hz for the output stream | value of `DEEPGRAM_SAMPLE_RATE` |
| `DEEPGRAM_OUTPUT_CONTAINER` | Output audio container format (`none`, `wav`) | `none` |

Optional -- Agent:

| Variable | Description | Default |
| --- | --- | --- |
| `DEEPGRAM_LANGUAGE` | Agent language code (e.g. `en`, `it`, `es`, `fr`, `de`) | `en` |
| `DEEPGRAM_GREETING` | Initial greeting message spoken by the agent | Hi there, I'm your virtual assistant—how can I help today? |

Optional -- Listen (STT) Provider:

| Variable | Description | Default |
| --- | --- | --- |
| `DEEPGRAM_LISTEN_PROVIDER` | Speech-to-text provider (`deepgram`) | `deepgram` |
| `DEEPGRAM_ASR_MODEL` | STT model name | `nova-3` |

Optional -- Think (LLM) Provider:

| Variable | Description | Default |
| --- | --- | --- |
| `DEEPGRAM_THINK_PROVIDER` | LLM provider (`open_ai`, `anthropic`, `groq`, `google`) | `open_ai` |
| `DEEPGRAM_THINK_MODEL` | LLM model name | `gpt-4o-mini` |

Optional -- Speak (TTS) Provider:

| Variable | Description | Default |
| --- | --- | --- |
| `DEEPGRAM_SPEAK_PROVIDER` | Text-to-speech provider (`deepgram`, `eleven_labs`) | `deepgram` |
| `DEEPGRAM_TTS_MODEL` | TTS model name | `aura-2-thalia-en` |

Note: The TTS model name encodes the language (e.g. `aura-2-thalia-en` for English, `aura-2-melia-it` for Italian). Make sure the model matches the language set in `DEEPGRAM_LANGUAGE`, otherwise the connection will fail.

Available Deepgram Aura-2 Italian models (`it`): `aura-2-melia-it`, `aura-2-elio-it`, `aura-2-flavio-it`, `aura-2-maia-it`, `aura-2-cinzia-it`, `aura-2-cesare-it`, `aura-2-livia-it`, `aura-2-perseo-it`, `aura-2-dionisio-it`, `aura-2-demetra-it`

Full list of models: https://developers.deepgram.com/docs/tts-models

Optional -- Advanced:

| Variable | Description | Default |
| --- | --- | --- |
| `DEEPGRAM_KEEPALIVE_INTERVAL` | Keep-alive ping interval in milliseconds | `5000` |
| `AMI_URL` | URL of the AMI service used by call-control tools (`avr_transfer`, `avr_hangup`) | `http://127.0.0.1:6006` |
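Putting the variables together, a minimal `.env` for 8 kHz telephony audio might look like the following (all values are illustrative, not defaults you must use):

```ini
DEEPGRAM_API_KEY=your-deepgram-api-key
AGENT_PROMPT="You are a concise, friendly phone assistant."
PORT=6033
DEEPGRAM_INPUT_ENCODING=linear16
DEEPGRAM_INPUT_SAMPLE_RATE=8000
DEEPGRAM_OUTPUT_ENCODING=linear16
DEEPGRAM_OUTPUT_SAMPLE_RATE=8000
DEEPGRAM_LANGUAGE=en
DEEPGRAM_ASR_MODEL=nova-3
DEEPGRAM_THINK_PROVIDER=open_ai
DEEPGRAM_THINK_MODEL=gpt-4o-mini
DEEPGRAM_SPEAK_PROVIDER=deepgram
DEEPGRAM_TTS_MODEL=aura-2-thalia-en
```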

4. Running the Application

Start the application by running the following command:

```bash
node index.js
```

The server will start on the port defined in the environment variable (default: 6033).

How It Works

The Agent Voice Response system integrates with Deepgram's Speech-to-Speech API to provide intelligent audio-based responses to user queries. The server receives audio input from users, forwards it to Deepgram's API, and then returns the model's response as audio in real-time using WebSocket communication.

Key Components

  • Express.js Server: Handles incoming audio streams from clients
  • WebSocket Communication: Manages real-time communication with Deepgram's API
  • Audio Processing: Handles audio format conversion and streaming
  • Real-time Streaming: Processes and streams audio data in real-time
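The wire protocol is defined by Deepgram's Voice Agent API. As a rough illustration of how the environment variables map onto an agent configuration, here is a hypothetical helper (the message shape below is an assumption based on Deepgram's public Voice Agent docs, not code copied from this repository):

```javascript
// Hypothetical: assemble the configuration payload sent to Deepgram once the
// WebSocket opens. Field names follow Deepgram's Voice Agent "Settings"
// message as an assumption; check the official docs before relying on them.
function buildAgentSettings(env) {
  const sampleRate = Number(env.DEEPGRAM_SAMPLE_RATE || 8000);
  return {
    type: "Settings",
    audio: {
      input: {
        encoding: env.DEEPGRAM_INPUT_ENCODING || "linear16",
        sample_rate: Number(env.DEEPGRAM_INPUT_SAMPLE_RATE || sampleRate),
      },
      output: {
        encoding: env.DEEPGRAM_OUTPUT_ENCODING || "linear16",
        sample_rate: Number(env.DEEPGRAM_OUTPUT_SAMPLE_RATE || sampleRate),
        container: env.DEEPGRAM_OUTPUT_CONTAINER || "none",
      },
    },
    agent: {
      language: env.DEEPGRAM_LANGUAGE || "en",
      listen: {
        provider: {
          type: env.DEEPGRAM_LISTEN_PROVIDER || "deepgram",
          model: env.DEEPGRAM_ASR_MODEL || "nova-3",
        },
      },
      think: {
        provider: {
          type: env.DEEPGRAM_THINK_PROVIDER || "open_ai",
          model: env.DEEPGRAM_THINK_MODEL || "gpt-4o-mini",
        },
        prompt: env.AGENT_PROMPT,
      },
      speak: {
        provider: {
          type: env.DEEPGRAM_SPEAK_PROVIDER || "deepgram",
          model: env.DEEPGRAM_TTS_MODEL || "aura-2-thalia-en",
        },
      },
      greeting: env.DEEPGRAM_GREETING,
    },
  };
}

// Example: only the required variables set, everything else falls back.
const settings = buildAgentSettings({
  AGENT_PROMPT: "You are a concise phone assistant.",
});
console.log(settings.audio.input.sample_rate); // 8000
```

Note how `DEEPGRAM_SAMPLE_RATE` acts only as a fallback: the input/output-specific variables win when set.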

Audio Processing

The application is configured to work with:

  • Input Audio: 16-bit PCM at 8kHz
  • Output Audio: 16-bit PCM at 8kHz
  • Encoding: Linear16 format
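At these settings the data rate is easy to reason about: 8,000 samples/s × 2 bytes/sample = 16,000 bytes per second, so a typical 20 ms telephony frame is 320 bytes. A quick sanity check (plain arithmetic, not code from this repository):

```javascript
// Bytes per second for mono 16-bit PCM at a given sample rate.
const sampleRate = 8000;     // Hz
const bytesPerSample = 2;    // 16-bit PCM
const bytesPerSecond = sampleRate * bytesPerSample;       // 16000
const frameMs = 20;          // a common telephony frame duration
const bytesPerFrame = (bytesPerSecond * frameMs) / 1000;  // 320
console.log(bytesPerSecond, bytesPerFrame); // 16000 320
```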

API Endpoints

POST `/speech-to-speech-stream`

This endpoint accepts an audio stream and returns a streamed audio response generated by Deepgram.

Request:

  • Content-Type: audio/x-raw
  • Format: 16-bit PCM at 8kHz
  • Method: POST

Response:

  • Content-Type: text/event-stream
  • Format: 16-bit PCM at 8kHz
  • Streamed audio data in real-time

Customizing the Application

See the Environment Variables section above for the full list of configurable options.

Error Handling

The application includes comprehensive error handling for:

  • WebSocket connection issues
  • Audio processing errors
  • Deepgram API errors
  • Stream processing errors

All errors are logged to the console and appropriate error messages are returned to the client.

Contributors

We would like to express our gratitude to all the contributors who have helped make this project possible.

Support & Community

Support AVR

AVR is free and open-source. Any support is entirely voluntary and intended as a personal gesture of appreciation. Donations do not provide access to features, services, or special benefits, and the project remains fully available regardless of donations.

Support us on Ko-fi

License

MIT License - see the LICENSE file for details.
