`spot_audio` is a ROS2 package consisting of two nodes:

- `spot_microphone_node.py`
- `spot_speaker_node.py`
Spot uses the ReSpeaker Microphone Array v2.0 ("respeaker" for short) to collect audio data. The respeaker consists of four microphones in a circular array, but we use only one of them.
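For reference, here is a minimal sketch of keeping one channel from the respeaker's interleaved stream, assuming PyAudio and the respeaker's default 6-channel firmware; the channel count, channel choice, and chunk size are illustrative assumptions, not taken from the node:

```python
# Illustrative sketch: de-interleave one mic's samples from the respeaker.
# Assumes PyAudio and the 6-channel firmware; values are not from the node.
import numpy as np
import pyaudio

RATE = 16000
CHANNELS = 6        # default firmware exposes 6 channels
MIC_CHANNEL = 1     # channels 1-4 carry the raw audio of the four mics
CHUNK = 1024

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=CHANNELS, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)

frames = np.frombuffer(stream.read(CHUNK), dtype=np.int16)
mono = frames[MIC_CHANNEL::CHANNELS]  # keep one mic's samples
```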
Spot makes use of two deep-learning models to perform its listening tasks: Whisper and Audio Spectrogram Transformer (AST).
Whisper is an off-the-shelf speech-to-text (STT) model that Spot uses to transcribe any speech it hears. We use faster-whisper, a Python 3 reimplementation of this model.
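A minimal sketch of transcribing one audio window with faster-whisper; the model size, device, and options here are illustrative, not necessarily what the node uses:

```python
# Transcribe a short audio window with faster-whisper.
# Model size and options are illustrative.
from faster_whisper import WhisperModel

model = WhisperModel("base.en", device="cpu", compute_type="int8")

# transcribe() accepts a file path or a 16 kHz float32 NumPy array
segments, info = model.transcribe("window.wav", beam_size=5)
text = " ".join(segment.text.strip() for segment in segments)
print(text)
```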
AST is an off-the-shelf audio classifier that Spot uses to detect speech, nonverbal vocalizations, and respiratory distress. We interact with this model through a Python 3 package.
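One way to run AST is through the Hugging Face `transformers` audio-classification pipeline with the public AudioSet checkpoint; the package and checkpoint here are assumptions, and the node may use a different wrapper:

```python
# Hedged sketch: AST audio classification via the transformers pipeline.
# The package and checkpoint are assumptions, not confirmed by this repo.
from transformers import pipeline

classifier = pipeline(
    "audio-classification",
    model="MIT/ast-finetuned-audioset-10-10-0.4593",
)

# Prints the top AudioSet labels for the clip (e.g. "Speech", "Cough")
for result in classifier("window.wav", top_k=5):
    print(f"{result['label']}: {result['score']:.2f}")
```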
`spot_microphone_node.py` is the ROS2 node that processes the raw audio from the microphone. The node publishes four ROS2 topics and serves a single ROS2 service:
- `String.msg` published on the `/<SPOT_NAME>/heart_beat` topic. We publish to this topic every 8 seconds to alert the `spot_state_manager` node that the microphone node is live.
- `Speech.msg` published on the `/<SPOT_NAME>/speech` topic. We publish to this topic every 3 seconds if `whisper` picked up on any speech.
- `AudioData.msg` published on the `/<SPOT_NAME>/raw_audio` topic. We publish to this topic continuously as we collect data from the microphone.
- `Observation.msg` published on the `/<SPOT_NAME>/observations_no_id` topic. This gets published whenever `whisper` picks up speech or the `AST` detects speech, a non-verbal vocalization, or respiratory distress.
- `StopListening.srv` served on `/<SPOT_NAME>/stop_listening_service_name`. The `spot_speaker_node.py` calls this service whenever it's about to play audio containing speech, so that Spot doesn't accidentally transcribe its own speech.
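A minimal sketch of how this wiring might look in `rclpy`; the interface package name `spot_audio_interfaces` and the `audio_common_msgs` dependency are assumptions made for illustration:

```python
# Sketch of the node's publisher/service wiring; interface package names
# are hypothetical, not confirmed by this repo.
import rclpy
from rclpy.node import Node
from std_msgs.msg import String
from audio_common_msgs.msg import AudioData                 # assumed dependency
from spot_audio_interfaces.msg import Observation, Speech   # hypothetical package
from spot_audio_interfaces.srv import StopListening         # hypothetical package


class SpotMicrophoneNode(Node):
    def __init__(self, spot_name: str):
        super().__init__("spot_microphone_node")
        ns = f"/{spot_name}"
        self.heart_beat_pub = self.create_publisher(String, f"{ns}/heart_beat", 10)
        self.speech_pub = self.create_publisher(Speech, f"{ns}/speech", 10)
        self.raw_audio_pub = self.create_publisher(AudioData, f"{ns}/raw_audio", 10)
        self.observation_pub = self.create_publisher(
            Observation, f"{ns}/observations_no_id", 10)
        self.stop_listening_srv = self.create_service(
            StopListening, f"{ns}/stop_listening_service_name",
            self.on_stop_listening)
        # Heartbeat every 8 seconds so spot_state_manager knows we're live
        self.create_timer(8.0, self.publish_heart_beat)

    def publish_heart_beat(self):
        self.heart_beat_pub.publish(String(data="spot_microphone_node alive"))

    def on_stop_listening(self, request, response):
        # Pause transcription while the speaker plays audio (details omitted)
        return response
```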
`spot_microphone_node.py` uses two threads whose jobs are:
- To poll the microphone for raw audio data and put it onto a buffer, and
- To process the audio buffer using `whisper` and `AST`.
The second thread processes the audio...
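A minimal sketch of this producer/consumer layout, assuming PyAudio for capture; the buffer structure, names, and parameters are illustrative:

```python
# Sketch of the two-thread layout: one thread fills a shared buffer from
# the mic, the other drains ~3 s windows for whisper/AST. Illustrative only.
import collections
import threading
import time

import pyaudio

RATE = 16000   # whisper expects 16 kHz mono
CHUNK = 1024
WINDOW_SECONDS = 3

audio_buffer = collections.deque()
buffer_lock = threading.Lock()

def poll_microphone():
    """Producer: read raw frames from the mic and append them to the buffer."""
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                     input=True, frames_per_buffer=CHUNK)
    while True:
        data = stream.read(CHUNK, exception_on_overflow=False)
        with buffer_lock:
            audio_buffer.append(data)

def process_buffer():
    """Consumer: drain ~3 s of audio at a time and hand it to whisper/AST."""
    frames_per_window = int(WINDOW_SECONDS * RATE / CHUNK)
    while True:
        with buffer_lock:
            ready = len(audio_buffer) >= frames_per_window
            if ready:
                window = b"".join(audio_buffer.popleft()
                                  for _ in range(frames_per_window))
        if not ready:
            time.sleep(0.1)
            continue
        # run whisper / AST on `window` here

threading.Thread(target=poll_microphone, daemon=True).start()
threading.Thread(target=process_buffer, daemon=True).start()
```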
The noise Spot produces while walking drowns out all other audio picked up by the microphone, which limits us to using the microphone when Spot isn't moving. To address this, we plan to purchase a directional microphone, which will hopefully attenuate noise not coming from the direction of the speaker. Integrating this microphone into the software stack may take substantial time.
Whisper's transcriptions are imperfect and often miss the casualty's speech. Transcriptions improve with longer audio segments, but we restrict processing to three-second segments: with longer windows, Spot's responses feel noticeably delayed once the speaker finishes talking. Resolving this will require some minor refinements to how the node processes audio data; one possible approach is sketched below.
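A hypothetical refinement (not the current implementation): transcribe overlapping windows, so `whisper` sees more context while Spot still responds every three seconds:

```python
# Hypothetical: keep a rolling context window, emit a transcription every
# STEP_SECONDS. Longer context for whisper, same 3-second response cadence.
import collections

RATE = 16000
CONTEXT_SECONDS = 9   # audio whisper sees per transcription
STEP_SECONDS = 3      # how often a transcription is emitted

window = collections.deque(maxlen=CONTEXT_SECONDS * RATE)

def on_new_samples(samples):
    """Called every STEP_SECONDS with fresh samples from the mic thread."""
    window.extend(samples)
    # transcribe(list(window)) here: more context, unchanged cadence
```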
Rather than publishing a custom `heart_beat` message, we should be using the standard `diagnostic_msgs` ROS2 package.
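A hedged sketch of what that could look like, publishing a standard `DiagnosticStatus` on `/diagnostics`; the hardware ID string is illustrative:

```python
# Sketch: replace the custom heartbeat with the standard diagnostics stack.
from diagnostic_msgs.msg import DiagnosticArray, DiagnosticStatus

def publish_heart_beat(node, publisher):
    status = DiagnosticStatus(
        level=DiagnosticStatus.OK,
        name="spot_microphone_node",
        message="alive",
        hardware_id="respeaker_v2",   # illustrative
    )
    msg = DiagnosticArray()
    msg.header.stamp = node.get_clock().now().to_msg()
    msg.status = [status]
    publisher.publish(msg)

# publisher = node.create_publisher(DiagnosticArray, "/diagnostics", 10)
```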
`import` statements should be in alphabetical order.
Lists of dependencies in `package.xml` or `CMakeLists.txt` should be in alphabetical order.