Speech2Speech

A web application that converts speech to text, processes it through an AI language model, and converts the response back to speech using the Kokoro text-to-speech engine.

Speech recognition and speech synthesis run locally in the browser; no audio is sent to any server.

Important: You need a locally running chat LLM server, such as llama-server.

Features

Speech Recognition: Uses Moonshine to transcribe spoken English into text (English only)
AI Processing: Sends transcribed text to a language model API for intelligent responses
Text-to-Speech: Converts the AI response back to speech using Kokoro TTS
Dark Mode: Modern dark-themed UI for comfortable use

Technologies Used

Moonshine: Speech recognition model by Useful Sensors
Kokoro: Advanced text-to-speech synthesis engine
Hugging Face Transformers.js: Client-side machine learning models
Web Audio API: For audio recording and playback
Modern JavaScript: ES6+ features including modules, classes, and async/await

Getting Started

Prerequisites

Modern web browser with both JavaScript and WebGPU enabled
Local or remote server to host the application
Chat LLM server that exposes the language model API endpoint, such as a locally running llama-server
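The WebGPU prerequisite can be checked from the browser console before loading the app. The following is a minimal sketch, not code from this repository: `hasWebGPU` and `canUseWebGPU` are hypothetical helper names, and the navigator object is passed in as a parameter only so the check can also be exercised outside a browser.

```javascript
// Returns true when the given navigator-like object exposes the WebGPU entry point.
// `nav` is a parameter (an assumption of this sketch) so the check is testable.
function hasWebGPU(nav) {
  return Boolean(nav) && typeof nav === "object" && "gpu" in nav && Boolean(nav.gpu);
}

// Asks for a GPU adapter; resolves to false when WebGPU is missing
// or when the browser cannot provide an adapter.
async function canUseWebGPU(nav) {
  if (!hasWebGPU(nav)) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false;
  }
}
```

In a browser, `canUseWebGPU(navigator).then(console.log)` should log `true` when WebGPU is enabled and an adapter is available.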

Installation

Clone the repository
Host the files on a web server
Open the application in a web browser

Configuration

In the Settings tab:

Set the Chat Inference Server URL to your language model endpoint
Configure the System Prompt to control the AI assistant's behavior
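To illustrate how these two settings fit together, here is a hedged sketch of a chat request that combines the server URL and the System Prompt. It assumes the server exposes an OpenAI-compatible `/v1/chat/completions` endpoint (llama-server does) and that the configured URL is the server's base URL; the app's own request code may differ. `buildChatRequest` and `askChatServer` are hypothetical names.

```javascript
// Builds the JSON body for an OpenAI-style chat completion request.
// The system prompt goes first, then any earlier turns, then the new user text.
function buildChatRequest(systemPrompt, userText, history = []) {
  return {
    messages: [
      { role: "system", content: systemPrompt },
      ...history,
      { role: "user", content: userText },
    ],
    stream: false,
  };
}

// Posts the body to the chat server and returns the assistant's reply text.
// `fetchFn` defaults to the global fetch; it is a parameter so it can be stubbed.
async function askChatServer(baseUrl, body, fetchFn = fetch) {
  // Assumption: baseUrl is the server root, so the endpoint path is appended here.
  const url = baseUrl.replace(/\/+$/, "") + "/v1/chat/completions";
  const res = await fetchFn(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Chat server returned HTTP ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

For a llama-server instance on its default port, a call might look like `askChatServer("http://localhost:8080", buildChatRequest(systemPrompt, "Hello"))`.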

Example Prompt, Sales Coach:

You are helping me practice a sales call.

I am always the Seller.
Every message I send is a Seller message to the prospect.
Never treat my messages as coming from the Customer.

You will play two roles:
1. Customer — the prospect I am selling to.
2. Coach — private feedback for me, the Seller.

After each Seller message I send, reply with exactly two lines:

Customer: [Respond as the prospect to my latest Seller message only.]
Coach: [Give private coaching feedback to me, the Seller, about my latest Seller message only.]

Critical rules:
- I am never the Customer.
- Do not coach the Customer.
- Do not respond to the Customer as if they are me.
- The Coach must evaluate only my latest Seller message.
- The Customer must respond only to my latest Seller message.
- The Customer must not use or react to Coach feedback.
- Generate the Customer line before the Coach line.
- Customer response must contain only one realistic point.
- Customer response may raise questions, objections, current pain points, pricing concerns, or "can't we do this ourselves?" pushback.
- Coach response must include one thing I did right and one better next move or phrase.
- Keep the total response under 400 characters.
- No headings, bullets, explanations, or extra text.

Service context:
GCP Cost Visibility Setup is a fixed-scope setup for GCP billing export, dashboards, alerts, recommendation review and handover for startups and small product teams.

Good example:
Seller: How are you currently reviewing GCP costs?
Customer: Mostly our CTO checks the bill when it spikes.
Coach: Good discovery question. Next, ask who owns the monthly review habit.

Bad example:
Seller: How much are you charging?
Customer: It depends on scope and project count.
Coach: You asked a good pricing question.
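Because the prompt above asks for exactly two lines in a fixed order, the reply can be checked mechanically before it is displayed or spoken. The following is only a sketch of such a check; `parseCoachReply` is a hypothetical helper and not part of the application.

```javascript
// Splits a model reply into its Customer and Coach parts.
// Returns null when the reply does not follow the two-line format from the prompt.
function parseCoachReply(reply, maxChars = 400) {
  const lines = reply
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length !== 2) return null;

  // The prompt requires the Customer line first, then the Coach line.
  const customer = /^Customer:\s*(.+)$/.exec(lines[0]);
  const coach = /^Coach:\s*(.+)$/.exec(lines[1]);
  if (!customer || !coach) return null;

  return {
    customer: customer[1],
    coach: coach[1],
    // The prompt asks for a total response under 400 characters.
    withinLimit: reply.trim().length < maxChars,
  };
}
```

A caller could, for instance, send only the `customer` text to text-to-speech and show the `coach` text on screen; that split is a design idea, not something the app is documented to do.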

How to Use

Navigate to the Conversation tab
Click "Start Recording" to begin speaking
Click "Stop Recording" when finished to process the audio
Wait for the AI to generate a response
The response will be spoken aloud using the selected voice
View the conversation history in the Transcription section
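The steps above amount to one conversation turn: transcribe, ask the model, speak, then record the exchange. The sketch below shows that flow with the three stages passed in as functions. The names `runTurn`, `transcribe`, `chat`, and `speak` are illustrative assumptions and do not mirror the actual modules in `/js`.

```javascript
// Runs one conversation turn and returns the updated history.
// `stages` supplies the three async steps, so any speech-to-text, chat,
// or text-to-speech implementation can be plugged in.
async function runTurn(audio, stages, history = []) {
  // 1. Speech to text.
  const userText = (await stages.transcribe(audio)).trim();
  if (userText === "") return history; // nothing intelligible was recorded

  // 2. Text to the language model, with earlier turns as context.
  const replyText = await stages.chat(userText, history);

  // 3. Reply text to speech.
  await stages.speak(replyText);

  // 4. Append both sides of the exchange without mutating the input array.
  return [
    ...history,
    { role: "user", content: userText },
    { role: "assistant", content: replyText },
  ];
}
```

Returning a new history array instead of mutating the old one keeps each turn easy to test and lets the UI re-render the transcription from a single value.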

Project Structure

/css: Stylesheet files for the UI
/js: JavaScript modules for application logic
- AudioPlayer.js: Handles audio playback
- conversation.js: Manages the conversation flow
- kokoro.js: Text-to-speech implementation
- stt.js: Speech-to-text functionality
- ui.js: User interface interactions
/index.html: Main application page

Credits

This project uses the following open source technologies:

Moonshine - Speech recognition model by Useful Sensors
Kokoro - Text-to-speech synthesis engine
Hugging Face Transformers.js - Machine learning models in the browser

License

Apache 2.0
