Speech-to-Text Typer

A Python application that captures speech from your microphone, transcribes it with the Google Gemini API, and automatically types the resulting text.

Features

Record audio from microphone with manual control (Ctrl+C to stop)
Transcribe recorded audio using the Google Gemini API
Automatic typing of transcribed text
Multi-language support with automatic translation to English
Toggle script for easy start/stop control
Background operation with logging

System Requirements

Linux

Install these system packages before running the application:

Ubuntu/Debian:

sudo apt update
sudo apt install python3-dev portaudio19-dev xclip

Fedora/RHEL:

sudo dnf install python3-devel portaudio-devel xclip

Arch Linux:

sudo pacman -S python portaudio xclip

macOS

Most dependencies ship with macOS. You may still need:

# Install Xcode command line tools if not already installed
xcode-select --install
# Install portaudio if needed
brew install portaudio

General Requirements

Python: 3.13 or higher
Microphone: Working microphone with proper system permissions
Audio System: PortAudio for microphone input
Display Server: A graphical session that supports GUI automation (available by default on most desktop systems)
Permissions: Microphone and accessibility permissions may be required
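
One way to confirm the requirements above before the first run is a small preflight check. The sketch below is illustrative only: `preflight` is a hypothetical helper, not part of this project, and it assumes `xclip` is needed only on Linux, as the package lists above suggest.

```python
import shutil
import sys


def preflight(version_info=None, which=shutil.which, platform=sys.platform):
    """Return a list of human-readable problems; an empty list means ready."""
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    # The README asks for Python 3.13 or higher.
    if tuple(version_info[:2]) < (3, 13):
        problems.append("Python 3.13 or higher is required")
    # Assumption: xclip is only required on Linux.
    if platform.startswith("linux") and which("xclip") is None:
        problems.append("xclip not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(f"warning: {problem}")
```

The check does not cover microphone or accessibility permissions, which have to be granted through the operating system.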

Setup

Install dependencies using uv:
```
uv sync
```
Set up environment variables:
```
cp .env.example .env
```
Edit .env and add your Google API key:
```
GOOGLE_API_KEY=your_actual_api_key_here
```
You can get an API key from Google AI Studio.
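
For illustration, here is a minimal sketch of how a script could pick up the key at startup. It is an assumption about the approach, not the code in `main.py`: the helper names `load_env_file` and `require_api_key` are hypothetical, and the parser handles only simple `KEY=value` lines.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(path=".env", environ=None):
    """Return GOOGLE_API_KEY from the environment, falling back to the .env file."""
    environ = os.environ if environ is None else environ
    key = environ.get("GOOGLE_API_KEY") or load_env_file(path).get("GOOGLE_API_KEY")
    # Treat the placeholder from .env.example as "not configured".
    if not key or key == "your_actual_api_key_here":
        raise RuntimeError("GOOGLE_API_KEY is not set; edit .env and add your key")
    return key
```

Failing early with a clear message avoids recording audio only to have the transcription request rejected afterwards.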

Usage

Direct execution:

uv run main.py

Recording starts immediately. Press Ctrl+C to stop recording; the audio is then transcribed and the text is typed.

Toggle script (recommended):

./toggle_stt.sh

First run: starts the application in background
Second run: stops the application
Logs are saved to /tmp/stt_typer.log
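
The start/stop behaviour above can be sketched in Python as follows. This is not the contents of `toggle_stt.sh`; it is a hedged illustration that assumes a PID file is used to remember the background process and that stopping means sending SIGINT (the equivalent of Ctrl+C). The PID file path `/tmp/stt_typer.pid` is hypothetical; the log path is the one given above.

```python
import os
import signal
import subprocess
import sys
from pathlib import Path


def is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def toggle(command, pid_file, log_file):
    """Start `command` in the background, or stop it if it is already running."""
    pid_path = Path(pid_file)
    if pid_path.exists():
        pid = int(pid_path.read_text().strip())
        pid_path.unlink()
        if is_running(pid):
            os.kill(pid, signal.SIGINT)  # assumption: same effect as Ctrl+C
            return "stopped"
        # A stale PID file falls through to a fresh start.
    with open(log_file, "ab") as log:
        proc = subprocess.Popen(
            command,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # detach from the calling terminal
        )
    pid_path.write_text(str(proc.pid))
    return "started"


if __name__ == "__main__":
    state = toggle(["uv", "run", "main.py"], "/tmp/stt_typer.pid", "/tmp/stt_typer.log")
    print(state, file=sys.stderr)
```

Running the sketch once starts the application and writes the PID file; running it again finds the file, signals the process, and removes the file.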

System-wide access:

sudo ln -s "$(pwd)/toggle_stt.sh" /usr/local/bin/toggle-stt
toggle-stt

Keyboard shortcut:

On Lubuntu, add a keyboard shortcut with these settings:

Command: /full/path/to/your/project/toggle_stt.sh
Key combination: your choice (e.g., Super+S)

How it works

The application records audio from your microphone until you stop it with Ctrl+C. It then uploads the audio file to the Google Gemini API for transcription and automatically types the transcribed text. Multiple languages are supported, and non-English speech is automatically translated to English.
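
The flow described above can be sketched as a small pipeline. This is an illustration of the control flow only, not the code in `main.py`: the function names are hypothetical, and the three callables (`read_chunk`, `transcribe`, `type_text`) stand in for the microphone stream, the Gemini API request, and the typing backend, none of which are shown here.

```python
def record_until_interrupt(read_chunk):
    """Collect audio chunks until the user presses Ctrl+C."""
    chunks = []
    try:
        while True:
            chunks.append(read_chunk())
    except KeyboardInterrupt:
        pass  # Ctrl+C ends the recording instead of killing the program
    return b"".join(chunks)


def run_once(read_chunk, transcribe, type_text):
    """Record, transcribe, then type; return the transcribed text."""
    audio = record_until_interrupt(read_chunk)
    if not audio:
        return ""  # nothing was captured, so skip the API call
    text = transcribe(audio).strip()
    if text:
        type_text(text)
    return text
```

Passing the three stages in as functions keeps the sequence (record, stop on Ctrl+C, transcribe, type) testable without audio hardware or an API key.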
