A Python application that captures speech from your microphone, transcribes it with the Google Gemini API, and automatically types the resulting text.
- Record audio from microphone with manual control (Ctrl+C to stop)
- Transcribe recorded audio using Google Gemini API
- Automatic typing of transcribed text
- Multi-language support with automatic translation to English
- Toggle script for easy start/stop control
- Background operation with logging
Install these system packages before running the application:
Ubuntu/Debian:
sudo apt update
sudo apt install python3-dev portaudio19-dev xclip
Fedora/RHEL:
sudo dnf install python3-devel portaudio-devel xclip
Arch Linux:
sudo pacman -S python portaudio xclip
macOS:
Most dependencies are built-in. You may need:
# Install Xcode command line tools if not already installed
xcode-select --install
# Install portaudio if needed
brew install portaudio
- Python: 3.13 or higher
- Microphone: Working microphone with proper system permissions
- Audio System: PortAudio for microphone input
- Display Server: GUI automation capabilities (built-in for most systems)
- Permissions: Microphone and accessibility permissions may be required
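The requirements above can be checked before the first run. The following is a minimal sketch, not part of the project: the function name `check_prerequisites` is hypothetical, and it only covers the two items that can be verified from the standard library (the Python version and, on Linux, the presence of `xclip` on the PATH).

```python
import shutil
import sys

MIN_PYTHON = (3, 13)  # minimum version taken from the requirements list


def check_prerequisites(version=sys.version_info, platform=sys.platform,
                        which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    The parameters default to the running interpreter and system, and can be
    overridden to exercise the checks in isolation.
    """
    problems = []
    if tuple(version[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    # xclip appears only in the Linux install commands, so check it only there.
    if platform.startswith("linux") and which("xclip") is None:
        problems.append("xclip not found on PATH")
    return problems
```

Microphone and accessibility permissions are not covered here, since they depend on the operating system's own settings.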
- Install dependencies using uv:
  uv sync
- Set up environment variables:
  cp .env.example .env
  Edit .env and add your Google API key:
  GOOGLE_API_KEY=your_actual_api_key_here
- Get your API key from Google AI Studio
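For illustration, here is one way a script could pick up `GOOGLE_API_KEY` from the `.env` file created above. This is a sketch using only the standard library. How the application itself loads the file is not stated here, and the helper names `read_env_file` and `get_api_key` are hypothetical.

```python
import os
from pathlib import Path

PLACEHOLDER = "your_actual_api_key_here"  # value shipped in the example line


def read_env_file(path=".env"):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    env_path = Path(path)
    if not env_path.exists():
        return {}
    values = {}
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GOOGLE_API_KEY") or read_env_file(path).get("GOOGLE_API_KEY")
    if not key or key == PLACEHOLDER:
        raise RuntimeError("GOOGLE_API_KEY is not set; edit .env first")
    return key
```

Failing early when the placeholder value is still in place gives a clearer error than a rejected API request later on.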
uv run main.py
Recording starts immediately. Press Ctrl+C to stop recording; the audio is then transcribed and the text is typed.
./toggle_stt.sh
- First run: starts the application in background
- Second run: stops the application
- Logs are saved to /tmp/stt_typer.log
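The start/stop behaviour described above can be sketched in Python with a PID file. This is not the contents of `toggle_stt.sh`. The PID file path and the `toggle` function are hypothetical, and sending SIGINT (the signal Ctrl+C produces) to stop the application is an assumption based on how recording is stopped manually. The sketch is POSIX-only.

```python
import os
import signal
import subprocess
from pathlib import Path

PID_FILE = Path("/tmp/stt_typer.pid")  # hypothetical location, not from the project
LOG_FILE = Path("/tmp/stt_typer.log")  # log path named in the list above


def toggle(command, pid_file=PID_FILE, log_file=LOG_FILE):
    """Start `command` in the background, or stop it if it is already running.

    Returns "started" or "stopped".
    """
    pid_file = Path(pid_file)
    if pid_file.exists():
        pid = int(pid_file.read_text().strip())
        pid_file.unlink()
        try:
            # Assumption: SIGINT mimics Ctrl+C, so the app can finish its work.
            os.kill(pid, signal.SIGINT)
            return "stopped"
        except ProcessLookupError:
            pass  # stale PID file: the process is gone, so start a new one
    with open(log_file, "a") as log:
        proc = subprocess.Popen(
            command,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # detach from the calling terminal
        )
    pid_file.write_text(str(proc.pid))
    return "started"
```

Calling `toggle(["uv", "run", "main.py"])` twice would start and then stop the application under these assumptions.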
sudo ln -s $(pwd)/toggle_stt.sh /usr/local/bin/toggle-stt
toggle-stt
Add to Lubuntu keyboard shortcuts:
- Command: /full/path/to/your/project/toggle_stt.sh
- Key combination: your choice (e.g., Super+S)
The application records audio from your microphone until you stop it with Ctrl+C. It then uploads the audio file to the Google Gemini API for transcription and automatically types the transcribed text. Multiple languages are supported, and non-English speech is automatically translated to English.
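The record, transcribe, type flow described above can be outlined as follows. This is a sketch under stated assumptions rather than the application's code. `read_chunk`, `transcribe`, and `type_text` are hypothetical callables standing in for the microphone stream, the Gemini request, and the GUI typing step. The 16 kHz mono 16-bit format is also an assumption.

```python
import wave


def record_until_interrupt(read_chunk, wav_path, rate=16000, channels=1,
                           sample_width=2):
    """Write chunks from read_chunk() to a WAV file until Ctrl+C is pressed."""
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav.setframerate(rate)
        try:
            while True:
                wav.writeframes(read_chunk())
        except KeyboardInterrupt:
            pass  # Ctrl+C ends the recording, not the program
    return wav_path


def run_once(read_chunk, transcribe, type_text, wav_path="recording.wav"):
    """Record, transcribe the saved file, then type any non-empty result."""
    path = record_until_interrupt(read_chunk, wav_path)
    text = transcribe(path).strip()
    if text:
        type_text(text)
    return text
```

Passing the three steps in as callables keeps the control flow testable without a microphone, network access, or a display.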