Koe (声)

Lightning-Fast, Privacy-First Voice Dictation for Windows, iOS, and Android



What is Koe?

Koe (声, Japanese for "voice") is a free, open-source alternative to subscription-based voice dictation tools. Press a hotkey (Desktop) or a button (Mobile), speak naturally, and get AI-polished text typed at your cursor or copied to your clipboard.

Unlike subscription tools that charge monthly fees, Koe uses your own Groq API key (BYOK, "bring your own key") and stays free for up to 8 hours of transcription a day on Groq's free tier.

Why Koe?

| Feature | WhisperFlow ($8+/mo) | Built-in OS Dictation | Koe (Free) |
| --- | --- | --- | --- |
| Cost | Subscription | Free | Free (BYOK) |
| Accuracy | High | Poor | High (Whisper) |
| AI Enhancement | Yes | No | Yes |
| Privacy | Cloud audio | Local | Local VAD + BYOK |
| Global Hotkey | Yes | Limited | Yes |
| Auto-Paste | Yes | No | Yes |

  • Cross-Platform — Native performance on Windows (Desktop) and iOS/Android (Mobile)
  • Global Hotkey (Desktop) — Press Ctrl + Shift + Space anywhere to start or stop dictation
  • Clipboard-First (Mobile) — High-fidelity audio capture with instant polished results copied to your clipboard
  • Pause Naturally — Koe keeps listening through short pauses instead of treating every breath like the end of a recording
  • Rolling Segments — Long recordings are processed in the background as ordered chunks, so performance stays fast even on longer sessions
  • Instant Transcription — Groq Whisper handles speech-to-text at high speed
  • AI Text Enhancement — Each segment is refined before it is committed, so only polished text is returned
  • Auto-Type (Desktop) — Refined text is typed progressively into the focused text field while you are still talking
  • Minimalist UI — A premium, high-contrast interface designed for focus and speed
  • Transcription History — One-click copy and retry for saved transcripts
  • Usage Dashboard — Track daily audio seconds, request pressure, and queue activity

Desktop (Windows)

  1. Download the latest .exe from Releases.
  2. Install and launch. Koe will live in your system tray.

Mobile (iOS & Android)

  1. Clone the repo and run pnpm install from the repository root (the mobile app lives in apps/mobile).
  2. Install Expo Go on your device.
  3. Run pnpm dev:mobile and scan the QR code. Note: Native builds (.ipa/.apk) can be generated via EAS.

Build Everything from Source

```bash
# Clone the repository
git clone https://github.com/JStaRFilms/Koe.git
cd Koe

# Install all dependencies (Monorepo)
pnpm install

# Run Desktop
pnpm dev

# Build for production
pnpm build

# Run Mobile
pnpm dev:mobile
```

If pnpm dev fails with "Electron failed to install correctly", pnpm likely skipped Electron's install script during dependency setup (pnpm 10+ does not run dependency install scripts unless they are allowlisted). This repo now allowlists the required build/install scripts, and an existing checkout can be repaired with:

```bash
pnpm rebuild electron esbuild protobufjs electron-winstaller
```

Release Builds

  • Build official release artifacts on GitHub Actions, not locally
  • After updating the version in package.json, push a matching version tag such as v1.1.3
  • The release workflow builds the Windows and macOS artifacts and attaches them to that GitHub Release
  • See docs/release-process.md for the full process

Vercel Deployment

  • The marketing website is the Next.js app in koe-website/
  • In Vercel Project Settings -> Build and Deployment -> Root Directory, set the root to koe-website
  • Leave the framework as Next.js for that project
  • A root-level vercel.json is also included as a fallback so root builds target koe-website

Requirements

  • Windows 10/11 (for Desktop) or iOS/Android (for Mobile)
  • Groq API Key (free tier available)
  • Microphone access

Quick Start

  1. Launch Koe — it minimizes to your system tray
  2. Configure — Right-click the tray icon → Settings → Enter your Groq API key
  3. Dictate — Click any text field and press Ctrl + Shift + Space
  4. Speak — The pill UI appears. Talk naturally, including pauses
  5. Done — Press the hotkey again when you're finished. Koe finalizes the session, copies the full refined transcript, and keeps it in history

Usage Guide

Global Hotkeys

| Action | Shortcut |
| --- | --- |
| Start / Stop Recording | Ctrl + Shift + Space |
| Retry Last Failed / Latest Transcript | Ctrl + Shift + , |
| Open Settings | Tray menu |

How Recording Works

  • Koe records one continuous session until you stop it
  • Internally, it breaks longer recordings into ordered segments
  • Segments are transcribed and refined in the background
  • Refined text is typed in order as it becomes ready
  • When the session ends, Koe keeps one full final transcript on the clipboard and in history
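
The ordering guarantee described in this list can be illustrated with a small buffer that holds back segments finishing early until every earlier segment has been committed. This is a minimal sketch, not Koe's actual implementation: the class name `OrderedCommitBuffer` and its `complete` method are hypothetical.

```typescript
// Hypothetical helper (not part of Koe): releases refined segments strictly
// in recording order, even when background work finishes out of order.
class OrderedCommitBuffer {
  private nextIndex = 0;
  private readonly ready = new Map<number, string>();
  private readonly commit: (text: string) => void;

  constructor(commit: (text: string) => void) {
    this.commit = commit;
  }

  // Call when segment `index` has been transcribed and refined.
  complete(index: number, text: string): void {
    this.ready.set(index, text);
    // Flush every consecutive segment that is now available.
    let next = this.ready.get(this.nextIndex);
    while (next !== undefined) {
      this.ready.delete(this.nextIndex);
      this.commit(next);
      this.nextIndex += 1;
      next = this.ready.get(this.nextIndex);
    }
  }
}

const typed: string[] = [];
const buffer = new OrderedCommitBuffer((text) => {
  typed.push(text); // stand-in for typing into the focused field
});

buffer.complete(1, "second segment."); // held back: segment 0 is not ready yet
const heldBack = typed.length;         // 0
buffer.complete(0, "First segment,");  // releases segment 0, then segment 1
const transcript = typed.join(" ");    // "First segment, second segment."
```

Keeping the finished text in a map keyed by segment index means a slow early segment delays output but never reorders it.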

The Pill UI

The floating pill is designed to stay out of the way while still telling you what matters:

  • Idle — Waiting for the next dictation
  • Listening — Live voice levels and active recording state
  • Warning — Mic fallback or chunk failure without immediately killing the session
  • Processing — Finalizing remaining work after you stop
  • Complete — Brief success state before hiding
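
One way to picture these states is as a small state machine. The sketch below is illustrative only: the state names come from the list above, but the event names and the transition table are assumptions, not taken from Koe's source.

```typescript
// State names follow the list above; event names and transitions are assumed.
type PillState = "idle" | "listening" | "warning" | "processing" | "complete";
type PillEvent = "start" | "problem" | "recovered" | "stop" | "finished" | "hidden";

const TRANSITIONS: Record<PillState, Partial<Record<PillEvent, PillState>>> = {
  idle: { start: "listening" },
  listening: { problem: "warning", stop: "processing" },
  warning: { recovered: "listening", stop: "processing" }, // a warning does not end the session
  processing: { finished: "complete" },
  complete: { hidden: "idle" },
};

// Events that are not valid in the current state are ignored.
function nextState(state: PillState, event: PillEvent): PillState {
  return TRANSITIONS[state][event] ?? state;
}

const events: PillEvent[] = ["start", "problem", "recovered", "stop", "finished"];
let state: PillState = "idle";
for (const event of events) {
  state = nextState(state, event);
}
// state is now "complete"
```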

Settings

Configure via the settings window (right-click the tray icon):

| Setting | Description | Default |
| --- | --- | --- |
| Groq API Key | Your API key from console.groq.com | |
| Language | Transcription language (auto for detection) | auto |
| Prompt Style | How Koe refines the transcript | Clean |
| Auto-Paste | Automatically type into the focused window | enabled |
| Theme | Dark / Light mode | dark |
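
As an illustration, the defaults above could be modeled as a typed object that saved values are merged over. The key names (`groqApiKey`, `promptStyle`, and so on) and the empty-string default for the API key are assumptions made for this sketch, not Koe's actual settings schema.

```typescript
// Hypothetical settings shape; key names are illustrative, not Koe's schema.
interface KoeSettings {
  groqApiKey: string;
  language: string; // "auto" asks the transcription model to detect the language
  promptStyle: string;
  autoPaste: boolean;
  theme: "dark" | "light";
}

// Defaults mirror the settings table; the empty API key is an assumption.
const DEFAULT_SETTINGS: KoeSettings = {
  groqApiKey: "",
  language: "auto",
  promptStyle: "Clean",
  autoPaste: true,
  theme: "dark",
};

// Saved values override defaults; anything missing falls back to the default.
function resolveSettings(saved: Partial<KoeSettings>): KoeSettings {
  return { ...DEFAULT_SETTINGS, ...saved };
}

const settings = resolveSettings({ groqApiKey: "your-groq-api-key", theme: "light" });
// settings.language is "auto", settings.theme is "light"
```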

Koe uses a shared core architecture to ensure consistency across Desktop and Mobile. Business logic lives in @koe/core, while platform-specific drivers handle audio and output.

See the Detailed Architecture Guide for more info.
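
The split between shared logic and platform drivers can be sketched roughly as follows. All names here (`OutputDriver`, `TypingDriver`, `ClipboardDriver`, `DictationSession`) are hypothetical and are not the actual @koe/core API; the point is only that the session logic never needs to know which platform it runs on.

```typescript
// Hypothetical interfaces; names are illustrative, not the real @koe/core API.
interface OutputDriver {
  deliver(text: string): void;
}

// Desktop-style driver: would type each refined segment into the focused window.
class TypingDriver implements OutputDriver {
  readonly typed: string[] = [];
  deliver(text: string): void {
    this.typed.push(text); // stand-in for simulated keystrokes
  }
}

// Mobile-style driver: accumulates the transcript for the clipboard.
class ClipboardDriver implements OutputDriver {
  clipboard = "";
  deliver(text: string): void {
    this.clipboard = this.clipboard ? `${this.clipboard} ${text}` : text;
  }
}

// Shared session logic: identical on every platform, unaware of the driver used.
class DictationSession {
  private readonly output: OutputDriver;

  constructor(output: OutputDriver) {
    this.output = output;
  }

  commitSegment(refinedText: string): void {
    const trimmed = refinedText.trim();
    if (trimmed.length > 0) {
      this.output.deliver(trimmed);
    }
  }
}

const desktop = new TypingDriver();
const mobile = new ClipboardDriver();
for (const driver of [desktop, mobile]) {
  const session = new DictationSession(driver);
  session.commitSegment(" Hello there. ");
  session.commitSegment(""); // empty segments are skipped
  session.commitSegment("How are you?");
}
// desktop.typed holds two entries; mobile.clipboard is "Hello there. How are you?"
```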

Platform Specifics

| Feature | Desktop (Windows) | Mobile (iOS/Android) |
| --- | --- | --- |
| Trigger | Global Hotkey | Capture Button |
| Output | Auto-Paste / Type | Clipboard-First |
| Storage | electron-store | SecureStore |
| Capture Logic | Local VAD + ordered segments | Metering-driven chunk rotation + ordered segments |
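
To make "metering-driven chunk rotation" more concrete, here is a minimal sketch of how a recorder might decide when to close one chunk and start the next from periodic loudness readings. The thresholds, field names, and the `ChunkRotator` class are assumptions for illustration; Koe's actual values and logic may differ.

```typescript
// All thresholds and names below are illustrative assumptions.
interface RotationConfig {
  silenceDb: number;  // levels at or below this count as silence
  minPauseMs: number; // silence must last this long before rotating
  minChunkMs: number; // never rotate a chunk shorter than this
  maxChunkMs: number; // always rotate once a chunk reaches this length
}

class ChunkRotator {
  private readonly config: RotationConfig;
  private chunkStartMs = 0;
  private silenceStartMs: number | null = null;

  constructor(config: RotationConfig) {
    this.config = config;
  }

  // Feed one metering sample; returns true when the current chunk should close.
  onSample(timeMs: number, levelDb: number): boolean {
    const { silenceDb, minPauseMs, minChunkMs, maxChunkMs } = this.config;
    if (levelDb <= silenceDb) {
      if (this.silenceStartMs === null) this.silenceStartMs = timeMs;
    } else {
      this.silenceStartMs = null; // speech resumed, so the pause was only a breath
    }
    const chunkMs = timeMs - this.chunkStartMs;
    const pausedLongEnough =
      this.silenceStartMs !== null && timeMs - this.silenceStartMs >= minPauseMs;
    const shouldRotate =
      chunkMs >= maxChunkMs || (chunkMs >= minChunkMs && pausedLongEnough);
    if (shouldRotate) {
      this.chunkStartMs = timeMs;
      this.silenceStartMs = null;
    }
    return shouldRotate;
  }
}

const rotator = new ChunkRotator({
  silenceDb: -45,
  minPauseMs: 700,
  minChunkMs: 2000,
  maxChunkMs: 30000,
});

// Simulated samples every 100 ms: a short breath at 3.0-3.3 s, a real pause from 5.0 s.
const rotations: number[] = [];
for (let t = 0; t <= 6000; t += 100) {
  const shortPause = t >= 3000 && t <= 3300;
  const longPause = t >= 5000;
  const levelDb = shortPause || longPause ? -60 : -20;
  if (rotator.onSample(t, levelDb)) rotations.push(t);
}
// rotations is [5700]: the breath is ignored, the 700 ms pause closes the chunk
```

Requiring a minimum pause is what lets short breaths pass without splitting a sentence, while the maximum chunk length keeps any single upload bounded.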

Privacy-First Design

  1. Desktop speech detection runs locally using ONNX WebAssembly
  2. Mobile recording control stays on-device until a chunk is ready to transcribe
  3. Retry audio is stored only for failed or unresolved segments
  4. Your API key is stored locally on each platform and only used for transcription/refinement requests

Tech Stack

| Layer | Technology |
| --- | --- |
| Framework | Electron + Vite |
| Frontend | Vanilla JavaScript, Custom CSS |
| Audio Capture | Web Audio API |
| Voice Detection | @ricky0123/vad-web (Silero VAD) |
| Transcription | Groq Whisper API (whisper-large-v3-turbo) |
| Text Enhancement | Groq chat refinement pipeline |
| Storage | electron-store + temp retry files |
| Packaging | electron-builder |

Groq API Limits

Koe is designed to stay inside Groq's free-tier limits:

| Metric | Limit | Approximate Usage |
| --- | --- | --- |
| Requests per minute | 20 | ~6 transcribed segments/minute with paced refinement |
| Requests per day | 2,000 | ~8 hours of normal dictation |
| Audio per day | 28,800 sec | 8 hours |

The built-in scheduler tracks request pressure and keeps the app responsive while staying inside these caps.
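
A scheduler of this kind can be approximated with a sliding 60-second window for requests plus a running total of audio seconds. The sketch below uses the per-minute and per-day audio limits listed in this section; the `RequestPacer` class and its methods are hypothetical, and daily reset and the requests-per-day cap are left out for brevity.

```typescript
// Limits come from this section; class and method names are hypothetical.
const REQUESTS_PER_MINUTE = 20;
const AUDIO_SECONDS_PER_DAY = 28_800; // 8 hours * 3,600 seconds

class RequestPacer {
  private sentAtMs: number[] = [];
  private audioSecondsToday = 0;

  // Milliseconds to wait before the next request may be sent (0 means send now).
  delayBeforeNext(nowMs: number): number {
    // Keep only the requests that are still inside the 60-second window.
    const recent = this.sentAtMs.filter((t) => nowMs - t < 60_000);
    this.sentAtMs = recent;
    const oldest = recent[0];
    if (recent.length < REQUESTS_PER_MINUTE || oldest === undefined) return 0;
    // Wait until the oldest request in the window expires.
    return oldest + 60_000 - nowMs;
  }

  // Record a sent request; pass 0 audio seconds for text-only refinement calls.
  record(nowMs: number, audioSeconds: number): void {
    this.sentAtMs.push(nowMs);
    this.audioSecondsToday += audioSeconds;
  }

  remainingAudioSeconds(): number {
    return Math.max(0, AUDIO_SECONDS_PER_DAY - this.audioSecondsToday);
  }
}

const pacer = new RequestPacer();
for (let i = 0; i < 20; i++) {
  pacer.record(i * 1000, 10); // one request per second, 10 s of audio each
}
const waitAt20s = pacer.delayBeforeNext(20_000); // 40000: window is full until t = 60 s
const waitAt60s = pacer.delayBeforeNext(60_000); // 0: the first request has expired
const audioLeft = pacer.remainingAudioSeconds(); // 28600
```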


Roadmap

Completed

  • Global hotkey toggle (Desktop)
  • Local VAD speech detection
  • Groq Whisper transcription
  • AI transcript refinement
  • Auto-paste to focused window (Desktop)
  • Transcription history & Usage dashboard
  • Mobile App (iOS/Android V1)
  • Shared Core Extraction

Planned

  • Custom AI prompts
  • Keyboard shortcut customization
  • Export history as .txt / .md
  • Native macOS support (Electron)
  • Android IME (Custom Keyboard) implementation

Future

  • Snippet library with voice shortcuts
  • App-specific tone profiles
  • Cloud sync across devices
  • Team collaboration features

See Feature Requests for the full backlog.


Contributing

Contributions are welcome. Please see the repo docs and existing code patterns before opening a PR.

Development Setup

```bash
# Fork and clone
git clone https://github.com/your-username/Koe.git
cd Koe

# Install dependencies
pnpm install

# Start development
pnpm dev
```

Monorepo Structure

Koe is transitioning to a monorepo to support multiple platforms:

  • Root: Legacy Electron Desktop app and shared workspace configuration
  • apps/mobile: Expo-based mobile client (iOS/Android)
  • packages/koe-core: Shared business logic, types, and API services

Development Commands

| Target | Command | Description |
| --- | --- | --- |
| Desktop | pnpm dev | Start the Electron app in dev mode |
| Mobile | pnpm dev:mobile | Start the Expo development server |
| Core | pnpm build:core | Build the shared logic package |
| All | pnpm type-check | Run type-checking across all packages |

Project Structure

```text
Koe/
├── apps/               # Application projects
│   └── mobile/        # Expo mobile app
├── packages/           # Shared logic
│   └── koe-core/      # Core services (Whisper, Sessions)
├── src/                # Legacy Desktop source
│   ├── main/           # Electron main process
│   └── renderer/       # UI code
├── docs/               # Documentation & Tasks
├── pnpm-workspace.yaml # Workspace config
└── package.json        # Root manifest & scripts
```

Troubleshooting

"No audio detected"

  • Ensure microphone permissions are granted in Windows Settings
  • Check that your default recording device is selected
  • If another app is holding the mic, Koe will try another available input and warn you in the pill UI

"API rate limit exceeded"

  • Wait for the per-minute window to clear
  • Check the usage dashboard for queue pressure
  • Very long continuous dictation can still pile up requests on the free tier
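
For readers curious how a client might wait out the per-minute window, here is a minimal sketch of a retry delay calculation. It assumes the API signals rate limiting with an HTTP 429 response that may carry a Retry-After value in seconds; the function name, base delay, and cap are illustrative and not taken from Koe's code.

```typescript
// Hypothetical helper: how long to wait before retrying a rate-limited request.
// `attempt` starts at 0; `retryAfterSeconds` is the server hint, if one was sent.
function retryDelayMs(attempt: number, retryAfterSeconds?: number): number {
  // Prefer the server's hint when it is available.
  if (retryAfterSeconds !== undefined && retryAfterSeconds >= 0) {
    return Math.ceil(retryAfterSeconds * 1000);
  }
  // Otherwise back off exponentially: 1 s, 2 s, 4 s, ... capped at 30 s.
  const baseMs = 1_000;
  const capMs = 30_000;
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const firstRetry = retryDelayMs(0);       // 1000
const fourthRetry = retryDelayMs(3);      // 8000
const cappedRetry = retryDelayMs(10);     // 30000
const hintedRetry = retryDelayMs(2, 2.5); // 2500
```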

"Auto-paste not working"

  • Some applications block simulated keystrokes
  • Disable auto-paste in Settings and use Ctrl + V manually
  • Run Koe as administrator if the issue persists

App won't launch

  • Ensure you're on Windows 10/11 64-bit
  • Check that Visual C++ Redistributables are installed
  • Check Windows Event Viewer for crash details

Acknowledgments

  • Groq — For the fast Whisper and chat APIs
  • Silero — For the VAD model
  • @ricky0123 — For the vad-web library
  • WhisperFlow — For helping prove the category exists

License

Koe is licensed under the ISC License. See the LICENSE file for details.


Built with ❤️ by J StaR Films Studios

Star us on GitHub if you find Koe useful.
