
jb381/phonos



🎙️🗣️🔪 φόνος (phonos)

phónos: voice, sound, speech... but also murder, slaughter, homicide (yes, really)

Speak freely. Your words, where you need them.

License: MIT Platform Release Status Python

A Wispr Flow-style dictation tool that runs entirely on hardware you control. Press a hotkey, talk, and watch your words appear in whatever app you're in. No cloud. No subscriptions. No mystery box between your microphone and your text.

Beta: usable for local/Tailscale dictation, but still early and ad-hoc signed.

Phonos slays subscription dictation; your wallet gets to stay alive. ☠️

───

🗡️ Why Phonos?

The name means "murder" in Greek. We are keeping the bit, just making the software around it more serious.

โ˜๏ธ Cloud dictation ๐Ÿ”ช Phonos
Audio is processed by a remote service Audio goes only to your configured server
Subscription required Free and open source
Latency depends on internet and vendor load Bounded by your hardware and model choice
One model fits all Pick the model for your hardware
Vendor behavior is opaque Open source and auditable
Surprise product decisions You run the thing

───

⚡ How it works

  1. Press a hotkey: hold it down or toggle, your call.
  2. Talk: your Mac captures the audio.
  3. Whisper transcribes: your server runs faster-whisper in a dedicated subprocess.
  4. Text appears: pasted directly into the app you were using, with clipboard fallback.
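Hold and toggle modes differ only in which key event ends a recording. Here is a minimal Python sketch of that logic. `Mode`, `HotkeyRecorder`, and the `key_down`/`key_up` events are hypothetical names for illustration; the real client is a native macOS app, and its implementation is not shown here.

```python
from enum import Enum


class Mode(Enum):
    HOLD = "hold"      # record only while the hotkey is held down
    TOGGLE = "toggle"  # first press starts recording, second press stops it


class HotkeyRecorder:
    """Hypothetical model of the hotkey logic; not the app's real code."""

    def __init__(self, mode: Mode):
        self.mode = mode
        self.recording = False
        self.finished = 0  # recordings handed off for transcription

    def key_down(self) -> None:
        if not self.recording:
            self.recording = True
        elif self.mode is Mode.TOGGLE:
            self._finish()
        # In hold mode, key-repeat events while the key is held are ignored.

    def key_up(self) -> None:
        # Only hold mode reacts to releasing the key.
        if self.mode is Mode.HOLD and self.recording:
            self._finish()

    def _finish(self) -> None:
        self.recording = False
        self.finished += 1  # the app would upload the audio at this point
```

Treating key-repeat as a no-op in hold mode is the detail that keeps a held hotkey from restarting the recording.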

───

🏗️ Architecture

The server runs the active Whisper model in a dedicated subprocess. The main FastAPI process communicates with that worker through local queues. When you switch models via PUT /models/active, the old worker is stopped and a new worker is started with the requested model, allowing the operating system to reclaim model memory cleanly.

Same battle-tested idea as one-process-per-model serving, minus the orchestration ceremony for a local dictation box.
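The one-worker-per-model pattern can be sketched with Python's `multiprocessing` module. This is an illustration, not the server's actual code. `ModelWorker` and `WorkerManager` are hypothetical names, a fake "transcription" stands in for faster-whisper so the sketch runs without model downloads, and the 600-second default timeout mirrors `PHONOS_TRANSCRIBE_TIMEOUT_SECONDS` only by assumption.

```python
import multiprocessing as mp
import threading


def _worker_loop(model_name, requests, responses):
    # A real worker would load the model once here, for example
    # faster_whisper.WhisperModel(model_name, device="cpu", compute_type="int8").
    # This stand-in just reports what it received.
    for audio in iter(requests.get, None):  # None is the shutdown sentinel
        responses.put(f"[{model_name}] {len(audio)} bytes of audio")


class ModelWorker:
    """One subprocess holding one (pretend) model, fed through two queues."""

    def __init__(self, model_name: str):
        # "fork" keeps this sketch self-contained; a server using GPU libraries
        # would more likely use the "spawn" start method.
        ctx = mp.get_context("fork")
        self.model_name = model_name
        self._requests = ctx.Queue()
        self._responses = ctx.Queue()
        self._lock = threading.Lock()
        self._proc = ctx.Process(
            target=_worker_loop,
            args=(model_name, self._requests, self._responses),
            daemon=True,
        )
        self._proc.start()

    def transcribe(self, audio: bytes, timeout: float = 600.0) -> str:
        # One job in flight at a time keeps requests and responses paired.
        with self._lock:
            self._requests.put(audio)
            return self._responses.get(timeout=timeout)  # queue.Empty on timeout

    def stop(self) -> None:
        self._requests.put(None)      # ask the worker loop to exit
        self._requests.close()
        self._requests.join_thread()  # flush the sentinel, stop the feeder thread
        self._proc.join(timeout=5)
        if self._proc.is_alive():     # hard stop if the worker hangs
            self._proc.terminate()
            self._proc.join()


class WorkerManager:
    def __init__(self, model_name: str):
        self.worker = ModelWorker(model_name)

    def transcribe(self, audio: bytes) -> str:
        return self.worker.transcribe(audio)

    def switch(self, model_name: str) -> None:
        # Stop the old process first so the OS reclaims its memory
        # before the next model is loaded.
        self.worker.stop()
        self.worker = ModelWorker(model_name)
```

Because the old process exits completely, its memory is released without relying on the Python garbage collector or on the ML library to free everything.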

The server is intended for localhost, LAN, or private-network use such as Tailscale. Set PHONOS_AUTH_TOKEN before exposing it beyond localhost.

───

🚀 Quick start

Server

The "server" can be another machine on your LAN/Tailscale network, or just a Docker container running locally on the same Mac. For local-only use, set the Mac app's Server URL to http://localhost:8765.

git clone https://github.com/jb381/phonos && cd phonos/apps/server
cp .env.example .env          # optional: set PHONOS_AUTH_TOKEN
docker compose up -d          # boom, transcription server on :8765

Or run the published server image directly:

docker run -d \
  --name phonos-server \
  -p 8765:8765 \
  -e PHONOS_AUTH_TOKEN=your-secret-token \
  -v phonos_models:/root/.cache/huggingface \
  ghcr.io/jb381/phonos-server:latest

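Without Docker (--host 0.0.0.0 listens on every interface, so set PHONOS_AUTH_TOKEN first, or use --host 127.0.0.1 for local-only use):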

uv sync
uv run uvicorn phonos_server.main:app --host 0.0.0.0 --port 8765
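Before you point the Mac app at a freshly started server, you can wait until `GET /health` answers. Here is a minimal standard-library sketch. The function name and polling interval are illustrative, and it assumes `/health` returns a JSON body and does not need the auth token (only `PUT /models/active` and `POST /transcribe` are listed as requiring it).

```python
import json
import time
import urllib.request


def wait_for_health(base_url: str = "http://localhost:8765",
                    timeout: float = 60.0, interval: float = 0.5) -> dict:
    """Poll GET /health until it returns 2xx, then return the parsed JSON body."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                return json.load(resp)
        except OSError:
            # Connection refused, socket timeouts, and HTTP errors
            # (urllib.error.URLError/HTTPError subclass OSError): not ready yet.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{base_url}/health did not respond within {timeout}s")
        time.sleep(interval)
```

For example, `print(wait_for_health())` right after `docker compose up -d` prints whatever health and model info the server reports.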

macOS client

From a release: download the latest Phonos-*.dmg from the Releases page, open it, and drag Phonos.app to Applications.

From source:

cd apps/macos
./dev-run.sh                  # fast dev loop: build, quit old app, launch new app
./build.sh                    # creates Phonos.dmg and Phonos.app
open Phonos.dmg               # then drag to Applications

For day-to-day macOS development, prefer ./dev-run.sh. It builds a debug app bundle at .build/dev/Phonos.app, ejects any mounted Phonos DMG volumes, quits the currently running app, and opens the freshly built one. It avoids the installer DMG loop entirely.

Releases are triggered by git tag vX.Y.Z && git push --tags. CI builds, ad-hoc signs, and publishes a DMG automatically.

No Apple Developer account yet = ad-hoc signing. Gatekeeper may complain on first launch: right-click the app and choose Open, or go to System Settings → Privacy & Security and click Open Anyway. Accessibility permission may need to be re-granted after ad-hoc rebuilds.

To silence Gatekeeper from the terminal, remove the quarantine attribute (sudo may be needed because the app lives in /Applications):

sudo xattr -dr com.apple.quarantine /Applications/Phonos.app

The grown-up version of this is Developer ID signing + notarization. It is on the roadmap.

macOS does not allow scripts to grant Microphone or Accessibility permissions. The practical development fix is stable signing: set PHONOS_CODESIGN_IDENTITY or install/use an Apple Development identity so the first manual grant sticks across rebuilds. Ad-hoc signing changes the app's code requirement often enough that macOS may ask for permissions again.

───

✨ Features

  • ๐ŸŽ™๏ธ Menu-bar status item with recording/transcribing/paste state
  • โŒจ๏ธ Global hotkey (Control-Space by default, customizable)
  • ๐ŸŽฎ Hold-to-record and toggle recording modes
  • ๐Ÿ“‹ Direct auto-paste into the previously active application
  • ๐Ÿงฏ Clipboard fallback when Accessibility permission is not granted
  • ๐Ÿ› ๏ธ First-run setup for permissions and server connection
  • ๐Ÿ”„ Model selector with live switching from the server
  • ๐Ÿ“œ Persistent transcript history in the menu bar (SQLite, local to your Mac, searchable by text/model/language)
  • ๐Ÿ” Auth token stored in macOS Keychain
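A local SQLite history that is searchable by text, model, and language could look roughly like this with Python's built-in `sqlite3`. The table name, columns, and the `open_history`/`add_transcript`/`search_history` helpers are hypothetical; this is not Phonos's actual schema.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS transcripts (
    id         INTEGER PRIMARY KEY,
    created_at TEXT NOT NULL DEFAULT (datetime('now')),
    model      TEXT NOT NULL,
    language   TEXT,
    text       TEXT NOT NULL
)
"""


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_transcript(conn: sqlite3.Connection, text: str, model: str,
                   language: Optional[str] = None) -> None:
    with conn:  # commits the insert on success
        conn.execute(
            "INSERT INTO transcripts (model, language, text) VALUES (?, ?, ?)",
            (model, language, text),
        )


def search_history(conn: sqlite3.Connection, query: str = "",
                   model: Optional[str] = None,
                   language: Optional[str] = None) -> list[str]:
    """Case-insensitive substring search (ASCII), newest first, with optional filters."""
    # Escape LIKE wildcards so a search for "100%" matches literally.
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    sql = "SELECT text FROM transcripts WHERE text LIKE ? ESCAPE '\\'"
    params: list = [f"%{escaped}%"]
    if model is not None:
        sql += " AND model = ?"
        params.append(model)
    if language is not None:
        sql += " AND language = ?"
        params.append(language)
    sql += " ORDER BY id DESC"
    return [row[0] for row in conn.execute(sql, params)]
```

Parameterized queries keep user search text out of the SQL string. For a large history, SQLite's FTS5 extension would be the usual step up from `LIKE`.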

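───

📋 Requirements

| Component | What you need |
|---|---|
| Server | Docker (or uv for the non-Docker setup), a CPU (or GPU if you're fancy 🧊) |
| Client | macOS 14+; Xcode 15+ only for building from source |
| Network | Tailscale, the same LAN, or localhost |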

───

📡 API

| Method | Path | Purpose |
|---|---|---|
| GET | /health | Server health + model info |
| GET | /models | List configured models |
| GET | /models/active | Get currently loaded model |
| PUT | /models/active | Switch active model |
| POST | /transcribe | Transcribe audio |

PUT /models/active and POST /transcribe require auth when PHONOS_AUTH_TOKEN is set.
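A small client for these endpoints can be sketched with the Python standard library. Two details are assumptions that this README does not confirm: that the token is sent as an `Authorization: Bearer` header, and that `PUT /models/active` takes a JSON body like `{"model": "small.en"}`. Check the server code before relying on either. The upload format for `POST /transcribe` is not documented here, so the sketch leaves it out.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8765"

# Endpoints that need the token when PHONOS_AUTH_TOKEN is set (per the note above).
AUTH_REQUIRED = {("PUT", "/models/active"), ("POST", "/transcribe")}


def build_request(method: str, path: str, token: Optional[str] = None,
                  body: Optional[dict] = None,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token and (method, path) in AUTH_REQUIRED:
        # Assumption: bearer-token scheme.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(base_url + path, data=data,
                                  headers=headers, method=method)


def call(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```

For example, `call(build_request("PUT", "/models/active", token="your-secret-token", body={"model": "small.en"}))` would switch models under the assumptions above, and `call(build_request("GET", "/models"))` lists what is configured. The token is attached only to endpoints that need it, so it never leaves the client more often than necessary.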

───

🔧 Server config

PHONOS_AUTH_TOKEN=          # leave empty to skip auth

PHONOS_MODEL=base.en
PHONOS_MODELS=tiny.en,base.en,small.en,medium.en,large-v3,turbo,distil-large-v3

PHONOS_DEVICE=cpu
PHONOS_COMPUTE_TYPE=int8
PHONOS_VAD_FILTER=true
PHONOS_TRANSCRIBE_TIMEOUT_SECONDS=600
PHONOS_MAX_UPLOAD_MB=100

Docker Compose binds to 127.0.0.1 by default. For remote access, set PHONOS_BIND=0.0.0.0 and PHONOS_AUTH_TOKEN.
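These settings are plain environment variables. Here is one way a server could read and validate them. The `Settings` dataclass and `load_settings` function are hypothetical, the defaults mirror the example values listed above, and treating "MB" as MiB is an assumption. Checking that `PHONOS_MODEL` appears in `PHONOS_MODELS` surfaces a typo at startup.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}
DEFAULT_MODELS = "tiny.en,base.en,small.en,medium.en,large-v3,turbo,distil-large-v3"


@dataclass(frozen=True)
class Settings:
    auth_token: Optional[str]
    model: str
    models: tuple[str, ...]
    device: str
    compute_type: str
    vad_filter: bool
    transcribe_timeout_seconds: float
    max_upload_bytes: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    models = tuple(
        m.strip() for m in env.get("PHONOS_MODELS", DEFAULT_MODELS).split(",") if m.strip()
    )
    model = env.get("PHONOS_MODEL", "base.en")
    if model not in models:
        raise ValueError(f"PHONOS_MODEL={model!r} is not listed in PHONOS_MODELS")
    return Settings(
        auth_token=env.get("PHONOS_AUTH_TOKEN") or None,  # empty value skips auth
        model=model,
        models=models,
        device=env.get("PHONOS_DEVICE", "cpu"),
        compute_type=env.get("PHONOS_COMPUTE_TYPE", "int8"),
        vad_filter=env.get("PHONOS_VAD_FILTER", "true").strip().lower() in TRUTHY,
        transcribe_timeout_seconds=float(env.get("PHONOS_TRANSCRIBE_TIMEOUT_SECONDS", "600")),
        # Assumption: "MB" means MiB.
        max_upload_bytes=int(float(env.get("PHONOS_MAX_UPLOAD_MB", "100")) * 1024 * 1024),
    )
```

Because `load_settings` accepts any mapping, it can be tested with a plain dict instead of mutating the real environment.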

───

🛡️ Privacy and security

  • Audio is recorded by the macOS app and uploaded only to the configured Phonos server.
  • The server does not require internet access after model files are downloaded and cached.
  • Set PHONOS_AUTH_TOKEN before binding the server to a LAN or private network interface.
  • The macOS auth token is stored in Keychain.
  • Temporary client recording files are removed after each transcription flow completes.
  • Transcript history is stored locally in a SQLite database (~/Library/Application Support/Phonos/history.sqlite) and can be cleared or disabled in Settings.
  • Server logs include request metadata and transcript text for debugging; run the server only where those logs are acceptable.
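The guarantee that temporary recordings are removed comes down to a try/finally around the upload. The real client is a native macOS app, so this Python version only illustrates the pattern. `record_to` and `transcribe` are hypothetical callables, and the `.wav` suffix is an assumption.

```python
import os
import tempfile
from typing import Callable


def transcribe_recording(record_to: Callable[[str], None],
                         transcribe: Callable[[str], str]) -> str:
    """Record into a temp file, transcribe it, and always delete the file."""
    fd, path = tempfile.mkstemp(prefix="phonos-", suffix=".wav")
    os.close(fd)  # the recorder reopens the path itself
    try:
        record_to(path)
        return transcribe(path)
    finally:
        # Runs on success and when recording or transcription raises.
        try:
            os.remove(path)
        except FileNotFoundError:
            pass
```

Putting the cleanup in `finally` means a server timeout or network error cannot leave audio behind on disk.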

โš ๏ธ Current limitations

  • Official macOS builds are ad-hoc signed and not notarized yet, so Gatekeeper may require manual approval on first launch.
  • Phonos is intended for localhost, LAN, or private networks such as Tailscale. Do not expose the server directly to the public internet.
  • Clipboard restoration is best-effort for complex clipboard contents, though normal text clipboard restore is supported.
  • Keychain integration tests are manual because unsigned test binaries can trigger macOS permission prompts.

───

📊 Models

The .en models and distil-large-v3 are English-only; turbo and large-v3 are multilingual. Larger models are more accurate but slower and need more memory.

| Model | Params | Notes |
|---|---|---|
| tiny.en | 39M | Fastest, lowest memory 🏃 |
| base.en | 74M | Fast, decent English quality |
| small.en | 244M | Good quality/speed, recommended CPU daily driver ✅ |
| medium.en | 769M | Better accuracy, handles harder speech |
| turbo | 809M | Speed-optimized, multilingual 🌍 |
| distil-large-v3 | 756M | Distilled large, strong English |
| large-v3 | 1550M | Highest quality, very slow on CPU 💀 |

Start with small.en for CPU usage. For higher quality, try distil-large-v3 (English) or turbo; turbo is also the faster choice for multilingual transcription. Use large-v3 only when the server has enough CPU/GPU capacity and memory.
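The model advice above can be condensed into a small helper, for example to pick a value for `PHONOS_MODEL`. The function name and its inputs are hypothetical. The rules encode only this README's guidance, not measured benchmarks.

```python
def recommend_model(multilingual: bool = False, higher_quality: bool = False,
                    plenty_of_capacity: bool = False) -> str:
    """Map this README's model guidance to a PHONOS_MODEL value (hypothetical)."""
    if plenty_of_capacity and (higher_quality or multilingual):
        return "large-v3"         # highest quality; needs real CPU/GPU headroom
    if multilingual:
        return "turbo"            # speed-optimized multilingual option
    if higher_quality:
        return "distil-large-v3"  # distilled large model, strong English
    return "small.en"             # recommended CPU daily driver
```

Spare capacity alone does not trigger an upgrade. large-v3 is chosen only when there is also a need for more quality or multilingual support.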

───

📄 License

MIT: do whatever, just keep the Greek in the README. Preferably the murder one.

───

made with ☕, 🎧, a mild obsession with terminal aesthetics, and a name that apparently means murder

About

Open-source macOS dictation powered by your own Whisper server. Speak freely, paste anywhere.
