🎙️🗣️🔪 φόνος — phonos

phónos (φόνος): murder, slaughter, homicide (yes, really). Sounds a lot like phōnḗ (φωνή): voice, sound, speech.

Speak freely. Your words, where you need them.

A Wispr Flow–style dictation tool that runs entirely on hardware you control. Press a hotkey, talk, and watch your words appear in whatever app you're in. No cloud. No subscriptions. No mystery box between your microphone and your text.

Beta: usable for local/Tailscale dictation, but still early and ad-hoc signed.

Phonos slays subscription dictation — your wallet gets to stay alive. ☠️

───

🗡️ Why Phonos?

The name means "murder" in Greek. We are keeping the bit, just making the software around it more serious.

| ☁️ Cloud dictation | 🔪 Phonos |
| --- | --- |
| Audio is processed by a remote service | Audio goes only to your configured server |
| Subscription required | Free and open source |
| Latency depends on internet and vendor load | Bounded by your hardware and model choice |
| One model fits all | Pick the model for your hardware |
| Vendor behavior is opaque | Open source and auditable |
| Surprise product decisions | You run the thing |

───

⚡ How it works

1. Press a hotkey: hold it down or toggle, your call.
2. Talk: your Mac captures the audio.
3. Whisper transcribes: your server runs faster-whisper in a dedicated subprocess.
4. Text appears: pasted directly into the app you were using, with clipboard fallback.

───

🏗️ Architecture

The server runs the active Whisper model in a dedicated subprocess. The main FastAPI process communicates with that worker through local queues. When you switch models via PUT /models/active, the old worker is stopped and a new worker is started with the requested model, allowing the operating system to reclaim model memory cleanly.

Same battle-tested idea as one-process-per-model serving, minus the orchestration ceremony for a local dictation box.
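
A minimal sketch of the stop-and-restart pattern described above, not the actual Phonos code. It makes several assumptions: the worker is a stand-in that only echoes text instead of loading faster-whisper, it talks to the child over stdin/stdout pipes rather than the local queues the real server uses, and `ModelWorker` and `switch_model` are hypothetical names.

```python
import subprocess
import sys

# Stand-in worker script (assumption): a real worker would load a
# faster-whisper model once at startup instead of just echoing input.
WORKER_SOURCE = r"""
import sys
model = sys.argv[1]                      # "load" the model once
for line in sys.stdin:                   # one request per line
    print(f"[{model}] {line.strip()}", flush=True)
"""


class ModelWorker:
    """Owns one child process that serves exactly one model."""

    def __init__(self, model: str) -> None:
        self.model = model
        self.proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, model],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def request(self, payload: str) -> str:
        # Send one line to the worker and block until it answers.
        self.proc.stdin.write(payload + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().strip()

    def stop(self) -> None:
        # Closing stdin ends the worker loop; once the process exits,
        # the operating system reclaims all of its memory.
        self.proc.stdin.close()
        self.proc.wait(timeout=10)
        self.proc.stdout.close()


def switch_model(current: ModelWorker, new_model: str) -> ModelWorker:
    """Stop the old worker, then start a fresh one for the new model."""
    current.stop()
    return ModelWorker(new_model)
```

The point of the design is that model memory is never freed piecemeal inside a long-lived process: switching models ends the process that held the old one.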

The server is intended for localhost, LAN, or private-network use such as Tailscale. Set PHONOS_AUTH_TOKEN before exposing it beyond localhost.

───

🚀 Quick start

Server

The “server” can be another machine on your LAN/Tailscale network, or just a Docker container running locally on the same Mac. For local-only use, set the Mac app Server URL to http://localhost:8765.

git clone https://github.com/jb381/phonos && cd phonos/apps/server
cp .env.example .env          # optional: set PHONOS_AUTH_TOKEN
docker compose up -d          # boom, transcription server on :8765

Or run the published server image directly:

docker run -d \
  --name phonos-server \
  -p 8765:8765 \
  -e PHONOS_AUTH_TOKEN=your-secret-token \
  -v phonos_models:/root/.cache/huggingface \
  ghcr.io/jb381/phonos-server:latest

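Without Docker (run from apps/server; --host 0.0.0.0 listens on every interface, so set PHONOS_AUTH_TOKEN first, or use --host 127.0.0.1 for local-only use):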

uv sync
uv run uvicorn phonos_server.main:app --host 0.0.0.0 --port 8765

macOS client

From a release — download the latest Phonos-*.dmg from the Releases page, open it, and drag Phonos.app to Applications.

From source:

cd apps/macos
./dev-run.sh                  # fast dev loop: build, quit old app, launch new app
./build.sh                    # creates Phonos.dmg and Phonos.app
open Phonos.dmg               # then drag to Applications

For day-to-day macOS development, prefer ./dev-run.sh. It builds a debug app bundle at .build/dev/Phonos.app, ejects any mounted Phonos DMG volumes, quits the currently running app, and opens the freshly built one. It avoids the installer DMG loop entirely.

Releases are triggered by git tag vX.Y.Z && git push --tags. CI builds, ad-hoc signs, and publishes a DMG automatically.

No Apple Developer account yet = ad-hoc signing. Gatekeeper may complain on first launch — right-click the app and choose Open, or go to System Settings → Privacy & Security and click Open Anyway. Accessibility permission may need to be re-granted after ad-hoc rebuilds.

To silence Gatekeeper from the terminal, sudo may be needed because the app lives in /Applications:
sudo xattr -dr com.apple.quarantine /Applications/Phonos.app
The grown-up version of this is Developer ID signing + notarization. It is on the roadmap.

macOS does not allow scripts to grant Microphone or Accessibility permissions. The practical development fix is stable signing: set PHONOS_CODESIGN_IDENTITY or install/use an Apple Development identity so the first manual grant sticks across rebuilds. Ad-hoc signing changes the app's code requirement often enough that macOS may ask for permissions again.

───

✨ Features

🎙️ Menu-bar status item with recording/transcribing/paste state
⌨️ Global hotkey (Control-Space by default, customizable)
🎮 Hold-to-record and toggle recording modes
📋 Direct auto-paste into the previously active application
🧯 Clipboard fallback when Accessibility permission is not granted
🛠️ First-run setup for permissions and server connection
🔄 Model selector with live switching from the server
📜 Persistent transcript history in the menu bar (SQLite, local to your Mac, searchable by text/model/language)
🔐 Auth token stored in macOS Keychain

───

📋 Requirements

| Component | What you need |
| --- | --- |
| Server | Docker, a CPU (or GPU if you're fancy 🧊) |
| Client | macOS 14+, Xcode 15+ |
| Network | Localhost, same LAN, or Tailscale |

───

📡 API

| Method | Path | Purpose |
| --- | --- | --- |
| GET | `/health` | Server health + model info |
| GET | `/models` | List configured models |
| GET | `/models/active` | Get currently loaded model |
| PUT | `/models/active` | Switch active model |
| POST | `/transcribe` | Transcribe audio |

PUT /models/active and POST /transcribe require auth when PHONOS_AUTH_TOKEN is set.
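
As an illustration of calling the model-switch endpoint from a script, here is a hedged sketch using only the Python standard library. The README does not document request bodies or the auth header format, so the `Bearer` scheme, the `{"model": ...}` JSON body, and a JSON response are all assumptions; `build_switch_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request


def build_switch_request(
    base_url: str, model: str, token: str = ""
) -> urllib.request.Request:
    """Build (but do not send) a PUT /models/active request."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the server expects a Bearer token.
        headers["Authorization"] = f"Bearer {token}"
    # Assumption: the server reads the target model from a "model" field.
    body = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/models/active",
        data=body,
        headers=headers,
        method="PUT",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send a prepared request and decode the (assumed) JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running locally, `send(build_switch_request("http://localhost:8765", "small.en", token))` would perform the switch; check the server source for the real payload shape before relying on this.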

───

🔧 Server config

PHONOS_AUTH_TOKEN=          # leave empty to skip auth

PHONOS_MODEL=base.en
PHONOS_MODELS=tiny.en,base.en,small.en,medium.en,large-v3,turbo,distil-large-v3

PHONOS_DEVICE=cpu
PHONOS_COMPUTE_TYPE=int8
PHONOS_VAD_FILTER=true
PHONOS_TRANSCRIBE_TIMEOUT_SECONDS=600
PHONOS_MAX_UPLOAD_MB=100

Docker Compose binds to 127.0.0.1 by default. For remote access, set PHONOS_BIND=0.0.0.0 and PHONOS_AUTH_TOKEN.
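
To make the variables above concrete, here is a minimal sketch of reading them into a typed settings object. It is not the server's actual loader: the `Settings` class and `load_settings` function are hypothetical, the defaults simply mirror the values listed above, and rejecting a `PHONOS_MODEL` that is missing from `PHONOS_MODELS` is an assumed rule.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

DEFAULT_MODELS = "tiny.en,base.en,small.en,medium.en,large-v3,turbo,distil-large-v3"


@dataclass(frozen=True)
class Settings:
    auth_token: str
    model: str
    models: Tuple[str, ...]
    device: str
    compute_type: str
    vad_filter: bool
    transcribe_timeout_seconds: int
    max_upload_mb: int


def _as_bool(value: str) -> bool:
    # Accept the usual truthy spellings found in .env files.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read PHONOS_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    raw_models = env.get("PHONOS_MODELS", DEFAULT_MODELS)
    models = tuple(m.strip() for m in raw_models.split(",") if m.strip())
    model = env.get("PHONOS_MODEL", "base.en")
    if model not in models:
        # Assumed validation: the active model must be one of the configured ones.
        raise ValueError(f"PHONOS_MODEL {model!r} is not listed in PHONOS_MODELS")
    return Settings(
        auth_token=env.get("PHONOS_AUTH_TOKEN", ""),
        model=model,
        models=models,
        device=env.get("PHONOS_DEVICE", "cpu"),
        compute_type=env.get("PHONOS_COMPUTE_TYPE", "int8"),
        vad_filter=_as_bool(env.get("PHONOS_VAD_FILTER", "true")),
        transcribe_timeout_seconds=int(
            env.get("PHONOS_TRANSCRIBE_TIMEOUT_SECONDS", "600")
        ),
        max_upload_mb=int(env.get("PHONOS_MAX_UPLOAD_MB", "100")),
    )
```

Passing a plain dictionary instead of the process environment keeps the loader easy to exercise in tests.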

───

🛡️ Privacy and security

Audio is recorded by the macOS app and uploaded only to the configured Phonos server.
The server does not require internet access after model files are downloaded and cached.
Set PHONOS_AUTH_TOKEN before binding the server to a LAN or private network interface.
The macOS auth token is stored in Keychain.
Temporary client recording files are removed after each transcription flow completes.
Transcript history is stored locally in a SQLite database (~/Library/Application Support/Phonos/history.sqlite) and can be cleared or disabled in Settings.
Server logs include request metadata and transcript text for debugging; run the server only where those logs are acceptable.
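
The token rule above (auth is skipped when PHONOS_AUTH_TOKEN is empty, required otherwise) could be checked along these lines. This is a sketch under assumptions, not the server's real code: `is_authorized` is a hypothetical name and the `Bearer` header scheme is assumed, since the README does not specify it.

```python
import hmac
from typing import Optional


def is_authorized(authorization_header: Optional[str], configured_token: str) -> bool:
    """Return True if a request may proceed under the configured token."""
    if not configured_token:
        # An empty PHONOS_AUTH_TOKEN means auth is disabled.
        return True
    if not authorization_header:
        return False
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer":  # assumed header scheme
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(
        presented.strip().encode("utf-8"),
        configured_token.encode("utf-8"),
    )
```

Using `hmac.compare_digest` rather than `==` is a small habit that matters once the server is reachable from other machines on the network.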

⚠️ Current limitations

Official macOS builds are ad-hoc signed and not notarized yet, so Gatekeeper may require manual approval on first launch.
Phonos is intended for localhost, LAN, or private networks such as Tailscale. Do not expose the server directly to the public internet.
Clipboard restoration is best-effort for complex clipboard contents, though normal text clipboard restore is supported.
Keychain integration tests are manual because unsigned test binaries can trigger macOS permission prompts.

───

📊 Models

The default lineup leans English: models with the .en suffix are English-only, while turbo and large-v3 are multilingual. Larger models are more accurate but slower and need more memory.

| Model | Params | Notes |
| --- | --- | --- |
| `tiny.en` | 39M | Fastest, lowest memory 🏃 |
| `base.en` | 74M | Fast, decent English quality |
| `small.en` | 244M | Good quality/speed, recommended CPU daily driver ✅ |
| `medium.en` | 769M | Better accuracy, handles harder speech |
| `turbo` | 809M | Speed-optimized, multilingual 🌍 |
| `distil-large-v3` | 756M | Distilled large, strong English |
| `large-v3` | 1550M | Highest quality, very slow on CPU 💀 |

Start with small.en for CPU usage. Try distil-large-v3 for higher-quality English, or turbo if you also need multilingual transcription. Use large-v3 only when the server has enough CPU/GPU capacity and memory.
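
For a rough feel of "enough memory", weight storage scales with parameter count times bytes per weight. The sketch below is back-of-the-envelope arithmetic only: it covers a few rows of the table, counts weights and nothing else (activations, audio buffers, and runtime overhead come on top), and `weight_megabytes` and `largest_model_within` are hypothetical helpers.

```python
from typing import Optional

# Parameter counts in millions for a few models from the table.
PARAMS_MILLIONS = {
    "tiny.en": 39,
    "small.en": 244,
    "medium.en": 769,
    "large-v3": 1550,
}

# Bytes needed to store one weight at each compute type.
BYTES_PER_WEIGHT = {"int8": 1, "float16": 2, "float32": 4}


def weight_megabytes(model: str, compute_type: str = "int8") -> int:
    """Approximate weight size in MB (10**6 bytes); weights only."""
    # Millions of parameters times bytes per parameter gives megabytes.
    return PARAMS_MILLIONS[model] * BYTES_PER_WEIGHT[compute_type]


def largest_model_within(budget_mb: int, compute_type: str = "int8") -> Optional[str]:
    """Pick the largest listed model whose weights fit in budget_mb."""
    fitting = [
        name
        for name in PARAMS_MILLIONS
        if weight_megabytes(name, compute_type) <= budget_mb
    ]
    if not fitting:
        return None
    return max(fitting, key=lambda name: PARAMS_MILLIONS[name])
```

Under these assumptions small.en at int8 needs roughly 244 MB for weights, while large-v3 at float16 needs roughly 3100 MB, which is why the default int8 setting suits CPU-only boxes.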

───

📄 License

MIT — do whatever, just keep the Greek in the README. Preferably the murder one.

───

made with ☕, 🎧, a mild obsession with terminal aesthetics, and a name that apparently means murder
