jjscholtes/FreeWhispr
FreeWhispr

FreeWhispr is a local macOS transcription app for recording or importing audio, generating transcripts, assigning speakers, and exporting results. It keeps the workflow simple with fast recording, searchable sessions and folders, and built-in editing tools for speaker names and transcript cleanup.

Current State & Technical Stack

  • macOS app (SwiftUI + AVFoundation) for recording/importing audio, session folders, transcript editing, speaker naming/reassignment, and exports
  • ASR (speech-to-text) runs locally with whisper.cpp, including Apple Silicon acceleration (Metal/GPU and Core ML encoder support when available)
  • Speaker separation (diarization) runs locally with pyannote.audio in a Python worker runtime (Hugging Face token required for gated pyannote model access)
  • Worker communication uses JSONL IPC between the Swift app and the local Python worker (setup validation, progress events, jobs, errors)
  • Storage is local in macOS Application Support (audio, transcript JSON/TXT/SRT, logs, metrics)
  • Security: the Hugging Face token is stored in the macOS Keychain (not in session/transcript JSON files)
  • Packaging scripts build a macOS .app, .zip, and .dmg, with optional on-demand speaker runtime install to reduce download size
  • Quality: includes versioned JSON contract examples and basic tests for session store, exports, reconciliation, and worker IPC
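The JSONL IPC between the Swift app and the Python worker can be sketched as follows. This is an illustrative example only: the event names and fields ("type", "job_id", "progress") are assumptions, not FreeWhispr's actual message contract.

```python
import json

# Hypothetical JSONL stream a worker might emit: one JSON object per line.
raw = "\n".join([
    '{"type": "setup", "whisper": "available", "pyannote": "available"}',
    '{"type": "progress", "job_id": "job-1", "progress": 0.5}',
    '{"type": "done", "job_id": "job-1"}',
])

def parse_events(stream):
    """Parse newline-delimited JSON into a list of event dicts,
    skipping blank lines."""
    return [json.loads(line) for line in stream.splitlines() if line.strip()]

for event in parse_events(raw):
    print(event["type"])
```

The appeal of JSONL for this kind of IPC is that each line is an independent, self-describing message, so the app can stream progress events as they arrive instead of waiting for one large response.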

Speaker Recognition (Speaker Separation) Setup - Step by Step

FreeWhispr can transcribe audio locally without extra account setup, but speaker separation (who said what) uses pyannote models hosted on Hugging Face and requires one-time access setup.

For people using the downloaded app (recommended)

If you installed FreeWhispr by downloading FreeWhispr.zip and dragging the app into Applications, follow these steps (no terminal needed):

1. Install and open FreeWhispr

  • Download FreeWhispr.zip
  • Unzip it
  • Drag FreeWhispr.app into Applications
  • Open FreeWhispr

2. Open Settings in FreeWhispr

  • Open Settings
  • Go to Diarization Setup / Model Access
  • Leave "Enable diarization by default" turned on (or toggle it on later per recording)

3. Create a Hugging Face token (one-time)

Speaker separation uses gated pyannote models, so you need a Hugging Face account + token.

  • Create or sign in to a Hugging Face account
  • Generate an access token with Read access
  • Copy the token

4. Request/accept access to the gated pyannote model (one-time)

Open the model page and request/accept access:

  • pyannote/speaker-diarization-community-1

If Hugging Face prompts for terms/approval, complete that with the same account that created your token.

5. Paste the token into FreeWhispr

  • In Settings, paste the token into Hugging Face token (pyannote)
  • Click Save settings
  • Click Validate setup

Validation should report something like:

  • whisper.cpp: available
  • pyannote: available
  • HF token: present
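Conceptually, Validate setup performs checks like the following. This is a hedged sketch under assumed inputs (the parameter names, paths, and labels are illustrative), not the app's actual implementation:

```python
import os

def validate_setup(whisper_path, pyannote_importable, hf_token):
    """Return a status map similar to the validation summary in Settings.
    The parameters and labels here are illustrative assumptions."""
    return {
        "whisper.cpp": "available" if os.path.exists(whisper_path) else "missing",
        "pyannote": "available" if pyannote_importable else "missing",
        "HF token": "present" if hf_token else "missing",
    }

# Example: a missing whisper.cpp binary shows up as "missing".
print(validate_setup("/path/to/whisper-cli", True, "hf_example_token"))
```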

6. Test speaker separation

  • Record or import an audio file
  • Make sure speaker separation/diarization is enabled
  • Process the recording

The first run may take longer while models are downloaded/cached locally.
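Under the hood, speaker separation boils down to labeling each transcript segment with the diarization turn it overlaps most. A minimal sketch of that reconciliation step (the data shapes are assumptions, not FreeWhispr's actual transcript format):

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the overlap between two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(segments, turns):
    """Label each ASR segment with the diarization speaker whose turn
    overlaps it the most."""
    labeled = []
    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg["start"], seg["end"],
                                                t["start"], t["end"]))
        labeled.append({**seg, "speaker": best["speaker"]})
    return labeled

# Illustrative ASR segments and diarization turns:
segments = [{"start": 0.0, "end": 2.0, "text": "Hello."},
            {"start": 2.5, "end": 5.0, "text": "Hi there."}]
turns = [{"start": 0.0, "end": 2.2, "speaker": "SPEAKER_00"},
         {"start": 2.2, "end": 5.0, "speaker": "SPEAKER_01"}]

for seg in assign_speakers(segments, turns):
    print(seg["speaker"], seg["text"])
# → SPEAKER_00 Hello.
# → SPEAKER_01 Hi there.
```

This also illustrates why very short clips produce weak splits: with little audio per speaker, the turns are short and the overlaps ambiguous.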

7. If it fails (common fixes)

Error: 401 Cannot access gated repo ... pyannote/speaker-diarization-community-1

  • Your token is present, but your account does not yet have access to the gated model
  • Go back to the model page and make sure access was approved/accepted
  • Confirm you used the same Hugging Face account for the token
  • Re-run Validate setup and try again

Error: whisper.cpp: missing

  • If you are using the packaged app, reinstall/update FreeWhispr and try Validate setup again
  • If you are running from source, run:
./scripts/install_whispercpp.sh --download-model large-v3-turbo
  • Restart the app and click Validate setup again

Transcription works, but no speaker labels

  • Make sure diarization/speaker separation is enabled
  • Very short clips or low-quality audio may produce weak speaker splits

Microphone records silence

  • Check macOS permissions: System Settings -> Privacy & Security -> Microphone
  • Allow access for FreeWhispr

8. Notes

  • The Hugging Face token is stored in the macOS Keychain (not in your transcript files).
  • You can still use FreeWhispr without speaker separation by disabling diarization.
  • Transcription and diarization run locally after the required models are installed and cached.
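The Keychain note above can be illustrated with the macOS `security` CLI. This sketch only builds the command; the service and account names are hypothetical, and FreeWhispr's actual Keychain item names may differ:

```python
def keychain_store_cmd(service, account, secret):
    """Build (not run) the `security add-generic-password` argument list.
    The -U flag updates the item in place if it already exists."""
    return ["security", "add-generic-password",
            "-s", service, "-a", account, "-w", secret, "-U"]

# Hypothetical item names for illustration only:
cmd = keychain_store_cmd("FreeWhispr", "huggingface-token", "<token>")
print(" ".join(cmd[:2]))
# → security add-generic-password
```

Storing the token this way keeps it out of plain-text files, which is why it never appears in session or transcript JSON.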

Running FreeWhispr from source (developer setup)

If you are running FreeWhispr from the source repo instead of using the packaged app, install the local worker dependencies first:

./scripts/install_worker_deps.sh

This creates a local Python environment (.venv313) and installs the worker dependencies (including pyannote.audio).

Then install whisper.cpp (default ASR backend) and a local model:

./scripts/install_whispercpp.sh --download-model large-v3-turbo

Then start the app from source:

cd app
swift run

Developer Notes

  • The packaged app is the easiest way to get started.
  • The source workflow is mainly for development and debugging.

Tests

./scripts/swift_test.sh
python3 -m unittest discover -s worker/tests -p 'test_*.py'

If ./scripts/swift_test.sh reports an SDK/compiler mismatch, update/select a matching Xcode/Command Line Tools installation. The script already works around the local module-cache permission issue.
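A worker test in that suite might take a shape like the following hypothetical unittest. The test case and event fields are assumptions for illustration; the real tests live under worker/tests and may differ:

```python
import json
import unittest

class TestWorkerEvents(unittest.TestCase):
    """Hypothetical shape of a worker IPC test: a progress event
    must survive a JSONL serialize/parse round trip."""

    def test_progress_event_roundtrip(self):
        event = {"type": "progress", "job_id": "job-1", "progress": 0.25}
        line = json.dumps(event)  # one JSONL line
        self.assertEqual(json.loads(line), event)

# Run the case programmatically instead of via unittest.main().
result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(TestWorkerEvents))
```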
