Private Audio Transcriber (PAT)

Version 1.0 - Transcription only
Version 2.0 - Transcription and Translation

A lightweight, fully offline, multilingual dictation and transcription console for Mac.
Collaborative workflow. The system transcribes fast. You immediately fix any errors.
Supports batch processing.
Record in app, or drag and drop audio files.
Your data never leaves your device.
Powered by MLX-Whisper (whisper-turbo-mlx) for transcription and Tiny-Aya-Global (tiny-aya-global-8bit-mlx) for translation.

This tool is particularly valuable for professionals (doctors, lawyers, journalists) who need to convert audio recordings to text, but are restricted by law or ethics from sending their data to the cloud.

Version 1.0 (Transcription only)

YouTube Demo
https://www.youtube.com/watch?v=IsaXxHD7nfI

User-friendly, distraction-free interface
Transcriptions are displayed with original audio for easier checking

Version 2.0 (Transcription and Translation)

Quick translation
Supports more than 60 languages

Features

Runs offline: Data stays local.
Fully Transparent: All code files accessible for compliance auditing. No black-box executables. No proprietary wrappers.
Runs on Mac: Supports macOS on Apple Silicon.
Fast: Uses the Apple MLX framework.
Supports batch transcriptions: Drag and drop your audio files.
Free and Open Source: Ideal for high volume use cases where cloud costs add up fast.
Self-Contained Single-File Architecture: The frontend and backend code is contained in a single app.py file. This "see the entire picture at once" design makes the codebase easy to audit for security and privacy. It also makes the code highly maintainable through AI collaboration. Developers can share the entire codebase with an AI assistant in a single prompt. This enables them to add features or fix bugs immediately rather than logging GitHub issues and waiting for responses.
"Double-Click to Run" Accessibility: Through a simple .command MacOS script, the application can be launched without needing to use the command line. This makes it accessible to non-programmers.
Translation support: Version 2.0 has a translation feature that supports more than 60 languages.

Security

Local-Only Binding (Air-Gap Readiness)
The application includes a check_host validation layer that forces the server to bind strictly to 127.0.0.1 or localhost. This prevents the app from being exposed to an external network or the public internet.
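
The idea behind a host check like this can be sketched in a few lines. This is a minimal illustration, not the project's actual check_host code. The function name validate_bind_host and the ALLOWED_HOSTS constant are assumptions made for the example.

```python
# Hypothetical sketch of a local-only host check (not the code from app.py).
ALLOWED_HOSTS = {"127.0.0.1", "localhost"}


def validate_bind_host(host: str) -> str:
    """Return the normalized host if it is loopback-only, else raise ValueError."""
    normalized = host.strip().lower()
    if normalized not in ALLOWED_HOSTS:
        raise ValueError(f"Refusing to bind to non-local host: {host!r}")
    return normalized
```

Calling validate_bind_host("0.0.0.0") raises an error in this sketch, so a misconfigured host fails at startup instead of exposing the server to the network.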
Hardened Content Security Policy (CSP)
A strict CSP header is enforced on every response, restricting resource loading to 'self'. It explicitly manages media-src for secure blob URL audio playback while blocking common cross-site scripting (XSS) vectors.
Anti-Clickjacking Protection
The app implements the X-Frame-Options: DENY header, ensuring the interface cannot be embedded in an iframe on a malicious site to trick users into interacting with the microphone.
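
The two response headers described above might be set along these lines. This is a sketch under stated assumptions: the exact CSP string and the function name add_security_headers are illustrative and are not copied from app.py.

```python
# Hypothetical sketch: the header values below are assumptions based on the
# description above, not copied from app.py.
SECURITY_HEADERS = {
    # Load resources only from the app's own origin; allow blob: URLs for audio.
    "Content-Security-Policy": "default-src 'self'; media-src 'self' blob:",
    # Refuse to be rendered inside any frame or iframe.
    "X-Frame-Options": "DENY",
}


def add_security_headers(response_headers: dict) -> dict:
    """Return a copy of the response headers with the security headers set."""
    merged = dict(response_headers)
    merged.update(SECURITY_HEADERS)
    return merged
```

If the backend is a Flask app (an assumption here), logic like this would typically run in an after_request hook so that every response carries the headers.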
Custom Request Verification
The /transcribe endpoint requires a specific custom header (X-Requested-With: MedicalApp). This acts as a basic CSRF (Cross-Site Request Forgery) defense by ensuring requests originate from your frontend and not a simple cross-origin form submission.
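
A check of this kind could look roughly like the following. The function name has_valid_request_header is an assumption for the example; only the header name and value come from the description above.

```python
# Hypothetical sketch of the custom-header check (not the code from app.py).
REQUIRED_HEADER = "X-Requested-With"
REQUIRED_VALUE = "MedicalApp"


def has_valid_request_header(headers: dict) -> bool:
    """Return True only if the custom header is present with the expected value."""
    # HTTP header names are case-insensitive, so compare them in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    return lowered.get(REQUIRED_HEADER.lower()) == REQUIRED_VALUE
```

The defense works because a plain HTML form cannot attach custom headers, and a cross-origin script that tries to add one triggers a CORS preflight that the server does not approve.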
Automated Temporary File Cleanup
To protect patient privacy and data sovereignty, the app uses a finally block to ensure all uploaded audio files are deleted from the local disk immediately after transcription, regardless of whether the process succeeded or failed.
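
The cleanup pattern can be sketched as follows. This is a minimal illustration, assuming the upload arrives as bytes. The function name transcribe_upload and the transcribe_fn parameter are hypothetical and stand in for whatever app.py actually does.

```python
import os
import tempfile


def transcribe_upload(audio_bytes: bytes, suffix: str, transcribe_fn) -> str:
    """Write the upload to a temp file, transcribe it, and always delete it."""
    # delete=False so the transcriber can reopen the file by its path.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(audio_bytes)
        temp_path = tmp.name
    try:
        return transcribe_fn(temp_path)
    finally:
        # Runs on success and on failure alike.
        if os.path.exists(temp_path):
            os.remove(temp_path)
```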
Error Masking & Detailed Logging
The backend is configured to log detailed exception data to the server terminal while returning only generic, "safe" error messages to the client. This prevents "Information Leakage" where internal file paths or system configurations might be exposed to the user interface.
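
One way to separate what is logged from what is returned is shown below. It is a sketch only: the logger name, the wording of the generic message, and the function name safe_transcribe are all assumptions for illustration.

```python
import logging

logger = logging.getLogger("pat")  # hypothetical logger name

GENERIC_ERROR = "Transcription failed. Please try again."  # hypothetical wording


def safe_transcribe(transcribe_fn, audio_path: str) -> dict:
    """Run the transcription and hide internal error details from the client."""
    try:
        return {"ok": True, "text": transcribe_fn(audio_path)}
    except Exception:
        # The full traceback goes to the server log (the terminal) only.
        logger.exception("Transcription failed for an uploaded file")
        return {"ok": False, "error": GENERIC_ERROR}
```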
Input Validation & Payload Limiting
The server enforces a MAX_CONTENT_LENGTH of 100MB and performs strict file extension validation (.wav, .mp3, .m4a, .webm) to mitigate "Zip Bomb" style attacks or the execution of malicious scripts.
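
The two checks above can be sketched as a single helper. The function name is_allowed_upload is hypothetical, and treating 100MB as 100 * 1024 * 1024 bytes is an assumption; the extension list is taken from the description.

```python
import os

MAX_CONTENT_LENGTH = 100 * 1024 * 1024  # 100 MB (binary megabytes assumed)
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".webm"}


def is_allowed_upload(filename: str, size_in_bytes: int) -> bool:
    """Return True if the file has an allowed extension and fits the size limit."""
    # Only the final extension counts, so "note.mp3.exe" is rejected.
    extension = os.path.splitext(filename)[1].lower()
    return extension in ALLOWED_EXTENSIONS and 0 < size_in_bytes <= MAX_CONTENT_LENGTH
```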

How to Install and Run

Note: The instructions below are for version 1.0. The process is the same for version 2.0, except that two models are downloaded during installation (5.2 GB total).

In this section you will do the following:

Install the uv Python package manager
Install ffmpeg
Start the app by double clicking a file


--------------------------------------------------------------
System Requirements
--------------------------------------------------------------

Operating System: macOS
Computer: Apple Silicon Mac (M Series)
RAM: 8GB
Free disk Space: 2.5 GB

--------------------------------------------------------------
Step-by-Step Setup
--------------------------------------------------------------

If you already have uv and ffmpeg installed then please skip those steps.


1. Install ffmpeg
--------------------------------------------------------------

Use Homebrew (https://brew.sh/).

1. Open the terminal on your Mac
2. Paste in this line and press Enter:
brew install ffmpeg


2. Install uv
--------------------------------------------------------------

Paste this command into the terminal and press Enter:
(curl is used here because wget is not installed on a Mac by default.)
curl -LsSf https://astral.sh/uv/install.sh | sh
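
Once steps 1 and 2 are done, a short script like the one below can confirm that both tools are on your PATH. It is an optional sketch, not part of the project, and it assumes python3 is already available on your Mac. The function name find_missing_tools is made up for the example.

```python
import shutil


def find_missing_tools(tools=("ffmpeg", "uv")) -> list:
    """Return the names of tools that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Still to install:", ", ".join(missing))
    else:
        print("ffmpeg and uv are already installed.")
```

You may need to open a new terminal window after installing uv before it is found.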


3. Download the project folder and place it on your desktop
--------------------------------------------------------------

On GitHub click on "<> Code". Then select "Download Zip"
Download the project folder and unzip it.
Inside the main folder you will find a folder named: Private-Audio-Transcriber-v1.0
Place Private-Audio-Transcriber-v1.0 on your desktop.

4. Install the App
--------------------------------------------------------------

1. cd into the Private-Audio-Transcriber-v1.0 folder:
cd Desktop
cd Private-Audio-Transcriber-v1.0

2. Paste this command into the terminal and press Enter:
(This rewrites the file and changes the file permissions to make it executable.)

cat start-mac-app.command > temp && mv temp start-mac-app.command && chmod +x start-mac-app.command

3. Open the Private-Audio-Transcriber-v1.0 folder

4. Double click this file: start-mac-app.command

5. The app will automatically download all requirements and then open in your browser.
The whisper-turbo mlx model (1.61 GB) will also be downloaded.
The first time, the app may take about a minute to start.
After that it will start much faster.


--------------------------------------------------------------
Stopping the App
--------------------------------------------------------------

The app does not stop running when you close the browser tab.
To shut down the app, close the terminal window.
You can also stop the app by selecting the terminal window and pressing Ctrl+C.


--------------------------------------------------------------
Future startup
--------------------------------------------------------------

Now that the setup is complete, in future simply double-click the start-mac-app.command file to launch the app.
The project folder must be placed on your desktop before the app is launched.

Easy to customize

The code is simple. Someone with only a basic knowledge of Python (or an AI assistant) can modify the code to tailor the output to suit a particular use case. Only the run_transcription function (below) needs to be modified in the app.py file.

import re

import mlx_whisper


def run_transcription(audio_path):
    """
    Transcribes audio using the mlx_whisper model and highlights dictation keywords.
    """
    result = mlx_whisper.transcribe(
        audio_path,
        # Make sure this points to your local model directory
        path_or_hf_repo="models/whisper-turbo-mlx"
    )

    text = result['text'].strip()
    language = result['language']

    if language == 'en':

        # Keywords to be highlighted. This is case-insensitive.
        dictation_keywords = [
            'comma',
            'period',
            'colon',
            'new paragraph',
            'end of note'
        ]

        highlighted_text = text
        for keyword in dictation_keywords:
            # Use re.sub with a lambda function to wrap the found word
            # while preserving its original casing (e.g., "Comma" becomes "<Comma>").
            # Word boundaries (\b) ensure we don't highlight parts of other words.
            pattern = r'\b(' + re.escape(keyword) + r')\b'
            highlighted_text = re.sub(
                pattern,
                lambda match: f"<{match.group(1)}>",
                highlighted_text,
                flags=re.IGNORECASE
            )

        return highlighted_text

    else:
        # Other languages are returned without highlighting.
        return text

For example, you can add logic to fix errors that the transcriber routinely makes. Or, if the language is Spanish, you might want to highlight certain words in the text so they will be easier to see and quicker to edit.

elif language == 'es':
    dictation_keywords = ['coma', 'punto', 'nuevo párrafo']
    # ... apply same highlighting logic ...
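
Both customizations mentioned above can be sketched as small helper functions. This is an illustration under stated assumptions, not code from app.py: the names CORRECTIONS, KEYWORDS_BY_LANGUAGE, apply_corrections and highlight_keywords are hypothetical, and the correction entries are made-up examples.

```python
import re

# Hypothetical examples; replace these with the errors you actually see.
CORRECTIONS = {
    "hyper tension": "hypertension",
    "milli grams": "milligrams",
}

# Keyword lists per detected language code.
KEYWORDS_BY_LANGUAGE = {
    "en": ["comma", "period", "colon", "new paragraph", "end of note"],
    "es": ["coma", "punto", "nuevo párrafo"],
}


def apply_corrections(text: str) -> str:
    """Replace known mis-transcriptions, ignoring case."""
    for wrong, right in CORRECTIONS.items():
        # A function replacement avoids any special handling of backslashes.
        text = re.sub(re.escape(wrong), lambda _m, r=right: r, text, flags=re.IGNORECASE)
    return text


def highlight_keywords(text: str, language: str) -> str:
    """Wrap the dictation keywords for the given language in angle brackets."""
    keywords = KEYWORDS_BY_LANGUAGE.get(language, [])
    if not keywords:
        return text
    # Longest first so multi-word keywords win over shorter ones they contain.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = r"\b(" + "|".join(re.escape(k) for k in ordered) + r")\b"
    return re.sub(pattern, lambda m: f"<{m.group(1)}>", text, flags=re.IGNORECASE)
```

With helpers like these, run_transcription could end with a single line such as return highlight_keywords(apply_corrections(text), language). Adding a language then means adding one entry to the dictionary rather than another elif branch.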

Improve Privacy and Reliability

Turn on FileVault Disk Encryption

FileVault is a built-in macOS security feature that provides full-disk encryption. This ensures that if a Mac is lost or stolen the files cannot be accessed.

System Settings > Privacy & Security > Scroll down to FileVault

Turn off telemetry on your Mac

System Settings > Privacy & Security > Analytics & Improvements
(This does not block all telemetry.)

Disable automatic download and installation of macOS updates

This will ensure that you don't wake up one morning and discover that a new OS has been installed and it has negatively impacted the operation of your Mac.

System Settings > General > Software Update > Scroll down to Automatic Updates > Click on "i"

Notes

Transcription quality varies depending on the language.
Whisper Turbo automatically detects the language being spoken.

References

mlx-community/whisper-turbo
https://huggingface.co/mlx-community/whisper-turbo
MLX-Whisper
https://github.com/ml-explore/mlx-examples/tree/main/whisper

Discussion Forum

Feel free to share your thoughts and experiences. Click on the "Discussions" tab above to open the discussion forum for this project.

Revision History

Version 1.0
14-Feb-2026
Prototype. Released for testing.

Version 2.0
22-Feb-2026
Prototype. Added translation feature. Released for testing.

Rough Notes

When the input text contains multiple paragraphs, tiny-aya-global-8bit-mlx sometimes fails to translate the last paragraph. It seems that the model incorrectly thinks that it has reached the end of the input text. It could be that the quantization is causing this instability. The BF16 model might be fine.
The app can fail completely with large audio files (75 MB). Lesson learned: Test the system under full load, replicating the conditions under which the app will be used.
