Purpose

This project offers a privacy-focused way to transcribe and summarize audio recordings. It uses OpenAI's Whisper for transcription and local LLMs served by Ollama for summarization, and it processes audio files (MP3/WAV) entirely on your machine, so sensitive content never leaves your environment.

The tool automatically generates structured summaries including:

Executive overview
Detailed content breakdown
Action items
Meeting metadata

Note

This project is functional on Linux and Windows 11.

Linux Setup: Using Python Virtual Environment (Recommended)

Prerequisites

Install the system-level dependencies using your package manager:

# Debian/Ubuntu
sudo apt install python3 python3-venv python3-pip ffmpeg

# Fedora
sudo dnf install python3 python3-pip ffmpeg-free

# Arch
sudo pacman -S python python-pip ffmpeg

You also need Ollama installed and a model pulled (e.g., ollama pull llama3.1:8b).

Automated Setup

From the root of the cloned repository, run the install script:

# CPU-only PyTorch (works on any machine)
./install.sh

# Or, if you have an NVIDIA GPU with CUDA:
./install.sh --cuda

The script creates a virtual environment, installs all Python dependencies, and verifies your setup.

Linux Manual Setup

python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip

# CPU-only PyTorch:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Or CUDA PyTorch (see https://pytorch.org/get-started/locally/):
# pip install torch torchvision torchaudio

pip install -r requirements.txt
python pytorch_verify.py
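
The contents of pytorch_verify.py are not shown in this README. As a rough illustration of what such a check can look like, the sketch below reports whether PyTorch is importable and whether it can see a CUDA device. The function name describe_torch_setup is hypothetical and is not taken from the project.

```python
import importlib.util


def describe_torch_setup():
    """Report whether PyTorch is importable and whether it can see a CUDA GPU."""
    report = {"installed": False, "version": None, "cuda": False, "device": None}

    # find_spec avoids an ImportError when PyTorch is not installed yet.
    if importlib.util.find_spec("torch") is None:
        return report

    import torch

    report["installed"] = True
    report["version"] = str(torch.__version__)
    report["cuda"] = torch.cuda.is_available()
    if report["cuda"]:
        # Name of the first visible GPU.
        report["device"] = torch.cuda.get_device_name(0)
    return report


print(describe_torch_setup())
```

If "cuda" is False on a machine with an NVIDIA GPU, the CPU-only wheel was probably installed.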

Windows Automated Setup: Using Python Virtual Environment (Recommended)

If you are on Windows, you can use the included PowerShell script to automatically install Chocolatey, FFmpeg, Python 3.10, and all required Python dependencies.

Important: You must run this script directly from the root folder of the cloned repository.

Open PowerShell as an Administrator.
Navigate to your cloned project directory:

cd path\to\Ollama-Transcriber

Run the setup script using the following command (this temporarily bypasses Windows execution policies to allow the script to run):

powershell.exe -ExecutionPolicy Bypass -File .\install.ps1

The script will automatically check for missing dependencies, set up your Python virtual environment, and verify your GPU access.

Important

Restart Required: If you have VS Code, Cmder, Command Prompt, or any other terminal open while running the setup script, you must completely close and restart those applications (and sometimes reboot your computer) after the installation finishes.

Otherwise, your terminal will not recognize the newly installed ffmpeg command, and audio processing will fail.
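
To confirm that a freshly opened terminal can see ffmpeg, a short check like the following can help. It is a minimal sketch using only the Python standard library; the helper name find_ffmpeg is an assumption for illustration and is not part of the project.

```python
import shutil


def find_ffmpeg(command="ffmpeg"):
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which(command)


path = find_ffmpeg()
if path is None:
    print("ffmpeg not found on PATH; restart your terminal (or reboot) and try again.")
else:
    print(f"ffmpeg found at {path}")
```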

Windows Manual Setup: Direct Install

Select Python Interpreter Version

This project requires Python 3.8 or later. It is highly recommended to set up a virtual environment (python -m venv venv) before proceeding.

Install `ffmpeg` Globally as PowerShell Administrator

ffmpeg is required for Whisper to process audio files. Follow the official Chocolatey installation instructions to install Chocolatey from an Administrator PowerShell window, then install ffmpeg:

choco install ffmpeg

Requirements Installation

python -m pip install -r requirements.txt --no-warn-script-location

Enable Long Paths

From an Administrator PowerShell window, run the following:

New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" -Name "LongPathsEnabled" -Value 1 -PropertyType DWORD -Force

Download PyTorch with CUDA Support for GPU Acceleration

If you have an NVIDIA GPU, determine your compute platform by running: nvidia-smi.exe
Identify your "CUDA Version".
Navigate to: https://pytorch.org/get-started/locally/
Select options specific to your environment and run the provided install command.
Once installation is complete, verify your setup:

python pytorch_verify.py

Usage

LLM Customization

Install Ollama on your system and download your preferred model.
Modify the config.yaml file located in src/utils/config.yaml and specify the model you are using.
Refer to the Ollama documentation for details on other available options like num_ctx, num_predict, top_k, repeat_penalty, and num_gpu.

llm:
  model_name: "mistral:latest" # Choose your Ollama model (e.g., "mistral:latest", "llama3.1:8b")
  options:
    temperature: 0.3 # Controls response creativity (0.0-1.0). Higher values are more creative.
    top_p: 0.5 # Nucleus sampling cutoff (0.1-1). Lower values restrict sampling to the most likely tokens.
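
To show where these settings end up, the sketch below builds the JSON body for Ollama's /api/generate endpoint from a plain dict that mirrors the llm section above. How this project actually calls Ollama is not shown in the README, so the helper build_generate_payload and the dict LLM_CONFIG are assumptions for illustration only.

```python
import json

# Mirrors the llm section of config.yaml, written as a plain dict.
LLM_CONFIG = {
    "model_name": "mistral:latest",
    "options": {"temperature": 0.3, "top_p": 0.5},
}


def build_generate_payload(llm_config, prompt):
    """Build the JSON body for a POST request to Ollama's /api/generate endpoint."""
    return {
        "model": llm_config["model_name"],
        "prompt": prompt,
        "stream": False,  # ask for one complete response instead of streamed chunks
        "options": dict(llm_config.get("options", {})),
    }


payload = build_generate_payload(LLM_CONFIG, "Summarize the following transcript.")
print(json.dumps(payload, indent=2))
```

Additional options such as num_ctx or top_k would go into the same "options" mapping.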

Begin Ollama Server

Before running the transcriber, ensure your local Ollama server is running:

ollama serve
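
If you are unsure whether the server is already up, a quick probe such as the one below can tell you. It is a minimal sketch that assumes Ollama is listening on its default address, http://localhost:11434, and queries the /api/tags endpoint, which lists locally available models. The function name ollama_is_running is hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an Ollama server answers at base_url, otherwise False."""
    try:
        # Any 200 response from /api/tags means the server is reachable.
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a malformed reply: treat as "not running".
        return False
```

Calling ollama_is_running() before starting a long transcription avoids discovering a stopped server only at the summarization step.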

Run Project

To run the project, ensure your virtual environment is active, then use the main.py script:

python main.py [OPTIONS]

Available CLI commands and flags:

python main.py --gui: Use the graphical user interface (GUI) to select an audio file.
python main.py --audio path/to/recording.mp3: Process a specific audio file with default settings.
python main.py --audio path/to/recording.mp3 --language es: Specify the language of the audio file (e.g., Spanish) using ISO codes.
python main.py --audio path/to/recording.mp3 --output path/to/output --transcript medium: Specify the output directory and the Whisper model size for transcription.
python main.py --audio path/to/recording.mp3 --llm mistral:latest: Use a specific LLM model for summarization.
python main.py --audio path/to/recording.mp3 --output path/to/summaries --transcript medium --language es --llm mistral:latest: Full example utilizing multiple customized flags.
python main.py --help: Display the help menu with all available options.
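
For readers who want to see how flags like these fit together, here is a minimal argparse sketch that accepts the same flag names. It is not the project's actual parser: the help strings and the absence of defaults are assumptions, and main.py may define its options differently.

```python
import argparse


def build_parser():
    """Build a parser that accepts the flags documented above."""
    parser = argparse.ArgumentParser(
        description="Transcribe and summarize an audio file locally."
    )
    parser.add_argument("--gui", action="store_true", help="Select the audio file in a GUI.")
    parser.add_argument("--audio", help="Path to an MP3 or WAV file.")
    parser.add_argument("--language", help="ISO code of the spoken language, e.g. es.")
    parser.add_argument("--output", help="Directory for the results.")
    parser.add_argument("--transcript", help="Whisper model size, e.g. medium.")
    parser.add_argument("--llm", help="Ollama model for summarization, e.g. mistral:latest.")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(
    ["--audio", "path/to/recording.mp3", "--language", "es", "--transcript", "medium"]
)
print(vars(args))
```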

Results are stored in a results directory, created in the directory from which you run main.py. This directory will contain:

converted_audio/: Stores the audio files converted to the required format (if necessary).
transcribed_text/: Holds the raw .txt transcriptions of the audio files.
meeting_summaries/: Contains the generated LLM meeting summary files.
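
The layout above can be sketched with pathlib as follows. This is only an illustration of the directory structure; the helper name prepare_results_dir is hypothetical, and the project may create these folders differently.

```python
import tempfile
from pathlib import Path

RESULT_SUBDIRS = ("converted_audio", "transcribed_text", "meeting_summaries")


def prepare_results_dir(base_dir="."):
    """Create results/ and its three subdirectories under base_dir; return their paths."""
    results = Path(base_dir) / "results"
    paths = {}
    for name in RESULT_SUBDIRS:
        path = results / name
        # exist_ok=True makes repeated runs safe.
        path.mkdir(parents=True, exist_ok=True)
        paths[name] = path
    return paths


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    prepare_results_dir(tmp)
    created = sorted(p.name for p in (Path(tmp) / "results").iterdir())
    print(created)
```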

Supported Languages

Whisper supports nearly 100 languages. Pass the 2-letter ISO code using the --language flag (e.g., --language es).

Common language codes:

Code	Language	Code	Language	Code	Language
`en`	English	`es`	Spanish	`fr`	French
`de`	German	`it`	Italian	`pt`	Portuguese
`nl`	Dutch	`ja`	Japanese	`ko`	Korean
`zh`	Chinese	`ru`	Russian	`ar`	Arabic
`hi`	Hindi	`tr`	Turkish	`pl`	Polish

(Note: You can find the complete list of all 90+ supported ISO-639-1 codes in the official Whisper documentation.)
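
A small validation step can catch a mistyped code before a long transcription starts. The sketch below checks only the codes listed in the table above; Whisper accepts many more, so treat the dictionary as an illustrative subset. The names COMMON_LANGUAGES and normalize_language are hypothetical.

```python
# Codes from the table above; Whisper itself accepts many more.
COMMON_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French",
    "de": "German", "it": "Italian", "pt": "Portuguese",
    "nl": "Dutch", "ja": "Japanese", "ko": "Korean",
    "zh": "Chinese", "ru": "Russian", "ar": "Arabic",
    "hi": "Hindi", "tr": "Turkish", "pl": "Polish",
}


def normalize_language(code):
    """Return the lowercase code if it is in the table, otherwise raise ValueError."""
    normalized = code.strip().lower()
    if normalized not in COMMON_LANGUAGES:
        raise ValueError(f"Unknown language code: {code!r}")
    return normalized


print(normalize_language(" ES "), "->", COMMON_LANGUAGES[normalize_language(" ES ")])
```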

System Prompt Customization

Modify the config.yaml file located in src/utils/config.yaml to customize the exact structure and focus of the AI summary:

prompts:
  summary_prompt: | 
    Analyze the provided transcript and create a comprehensive Summary Report that captures all essential information.

    Structure the summary as follows:

    1. **EXECUTIVE OVERVIEW**
    - Synthesize core meeting purpose and outcomes

    2. **KEY DISCUSSION POINTS**
    - Present main topics chronologically with timestamps

    3. **ACTION ITEMS AND RESPONSIBILITIES**
    - List concrete tasks with clear ownership and deliverables

    4. **CONCLUSIONS AND NEXT STEPS**
    - Summarize achieved outcomes against objectives
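
The README does not show how the prompt and the transcript are combined before they are sent to the model. One plausible approach, sketched below under that assumption, is to append the transcript to the instructions under a clear label. The function name build_summary_request and the "TRANSCRIPT:" label are hypothetical.

```python
def build_summary_request(summary_prompt, transcript):
    """Join the summary instructions and the transcript into a single prompt string."""
    return f"{summary_prompt.strip()}\n\nTRANSCRIPT:\n{transcript.strip()}"


example = build_summary_request(
    "Analyze the provided transcript and create a comprehensive Summary Report.",
    "Alice: Let's ship on Friday.\nBob: I'll update the changelog.",
)
print(example)
```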
