Cross-platform desktop application for extracting vocabulary from images and creating Anki flashcards. Everything runs locally using llama.cpp and the Gemma 3 4B model. No cloud dependencies, no API keys, fully offline.
The application is composed of two layers: a Flutter GUI that provides the user interface and a Python FastAPI backend that handles vision OCR and vocabulary enrichment through llama.cpp.
| Layer | Technology | Purpose |
|---|---|---|
| Flutter GUI | Dart, Material 3 | Desktop interface (Linux, macOS, Windows) |
| Python API | FastAPI, llama.cpp | Vision OCR and text enrichment backend |
| Vision OCR | llama-mtmd-cli | Extract text from images (GPU accelerated) |
| Text tasks | llama-server | Definitions, examples, vocabulary enrichment |
| Model | Gemma 3 4B QAT Q4_0 | Single model for both vision and text |
Release binaries require no pre-installed dependencies. On first launch the app will automatically download:
- Python runtime (~30 MB) — a portable copy, cached locally
- AI model (~3.2 GB) — Gemma 3 4B, one-time download
The only system requirement is GTK 3 on Linux.
To build from source, you need Nix installed with flakes enabled.
Grab the latest tarball from the releases page, extract, and run:
tar xzf ocr-to-anki-v0.2.0-linux-x86_64.tar.gz
cd ocr-to-anki-v0.2.0-linux-x86_64
# GTK3 is required at runtime.
# On Ubuntu/Debian: sudo apt install libgtk-3-0
# On Fedora: sudo dnf install gtk3
# On NixOS: already available
./run.sh
First launch — fully automatic setup
On first run the app detects what's missing and guides you through two one-click downloads:
- "Python runtime needed — Download Python" (~30 MB, if Python is not already installed)
- "Model download required — Download now" (~3.2 GB)
Both downloads are cached locally and only happen once. After that the app starts instantly.
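The cache check behind those one-click downloads can be sketched as follows. This is illustrative only: the cache paths, file names, and expected sizes are assumptions, not the app's real values.

```python
# Sketch of the first-launch caching logic (illustrative; the real
# paths, names, and sizes used by the app may differ).
from pathlib import Path

# Hypothetical cache entries: (relative path, expected size in bytes)
CACHE_ITEMS = {
    "python-runtime": ("runtime/python", 30 * 1024 * 1024),
    "model": ("models/gemma-3-4b-it-q4_0_s.gguf", 2_400_000_000),
}

def needs_download(cache_dir: Path, name: str) -> bool:
    """Return True when the cached artifact is missing or truncated."""
    rel_path, expected_size = CACHE_ITEMS[name]
    target = cache_dir / rel_path
    # A partial file left by an interrupted first launch triggers a retry.
    return not target.exists() or target.stat().st_size < expected_size
```

Checking the on-disk size against an expected minimum is what makes the download resumable-safe: a half-finished model file is treated the same as a missing one.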
You can also download the model manually with the bundled script:
./scripts/setup-llama-cpp.sh
git clone https://github.com/stradichenko/ocr-to-anki.git
cd ocr-to-anki
# 1. Download the model and vision projector (~3 GB total, one time)
nix develop
./scripts/setup-llama-cpp.sh
# 2. Build the Flutter app
nix develop .#flutter
cd app
flutter pub get
flutter build linux --release
# The binary is at: app/build/linux/x64/release/bundle/ocr_to_anki
For a distributable tarball that bundles the backend source:
nix develop .#flutter --command ./scripts/build-flutter.sh linux
# Output: output/release/ocr-to-anki-v0.1.0-linux-x86_64.tar.gz
Or as a pure Nix derivation:
nix build .#flutter-app
./result/bin/ocr-to-anki
See docs/building.md for macOS, Windows, and advanced build options.
| File | Size | Source |
|---|---|---|
| gemma-3-4b-it-q4_0_s.gguf | ~2.4 GB | stduhpf/google-gemma-3-4b-it-qat-q4_0-gguf-small |
| mmproj-model-f16-4B.gguf | ~812 MB | stduhpf/google-gemma-3-4b-it-qat-q4_0-gguf-small |
Both files are downloaded by ./scripts/setup-llama-cpp.sh via direct URL; no authentication is required.
Quantization-Aware Training (QAT) produces roughly 15% better perplexity than standard post-training Q4_0 quantization at the same size. The stduhpf repack also fixes broken control token metadata.
- Select context: handwritten or printed text, or highlighted words (pick colour)
- Add images through the file picker or drag and drop
- Vision OCR: Gemma 3 extracts words from the image
- Enrich: the LLM generates definitions and example sentences
- Review: edit the generated cards before export
- Export: send to Anki via AnkiConnect, or save as TSV/JSON
The Flutter app manages the backend process automatically. When you launch the app, it spawns the FastAPI server and waits until it reports healthy. No manual server management is needed.
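The "wait until healthy" handshake amounts to polling /health with a retry budget. A minimal sketch, with the probe injected so the loop is testable (in the app it would issue GET http://127.0.0.1:8000/health):

```python
# Sketch of the startup handshake: poll a health probe until it
# succeeds or the retry budget runs out. The probe callable stands in
# for an HTTP GET against the backend's /health endpoint.
import time
from typing import Callable

def wait_for_healthy(probe: Callable[[], bool],
                     retries: int = 30, delay: float = 0.5) -> bool:
    """Return True as soon as `probe` succeeds; False after `retries` tries."""
    for _ in range(retries):
        if probe():
            return True
        time.sleep(delay)
    return False
```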
If you prefer to run the backend separately:
nix develop
PYTHONPATH=src uvicorn src.api.app:app --host 0.0.0.0 --port 8000
Edit config/settings.yaml to customize the backend:
ai_backend:
  type: 'llama_cpp'
  llama_cpp:
    host: '127.0.0.1'
    port: 8090
    context_size: 4096
    n_gpu_layers: -1
    mmproj_offload: false  # set true when using the OpenCL backend
Most settings are also available through the in-app Settings screen.
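Conceptually, the in-app Settings screen just overlays user choices on the YAML defaults. The merge helper below is hypothetical (not the app's actual code), but it shows the idea with the keys from the config above:

```python
# Hypothetical sketch of overlaying Settings-screen values on the
# config/settings.yaml defaults. Keys mirror the YAML above.
def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay `overrides` on `defaults` without mutating either."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

defaults = {
    "ai_backend": {
        "type": "llama_cpp",
        "llama_cpp": {"host": "127.0.0.1", "port": 8090,
                      "context_size": 4096, "n_gpu_layers": -1,
                      "mmproj_offload": False},
    }
}
# A user switching to the OpenCL backend overrides a single key:
user = {"ai_backend": {"llama_cpp": {"mmproj_offload": True}}}
settings = deep_merge(defaults, user)
```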
Flutter desktop does not support cross-compilation. Each platform must be built
on its native OS. The CI/CD workflow at .github/workflows/build.yml handles
this using platform-specific runners.
| Build host | Linux | macOS | Windows |
|---|---|---|---|
| Linux | yes | no | no |
| macOS | no | yes | no |
| Windows | no | no | yes |
Requires a Mac with Xcode installed:
nix develop .#flutter
cd app && flutter pub get && flutter build macos --release
Requires Visual Studio 2022 with the "Desktop development with C++" workload:
cd app
flutter pub get
flutter build windows --release
Push a version tag to trigger builds for all three platforms:
git tag v0.2.0
git push origin v0.2.0
This creates a draft GitHub Release with Linux, macOS, and Windows artifacts. See docs/building.md for the full reference.
The vision backend requires llama-mtmd-cli built with GPU support:
# OpenCL (recommended for Intel integrated GPUs)
nix develop .#sycl
./scripts/build-llama-mtmd-opencl.sh
# Vulkan (fallback, see note below)
./scripts/build-llama-mtmd-vulkan.sh
Auto-detection picks the best available backend: CUDA, Metal, OpenCL, Vulkan, then CPU.
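That preference order reduces to a first-match scan over the detected backends. A minimal sketch, assuming detection has already produced a set of available backend names (the real logic lives in src/backends/auto_detect.py):

```python
# Sketch of the backend preference scan. `available` would come from
# probing the system (CUDA toolkit, Metal, OpenCL ICDs, Vulkan loader);
# here it is passed in directly for clarity.
PREFERENCE = ["cuda", "metal", "opencl", "vulkan", "cpu"]

def pick_backend(available: set[str]) -> str:
    """Return the most preferred backend that is actually present."""
    for backend in PREFERENCE:
        if backend in available:
            return backend
    return "cpu"  # CPU always works as the final fallback
```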
| Backend | Vision encoder | Encode time | Text gen | Binary |
|---|---|---|---|---|
| OpenCL | correct | ~2 min (GPU) | 4.1 tok/s | llama-mtmd-cli-opencl |
| Vulkan | corrupted | 0.4s (garbage) | 3.6 tok/s | llama-mtmd-cli |
| CPU | correct | ~43 min | 0.7 tok/s | any binary with --no-mmproj-offload |
OpenCL is roughly 20x faster than CPU vision and produces correct output. It requires a one-line patch for Intel work group sizes, applied automatically by the build script. See patches/opencl-intel-workgroup-fix.patch.
Vulkan corruption details
On Intel integrated GPUs (for example UHD Graphics CML GT2), the Vulkan compute backend produces corrupted output from the SigLIP vision encoder. Text generation works fine on Vulkan; only the vision projector is affected.
Root cause: Intel Vulkan compute shaders produce f16 underflow and overflow in the CLIP/SigLIP transformer. Debug embeddings show 75%+ of values saturate to exactly -1.0 (clamped NaN/inf). This is a known class of bug on integrated GPUs.
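The "75%+ of values saturate to exactly -1.0" observation comes from a check like the following: measure what fraction of a debug embedding is stuck at the clamp value. This is an illustrative re-creation, not the project's actual debug tooling.

```python
# Illustrative version of the saturation check: count how much of a
# vision-embedding vector is clamped to exactly -1.0 (NaN/inf clamped
# by the broken f16 path).
def saturation_fraction(embedding: list[float], clamp: float = -1.0) -> float:
    """Fraction of embedding values stuck at the clamp value."""
    if not embedding:
        return 0.0
    return sum(1 for v in embedding if v == clamp) / len(embedding)

# A healthy embedding has essentially no exact -1.0 entries; the
# corrupted Intel Vulkan output is dominated by them.
```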
If you have a discrete NVIDIA GPU, Vulkan and CUDA both work fine. Set
mmproj_offload: true in config/settings.yaml.
GET /health Backend status
GET /backends Detected GPU hardware
POST /ocr/vision Vision OCR (base64 image)
POST /ocr/vision/upload Vision OCR (file upload)
POST /generate Raw text generation
POST /enrich Vocabulary enrichment (definitions + examples)
POST /pipeline/image-to-cards Full pipeline: image to OCR to enrich to Anki cards
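For orientation, the /enrich request and response shapes might look like the dataclasses below. These are hedged sketches: the real Pydantic models live in src/api/models.py and may name fields differently.

```python
# Hedged sketch of possible /enrich request/response shapes.
# The authoritative Pydantic models are in src/api/models.py.
from dataclasses import dataclass, field

@dataclass
class EnrichRequest:
    words: list[str]          # vocabulary extracted by the OCR step
    language: str = "en"      # language for definitions and examples

@dataclass
class EnrichedWord:
    word: str
    definition: str
    example: str              # one generated example sentence

@dataclass
class EnrichResponse:
    results: list[EnrichedWord] = field(default_factory=list)
```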
app/                              Flutter GUI application
  lib/
    main.dart                     Entry point and routing
    models/                       Data models (AnkiNote, AppSettings, HighlightColor)
    services/                     Business logic
      inference_service.dart      LLM inference (talks to FastAPI)
      highlight_detector.dart     HSV highlight colour detection
      anki_export_service.dart    AnkiConnect and JSON export
      backend_server_service.dart Backend process lifecycle
    database/                     Drift (SQLite) local storage
    providers/                    Riverpod state management
    screens/                      Home, Processing, Review, Settings, History
src/                              Python backend
  api/
    app.py                        FastAPI endpoints and lifespan hooks
    models.py                     Pydantic request/response models
  backends/
    auto_detect.py                GPU and backend auto-detection
    mtmd_cli.py                   llama-mtmd-cli wrapper (vision, subprocess)
    llama_cpp_server.py           llama-server wrapper (text, persistent HTTP)
  preprocessing/
    highlight_cropper.py          HSV highlight detection (Python reference)
  workflows/                      End-to-end pipelines
  output/                         Anki export and JSON output
config/
  settings.yaml                   All configuration
scripts/                          Build and setup scripts
  build-flutter.sh                Build Flutter for Linux/macOS/Windows
  bundle-backend.sh               Bundle Python backend with PyInstaller
  setup-llama-cpp.sh              Download model and vision projector
  build-llama-mtmd-*.sh           Build llama-mtmd-cli with various GPU backends
docs/
  building.md                     Full build and release documentation
nix develop # Default: Python backend development
nix develop .#flutter # Flutter app build and development
nix develop .#cuda # With CUDA toolkit
nix develop .#sycl      # With Intel OneAPI/SYCL and OpenCL
nix build .#flutter-app # Flutter Linux desktop binary
nix build .#backend # Nix-wrapped Python backend
nix build .#bundle # Complete distribution (GUI + backend + launcher)
nix build .#dockerImage # Docker image for server deployment