CUNY-AI-Lab/pdf-accessibility-app

PDF Accessibility App

An automated PDF remediation tool from the CUNY AI Lab that converts uploaded PDFs into accessible, PDF/UA-1 compliant documents.

Overview

Upload a PDF and the app automatically remediates it through a multi-step pipeline: classification, OCR, structure extraction, semantic analysis, accessible tagging, and validation. Output is gated by veraPDF compliance checks and fidelity analysis to ensure quality. Documents that can't be fully remediated are flagged for manual review.

Pipeline

  1. Classify — Determine whether the PDF is digital, mixed, or scanned
  2. OCR — Add searchable text to scanned pages (OCRmyPDF) with automatic language detection
  3. Structure — Extract document structure via Docling, with LLM-assisted TOC enhancement
  4. Alt Text — Generate alt text for figures and reclassify misidentified elements using a vision LLM
  5. Tag — Resolve ambiguous semantics (tables, forms, reading order, grounded text) via LLM, then write PDF/UA structure tags deterministically with pikepdf
  6. Validate — Check PDF/UA-1 compliance with veraPDF
  7. Fidelity — Verify output faithfulness (text drift, reading order, table coverage, form labels)
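Conceptually, the stages run in sequence with compliance gates between them. A minimal sketch of that control flow (the function names and dict-based document shape here are hypothetical, not the app's actual API; the real implementations live in backend/app/pipeline/):

```python
# Hypothetical sketch of the gated pipeline: each stage returns the
# (possibly updated) document plus a pass/fail flag, and any failing
# gate routes the job to manual review instead of producing output.
def run_pipeline(doc, stages):
    """Run stages in order; flag the document when a gate fails."""
    for name, stage in stages:
        doc, ok = stage(doc)
        if not ok:
            return doc, f"manual_review:{name}"
    return doc, "compliant"


# Toy stages standing in for classify/validate:
stages = [
    ("classify", lambda d: (d | {"kind": "scanned"}, True)),
    ("validate", lambda d: (d, d.get("kind") == "scanned")),
]
```

The point is the gating shape: every stage can stop the run, so a non-compliant document is never silently emitted.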

Tech Stack

| Layer | Technology |
| --- | --- |
| Backend | Python 3.12, FastAPI, SQLAlchemy (async SQLite) |
| Frontend | React, TypeScript, Vite, Tailwind CSS 4, TanStack Query |
| PDF Processing | pikepdf, OCRmyPDF, Ghostscript, Poppler, QPDF |
| Structure Extraction | Docling (local or docling-serve) |
| Semantic Analysis | Gemini Developer API (gemini-3-flash-preview) |
| OCR | OCRmyPDF, Tesseract |
| Validation | veraPDF |

Prerequisites

On macOS, install the system dependencies via Homebrew. On Ubuntu/Debian, install ghostscript, poppler-utils, and tesseract-ocr, plus a Java runtime for veraPDF.
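As a sketch, the install might look like this (the qpdf and default-jre package names are assumptions based on the tech stack; veraPDF itself is typically installed from its own distribution):

```shell
# Ubuntu/Debian system dependencies (veraPDF needs a Java runtime;
# install veraPDF separately from verapdf.org).
sudo apt-get update
sudo apt-get install -y ghostscript poppler-utils tesseract-ocr qpdf default-jre

# macOS equivalents via Homebrew:
# brew install ghostscript poppler tesseract qpdf
```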

Getting Started

1. Configure environment

cp .env.example .env
# Edit .env — at minimum, set GEMINI_API_KEY

Key environment variables:

| Variable | Description | Default |
| --- | --- | --- |
| GEMINI_API_KEY | Google Gemini API key for direct PDF understanding and fallback chat-completions calls | |
| LLM_BASE_URL | Gemini Developer API chat-completions base URL | https://generativelanguage.googleapis.com/v1beta/openai |
| LLM_API_KEY | Optional override for the chat-completions client; falls back to GEMINI_API_KEY when unset | |
| LLM_MODEL | Model identifier | google/gemini-3-flash-preview |
| GEMINI_MODEL | Direct Gemini model identifier for native PDF lanes | gemini-3-flash-preview |
| GEMINI_DIRECT_THINKING_LEVEL | Default Gemini thinking level for direct PDF semantic lanes | low |
| GEMINI_DIRECT_ALT_TEXT_THINKING_LEVEL | Gemini thinking level override for figure semantics and alt text | medium |
| ALT_TEXT_MAX_CONCURRENCY | Maximum concurrent page-level figure/alt-text LLM requests per PDF | 8 |
| ALT_TEXT_GLOBAL_MAX_CONCURRENCY | Process-wide cap for concurrent figure/alt-text provider work across PDFs | 12 |
| DOCLING_SERVE_URL | Local or remote docling-serve URL for structure extraction | |
| DOCLING_SERVE_TOKEN | Optional bearer token for a protected docling-serve proxy | |
| OCR_LANGUAGE | Default Tesseract language code | eng |
| JOB_TTL_HOURS | Hours before jobs expire | 12 |
| VERAPDF_PATH | Path to veraPDF binary | verapdf |
| GHOSTSCRIPT_PATH | Path to Ghostscript binary | gs |

2. Install dependencies

cd backend && uv sync
cd ../frontend && bun install

3. Run locally

# Terminal 1 — backend
cd backend
uv run uvicorn app.main:app --reload --port 8001

# Terminal 2 — frontend
cd frontend
bun dev

The frontend proxies /api and /health to the backend via Vite config.

Recommended Mac Runtime

For the main app on a Mac, the intended setup is:

  • LLM semantics through the Gemini Developer API
  • structure extraction through local docling-serve
  • Apple GPU acceleration through MPS on the docling-serve process when available

Set DOCLING_SERVE_URL=http://localhost:5001 in .env, and start docling-serve with DOCLING_DEVICE=mps. The structure step will use that server. The later PDF tagging/writing step is still local CPU work.
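A sketch of that setup, assuming docling-serve is installed and its CLI accepts these flags (check `docling-serve --help` for your version):

```shell
# In the app's .env:
#   DOCLING_SERVE_URL=http://localhost:5001

# Start docling-serve on the Apple GPU via MPS (command and flag names
# may differ across docling-serve versions).
DOCLING_DEVICE=mps docling-serve run --port 5001
```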

You can verify the effective runtime with:

cd backend
PYTHONPATH=. uv run python scripts/runtime_diagnostics.py

Docker

A single-container deployment bundles all dependencies (Ghostscript, OCRmyPDF, Tesseract, Poppler, QPDF, Java, veraPDF) with the built frontend served by FastAPI.

cp .env.example .env
# Edit .env with your GEMINI_API_KEY
# Leave LLM_API_KEY empty unless you intentionally want a different
# chat-completions credential than the Gemini Developer API key.

docker compose up -d --build

Open http://localhost:8080. Health check at /health.

If port 8080 is in use, set APP_PORT in .env.
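For example, to confirm the container is serving:

```shell
# Exits non-zero if the app's health endpoint is not responding.
curl -fsS http://localhost:8080/health
```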

You can also run the image directly without Compose:

docker build -t pdf-accessibility-app .
docker run -d \
  --name pdf-accessibility-app \
  --env-file .env \
  -p 8080:8001 \
  -v pdf_accessibility_data:/app/data \
  -v pdf_accessibility_cache:/home/app/.cache \
  pdf-accessibility-app

Notes:

  • The image preloads Docling models so there are no first-run downloads.
  • The intended Gemini-first deployment shape (leave LLM_API_KEY blank):

    GEMINI_API_KEY=<key>
    LLM_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai
    LLM_API_KEY=
    LLM_MODEL=google/gemini-3-flash-preview
    GEMINI_MODEL=gemini-3-flash-preview
    USE_DIRECT_GEMINI_PDF=true
    GEMINI_DIRECT_THINKING_LEVEL=low
    GEMINI_DIRECT_ALT_TEXT_THINKING_LEVEL=medium
    ALT_TEXT_MAX_CONCURRENCY=8
    ALT_TEXT_GLOBAL_MAX_CONCURRENCY=12
  • For subpath deployments, set VITE_APP_BASE_PATH before building (e.g., /pdf-accessibility/).
  • Tesseract language packs included: English, Spanish, French, German, Chinese (Simplified + Traditional), Russian, Arabic, Korean, Bengali, Polish, Hebrew, Yiddish, Haitian Creole, Hindi, Italian, Portuguese, Japanese. Add others by extending the Dockerfile.

Project Structure

backend/
  app/
    api/              # FastAPI route handlers
    pipeline/         # classify, ocr, structure, tag, validate, fidelity
    services/         # semantic adjudication, storage, LLM client
    models.py         # SQLAlchemy ORM models
    config.py         # App settings
  tests/              # Backend test suite

frontend/
  src/
    pages/            # Upload, Dashboard, JobDetail, Review
    components/       # UI components
    api/              # TanStack Query hooks
    types/            # Shared TypeScript types

data/                 # Runtime storage (git-ignored)

Testing

# Backend
cd backend
PYTHONPATH=. uv run pytest tests -q

# Frontend
cd frontend
bun run lint
bun run build

OCR Language Support

The app auto-detects the document language during classification and selects the appropriate Tesseract language pack for OCR. For digital/mixed PDFs, it extracts existing text and identifies the language with lingua-py. For scanned PDFs, it runs a quick probe OCR on page 1 with all installed language packs, then identifies the language from the result.

Language priority: the auto-detected language takes precedence over the OCR_LANGUAGE default.

For local development, install Tesseract language packs via your package manager. On macOS, brew install tesseract-lang installs all languages. On Debian/Ubuntu, install individual packs (e.g., apt install tesseract-ocr-spa). If a language pack is missing, probe OCR falls back gracefully to the OCR_LANGUAGE default.

Session Model

The app uses anonymous browser sessions — no login required. Each browser gets an HTTP-only session cookie, and all jobs are scoped to that session. Jobs expire after JOB_TTL_HOURS (default: 12 hours).
