Offline PDF Reading Assistant: convert PDFs to Markdown, build a local vector index, and chat with your documents (with image understanding) on your machine.
- PDF → Markdown via Docling (OCR + tables)
- Optional image understanding (Moondream via Ollama)
- Local FAISS vector index (per folder), incremental updates and persistence
- Simple UI (Tkinter) with collapsible sidebar and favorites
- Fully offline: uses local models via Ollama
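The feature list does not say how incremental updates decide which PDFs to re-process. The sketch below shows one common approach; it is not taken from the app. It assumes a hypothetical `manifest.json` that stores each PDF's size and modification time, and the helper names are also hypothetical. Each new run compares against that manifest, so only new or changed files are reconverted and re-embedded.

```python
import json
from pathlib import Path


def fingerprint(pdf: Path) -> list[int]:
    """Cheap change detector: file size and modification time in nanoseconds."""
    st = pdf.stat()
    return [st.st_size, st.st_mtime_ns]  # a list, so it round-trips through JSON unchanged


def plan_update(folder: Path, manifest_path: Path) -> tuple[list[Path], list[str]]:
    """Return (PDFs to (re)index, names of PDFs whose chunks should be dropped).

    Assumes a hypothetical manifest mapping PDF filename -> [size, mtime_ns].
    """
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    current = {p.name: p for p in sorted(folder.glob("*.pdf"))}
    changed = [p for name, p in current.items() if old.get(name) != fingerprint(p)]
    removed = sorted(name for name in old if name not in current)
    return changed, removed


def save_manifest(folder: Path, manifest_path: Path) -> None:
    """Record fingerprints once the index has been updated successfully."""
    data = {p.name: fingerprint(p) for p in sorted(folder.glob("*.pdf"))}
    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps(data, indent=2))
```

With this scheme, you would reconvert and re-embed the changed PDFs and delete chunks for removed files from the FAISS store. Only then would `save_manifest` record the new state. Size plus mtime is cheap to check but can miss an edit that preserves both. Hashing the file contents is slower but stricter.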
- Python 3.10+
- Ollama installed and running
```bash
# Install Ollama (macOS)
brew install ollama
# Start the Ollama server (runs in the foreground; keep this terminal open)
ollama serve
```

```bash
git clone https://github.com/<your-user>/PDFassistant.git
cd PDFassistant
python3 -m venv .venv
source .venv/bin/activate
# Core deps
pip install -U pip
pip install customtkinter pillow docling langchain-community langchain-huggingface langchain-ollama faiss-cpu
```

```bash
# LLM for answering
ollama pull qwen2.5:7b
# Vision model for image descriptions
ollama pull moondream
```

```bash
python3 app.py
```

- Click “Import Folder” and choose a folder containing PDFs (non‑recursive: only PDFs in the folder root are read).
- The app will:
  - Convert each PDF to Markdown (saved next to the PDF, same filename with a `.md` extension)
  - Build or update a FAISS index in `<folder>/.rag_storage/`
  - Let you ask questions; answers quote snippets and show their source files
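The scanning and output-path rules above can be expressed with `pathlib`. This is a sketch, not the app's code. The names `find_pdfs`, `markdown_path_for` and `index_dir_for` and the `recursive` flag are hypothetical. The recursive branch shows one way to extend the scanning logic to subfolders while skipping the `.rag_storage` index folder.

```python
from pathlib import Path

STORAGE_DIR = ".rag_storage"  # index directory name given in this README


def find_pdfs(folder: Path, recursive: bool = False) -> list[Path]:
    """List PDF files in `folder`.

    Default: folder root only (the app's documented behavior).
    recursive=True: hypothetical extension that also walks subfolders.
    """
    candidates = folder.rglob("*.pdf") if recursive else folder.glob("*.pdf")
    return sorted(
        p
        for p in candidates
        # Skip directories named *.pdf and anything inside the index folder.
        if p.is_file() and STORAGE_DIR not in p.relative_to(folder).parts
    )


def markdown_path_for(pdf: Path) -> Path:
    """Markdown output sits next to the PDF: report.pdf -> report.md."""
    return pdf.with_suffix(".md")


def index_dir_for(folder: Path) -> Path:
    """Location of the persisted FAISS index for an imported folder."""
    return folder / STORAGE_DIR
```

`glob("*.pdf")` matches only the folder root. `rglob("*.pdf")` walks every subfolder, which is why the sketch excludes `.rag_storage` explicitly.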
- Chinese text appears garbled in Markdown
  - Many PDFs use CID fonts without a proper ToUnicode mapping, so the embedded text cannot be decoded to the right characters. In such cases, rely on OCR.
  - Ensure Docling can use an OCR engine with Chinese language data (e.g., EasyOCR or Tesseract).
  - If OCR model downloads are blocked, preinstall the OCR models or configure Docling's OCR options.
- No PDFs found
  - The app scans only the selected folder’s root for `*.pdf`. Move the files to the root, or extend the scanning logic to search subfolders.
- Model not found
  - Make sure `ollama serve` is running and that you’ve pulled `qwen2.5:7b` and `moondream`.