Nei1TH/PDFlens

PDFlens (RAG)

Offline PDF Reading Assistant: convert PDFs to Markdown, build a local vector index, and chat with your documents (with image understanding) on your machine.

Features

  • PDF → Markdown via Docling (OCR + tables)
  • Optional image understanding (Moondream via Ollama)
  • Local FAISS vector index (per folder), incremental updates and persistence
  • Simple UI (Tkinter, via CustomTkinter) with a collapsible sidebar and favorites
  • Fully offline: uses local models via Ollama

Quickstart (macOS)

1) Prerequisites

  • Python 3.10+
  • Ollama installed and running
# Install Ollama (macOS)
brew install ollama
# Start the server; it runs in the foreground, so keep this terminal open
ollama serve

2) Clone & Setup

git clone https://github.com/<your-user>/PDFlens.git
cd PDFlens

python3 -m venv .venv
source .venv/bin/activate

# Core deps
pip install -U pip
pip install customtkinter pillow docling langchain-community langchain-huggingface langchain-ollama faiss-cpu

3) Pull Models

# LLM for answering
ollama pull qwen2.5:7b

# Vision model for image descriptions
ollama pull moondream
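
To confirm that both pulls succeeded before launching the app, the sketch below queries Ollama's local `/api/tags` endpoint and reports which required models are missing. It assumes the default address `http://localhost:11434`; the helper names `missing_models` and `fetch_tags` are illustrative and are not part of PDFlens.

```python
import json
import urllib.request

# Models named in the pull commands above.
REQUIRED_MODELS = ("qwen2.5:7b", "moondream")


def missing_models(tags_payload, required=REQUIRED_MODELS):
    """Return the required models that are absent from an /api/tags payload."""
    installed = {m["name"] for m in tags_payload.get("models", [])}
    missing = []
    for name in required:
        # Ollama lists untagged pulls as "<name>:latest", so accept that form too.
        candidates = {name} if ":" in name else {name, f"{name}:latest"}
        if not candidates & installed:
            missing.append(name)
    return missing


def fetch_tags(base_url="http://localhost:11434"):
    """Fetch the list of locally installed models (assumes the default port)."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        gaps = missing_models(fetch_tags())
    except (OSError, ValueError) as exc:
        # Connection refused or timeout usually means `ollama serve` is not running.
        print(f"Ollama not reachable: {exc}")
    else:
        print("All models present" if not gaps else f"Missing: {', '.join(gaps)}")
```

If the script reports a missing model, rerun the matching `ollama pull` command.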

4) Run

python3 app.py
  • Click “Import Folder” and choose a folder containing PDFs (non‑recursive: reads PDFs in the folder root).
  • The app will:
    1. Convert PDFs to Markdown (next to each PDF, same filename with .md)
    2. Build/update a FAISS index in <folder>/.rag_storage/
    3. Let you ask questions; answers quote snippets and show source files
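
The file layout produced by these steps can be sketched as follows. This is a minimal illustration rather than the app's actual code: `plan_import` is a hypothetical helper, and the rule that a PDF is reconverted when its Markdown file is missing or older than the PDF is an assumption about how incremental updates might be decided.

```python
from pathlib import Path


def plan_import(folder):
    """Pair each PDF in the folder root with its Markdown target.

    Returns (index_dir, plan), where plan is a list of
    (pdf_path, md_path, needs_conversion) tuples.
    """
    root = Path(folder)
    plan = []
    # glob (not rglob) keeps the scan non-recursive: folder root only.
    for pdf in sorted(root.glob("*.pdf")):
        if not pdf.is_file():
            continue
        # Same filename, .md extension, next to the PDF.
        md = pdf.with_suffix(".md")
        # Assumption: convert when the .md is missing or older than the PDF.
        stale = not md.exists() or md.stat().st_mtime < pdf.stat().st_mtime
        plan.append((pdf, md, stale))
    # The FAISS index lives in <folder>/.rag_storage/
    return root / ".rag_storage", plan


if __name__ == "__main__":
    index_dir, plan = plan_import(".")
    print(f"Index directory: {index_dir}")
    for pdf, md, stale in plan:
        print(f"{pdf.name} -> {md.name} ({'convert' if stale else 'up to date'})")
```

Running it inside a folder of PDFs prints which files would be converted and where the index would be stored, without touching any file.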

Tips & Troubleshooting

  • Chinese text appears garbled in Markdown
    • Many PDFs use CID fonts without proper ToUnicode mapping. In such cases, rely on OCR.
    • Ensure Docling can use an OCR engine with Chinese language data (e.g., EasyOCR/Tesseract).
    • If downloads are blocked, preinstall OCR models or configure Docling OCR options.
  • No PDFs found
    • The app scans only the selected folder’s root for *.pdf. Move files to the root or extend scanning logic.
  • Model not found
    • Make sure ollama serve is running and you’ve pulled qwen2.5:7b and moondream.
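
For the "No PDFs found" case, one way to extend the scanning logic to subfolders is sketched below. `find_pdfs` is a hypothetical helper, not a function from the app; skipping hidden directories (so that nothing under `.rag_storage/` is picked up) is an assumption about the desired behavior.

```python
from pathlib import Path


def find_pdfs(folder, recursive=False):
    """List PDFs in the folder root, or in all subfolders when recursive=True."""
    root = Path(folder)
    matches = root.rglob("*.pdf") if recursive else root.glob("*.pdf")
    found = []
    for path in matches:
        # Directory components between the root and the file itself.
        parents = path.relative_to(root).parts[:-1]
        # Assumption: ignore hidden directories such as .rag_storage.
        if path.is_file() and not any(part.startswith(".") for part in parents):
            found.append(path)
    return sorted(found)


if __name__ == "__main__":
    for pdf in find_pdfs(".", recursive=True):
        print(pdf)
```

With `recursive=False` this matches the current root-only behavior, so the flag can be switched on without changing results for flat folders.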
