# deepseek-ocr-cli

Command-line tool for OCR using DeepSeek vision models. Supports Ollama (local) and vLLM (GPU server) backends.

## Features
- Multi-backend: Ollama (local, free) and vLLM (OpenAI-compatible API)
- Supports PDFs and images (JPG, PNG, WEBP, GIF, BMP, TIFF)
- Per-document output folders with figures
- Batch processing with incremental resume (skips already-processed files)
- Retry with exponential backoff for transient failures
- Parallel page processing for faster PDF OCR
- `--dry-run` to preview files before processing
- Clean markdown output with HTML tables converted to markdown
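The retry behavior can be pictured with a minimal sketch (illustrative only; the function and parameter names here are assumptions, not the tool's actual code):

```python
import time

def with_retries(request, max_retries=3, retry_delay=1.0):
    """Retry a flaky call with exponential backoff: delay, 2*delay, 4*delay, ...
    Illustrative sketch, not the tool's implementation."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted; surface the transient failure
            time.sleep(retry_delay * (2 ** attempt))
```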
## Requirements

- Python 3.10+
- Ollama installed and running (for Ollama backend)
- `deepseek-ocr` model pulled in Ollama
## Installation

Install Ollama and pull the model:

```bash
# macOS/Linux
brew install ollama
# Or download from https://ollama.ai

ollama pull deepseek-ocr
```

Install the CLI:

```bash
pip install deepseek-ocr-cli
```

## Quick Start

```bash
# Process a single image
deepseek-ocr document.jpg

# Process a PDF
deepseek-ocr paper.pdf

# Process all files in a directory
deepseek-ocr ./documents/ --recursive

# Preview files without processing
deepseek-ocr ./documents/ --dry-run

# Custom output directory
deepseek-ocr doc.pdf -o ./results/

# Use vLLM backend
deepseek-ocr paper.pdf --backend vllm --vllm-url http://gpu-server:8000/v1

# Parallel processing for faster PDF OCR
deepseek-ocr large-document.pdf -w 2

# Extract and analyze embedded figures
deepseek-ocr paper.pdf --analyze-figures

# Quiet mode (paths only, for scripting)
deepseek-ocr paper.pdf -q
```

## Usage

```
deepseek-ocr [OPTIONS] INPUT_PATH
```
```
Options:
  -o, --output-dir PATH      Output directory for results
  -r, --recursive            Recursively process directories
  --model TEXT               Model name (default: deepseek-ocr)
  --prompt TEXT              Custom prompt for OCR
  --task [convert|ocr|layout|extract|parse]
                             OCR task type
  --extract-images           Extract and save page images from PDFs
  --no-metadata              Exclude metadata from output
  --dpi INTEGER              PDF rendering DPI (default: 200)
  -w, --workers INTEGER      Parallel workers for PDF pages (default: 1)
  --analyze-figures          Extract and analyze embedded figures with AI
  --max-dim INTEGER          Max image dimension (default: 1920, 0 to disable)
  --backend [ollama|vllm]    Backend to use (default: ollama)
  --vllm-url TEXT            vLLM API URL (default: http://localhost:8000/v1)
  --reprocess                Force reprocessing of already-done files
  --dry-run                  Preview files without processing
  -q, --quiet                Suppress output, print paths only
  --verbose                  Enable verbose output
  --help                     Show this message and exit.
```
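Conceptually, the `-w/--workers` option fans PDF pages out to a small worker pool. A minimal sketch of that pattern (illustrative only, not the tool's implementation; `ocr_page` stands in for whatever callable performs per-page OCR):

```python
from concurrent.futures import ThreadPoolExecutor

def ocr_pages(pages, ocr_page, workers=2):
    """Run ocr_page over pages concurrently (illustrative sketch)."""
    # pool.map preserves page order even when pages finish out of order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ocr_page, pages))
```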
## Commands

### `process`

Process documents and images with OCR. The `process` subcommand is optional:

```bash
deepseek-ocr document.pdf
# equivalent to
deepseek-ocr process document.pdf
```

### `info`

Show system and configuration information:

```bash
deepseek-ocr info
```

## Output

Each document gets its own folder:
```
output/
└── document/
    ├── document.md    # OCR markdown
    └── figures/       # Extracted figures (if --analyze-figures)
        └── page1_fig1.png
```
The markdown includes metadata:
```markdown
---
source: /path/to/document.pdf
processed: 2025-12-01T15:30:00
pages: 3
processing_time: 18.45s
model: deepseek-ocr
backend: ollama
---

## Page 1

[Extracted content...]
```

## Incremental Resume

Batch processing saves `metadata.json` in the output directory. On re-run, already-processed files are skipped automatically. Use `--reprocess` to force reprocessing.
## Configuration

Create a `.env` file or set environment variables with the `DEEPSEEK_OCR_` prefix:

```bash
DEEPSEEK_OCR_BACKEND=ollama
DEEPSEEK_OCR_MODEL_NAME=deepseek-ocr
DEEPSEEK_OCR_OUTPUT_DIR=output
DEEPSEEK_OCR_OLLAMA_URL=http://localhost:11434
DEEPSEEK_OCR_VLLM_BASE_URL=http://localhost:8000/v1
DEEPSEEK_OCR_MAX_DIMENSION=1920
DEEPSEEK_OCR_MAX_RETRIES=3
DEEPSEEK_OCR_RETRY_DELAY=1.0
```

## Python API

```python
from pathlib import Path

from deepseek_ocr import create_backend, OCRProcessor

backend = create_backend(backend_type="ollama", model_name="deepseek-ocr")
backend.load_model()

processor = OCRProcessor(
    backend=backend,
    output_dir=Path("./results"),
    workers=2,
)

result = processor.process_file(Path("document.pdf"))
print(result.output_text)
processor.save_result(result)

backend.unload_model()
```

## Troubleshooting

If processing fails, check that Ollama is running, the model is pulled, and the CLI can see both:

```bash
ollama serve
ollama pull deepseek-ocr
deepseek-ocr info
```

## License

MIT License - see LICENSE for details.