Skip to content

Latest commit

 

History

History
65 lines (47 loc) · 2 KB

File metadata and controls

65 lines (47 loc) · 2 KB

PDF Compression and Linearization Script

This Python script automates the compression and linearization of PDF files using two open-source tools: Ghostscript and QPDF.

📦 Features

  • Compress PDFs using Ghostscript: Offers high compression while maintaining original source quality.
  • Linearize PDFs using QPDF: Enables fast web viewing of large PDF files.
  • Batch Processing: Recursively scans directories for PDFs matching a given keyword.
  • Automatic File Naming: Prevents overwriting and infinite loops with unique output filenames.

📚 Tools Used

Ghostscript

Ghostscript is used for compressing PDFs. After thorough testing, the following command options were chosen for best results based on the given case. But this can be changed based on the users needs. Full commands can be found at https://community.intersystems.com/post/pdf-compression-using-ghostscript

gs_commands = [
  "gs", "-sDEVICE=pdfwrite",
  "-dCompatibilityLevel=1.4",
  f"-dPDFSETTINGS=/{quality}",
  "-dNOPAUSE",
  "-dBATCH",
  f"-sOutputFile={output_path}",
  input_path
]

QPDF

QPDF is used to linearize the compressed PDF, enabling faster loading on web browsers. The process to actually linearize a file is much simpler. More information on the commands can be found at https://www.linux-magazine.com/Issues/2019/226/Reconstruction

qpdf --linearize input.pdf output.pdf

📝 How It Works

  1. Start: The script begins traversing the specified input directory (and all its subdirectories).
  2. Search: It looks for all .pdf files that contain a specified keyword in the filename.
  3. Compress: Matching files are compressed using Ghostscript.
  4. Linearize: The compressed PDF is then linearized using QPDF.
  5. Output: Results are saved with unique names in the specified output directory.

🛠 Usage + Running

  1. Need Python 3.x
  2. Ghostscript
  3. QPDF

Installing

brew install ghostscript
brew install qpdf

Running The Script

python3 filename.py