3 changes: 2 additions & 1 deletion examples/requirements.txt
@@ -1,4 +1,5 @@
aspose-pdf
lxml
pydicom
pandas
pandas
pytesseract
Comment on lines +4 to +5
Copilot AI Apr 15, 2026

Adding pytesseract introduces a runtime dependency on the native Tesseract binary (not installed via pip). Without documenting installation steps (or handling TesseractNotFoundError with a clear message), users will hit confusing failures at runtime. Consider adding a short note in the example (and/or README) describing how to install Tesseract and how to configure pytesseract.pytesseract.tesseract_cmd on Windows.
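
A minimal sketch of the kind of pre-flight check this comment asks for, using only the standard library. The helper names `find_tesseract` and `require_tesseract` are hypothetical and not part of the PR; the sketch assumes the native executable is named `tesseract` and is expected on `PATH`.

```python
import shutil


def find_tesseract(candidates=("tesseract",)):
    """Return the path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        located = shutil.which(name)
        if located:
            return located
    return None


def require_tesseract(candidates=("tesseract",)):
    """Return the Tesseract path, or raise RuntimeError with installation hints."""
    located = find_tesseract(candidates)
    if located is None:
        raise RuntimeError(
            "Tesseract OCR was not found on PATH. Install the native binary "
            "(it is not installed by pip) or point "
            "pytesseract.pytesseract.tesseract_cmd at the executable."
        )
    return located
```

Calling `require_tesseract()` at the start of the OCR example would turn a late `TesseractNotFoundError` into an early message that tells the user what to install.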

Suggested change
pandas
pytesseract
pandas

Copilot uses AI. Check for mistakes.
40 changes: 38 additions & 2 deletions examples/working_with_documents/example_create_pdf_document.py
@@ -1,7 +1,10 @@
import aspose.pdf as ap
import io
import pytesseract
Comment on lines +1 to +3
Copilot AI Apr 15, 2026

pytesseract is imported at module load time, which makes all examples in this file fail to run if the optional OCR dependency (or its native tesseract binary) is not present, even when only create_new_document is executed. Consider moving the pytesseract import (and any related setup) inside create_searchable_document, and raise a clear exception when Tesseract is not available.
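
One way to defer the import, sketched under assumptions: `load_optional` and `create_searchable_document_sketch` are hypothetical names, and only the `pytesseract.image_to_pdf_or_hocr(..., extension='pdf')` call is taken from the diff itself.

```python
import importlib


def load_optional(module_name, feature):
    """Import module_name on demand; raise a descriptive error if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"{feature} requires the optional package '{module_name}'. "
            f"Install it with: pip install {module_name}"
        ) from exc


def create_searchable_document_sketch(image_file_path):
    """Hypothetical stand-in showing where the deferred import would live."""
    # The import happens only when the OCR example actually runs.
    pytesseract = load_optional("pytesseract", "The searchable PDF example")
    return pytesseract.image_to_pdf_or_hocr(image_file_path, extension="pdf")
```

With this shape, importing the example module and running `create_new_document` no longer depends on the OCR package being installed.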

import sys
from os import path
from pathlib import Path

import aspose.pdf as ap

sys.path.append(path.join(path.dirname(__file__), ".."))

@@ -16,20 +19,53 @@ def create_new_document(input_pdf, output_pdf):
    document.save(output_pdf)


def create_searchable_document(infile, outfile, image_file_path, page_number=1):
"""
An example of using optical character recognition (OCR) technology to create a searchable PDF document.

Args:
infile (str): The name of the input PDF file
outfile (str): The base name for output files (index will be appended)
image_file_path (str): The name of the image file
page_number (int): The page number

Returns:
None
"""
Comment on lines +24 to +34
Copilot AI Apr 15, 2026

The docstring formatting is over-indented, and it claims outfile is a base name with an appended index, but the implementation saves exactly to outfile without adding an index. Please align the docstring indentation and parameter descriptions with the actual behavior.

Suggested change
An example of using optical character recognition (OCR) technology to create a searchable PDF document.
Args:
infile (str): The name of the input PDF file
outfile (str): The base name for output files (index will be appended)
image_file_path (str): The name of the image file
page_number (int): The page number
Returns:
None
"""
Use optical character recognition (OCR) to create a searchable PDF document.
Args:
infile (str): The path to the input PDF file.
outfile (str): The path to the output searchable PDF file.
image_file_path (str): The path to the intermediate image file.
page_number (int): The page number to process.
Returns:
None
"""

    image_stream = io.FileIO(image_file_path, 'x')
    try:
        document = ap.Document(infile)
        resolution = ap.devices.Resolution(300)
        png_device = ap.devices.PngDevice(resolution)
        png_device.process(document.pages[page_number], image_stream)
        pdf = pytesseract.image_to_pdf_or_hocr(image_file_path, extension='pdf')
        document = ap.Document(io.BytesIO(pdf))
Comment on lines +35 to +42
Copilot AI Apr 15, 2026

image_stream is opened with mode 'x' before the try block. If the file already exists (e.g., a previous run crashed before cleanup), this raises before cleanup runs. Also, the stream remains open when pytesseract reads image_file_path, which can fail on Windows due to file locking and/or unflushed writes. Open the file inside the try (or use a context manager), write/flush/close it before calling pytesseract, and consider using a tempfile-managed path to avoid collisions.
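
The pattern described here can be sketched with the standard library alone. The helper `with_closed_temp_image` and its `render`/`consume` callbacks are hypothetical; in the PR's function they would presumably wrap `png_device.process(...)` and `pytesseract.image_to_pdf_or_hocr(...)`, which this sketch does not call.

```python
import io
import os
import tempfile


def with_closed_temp_image(render, consume, suffix=".png"):
    """Write an image via render(stream), close it, then call consume(path)."""
    # mkstemp creates a uniquely named file, so stale files cannot collide.
    fd, image_path = tempfile.mkstemp(suffix=suffix)
    try:
        # io.FileIO takes ownership of fd and closes it when the block exits.
        with io.FileIO(fd, "wb") as image_stream:
            render(image_stream)
        # The stream is closed here, before anything reads the file by path.
        return consume(image_path)
    finally:
        # Clean up whether rendering or consuming succeeded or failed.
        try:
            os.remove(image_path)
        except FileNotFoundError:
            pass
```

Because the file is closed before `consume` runs, a reader that opens the path itself (as pytesseract does) should not hit a Windows file lock or see unflushed data, and the `try`/`except FileNotFoundError` cleanup avoids `missing_ok`.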

        document.save(outfile)
    finally:
        image_stream.close()
        image_file = Path(image_file_path)
        image_file.unlink(missing_ok=True)
Copilot AI Apr 15, 2026

Path.unlink(missing_ok=True) requires Python 3.8+, but the repo README states Python 3.7+ support. Replace this with a try/except FileNotFoundError (or check existence) to keep compatibility.

Suggested change
image_file.unlink(missing_ok=True)
try:
    image_file.unlink()
except FileNotFoundError:
    pass



def run_all_examples(data_dir=None, license_path=None):
    """Run PDF creation examples and report status."""
    set_license(license_path)
    input_dir, output_dir = initialize_data_dir(data_dir)

    examples = [
        ("Create new document", create_new_document),
        ("Create a Searchable PDF document", create_searchable_document),
    ]

    for name, func in examples:
        try:
            input_file_name = path.join(input_dir, f"{func.__name__}.pdf")
            output_file_name = path.join(output_dir, f"{func.__name__}.pdf")
            func(input_file_name, output_file_name)
            if func == create_searchable_document:
                image_path = path.join(output_dir, "create_searchable_document.png")
                func(input_file_name, output_file_name, image_path)
            else:
                func(input_file_name, output_file_name)
            print(f"✅ Success: {name}")
        except Exception as e:
            print(f"❌ Failed: {name} - {str(e)}")
Binary file not shown.