LDEV-2968 v3 openpdftohtml#62

Open
zspitzer wants to merge 44 commits into master from v3-openhtmltopdf

zspitzer (Member) commented Feb 5, 2026

- optimize: remove bookmarks, metadata, JS, attachments, thumbnails, comments, forms, links
- sanitize: security-focused removal of dangerous elements (JS, attachments, metadata, link actions)
- addStamp: delegates to watermark for image-based stamps
- srcfile/src without tag body now triggers rendering via doEndTag
- getBaseUrl() handles Lucee Resources, not just java.io.File
- Empty body no longer overrides srcfile content
- Encryption constants are now distinct; AES-128 uses setPreferAES
- Page ranges like "3-" resolve actual page count instead of -1
- "printing" permission maps to ALLOW_PRINTING, remove dead duplicate
- Cache getInfo() result to avoid re-parsing PDF on every struct access
- Close source PDDocuments in concat(), use try-with-resources in toImage()
- Fix InputStream leak in PDFForm.loadPDDocument() on error
- Remove pd4ml.jar, ss_css2.jar, .flattened-pom.xml
- Consolidate handlePageNumbers() into processPageVariables()
- Remove dead getMultipleHF() and multi-render loop
- Remove empty writeImages() stub
- Require action attribute on cfpdf (was silent no-op)
- Implement setFilter() for directory merge glob patterns
- Fix FONT_EMBED_SELECCTIVE typo
- Fix setMimetype() discarding normalised value
- Update fontdirectory TLD description
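
The open-ended page range fix above ("3-" resolving to the real page count rather than -1) can be illustrated with a minimal sketch. The class and method names here are hypothetical and the PR's actual parser is not shown; the sketch assumes 1-based page numbers and comma-separated ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not the extension's actual code.
class PageRanges {
    /** Expands a spec such as "1,3-" into 1-based page numbers; an open end resolves to pageCount. */
    static List<Integer> expand(String spec, int pageCount) {
        List<Integer> pages = new ArrayList<>();
        for (String part : spec.split(",")) {
            part = part.trim();
            if (part.isEmpty()) continue;
            int dash = part.indexOf('-');
            int from;
            int to;
            if (dash < 0) {
                from = to = Integer.parseInt(part);
            } else {
                String lo = part.substring(0, dash).trim();
                String hi = part.substring(dash + 1).trim();
                from = lo.isEmpty() ? 1 : Integer.parseInt(lo);
                // "3-" has no upper bound, so use the document's real page count
                to = hi.isEmpty() ? pageCount : Integer.parseInt(hi);
            }
            if (from < 1 || to > pageCount || from > to) {
                throw new IllegalArgumentException(
                    "invalid page range [" + part + "] for " + pageCount + " pages");
            }
            for (int p = from; p <= to; p++) pages.add(p);
        }
        return pages;
    }
}
```

Resolving the open end up front means later code never sees a sentinel such as -1.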

Routes render logging through Lucee's pdf log when defined.
No-op if the log isn't configured in admin.

cfdocument src with proxyserver/proxyport now works.
Tests verify that an invalid proxy errors on remote fetch and that the proxy is ignored for local content.

- Prevent XXE in PDFForm XML parsing (disable DTDs and external entities)
- Sanitise saveAsName in Content-Disposition header (strip CR/LF/quotes)
- Fix LuceeLogHandler accumulation (only attach once per JVM)
- Escape extracted text in XML output (extractText type=xml)
- Fix setFilter glob-to-regex using Pattern.quote for literal chars
- Map thumbnail scale (1-100) to DPI (3-300) instead of using raw value
- Remove dead PDF2Image.java
- Enable checkFileLocation on cfdocument filename attribute
- DocumentSection.setMimetype() now passes normalised value (was discarded)
- PDFPageMark.getHtmlTemplate() delegates to getHtml() for bounds safety
- ApplicationSettings.init is now volatile
- Remove FontsJarExtractor.main() debug method
- Replace e.printStackTrace() calls with comments
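
The glob-to-regex fix in this list can be sketched as follows. This is an illustration only: it assumes the glob supports just `*` and `?`, and the class name is hypothetical. Literal runs go through `Pattern.quote`, so characters such as `.` or `(` cannot act as regex metacharacters.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not the extension's setFilter() implementation.
class GlobFilter {
    /** Converts a simple glob ("*" and "?" only) into a regex, quoting every literal run. */
    static Pattern toPattern(String glob) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : glob.toCharArray()) {
            if (c == '*' || c == '?') {
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) regex.append(Pattern.quote(literal.toString()));
        return Pattern.compile(regex.toString());
    }

    /** True when the whole file name matches the glob. */
    static boolean accept(String glob, String fileName) {
        return toPattern(glob).matcher(fileName).matches();
    }
}
```

Without quoting, `*.pdf` would also accept a name like `apdf`, because the unescaped dot matches any character.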
…onts.jar

- Strip path components from PDF attachment filenames to prevent directory traversal
- Set 15s timeout on JSoup URL fetching
- Remove dead fonts.jar from res/ (2.4MB, loaded from classloader not res/)

Use OpenHTMLToPDF's native <bookmarks> element for PDF outline
generation, replacing the post-render hack that pointed all bookmarks
to section start pages. Bookmarks now resolve to exact rendered page
positions for explicit bookmarks, HTML headings, and section names.

cfpdf merge now preserves and remaps bookmarks from all source PDFs
with correct page offsets, and filters out bookmarks for excluded pages.
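
The merge behaviour described above (remap bookmarks with page offsets, drop those on excluded pages) can be sketched without PDFBox. All types here are hypothetical stand-ins for illustration; the PR's code works on real PDF outlines and is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model types, 1-based page numbers throughout.
class BookmarkRemap {
    static final class Bookmark {
        final String title;
        final int page;
        Bookmark(String title, int page) { this.title = title; this.page = page; }
    }

    static final class Source {
        final int pageCount;
        final Set<Integer> includedPages;
        final List<Bookmark> bookmarks;
        Source(int pageCount, Set<Integer> includedPages, List<Bookmark> bookmarks) {
            this.pageCount = pageCount;
            this.includedPages = includedPages;
            this.bookmarks = bookmarks;
        }
    }

    /** Returns bookmarks renumbered for the merged document; bookmarks on excluded pages are dropped. */
    static List<Bookmark> merge(List<Source> sources) {
        List<Bookmark> merged = new ArrayList<>();
        int offset = 0; // pages already written by earlier sources
        for (Source s : sources) {
            int[] newIndex = new int[s.pageCount + 1]; // 0 means "page excluded"
            int kept = 0;
            for (int p = 1; p <= s.pageCount; p++) {
                if (s.includedPages.contains(p)) newIndex[p] = ++kept;
            }
            for (Bookmark b : s.bookmarks) {
                if (b.page < 1 || b.page > s.pageCount) continue; // dangling target
                if (newIndex[b.page] == 0) continue;              // target page was excluded
                merged.add(new Bookmark(b.title, offset + newIndex[b.page]));
            }
            offset += kept;
        }
        return merged;
    }
}
```

The offset advances by the number of pages kept, not by the source's full page count, so bookmarks in later sources stay aligned when pages are excluded.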

Renders content onto a larger page proportional to the scale factor,
then uses PDFBox to scale pages back to target dimensions.
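
One reading of the arithmetic behind this two-step scaling is sketched below. It assumes the scale is a percentage and that the intermediate page is the target size divided by that factor; neither detail is confirmed by the commit message, and the names are hypothetical.

```java
// Hypothetical helper showing the page-size arithmetic only; no rendering or PDFBox calls.
class ScaledPage {
    /** Factor applied after rendering to bring each page back to its target size. */
    static float shrinkFactor(int scalePercent) {
        if (scalePercent <= 0) throw new IllegalArgumentException("scale must be greater than 0");
        return scalePercent / 100f;
    }

    /** Page size (in points) to render onto so that shrinking by the factor yields the target size. */
    static float[] renderSize(float targetWidth, float targetHeight, int scalePercent) {
        float factor = shrinkFactor(scalePercent);
        return new float[] { targetWidth / factor, targetHeight / factor };
    }
}
```

Under these assumptions, a 595 x 842 pt page at scale 50 is rendered at 1190 x 1684 pt and then shrunk by 0.5, so the content ends up at half size on a page of the original dimensions.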

IsPDFArchive now validates pdfaid:part in XMP metadata instead of just
checking if the file is a valid PDF. getInfo() includes a PDFAVersion key.
Register IsPDFArchive as a standalone function in function.fld.
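
As a rough sketch of what checking pdfaid:part involves, the snippet below looks for the property in an XMP packet held as a string. It is a simplification: it assumes the conventional `pdfaid` prefix and uses a regex where a real implementation would likely parse the XMP with namespace awareness. The class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the extension's IsPDFArchive implementation.
class PdfAId {
    // Matches the attribute form pdfaid:part="1" and the element form <pdfaid:part>1</pdfaid:part>.
    private static final Pattern PART = Pattern.compile(
        "pdfaid:part\\s*=\\s*[\"'](\\d{1,2})[\"']|<pdfaid:part>\\s*(\\d{1,2})\\s*</pdfaid:part>");

    /** Returns the PDF/A part number declared in the XMP packet, or -1 if none is found. */
    static int part(String xmp) {
        if (xmp == null) return -1;
        Matcher m = PART.matcher(xmp);
        if (!m.find()) return -1;
        return Integer.parseInt(m.group(1) != null ? m.group(1) : m.group(2));
    }

    /** A document counts as an archive only when it declares a PDF/A part. */
    static boolean isArchive(String xmp) {
        return part(xmp) > 0;
    }
}
```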

Allow self-closing unknown HTML tags for compatibility with real-world HTML.

Accepts a Component (with onResourceFetch method) or UDF to intercept
image/CSS/resource fetching. The handler returns content, or null to fall
through to the default fetch. Wired into both src fetching and OpenHTMLToPDF rendering.
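
The "content or null to fall through" contract can be sketched in plain Java. The functional types below are stand-ins chosen for illustration; how the extension actually invokes a CFC or UDF is not shown in this PR text.

```java
import java.util.function.Function;

// Hypothetical sketch of the fall-through contract only.
class ResourceFetch {
    /** Asks the handler first; a null result (or no handler) falls through to the default fetcher. */
    static byte[] fetch(String url,
                        Function<String, byte[]> handler,
                        Function<String, byte[]> defaultFetcher) {
        if (handler != null) {
            byte[] content = handler.apply(url);
            if (content != null) return content; // handler supplied the resource
        }
        return defaultFetcher.apply(url);
    }
}
```

Using null as the "not handled" signal lets a handler intercept only the URLs it cares about and leave everything else untouched.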

Bumps test.bat to jdk-11.0.30 / jdk-21.0.10 and 7.1/snapshot/light.

Adds -fs-table-paginate: paginate to the default OpenHTMLToPDF
stylesheet so tables break across pages instead of dropping rows.

The onResourceFetch handler now receives (url, parsedUrl) where parsedUrl is a struct
with protocol, host, port, path, query, fragment. CFC handlers can
still declare onResourceFetch(url) and ignore the second arg.
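
A parsedUrl struct with those six keys could be built from `java.net.URI` roughly as follows. This is a sketch under assumptions: the class name is hypothetical, and how the extension represents a missing port or query (here -1 and null, as `URI` reports them) is not stated in the PR.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not the extension's actual struct builder.
class ParsedUrl {
    /** Splits a URL into the keys named in the commit message. */
    static Map<String, Object> parse(String url) {
        URI u = URI.create(url);
        Map<String, Object> parsed = new LinkedHashMap<>();
        parsed.put("protocol", u.getScheme());
        parsed.put("host", u.getHost());
        parsed.put("port", u.getPort());       // -1 when the URL has no explicit port
        parsed.put("path", u.getPath());
        parsed.put("query", u.getQuery());     // null when absent
        parsed.put("fragment", u.getFragment());
        return parsed;
    }
}
```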

setScale now throws for values <= 0 (was < 0), matching the documented
1-100 range. Adds a TODO note where mergeDocuments() can throw an NPE on form
fields with null font names (PDFBOX-5963, fixed in 3.0.8).

DocumentRendering.cfc: 24 specs covering basic rendering, HTML
entities, unicode, CSS, page breaks, page-size dimensions, images,
HTML-to-AcroField conversion, and error handling. One skip for the
checkbox case waiting on PDFBOX-5963.

Both files were previously fully skipped because they needed a PDF
with AcroFields. The fixture is now generated from a cfdocument with
<input type=text>, so all 9 populate specs and 6 read specs run.

Tests:
- PDFWatermark — verifies watermark image is actually embedded via
  extractImage. removeWatermark stub is documented + skipped.
- PDFThumbnail — verifies scale produces different-sized images.
- PDFExtractImages — checks imagePrefix is honoured + extracted file
  is a valid image, multi-image case.
- PDFRemoveAttachments — verifies attachments are gone via
  extractAttachments round-trip.
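
For the thumbnail scale that the PDFThumbnail spec exercises, the scale (1-100) to DPI (3-300) mapping mentioned earlier in this PR is consistent with a simple multiply-by-three. The sketch below assumes that linear mapping, which the commit message implies but does not spell out; the names are hypothetical.

```java
// Hypothetical helper; the linear x3 mapping is an assumption drawn from the stated ranges.
class ThumbnailScale {
    /** Maps a thumbnail scale of 1-100 onto a render resolution of 3-300 DPI. */
    static int toDpi(int scale) {
        if (scale < 1 || scale > 100) {
            throw new IllegalArgumentException("scale must be between 1 and 100");
        }
        return scale * 3;
    }
}
```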

Java:
- Extract path-traversal sanitizer to PDFUtil.sanitizeFilename, add
  null-byte rejection. Used by extractAttachments.
- Fix InputStream leak in PDFForm.loadFromResource: switch to
  try-with-resources, since the buffer copies the bytes but never owns
  the stream.
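
A path-traversal sanitizer with null-byte rejection, in the spirit of the one described above, might look like this. It is a minimal sketch: the real PDFUtil.sanitizeFilename is not shown in this PR text, so the class name, the exception type, and the handling of empty or dot-only names are assumptions.

```java
// Hypothetical stand-in for PDFUtil.sanitizeFilename; behaviour details are assumed.
class FilenameSanitizer {
    /** Returns only the last path segment of name; rejects null bytes and unusable names. */
    static String sanitize(String name) {
        if (name == null) throw new IllegalArgumentException("filename is null");
        if (name.indexOf('\0') >= 0) {
            throw new IllegalArgumentException("filename contains a null byte");
        }
        // Treat both separators as path delimiters, whatever the host OS is
        int cut = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        String base = name.substring(cut + 1).trim();
        if (base.isEmpty() || base.equals(".") || base.equals("..")) {
            throw new IllegalArgumentException("filename has no usable name component");
        }
        return base;
    }
}
```

Stripping to the final segment means an attachment named `../../etc/passwd` is written as `passwd` inside the target directory rather than outside it.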