Skip to content

Fixed OOM issue when reading partially corrupted PDFs#20723

Open
akopachov wants to merge 4 commits intomozilla:masterfrom
akopachov:master
Open

Fixed OOM issue when reading partially corrupted PDFs#20723
akopachov wants to merge 4 commits intomozilla:masterfrom
akopachov:master

Conversation

@akopachov
Copy link

I encountered multiple PDFs that appeared to be partially corrupted. As a result, PDF.js was unable to load them due to an out‑of‑memory (OOM) error.

This PR fixes the issue, and I can confirm that with this change in place I am now able to successfully load those PDFs.

Unfortunately, I can’t include or share the affected PDFs due to NDA restrictions. However, I suspect they may have been generated using one of the online “PDF combine” services. In terms of content, the PDFs each contain around 10–15 pages with a mix of scanned images and text.

@calixteman
Copy link
Contributor

Sorry but I'm not super excited with having such a patch without a test and a pdf.
You can send me a pdf on my pro email (you can find it here: https://bugzilla.mozilla.org/enter_bug.cgi?product=Firefox&component=PDF%20Viewer, I'm the triage owner of the pdf component). If you can't because the pdf is too much confidential, that's fine but you need edit it yourself in order to remove whatever has to be private.
If you understand well enough the underlying problem, you can create a very basic one in using a text editor which render correctly in Acrobat or Chrome but not with pdf.js.

@akopachov
Copy link
Author

Will try to prepare some test PDF.
Thank you.

ch = data[offset];
while (ch !== LF && ch !== CR && ch !== LT) {
if (++offset >= data.length) {
const MAX_TOKEN_LENGTH = 0x1fffffe8;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any "magical" values/constants should always have a good comment explaining why they are needed and how they were chosen.

Copy link
Author

@akopachov akopachov Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Thank you.
I added the comment explaining where and why I picked this magic value from, but I'm open to any discussion if it should be set to some more reasonable value.

Copy link
Collaborator

@Snuffleupagus Snuffleupagus Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that PDF.js is first and foremost intended for browsers, using Node.js to choose that value seems a little strange.


Also, please see https://github.com/mozilla/pdf.js/wiki/Squashing-Commits

@codecov-commenter
Copy link

codecov-commenter commented Mar 21, 2026

Codecov Report

❌ Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 62.78%. Comparing base (909a700) to head (f08d965).
⚠️ Report is 352 commits behind head on master.

Files with missing lines Patch % Lines
src/core/xref.js 93.33% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #20723   +/-   ##
=======================================
  Coverage   62.77%   62.78%           
=======================================
  Files         169      169           
  Lines      119824   119832    +8     
=======================================
+ Hits        75224    75234   +10     
+ Misses      44600    44598    -2     
Flag Coverage Δ
fonttest 7.64% <ø> (ø)
unittestcli 62.75% <93.33%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants