Skip to content

[api-minor] Replace the CMapReaderFactory, StandardFontDataFactory, and WasmFactory API options with a single factory/option#20919

Closed
Snuffleupagus wants to merge 4 commits intomozilla:masterfrom
Snuffleupagus:BinaryDataFactory
Closed

[api-minor] Replace the CMapReaderFactory, StandardFontDataFactory, and WasmFactory API options with a single factory/option#20919
Snuffleupagus wants to merge 4 commits intomozilla:masterfrom
Snuffleupagus:BinaryDataFactory

Conversation

@Snuffleupagus
Copy link
Copy Markdown
Collaborator

@Snuffleupagus Snuffleupagus commented Mar 19, 2026

  • [api-minor] Replace the CMapReaderFactory, StandardFontDataFactory, and WasmFactory API options with a single
    factory/option

    Currently we have no less than three different, but very similar, factories for reading built-in CMap files, standard font files, and wasm files on the main-thread.[1]
    These factories were added at different points in time, since I cannot imagine that we'd add essentially three copies of the same code otherwise.

    Nowadays these factories are often not even used[2], since worker-thread fetching is used whenever possible to improve performance. In particular, they will only be used when either:

    • The PDF.js library runs in Node.js environments.
    • The user manually sets useWorkerFetch = false when calling getDocument.
    • The user provides custom CMapReaderFactory, StandardFontDataFactory, and/or WasmFactory instances when calling getDocument.

    Thanks to the use of import maps the impact of these separate factories on the total bundle-size is mostly avoided in the Firefox PDF Viewer, however there's still some amount of effectively dead code present there.
    In order to reduce the number of getDocument options a little bit, this patch replaces the existing options with a single new BinaryDataFactory option instead.

    A question is obviously if the option of providing custom factories to replace the default ones is actually utilized[3], by third-party PDF.js users, however that's impossible to know and outright removing the API option is probably too risky?

    Please note: Any existing third-party code using the old factories should still work[4], since we (for now) fallback to use those if provided.


    [1] Any new functionality could easily lead to more such factories being added in the future, which wouldn't be great.

    [2] Note that the Firefox PDF Viewer will never use these factories, since it "forcibly" sets useWorkerFetch = true during building.

    [3] Originally I recall that we only provided browser versions of these factories, and thus Node.js users needed to implement custom ones. However, this hasn't been the case for many years.

    [4] Barring any bugs in the patch, of course.

  • Remove explicit name/filename validation in various BaseBinaryDataFactory methods

    Given that the BaseBinaryDataFactory class is only used from the "FetchBinaryData" message handler, the name/filename parameters should never actually be missing here.
    Besides, a missing name/filename would result in a "nonsense" URL and the actual data fetching would then fail instead. Hence it just doesn't seem necessary to keep this extra bit of validation now.

  • Simplify the main-thread reading of "raw" built-in CMap files

    It's not entirely clear to me why we need to special-case reading of these, given that:

    • The bcmap format has been used for over a decade now, see PR CMaps binary packing #4470.
    • The bcmap files have not been updated even once, since their introduction.
    • The "raw" CMap files only contain regular ASCII characters.
    • Fetching the bytes-data directly should just work, which I've confirmed by successfully testing this patch with test/pdfs/mao.pdf and the "raw" CMap files in PR Adds built in CMaps and unifies the glyph mapping. #4259.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 76.47059% with 48 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.59%. Comparing base (ff1af5a) to head (ed12eda).

Files with missing lines Patch % Lines
src/display/binary_data_factory.js 73.94% 37 Missing ⚠️
src/display/api.js 80.95% 8 Missing ⚠️
src/core/evaluator.js 84.61% 2 Missing ⚠️
src/core/jpx.js 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #20919      +/-   ##
==========================================
- Coverage   62.65%   62.59%   -0.06%     
==========================================
  Files         173      171       -2     
  Lines      121383   121284      -99     
==========================================
- Hits        76049    75922     -127     
- Misses      45334    45362      +28     
Flag Coverage Δ
fonttest 7.66% <0.00%> (ø)
unittestcli 62.57% <76.47%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Snuffleupagus

This comment was marked as off-topic.

@mozilla mozilla deleted a comment from moz-tools-bot Mar 20, 2026
@mozilla mozilla deleted a comment from moz-tools-bot Mar 20, 2026
@mozilla mozilla deleted a comment from moz-tools-bot Mar 20, 2026
@mozilla mozilla deleted a comment from moz-tools-bot Mar 20, 2026
…`, and `WasmFactory` API options with a single factory/option

Currently we have no less than three different, but very similar, factories for reading built-in CMap files, standard font files, and wasm files on the main-thread.[1]
These factories were added at different points in time, since I cannot imagine that we'd add essentially three copies of the same code otherwise.

Nowadays these factories are often not even used[2], since worker-thread fetching is used whenever possible to improve performance. In particular, they will *only* be used when either:
 - The PDF.js library runs in Node.js environments.
 - The user manually sets `useWorkerFetch = false` when calling `getDocument`.
 - The user provides custom `CMapReaderFactory`, `StandardFontDataFactory`, and/or `WasmFactory` instances when calling `getDocument`.

Thanks to the use of import maps the impact of these separate factories on the total bundle-size is *mostly* avoided in the Firefox PDF Viewer, however there's still some amount of effectively dead code present there.
In order to reduce the number of `getDocument` options a little bit, this patch replaces the existing options with a single new `BinaryDataFactory` option instead.

A question is obviously if the option of providing *custom factories* to replace the default ones is actually utilized[3], by third-party PDF.js users, however that's impossible to know and outright removing the API option is probably too risky?

*Please note:* Any existing third-party code using the old factories should still work[4], since we (for now) fallback to use those if provided.

---

[1] Any new functionality could easily lead to more such factories being added in the future, which wouldn't be great.

[2] Note that the Firefox PDF Viewer will never use these factories, since it "forcibly" sets `useWorkerFetch = true` during building.

[3] Originally I recall that we only provided browser versions of these factories, and thus Node.js users needed to implement custom ones. However, this hasn't been the case for many years.

[4] Barring any bugs in the patch, of course.
…taFactory` methods

Given that the `BaseBinaryDataFactory` class is only used from the "FetchBinaryData" message handler, the `name`/`filename` parameters should never actually be missing here.
Besides, a missing `name`/`filename` would result in a "nonsense" URL and the actual data fetching would then fail instead. Hence it just doesn't seem necessary to keep this extra bit of validation now.
It's not entirely clear to me why we need to special-case reading of these, given that:
 - The `bcmap` format has been used for over a decade now, see PR 4470.
 - The `bcmap` files have not been updated even once, since their introduction.
 - The "raw" CMap files only contain regular ASCII characters.
 - Fetching the bytes-data directly should just work, which I've confirmed by successfully testing this patch with `test/pdfs/mao.pdf` and the "raw" CMap files in PR 4259.
This way we can build the complete `filename` directly in the worker-thread, which simplifies the CMap fetching in the `BaseBinaryDataFactory` class and improves overall consistency there.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants