[api-minor] Replace the CMapReaderFactory, StandardFontDataFactory, and WasmFactory API options with a single factory/option#20919
Closed
Snuffleupagus wants to merge 4 commits intomozilla:masterfrom
Closed
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #20919 +/- ##
==========================================
- Coverage 62.65% 62.59% -0.06%
==========================================
Files 173 171 -2
Lines 121383 121284 -99
==========================================
- Hits 76049 75922 -127
- Misses 45334 45362 +28
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This comment was marked as off-topic.
This comment was marked as off-topic.
d9cdc86 to
ee90931
Compare
ee90931 to
20cc07c
Compare
20cc07c to
6ec05d5
Compare
f92a46f to
ef15e5b
Compare
ef15e5b to
83b3ae5
Compare
…`, and `WasmFactory` API options with a single factory/option Currently we have no less than three different, but very similar, factories for reading built-in CMap files, standard font files, and wasm files on the main-thread.[1] These factories were added at different points in time, since I cannot imagine that we'd add essentially three copies of the same code otherwise. Nowadays these factories are often not even used[2], since worker-thread fetching is used whenever possible to improve performance. In particular, they will *only* be used when either: - The PDF.js library runs in Node.js environments. - The user manually sets `useWorkerFetch = false` when calling `getDocument`. - The user provides custom `CMapReaderFactory`, `StandardFontDataFactory`, and/or `WasmFactory` instances when calling `getDocument`. Thanks to the use of import maps the impact of these separate factories on the total bundle-size is *mostly* avoided in the Firefox PDF Viewer, however there's still some amount of effectively dead code present there. In order to reduce the number of `getDocument` options a little bit, this patch replaces the existing options with a single new `BinaryDataFactory` option instead. A question is obviously if the option of providing *custom factories* to replace the default ones is actually utilized[3], by third-party PDF.js users, however that's impossible to know and outright removing the API option is probably too risky? *Please note:* Any existing third-party code using the old factories should still work[4], since we (for now) fallback to use those if provided. --- [1] Any new functionality could easily lead to more such factories being added in the future, which wouldn't be great. [2] Note that the Firefox PDF Viewer will never use these factories, since it "forcibly" sets `useWorkerFetch = true` during building. [3] Originally I recall that we only provided browser versions of these factories, and thus Node.js users needed to implement custom ones. However, this hasn't been the case for many years. [4] Barring any bugs in the patch, of course.
…taFactory` methods Given that the `BaseBinaryDataFactory` class is only used from the "FetchBinaryData" message handler, the `name`/`filename` parameters should never actually be missing here. Besides, a missing `name`/`filename` would result in a "nonsense" URL and the actual data fetching would then fail instead. Hence it just doesn't seem necessary to keep this extra bit of validation now.
It's not entirely clear to me why we need to special-case reading of these, given that: - The `bcmap` format has been used for over a decade now, see PR 4470. - The `bcmap` files have not been updated even once, since their introduction. - The "raw" CMap files only contain regular ASCII characters. - Fetching the bytes-data directly should just work, which I've confirmed by successfully testing this patch with `test/pdfs/mao.pdf` and the "raw" CMap files in PR 4259.
83b3ae5 to
1a5877f
Compare
1a5877f to
58aa162
Compare
This way we can build the complete `filename` directly in the worker-thread, which simplifies the CMap fetching in the `BaseBinaryDataFactory` class and improves overall consistency there.
58aa162 to
ed12eda
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
[api-minor] Replace the
CMapReaderFactory,StandardFontDataFactory, andWasmFactoryAPI options with a singlefactory/option
Currently we have no less than three different, but very similar, factories for reading built-in CMap files, standard font files, and wasm files on the main-thread.[1]
These factories were added at different points in time, since I cannot imagine that we'd add essentially three copies of the same code otherwise.
Nowadays these factories are often not even used[2], since worker-thread fetching is used whenever possible to improve performance. In particular, they will only be used when either:
useWorkerFetch = falsewhen callinggetDocument.CMapReaderFactory,StandardFontDataFactory, and/orWasmFactoryinstances when callinggetDocument.Thanks to the use of import maps the impact of these separate factories on the total bundle-size is mostly avoided in the Firefox PDF Viewer, however there's still some amount of effectively dead code present there.
In order to reduce the number of
getDocumentoptions a little bit, this patch replaces the existing options with a single newBinaryDataFactoryoption instead.A question is obviously if the option of providing custom factories to replace the default ones is actually utilized[3], by third-party PDF.js users, however that's impossible to know and outright removing the API option is probably too risky?
Please note: Any existing third-party code using the old factories should still work[4], since we (for now) fallback to use those if provided.
[1] Any new functionality could easily lead to more such factories being added in the future, which wouldn't be great.
[2] Note that the Firefox PDF Viewer will never use these factories, since it "forcibly" sets
useWorkerFetch = trueduring building.[3] Originally I recall that we only provided browser versions of these factories, and thus Node.js users needed to implement custom ones. However, this hasn't been the case for many years.
[4] Barring any bugs in the patch, of course.
Remove explicit
name/filenamevalidation in variousBaseBinaryDataFactorymethodsGiven that the
BaseBinaryDataFactoryclass is only used from the "FetchBinaryData" message handler, thename/filenameparameters should never actually be missing here.Besides, a missing
name/filenamewould result in a "nonsense" URL and the actual data fetching would then fail instead. Hence it just doesn't seem necessary to keep this extra bit of validation now.Simplify the main-thread reading of "raw" built-in CMap files
It's not entirely clear to me why we need to special-case reading of these, given that:
bcmapformat has been used for over a decade now, see PR CMaps binary packing #4470.bcmapfiles have not been updated even once, since their introduction.test/pdfs/mao.pdfand the "raw" CMap files in PR Adds built in CMaps and unifies the glyph mapping. #4259.