Convert Catalog.getAllPageDicts to an async method

The patch in PR 14335 *essentially* re-introduced the old code from before PR 3848; however, looking at this code a bit closer, it should be possible to simplify it by making the method asynchronous.

While this method is currently only used as a *fallback* in corrupt documents, the way that `MissingDataException`s are handled is less than ideal. Note that if a `MissingDataException` is thrown, we're forced to re-parse the *entire* /Pages tree[1].
With this method now being asynchronous, we're able to handle the fetching of References in a *much* easier/nicer way than before, without having to throw `MissingDataException`s and re-parse anything.
These changes also let us simplify the call-site slightly, by calling the method *directly* instead of using the `PDFManager`-instance (since again it will no longer throw `MissingDataException`s).
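To make the re-parsing cost concrete, here is a minimal toy model contrasting the two patterns. It is a sketch under stated assumptions, not pdf.js code: `ToyDataSource`, `parseAllWithRetry` and `parseAllAsync` are hypothetical, the local `MissingDataException` only stands in for the real class, and the retry loop assumes (as the description above implies) that the old path restarts the whole parse after the missing data has been loaded.

```javascript
// Hypothetical stand-in for pdf.js' MissingDataException (illustrative only).
class MissingDataException extends Error {
  constructor(index) {
    super(`Missing data for object ${index}`);
    this.index = index;
  }
}

// Hypothetical data source for a flat list of page objects, some not yet downloaded.
class ToyDataSource {
  constructor(numPages, missing) {
    this.numPages = numPages;
    this.missing = new Set(missing);
    this.parses = 0; // how many page objects were (attempted to be) parsed
  }
  // Synchronous access: throws if the bytes have not arrived yet.
  getSync(i) {
    this.parses++;
    if (this.missing.has(i)) {
      throw new MissingDataException(i);
    }
    return { pageIndex: i };
  }
  // Simulated network request that makes object `i` available.
  async load(i) {
    await new Promise((resolve) => setTimeout(resolve, 0));
    this.missing.delete(i);
  }
  // Asynchronous access: waits for the data instead of throwing.
  async getAsync(i) {
    if (this.missing.has(i)) {
      await this.load(i);
    }
    return this.getSync(i);
  }
}

// Old pattern (assumed): synchronous parsing, restarted from scratch after
// each MissingDataException once the missing data has been loaded.
async function parseAllWithRetry(source) {
  for (;;) {
    try {
      const pages = [];
      for (let i = 0; i < source.numPages; i++) {
        pages.push(source.getSync(i));
      }
      return pages;
    } catch (ex) {
      if (!(ex instanceof MissingDataException)) {
        throw ex;
      }
      await source.load(ex.index); // fetch the data, then re-parse everything
    }
  }
}

// New pattern: await each object as it is needed; nothing is re-parsed.
async function parseAllAsync(source) {
  const pages = [];
  for (let i = 0; i < source.numPages; i++) {
    pages.push(await source.getAsync(i));
  }
  return pages;
}
```

For a 1000-page source where only index 899 (page 900) is missing, `parseAllWithRetry` ends up parsing 1900 page objects (900 before the exception, then all 1000 again), whereas `parseAllAsync` parses each of the 1000 exactly once.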

Furthermore, this patch contains the following other changes:
 - Reduce unnecessary duplication in the various `catch` handlers throughout the method, by simply moving the `XRefEntryException` handling into the `addPageError` helper function instead.
 - Move the "circular references"-check to occur slightly earlier, since there's obviously no point in asynchronously fetching data just to then throw an Error *immediately* afterwards.
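The two bullet points above can be illustrated with a simplified asynchronous /Pages walk. This is a hedged sketch, not the actual `Catalog.getAllPageDicts` implementation: `Ref`, `ToyXRef` and its `fetchAsync` method, and the local `XRefEntryException` are hypothetical stand-ins, and unlike the real method this version keeps going after an error instead of stopping. It shows a single `addPageError` helper owning the `XRefEntryException` decision, and the circular-reference check running *before* any data is awaited.

```javascript
// Hypothetical stand-ins for pdf.js internals; none of these are the real API.
class XRefEntryException extends Error {}

class Ref {
  constructor(num) {
    this.num = num;
  }
  toString() {
    return `${this.num} 0 R`;
  }
}

// Hypothetical cross-reference table whose objects are only fetchable asynchronously.
class ToyXRef {
  constructor(objects) {
    this.objects = objects; // Map<number, object>
  }
  async fetchAsync(ref) {
    await new Promise((resolve) => setTimeout(resolve, 0)); // simulate pending I/O
    if (!this.objects.has(ref.num)) {
      throw new XRefEntryException(`Bad xref entry for ${ref}`);
    }
    return this.objects.get(ref.num);
  }
}

// Depth-first walk of a /Pages tree; returns Map<pageIndex, [dictOrError, ref]>.
async function getAllPageDicts(xref, rootRef, recoveryMode = false) {
  const map = new Map();
  const visited = new Set([rootRef.num]);
  let pageIndex = 0;

  // One place for error handling, instead of duplicating it in every `catch`.
  function addPageError(error) {
    if (error instanceof XRefEntryException && !recoveryMode) {
      throw error; // let the caller decide whether to retry in recovery mode
    }
    map.set(pageIndex++, [error, null]);
  }

  const stack = [{ node: await xref.fetchAsync(rootRef), pos: 0 }];
  while (stack.length > 0) {
    const top = stack[stack.length - 1];
    const kids = top.node.Kids || [];
    if (top.pos >= kids.length) {
      stack.pop();
      continue;
    }
    const kidRef = kids[top.pos++];

    // Check for circular references *before* fetching, so no data is
    // awaited only to be rejected immediately afterwards.
    if (visited.has(kidRef.num)) {
      addPageError(new Error("Pages tree contains circular reference."));
      continue;
    }
    visited.add(kidRef.num);

    let kid;
    try {
      kid = await xref.fetchAsync(kidRef); // simply await; nothing to re-parse
    } catch (ex) {
      addPageError(ex);
      continue;
    }
    if (kid.Type === "Pages") {
      stack.push({ node: kid, pos: 0 });
    } else {
      map.set(pageIndex++, [kid, kidRef]);
    }
  }
  return map;
}

// Example tree: 1 -> [2 (page), 3 (pages), 1 (cycle)], 3 -> [4 (page), 99 (missing)].
const demoXRef = new ToyXRef(
  new Map([
    [1, { Type: "Pages", Kids: [new Ref(2), new Ref(3), new Ref(1)] }],
    [2, { Type: "Page" }],
    [3, { Type: "Pages", Kids: [new Ref(4), new Ref(99)] }],
    [4, { Type: "Page" }],
  ])
);
```

With `recoveryMode = true` the example tree yields four entries (two pages, then the bad xref entry and the circular reference recorded as errors); with `recoveryMode = false` the call rejects with the `XRefEntryException`, mirroring how the call-site converts that case into an `XRefParseException`.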

---
[1] Imagine, e.g., a thousand-page document, where a `MissingDataException` is thrown when fetching/parsing page 900.
Jonas Jenwald 2021-12-31 14:57:01 +01:00
parent 3d7bb6c38d
commit b0e774d9c5
2 changed files with 29 additions and 39 deletions


```diff
@@ -1401,9 +1401,7 @@ class PDFDocument {
       let pagesTree;
       try {
-        pagesTree = await pdfManager.ensureCatalog("getAllPageDicts", [
-          recoveryMode,
-        ]);
+        pagesTree = await catalog.getAllPageDicts(recoveryMode);
       } catch (reasonAll) {
         if (reasonAll instanceof XRefEntryException && !recoveryMode) {
           throw new XRefParseException();
```