Improve memory usage around the BasePdfManager.docBaseUrl parameter (PR 7689 follow-up)

While there is nothing *outright* wrong with the existing implementation, it can, however, lead to increased memory usage in one particular case (which I completely overlooked when implementing this): "data:"-URLs, which by definition contain the entire PDF document and can thus be arbitrarily large. In that case we obviously want to avoid sending, storing, and/or logging the "raw" docBaseUrl.

To address this, this patch makes the following changes:
 - Ignore any non-string value of the `docBaseUrl` option passed to `getDocument`, already on the main-thread, since such values are unsupported anyway.
 - Ignore "data:"-URLs in the `docBaseUrl` option passed to `getDocument`, to avoid sending what could potentially be a *very* long string to the worker-thread.
 - Parse the `docBaseUrl` option *directly* in the `BasePdfManager` constructors, on the worker-thread, to avoid storing the "raw" docBaseUrl in the first place.
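The last change in the list above, parsing `docBaseUrl` once in the constructor so that only the normalized result is retained, can be sketched roughly as follows. This is a minimal illustration, not the actual `BasePdfManager` code: `parseDocBaseUrl` and `PdfManagerSketch` are hypothetical names, and the standard WHATWG `URL` constructor is assumed as a stand-in for whatever URL validation pdf.js uses internally.

```javascript
// Hypothetical helper: validate a docBaseUrl and keep only its normalized
// absolute form, so the "raw" input string never needs to be stored.
function parseDocBaseUrl(url) {
  if (typeof url !== "string" || url === "") {
    return null;
  }
  try {
    // Without a base argument, the WHATWG URL constructor throws a TypeError
    // on relative or malformed input.
    return new URL(url).href;
  } catch {
    return null;
  }
}

// Hypothetical stand-in for a BasePdfManager subclass constructor.
class PdfManagerSketch {
  constructor({ docId, docBaseUrl = null }) {
    this._docId = docId;
    // Parse eagerly; the original argument is not kept on the instance.
    this._docBaseUrl = parseDocBaseUrl(docBaseUrl);
  }

  get docBaseUrl() {
    return this._docBaseUrl;
  }
}
```

With this approach the raw argument is only referenced for the duration of the constructor call. Note that `new URL()` accepts "data:"-URLs as well, which is one reason why filtering them out already on the main-thread is still worthwhile.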
2025-07-09 09:45:42 +02:00 · 2021-03-16 11:56:39 +01:00 · 2021-03-16 11:56:39 +01:00 · c4c7216171
commit c4c7216171
parent bd9dee1544
3 changed files with 26 additions and 18 deletions
--- a/src/display/api.js
+++ b/src/display/api.js
@@ -40,6 +40,7 @@ import {
  deprecated,
  DOMCanvasFactory,
  DOMCMapReaderFactory,
+  isDataScheme,
  loadScript,
  PageViewport,
  RenderingCancelledException,
@@ -285,6 +286,15 @@ function getDocument(src) {
  params.fontExtraProperties = params.fontExtraProperties === true;
  params.pdfBug = params.pdfBug === true;

+  if (
+    typeof params.docBaseUrl !== "string" ||
+    isDataScheme(params.docBaseUrl)
+  ) {
+    // Ignore "data:"-URLs, since they can't be used to recover valid absolute
+    // URLs anyway. We want to avoid sending them to the worker-thread, since
+    // they contain the *entire* PDF document and can thus be arbitrarily long.
+    params.docBaseUrl = null;
+  }
  if (!Number.isInteger(params.maxImageSize)) {
    params.maxImageSize = -1;
  }
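The diff imports `isDataScheme` without showing its definition. A minimal sketch of such a check, together with the main-thread sanitization added to `getDocument`, might look like this. The `isDataScheme` body is an assumed implementation (skip leading whitespace and control characters, which URL parsers ignore, then compare the scheme case-insensitively), not necessarily the one pdf.js ships, and `sanitizeDocBaseUrl` is a hypothetical wrapper around the new `if` block.

```javascript
// Assumed implementation: URL parsers strip leading C0 control characters
// and spaces (code points <= 0x20), so skip those before checking for the
// "data:" scheme, which is case-insensitive.
function isDataScheme(url) {
  let i = 0;
  while (i < url.length && url.charCodeAt(i) <= 0x20) {
    i++;
  }
  return url.substring(i, i + 5).toLowerCase() === "data:";
}

// Hypothetical wrapper mirroring the new check in getDocument: anything that
// is not a string, or is a "data:"-URL, is replaced by null before the
// params are sent to the worker-thread.
function sanitizeDocBaseUrl(params) {
  if (
    typeof params.docBaseUrl !== "string" ||
    isDataScheme(params.docBaseUrl)
  ) {
    params.docBaseUrl = null;
  }
  return params;
}
```

Checking only the first few characters keeps the test cheap even when the "data:"-URL itself is many megabytes long.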