fix: set max decompressed size for elements JSON #4244
Merged
Conversation
Cursor Bugbot has reviewed your changes and found 1 potential issue.
badGarnet
previously approved these changes
Feb 19, 2026
lawrence-u10d
previously approved these changes
Feb 19, 2026
lawrence-u10d
approved these changes
Feb 19, 2026
aadland6
pushed a commit
that referenced
this pull request
Feb 19, 2026
Sets a max size on the decompressed version of an elements JSON. For reference, a quite large JSON from a 1225-page document is 5MB. One place we might still run into headroom issues is if a JSON from a very large document included embedded digital images. The result of a JSON being too large is that the decompressed version will not parse, as the tail will be left off. Part of the review should be to determine whether this is an acceptable failure mode.

> [!NOTE]
> **Medium Risk**
> Touches deserialization of compressed element payloads, which can affect ingestion/round-tripping for large documents, and changes the failure mode to explicit exceptions when limits are hit.
>
> **Overview**
> Adds a hard cap (`MAX_DECOMPRESSED_SIZE`, default 200MB) when inflating base64+gzipped elements JSON in `elements_from_base64_gzipped_json`, preventing unbounded memory/disk blowups. Decompression now explicitly fails with a new `DecompressedSizeExceededError` when the limit is hit, or with `zlib.error` when the payload is incomplete or corrupt.
>
> Bumps the version to `0.20.7`, updates the changelog, and adds targeted tests covering the normal round-trip, incomplete streams, and size-limit exceedance (via patching the max size).
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit a5e5256. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
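The capped decompression described in the summary can be sketched with `zlib.decompressobj`, whose `max_length` argument bounds how many output bytes a single `decompress` call may produce. This is a minimal illustration under stated assumptions, not the PR's actual implementation: the function name, default cap, and exception class here mirror the identifiers mentioned above (`MAX_DECOMPRESSED_SIZE`, `DecompressedSizeExceededError`) but the real signatures may differ.

```python
import base64
import gzip
import json
import zlib

# Assumed default cap; the PR describes MAX_DECOMPRESSED_SIZE as 200MB.
MAX_DECOMPRESSED_SIZE = 200 * 1024 * 1024


class DecompressedSizeExceededError(Exception):
    """Raised when the inflated payload exceeds the configured cap."""


def elements_json_from_base64_gzip(payload: str, max_size: int = MAX_DECOMPRESSED_SIZE):
    """Inflate a base64+gzipped elements JSON, refusing oversized payloads."""
    compressed = base64.b64decode(payload)
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header/trailer.
    decompressor = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    # max_length caps the output of this call; any excess compressed input
    # is parked in unconsumed_tail instead of being inflated.
    decompressed = decompressor.decompress(compressed, max_size)
    if decompressor.unconsumed_tail:
        raise DecompressedSizeExceededError(
            f"decompressed payload exceeds {max_size} bytes"
        )
    # A truncated/corrupt stream raises zlib.error here or in decompress().
    return json.loads(decompressed)
```

With this shape, an oversized payload fails loudly with `DecompressedSizeExceededError` rather than yielding a silently truncated JSON tail, which matches the explicit failure mode the Note describes.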