Merge changes from upstream #35
Merged
KRRT7 merged 5 commits into codeflash/optimize-CustomPDFPageInterpreter._patch_current_chars_with_render_mode-mm3h21a8 from Feb 27, 2026
The _last_patched_idx approach overwrites previously patched chars when cur_item reverts to the enclosing container after a figure that contains text ops. Instead, each do_TJ/do_Tj snapshots len(objs) before calling super() and patches only the chars from that index onward.
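The snapshot idea can be sketched with a toy model. This is not the code from this PR and does not use pdfminer: ToyInterpreter, begin_figure, end_figure and _base_do_TJ are hypothetical stand-ins, and chars are plain dicts. It only shows that taking len() of the current container right before the base call keeps patching confined to the chars that call appended, even after cur_item reverts from a figure.

```python
class ToyInterpreter:
    """Simplified stand-in for a pdfminer-style interpreter (not the real API)."""

    def __init__(self):
        self.cur_item = []      # container currently receiving chars
        self._stack = []        # containers suspended while a figure is open
        self.render_mode = 0    # hypothetical current text render mode

    def begin_figure(self):
        # Switch to a fresh container, as happens when a figure starts.
        self._stack.append(self.cur_item)
        self.cur_item = []

    def end_figure(self):
        # Revert to the enclosing container and attach the finished figure.
        figure = self.cur_item
        self.cur_item = self._stack.pop()
        self.cur_item.append({"figure": figure})

    def _base_do_TJ(self, seq):
        # Stand-in for super().do_TJ(): appends one unpatched char per glyph.
        for s in seq:
            for ch in s:
                self.cur_item.append({"char": ch, "rendermode": None})

    def do_TJ(self, seq):
        start = len(self.cur_item)          # snapshot before the base call
        self._base_do_TJ(seq)
        for obj in self.cur_item[start:]:   # patch only the new chars
            obj["rendermode"] = self.render_mode


interp = ToyInterpreter()
interp.do_TJ(["abc"])        # page text, render mode 0
interp.begin_figure()
interp.render_mode = 3
interp.do_TJ(["xy"])         # text inside the figure
interp.end_figure()          # cur_item reverts to the page container
interp.render_mode = 1
interp.do_TJ(["d"])          # must not touch "abc"

page_modes = [o["rendermode"] for o in interp.cur_item if "char" in o]
figure_modes = [o["rendermode"] for o in interp.cur_item[3]["figure"]]
```

Because the start index is local to each call, no state carries over between containers, so there is nothing to go stale when cur_item changes.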
pdfminer's base do_Tj delegates to self.do_TJ([s]), which already
dispatches to the overridden do_TJ. The do_Tj override was patching
the same char range a second time.
Repro (add print traces to do_TJ/do_Tj, run against any PDF):
    from unstructured.partition.pdf_image.pdfminer_utils import open_pdfminer_pages_generator

    with open("example-docs/pdf/reliance.pdf", "rb") as f:
        for page, layout in open_pdfminer_pages_generator(f):
            break
Before this fix, every Tj op produces two patch calls with the same
start index:
[TRACE] do_TJ patching from 9
[TRACE] do_Tj patching from 9 <- redundant
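The double patch follows from ordinary method dispatch, which a small stand-alone model can illustrate. The classes below are hypothetical and do not import pdfminer. The sketch assumes only what the commit message states, namely that the base do_Tj calls self.do_TJ([s]).

```python
class Base:
    """Toy base class mirroring the delegation described above."""

    def __init__(self):
        self.objs = []

    def do_TJ(self, seq):
        for s in seq:
            self.objs.extend(s)   # one entry per character

    def do_Tj(self, s):
        self.do_TJ([s])           # dynamic dispatch reaches any do_TJ override


class Patched(Base):
    """Overrides do_TJ only (the behaviour after the fix)."""

    def __init__(self):
        super().__init__()
        self.trace = []

    def _patch(self, who, start):
        # Record the call instead of patching real chars.
        self.trace.append(f"{who} patching from {start}")

    def do_TJ(self, seq):
        start = len(self.objs)
        super().do_TJ(seq)
        self._patch("do_TJ", start)


class DoublePatched(Patched):
    """Also overrides do_Tj (the behaviour before the fix)."""

    def do_Tj(self, s):
        start = len(self.objs)
        super().do_Tj(s)          # already runs Patched.do_TJ and its patch
        self._patch("do_Tj", start)


fixed = Patched()
fixed.do_Tj("hi")

redundant = DoublePatched()
redundant.do_Tj("hi")
```

In this model fixed.trace holds a single do_TJ entry, while redundant.trace holds a do_TJ entry followed by a do_Tj entry with the same start index, the same pattern as the trace above.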
6fa1716 into codeflash/optimize-CustomPDFPageInterpreter._patch_current_chars_with_render_mode-mm3h21a8
1 check passed
Does this all look right?