
fix: avoid O(N²) re-scanning in _patch_current_chars_with_render_mode#33

Merged
KRRT7 merged 5 commits into leak-control from
codeflash/optimize-CustomPDFPageInterpreter._patch_current_chars_with_render_mode-mm3h21a8
on Feb 26, 2026

Conversation


@codeflash-ai codeflash-ai bot commented Feb 26, 2026

Problem

_patch_current_chars_with_render_mode is called on every do_TJ/do_Tj text operator during PDF parsing. The original implementation re-scanned the entire cur_item._objs list each time, checking hasattr(item, "rendermode") to skip already-patched items. For a page with N characters across M text operations, this is O(N*M) — effectively quadratic.

Memray profiling showed this function as the #1 allocator: 17.57 GB total across 549M allocations in a session processing just 4 files.

Fix

Track the last-patched index so each call only processes newly-added LTChar objects. Reset automatically when cur_item changes (new page or figure).

- Before: O(N²) per page — re-scans all accumulated objects on every text operator
- After: O(N) per page — each object visited exactly once
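A minimal sketch of the index-tracking approach, with a hypothetical `LTChar` stub standing in for `pdfminer.layout.LTChar` (the real method lives on `CustomPDFPageInterpreter`; the `Patcher` and `Page` names here are illustrative, not the actual classes):

```python
class LTChar:
    """Hypothetical stub for pdfminer.layout.LTChar."""

class Page:
    """Minimal stand-in for the device's cur_item container."""
    def __init__(self):
        self._objs = []

class Patcher:
    """Tracks the last-patched index so each call only touches new items."""
    def __init__(self):
        self._last_item = None
        self._last_idx = 0

    def patch(self, cur_item, render_mode):
        # Reset the cursor when cur_item changes (new page or figure).
        if cur_item is not self._last_item:
            self._last_item = cur_item
            self._last_idx = 0
        objs = getattr(cur_item, "_objs", [])
        # Visit only objects appended since the previous call.
        for obj in objs[self._last_idx:]:
            if isinstance(obj, LTChar) and not hasattr(obj, "rendermode"):
                obj.rendermode = render_mode
        self._last_idx = len(objs)
```

Across M text operators on a page of N characters, each object enters the slice exactly once, so total work is O(N) rather than O(N·M).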

Runtime improvement: the change reduces average call time from ~174μs to ~128μs (≈35% faster overall), with the biggest wins on workloads that iterate many items (e.g., the 1,000-item test improved ~49%).

What changed
- Combined two checks into one short‑circuiting conditional:
  - Old: check hasattr(item, "rendermode") first, continue if present, then isinstance(item, LTChar) before assignment.
  - New: if isinstance(item, LTChar) and not hasattr(item, "rendermode"): assign.
- Removed the explicit continue and the separate hasattr branch; the logic is identical from a correctness perspective.
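Side by side, the two loop bodies (again with a hypothetical stub for `pdfminer.layout.LTChar`); both leave pre-existing `rendermode` attributes untouched:

```python
class LTChar:
    """Hypothetical stub for pdfminer.layout.LTChar."""

def patch_old(objs, render_mode):
    # Old: hasattr runs for every item, even non-LTChar ones.
    for item in objs:
        if hasattr(item, "rendermode"):
            continue
        if isinstance(item, LTChar):
            item.rendermode = render_mode

def patch_new(objs, render_mode):
    # New: isinstance first, so hasattr is skipped for non-LTChar items
    # (the common case in a mixed _objs list).
    for item in objs:
        if isinstance(item, LTChar) and not hasattr(item, "rendermode"):
            item.rendermode = render_mode
```

For every combination (LTChar or not, `rendermode` present or not) the two functions take the same action, which is why the change is purely a performance refactor.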

Why this speeds things up
- Fewer attribute lookups per loop iteration. In Python, attribute access (hasattr or a direct lookup) is relatively expensive. The original code executed hasattr(item, "rendermode") for every item, even for non-LTChar objects. By testing isinstance(item, LTChar) first, the hasattr call is avoided for non-LTChar items (the common case), saving an attribute lookup per non-LTChar item.
- Short‑circuit evaluation reduces bytecode and branching overhead (no separate continue branch).
- These savings compound in the hot path: this method is invoked from do_TJ and do_Tj while parsing text, so it runs many times per page. The profiler and tests show the loop-level savings lead to measurable end-to-end runtime improvement.

Profiler & test evidence
- Overall profiler total time dropped from 0.001776s → 0.001371s.
- The large-scale test (1000 LTChar objects) went from 113μs → 76.2μs (~49.5% faster), demonstrating the optimization scales with number of items.
- Many unit/test cases also show smaller but consistent improvements (see annotated_tests). A single small regression was observed in one micro-benchmark (+~5% in a very narrow case where items are always LTChar and already patched), which is an acceptable trade-off given the overall runtime/throughput gains.

Impact on workloads
- Best for PDFs with many text items or mixed-type _objs lists (many non-LTChar items): savings are greatest because we avoid unnecessary hasattr calls for non-LTChar entries.
- Safe to merge: behavior is preserved (pre-existing rendermode attributes are still honored), and dependencies/visibility are unchanged.

Summary
- Primary benefit: reduced runtime (35% overall, large improvements on heavier inputs).
- Cause: eliminated redundant attribute checks via a single short‑circuiting conditional, lowering per-item overhead in a hot path.
- Trade-off: negligible; one tiny micro-benchmark showed a small regression, but the overall throughput and real-workload performance improved significantly.
@codeflash-ai codeflash-ai bot requested a review from KRRT7 February 26, 2026 13:01
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to Codeflash labels Feb 26, 2026
KRRT7 and others added 2 commits February 26, 2026 08:32
Track last-patched index so each do_TJ/do_Tj call only processes
newly-added LTChar objects instead of re-scanning the entire _objs list.
Resets automatically when cur_item changes (new page or figure).
Runtime improvement (primary): The optimized function runs ~7% faster overall (181 μs → 168 μs). That runtime reduction is the reason this change was accepted.

What changed (concrete optimizations)
- Single-step cur_item lookup: replaced "hasattr(self.device, 'cur_item') and self.device.cur_item" with cur_item = getattr(self.device, 'cur_item', None) and an early return. This reduces two attribute lookups into one and short-circuits the common no-cur-item case.
- Single getattr for _objs: replaced the conditional expression cur_item._objs if hasattr(cur_item, "_objs") else [] with objs = getattr(cur_item, "_objs", []). That avoids an extra hasattr call and repeated attribute access.
- Avoid repeated len/index work: replaced index-based loop for i in range(start, len(objs)): obj = objs[i] with direct iteration over the sub-list objs[start:] (for obj in objs[start:]). Also added an explicit if start < len(objs): guard so we don't build a slice when there's nothing to process.
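Combining the three changes, a hedged sketch of the resulting body (attribute names like `_last_patched_idx` follow the description above; `LTChar`, `Device`, and `Interp` are illustrative stubs, not the real pdfminer or interpreter classes):

```python
class LTChar:
    """Hypothetical stub for pdfminer.layout.LTChar."""

class Device:
    """Minimal device stand-in; gains cur_item when a page opens."""

class Interp:
    """Stand-in for CustomPDFPageInterpreter; holds the patch cursor."""
    def __init__(self, device):
        self.device = device
        self._last_patched_item = None
        self._last_patched_idx = 0

def patch_current_chars(interp, render_mode):
    # Single-step lookup: one getattr instead of hasattr + attribute
    # fetch, with an early return for the missing/falsy cur_item case.
    cur_item = getattr(interp.device, "cur_item", None)
    if not cur_item:
        return
    # Single getattr with a default instead of hasattr + conditional.
    objs = getattr(cur_item, "_objs", [])
    # Reset the cursor on a new page or figure.
    if cur_item is not interp._last_patched_item:
        interp._last_patched_item = cur_item
        interp._last_patched_idx = 0
    start = interp._last_patched_idx
    # Guard slice creation: no list copy when there is nothing new.
    if start < len(objs):
        for obj in objs[start:]:  # direct iteration, no per-step indexing
            if isinstance(obj, LTChar) and not hasattr(obj, "rendermode"):
                obj.rendermode = render_mode
        interp._last_patched_idx = len(objs)
```

The slice `objs[start:]` allocates only when new items exist, and in typical use it covers just the handful of chars appended since the last do_TJ/do_Tj call.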

Why these changes speed things up (mechanics)
- Fewer Python-level attribute lookups: getattr replaces two operations (hasattr + attribute fetch) with one, and caching cur_item and objs removes repeated attribute access inside the hot path. Attribute lookup in Python is comparatively expensive, so reducing them reduces per-call overhead.
- Reduced indexing overhead: using direct iteration avoids repeated integer arithmetic and two list index operations per iteration (compute index, __getitem__). That lowers per-iteration Python overhead inside the tight loop that checks and patches LTChar objects.
- Early-return common negative case: when device.cur_item is missing/falsy the function now returns earlier with less work, which helps where many interpreter calls have no cur_item.
- Guarding slice creation: creating objs[start:] can allocate a new list. The added start < len(objs) check prevents unnecessary allocations on the common empty/no-new-items case; when the slice is needed it's typically small (we only process newly appended items), so the copy cost is minimal compared to saved per-iteration overhead.

Why this matters in context (hot path)
- This method is invoked from do_TJ and do_Tj (text painting operators) in the interpreter. Those are hot paths during PDF text processing: the method can be called many times per page. Small per-call savings accumulate, so a ~7% per-call runtime improvement is meaningful.

Behavioral and workload impact (based on tests)
- Regression tests show the biggest wins on heavy workloads: large-scale patching and repeated-appends (1000-item and many-iteration cases) observe sizable reductions (e.g., ~16% and ~26% in annotated tests). Those are precisely the cases where per-iteration overhead dominates.
- Some tiny regressions in a few microbench tests (single small calls) are visible in the annotated tests (sub-microsecond differences or small percent slowdowns). These are minor and expected trade-offs for the net runtime improvement across typical workloads.
- Memory trade-off: objs[start:] creates a shallow copy of the sub-list. In typical usage this sub-list is small (only newly appended items) so the allocation cost is small and outweighed by per-iteration savings. If you have pathological cases where you repeatedly slice very large tails, that could increase temporary memory pressure — but tests and typical interpreter usage show net benefit.

Correctness
- The logic is preserved: render_mode is applied only to new LTChar objects and _last_patched_idx is updated the same way. Using getattr defaults keeps previous behavior for missing attributes.

Summary
- Net effect: fewer attribute lookups, less indexing overhead, and an early-exit optimize the common and hot cases encountered during PDF text interpretation, producing the measured 7% runtime speedup. The small memory/alloc cost of slicing is guarded and, in practice, outweighed by the reduced CPU overhead on the hot path (do_TJ/do_Tj).
@KRRT7 KRRT7 closed this Feb 26, 2026
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-CustomPDFPageInterpreter._patch_current_chars_with_render_mode-mm3h21a8 branch February 26, 2026 15:05
@KRRT7 KRRT7 restored the codeflash/optimize-CustomPDFPageInterpreter._patch_current_chars_with_render_mode-mm3h21a8 branch February 26, 2026 15:05
@KRRT7 KRRT7 reopened this Feb 26, 2026
KRRT7 and others added 2 commits February 26, 2026 15:12
…erpreter._patch_current_chars_with_render_mode-mm3lbz82

⚡️ Speed up method `CustomPDFPageInterpreter._patch_current_chars_with_render_mode` by 7%
Use idiomatic slice iteration instead of index-based loop, remove
redundant guard and trailing return.
@KRRT7 KRRT7 changed the title ⚡️ Speed up method CustomPDFPageInterpreter._patch_current_chars_with_render_mode by 36% fix: avoid O(N²) re-scanning in _patch_current_chars_with_render_mode Feb 26, 2026
@KRRT7 KRRT7 merged commit abef17a into leak-control Feb 26, 2026
1 check failed
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-CustomPDFPageInterpreter._patch_current_chars_with_render_mode-mm3h21a8 branch February 26, 2026 15:46
@KRRT7 KRRT7 restored the codeflash/optimize-CustomPDFPageInterpreter._patch_current_chars_with_render_mode-mm3h21a8 branch February 26, 2026 17:01