
Improve fast partition cold start #4242

Merged
badGarnet merged 5 commits into main from perf/lazy-imports-fast-coldstart on Feb 17, 2026

@CyMule commented Feb 17, 2026

Improve PDF fast-strategy cold-start latency by lazy-loading, in pdf.py, the imports that only the hi_res strategy needs.

This reduces first-call startup overhead without changing partition behavior.
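A minimal sketch of the lazy-import pattern, assuming nothing about the real pdf.py: the heavy dependency is a hypothetical stand-in module (`fake_hi_res_dep`) generated in a temp directory, and `partition_fast` / `partition_hi_res` are illustrative names, not the library's functions. Moving the import into the function body means the fast path never pays for loading it.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for a hi-res-only dependency (e.g. a layout model).
# It is written to a temp dir so the example is self-contained.
_dep_dir = Path(tempfile.mkdtemp())
(_dep_dir / "fake_hi_res_dep.py").write_text(
    textwrap.dedent(
        """
        import time

        time.sleep(0.1)  # simulate expensive import-time work

        def detect_layout(path):
            return f"layout for {path}"
        """
    )
)
sys.path.insert(0, str(_dep_dir))


def partition_fast(path: str) -> str:
    # Fast path: never references the heavy dependency, so it is never loaded.
    return f"text for {path}"


def partition_hi_res(path: str) -> str:
    # Lazy import: the module is loaded on the first hi_res call only.
    # Later calls find it in sys.modules, so the import statement is cheap.
    from fake_hi_res_dep import detect_layout

    return detect_layout(path)


loaded_at_start = "fake_hi_res_dep" in sys.modules
partition_fast("a.pdf")
loaded_after_fast = "fake_hi_res_dep" in sys.modules
partition_hi_res("a.pdf")
loaded_after_hi_res = "fake_hi_res_dep" in sys.modules
print(loaded_at_start, loaded_after_fast, loaded_after_hi_res)  # False False True
```

Python caches imported modules in `sys.modules`, so the deferred import cost is paid once per process, on the first hi_res call, instead of whenever the partitioning module itself is imported.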

Local benchmarks show a ~35% reduction in fast-strategy cold-start time, from 2.75s to 1.78s.
They also show a small hi_res slowdown (~2-4%), which is an acceptable trade-off for the fast-strategy gain.
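A quick check of the quoted fast-strategy numbers: going from 2.75s to 1.78s saves about 35% of the wall-clock time, which corresponds to roughly a 1.54× speedup. The helper names below are illustrative only.

```python
def relative_reduction(before_s: float, after_s: float) -> float:
    """Fraction of the original time that was saved."""
    return (before_s - after_s) / before_s


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the new timing is than the old one."""
    return before_s / after_s


# Fast-strategy cold-start timings in seconds, as quoted in the PR description.
before, after = 2.75, 1.78
print(f"{relative_reduction(before, after):.1%}")  # 35.3%
print(f"{speedup(before, after):.2f}x")  # 1.54x
```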

The benchmark was run on 6 PDFs:
https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/DA-1p.pdf
https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/chevron-page.pdf
https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/embedded-images-tables.pdf
https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/fake-memo-with-duplicate-page.pdf
https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/interface-config-guide-p93.pdf
https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/layout-parser-paper-fast.pdf
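The PR adds scripts/performance/quick_partition_bench.py for cold vs warm timing, but its contents are not shown here, so the following is only a hypothetical sketch of one way to take such measurements. It assumes "cold" means the first call in a fresh interpreter (including any imports that call triggers) and "warm" means a second call in the same process; `measure_cold_warm` is not part of the repository.

```python
import subprocess
import sys
import textwrap


def measure_cold_warm(snippet: str) -> tuple[float, float]:
    """Run `snippet` twice in a fresh interpreter; return (cold_s, warm_s).

    The first run pays for any imports the snippet triggers; the second run
    reuses the modules already cached in sys.modules. Interpreter startup
    itself is excluded because timing happens inside the child process.
    """
    child = textwrap.dedent(
        f"""
        import time
        snippet = {snippet!r}
        t0 = time.perf_counter()
        exec(snippet, {{}})
        cold = time.perf_counter() - t0
        t0 = time.perf_counter()
        exec(snippet, {{}})
        warm = time.perf_counter() - t0
        print(cold, warm)
        """
    )
    result = subprocess.run(
        [sys.executable, "-c", child], capture_output=True, text=True, check=True
    )
    # The timings are on the last line, even if the snippet printed output.
    cold_s, warm_s = map(float, result.stdout.strip().splitlines()[-1].split())
    return cold_s, warm_s
```

To time a fast-strategy partition, the snippet could be `from unstructured.partition.pdf import partition_pdf; partition_pdf(filename="example-docs/pdf/DA-1p.pdf", strategy="fast")`, repeated for each PDF listed above. Running each measurement in its own subprocess matters: in a long-lived process the modules would already be cached, hiding the cold-start cost.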


Note

Medium Risk
Touches core PDF partitioning by changing when and where imports happen. Behavior should be unchanged, but there is some risk that a missed or conditional import causes runtime errors in less-tested hi_res, OCR, or analysis paths.

Overview
Improves PDF fast-strategy cold-start performance by lazy-loading hi-res-only dependencies in unstructured/partition/pdf.py. Several pdf_image/unstructured_inference-related imports move into _partition_pdf_or_image_local and other hi-res/OCR-only code paths, which keeps the fast path lighter.

Adds scripts/performance/quick_partition_bench.py for quick local cold vs warm partition timing across one or more PDFs, updates the table metrics helper to import convert_pdf_to_images from pdf_image_utils, and bumps the library version to 0.20.4 with a corresponding changelog entry.

Written by Cursor Bugbot for commit b66ae0e.
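The Medium Risk note points at missed or conditional imports in less-tested hi_res/OCR paths. One cheap guard, sketched here under the assumption that the test environment has the hi_res extras installed, is to parse a module's source, collect every import inside a function body, and check that each one resolves with `importlib.util.find_spec`. The helper names are hypothetical. This only catches imports that cannot be found; a name whose import was dropped entirely shows up as an undefined-name lint error (flake8/ruff F821) or in tests that exercise that path. Because `find_spec` imports the parent packages of dotted names, this belongs in a test, not at import time.

```python
import ast
import importlib.util


def function_level_imports(source: str) -> list[str]:
    """Return sorted, de-duplicated absolute module names imported inside functions."""
    modules = set()
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # ast.walk(func) also visits nested functions; the set removes repeats.
        for node in ast.walk(func):
            if isinstance(node, ast.Import):
                modules.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
                # Relative imports (level > 0) are skipped in this sketch.
                modules.add(node.module)
    return sorted(modules)


def unresolved_imports(source: str) -> list[str]:
    """Return function-level imports that cannot be found in this environment."""
    missing = []
    for module in function_level_imports(source):
        try:
            spec = importlib.util.find_spec(module)
        except ModuleNotFoundError:
            # Raised when a parent package of a dotted name is missing.
            spec = None
        if spec is None:
            missing.append(module)
    return missing


SAMPLE = '''
import os

def fast(path):
    return os.path.basename(path)

def hi_res(path):
    import json
    from collections import OrderedDict
    import not_a_real_module_xyz
    return json, OrderedDict
'''

print(function_level_imports(SAMPLE))  # ['collections', 'json', 'not_a_real_module_xyz']
print(unresolved_imports(SAMPLE))  # ['not_a_real_module_xyz']
```

Against the real module, the source could be read with `Path("unstructured/partition/pdf.py").read_text()` and the test could assert that `unresolved_imports` returns an empty list.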

@CyMule marked this pull request as ready for review on February 17, 2026 at 16:08

@cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@badGarnet added this pull request to the merge queue on Feb 17, 2026
Merged via the queue into main with commit e1f75a3 on Feb 17, 2026
52 checks passed
@badGarnet deleted the perf/lazy-imports-fast-coldstart branch on February 17, 2026 at 22:56