Improve fast partition cold start by CyMule · Pull Request #4242 · Unstructured-IO/unstructured

CyMule · 2026-02-17T15:19:34Z

Improve PDF fast strategy cold-start latency by lazy-loading hi-res-only imports in pdf.py.

This reduces first-call startup overhead without changing partition behavior.
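
A minimal sketch of the deferred-import pattern described above, not the code from pdf.py. The module `fake_hi_res_dep`, its `detect_layout` function, and `partition_sketch` are hypothetical stand-ins created only for this illustration.

```python
import pathlib
import sys
import tempfile

# Hypothetical stand-in for a heavy hi_res-only dependency, written to a
# temporary directory so the example is self-contained.
_tmp_dir = tempfile.mkdtemp()
pathlib.Path(_tmp_dir, "fake_hi_res_dep.py").write_text(
    "def detect_layout(path):\n"
    "    return 'layout:' + path\n"
)
sys.path.insert(0, _tmp_dir)


def partition_sketch(path, strategy="fast"):
    """Illustrative only: not the real partition_pdf."""
    if strategy == "hi_res":
        # Deferred import: the cost is paid on the first hi_res call,
        # not when the enclosing module is imported.
        import fake_hi_res_dep

        return fake_hi_res_dep.detect_layout(path)
    # The fast path never touches the heavy dependency.
    return "text:" + path


fast_result = partition_sketch("doc.pdf")
loaded_after_fast = "fake_hi_res_dep" in sys.modules
hi_res_result = partition_sketch("doc.pdf", strategy="hi_res")
loaded_after_hi_res = "fake_hi_res_dep" in sys.modules
print(loaded_after_fast, loaded_after_hi_res)
```

The trade-off of this pattern is that the first hi_res call pays the import cost inside the function instead of at module import time.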

Local benchmarks show the fast strategy's cold-start time dropping by ~35%, from 2.75s to 1.78s.
They also show a small hi_res slowdown (~2-4%), which is acceptable given the fast-path improvement.
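
For reference, the ~35% figure follows from the two reported timings. The variable names below are illustrative; only the 2.75s and 1.78s values come from the description.

```python
# Reported fast-strategy cold-start timings, in seconds.
cold_before_s = 2.75
cold_after_s = 1.78

# Fraction of cold-start time removed.
reduction = (cold_before_s - cold_after_s) / cold_before_s
# Equivalent speedup factor.
speedup_factor = cold_before_s / cold_after_s

print(f"{reduction:.1%} less time, {speedup_factor:.2f}x faster")
# prints "35.3% less time, 1.54x faster"
```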

The benchmark was run on 6 PDFs:
https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/DA-1p.pdf
https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/chevron-page.pdf
https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/embedded-images-tables.pdf
https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/fake-memo-with-duplicate-page.pdf
https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/interface-config-guide-p93.pdf
https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/pdf/layout-parser-paper-fast.pdf

Note

Medium Risk
Touches core PDF partitioning by changing import timing and locations; behavior should be unchanged but there is some risk of missed/conditional imports causing runtime errors in less-tested hi_res/OCR/analysis paths.

Overview
Improves PDF fast strategy cold-start performance by lazy-loading hi-res-only dependencies in unstructured/partition/pdf.py (moving several pdf_image/unstructured_inference-related imports into _partition_pdf_or_image_local and other hi-res/OCR-only code paths), while keeping the fast path lighter.
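
One way to guard against a heavy import creeping back into the fast path is to check, in a fresh interpreter, which modules an import pulls in. The helper below is a sketch under that assumption; `loads_module` is a hypothetical name and is not part of this PR.

```python
import subprocess
import sys


def loads_module(import_statement, module_prefix):
    """Run import_statement in a fresh interpreter and report whether the
    module module_prefix (or any of its submodules) ended up loaded."""
    code = (
        "import sys\n"
        f"{import_statement}\n"
        f"prefix = {module_prefix!r}\n"
        "hit = any(n == prefix or n.startswith(prefix + '.') for n in sys.modules)\n"
        "print(hit)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    # The last printed line is the True/False answer.
    return result.stdout.strip().splitlines()[-1] == "True"


# Demonstration with standard-library modules only.
colorsys_loaded = loads_module("import colorsys", "colorsys")
wave_loaded = loads_module("import colorsys", "wave")
```

With the library installed, one could call `loads_module("import unstructured.partition.pdf", "unstructured_inference")`. The PR text does not state which modules remain loaded at import time, so no expected result is claimed here.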

Adds scripts/performance/quick_partition_bench.py for quick local cold vs warm partition timing across one or more PDFs, updates the table metrics helper to import convert_pdf_to_images from pdf_image_utils, and bumps the library version to 0.20.4 with corresponding changelog entry.
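
The contents of quick_partition_bench.py are not shown on this page. As a rough sketch of what cold vs warm timing means, the hypothetical helper below runs a first call (including imports) and a second call in a fresh interpreter and reports both durations. `time_cold_and_warm` and its parameters are assumptions for illustration.

```python
import subprocess
import sys


def time_cold_and_warm(setup, call):
    """Return (cold_seconds, warm_seconds) measured in a fresh interpreter.

    cold covers running `setup` (typically imports) plus the first `call`;
    warm covers a second `call` in the same process.
    """
    code = (
        "import time\n"
        "t0 = time.perf_counter()\n"
        f"{setup}\n"
        f"{call}\n"
        "t1 = time.perf_counter()\n"
        f"{call}\n"
        "t2 = time.perf_counter()\n"
        "print(t1 - t0, t2 - t1)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    # Parse the final line in case the timed call printed anything itself.
    cold_text, warm_text = result.stdout.strip().splitlines()[-1].split()
    return float(cold_text), float(warm_text)


# Demonstration with a standard-library stand-in for the partition call.
cold_s, warm_s = time_cold_and_warm("import json", "json.dumps({'a': 1})")
print(f"cold={cold_s:.6f}s warm={warm_s:.6f}s")
```

To time the real fast path, `setup` could be `from unstructured.partition.pdf import partition_pdf` and `call` could be `partition_pdf(filename="example-docs/pdf/DA-1p.pdf", strategy="fast")`, assuming the library is installed and the path is valid in the working directory.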

Written by Cursor Bugbot for commit b66ae0e.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


unstructured/partition/pdf.py

CyMule added 2 commits February 17, 2026 10:19

Improve fast partition cold start

1966013

changelog

1c3d5e6

CyMule marked this pull request as ready for review February 17, 2026 16:08

cursor bot reviewed Feb 17, 2026


unstructured/partition/pdf.py (resolved)

CyMule added 3 commits February 17, 2026 11:36

benchmark script

439b3ba

fix import

ab1435f

lint

b66ae0e

badGarnet approved these changes Feb 17, 2026


badGarnet added this pull request to the merge queue Feb 17, 2026

Merged via the queue into main with commit e1f75a3 Feb 17, 2026
52 checks passed

badGarnet deleted the perf/lazy-imports-fast-coldstart branch February 17, 2026 22:56

badGarnet merged 5 commits into main from perf/lazy-imports-fast-coldstart

2 participants
