
mem: reduce PaddleOCR rec_batch_num from 6 to 1#4295

Merged
KRRT7 merged 4 commits into Unstructured-IO:main from KRRT7:mem/paddle-rec-batch-num
Mar 27, 2026

Conversation


@KRRT7 KRRT7 commented Mar 24, 2026

Reduce PaddleOCR rec_batch_num from the default of 6 to 1. Paddle's native inference engine allocates 500 MiB memory-arena chunks during text recognition, in proportion to the recognition batch size: with rec_batch_num=6, four chunks are allocated; setting it to 1 reduces this to a single chunk.
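The change itself is a one-line knob on the PaddleOCR constructor, which accepts `rec_batch_num` at construction time. A hedged sketch follows; the exact call site inside unstructured's OCRAgentPaddle may differ, and the `lang` key here is illustrative rather than taken from the PR diff:

```python
# Sketch of the setting this PR changes. PaddleOCR accepts rec_batch_num
# as a constructor keyword; "lang" is illustrative, not from the PR diff.
paddle_kwargs = {
    "lang": "en",
    "rec_batch_num": 1,  # was the library default of 6
}

# Constructing the agent requires paddlepaddle + unstructured-paddleocr:
# from paddleocr import PaddleOCR
# ocr = PaddleOCR(**paddle_kwargs)

print(paddle_kwargs["rec_batch_num"])  # 1
```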

Benchmark

| Setting | Peak memory |
| --- | --- |
| rec_batch_num=6 | 7,184 MiB |
| rec_batch_num=1 | 2,684 MiB |
| Delta | -4,500 MiB (-62.6%) |

Measured with memray run on layout-parser-paper-with-table.pdf through partition() with hi_res + PaddleOCR table OCR. On CPU, batch processing doesn't parallelize — it's sequential within predictor.run(). Smaller batches just allocate less workspace memory.
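To make the CPU tradeoff concrete, here is a small standalone sketch (plain Python, no Paddle required). The 500 MiB chunk size comes from the description above, and the 55-region count from the latency benchmark below; the linear workspace model is a simplifying assumption, since the PR actually observes four chunks (not six) at batch size 6:

```python
import math


def recognition_plan(num_regions: int, rec_batch_num: int, chunk_mib: int = 500):
    """Sketch: on CPU, recognition runs batch-by-batch, so total work is
    the same for any batch size; only the per-batch workspace changes."""
    # Number of sequential predictor.run() calls.
    num_batches = math.ceil(num_regions / rec_batch_num)
    # Workspace scales with regions in flight at once, not total batches
    # (simplifying linear assumption; real arena growth is sublinear).
    workspace_mib = rec_batch_num * chunk_mib
    return num_batches, workspace_mib


# 55 text regions, as in the latency benchmark below
print(recognition_plan(55, rec_batch_num=6))  # (10, 3000)
print(recognition_plan(55, rec_batch_num=1))  # (55, 500)
```

Either way the engine walks every region sequentially, which is why shrinking the batch cuts peak memory without a throughput regression.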

Reproduce

Requires unstructured[pdf], paddlepaddle, unstructured-paddleocr, and memray.

```shell
cat > /tmp/bench_paddle.py << 'SCRIPT'
from unstructured.partition.auto import partition
elements = partition(
    filename="example-docs/layout-parser-paper.pdf",
    strategy="hi_res",
    pdf_infer_table_structure=True,
    ocr_agent="unstructured.partition.utils.ocr_models.paddle_ocr.OCRAgentPaddle",
)
print(f"Partitioned: {len(elements)} elements")
SCRIPT

# Baseline (main branch, rec_batch_num=6):
git checkout main
memray run --native --trace-python-allocators -o /tmp/paddle_baseline.bin /tmp/bench_paddle.py
memray stats /tmp/paddle_baseline.bin | grep "Peak memory"

# With this change (rec_batch_num=1):
git checkout mem/paddle-rec-batch-num
memray run --native --trace-python-allocators -o /tmp/paddle_opt.bin /tmp/bench_paddle.py
memray stats /tmp/paddle_opt.bin | grep "Peak memory"
```

KRRT7 added 3 commits March 27, 2026 13:41
Paddle's native inference engine allocates 500 MiB memory arena chunks
during text recognition, proportional to batch size. With the default
rec_batch_num=6, four 500 MiB chunks are allocated simultaneously.

Setting rec_batch_num=1 reduces this to a single chunk, cutting peak
memory on the PaddleOCR code path by ~1,265 MiB (-42.6%).

Latency benchmark (55 text regions, CPU, 5 runs):
- rec_batch_num=6: 39.1s +/- 3.5s
- rec_batch_num=1: 37.0s +/- 2.0s
No throughput regression — on CPU, batch processing is sequential.
@KRRT7 KRRT7 force-pushed the mem/paddle-rec-batch-num branch from a91d36a to 5df21bd Compare March 27, 2026 18:44
@KRRT7 KRRT7 added this pull request to the merge queue Mar 27, 2026
Merged via the queue into Unstructured-IO:main with commit 47f4728 Mar 27, 2026
52 checks passed