fix(chunking): preserve nested table structure in reconstruction #4301

Merged
cragwolfe merged 4 commits into main from crag/review-pr-4291
Mar 26, 2026

Conversation

@cragwolfe
Contributor

Summary

  • Fix _merge_table_chunks() to merge only top-level rows from each chunk HTML table.
  • Prevent nested table rows from being hoisted into the reconstructed root table.
  • Add regression coverage to verify nested table structure is preserved.
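
The difference between "top-level rows" and "all rows" can be illustrated with a minimal sketch. This is not the code changed in this PR: it uses the standard-library `xml.etree.ElementTree` on well-formed markup, and the names `top_level_rows` and `merge_table_chunks_sketch` are hypothetical stand-ins chosen for illustration.

```python
import xml.etree.ElementTree as ET


def top_level_rows(table):
    """Return the <tr> elements owned by `table` itself, skipping nested tables."""
    rows = []
    for child in table:
        if child.tag == "tr":
            rows.append(child)
        elif child.tag in ("thead", "tbody", "tfoot"):
            # Rows may sit one level down inside a row-group element.
            rows.extend(r for r in child if r.tag == "tr")
    return rows


def merge_table_chunks_sketch(chunk_htmls):
    """Hypothetical merge: append each chunk's top-level rows to one root table."""
    merged = ET.Element("table")
    for html in chunk_htmls:
        table = ET.fromstring(html)
        # Using table.iter("tr") here would also pick up rows of nested
        # tables and hoist them into the root, which is the bug described above.
        merged.extend(top_level_rows(table))
    return ET.tostring(merged, encoding="unicode")


chunks = [
    "<table><tr><td>A<table><tr><td>inner</td></tr></table></td></tr></table>",
    "<table><tr><td>B</td></tr></table>",
]
merged_html = merge_table_chunks_sketch(chunks)
print(merged_html)
```

In this sketch the nested table stays inside the cell that contained it, so the merged root table has two rows of its own rather than three.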

Finding Reference

Validation

  • unset VIRTUAL_ENV && CI=false uv run --no-sync pytest -q test_unstructured/chunking/test_base.py -k "reconstruct_tables_from_a_mixed_element_list or preserves_nested_table_structure" --maxfail=1
  • unset VIRTUAL_ENV && CI=false uv run --no-sync pytest -q test_unstructured/chunking/test_base.py test_unstructured/chunking/test_dispatch.py --maxfail=1
  • unset VIRTUAL_ENV && uv run --no-sync python - <<'PY'
    from unstructured.partition.text import partition_text

    elements = partition_text(text="Codex initializer smoke test")
    assert elements, "partition_text returned no elements"
    print(f"partition_text smoke check passed ({len(elements)} elements)")
    PY

  • unset VIRTUAL_ENV && CI=false uv run --no-sync pytest -q test_unstructured/partition/test_text.py --maxfail=1

authored by codex

@cragwolfe cragwolfe marked this pull request as ready for review March 26, 2026 04:18
@cragwolfe cragwolfe marked this pull request as draft March 26, 2026 19:09
@cragwolfe cragwolfe force-pushed the crag/review-pr-4291 branch from 0c634f6 to 7bcd79d Compare March 26, 2026 19:56
@cragwolfe cragwolfe force-pushed the crag/review-pr-4291 branch from a5073d0 to ee5c22c Compare March 26, 2026 22:42
@cragwolfe cragwolfe marked this pull request as ready for review March 26, 2026 22:46
@cragwolfe cragwolfe enabled auto-merge March 26, 2026 22:46
@cragwolfe cragwolfe added this pull request to the merge queue Mar 26, 2026
Merged via the queue into main with commit 94b3ffd Mar 26, 2026
52 checks passed
@cragwolfe cragwolfe deleted the crag/review-pr-4291 branch March 26, 2026 23:37