feat: add create_file_from_elements() to re-create document files from elements #4259
@PastelStorm Would you please review this? Thanks.
2961ec2 to f05e045
@PastelStorm Have you had a chance to review?
Review:
| Area | Result |
|---|---|
| Duplicated code | Minor (write-path), not severe |
| Dead code | None in this diff |
| Incorrect mock usage | No mocking introduced |
| Race conditions | None — straightforward single-file writes |
| Useless tests | Not useless, but the HTML test bypasses the critical default path |
Summary
The main thing to fix is the HTML default — a function called "create file from elements" shouldn't silently discard elements out of the box. The assert should also be replaced with a proper guard. The rest are smaller improvements that would be nice to clean up while you're in here.
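On the `assert` point: Python strips `assert` statements when run with `-O`, so a check that protects caller input is safer as an explicit raise. A minimal sketch, assuming the check validates the `format` argument (the source does not show what the original assert checked; `check_format` and `SUPPORTED_FORMATS` are hypothetical names):

```python
# Format names taken from the PR description; the tuple name is hypothetical.
SUPPORTED_FORMATS = ("markdown", "html", "text")


def check_format(fmt: str) -> str:
    """Hypothetical helper: validate the requested output format."""
    # An explicit guard still runs under `python -O`, unlike `assert`.
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}"
        )
    return fmt
```

A caller then gets a `ValueError` with a readable message instead of an `AssertionError` that may or may not fire depending on interpreter flags.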
@PastelStorm Fixed. Please review again.
2bea3f3 to e4401a0
@PastelStorm Is an approval required? :)
d0f8620
Summary
Adds `create_file_from_elements()` in `unstructured.staging.base` so users can rebuild a document file from a list of elements (the reverse of partition). Supports the workflow: partition -> modify elements (e.g. replace Image with NarrativeText using alt text) -> write back to file.

Closes #3994.
Changes
- `unstructured/staging/base.py`: New `create_file_from_elements(elements, format="markdown"|"html"|"text", filename=None, ...)` that delegates to `elements_to_md`, `elements_to_html`, or `elements_to_text` and optionally writes to a file.
- `test_unstructured/staging/test_base.py`: Tests for markdown, text, and HTML output, and for an unsupported format raising `ValueError`.
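The dispatch-and-write pattern described above can be sketched without the library. This is an illustration under stated assumptions, not the PR's implementation: `Element`, `to_text`, `to_markdown`, `replace_images_with_alt_text`, and `create_file_sketch` are hypothetical stand-ins, HTML is omitted for brevity, and returning the rendered string is an assumption (the description only says the function optionally writes to a file).

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for a partitioned document element."""
    category: str
    text: str
    alt_text: Optional[str] = None


def to_text(elements: List[Element]) -> str:
    # Plain text: element texts separated by blank lines.
    return "\n\n".join(e.text for e in elements)


def to_markdown(elements: List[Element]) -> str:
    # Titles become level-1 headings; everything else is a paragraph.
    parts = [f"# {e.text}" if e.category == "Title" else e.text for e in elements]
    return "\n\n".join(parts)


# Maps a format name to its converter (stand-ins for the real delegates).
_CONVERTERS: Dict[str, Callable[[List[Element]], str]] = {
    "markdown": to_markdown,
    "text": to_text,
}


def replace_images_with_alt_text(elements: List[Element]) -> List[Element]:
    # The "modify elements" step: swap each Image for NarrativeText
    # carrying its alt text; images without alt text are left unchanged.
    return [
        Element("NarrativeText", e.alt_text)
        if e.category == "Image" and e.alt_text
        else e
        for e in elements
    ]


def create_file_sketch(
    elements: List[Element],
    format: str = "markdown",
    filename: Optional[str] = None,
    encoding: str = "utf-8",
) -> str:
    # Unknown formats fail loudly rather than falling back silently.
    if format not in _CONVERTERS:
        raise ValueError(f"Unsupported format: {format!r}")
    content = _CONVERTERS[format](elements)
    # Writing is optional; the rendered content is returned either way
    # (an assumption of this sketch).
    if filename is not None:
        Path(filename).write_text(content, encoding=encoding)
    return content
```

With this shape, the round trip reads as `create_file_sketch(replace_images_with_alt_text(elements), format="markdown", filename="out.md")`, where `elements` would come from a partition step.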