Releases: Unstructured-IO/unstructured
0.22.10
What's Changed
- fix(chunking): preserve nested table structure in reconstruction by @cragwolfe in #4301
- Replace lazyproperty with functools.cached_property by @KRRT7 in #4282
- mem: reduce PaddleOCR rec_batch_num from 6 to 1 by @KRRT7 in #4295
- fix: isolate Table elements in pre-chunks by @claytonlin1110 in #4307
- feat(chunking): repeat table headers on continuation chunks by @cragwolfe in #4298
Full Changelog: 0.22.6...0.22.10
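For readers unfamiliar with the swap in #4282, the standard library's functools.cached_property computes an attribute once per instance and then stores the result on that instance. The sketch below shows only that standard-library behavior; the Document class and its fields are hypothetical and are not taken from unstructured.

```python
from functools import cached_property


class Document:
    """Hypothetical class for illustration; not part of unstructured."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.compute_count = 0  # counts how often the property body runs

    @cached_property
    def word_count(self) -> int:
        # Runs only on first access; the result is then stored on the instance.
        self.compute_count += 1
        return len(self.text.split())


doc = Document("repeat table headers on continuation chunks")
first = doc.word_count
second = doc.word_count  # served from the instance cache, body not re-run
```

Because the cached value lives in the instance's __dict__, it can be invalidated with `del doc.word_count` if the underlying data changes.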
0.22.6
What's Changed
- fix(deps): Update security updates [SECURITY] by @utic-renovate[bot] in #4303
- fix: Self-contained script for version extraction in release CI by @vladimir-kivi-ds in #4304
Full Changelog: 0.22.4...0.22.6
0.22.4
What's Changed
- feat: add create_file_from_elements() to re-create document files from elements by @claytonlin1110 in #4259
- Bump dependencies by @PastelStorm in #4265
- fix: avoid O(N²) re-scanning in _patch_current_chars_with_render_mode by @KRRT7 in #4266
- add check if libmagic fails by @aadland6 in #4273
- Adds Form Element by @aadland6 in #4272
- feat: audio speech to text partition by @claytonlin1110 in #4264
- Add a check for complex pdfs by @aadland6 in #4268
- chore: disable fail-build on Anchore container scan by @lawrence-u10d in #4285
- feat: make telemetry off by default by @claytonlin1110 in #4281
- fix(deps): Update security vulnerability in pypdf to v6.9.1 [SECURITY] by @utic-renovate[bot] in #4248
- feat: Store routing in ElementMetadata by @vladimir-kivi-ds in #4293
- feat: custom Markdown extensions for partition_md by @claytonlin1110 in #4292
- feat: tablechunks can reconstruct table by @qued in #4291
New Contributors
- @KRRT7 made their first contribution in #4266
- @vladimir-kivi-ds made their first contribution in #4293
Full Changelog: 0.21.5...0.22.4
0.21.5
What's Changed
- feat: custom fallback for language detection by @claytonlin1110 in #4238
- Add Github action for time regressions by @aadland6 in #4261
- fix: relax lower bound for pdfminer.six by @badGarnet in #4262
New Contributors
- @claytonlin1110 made their first contribution in #4238
- @aadland6 made their first contribution in #4261
Full Changelog: 0.21.2...0.21.5
0.21.2
0.21.1
0.21.0
Fixes
- Replace NLTK with spaCy to remediate CVE-2025-14009: NLTK's downloader uses zipfile.extractall() without path validation, enabling RCE via malicious packages (CVSS 10.0, no patch available). spaCy models install as pip packages, eliminating the vulnerable downloader entirely.
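To illustrate what "path validation" before extraction can look like, here is a minimal generic sketch. It is not NLTK's or unstructured's code, and the safe_extract helper is a hypothetical name. It only rejects members that would resolve outside the destination directory, so it does not address other risks of extracting untrusted archives, such as executable content inside the archive.

```python
import os
import zipfile


def safe_extract(archive_path: str, dest_dir: str) -> None:
    """Extract a zip archive, rejecting members that would land outside dest_dir.

    Hypothetical helper for illustration only.
    """
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(archive_path) as zf:
        # Validate every member before writing anything to disk.
        for member in zf.infolist():
            target = os.path.realpath(os.path.join(dest_root, member.filename))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.filename!r}")
        zf.extractall(dest_root)
```

Checking all members first means a malicious archive is rejected as a whole rather than being partially extracted.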
0.20.8
What's Changed
- fix: set max decompressed size for elements JSON by @qued in #4244
- fix: update dependencies by @badGarnet in #4247
Full Changelog: 0.20.6...0.20.8
0.20.6
What's Changed
- Automate pypi publishing by @PastelStorm in #4239
- fix: remove duplicate characters caused by fake bold rendering in PDFs by @bittoby in #4215
- Improve fast partition cold start by @CyMule in #4242
- fix: gracefully handle invalid HTML string during chunking by @badGarnet in #4243
- fix: remap parent id after hashing by @badGarnet in #4245
Full Changelog: 0.20.1...0.20.6
0.20.2
Release 0.20.2