Comparing changes

base repository: Unstructured-IO/unstructured
base: 0.21.2
head repository: Unstructured-IO/unstructured
compare: 0.21.5
  • 3 commits
  • 16 files changed
  • 3 contributors

Commits on Feb 23, 2026

  1. feat: custom fallback for language detection (#4238)

    Closes #4091
    
    Implements a custom fallback for language detection, so that short text
    is not forced to English and callers can control or disable detection.
    
    ## Changes:
    - language_fallback: optional callable used when text is short (<5 words)
      and ASCII. It receives the text and returns a list of ISO 639-3 codes,
      or None to leave the language unspecified. If not provided, short text
      still defaults to ["eng"] (backward compatible).
    - detect_languages() / apply_lang_metadata(): new parameter
      language_fallback, applied in the short-text path only.
    - partition() (auto): new parameter language_fallback, passed through to
      all partitioners via the metadata decorator.
    - partition_md(): new parameter languages, so callers can pass
      languages=[""] to disable language detection (aligned with other
      partitioners).
    
    ## Usage:
    - Return None for short text:
      partition(..., language_fallback=lambda text: None)
    - Custom short-text language:
      partition(..., language_fallback=my_detector)
    - Disable detection:
      partition_md(..., languages=[""]) or partition(..., languages=[""])
    claytonlin1110 authored Feb 23, 2026
    Commit afbda95
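
The short-text rule described in the language_fallback commit can be sketched in isolation. This is a minimal illustration, not the code from the PR: `detect_languages_sketch`, `my_detector`, and the `"und"` placeholder for longer text are hypothetical names chosen here, and the real detector is not modelled at all.

```python
from typing import Callable, List, Optional

# Hypothetical names: this is NOT the unstructured implementation, only a
# sketch of the short-text rule described in the commit message.
LanguageFallback = Callable[[str], Optional[List[str]]]


def detect_languages_sketch(
    text: str,
    language_fallback: Optional[LanguageFallback] = None,
) -> Optional[List[str]]:
    """Return ISO 639-3 codes, or None to leave the language unspecified."""
    is_short_ascii = len(text.split()) < 5 and text.isascii()
    if is_short_ascii:
        if language_fallback is not None:
            # The caller decides; None means "do not set a language".
            return language_fallback(text)
        # Backward-compatible default for short ASCII text.
        return ["eng"]
    # Placeholder for the real detector, which this sketch does not model.
    return ["und"]


def my_detector(text: str) -> Optional[List[str]]:
    """Toy fallback: recognise a couple of short German greetings."""
    return ["deu"] if text.lower() in {"hallo", "guten tag"} else None


print(detect_languages_sketch("Hello there"))                     # ['eng']
print(detect_languages_sketch("Hallo", my_detector))              # ['deu']
print(detect_languages_sketch("Hello there", lambda text: None))  # None
```

According to the commit message, the same kind of callable is what would be passed as `language_fallback=my_detector` to `partition(...)`. The fallback is consulted only on the short ASCII path, so longer text is unaffected.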
  2. Add Github action for time regressions (#4261)

    1. Adds a new GitHub Action that measures the time taken to partition a
    fixed set of documents.
    2. Switches timing from time.time() to time.perf_counter().
    aadland6 authored Feb 23, 2026
    Commit 16482f9
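
The switch to time.perf_counter() matters for timing checks because it is a monotonic, high-resolution clock, whereas time.time() follows the system clock and can jump when that clock is adjusted. A minimal sketch of timing a batch this way follows. `time_calls` and `fake_partition` are hypothetical stand-ins, not code from the PR or the workflow.

```python
import time
from typing import Callable, Iterable, List


def time_calls(func: Callable[[str], object], inputs: Iterable[str]) -> List[float]:
    """Return the elapsed seconds for each call, measured with perf_counter."""
    durations = []
    for item in inputs:
        start = time.perf_counter()
        func(item)
        durations.append(time.perf_counter() - start)
    return durations


def fake_partition(doc: str) -> List[str]:
    # Hypothetical stand-in for a real partition call.
    return doc.split()


durations = time_calls(fake_partition, ["one two", "three four five"])
total = sum(durations)
print(f"partitioned {len(durations)} documents in {total:.6f}s")
```

A regression check could then compare `total` against a stored baseline. How the actual action sets its threshold is not stated in the commit message.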

Commits on Feb 24, 2026

  1. fix: relax lower bound for pdfminer.six (#4262)

    The current lower bound for pdfminer.six is still too new for some
    commonly used file parsing tools like `pdfplumber`. This PR lowers this
    bound so that `unstructured` is compatible with those tools.
    badGarnet authored Feb 24, 2026
    Commit 5302352
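
To see which versions actually end up installed side by side after the bound is relaxed, the standard library can report them. This is a small sketch using importlib.metadata. The helper name `installed_versions` is hypothetical, and the specific version bounds are not given in the commit message, so none are assumed here.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(["pdfminer.six", "pdfplumber", "unstructured"])
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```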