Skip to content

fix: relax lower bound for pdfminer.six#4262

Merged
badGarnet merged 4 commits intomainfrom
fix/relax-pdfminer-minimum-requirements
Feb 24, 2026
Merged

fix: relax lower bound for pdfminer.six#4262
badGarnet merged 4 commits intomainfrom
fix/relax-pdfminer-minimum-requirements

Conversation

@badGarnet
Copy link
Copy Markdown
Collaborator

The current lower bound for pdfminer.six is still too new for some commonly used file parsing tools like pdfplumber. This PR lowers this bound so that unstructured is compatible with those tools.

@badGarnet badGarnet marked this pull request as ready for review February 23, 2026 23:34
@lawrence-u10d
Copy link
Copy Markdown
Contributor

@claude does this intro any CVEs?

@claude
Copy link
Copy Markdown

claude bot commented Feb 23, 2026

Claude finished @lawrence-u10d's task —— View job


CVE Analysis: pdfminer.six Version Change

No CVEs introduced

Version Change Analysis

The PR changes the pdfminer.six lower bound from:

  • Before: >=20260107, <20270000
  • After: >=20251230, <20270000

This relaxes the minimum required version by 8 days (from January 7, 2026 to December 30, 2025).

Security Assessment

The version rollback does NOT introduce any CVEs for the following reasons:

  1. 20260107 was a performance fix, not security: According to the changelog (line 202), version 20260107 was upgraded specifically to "fix ~15-18% performance regression from eager f-string evaluation" - this was a performance optimization, not a security patch.

  2. Security update was in 0.18.25: The major security update for pdfminer.six happened in version 0.18.25 of unstructured (changelog line 212): "Security update: Removed pdfminer.six version constraint and bumped pdfminer.six and urllib3 to address high severity CVEs". This means the CVEs were already addressed in a much earlier version.

  3. Safe version range: Both 20251230 and 20260107 are modern versions from late 2025/early 2026, well after any known CVEs were fixed in the pdfminer.six library.

Compatibility Benefits

This change enables compatibility with tools like pdfplumber that may depend on the slightly older 20251230 version, without compromising security.

Recommendation: ✅ Safe to merge - no security concerns introduced.

@lawrence-u10d
Copy link
Copy Markdown
Contributor

@claude when did that performance bump get added? Which PR?

@claude
Copy link
Copy Markdown

claude bot commented Feb 23, 2026

Claude finished @lawrence-u10d's task —— View job


Finding Performance Bump Details

Task Checklist:

  • Search git history for pdfminer.six version changes
  • Find the specific commit that bumped to 20260107
  • Identify the PR that introduced the change
  • Check CHANGELOG.md for more context

Research Findings

After searching through the git history and CHANGELOG.md, I've found the key information:

The Performance Bump (pdfminer.six 20260107) Details:

  • When added: The version bump was referenced in CHANGELOG.md version 0.18.27
  • Purpose: "Upgrade pdfminer-six to 20260107 to fix ~15-18% performance regression from eager f-string evaluation" (CHANGELOG.md:202)
  • Context: This was part of a series of "codeflash" performance optimizations in version 0.18.27

Key Timeline:

  1. 0.18.27: Performance bump was documented in changelog
  2. However, based on my git history analysis, the actual version constraint >=20260107 appears to have been in the codebase for a while

Missing Information:
The specific PR that originally introduced the pdfminer.six version constraint is not easily identifiable from the current git history search. The version 20260107 has been present in pyproject.toml across multiple recent commits (spaCy fix #4258, dependency updates #4247, Python 3.11/3.13 support #4236, etc.), suggesting it was added earlier than these recent changes.

The CHANGELOG entry in 0.18.27 indicates this was when the performance fix was officially documented, but the actual implementation may have occurred in a different commit that's not immediately visible in the recent history.

Would you like me to search more extensively through older git history to find the exact PR that first introduced this version constraint?

@badGarnet badGarnet added this pull request to the merge queue Feb 24, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Feb 24, 2026
@badGarnet badGarnet added this pull request to the merge queue Feb 24, 2026
Merged via the queue into main with commit 5302352 Feb 24, 2026
96 of 97 checks passed
@badGarnet badGarnet deleted the fix/relax-pdfminer-minimum-requirements branch February 24, 2026 03:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants