fix: sanitize MSG attachment filenames to prevent path traversal (GHS…#4117

Merged
william-u10d merged 2 commits into main from luke/unstructured-ghsa-gm8q-m8mv-jj5m on Nov 6, 2025

@luke-kucing
Contributor

Summary

Fixes a path traversal vulnerability in email and MSG attachment filename handling (GHSA-gm8q-m8mv-jj5m).

Changes

Security Fix

- Sanitizes attachment filenames in `_AttachmentPartitioner` for both `email.py` and `msg.py`
- Uses `os.path.basename()` to strip path components from filenames
- Normalizes backslashes to forward slashes to handle Windows paths on Unix systems
- Removes null bytes and other control characters
- Handles edge cases (empty strings, ".", "..")
- Defaults to "unknown" for invalid or dangerous filenames
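
The steps listed above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the function name `sanitize_filename`, its `default` parameter, and the exact order of operations are assumptions made for the example.

```python
import os


def sanitize_filename(filename, default="unknown"):
    """Hypothetical helper: reduce an untrusted attachment name to a bare file name."""
    # Missing or empty names fall back to the default.
    if not filename:
        return default
    # Drop null bytes and other ASCII control characters.
    cleaned = "".join(ch for ch in filename if ord(ch) >= 32 and ord(ch) != 127)
    # Treat backslashes as separators so Windows paths are split on Unix hosts too.
    cleaned = cleaned.replace("\\", "/")
    # Keep only the final path component.
    cleaned = os.path.basename(cleaned)
    # Names that are empty or refer to a directory are rejected.
    if cleaned in ("", ".", ".."):
        return default
    return cleaned


print(sanitize_filename("../../../etc/passwd"))
print(sanitize_filename("C:\\Windows\\System32\\config\\sam"))
print(sanitize_filename(".."))
```

In this sketch the backslash normalization runs before `os.path.basename()`, because on Unix a backslash is an ordinary filename character and `basename` alone would leave a Windows-style path intact.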

Test Coverage

Added 17 comprehensive tests covering:

- Path traversal attempts (`../../../etc/passwd`)
- Absolute Unix paths (`/etc/passwd`)
- Absolute Windows paths (`C:\Windows\System32\config\sam`)
- Null byte injection (`file\x00.txt`)
- Dot and dotdot filenames (`.` and `..`)
- Missing/empty filenames
- Complex mixed path separators
- Valid filenames (ensuring they pass through unchanged)

Test Results

- ✅ All 17 new security tests pass
- ✅ All 129 existing tests pass
- ✅ No regressions

Security Impact

Prevents attackers from using malicious attachment filenames to write files outside the intended directory, closing an arbitrary file write vulnerability.
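
To make the risk concrete, here is a small sketch of how an unsanitized filename escapes the target directory when it is joined to it. The directory `/tmp/attachments` and the helper `joined_path` are made up for illustration, and `posixpath` is used so the result does not depend on the host OS.

```python
import posixpath


def joined_path(directory, filename):
    """Hypothetical helper: where a file would land if the name is joined as-is."""
    return posixpath.normpath(posixpath.join(directory, filename))


target_dir = "/tmp/attachments"  # assumed output directory, for illustration only

# Relative traversal climbs out of the target directory.
print(joined_path(target_dir, "../../../etc/passwd"))

# An absolute name discards the target directory entirely.
print(joined_path(target_dir, "/etc/passwd"))

# Keeping only the final component stays inside the target directory.
print(joined_path(target_dir, posixpath.basename("../../../etc/passwd")))
```

The first two calls resolve to `/etc/passwd`, while the third resolves to `/tmp/attachments/passwd`.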

Changes include comprehensive test coverage for various attack vectors and a version bump to 0.18.18.

…A-gm8q-m8mv-jj5m)

Addresses a security vulnerability where malicious attachment filenames containing
path traversal sequences (e.g., "../../../etc/passwd") could write files outside
the intended directory when processing MSG files. The fix normalizes both Unix and
Windows path separators and sanitizes filenames by removing path components, null
bytes, and handling edge cases like "." and ".." filenames.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

@william-u10d added this pull request to the merge queue Nov 6, 2025
Merged via the queue into main with commit b01d35b Nov 6, 2025
40 checks passed
@william-u10d deleted the luke/unstructured-ghsa-gm8q-m8mv-jj5m branch November 6, 2025 23:58