Skip to content

[CI] Fix two flaky Windows tests (root causes, not skips)#4141

Merged
Wauplin merged 3 commits intomainfrom
fix-flaky-windows-tests
Apr 28, 2026
Merged

[CI] Fix two flaky Windows tests (root causes, not skips)#4141
Wauplin merged 3 commits intomainfrom
fix-flaky-windows-tests

Conversation

@Wauplin
Copy link
Copy Markdown
Contributor

@Wauplin Wauplin commented Apr 24, 2026

Context: let's hope CI is finally green 🤞


Two Windows-only flaky tests, both with real root causes:

1. test_safetensors_index_rejects_absolute_path (Py 3.14 Windows)

This is a real security regression, not just a test bug. The validator in _load_sharded_checkpoint uses os.path.isabs(shard_file) to reject absolute paths in a safetensors index file. Since Python 3.13, ntpath.isabs() no longer treats /tmp/foo (single leading slash, no drive) as absolute — so on Windows Py 3.13+, an index file with "/tmp/malicious.safetensors" sails past validation and os.path.join resolves it to C:\tmp\malicious.safetensors.

Fix: check both PurePosixPath.is_absolute() and PureWindowsPath.is_absolute(), so a malicious POSIX-absolute path in an index file is rejected on Windows too (and vice-versa). Parametrized the test to cover POSIX-absolute, Windows drive-absolute, and UNC paths.

2. test_hf_file_system.test_pickle (Windows, all Py versions)

The test did HfFileSystem() (defaults to production) but self.text_file is a repo created on staging. The isfile() call always 404'd on production. In resolve_path, when a lookup fails with RepositoryNotFoundError, the code retries with a truncated repo_id (namespace-fallback) — writing a second cache entry. Whether the retry fires depends on the exact error returned by production, which differs between GHA Ubuntu and Windows runners. Windows consistently gets a clean 404 → fallback fires → 2 entries → assert len(...) == 1 fails. Linux gets a different error → 1 entry → passes.

Fix: point the pickled fs at ENDPOINT_STAGING so the initial isfile() resolves deterministically against the endpoint where the repo actually exists.

Verification

$ pytest tests/test_serialization.py::TestShardedCheckpointValidation
...
tests/test_serialization.py::TestShardedCheckpointValidation::test_safetensors_index_rejects_bin_shard PASSED
tests/test_serialization.py::TestShardedCheckpointValidation::test_safetensors_index_rejects_path_traversal PASSED
tests/test_serialization.py::TestShardedCheckpointValidation::test_safetensors_index_rejects_absolute_path[/tmp/malicious.safetensors] PASSED
tests/test_serialization.py::TestShardedCheckpointValidation::test_safetensors_index_rejects_absolute_path[C:\\malicious.safetensors] PASSED
tests/test_serialization.py::TestShardedCheckpointValidation::test_safetensors_index_rejects_absolute_path[\\\\server\\share\\malicious.safetensors] PASSED
5 passed

Windows CI behavior will be validated by this PR's own checks.

🤖 Generated with Claude Code


Note

Medium Risk
Touches shard filename validation in model loading (a security-sensitive area) and adjusts filesystem caching tests; logic change is small but could affect loading behavior for unusual paths across OSes.

Overview
Hardens sharded safetensors checkpoint loading by validating shard filenames using both PurePosixPath.is_absolute() and PureWindowsPath.is_absolute() (in addition to rejecting ..), closing a Windows/Python 3.13+ edge case where POSIX-style absolute paths could bypass os.path.isabs checks.

Deflakes Windows CI tests by (1) parametrizing the absolute-path rejection test to cover POSIX, Windows drive, and UNC paths on any host OS, and (2) making HfFileSystem pickling/caching test use the staging endpoint so cache population is deterministic.

Reviewed by Cursor Bugbot for commit 8c7756b. Bugbot is set up for automated code reviews on this repo. Configure here.

…pickle

- serialization._torch: reject shard paths that are absolute under POSIX *or*
  Windows semantics. os.path.isabs is host-dependent, and since Python 3.13
  ntpath.isabs no longer treats "/tmp/..." as absolute, so the security check
  was silently bypassed on Windows Py3.13+ (test input "/tmp/malicious.safetensors"
  reached the loader as "C:\\tmp\\malicious.safetensors").
- test_safetensors_index_rejects_absolute_path: parametrize over POSIX-absolute,
  Windows drive-absolute, and UNC paths.
- test_hf_file_system.test_pickle: point the pickled fs at ENDPOINT_STAGING
  instead of production. The repo only exists on staging, so production lookups
  always failed — and depending on whether the error was RepositoryNotFoundError
  (triggering resolve_path's namespace fallback -> 2 cache entries) or anything
  else (-> 1 entry), the assertion len(...) == 1 flipped between runners.
@Wauplin Wauplin requested a review from hanouticelina April 24, 2026 14:04
@bot-ci-comment
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 8c7756b. Configure here.

Comment thread src/huggingface_hub/serialization/_torch.py Outdated
@Wauplin Wauplin marked this pull request as draft April 24, 2026 14:14
Wauplin added 2 commits April 24, 2026 16:22
- serialization._torch: tightened absolute-path check. The previous
  PurePosixPath/PureWindowsPath `is_absolute()` combo missed Windows
  rooted-but-driveless paths like "\\evil" (cursor review) — since
  `PureWindowsPath("\\evil").is_absolute()` is False but `os.path.join`
  still resolves it against the base path's drive, allowing escape.
  Switched to `win.drive or win.root or ".." in win.parts` which catches
  POSIX-absolute, Windows-rooted (with or without drive), UNC, and both
  "../" and "..\\" traversal in one expression.
- Added parametrized cases for "\\evil", "C:evil", and "..\\evil".
- test_hf_file_system.test_pickle: added `skip_instance_cache=True` to
  the pre-pickle fs. The test is inherited by HfFileSystemRepositoryROTests
  and HfFileSystemBucketROTests, both of which construct
  `HfFileSystem(endpoint=STAGING, token=TOKEN)` with the same args.
  Without skip, the second one reuses the first's cached fs — inheriting
  its `_bucket_exists_cache` / `dircache` entries via the main-thread
  state-inheritance path in `_Cached.__call__`. That's exactly what
  Windows CI was showing: the repo test_pickle saw a leftover bucket
  entry from the bucket test_pickle, making len(...) == 2.
@Wauplin Wauplin marked this pull request as ready for review April 28, 2026 12:12
@Wauplin
Copy link
Copy Markdown
Contributor Author

Wauplin commented Apr 28, 2026

@hanouticelina ready for review. To be honest, not a PR from me but the comments look sensible to me. And since CI is passing I think it's good to merge.

Copy link
Copy Markdown
Contributor

@hanouticelina hanouticelina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description and the comments make sense to me!
thanks for taking care of that! (I don't remember the last time Windows tests passed 😄)

@Wauplin Wauplin merged commit 1593bef into main Apr 28, 2026
20 checks passed
@Wauplin Wauplin deleted the fix-flaky-windows-tests branch April 28, 2026 12:30
@huggingface-hub-bot
Copy link
Copy Markdown
Contributor

This PR has been shipped as part of the v1.13.0 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants