[CI] Fix two flaky Windows tests (root causes, not skips)#4141
Conversation
…pickle - serialization._torch: reject shard paths that are absolute under POSIX *or* Windows semantics. os.path.isabs is host-dependent, and since Python 3.13 ntpath.isabs no longer treats "/tmp/..." as absolute, so the security check was silently bypassed on Windows Py3.13+ (test input "/tmp/malicious.safetensors" reached the loader as "C:\\tmp\\malicious.safetensors"). - test_safetensors_index_rejects_absolute_path: parametrize over POSIX-absolute, Windows drive-absolute, and UNC paths. - test_hf_file_system.test_pickle: point the pickled fs at ENDPOINT_STAGING instead of production. The repo only exists on staging, so production lookups always failed — and depending on whether the error was RepositoryNotFoundError (triggering resolve_path's namespace fallback -> 2 cache entries) or anything else (-> 1 entry), the assertion len(...) == 1 flipped between runners.
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 8c7756b. Configure here.
- serialization._torch: tightened absolute-path check. The previous
PurePosixPath/PureWindowsPath `is_absolute()` combo missed Windows
rooted-but-driveless paths like "\\evil" (cursor review) — since
`PureWindowsPath("\\evil").is_absolute()` is False but `os.path.join`
still resolves it against the base path's drive, allowing escape.
Switched to `win.drive or win.root or ".." in win.parts` which catches
POSIX-absolute, Windows-rooted (with or without drive), UNC, and both
"../" and "..\\" traversal in one expression.
- Added parametrized cases for "\\evil", "C:evil", and "..\\evil".
- test_hf_file_system.test_pickle: added `skip_instance_cache=True` to
the pre-pickle fs. The test is inherited by HfFileSystemRepositoryROTests
and HfFileSystemBucketROTests, both of which construct
`HfFileSystem(endpoint=STAGING, token=TOKEN)` with the same args.
Without skip, the second one reuses the first's cached fs — inheriting
its `_bucket_exists_cache` / `dircache` entries via the main-thread
state-inheritance path in `_Cached.__call__`. That's exactly what
Windows CI was showing: the repo test_pickle saw a leftover bucket
entry from the bucket test_pickle, making len(...) == 2.
|
@hanouticelina ready for review. To be honest, not a PR from me but the comments look sensible to me. And since CI is passing I think it's good to merge. |
hanouticelina
left a comment
There was a problem hiding this comment.
The PR description and the comments make sense to me!
thanks for taking care of that! (I don't remember the last time Windows tests passed 😄)
|
This PR has been shipped as part of the v1.13.0 release. |

Context: let's hope CI is finally green 🤞
Two Windows-only flaky tests, both with real root causes:
1.
test_safetensors_index_rejects_absolute_path(Py 3.14 Windows)This is a real security regression, not just a test bug. The validator in
_load_sharded_checkpointusesos.path.isabs(shard_file)to reject absolute paths in a safetensors index file. Since Python 3.13,ntpath.isabs()no longer treats/tmp/foo(single leading slash, no drive) as absolute — so on Windows Py 3.13+, an index file with"/tmp/malicious.safetensors"sails past validation andos.path.joinresolves it toC:\tmp\malicious.safetensors.Fix: check both
PurePosixPath.is_absolute()andPureWindowsPath.is_absolute(), so a malicious POSIX-absolute path in an index file is rejected on Windows too (and vice-versa). Parametrized the test to cover POSIX-absolute, Windows drive-absolute, and UNC paths.2.
test_hf_file_system.test_pickle(Windows, all Py versions)The test did
HfFileSystem()(defaults to production) butself.text_fileis a repo created on staging. Theisfile()call always 404'd on production. Inresolve_path, when a lookup fails withRepositoryNotFoundError, the code retries with a truncated repo_id (namespace-fallback) — writing a second cache entry. Whether the retry fires depends on the exact error returned by production, which differs between GHA Ubuntu and Windows runners. Windows consistently gets a clean 404 → fallback fires → 2 entries →assert len(...) == 1fails. Linux gets a different error → 1 entry → passes.Fix: point the pickled fs at
ENDPOINT_STAGINGso the initialisfile()resolves deterministically against the endpoint where the repo actually exists.Verification
Windows CI behavior will be validated by this PR's own checks.
🤖 Generated with Claude Code
Note
Medium Risk
Touches shard filename validation in model loading (a security-sensitive area) and adjusts filesystem caching tests; logic change is small but could affect loading behavior for unusual paths across OSes.
Overview
Hardens sharded safetensors checkpoint loading by validating shard filenames using both
PurePosixPath.is_absolute()andPureWindowsPath.is_absolute()(in addition to rejecting..), closing a Windows/Python 3.13+ edge case where POSIX-style absolute paths could bypassos.path.isabschecks.Deflakes Windows CI tests by (1) parametrizing the absolute-path rejection test to cover POSIX, Windows drive, and UNC paths on any host OS, and (2) making
HfFileSystempickling/caching test use the staging endpoint so cache population is deterministic.Reviewed by Cursor Bugbot for commit 8c7756b. Bugbot is set up for automated code reviews on this repo. Configure here.