Make uv cache clean parallel process safe#15888

Merged
konstin merged 3 commits into main from konsti/safe-cache-clean
Sep 19, 2025

@konstin
Member

@konstin konstin commented Sep 16, 2025

Currently, uv cache clean and uv cache prune can cause crashes in other uv processes running in parallel by removing their in-use files.

We can solve this by using a shared (read) lock on the cache directory, while the uv cache operations use an exclusive (write) lock. The drawback is that this is always one extra lock, and that we assume that all platforms support shared locks.

Once Rust 1.89 fulfills our N-2 policy, we can add support for these methods in fs_err and switch to https://doc.rust-lang.org/std/fs/struct.File.html#platform-specific-behavior-2.
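
As a rough illustration of the shared/exclusive scheme described above, the following is a minimal sketch in Python rather than uv's actual Rust implementation. It assumes a Unix platform with `flock`, and the `.lock` file name and the helper names `open_lock_file`, `try_lock`, and `demo` are hypothetical. Regular commands would take the shared lock, while a cache-cleaning command would take the exclusive one.

```python
import fcntl
import os
import tempfile


def open_lock_file(cache_dir):
    """Open (creating if needed) a hypothetical `.lock` file in the cache directory."""
    return os.open(os.path.join(cache_dir, ".lock"), os.O_RDWR | os.O_CREAT, 0o644)


def try_lock(fd, exclusive):
    """Try to take a shared or exclusive flock without blocking; True on success."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        # Someone else holds a conflicting lock.
        return False


def demo():
    """Simulate two readers and one cleaner using separate file descriptors."""
    with tempfile.TemporaryDirectory() as cache_dir:
        reader_a = open_lock_file(cache_dir)
        reader_b = open_lock_file(cache_dir)
        cleaner = open_lock_file(cache_dir)
        results = {
            # Shared locks can be held by several readers at once.
            "reader_a": try_lock(reader_a, exclusive=False),
            "reader_b": try_lock(reader_b, exclusive=False),
            # The exclusive lock is refused while any reader holds the lock.
            "cleaner_while_readers": try_lock(cleaner, exclusive=True),
        }
        fcntl.flock(reader_a, fcntl.LOCK_UN)
        fcntl.flock(reader_b, fcntl.LOCK_UN)
        # Once every reader has released, the cleaner can proceed.
        results["cleaner_after_readers"] = try_lock(cleaner, exclusive=True)
        for fd in (reader_a, reader_b, cleaner):
            os.close(fd)
        return results
```

The sketch uses non-blocking attempts so the outcome is observable in one process; a real cleaner would more likely block until the readers finish.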

Test Plan

Open one terminal, run:

uv venv -c -p 3.13
UV_CACHE_DIR=cache uv cache clean
UV_CACHE_DIR=cache uv pip install numpy==2.0.0

Open another terminal, run:

UV_CACHE_DIR=cache uv cache clean

Fixes #15704
Part of #13883

@konstin konstin added the bug Something isn't working label Sep 16, 2025
@konstin konstin force-pushed the konsti/safe-cache-clean branch from 55836e9 to a9df4a5 Compare September 16, 2025 10:17
@zanieb
Member

zanieb commented Sep 17, 2025

> and that we assume that all platforms support shared locks.

How confident are you in this?

@konstin
Member Author

konstin commented Sep 17, 2025

Given that there's support for this in the Rust standard library without warnings about platforms that don't support this, I expect wide support. I added a path that ignores platforms and filesystems with missing shared lock support.

CC @BurntSushi who knows about file locking and @geofft who knows about Unix.

@geofft
Contributor

geofft commented Sep 17, 2025

The docs say

> This function currently corresponds to the flock function on Unix with the LOCK_SH flag, and the LockFileEx function on Windows.

Both of these seem supported and work properly on OS versions from the last few decades. There is [a quirk](https://utcc.utoronto.ca/~cks/space/blog/linux/FlockFcntlAndNFS) with flock on NFS if one client is accessing over NFS and another is directly accessing the local filesystem on the NFS server, but that's kind of a "don't do that" situation.

I haven't looked in detail at the implementation, but the rough approach of making a .lock file in the directory, ensuring every client is referring to the same actual file by relying on atomic hardlinks into place (which does work reliably on NFS), and then using shared or exclusive locks of the same lock type (and not byte-range locks), seems sound to me.

I have a very slight worry about implementations that desugar flock into byte-range locks if there are no actual bytes in the file, i.e., the locked range is zero length. I am not sure if this is a real worry but we can just write one byte into the file to avoid the risk.
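
The approach outlined in this comment (a lock file put in place with an atomic hardlink, containing one byte) could look roughly like the sketch below. It is written in Python for illustration and is not taken from uv's code; the function name `ensure_lock_file` and the temporary-file prefix are hypothetical.

```python
import os
import tempfile


def ensure_lock_file(directory, name=".lock"):
    """Create `directory/name` atomically with one byte of content, or reuse it."""
    target = os.path.join(directory, name)
    # Write the one-byte file under a unique temporary name first, so the
    # final name never refers to an empty or half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=name + ".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"\0")
        try:
            # link() fails if the target already exists, so exactly one
            # creator wins and every process ends up using the same file.
            os.link(tmp_path, target)
        except FileExistsError:
            pass  # Another process linked its file first; use theirs.
    finally:
        # The temporary name is no longer needed in either case.
        os.unlink(tmp_path)
    return target
```

Calling `ensure_lock_file` repeatedly leaves the first file in place, so all callers subsequently lock the same inode.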

@BurntSushi
Member

At least on the standard library side, support seems pretty broad: https://github.com/rust-lang/rust/blob/2ebb1263e3506412889410b567fa813ca3cb5c63/library/std/src/sys/fs/unix.rs#L1325-L1360

At least as broad as the regular lock() method.

@zanieb
Member

zanieb commented Sep 17, 2025

I guess one other concern...

If the cache is read only, this will cause a regression?

@BurntSushi
Member

If memory serves, flock and POSIX fcntl locks are process global. So the main footgun to be aware of here is that you can't use them as synchronization primitives across threads within the same process.

AFAIK, open file description locks on Linux support NFS and can be used as synchronization primitives across threads. But I think they are Linux-only unfortunately.

@charliermarsh
Member

> If the cache is read only, this will cause a regression?

Like elsewhere, I'd suggest we tracing::warn! if we can't acquire it, but not fail.
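
The "warn, but don't fail" behaviour suggested here might be sketched as follows. This is a Python illustration under assumptions, not uv's Rust code (which would use `tracing::warn!`); the logger name, the `.lock` file name, and the function `acquire_shared_or_warn` are hypothetical, and `fcntl` limits it to Unix.

```python
import fcntl
import logging
import os
import tempfile

logger = logging.getLogger("cache_lock_sketch")


def acquire_shared_or_warn(cache_dir):
    """Return a file descriptor holding a shared lock, or None after a warning."""
    lock_path = os.path.join(cache_dir, ".lock")
    try:
        # Opening can fail, for example on a read-only or missing cache.
        fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    except OSError as err:
        logger.warning("Could not open cache lock %s: %s", lock_path, err)
        return None
    try:
        # Locking can fail on filesystems without flock support.
        fcntl.flock(fd, fcntl.LOCK_SH)
    except OSError as err:
        os.close(fd)
        logger.warning("Could not lock cache %s: %s", cache_dir, err)
        return None
    return fd
```

A caller would continue without the lock when `None` is returned, trading safety against concurrent cleaning for not breaking setups where locking is unavailable.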

@konstin
Member Author

konstin commented Sep 17, 2025

That's implemented :) https://github.com/astral-sh/uv/pull/15888/files#diff-d0c8455b65232353aa60383cd8a80d99a8b31cf7cd76bf22c18d32de36bed34cR403-R409

(I still have to test the read-only cache and handle the Windows CI failure before we can merge)

@konstin konstin force-pushed the konsti/safe-cache-clean branch from e38f649 to ae716bc Compare September 18, 2025 16:32
@konstin
Member Author

konstin commented Sep 18, 2025

It seems read-only caches already don't work; I filed a separate bug for it: #15934

@konstin
Member Author

konstin commented Sep 18, 2025

I cannot reproduce the CI failure locally, any ideas what's happening? I could see something about the locked file not being deletable, but that should happen consistently, and it also doesn't match the error message.

@zanieb
Member

zanieb commented Sep 18, 2025

I presume we attempt to delete the cache directory while it contains the locked file, which is okay on Unix but not allowed on Windows? (very naively)

@zanieb
Member

zanieb commented Sep 18, 2025

re. CI vs locally... maybe because it's on a ReFS drive?

@konstin
Member Author

konstin commented Sep 19, 2025

I thought so too, but I'm using a ReFS drive on my laptop and it passes there.

@konstin
Member Author

konstin commented Sep 19, 2025

Removing the locked file last, after dropping the lock, works.
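
The ordering described here can be sketched as below. This is a Python illustration of the sequence only (clear everything except the lock file, release the lock, then remove the lock file), not the merged Rust change; the function name `clean_cache` and the `.lock` name are hypothetical. It uses `fcntl`, so it runs on Unix only, whereas the failure being fixed occurred on Windows.

```python
import fcntl
import os
import shutil
import tempfile


def clean_cache(cache_dir, lock_name=".lock"):
    """Remove a cache directory, deleting its lock file only after unlocking."""
    lock_path = os.path.join(cache_dir, lock_name)
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Wait until no other process holds a shared or exclusive lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        for entry in os.listdir(cache_dir):
            if entry == lock_name:
                continue  # Keep the locked file until the very end.
            path = os.path.join(cache_dir, entry)
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            else:
                os.unlink(path)
    finally:
        # Closing the descriptor drops the lock.
        os.close(fd)
    # Only now remove the lock file and the (empty) directory itself.
    os.unlink(lock_path)
    os.rmdir(cache_dir)
```

Note that a short window exists between dropping the lock and removing the file in which another process could create new entries; the sketch does not attempt to handle that case.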

@konstin konstin merged commit 00aa2ab into main Sep 19, 2025
98 checks passed
@konstin konstin deleted the konsti/safe-cache-clean branch September 19, 2025 08:21
zanieb added a commit that referenced this pull request Sep 22, 2025
…15990)

We're seeing reports of a regression from
#15888 where `--no-cache` causes `uv
run` and `uvx` to fail to spawn a command.

The intent of this code was to allow destructive cache operations
_after_ we'd finished setting up the environment. However, it's unclear
to me that it's safe to run `uv cache clean` during a `uv run` operation
(e.g., `uv run --script` uses an environment in the cache) and, more
importantly, we cannot drop non-persistent caches (e.g., from
`--no-cache`) as they include the environment we're spawning the command
in.

Alternative to #15977 which retains release of the lock — we may want to
consider that approach still but this regression needs to be resolved
quickly.

Closes #15989
Closes #15987
Closes #15967
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Sep 24, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.8.17` -> `0.8.22` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>astral-sh/uv (astral-sh/uv)</summary>

### [`v0.8.22`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0822)

[Compare Source](astral-sh/uv@0.8.21...0.8.22)

Released on 2025-09-23.

##### Python

- Upgrade Pyodide to 0.28.3 ([#15999](astral-sh/uv#15999))

##### Security

- Upgrade `astral-tokio-tar` to 0.5.5 which [hardens tar archive extraction](GHSA-3wgq-wrwc-vqmv) ([#16004](astral-sh/uv#16004))

### [`v0.8.21`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0821)

[Compare Source](astral-sh/uv@0.8.20...0.8.21)

Released on 2025-09-23.

##### Enhancements

- Refresh lockfile when `--refresh` is provided ([#15994](astral-sh/uv#15994))

##### Preview features

- Add support for S3 request signing ([#15925](astral-sh/uv#15925))

### [`v0.8.20`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0820)

[Compare Source](astral-sh/uv@0.8.19...0.8.20)

Released on 2025-09-22.

##### Enhancements

- Add `--force` flag for `uv cache clean` ([#15992](astral-sh/uv#15992))
- Improve resolution errors with proxied packages ([#15200](astral-sh/uv#15200))

##### Preview features

- Allow upgrading pre-release versions of the same minor Python version ([#15959](astral-sh/uv#15959))

##### Bug fixes

- Hide `freethreaded+debug` Python downloads in `uv python list` ([#15985](astral-sh/uv#15985))
- Retain the cache lock and temporary caches during `uv run` and `uvx` ([#15990](astral-sh/uv#15990))

##### Documentation

- Add `package` level conflicts to the conflicting dependencies docs ([#15963](astral-sh/uv#15963))
- Document pyodide support ([#15962](astral-sh/uv#15962))
- Document support for free-threaded and debug Python versions ([#15961](astral-sh/uv#15961))
- Expand the contribution docs on issue selection ([#15966](astral-sh/uv#15966))
- Tweak title for viewing version in project guide ([#15964](astral-sh/uv#15964))

### [`v0.8.19`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0819)

[Compare Source](astral-sh/uv@0.8.18...0.8.19)

Released on 2025-09-19.

##### Python

- Add CPython 3.14.0rc3
- Upgrade OpenSSL to 3.5.3

See the [python-build-standalone release notes](https://github.com/astral-sh/python-build-standalone/releases/tag/20250918) for more details.

##### Bug fixes

- Make `uv cache clean` parallel process safe ([#15888](astral-sh/uv#15888))
- Fix implied `platform_machine` marker for `win_arm64` platform tag ([#15921](astral-sh/uv#15921))

### [`v0.8.18`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0818)

[Compare Source](astral-sh/uv@0.8.17...0.8.18)

Released on 2025-09-17.

##### Enhancements

- Add PyG packages to torch backend ([#15911](astral-sh/uv#15911))
- Add handling for unnamed conda environments in base environment detection ([#15681](astral-sh/uv#15681))
- Allow selection of debug build interpreters ([#11520](astral-sh/uv#11520))
- Improve `uv init` defaults for native build backend cache keys ([#15705](astral-sh/uv#15705))
- Error when `pyproject.toml` target does not exist for dependency groups ([#15831](astral-sh/uv#15831))
- Infer check URL from publish URL when known ([#15886](astral-sh/uv#15886))
- Support Gitlab CI/CD as a trusted publisher ([#15583](astral-sh/uv#15583))
- Add GraalPy 25.0.0 with support for Python 3.12 ([#15900](astral-sh/uv#15900))
- Add `--no-clear` to `uv venv` to disable removal prompts ([#15795](astral-sh/uv#15795))
- Add conflict detection between `--only-group` and `--extra` flags ([#15788](astral-sh/uv#15788))
- Allow `[project]` to be missing from a `pyproject.toml` ([#14113](astral-sh/uv#14113))
- Always treat conda environments named `base` and `root` as base environments ([#15682](astral-sh/uv#15682))
- Improve log message when direct build for `uv_build` is skipped ([#15898](astral-sh/uv#15898))
- Log when the cache is disabled ([#15828](astral-sh/uv#15828))
- Show pyx organization name after authenticating ([#15823](astral-sh/uv#15823))
- Use `_CONDA_ROOT` to detect Conda base environments ([#15680](astral-sh/uv#15680))
- Include blake2b hash in `uv publish` upload form ([#15794](astral-sh/uv#15794))
- Fix misleading debug message when removing environments in `uv sync` ([#15881](astral-sh/uv#15881))

##### Deprecations

- Deprecate `tool.uv.dev-dependencies` ([#15469](astral-sh/uv#15469))
- Revert "feat(ci): build loongarch64 binaries in CI ([#15387](astral-sh/uv#15387))" ([#15820](astral-sh/uv#15820))

##### Preview features

- Propagate preview flag to client for `native-auth` feature ([#15872](astral-sh/uv#15872))
- Store native credentials for realms with the https scheme stripped ([#15879](astral-sh/uv#15879))
- Use the root index URL when retrieving credentials from the native store ([#15873](astral-sh/uv#15873))

##### Bug fixes

- Fix `uv sync --no-sources` not switching from editable to registry installations ([#15234](astral-sh/uv#15234))
- Avoid display of an empty string when a path is the working directory ([#15897](astral-sh/uv#15897))
- Allow cached environment reuse with `@latest` ([#15827](astral-sh/uv#15827))
- Allow escaping spaces in --env-file handling ([#15815](astral-sh/uv#15815))
- Avoid ANSI codes in debug! messages ([#15843](astral-sh/uv#15843))
- Improve BSD tag construction ([#15829](astral-sh/uv#15829))
- Include SHA when listing lockfile changes ([#15817](astral-sh/uv#15817))
- Invert the logic for determining if a path is a base conda environment ([#15679](astral-sh/uv#15679))
- Load credentials for explicit members when lowering ([#15844](astral-sh/uv#15844))
- Re-add `triton` as a torch backend package ([#15910](astral-sh/uv#15910))
- Respect `UV_INSECURE_NO_ZIP_VALIDATION=1` in duplicate header errors ([#15912](astral-sh/uv#15912))

##### Documentation

- Add GitHub Actions to PyPI trusted publishing example ([#15753](astral-sh/uv#15753))
- Add Coiled integration documentation ([#14430](astral-sh/uv#14430))
- Add verbose output to the getting help section ([#15915](astral-sh/uv#15915))
- Document `NO_PROXY` support ([#15816](astral-sh/uv#15816))
- Document cache-keys for native build backends ([#15811](astral-sh/uv#15811))
- Add documentation for dependency group `requires-python` ([#14282](astral-sh/uv#14282))

</details>

@markuspi

markuspi commented Nov 7, 2025

Hi @konstin, considering this PR, is the note in the documentation regarding cache safety still relevant?

> Note that it's not safe to modify the uv cache (e.g., uv cache clean) while other uv commands are running, [...]

Specifically, I am interested in periodically running uv cache prune.

@konstin
Member Author

konstin commented Nov 7, 2025

Currently, uv blocks, but we want to make some further adjustments that might weaken this a bit, so I don't want to remove the warning. I don't recommend running uv cache operations on a timer; I'd only use them manually and when required.



Development

Successfully merging this pull request may close these issues.

uv cache clean / uv pip install not concurrency safe?

6 participants