Speed up cache size command #17015
Cargo.toml
zeroize = { version = "1.8.1" }
zip = { version = "2.2.3", default-features = false, features = ["deflate", "zstd", "bzip2", "lzma", "xz"] }
zstd = { version = "0.13.3" }
diskus = { git = "https://github.com/sharkdp/diskus", version = "0.8.0" }
We can't add Git dependencies; it would break our crates.io publish.
Ah, okay, happy for me to copy most of David's code over then?
I'm not sure yet.
@sharkdp are you interested in publishing the crate? Do you think we should just vendor the parts we need?
I think we should just vendor any speed-ups because this is pulling in a bunch of new dependencies (e.g., a second version of Clap).
I updated diskus today (it has always been published on crates.io, so no need for Git dependencies). You should now be able to pull it in as a relatively lightweight dependency using default-features = false. I also made an update specifically for counting apparent file size (which is what uv did here before: sum up metadata.len()), to make that more consistent with what du -sb does (exclude the size of directory entries themselves). I also cleaned up the diskus API a bit. If you want to depend on it here, the code should be something like:
    let result = DiskUsage::new(&[cache.root()]).apparent_size().count();
    let total_bytes = result.ignore_errors().size_in_bytes();
(Or leave out the call to .apparent_size() if you would rather count disk usage.)
Note that diskus uses a heuristic for the default number of workers, which should ideally not be overridden for the best possible performance (at least according to benchmarks I did years ago).
I am also completely fine with uv vendoring diskus. It's not a lot of code. In that case, you should potentially make those two updates mentioned above, though.
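For the vendoring route, the apparent-size behavior described above (sum metadata.len() for files, skip the size of directory entries themselves, ignore errors) could look roughly like the sketch below. This is a single-threaded, std-only illustration, not diskus's implementation; the function name apparent_size is hypothetical, and the choice not to follow symlinks is an assumption made here to avoid link cycles.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Sum the apparent size (`metadata.len()`) of every non-directory entry
/// under `root`. Directory entries themselves contribute nothing, and
/// entries that cannot be read are silently skipped.
///
/// Hypothetical helper for illustration; not part of uv or diskus.
fn apparent_size(root: &Path) -> u64 {
    let mut total = 0u64;
    // Explicit stack instead of recursion, so deep trees cannot overflow the call stack.
    let mut pending: Vec<PathBuf> = vec![root.to_path_buf()];

    while let Some(path) = pending.pop() {
        // Assumption: do not follow symlinks, so a link cycle cannot loop forever.
        let Ok(metadata) = fs::symlink_metadata(&path) else {
            continue;
        };

        if metadata.is_dir() {
            // Queue the children; the directory's own size is not counted.
            let Ok(entries) = fs::read_dir(&path) else {
                continue;
            };
            pending.extend(entries.filter_map(Result::ok).map(|entry| entry.path()));
        } else {
            total += metadata.len();
        }
    }

    total
}
```

Calling apparent_size(cache.root()) would stand in for the diskus call shown above. The speed-up discussed in this thread comes from walking directories on multiple threads, which this sketch deliberately leaves out.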
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.9.17` -> `0.9.18` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot). **Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>astral-sh/uv (astral-sh/uv)</summary>

### [`v0.9.18`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0918)

[Compare Source](astral-sh/uv@0.9.17...0.9.18)

Released on 2025-12-16.

##### Enhancements

- Add value hints to command line arguments to improve shell completion accuracy ([#17080](astral-sh/uv#17080))
- Improve error handling in `uv publish` ([#17096](astral-sh/uv#17096))
- Improve rendering of multiline error messages ([#17132](astral-sh/uv#17132))
- Support redirects in `uv publish` ([#17130](astral-sh/uv#17130))
- Include Docker images with the alpine version, e.g., `python3.x-alpine3.23` ([#17100](astral-sh/uv#17100))

##### Configuration

- Accept `--torch-backend` in `[tool.uv]` ([#17116](astral-sh/uv#17116))

##### Performance

- Speed up `uv cache size` ([#17015](astral-sh/uv#17015))
- Initialize S3 signer once ([#17092](astral-sh/uv#17092))

##### Bug fixes

- Avoid panics due to reads on failed requests ([#17098](astral-sh/uv#17098))
- Enforce latest-version in `@latest` requests ([#17114](astral-sh/uv#17114))
- Explicitly set `EntryType` for file entries in tar ([#17043](astral-sh/uv#17043))
- Ignore `pyproject.toml` index username in lockfile comparison ([#16995](astral-sh/uv#16995))
- Relax error when using `uv add` with `UV_GIT_LFS` set ([#17127](astral-sh/uv#17127))
- Support file locks on ExFAT on macOS ([#17115](astral-sh/uv#17115))
- Change schema for `exclude-newer` into optional string ([#17121](astral-sh/uv#17121))

##### Documentation

- Drop arm musl caveat from Docker documentation ([#17111](astral-sh/uv#17111))
- Fix version reference in resolver example ([#17085](astral-sh/uv#17085))
- Better documentation for `exclude-newer*` ([#17079](astral-sh/uv#17079))

</details>
Summary
uv cache size can be quite slow. Here I use https://github.com/sharkdp/diskus to walk the cache directory with multiple threads. Add a CLI option to set the number of threads, defaulting to std::thread::available_parallelism() or 1.
Test Plan
Added a CLI statement with an info log test.
I believe this is a fair test, where I set the cache dir to a large directory.
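The thread-count default described in the Summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the helper name resolve_threads is hypothetical, and the assumption is that the CLI option arrives as an optional non-zero integer.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Pick the number of worker threads: the explicitly requested value if one
/// was given, otherwise `std::thread::available_parallelism()`, otherwise 1.
///
/// Hypothetical helper for illustration; the name and signature are assumptions.
fn resolve_threads(requested: Option<NonZeroUsize>) -> usize {
    requested
        // available_parallelism() returns io::Result<NonZeroUsize>; it can fail
        // on platforms where the value cannot be determined.
        .or_else(|| thread::available_parallelism().ok())
        .map_or(1, NonZeroUsize::get)
}
```

For example, resolve_threads(NonZeroUsize::new(4)) yields 4, while resolve_threads(None) falls back to the detected parallelism. Taking NonZeroUsize rules out a request for zero threads at the type level. Given the review comment that diskus's own worker heuristic should ideally not be overridden, a fallback like this may only matter if the walker is vendored.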