Use hash instead of full wheel name in wheels bucket#11738
Use hash instead of full wheel name in wheels bucket#11738charliermarsh merged 15 commits intoastral-sh:mainfrom
Conversation
d49319c to
217edc2
Compare
|
Thanks! Will review. |
217edc2 to
3d00223
Compare
|
I think this generally looks right, though I'm undecided on whether we should always do this, or only do it for wheels with filenames that exceed a certain length. It does hurt cache (plaintext) readability a bit which is inconvenient for debugging, since you can no longer infer the wheel tags etc. from the cached filename alone. |
805be70 to
a447796
Compare
fb97a95 to
01fb4c7
Compare
01fb4c7 to
cad227e
Compare
cad227e to
8134423
Compare
|
@charliermarsh Sorry for the delay. It is good from my end.
|
7013fff to
6fb1055
Compare
|
Side Note: Is windows CI always a bit flaky or am I just having a bad luck? I don't know what was the issue for the first failure, second failure seems to be network related. |
b50f790 to
f9e58d8
Compare
|
I'll try to get this merged today. |
|
Thanks for your work on this! |
| /// Prefers `{version}-{tags}` if such an identifier fits within the maximum allowed length; | ||
| /// otherwise, uses a truncated version of the version and a digest of the tags. | ||
| pub fn cache_key(&self) -> String { | ||
| const CACHE_KEY_MAX_LEN: usize = 64; |
There was a problem hiding this comment.
@nkitsaini -- I changed this to 64 to accommodate common manylinux tags, like cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64. I don't see much of a downside. What do you think?
There was a problem hiding this comment.
Yeah, it is better. Older one was mostly arbitrary. As long as we stay below 140, it should not be an issue.
|
|
||
| // Determine the wheel filename. | ||
| let filename = path.file_stem()?.to_str()?; | ||
| let filename = WheelFilename::from_stem(filename).ok()?; |
There was a problem hiding this comment.
@nkitsaini -- I also removed the Version checks from this method. I think the version (before the -) isn't actually guaranteed to be valid, since it could be truncated to, like... 1.2.0+. I might change that in the cache key to avoid those ambiguities, but still seems safer to remove the assumption here. (We already validate that the file ends in .rev anyway.)
There was a problem hiding this comment.
Ah, nice. Missed that. I am also not sure how valuable these checks were. It would make sense to use cache even if the filename is wrong but it points to a valid archive. And I wouldn't assume there would be any invalid files unless created manually, which should be pretty rare hence negating any performance benefits from saved disk I/O.
Maybe I am missing something?
|
Thanks @nkitsaini! This was a great contribution. I hope you'll consider continuing to contribute to uv. |
<!-- Thank you for contributing to uv! To help us out with reviewing, please consider the following: - Does this pull request include a summary of the change? (See below.) - Does this pull request include a descriptive title? - Does this pull request include references to any relevant issues? --> ## Summary Follow up for #11738 I missed this while reviewing the truncation changes. `format!("{:.N}", value)` only truncates if the `fmt::Display` implementation supports it (by reading `f.precision()` in trait implementation). So in our case `format!("{:.N}", version.to_string())` will work but not `format!("{:.N}", version)` unless `Version` supports it. Since we only need it once, I am just truncating after the string is created. ## Test Plan <!-- How was it tested? -->
<!-- Thank you for contributing to uv! To help us out with reviewing, please consider the following: - Does this pull request include a summary of the change? (See below.) - Does this pull request include a descriptive title? - Does this pull request include references to any relevant issues? --> ## Summary Closes astral-sh#2410 <!-- What's the purpose of the change? What does it do, and why? --> This changes the name of files in `wheels` bucket to use a hash instead of the wheel name as to not exceed maximum file length limit on various systems. This only addresses the primary concern of astral-sh#2410. It still does _not_ address: - Path limit of 260 on windows: astral-sh#2410 (comment) To solve this we need to opt-in to longer path limits on windows ([ref](astral-sh#2410 (comment))), but I think that is a separate issue and should be a separate MR. - Exceeding filename limit while building a wheel from source distribution As per my understanding, this is out of uv's control. Name of the output wheel will be decided by build-backend used by the project. For wheels built from source distribution, pip also uses the wheel names in cache. So I have not touched `sdists` cache. I have added a `filename: WheelFileName` field in `Archive`, so we can use it while indexing instead of relying on the filename on disk. Another way to do this was to read `.dist-info/WHEEL` and `.dist-info/METADATA` and build `WheelFileName` but that seems less robust and will be slower. ## Test Plan <!-- How was it tested? --> Tested by installing `yt-dlp`, `httpie` and `sqlalchemy` and verifying that cache files in `wheels` bucket use hash. --------- Co-authored-by: Charlie Marsh <[email protected]>
…1830) <!-- Thank you for contributing to uv! To help us out with reviewing, please consider the following: - Does this pull request include a summary of the change? (See below.) - Does this pull request include a descriptive title? - Does this pull request include references to any relevant issues? --> ## Summary Follow up for astral-sh#11738 I missed this while reviewing the truncation changes. `format!("{:.N}", value)` only truncates if the `fmt::Display` implementation supports it (by reading `f.precision()` in trait implementation). So in our case `format!("{:.N}", version.to_string())` will work but not `format!("{:.N}", version)` unless `Version` supports it. Since we only need it once, I am just truncating after the string is created. ## Test Plan <!-- How was it tested? -->
This MR contains the following updates: | Package | Update | Change | |---|---|---| | [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.6.3` -> `0.6.4` | MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot). **Proposed changes to behavior should be submitted there as MRs.** --- ### Release Notes <details> <summary>astral-sh/uv (astral-sh/uv)</summary> ### [`v0.6.4`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#064) [Compare Source](astral-sh/uv@0.6.3...0.6.4) ##### Enhancements - Upgrade pypy3.10 to v7.3.19 ([#​11814](astral-sh/uv#11814)) - Allow configuring log verbosity from the CLI (i.e., `-vvv`) ([#​11758](astral-sh/uv#11758)) - Warn when duplicate index names found in single file ([#​11824](astral-sh/uv#11824)) ##### Bug fixes - Always store registry index on resolution packages ([#​11815](astral-sh/uv#11815)) - Avoid error on relative paths in `uv tool uninstall` ([#​11889](astral-sh/uv#11889)) - Avoid silently dropping errors in directory enumeration ([#​11890](astral-sh/uv#11890)) - Disable interactive git terminal prompts during fetches ([#​11744](astral-sh/uv#11744)) - Discover Windows registry (PEP 514) Python versions across 32/64-bit ([#​11801](astral-sh/uv#11801)) - Don't panic on Ctrl-C in confirm prompt ([#​11706](astral-sh/uv#11706)) - Fix non-directory in workspace on Windows ([#​11833](astral-sh/uv#11833)) - Make interpreter caching robust to OS upgrades ([#​11875](astral-sh/uv#11875)) - Respect `include-system-site-packages` in layered environments ([#​11873](astral-sh/uv#11873)) - Suggest `uv tool update-shell` in PowerShell ([#​11846](astral-sh/uv#11846)) - Update code page to `65001` before setting environment variables in virtual environments ([#​11831](astral-sh/uv#11831)) - Use hash instead of full wheel name in wheels bucket ([#​11738](astral-sh/uv#11738)) - Fix version string truncation while generating cache_key ([#​11830](astral-sh/uv#11830)) - Explicitly handle ctrl-c in confirmation prompt instead of using a signal handler ([#​11897](astral-sh/uv#11897)) ##### Performance - Avoid cloning to string when creating cache path ([#​11772](astral-sh/uv#11772)) - Avoid redundant clones in version containment check ([#​11767](astral-sh/uv#11767)) - Avoid string allocation when enumerating tool names ([#​11910](astral-sh/uv#11910)) - Avoid using owned `String` for package name constructors ([#​11768](astral-sh/uv#11768)) - Avoid using owned `String` in deserializers ([#​11764](astral-sh/uv#11764)) - Migrate to `zlib-rs` (again) ([#​11894](astral-sh/uv#11894)) - Remove unnecessary clones when adding package names ([#​11771](astral-sh/uv#11771)) - Skip unquote allocation for non-quoted strings ([#​11813](astral-sh/uv#11813)) - Use `SmallString` for filenames and URLs ([#​11765](astral-sh/uv#11765)) - Use a Boxed slice for version specifiers ([#​11766](astral-sh/uv#11766)) - Use matches over contains for extra value parsing ([#​11770](astral-sh/uv#11770)) ##### Documentation - Avoid fallback to PyPI in mixed CPU/CUDA example ([#​11115](astral-sh/uv#11115)) - Docs: Clarify that setting cache-keys overrides defaults ([#​11895](astral-sh/uv#11895)) - Document our MSRV policy ([#​11898](astral-sh/uv#11898)) - Fix reference to macOS cache path ([#​11845](astral-sh/uv#11845)) - Fix typo in `no_default_groups` documentation and changelog ([#​11928](astral-sh/uv#11928)) - Update the "Locking and syncing" page ([#​11647](astral-sh/uv#11647)) - Update alternative indexes documentation to use new interface ([#​10826](astral-sh/uv#10826)) </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this MR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box --- This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4xODUuMCIsInVwZGF0ZWRJblZlciI6IjM5LjE4NS4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJSZW5vdmF0ZSBCb3QiXX0=-->




Summary
Closes #2410
This changes the name of files in
wheelsbucket to use a hash instead of the wheel name as to not exceed maximum file length limit on various systems.This only addresses the primary concern of #2410. It still does not address:
To solve this we need to opt-in to longer path limits on windows (ref), but I think that is a separate issue and should be a separate MR.
As per my understanding, this is out of uv's control. Name of the output wheel will be decided by build-backend used by the project. For wheels built from source distribution, pip also uses the wheel names in cache. So I have not touched
sdistscache.I have added a
filename: WheelFileNamefield inArchive, so we can use it while indexing instead of relying on the filename on disk. Another way to do this was to read.dist-info/WHEELand.dist-info/METADATAand buildWheelFileNamebut that seems less robust and will be slower.Test Plan
Tested by installing
yt-dlp,httpieandsqlalchemyand verifying that cache files inwheelsbucket use hash.