Skip to content

Comments

Use hash instead of full wheel name in wheels bucket#11738

Merged
charliermarsh merged 15 commits intoastral-sh:mainfrom
nkitsaini:cache-url-length
Feb 26, 2025
Merged

Use hash instead of full wheel name in wheels bucket#11738
charliermarsh merged 15 commits intoastral-sh:mainfrom
nkitsaini:cache-url-length

Conversation

@nkitsaini
Copy link
Contributor

@nkitsaini nkitsaini commented Feb 24, 2025

Summary

Closes #2410

This changes the name of files in wheels bucket to use a hash instead of the wheel name as to not exceed maximum file length limit on various systems.

This only addresses the primary concern of #2410. It still does not address:

  • Path limit of 260 on windows: Too long cache filenames #2410 (comment)
    To solve this we need to opt-in to longer path limits on windows (ref), but I think that is a separate issue and should be a separate MR.
  • Exceeding filename limit while building a wheel from source distribution
    As per my understanding, this is out of uv's control. Name of the output wheel will be decided by build-backend used by the project. For wheels built from source distribution, pip also uses the wheel names in cache. So I have not touched sdists cache.

I have added a filename: WheelFileName field in Archive, so we can use it while indexing instead of relying on the filename on disk. Another way to do this was to read .dist-info/WHEEL and .dist-info/METADATA and build WheelFileName but that seems less robust and will be slower.

Test Plan

Tested by installing yt-dlp, httpie and sqlalchemy and verifying that cache files in wheels bucket use hash.

@nkitsaini nkitsaini force-pushed the cache-url-length branch 2 times, most recently from d49319c to 217edc2 Compare February 24, 2025 08:11
@nkitsaini nkitsaini marked this pull request as ready for review February 24, 2025 08:38
@nkitsaini nkitsaini changed the title Use hash instead of full wheel name in wheels cache Use hash instead of full wheel name in wheels bucket Feb 24, 2025
@charliermarsh
Copy link
Member

Thanks! Will review.

@charliermarsh charliermarsh added bug Something isn't working no-build Disable building binaries in CI labels Feb 25, 2025
@charliermarsh
Copy link
Member

I think this generally looks right, though I'm undecided on whether we should always do this, or only do it for wheels with filenames that exceed a certain length. It does hurt cache (plaintext) readability a bit which is inconvenient for debugging, since you can no longer infer the wheel tags etc. from the cached filename alone.

@charliermarsh
Copy link
Member

At least the hashes are consistent across these files, so you can look at WHEEL:

Screenshot 2025-02-25 at 9 55 25 AM

That's probably good enough.

@nkitsaini
Copy link
Contributor Author

nkitsaini commented Feb 25, 2025

so you can look at WHEEL:

I was doing a more hacky cat <file> and finding the tags/version.

image

Let me change the filenames to {trimmed_filename}-{hash}. Should be quick.

@nkitsaini
Copy link
Contributor Author

nkitsaini commented Feb 25, 2025

I went with trimmed(version+tags)-hash as that was informative compared to trimmed(name+version+tags)-hash as we already have package name as dir name.

image

image

@nkitsaini
Copy link
Contributor Author

nkitsaini commented Feb 25, 2025

@charliermarsh Sorry for the delay. It is good from my end.

There seems to be one seemingly unrelated failure in windows tests. I don't have a good way to reproduce the failure locally. It was just some flakiness. I have no idea what caused it, maybe some failure with disk I/O ¯\_(ツ)_/¯ .

@nkitsaini
Copy link
Contributor Author

nkitsaini commented Feb 26, 2025

Side Note: Is windows CI always a bit flaky or am I just having a bad luck? I don't know what was the issue for the first failure, second failure seems to be network related.

@charliermarsh
Copy link
Member

I'll try to get this merged today.

@zanieb
Copy link
Member

zanieb commented Feb 26, 2025

Thanks for your work on this!

/// Prefers `{version}-{tags}` if such an identifier fits within the maximum allowed length;
/// otherwise, uses a truncated version of the version and a digest of the tags.
pub fn cache_key(&self) -> String {
const CACHE_KEY_MAX_LEN: usize = 64;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nkitsaini -- I changed this to 64 to accommodate common manylinux tags, like cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64. I don't see much of a downside. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is better. Older one was mostly arbitrary. As long as we stay below 140, it should not be an issue.


// Determine the wheel filename.
let filename = path.file_stem()?.to_str()?;
let filename = WheelFilename::from_stem(filename).ok()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nkitsaini -- I also removed the Version checks from this method. I think the version (before the -) isn't actually guaranteed to be valid, since it could be truncated to, like... 1.2.0+. I might change that in the cache key to avoid those ambiguities, but still seems safer to remove the assumption here. (We already validate that the file ends in .rev anyway.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice. Missed that. I am also not sure how valuable these checks were. It would make sense to use cache even if the filename is wrong but it points to a valid archive. And I wouldn't assume there would be any invalid files unless created manually, which should be pretty rare hence negating any performance benefits from saved disk I/O.

Maybe I am missing something?

@charliermarsh charliermarsh enabled auto-merge (squash) February 26, 2025 22:41
@charliermarsh charliermarsh merged commit fb35875 into astral-sh:main Feb 26, 2025
84 checks passed
@charliermarsh
Copy link
Member

Thanks @nkitsaini! This was a great contribution. I hope you'll consider continuing to contribute to uv.

@nkitsaini nkitsaini deleted the cache-url-length branch February 27, 2025 08:13
charliermarsh pushed a commit that referenced this pull request Feb 27, 2025
<!--
Thank you for contributing to uv! To help us out with reviewing, please
consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

Follow up for #11738

I missed this while reviewing the truncation changes. 

`format!("{:.N}", value)` only truncates if the `fmt::Display`
implementation supports it (by reading `f.precision()` in trait
implementation).

So in our case `format!("{:.N}", version.to_string())` will work but not
`format!("{:.N}", version)` unless `Version` supports it.

Since we only need it once, I am just truncating after the string is
created.

## Test Plan

<!-- How was it tested? -->
loic-lescoat pushed a commit to loic-lescoat/uv that referenced this pull request Mar 2, 2025
<!--
Thank you for contributing to uv! To help us out with reviewing, please
consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->


## Summary
Closes astral-sh#2410 
<!-- What's the purpose of the change? What does it do, and why? -->
This changes the name of files in `wheels` bucket to use a hash instead
of the wheel name as to not exceed maximum file length limit on various
systems.

This only addresses the primary concern of astral-sh#2410. It still does _not_
address:
- Path limit of 260 on windows:
astral-sh#2410 (comment)
To solve this we need to opt-in to longer path limits on windows
([ref](astral-sh#2410 (comment))),
but I think that is a separate issue and should be a separate MR.
- Exceeding filename limit while building a wheel from source
distribution
As per my understanding, this is out of uv's control. Name of the output
wheel will be decided by build-backend used by the project. For wheels
built from source distribution, pip also uses the wheel names in cache.
So I have not touched `sdists` cache.


I have added a `filename: WheelFileName` field in `Archive`, so we can
use it while indexing instead of relying on the filename on disk.
Another way to do this was to read `.dist-info/WHEEL` and
`.dist-info/METADATA` and build `WheelFileName` but that seems less
robust and will be slower.
## Test Plan

<!-- How was it tested? -->
Tested by installing `yt-dlp`, `httpie` and `sqlalchemy` and verifying
that cache files in `wheels` bucket use hash.

---------

Co-authored-by: Charlie Marsh <[email protected]>
loic-lescoat pushed a commit to loic-lescoat/uv that referenced this pull request Mar 2, 2025
…1830)

<!--
Thank you for contributing to uv! To help us out with reviewing, please
consider the following:

- Does this pull request include a summary of the change? (See below.)
- Does this pull request include a descriptive title?
- Does this pull request include references to any relevant issues?
-->

## Summary

Follow up for astral-sh#11738

I missed this while reviewing the truncation changes. 

`format!("{:.N}", value)` only truncates if the `fmt::Display`
implementation supports it (by reading `f.precision()` in trait
implementation).

So in our case `format!("{:.N}", version.to_string())` will work but not
`format!("{:.N}", version)` unless `Version` supports it.

Since we only need it once, I am just truncating after the string is
created.

## Test Plan

<!-- How was it tested? -->
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Mar 6, 2025
This MR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [astral-sh/uv](https://github.com/astral-sh/uv) | patch | `0.6.3` -> `0.6.4` |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>astral-sh/uv (astral-sh/uv)</summary>

### [`v0.6.4`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#064)

[Compare Source](astral-sh/uv@0.6.3...0.6.4)

##### Enhancements

-   Upgrade pypy3.10 to v7.3.19 ([#&#8203;11814](astral-sh/uv#11814))
-   Allow configuring log verbosity from the CLI (i.e., `-vvv`) ([#&#8203;11758](astral-sh/uv#11758))
-   Warn when duplicate index names found in single file ([#&#8203;11824](astral-sh/uv#11824))

##### Bug fixes

-   Always store registry index on resolution packages ([#&#8203;11815](astral-sh/uv#11815))
-   Avoid error on relative paths in `uv tool uninstall` ([#&#8203;11889](astral-sh/uv#11889))
-   Avoid silently dropping errors in directory enumeration ([#&#8203;11890](astral-sh/uv#11890))
-   Disable interactive git terminal prompts during fetches ([#&#8203;11744](astral-sh/uv#11744))
-   Discover Windows registry (PEP 514) Python versions across 32/64-bit ([#&#8203;11801](astral-sh/uv#11801))
-   Don't panic on Ctrl-C in confirm prompt ([#&#8203;11706](astral-sh/uv#11706))
-   Fix non-directory in workspace on Windows ([#&#8203;11833](astral-sh/uv#11833))
-   Make interpreter caching robust to OS upgrades ([#&#8203;11875](astral-sh/uv#11875))
-   Respect `include-system-site-packages` in layered environments ([#&#8203;11873](astral-sh/uv#11873))
-   Suggest `uv tool update-shell` in PowerShell ([#&#8203;11846](astral-sh/uv#11846))
-   Update code page to `65001` before setting environment variables in virtual environments ([#&#8203;11831](astral-sh/uv#11831))
-   Use hash instead of full wheel name in wheels bucket ([#&#8203;11738](astral-sh/uv#11738))
-   Fix version string truncation while generating cache_key ([#&#8203;11830](astral-sh/uv#11830))
-   Explicitly handle ctrl-c in confirmation prompt instead of using a signal handler ([#&#8203;11897](astral-sh/uv#11897))

##### Performance

-   Avoid cloning to string when creating cache path ([#&#8203;11772](astral-sh/uv#11772))
-   Avoid redundant clones in version containment check ([#&#8203;11767](astral-sh/uv#11767))
-   Avoid string allocation when enumerating tool names ([#&#8203;11910](astral-sh/uv#11910))
-   Avoid using owned `String` for package name constructors ([#&#8203;11768](astral-sh/uv#11768))
-   Avoid using owned `String` in deserializers ([#&#8203;11764](astral-sh/uv#11764))
-   Migrate to `zlib-rs` (again) ([#&#8203;11894](astral-sh/uv#11894))
-   Remove unnecessary clones when adding package names ([#&#8203;11771](astral-sh/uv#11771))
-   Skip unquote allocation for non-quoted strings ([#&#8203;11813](astral-sh/uv#11813))
-   Use `SmallString` for filenames and URLs ([#&#8203;11765](astral-sh/uv#11765))
-   Use a Boxed slice for version specifiers ([#&#8203;11766](astral-sh/uv#11766))
-   Use matches over contains for extra value parsing ([#&#8203;11770](astral-sh/uv#11770))

##### Documentation

-   Avoid fallback to PyPI in mixed CPU/CUDA example ([#&#8203;11115](astral-sh/uv#11115))
-   Docs: Clarify that setting cache-keys overrides defaults ([#&#8203;11895](astral-sh/uv#11895))
-   Document our MSRV policy ([#&#8203;11898](astral-sh/uv#11898))
-   Fix reference to macOS cache path ([#&#8203;11845](astral-sh/uv#11845))
-   Fix typo in `no_default_groups` documentation and changelog ([#&#8203;11928](astral-sh/uv#11928))
-   Update the "Locking and syncing" page ([#&#8203;11647](astral-sh/uv#11647))
-   Update alternative indexes documentation to use new interface ([#&#8203;10826](astral-sh/uv#10826))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever MR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOS4xODUuMCIsInVwZGF0ZWRJblZlciI6IjM5LjE4NS4wIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJSZW5vdmF0ZSBCb3QiXX0=-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working no-build Disable building binaries in CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Too long cache filenames

3 participants