Remove `[u64; 4]` from small version to move `Arc` to full version #10345
Cloning and dropping the version `Arc` took a significant fraction of the resolver's time. That overhead is especially large for the small variant, which has only a 9-byte payload.
When moving the `Arc` so that it only wraps the full variant, the small variant becomes too large, because it stores a `[u64; 4]` so that the release accessor can return a `&[u64]`, the same type the `Vec<u64>` of the full variant provides. We work around this by first extracting the compressed version digits of the small variant into a proxy type that stores up to 4 `u64` on the stack and can be deref'ed to the existing `&[u64]`, minimizing churn.
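The proxy idea can be sketched roughly as follows. This is a simplified illustration, not the code from the PR: the `Version` and `Release` types here are hypothetical stand-ins, and only the shift and mask constants are taken from the diff in this conversation.

```rust
use std::ops::Deref;
use std::sync::Arc;

/// Hypothetical stand-in for the version type: only the full variant pays for an `Arc`.
enum Version {
    /// Compressed digits plus the number of release segments.
    Small { repr: u64, len: u8 },
    Full(Arc<Vec<u64>>),
}

/// Hypothetical proxy: up to four numbers on the stack, or a borrowed slice.
enum Release<'a> {
    Small { numbers: [u64; 4], len: usize },
    Full(&'a [u64]),
}

impl Deref for Release<'_> {
    type Target = [u64];

    fn deref(&self) -> &[u64] {
        match self {
            Release::Small { numbers, len } => &numbers[..*len],
            Release::Full(slice) => *slice,
        }
    }
}

impl Version {
    /// Extracts the release numbers without storing a `[u64; 4]` in `Version` itself.
    fn release(&self) -> Release<'_> {
        match self {
            Version::Small { repr, len } => Release::Small {
                numbers: [
                    (*repr >> 0o60) & 0xFFFF,
                    (*repr >> 0o50) & 0xFF,
                    (*repr >> 0o40) & 0xFF,
                    (*repr >> 0o30) & 0xFF,
                ],
                len: usize::from(*len),
            },
            Version::Full(numbers) => Release::Full(numbers.as_slice()),
        }
    }
}

fn main() {
    let small = Version::Small { repr: (1u64 << 0o60) | (2u64 << 0o50), len: 2 };
    let full = Version::Full(Arc::new(vec![1, 2, 3, 4, 5]));
    // Both variants are read through the same `&[u64]` view.
    println!("{:?} {:?}", &*small.release(), &*full.release());
}
```

Callers that previously took a `&[u64]` can keep doing so by dereferencing the proxy, which is what keeps the churn small.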
```
$ Benchmark 1: target/profiling/uv pip compile scripts/requirements/airflow.in
Time (mean ± σ): 361.3 ms ± 2.7 ms [User: 503.4 ms, System: 174.9 ms]
Range (min … max): 356.3 ms … 365.2 ms 10 runs
Benchmark 2: ./uv-3 pip compile scripts/requirements/airflow.in
Time (mean ± σ): 402.9 ms ± 8.5 ms [User: 571.2 ms, System: 196.6 ms]
Range (min … max): 393.9 ms … 418.0 ms 10 runs
Summary
target/profiling/uv pip compile scripts/requirements/airflow.in ran
1.12 ± 0.03 times faster than ./uv-3 pip compile scripts/requirements/airflow.in
```
CodSpeed Performance Report: merging #10345 will improve performance by 29.06%.
BurntSushi
left a comment
This is awesome! I love it.
}

/// Lifetime and indexing workaround to allow accessing the release as `&[u64]` even though the
/// digits may be stored in a compressed representation.
Since this is part of the public API, I'd suggest keeping the first sentence to one line, and then adding more (if necessary) in a paragraph below it. Otherwise, this will show up as an overlong summary in the type listing.
    Small3([u64; 3]),
    Small4([u64; 4]),
    Full(&'a [u64]),
}
I was wondering if there was a more compact representation than this, so I tried:
enum ReleaseInner<'a> {
Small { numbers: [u64; 4], len: u8 },
Full(&'a [u64]),
}
But indeed, it's the same size (which makes sense).
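A quick way to check this kind of claim is to compare `size_of` for both layouts. The sketch below assumes the per-length enum also has `Small0` through `Small2` variants, which are not visible in the quoted hunk; the names `PerLength` and `WithLen` are made up for illustration.

```rust
use std::mem::size_of;

/// One variant per release length (variants `Small0`..`Small2` are an assumption).
#[allow(dead_code)]
enum PerLength<'a> {
    Small0([u64; 0]),
    Small1([u64; 1]),
    Small2([u64; 2]),
    Small3([u64; 3]),
    Small4([u64; 4]),
    Full(&'a [u64]),
}

/// The alternative from the comment above: a fixed array plus an explicit length.
#[allow(dead_code)]
enum WithLen<'a> {
    Small { numbers: [u64; 4], len: u8 },
    Full(&'a [u64]),
}

fn main() {
    // In both layouts the largest variant carries four `u64`s and still needs
    // room for a tag or length, so neither ends up smaller than the other.
    println!("per-length variants: {} bytes", size_of::<PerLength<'static>>());
    println!("array plus len:      {} bytes", size_of::<WithLen<'static>>());
}
```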
@@ -930,6 +939,8 @@ struct VersionSmall {
    /// places somewhat exposes internal details, since the "full" version
    /// representation would not do that.
    len: u8,
Oh man, if we could get rid of `len` here, this type would decrease in size by half I believe. IDK if that would make an actual difference perf-wise... And I think the only way to do it would be to take bits away from `repr`, which in turn means moving more cases to the `Full` representation. Which might be a net negative overall.
We could get rid of `len` by counting the zeroes ourselves. The problem is that `Version` doesn't shrink to 64 bits, since the `Arc` has only a single niche (`NonNull`) in rustc's view, so the two variants don't both fit into a 64-bit word.
We could try a tagged pointer library, or we could lean into it and make the small representation `[u8; 11]` or `[u8; 15]` to use the space we're already allocating, while leaving a bit (in practice a byte) for rustc to store the enum discriminant.
An MRE of this problem:
use std::sync::Arc;
enum A {
Small(()),
Large(Arc<Vec<u8>>),
}
enum B {
Small(bool),
Large(Arc<Vec<u8>>),
}
enum C {
Small([u8; 7]),
Large(Arc<Vec<u8>>),
}
fn main() {
println!("{}", size_of::<A>()); // -> 8
println!("{}", size_of::<B>()); // -> 16, but can it be 8 please?
println!("{}", size_of::<C>()); // -> 16, but can it be 8 please?
}
Another option would be stealing two bits somewhere else (fourth release number, or capping the first release number to 14 bits).
> We could get rid of `len` by counting the zeroes ourselves.
Hmmm, but then I think you might lose the current property that trailing zeros are preserved. See: #10362.
Otherwise yeah, I see. I agree that going further probably means a tagged pointer of some kind.
A less thought out idea is a global interner like we use in uv-pep508 for markers. But whether that hits the right trade-offs is not at all clear to me. It could let you get a Version down to 32-bits pretty easily I think, but then pretty much every interaction with it would require consulting the global interner table. And that will probably destroy any benefits you might otherwise get.
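The trailing-zeros concern can be illustrated with a small sketch. It assumes a hypothetical scheme in which the length is inferred by dropping trailing zero segments from four fixed slots; `to_digits` and `inferred_len` are made-up names, not functions from uv.

```rust
/// Stores a release in four fixed slots, padding with zeros.
/// Panics if `release` has more than four segments (fine for this illustration).
fn to_digits(release: &[u64]) -> [u64; 4] {
    let mut digits = [0; 4];
    digits[..release.len()].copy_from_slice(release);
    digits
}

/// Hypothetical replacement for an explicit `len`: count up to the last non-zero segment.
fn inferred_len(digits: [u64; 4]) -> usize {
    digits.iter().rposition(|&d| d != 0).map_or(0, |i| i + 1)
}

fn main() {
    let short = to_digits(&[1, 2]); // "1.2"
    let long = to_digits(&[1, 2, 0]); // "1.2.0"

    // Without a stored length the two versions are indistinguishable...
    assert_eq!(short, long);
    // ...and the inferred length drops the trailing zero of "1.2.0".
    assert_eq!(inferred_len(long), 2);
    println!("1.2 and 1.2.0 both come back with {} segments", inferred_len(long));
}
```

Under that scheme, `1.2.0` would render as `1.2` after a roundtrip, which is the property the explicit `len` field preserves.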
    (small.repr >> 0o60) & 0xFFFF,
    (small.repr >> 0o50) & 0xFF,
    (small.repr >> 0o40) & 0xFF,
    (small.repr >> 0o30) & 0xFF,
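The shifts and masks in this hunk imply a 16-bit first segment followed by three 8-bit segments. The sketch below shows what packing and unpacking with those field widths could look like; `try_pack`, `unpack`, and `FIELDS` are hypothetical names, and the fallback rule (return `None` when a number does not fit) is an assumption about when the full representation would be needed.

```rust
/// (shift, mask) for each release segment, using the constants from the diff.
const FIELDS: [(u32, u64); 4] = [(0o60, 0xFFFF), (0o50, 0xFF), (0o40, 0xFF), (0o30, 0xFF)];

/// Packs a release into a `u64`, or returns `None` if it needs the full representation.
fn try_pack(release: &[u64]) -> Option<u64> {
    if release.len() > FIELDS.len() {
        return None;
    }
    let mut repr = 0;
    for (&number, (shift, mask)) in release.iter().zip(FIELDS) {
        if number > mask {
            return None; // does not fit in this field's bits
        }
        repr |= number << shift;
    }
    Some(repr)
}

/// Reverses `try_pack`; unused trailing segments come back as zero.
fn unpack(repr: u64) -> [u64; 4] {
    FIELDS.map(|(shift, mask)| (repr >> shift) & mask)
}

fn main() {
    let repr = try_pack(&[2024, 12, 1]).expect("fits in the small layout");
    println!("{:?}", unpack(repr));
    // A second segment above 255 does not fit in 8 bits.
    println!("{:?}", try_pack(&[1, 256]));
}
```

Taking bits away from any of these fields, as discussed earlier in the thread, would shrink the range of versions that fit the small layout.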
Basically, this explicitly checks that parsing a `1.2.0` into a `Version` will roundtrip back to a `1.2.0`, and that parsing a `1.2` will roundtrip back to a `1.2`. I think this case is included in the other tests in this module, but this test makes the behavior more clearly intentional I think. Ref #10345
Referenced by a Renovate Bot MR updating [astral-sh/uv](https://github.com/astral-sh/uv) (patch, `0.5.15` -> `0.5.22`). In the release notes quoted there, this change appears under [`v0.5.16`](https://github.com/astral-sh/uv/blob/HEAD/CHANGELOG.md#0516), Performance: Remove `[u64; 4]` from small version to move `Arc` to full version (#10345), alongside Force a niche into `VersionSmall` (#10385).
Ref #10344