Incorporate build tag into wheel prioritization#3781
Conversation
CodSpeed Performance ReportMerging #3781 will degrade performances by 6.1%Comparing Summary
Benchmarks breakdown
|
There was a problem hiding this comment.
Is there a more compact representation for this?
There was a problem hiding this comment.
Yes. Using Option<Arc<str>> here will use one fewer words of memory (24 bytes instead of 32 on 64-bit targets). You could use Box<str>, but I would go with Arc because I noticed this is being cloned in version_map.
Getting more compact than that I think would look more like how we deal with Version: optimize some set of common cases into a more compact form and then use a bigger representation for "everything else." But this relies on being able to identify the common cases and the common cases being subject to optimization.
For example, maybe 95% of all build tags have a number less than 256 and no more than 7 bytes of suffix. Those could all be represented by [u8; 8]. So something like this:
pub struct BuildTag(Arc<BuildTagInner>);
enum BuildTagInner {
Compact([u8; 8]),
General(u32, Option<Arc<str>>),
}You'd probably want to use a bit somewhere in the [u8; 8] to indicate whether the suffix is present or not.
But the problem here is that the size of BuildTagInner is still the size of its biggest variant. So, you wouldn't really be saving any heap memory here. And I'm not sure you get much from the compact representation unless it can make comparisons substantially faster in a place where it matters.
However, the heap indirection here does reduce the size of BuildTag itself to one word of memory. Maybe that's worth it to make WheelFilename overall smaller. So, something like this:
pub struct BuildTag(Arc<BuildTagInner>);
struct BuildTagInner(u32, Option<Box<str>>);But, this means every build tag necessarily requires an allocation, where as the representation you have now only does an alloc when there is a non-empty suffix.
Unless we have some data pointing at this as a problem, I'd just switch Option<String> to Option<Arc<str>> and call that good enough for now.
There was a problem hiding this comment.
Sounds good. The vast majority of wheels have no build tag, and of those that do, I've only ever seen them be numerical, so good not to allocate there. (But the spec says they can be non-numerical, hence this dance.)
2a1949a to
e6bed3e
Compare
| Self::Git => "git-v0", | ||
| Self::Interpreter => "interpreter-v1", | ||
| Self::Simple => "simple-v7", | ||
| Self::Simple => "simple-v8", |
There was a problem hiding this comment.
Nice. I was looking for this!
| self.abi_tag.join("."), | ||
| self.platform_tag.join(".") | ||
| ) | ||
| } |
There was a problem hiding this comment.
Does this not need the build tag somewhere in the formatted string?
There was a problem hiding this comment.
Yes. Using Option<Arc<str>> here will use one fewer words of memory (24 bytes instead of 32 on 64-bit targets). You could use Box<str>, but I would go with Arc because I noticed this is being cloned in version_map.
Getting more compact than that I think would look more like how we deal with Version: optimize some set of common cases into a more compact form and then use a bigger representation for "everything else." But this relies on being able to identify the common cases and the common cases being subject to optimization.
For example, maybe 95% of all build tags have a number less than 256 and no more than 7 bytes of suffix. Those could all be represented by [u8; 8]. So something like this:
pub struct BuildTag(Arc<BuildTagInner>);
enum BuildTagInner {
Compact([u8; 8]),
General(u32, Option<Arc<str>>),
}You'd probably want to use a bit somewhere in the [u8; 8] to indicate whether the suffix is present or not.
But the problem here is that the size of BuildTagInner is still the size of its biggest variant. So, you wouldn't really be saving any heap memory here. And I'm not sure you get much from the compact representation unless it can make comparisons substantially faster in a place where it matters.
However, the heap indirection here does reduce the size of BuildTag itself to one word of memory. Maybe that's worth it to make WheelFilename overall smaller. So, something like this:
pub struct BuildTag(Arc<BuildTagInner>);
struct BuildTagInner(u32, Option<Box<str>>);But, this means every build tag necessarily requires an allocation, where as the representation you have now only does an alloc when there is a non-empty suffix.
Unless we have some data pointing at this as a problem, I'd just switch Option<String> to Option<Arc<str>> and call that good enough for now.
e6bed3e to
1a5ee74
Compare
Summary
It turns out that in the spec, if a wheel filename includes a build tag, then we need to use it to break ties. This PR implements that behavior. (Previously, we dropped the build tag entirely.)
Closes #3779.
Test Plan
Run:
cargo run pip install -i https://pypi.anaconda.org/intel/simple mkl_fft==1.3.8 --python-platform linux --python-version 3.10. This now resolves without error. Previously, we selected build tag 63 ofmkl_fft==1.3.8, which led to an incompatibility with NumPy. Now, we select build tag 70.