Vectorize TensorPrimitives.CosineSimilarity<Half> #116898
Vectorize for Half by processing it as shorts, using the existing widening routine to produce two vectors of floats, and operating on those floats. Even on the non-vectorized path, this improves throughput because each intermediate operation works on floats rather than constantly converting back to Half.
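As a rough illustration of the non-vectorized half of that description, the sketch below keeps all three accumulators in float and converts back to Half only once, at the end. It is not the PR's code: the class name `HalfCosineSketch` and its argument checks are hypothetical, and only public `Half`/`float` conversions are used.

```csharp
using System;

// Hypothetical sketch, not the PR's implementation.
static class HalfCosineSketch
{
    public static Half CosineSimilarity(ReadOnlySpan<Half> x, ReadOnlySpan<Half> y)
    {
        if (x.Length != y.Length)
        {
            throw new ArgumentException("Spans must have the same length.");
        }

        // All intermediate state lives in float.
        float dot = 0f, xSumOfSquares = 0f, ySumOfSquares = 0f;

        for (int i = 0; i < x.Length; i++)
        {
            // Widen each Half once per element.
            float xi = (float)x[i];
            float yi = (float)y[i];

            dot += xi * yi;
            xSumOfSquares += xi * xi;
            ySumOfSquares += yi * yi;
        }

        // Single narrowing conversion back to Half for the result.
        return (Half)(dot / (MathF.Sqrt(xSumOfSquares) * MathF.Sqrt(ySumOfSquares)));
    }
}
```

The point of the shape is that the loop body never produces a Half, so no per-operation rounding or conversion back to Half occurs inside it.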
Tagging subscribers to this area: @dotnet/area-system-numerics-tensors
Pull Request Overview
This PR adds explicit vectorization support for Half inputs in TensorPrimitives.CosineSimilarity, refactors the core implementation to use common Update/Finalize helpers, and introduces a specialized CosineSimilarityHalfCore that processes Half as widened floats.
- Adds a generic wrapper for CosineSimilarity<T> that dispatches to a new Half-specific path
- Refactors existing vector-and-scalar loops into shared Update and Finalize methods
- Implements CosineSimilarityHalfCore with 128/256/512-bit vector and scalar fallbacks for Half
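To make the Update/Finalize split concrete, here is a minimal 128-bit sketch of how such a structure could look. It is an assumption-laden illustration, not the PR's code: `HalfCosineVectorSketch`, `WidenToSingle`, `Update`, and `FinalizeResult` are hypothetical names, and `WidenToSingle` converts element by element as a stand-in for the library's internal Half-as-short widening routine, which is not public.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical sketch, not the PR's implementation.
static class HalfCosineVectorSketch
{
    // Stand-in for the internal widening routine: builds one float vector
    // from four Half values using per-element conversions.
    private static Vector128<float> WidenToSingle(ReadOnlySpan<Half> s) =>
        Vector128.Create((float)s[0], (float)s[1], (float)s[2], (float)s[3]);

    // Shared accumulation step: updates the three running float vectors.
    private static void Update(
        Vector128<float> x, Vector128<float> y,
        ref Vector128<float> dot, ref Vector128<float> xx, ref Vector128<float> yy)
    {
        dot += x * y;
        xx += x * x;
        yy += y * y;
    }

    // Shared final step: one division and one conversion back to Half.
    private static Half FinalizeResult(float dot, float xx, float yy) =>
        (Half)(dot / (MathF.Sqrt(xx) * MathF.Sqrt(yy)));

    public static Half CosineSimilarity(ReadOnlySpan<Half> x, ReadOnlySpan<Half> y)
    {
        if (x.Length != y.Length)
        {
            throw new ArgumentException("Spans must have the same length.");
        }

        Vector128<float> dotV = Vector128<float>.Zero;
        Vector128<float> xxV = Vector128<float>.Zero;
        Vector128<float> yyV = Vector128<float>.Zero;

        // Vector loop: four elements per iteration.
        int i = 0;
        for (; i + 4 <= x.Length; i += 4)
        {
            Update(WidenToSingle(x.Slice(i, 4)), WidenToSingle(y.Slice(i, 4)),
                   ref dotV, ref xxV, ref yyV);
        }

        // Reduce the vector accumulators to scalars.
        float dot = Vector128.Sum(dotV);
        float xx = Vector128.Sum(xxV);
        float yy = Vector128.Sum(yyV);

        // Scalar fallback for the remaining 0 to 3 elements.
        for (; i < x.Length; i++)
        {
            float xi = (float)x[i];
            float yi = (float)y[i];
            dot += xi * yi;
            xx += xi * xi;
            yy += yi * yi;
        }

        return FinalizeResult(dot, xx, yy);
    }
}
```

A real implementation would add 256-bit and 512-bit paths and a true vector widening step; the sketch only shows why factoring the accumulation and the final division into helpers lets the vector and scalar loops share logic.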
Comments suppressed due to low confidence (2)
src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.CosineSimilarity.cs:184
- A new specialized path for Half has been added but no tests for TensorPrimitives.CosineSimilarity on Half arrays appear in this PR. Please add unit tests covering both vectorized and scalar code paths to validate correctness.
private static Half CosineSimilarityHalfCore(ReadOnlySpan<Half> x, ReadOnlySpan<Half> y)
src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.CosineSimilarity.cs:31
- The XML doc for CosineSimilarity<T> does not mention the new Half-specialized path. Please update the summary to note that Half inputs are now vectorized via Half ⇒ short ⇒ float widening.
public static T CosineSimilarity<T>(ReadOnlySpan<T> x, ReadOnlySpan<T> y)
tannergooding left a comment
LGTM.
It's a bit unfortunate we need to duplicate the CosineSimilarityCore function here. I expect we could have a general "operate with m-to-n intermediate" helper, but that would be a larger refactoring (and I don't think it's worth blocking this on making that happen).
I have such a helper in another PR I'll put up for other methods, but applying it to CosineSimilarity (which doesn't use any of the shared helpers or operators) results in roundtripping between Half and float for each operation, which is measurably worse than staying with float as the accumulator. We can subsequently look at a larger refactoring around our aggregations to enable a) making the accumulation configurable and b) getting CosineSimilarity onto the same helpers (which is desirable anyway, as it's not currently as robust in its optimizations as the shared helpers are).
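The difference between roundtripping per operation and keeping a float accumulator can be sketched as follows. This is a hypothetical illustration (`AccumulatorSketch` and its methods are not from the PR) and it does not measure throughput; it only shows the two shapes, and that the per-operation rounding to Half also changes the accumulated value once the sum exceeds what Half can represent exactly.

```csharp
using System;

// Hypothetical sketch contrasting the two accumulation strategies.
static class AccumulatorSketch
{
    // Roundtrips through Half after every multiply and every add.
    public static float SumOfSquaresRoundtripping(ReadOnlySpan<Half> values)
    {
        Half acc = (Half)0f;
        foreach (Half v in values)
        {
            Half square = (Half)((float)v * (float)v);   // widen, multiply, narrow
            acc = (Half)((float)acc + (float)square);    // widen, add, narrow
        }
        return (float)acc;
    }

    // Widens each input once and stays in float until the caller narrows.
    public static float SumOfSquaresFloatAccumulator(ReadOnlySpan<Half> values)
    {
        float acc = 0f;
        foreach (Half v in values)
        {
            float f = (float)v;
            acc += f * f;
        }
        return acc;
    }

    // Convenience helper for building an input of repeated ones.
    public static Half[] Ones(int count)
    {
        Half[] values = new Half[count];
        Array.Fill(values, (Half)1f);
        return values;
    }
}
```

With 4096 ones as input, the float accumulator reaches 4096, while the roundtripping version stalls at 2048, because 2049 is not representable in Half and rounds back to 2048.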