Description
I noticed that this code snippet outperforms the built-in SoftMax function by 40%-45% while producing exactly the same output.
```csharp
TensorPrimitives.Exp(values, destination);
var sum = TensorPrimitives.Sum(destination);
TensorPrimitives.Divide(destination, sum, destination);
```
Configuration
.NET 9 RC 2
Windows 10 x64 (AMD)
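For reference, the numbers below could be reproduced with a BenchmarkDotNet harness along these lines; the exact benchmark class used for this report is an assumption, but it compares `TensorPrimitives.SoftMax` against the three-call snippet above:

```csharp
using System.Numerics.Tensors;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<SoftMaxBenchmarks>();

[MemoryDiagnoser]
public class SoftMaxBenchmarks
{
    [Params(1_000, 1_000_000)]
    public int Count;

    private float[] _values = Array.Empty<float>();
    private float[] _destination = Array.Empty<float>();

    [GlobalSetup]
    public void Setup()
    {
        // Arbitrary deterministic input data; the original inputs are not known.
        var random = new Random(42);
        _values = new float[Count];
        _destination = new float[Count];
        for (int i = 0; i < Count; i++)
            _values[i] = (float)random.NextDouble();
    }

    [Benchmark(Baseline = true)]
    public void BuiltIn() => TensorPrimitives.SoftMax(_values, _destination);

    [Benchmark]
    public void Mine()
    {
        // Compute e^x once into the destination, then reuse it for the sum and divide.
        TensorPrimitives.Exp(_values, _destination);
        var sum = TensorPrimitives.Sum(_destination);
        TensorPrimitives.Divide(_destination, sum, _destination);
    }
}
```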
Data
| Method | Count | Mean | Error | StdDev | Ratio | Allocated | Alloc Ratio |
|-------- |-------- |-------------:|-----------:|-----------:|------:|----------:|------------:|
| BuiltIn | 1000 | 4.211 us | 0.0220 us | 0.0206 us | 1.00 | - | NA |
| Mine | 1000 | 2.308 us | 0.0296 us | 0.0277 us | 0.55 | - | NA |
| | | | | | | | |
| BuiltIn | 1000000 | 4,200.558 us | 30.2029 us | 28.2518 us | 1.00 | - | NA |
| Mine | 1000000 | 2,435.979 us | 21.0321 us | 19.6735 us | 0.58 | - | NA |
Analysis
I assume this is because TensorPrimitives.SoftMax computes e^x twice, which is an expensive operation, when it could simply cache the results in destination. This even works when values and destination are the same span.
https://github.com/dotnet/runtime/blob/main/src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.SoftMax.cs
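To make the in-place claim concrete, here is a hedged sketch of the snippet wrapped as a helper; `SoftMaxCached` is a hypothetical name used only for illustration, not an API in System.Numerics.Tensors:

```csharp
using System.Numerics.Tensors;

static void SoftMaxCached(ReadOnlySpan<float> values, Span<float> destination)
{
    TensorPrimitives.Exp(values, destination);               // destination[i] = e^values[i]
    float sum = TensorPrimitives.Sum(destination);           // sum over the cached exponentials
    TensorPrimitives.Divide(destination, sum, destination);  // normalize in place
}

// In-place use: the same buffer can serve as both input and output,
// since Exp overwrites it before Sum and Divide read it back.
float[] buffer = { 1f, 2f, 3f };
SoftMaxCached(buffer, buffer);
```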