
Fix/f16 reduction accum#344

Merged
HenryNdubuaku merged 5 commits into cactus-compute:main from vyomshah05:fix/f16-reduction-accum
Feb 18, 2026

Conversation

@vyomshah05
Contributor

Fix: FP16 Reduction Accumulation Precision

Summary

Fixes a numerical precision issue in cactus_sum_all_f16 where accumulation was performed in FP16, causing potential overflow and silent precision loss for large tensors.

Accumulation is now performed in FP32 while keeping the API unchanged.


Problem

The previous implementation accumulated directly in FP16, which has two key limitations:

  • FP16 maximum finite value: 65504
  • Limited significand precision (11 bits, ~3 decimal digits)

This could cause:

  • Overflow for moderate tensor sizes
  • Small values being dropped during accumulation
  • Silent incorrect results

Fix

Upcast to FP32 immediately after loading and accumulate in FP32:

  • Input type remains __fp16
  • Output remains double
  • Only internal accumulator precision changed

This matches standard ML framework behavior (PyTorch, cuBLAS, etc.).


Validation

Added a stress test summing 32k FP16 ones:

  • Previously produced incorrect results
  • Now returns the correct sum

Impact

  • No API changes
  • Improved numerical stability
  • Negligible performance impact

Signed-off-by: vyomshah05 <[email protected]>
@ParkiratS
Collaborator

I looked over this PR and it looks good. Have you found any other instances of this mistake in other parts of the kernel?

@vyomshah05
Contributor Author

Yes, I noticed the same error in the axis and mean reduction functions, and made similar fixes with validation tests for them as well.

@ParkiratS
Collaborator

Taking a further look into this. Will provide feedback by EOD tomorrow at the latest.

Signed-off-by: HenryNdubuaku <[email protected]>
@ParkiratS
Collaborator

There are a few changes we need to make. First, please remove the tests: we do not need a test for this instance, since we already know that increased accumulator precision yields higher accuracy. Second, remove all comments from the code.

Signed-off-by: vyomshah05 <[email protected]>
@vyomshah05
Contributor Author

@ParkiratS I cleaned up the code and removed the tests as requested.

@ParkiratS
Collaborator

@vyomshah05 Looks good, I will let Henry know that this is ready to merge.

Collaborator

@ParkiratS left a comment

Looks good.

@HenryNdubuaku HenryNdubuaku merged commit ba86484 into cactus-compute:main Feb 18, 2026
1 of 2 checks passed
ncylich pushed a commit that referenced this pull request Feb 24, 2026
* Fix FP16 reduction accumulation

Signed-off-by: vyomshah05 <[email protected]>

* Fixed test

Signed-off-by: vyomshah05 <[email protected]>

* fixed error accross all functions

Signed-off-by: vyomshah05 <[email protected]>

* add citation

Signed-off-by: HenryNdubuaku <[email protected]>

* Removed unnecessary tests

Signed-off-by: vyomshah05 <[email protected]>

---------

Signed-off-by: vyomshah05 <[email protected]>
Signed-off-by: HenryNdubuaku <[email protected]>
Co-authored-by: HenryNdubuaku <[email protected]>
cattermelon1234 pushed a commit to cattermelon1234/cactus that referenced this pull request Feb 28, 2026
