Fix/f16 reduction accum #344
Merged
HenryNdubuaku merged 5 commits into cactus-compute:main on Feb 18, 2026
Conversation
Signed-off-by: vyomshah05 <[email protected]>
Collaborator
I looked over this PR and it looks good. Have you found any other instances of this mistake in other parts of the kernel?
Signed-off-by: vyomshah05 <[email protected]>
Contributor
Author
Yes, I noticed the error in the axis and mean functions as well, and made similar fixes and validation tests for them.
Collaborator
Taking a further look into this. Will provide feedback by EOD tomorrow at worst.
Signed-off-by: HenryNdubuaku <[email protected]>
Collaborator
So there are a few changes we need to make. First, please remove the tests: we do not need a test for this instance, since we understand that increased precision will result in higher accuracy. Second, remove all comments from the code.
Signed-off-by: vyomshah05 <[email protected]>
Contributor
Author
@ParkiratS I cleaned up the code and tests as you said.
Collaborator
@vyomshah05 Looks good, I will let Henry know that this is ready to merge.
ncylich pushed a commit that referenced this pull request on Feb 24, 2026:

* Fix FP16 reduction accumulation
* Fixed test
* Fixed error across all functions
* Add citation
* Removed unnecessary tests

Signed-off-by: vyomshah05 <[email protected]>
Signed-off-by: HenryNdubuaku <[email protected]>
Co-authored-by: HenryNdubuaku <[email protected]>
cattermelon1234 pushed a commit to cattermelon1234/cactus that referenced this pull request on Feb 28, 2026.
Fix: FP16 Reduction Accumulation Precision
Summary
Fixes a numerical precision issue in `cactus_sum_all_f16` where accumulation was performed in FP16, causing potential overflow and silent precision loss for large tensors. Accumulation is now performed in FP32 while keeping the API unchanged.
Problem
Previous implementation accumulated directly in FP16:
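The original C++ snippet is not reproduced here, but the failure mode can be illustrated with a NumPy sketch (the function name is hypothetical; NumPy's `float16` follows the same IEEE 754 binary16 rounding as `__fp16`):

```python
import numpy as np

def sum_all_f16_naive(x):
    # Mirrors the buggy pattern: the running total is rounded
    # back to FP16 after every single addition.
    acc = np.float16(0.0)
    for v in x:
        acc = np.float16(acc + v)  # FP16 + FP16 stays FP16
    return acc

ones = np.ones(32768, dtype=np.float16)
print(sum_all_f16_naive(ones))  # 2048.0, not 32768.0
```

Once the running total reaches 2048, the FP16 ulp becomes 2, so adding 1.0 rounds back to 2048 and the sum stops advancing entirely.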
This could cause:

* Overflow for large sums (the maximum finite FP16 value is 65504)
* Silent precision loss: above 2048, FP16 cannot represent every integer, so small addends are rounded away
Fix
Upcast to FP32 immediately after loading and accumulate in FP32:
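Continuing the NumPy sketch above (illustrative names, not the actual kernel code), the fix keeps a wide accumulator and rounds back to FP16 only once at the end, so the API is unchanged:

```python
import numpy as np

def sum_all_f16_fixed(x):
    # Upcast each FP16 element to FP32, accumulate in FP32,
    # and round to FP16 only once at the very end.
    acc = np.float32(0.0)
    for v in x:
        acc += np.float32(v)
    return np.float16(acc)

ones = np.ones(32768, dtype=np.float16)
print(sum_all_f16_fixed(ones))  # 32768.0
```

32768 is a power of two within FP16 range, so the final downcast is exact; the intermediate sums are exact in FP32 because they stay well below 2^24.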
The accumulator type is widened from `__fp16` to `double`. This matches standard ML framework behavior (PyTorch, cuBLAS, etc.).
Validation
Added stress test summing 32k FP16 ones:
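The test itself was later removed from the PR at a reviewer's request; a sketch of what such a stress test could look like (hypothetical, using the NumPy simulation rather than the C++ kernel):

```python
import numpy as np

def test_fp16_sum_of_ones():
    # 32k ones: the exact sum is 32768, which FP16 can represent,
    # but naive FP16 accumulation would stall at 2048.
    n = 32 * 1024
    x = np.ones(n, dtype=np.float16)
    acc = np.float32(0.0)
    for v in x:
        acc += np.float32(v)
    assert float(np.float16(acc)) == float(n)

test_fp16_sum_of_ones()
print("ok")
```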
Impact