perf: add FP16 fast path for LayerNorm #433

Merged
HenryNdubuaku merged 2 commits into cactus-compute:main from yujonglee:perf/layernorm-fp16-fast-path
Feb 27, 2026

@yujonglee
Contributor

| Shape | Before | After | Δ latency | Δ throughput |
| --- | --- | --- | --- | --- |
| whisper-full (1500×512) FP16 | 0.896 ms, 3.43 GB/s | 0.692 ms, 4.44 GB/s | −23% | +29% |
| bert-128 (128×1024) FP16 | 0.154 ms, 3.41 GB/s | 0.118 ms, 4.44 GB/s | −23% | +30% |
| whisper-128 (128×512) FP16 | 0.079 ms, 3.35 GB/s | 0.060 ms, 4.39 GB/s | −24% | +31% |
| FP32 (any shape) | baseline | unchanged | ~0% | ~0% |
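
The percentage columns can be re-derived from the reported before/after figures. The snippet below is a minimal sketch for checking that arithmetic. The `results` dictionary and the `pct_change` helper are illustrative names, not part of the PR, and the numbers are copied from the table rather than re-measured.

```python
# Reported (latency_ms, throughput_GB_per_s) pairs, copied from the table.
# Layout: shape -> (before, after). Names here are illustrative only.
results = {
    "whisper-full (1500x512)": ((0.896, 3.43), (0.692, 4.44)),
    "bert-128 (128x1024)": ((0.154, 3.41), (0.118, 4.44)),
    "whisper-128 (128x512)": ((0.079, 3.35), (0.060, 4.39)),
}


def pct_change(before: float, after: float) -> int:
    """Relative change from before to after, rounded to a whole percent."""
    return round((after - before) / before * 100)


deltas = {}
for shape, ((lat_before, bw_before), (lat_after, bw_after)) in results.items():
    deltas[shape] = (
        pct_change(lat_before, lat_after),  # negative means lower latency
        pct_change(bw_before, bw_after),    # positive means higher throughput
    )
    print(f"{shape}: latency {deltas[shape][0]:+d}%, throughput {deltas[shape][1]:+d}%")
```

With the table's figures this reproduces the Δ latency and Δ throughput columns for the three FP16 shapes.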

@HenryNdubuaku HenryNdubuaku merged commit 7171657 into cactus-compute:main Feb 27, 2026
5 of 6 checks passed
HenryNdubuaku pushed a commit to cattermelon1234/cactus that referenced this pull request Feb 27, 2026
* add correctness and performance tests for layernorm

Signed-off-by: Yujong Lee <[email protected]>

* optimize!

Signed-off-by: Yujong Lee <[email protected]>

---------

Signed-off-by: Yujong Lee <[email protected]>
cattermelon1234 pushed a commit to cattermelon1234/cactus that referenced this pull request Feb 28, 2026
* add correctness and performance tests for layernorm

Signed-off-by: Yujong Lee <[email protected]>

* optimize!

Signed-off-by: Yujong Lee <[email protected]>

---------

Signed-off-by: Yujong Lee <[email protected]>
@yujonglee yujonglee deleted the perf/layernorm-fp16-fast-path branch February 28, 2026 00:58