
Issue #406: Bilinear + Depthwise Optimizations #466

Merged
HenryNdubuaku merged 2 commits into cactus-compute:main from PiyawanChaiprasit2006:stft-bilinear-opt-406 on Feb 27, 2026

@PiyawanChaiprasit2006
Contributor

Issue #406: Vectorize bilinear + depthwise conv improvements

Completed:

  • Vectorized cactus_bilinear_interpolation_f16: the innermost scalar loop over embed_dim now processes 8 half-precision lanes per iteration using vld1q_f16 and vfmaq_f16.
  • Improved the depthwise conv gather for dilation == 1 by loading with vld1q_f16 directly instead of going through the gather path.
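
For readers who do not have the diff open, here is a minimal sketch of the two patterns described above. It is not the code from this PR. The helper names (`blend_four_rows`, `load_eight_taps`), the four-corner weight layout, and the assumption that taps are contiguous in memory when dilation == 1 are all hypothetical. The NEON path is only compiled when the target advertises FP16 vector arithmetic; elsewhere `half_t` falls back to `float` so the sketch still builds.

```cpp
#include <cstddef>

#if defined(__ARM_NEON) && defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
#include <arm_neon.h>
using half_t = float16_t;
#define SKETCH_HAS_NEON_F16 1
#else
using half_t = float;  // stand-in type so the sketch compiles off-ARM
#define SKETCH_HAS_NEON_F16 0
#endif

// Hypothetical helper: blend the four neighbouring embedding rows with
// bilinear weights, 8 lanes at a time, then finish with a scalar tail.
void blend_four_rows(const half_t* p00, const half_t* p01,
                     const half_t* p10, const half_t* p11,
                     float w00, float w01, float w10, float w11,
                     half_t* out, std::size_t embed_dim) {
    std::size_t d = 0;
#if SKETCH_HAS_NEON_F16
    // Broadcast each scalar weight across all 8 lanes once, outside the loop.
    const float16x8_t v00 = vdupq_n_f16(static_cast<float16_t>(w00));
    const float16x8_t v01 = vdupq_n_f16(static_cast<float16_t>(w01));
    const float16x8_t v10 = vdupq_n_f16(static_cast<float16_t>(w10));
    const float16x8_t v11 = vdupq_n_f16(static_cast<float16_t>(w11));
    for (; d + 8 <= embed_dim; d += 8) {
        float16x8_t acc = vmulq_f16(vld1q_f16(p00 + d), v00);
        acc = vfmaq_f16(acc, vld1q_f16(p01 + d), v01);  // acc += p01 * w01
        acc = vfmaq_f16(acc, vld1q_f16(p10 + d), v10);
        acc = vfmaq_f16(acc, vld1q_f16(p11 + d), v11);
        vst1q_f16(out + d, acc);
    }
#endif
    // Scalar tail (and the whole loop when the NEON path is unavailable).
    for (; d < embed_dim; ++d) {
        out[d] = static_cast<half_t>(w00 * static_cast<float>(p00[d]) +
                                     w01 * static_cast<float>(p01[d]) +
                                     w10 * static_cast<float>(p10[d]) +
                                     w11 * static_cast<float>(p11[d]));
    }
}

// Hypothetical helper: fetch 8 taps spaced `dilation` elements apart.
// Assumption: with dilation == 1 the taps are adjacent in memory, so one
// vector load can replace the element-by-element gather.
void load_eight_taps(const half_t* src, std::size_t dilation, half_t* dst) {
#if SKETCH_HAS_NEON_F16
    if (dilation == 1) {
        vst1q_f16(dst, vld1q_f16(src));
        return;
    }
#endif
    for (std::size_t k = 0; k < 8; ++k) {
        dst[k] = src[k * dilation];
    }
}
```

The scalar tail keeps the helper correct when embed_dim is not a multiple of 8. Accumulating in half precision on the NEON path can round differently from a float accumulator, so results may not be bit-identical to a scalar reference.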

Closes #406

@HenryNdubuaku HenryNdubuaku merged commit 899e290 into cactus-compute:main Feb 27, 2026
1 check failed
HenryNdubuaku pushed a commit to cattermelon1234/cactus that referenced this pull request Feb 27, 2026
…compute#466)

* Vectorize embed_dim loop in cactus_bilinear_interpolation_f16 (cactus-compute#406)

* Improve depthwise conv gather — for dilation == 1 (cactus-compute#406)

---------

Co-authored-by: Piyawan Chaiprasit <[email protected]>
Signed-off-by: HenryNdubuaku <[email protected]>
cattermelon1234 pushed a commit to cattermelon1234/cactus that referenced this pull request Feb 28, 2026
…compute#466)

* Vectorize embed_dim loop in cactus_bilinear_interpolation_f16 (cactus-compute#406)

* Improve depthwise conv gather — for dilation == 1 (cactus-compute#406)

---------

Co-authored-by: Piyawan Chaiprasit <[email protected]>
Signed-off-by: HenryNdubuaku <[email protected]>

Development

Successfully merging this pull request may close these issues.

Thread STFT magnitude and vectorize bilinear interpolation

3 participants