Thread STFT magnitude and vectorize bilinear interpolation #406

@HenryNdubuaku

Description

kernel_conv.cpp — Conv1D & Signal Ops

| Change | Impact |
| --- | --- |
| Thread `cactus_stft_magnitude_f16`: add `parallel_for_2d` over `(N, num_fft_bins)` | High: currently single-threaded, bottleneck for Silero VAD |
| Vectorize `cactus_bilinear_interpolation_f16`: the innermost `embed_dim` loop is pure scalar; replace with 8-wide `vld1q_f16` + `vfmaq_f16` | Medium: bottleneck for vision encoders |
| Improve depthwise conv gather: for `dilation == 1`, the input slice is contiguous and can use `vld1q_f16` directly instead of a scalar gather into a stack array | Medium |
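
To make the first row concrete, here is a rough sketch of what splitting the STFT magnitude over a 2D index space could look like. It is not the actual kernel: `parallel_for_2d_sketch` and `stft_magnitude_sketch` are hypothetical names, the real `parallel_for_2d` signature is not shown in this issue, the outer dimension here is a frame index that only stands in for `N`, a naive DFT replaces whatever transform the kernel uses, and `float` replaces f16 so the example builds without NEON. The point is only that each `(row, bin)` pair writes its own output element, so workers need no synchronization.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the project's parallel_for_2d (real signature unknown).
// Splits the flattened rows x cols index space into contiguous chunks, one per thread.
inline void parallel_for_2d_sketch(std::size_t rows, std::size_t cols, std::size_t num_threads,
                                   const std::function<void(std::size_t, std::size_t)>& body) {
    const std::size_t total = rows * cols;
    if (total == 0) return;
    num_threads = std::max<std::size_t>(1, std::min(num_threads, total));
    const std::size_t chunk = (total + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(total, begin + chunk);
        if (begin >= end) break;
        workers.emplace_back([=, &body] {
            for (std::size_t i = begin; i < end; ++i) body(i / cols, i % cols);
        });
    }
    for (auto& w : workers) w.join();
}

// Assumption: naive DFT magnitude per (frame, bin), float instead of f16.
// Output layout is row-major [frames, bins] with bins = n_fft / 2 + 1.
inline std::vector<float> stft_magnitude_sketch(const std::vector<float>& signal,
                                                std::size_t n_fft, std::size_t hop,
                                                std::size_t num_threads) {
    if (n_fft == 0 || hop == 0 || signal.size() < n_fft) return {};
    const std::size_t frames = (signal.size() - n_fft) / hop + 1;
    const std::size_t bins = n_fft / 2 + 1;
    std::vector<float> out(frames * bins, 0.0f);
    const float two_pi = 6.28318530717958647692f;

    parallel_for_2d_sketch(frames, bins, num_threads, [&](std::size_t f, std::size_t k) {
        const float* x = signal.data() + f * hop;
        float re = 0.0f, im = 0.0f;
        for (std::size_t n = 0; n < n_fft; ++n) {
            // Reduce k * n modulo n_fft to keep the angle small and accurate.
            const float angle = two_pi * static_cast<float>((k * n) % n_fft)
                                / static_cast<float>(n_fft);
            re += x[n] * std::cos(angle);
            im -= x[n] * std::sin(angle);
        }
        // Each (f, k) owns exactly one output slot, so no locking is needed.
        out[f * bins + k] = std::sqrt(re * re + im * im);
    });
    return out;
}
```

Calling `stft_magnitude_sketch(signal, 4, 2, 4)` on an 8-sample signal would produce a 3 x 3 magnitude grid. Flattening the two dimensions before chunking keeps the work balanced even when one dimension is small, which may matter for short VAD windows; whether the real `parallel_for_2d` tiles the same way is an open question.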
