
Fix hybrid attention sliding window for Gemma (#320) #338

Merged
HenryNdubuaku merged 5 commits into cactus-compute:main from jrajala6:gemma_hybrid_attention
Feb 12, 2026


@jrajala6
Contributor

The hybrid INT8 attention kernel (cactus_attention_hybrid_int8_fp16) wasn't using window_size during decode, so local attention layers in Gemma attended to the full context instead of just recent tokens, breaking the hybrid attention mechanism.

This PR threads window_size from model_gemma.cpp through the graph API down to the kernel. In the kernel, key positions outside the window are masked to -inf before softmax, so they receive zero attention weight. The parameter defaults to 0 (global attention), so nothing changes for other models.
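For readers who want to see the masking step in isolation, here is a minimal sketch of the idea. It is not the code in cactus_attention_hybrid_int8_fp16: the function name `windowed_softmax`, its signature, and the exact window convention (the query plus the `window_size - 1` keys before it) are assumptions made for illustration, and it works on plain floats rather than INT8/FP16 data.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, not the Cactus kernel: softmax over one query's raw
// attention scores during decode, with an optional sliding window.
// scores[k] is the score of the query at query_pos against key position k.
// window_size == 0 means global attention (no window masking).
std::vector<float> windowed_softmax(std::vector<float> scores,
                                    std::size_t query_pos,
                                    std::size_t window_size = 0) {
    // Precondition: the query's own position must be among the keys.
    assert(query_pos < scores.size());
    const float neg_inf = -std::numeric_limits<float>::infinity();

    for (std::size_t k = 0; k < scores.size(); ++k) {
        // Causal mask: keys after the query are never visible.
        const bool in_future = k > query_pos;
        // Assumed window convention: key k is visible only if
        // query_pos - k < window_size.
        const bool too_old = window_size > 0 && k + window_size <= query_pos;
        if (in_future || too_old) scores[k] = neg_inf;
    }

    // Numerically stable softmax. The query's own position is always
    // unmasked, so max_score is finite and masked entries become exactly 0.
    const float max_score = *std::max_element(scores.begin(), scores.end());
    float sum = 0.0f;
    for (float& s : scores) {
        s = std::exp(s - max_score);
        sum += s;
    }
    for (float& s : scores) s /= sum;
    return scores;
}
```

With six equal scores and `query_pos = 5`, calling the sketch with `window_size = 0` spreads the weight evenly over all six keys, while `window_size = 3` gives zero weight to keys 0 to 2 and splits the weight evenly over keys 3 to 5. Defaulting the parameter to 0 mirrors the PR's choice of keeping global attention as the behavior for callers that do not pass a window.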

Tested with test_kernel (10/10), test_graph (38/38), and end-to-end chat with Gemma 3 1B.

@jrajala6 jrajala6 closed this Feb 10, 2026
@jrajala6 jrajala6 reopened this Feb 10, 2026
@jrajala6 jrajala6 force-pushed the gemma_hybrid_attention branch from 1ac7c56 to c0a7973 February 10, 2026 22:49
jrajala6 and others added 2 commits February 10, 2026 14:52
Signed-off-by: HenryNdubuaku <[email protected]>
@HenryNdubuaku
Collaborator

thanks a million for this @jrajala6

@HenryNdubuaku HenryNdubuaku merged commit 730d683 into cactus-compute:main Feb 12, 2026
1 check passed
ncylich pushed a commit that referenced this pull request Feb 24, 2026
* Added hybrid attention functionality to gemma

Signed-off-by: Jisha Rajala <[email protected]>

* Reduced diff in kernel_attention

Signed-off-by: Jisha Rajala <[email protected]>

* Reduced diffs in kernel_attention.cpp

Signed-off-by: Jisha Rajala <[email protected]>

* Remove int4_learning_lab.cpp from PR

Signed-off-by: Jisha Rajala <[email protected]>

* some changes

Signed-off-by: HenryNdubuaku <[email protected]>

---------

Signed-off-by: Jisha Rajala <[email protected]>
Signed-off-by: HenryNdubuaku <[email protected]>
Co-authored-by: HenryNdubuaku <[email protected]>

2 participants