Add Gemma 3N (text-only) model support by ncylich · Pull Request #493 · cactus-compute/cactus

ncylich · 2026-03-04T18:52:38Z

Summary

Adds text-only Gemma 3N (E4B) inference with AltUp (4-stream alternating updates), Laurel (learned augmented residual layers), and Per-Layer Input (PLI)
Implements hybrid local/global attention with sliding window, KV-cache sharing for the last 10 layers, per-head QK normalization, V normalization, and Gaussian top-k activation sparsity
Adds weight conversion pipeline and config parsing for gemma3n model type
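The Gaussian top-k activation sparsity mentioned above can be sketched as follows. This is a minimal illustration, not the PR's C++ code. It assumes the common formulation in which the cutoff is the row mean plus a z-score multiple of the row standard deviation, followed by a ReLU-style shift. The function name `gaussian_topk` and the `target_sparsity` parameter are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def gaussian_topk(activations, target_sparsity):
    """Zero out roughly `target_sparsity` of the values, assuming they are ~Gaussian.

    Hypothetical helper for illustration; not taken from the PR.
    """
    if not 0.0 < target_sparsity < 1.0:
        raise ValueError("target_sparsity must be strictly between 0 and 1")

    # z-score below which `target_sparsity` of a standard normal falls.
    z = NormalDist().inv_cdf(target_sparsity)

    mean = fmean(activations)
    std = pstdev(activations)  # population standard deviation
    cutoff = mean + z * std

    # Values at or below the cutoff become 0; the rest are shifted down by it.
    return [max(a - cutoff, 0.0) for a in activations]


if __name__ == "__main__":
    print(gaussian_topk([1.0, 2.0, 3.0, 4.0], 0.5))
```

Because the cutoff comes from the mean and standard deviation rather than from a sort, the cost is linear in the hidden size. The achieved sparsity is only approximate when the activations are not close to Gaussian.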

Signed-off-by: Noah Cylich <[email protected]>

Copilot

Pull request overview

Adds runtime + conversion support for a new gemma3n model type, including Gemma 3N-specific config parsing, weight export patterns, and a new C++ model implementation integrated into the engine/tooling paths.

Changes:

Add Gemma 3N config extraction + model-type detection in the Python conversion pipeline.
Export Gemma 3N-specific weights (AltUp/Laurel/PLI and tower-prefixed tensors) during conversion.
Introduce GemmaModel3n in the C++ runtime and wire GEMMA3N into engine + tool-calling code paths.
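As a rough sketch of what model-type detection for `gemma3n` might look like in a Python conversion step, the snippet below inspects a Hugging Face style config dictionary. The helper name `detect_model_type`, the nested `text_config` lookup, and the return values are assumptions for illustration, not the code in `config_utils.py`.

```python
def detect_model_type(hf_config: dict) -> str:
    """Return a normalized model type from a Hugging Face style config dict.

    Hypothetical helper: checks the top-level `model_type` and, if present,
    the nested `text_config.model_type`.
    """
    top_level = str(hf_config.get("model_type") or "").lower()
    text_cfg = hf_config.get("text_config") or {}
    nested = str(text_cfg.get("model_type") or "").lower()

    # Match "gemma3n" first so it is not mistaken for plain "gemma3",
    # which is a prefix of it.
    for name in (top_level, nested):
        if name.startswith("gemma3n"):
            return "gemma3n"

    return top_level or "unknown"


if __name__ == "__main__":
    cfg = {"model_type": "gemma3n", "text_config": {"model_type": "gemma3n_text"}}
    print(detect_model_type(cfg))
```

Checking the longer name first matters in any prefix-based scheme, since `gemma3` is a prefix of `gemma3n`.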

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| python/src/weight_patterns.py | Adds Gemma 3N global weight mappings and new per-layer patterns. |
| python/src/tensor_io.py | Tweaks precision override rules for embed-related tensors (incl. `embed_tokens_per_layer`). |
| python/src/converter.py | Adds `gemma3n` config extraction and Gemma 3N-specific weight export (global + tower prefixes). |
| python/src/config_utils.py | Adds `gemma3n` detection and Gemma 3N-specific config extraction (AltUp/Laurel/rope/etc.). |
| python/requirements.txt | Adds new Python dependencies (`timm`, `sentencepiece`). |
| cactus/models/model_gemma3n.cpp | New Gemma 3N model implementation (AltUp/Laurel/PLI + hybrid attention). |
| cactus/models/model.h | Declares `GemmaModel3n` and its weight-node layout. |
| cactus/ffi/cactus_complete.cpp | Treats `GEMMA3N` like `GEMMA` for tool formatting + stop sequences. |
| cactus/engine/engine_model.cpp | Adds `GEMMA3N` attention scaling, config parsing fields, logit softcapping, and model factory wiring. |
| cactus/engine/engine_constraints.cpp | Enables Gemma-style tool-call constraints for `GEMMA3N`. |
| cactus/engine/engine.h | Extends config/model-type enums and adds Gemma 3N config fields. |
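The logit softcapping listed for `engine_model.cpp` is commonly defined as a scaled `tanh`. The sketch below illustrates that general formula in Python. The function name `softcap` and the cap value used in the example are assumptions, and the engine's actual C++ code is not shown in this PR page.

```python
import math


def softcap(logits, cap):
    """Smoothly squash logits into (-cap, cap) using cap * tanh(x / cap).

    Hypothetical helper for illustration. A non-positive cap disables capping.
    """
    if cap <= 0:
        return list(logits)
    return [cap * math.tanh(x / cap) for x in logits]


if __name__ == "__main__":
    # Illustrative cap value; a real model would read it from its config.
    print(softcap([-1000.0, -1.0, 0.0, 1.0, 1000.0], 30.0))
```

Small logits pass through almost unchanged, while very large ones saturate near the cap. Since `tanh` is monotonic, the ranking of tokens is preserved.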

HenryNdubuaku · 2026-03-06T05:52:40Z

@ncylich we need to resolve the conflicts. Also, you need to add Gemma 3N to the README.

ncylich added 3 commits March 3, 2026 23:12

weight saving

aae48d6

Signed-off-by: Noah Cylich <[email protected]>

WORKING!

9773bcd

Signed-off-by: Noah Cylich <[email protected]>

making embeddings int8 for int4

b93098c

Signed-off-by: Noah Cylich <[email protected]>

Copilot AI review requested due to automatic review settings March 4, 2026 18:52

Copilot started reviewing on behalf of ncylich March 4, 2026 18:53

no rsqrt2 hard code float

fa55bd3

Signed-off-by: Noah Cylich <[email protected]>

Copilot AI reviewed Mar 4, 2026

ncylich added 9 commits March 4, 2026 12:29

sparse 16-group matmul

941c2f8

Signed-off-by: Noah Cylich <[email protected]>

group size 32 kernel

3416df3

Signed-off-by: Noah Cylich <[email protected]>

Revert "group size 32 kernel"

0eb2d5e

This reverts commit 3416df3. Signed-off-by: Noah Cylich <[email protected]>

Revert "sparse 16-group matmul"

8e4ee10

This reverts commit 941c2f8. Signed-off-by: Noah Cylich <[email protected]>

clearer/safer config params

97f56cc

Signed-off-by: Noah Cylich <[email protected]>

small inaccuracy fixes and prefill optimization (40% skipped)

4ac7434

Signed-off-by: Noah Cylich <[email protected]>

testing helpers

2edf384

Signed-off-by: Noah Cylich <[email protected]>

Revert "testing helpers"

8d64e78

This reverts commit 2edf384. Signed-off-by: Noah Cylich <[email protected]>

finishing touches

ce18729

Signed-off-by: Noah Cylich <[email protected]>

ncylich and others added 10 commits March 6, 2026 02:29

Merge branch 'main' into 3n

8e7701b

Signed-off-by: Noah Cylich <[email protected]>

adding gemma3n to readme

4caa43c

Signed-off-by: Noah Cylich <[email protected]>

new approximation for exponent on (0,1)

a6b9649

Signed-off-by: Karen Mosoyan <[email protected]>

DEBUGGED AND WORKING GEMMA 3N

476638f

Signed-off-by: Noah Cylich <[email protected]>

Removing all debugging files and debugging code from gemma 3n.

08362f4

Signed-off-by: Noah Cylich <[email protected]>

mode adaptive and safe gaussian topk filtering

7ea523f

Signed-off-by: Noah Cylich <[email protected]>

Merge branch 'main' into 3n

bc12115

Signed-off-by: Noah Cylich <[email protected]>

Merge branch 'main' into 3n

95a10c5

fused altup

891261a

Signed-off-by: HenryNdubuaku <[email protected]>

Further optimisations

3fb7809

Signed-off-by: HenryNdubuaku <[email protected]>

HenryNdubuaku merged commit 81b6b3c into main Mar 7, 2026
4 of 6 checks passed

HenryNdubuaku merged 23 commits into main from 3n

3 participants