
fix: make FunctionGemma prompt formatting strict #502

Merged
HenryNdubuaku merged 1 commit into cactus-compute:main from lennartvoelz:fix/issue#501
Mar 7, 2026
Conversation

@lennartvoelz
Contributor

  • Remove hardcoded “When you decide to call…” guidance and arg example from developer turn
  • Fix FunctionGemma trigger string (no trailing period) and append tools declarations directly
  • Wrap tool responses in a developer turn and allow stacking multiple tool responses
  • Stop wrapping tool outputs in value:; pass through {...} payload as-is
  • Close pending tool-response developer turn before next user/model turn to avoid malformed prompts

Follows the official docs
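A minimal sketch of the prompt-assembly rules the bullets above describe: tool declarations appended directly after a trigger string with no trailing period, tool responses stacked inside a single developer turn with their `{...}` payloads passed through as-is, and any pending tool-response turn closed before the next user/model turn. The turn markers and the trigger wording here are illustrative assumptions, not the exact tokens from the FunctionGemma template or this PR's implementation.

```python
import json

# Assumed trigger string; note: no trailing period.
TRIGGER = "You have access to the following functions"


def build_prompt(messages, tools):
    parts = []
    pending_tool_turn = False  # True while a tool-response developer turn is open

    # Tool declarations are appended directly after the trigger string.
    decls = "\n".join(json.dumps(t) for t in tools)
    parts.append(f"<start_of_turn>developer\n{TRIGGER}\n{decls}<end_of_turn>\n")

    for msg in messages:
        if msg["role"] == "tool":
            # Stack multiple tool responses inside one developer turn,
            # passing each {...} payload through as-is (no `value:` wrapper).
            if not pending_tool_turn:
                parts.append("<start_of_turn>developer\n")
                pending_tool_turn = True
            parts.append(msg["content"] + "\n")
        else:
            # Close a pending tool-response developer turn before the next
            # user/model turn so the prompt stays well-formed.
            if pending_tool_turn:
                parts.append("<end_of_turn>\n")
                pending_tool_turn = False
            role = "model" if msg["role"] == "assistant" else "user"
            parts.append(f"<start_of_turn>{role}\n{msg['content']}<end_of_turn>\n")

    if pending_tool_turn:
        parts.append("<end_of_turn>\n")
    return "".join(parts)
```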

…mple, fix tool response wrapping)

Signed-off-by: Lennart <[email protected]>
@HenryNdubuaku
Collaborator

@lennartvoelz thanks for this! One thing: this fails more tool-call tests than the old setup, any insights as to why?

@lennartvoelz
Contributor Author

lennartvoelz commented Mar 6, 2026

What did you test it against? @HenryNdubuaku

@lennartvoelz
Contributor Author

lennartvoelz commented Mar 7, 2026

@HenryNdubuaku

For the standard weights, I would say it is just noise, but there are a few things off with the FunctionGemma setup right now. I'll push some changes later that make the model much more stable. As mentioned, I tested against the Hackathon dataset. With my changes, the effect is especially visible for fine-tuned models.
Fine-tuned & new setup: [screenshot 2026-03-07 14:13:23]
Fine-tuned & old setup: [screenshot 2026-03-07 14:13:47]
Base weights & new setup: [screenshot 2026-03-07 14:14:27]
Base weights & old setup: [screenshot 2026-03-07 14:14:02]

With the changes in the Gemma implementation, the effect of the extra tokens on latency is much smaller. However, I believe it is better practice to keep it standard (which also works better in this case!).
I am still working on reducing the execution time while maintaining the much better sampling quality.

@HenryNdubuaku
Collaborator

thanks @lennartvoelz I will merge this now, thanks for testing

@HenryNdubuaku HenryNdubuaku merged commit f8b714c into cactus-compute:main Mar 7, 2026
5 of 6 checks passed
