Skip to content

fix: apply logit bias before greedy sampling#507

Merged
HenryNdubuaku merged 4 commits intomainfrom
bias-greedy-sampling
Mar 9, 2026
Merged

fix: apply logit bias before greedy sampling#507
HenryNdubuaku merged 4 commits intomainfrom
bias-greedy-sampling

Conversation

@ncylich
Copy link
Copy Markdown
Collaborator

@ncylich ncylich commented Mar 7, 2026

Summary

  • Greedy sampling (temperature=0) previously skipped logit bias application, returning the argmax of raw logits instead of biased logits
  • Moved bias application before the greedy early-return path in both cactus_sample_f32 and cactus_sample_f16
  • Unified both functions to follow the same code pattern: memcpy → bias → greedy check → temperature scaling → sampling
  • Replaced manual argmax loops with std::max_element and manual copy loops with std::memcpy

Copilot AI review requested due to automatic review settings March 7, 2026 22:30
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes sampling correctness by ensuring logit bias is applied before the greedy (temperature=0) early-return path, so greedy decoding selects the argmax of biased logits rather than raw logits.

Changes:

  • Apply logit bias before the greedy early-return in cactus_sample_f32 and cactus_sample_f16.
  • Replace manual copy loops with std::memcpy and manual argmax loops with std::max_element.
  • In FP16 sampling, ensure temperature scaling operates on filtered_logits (so bias is preserved through scaling).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

ncylich added 2 commits March 7, 2026 14:36
Moves the empty-vocab guard to the top of cactus_sample_f32 and
cactus_sample_f16 so it covers all paths (not just greedy), preventing
UB from std::max_element on empty vectors.


Signed-off-by: Noah Cylich <[email protected]>
Signed-off-by: Noah Cylich <[email protected]>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

assert(n <= 8) is compiled out in release builds, leaving the fixed-size
float c[8] buffer unprotected. Replace with an early return.


Signed-off-by: Noah Cylich <[email protected]>
@ncylich ncylich force-pushed the bias-greedy-sampling branch 2 times, most recently from ccb37f5 to 4fab757 Compare March 7, 2026 23:33
@HenryNdubuaku HenryNdubuaku merged commit 55abbff into main Mar 9, 2026
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants