Commit bcf14fd
tools/main: llama-cli: prevent spurious assistant token (ggml-org#13402)
During prompt ingestion, prompt tokens are accepted into the sampler history (for repetition penalties). The conversation-mode path then appended `common_sampler_last(smpl)` to `assistant_ss` before any new token was sampled. At that point, "last" was a prompt-side token (e.g., an input prefix), so the assistant chat message began with an extra piece.
Fix: append to `assistant_ss` only when a token has just been sampled and is not EOG. This affects only chat-message assembly (`assistant_ss` / `chat_msgs` / `common_chat_format_single`); terminal stdout and the sampling order/logits are unchanged.
Fixes ggml-org#13402.
Signed-off-by: Vinkal Chudgar <[email protected]>

1 parent 138c87c
1 file changed: +4 −3 lines

[diff body not recoverable from the page extraction; only the line numbering survives: 4 lines inserted after original line 708 (new lines 709–712), and original lines 829–831 removed]