common : fix common_chat_peg_parse for incomplete utf-8 sequence tail by akreal · Pull Request #19992 · ggml-org/llama.cpp

akreal · 2026-02-28T17:36:09Z

Sometimes I experience an issue with mistralai/Ministral-3-3B-Instruct-2512-GGUF:Q8_0: when generation hits the maximum number of generated tokens, it can leave an incomplete UTF-8 sequence at the end of the message (e.g. token IDs 1226 and 1156). This message then causes common_chat_peg_parse to fail with a "Failed to parse input at pos 0" error.

Claude Sonnet 4.6 was used to analyze the problem and to generate the code. Everything was manually reviewed and tested.
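
To illustrate the end-of-message case described above, here is a minimal sketch of detecting a multi-byte UTF-8 sequence that was cut off at the end of a string. It is not the code from this PR: the helper name utf8_complete_prefix_len is hypothetical, and it only inspects the final sequence rather than validating the whole input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from llama.cpp): returns the length of `s` after
// dropping a trailing multi-byte UTF-8 sequence that was cut off mid-character.
static size_t utf8_complete_prefix_len(const std::string & s) {
    const size_t n = s.size();
    // Walk back over at most 3 continuation bytes (10xxxxxx) to find the lead byte.
    size_t i = n;
    size_t back = 0;
    while (i > 0 && back < 3 && (static_cast<unsigned char>(s[i - 1]) & 0xC0) == 0x80) {
        --i;
        ++back;
    }
    if (i == 0) {
        return n; // no lead byte found; leave the input untouched
    }
    const unsigned char lead = static_cast<unsigned char>(s[i - 1]);
    size_t expected = 1;
    if      ((lead & 0xE0) == 0xC0) expected = 2; // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) expected = 3; // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) expected = 4; // 11110xxx
    else return n; // ASCII or invalid lead byte: not an incomplete tail
    const size_t have = n - (i - 1); // bytes present, including the lead byte
    return have < expected ? i - 1 : n;
}
```

A caller could then pass s.substr(0, utf8_complete_prefix_len(s)) to the parser, so that a truncated final character is dropped while complete text is left unchanged.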

akreal · 2026-03-02T06:43:15Z

After more experiments, it turned out that an invalid UTF-8 sequence can also appear in the middle of the string, so the fix now handles both the mid-string and end-of-string cases. I am not sure, though, whether this is the correct place to do this.
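
The mid-string case can be sketched as a single pass that keeps structurally complete UTF-8 sequences and drops everything else. This is an illustration under stated assumptions, not the PR's implementation: the name utf8_drop_invalid is hypothetical, and the check is deliberately simplified (it does not reject overlong encodings or surrogate code points).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not the PR's code): copy `in`, dropping every byte that
// does not belong to a structurally complete UTF-8 sequence.
// Simplification: overlong encodings and surrogate code points are not rejected.
static std::string utf8_drop_invalid(const std::string & in) {
    std::string out;
    out.reserve(in.size());
    size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        size_t len = 0;
        if      (c < 0x80)           len = 1; // ASCII
        else if ((c & 0xE0) == 0xC0) len = 2; // 110xxxxx
        else if ((c & 0xF0) == 0xE0) len = 3; // 1110xxxx
        else if ((c & 0xF8) == 0xF0) len = 4; // 11110xxx
        // len == 0 here means a stray continuation byte or an invalid lead byte
        bool ok = len != 0 && i + len <= in.size();
        for (size_t k = 1; ok && k < len; ++k) {
            ok = (static_cast<unsigned char>(in[i + k]) & 0xC0) == 0x80;
        }
        if (ok) {
            out.append(in, i, len);
            i += len;
        } else {
            ++i; // skip one bad byte and resynchronize on the next
        }
    }
    return out;
}
```

Skipping a single byte on failure, rather than the whole expected length, means a valid character that directly follows a broken lead byte is still preserved. Whether to drop invalid bytes or substitute U+FFFD is a design choice this sketch does not settle.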

akreal requested a review from ggerganov as a code owner February 28, 2026 17:36

github-actions bot added the "testing" label (Everything test related) Feb 28, 2026

Remove invalid UTF-8 parts in common_chat_peg_parse

1f03175

akreal force-pushed the strip_incomplete_utf8_tail branch from 2203f50 to 1f03175 March 2, 2026 06:39

aldehir mentioned this pull request Mar 7, 2026

common : gracefully handle incomplete output #20191

Merged

akreal wants to merge 1 commit into ggml-org:master from akreal:strip_incomplete_utf8_tail
