Attempt to recover from parsing failure on truncated input by returning last partial parse #20204

Closed
pwilkin wants to merge 1 commit into ggml-org:master from pwilkin:recover-parse

pwilkin (Member) commented Mar 7, 2026

When model output ends abruptly, the parser may be left with incomplete input, for example a reasoning block that was never closed. In that case, try to recover by returning the last partial parse instead of throwing an error.

Fixes #20193
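
To illustrate the idea in the description, here is a minimal Python sketch of the recovery pattern. It is not the code from this PR: the `<think>` delimiters, `ParsedMessage`, `ParseError`, `parse_strict`, and `parse_with_recovery` are all hypothetical names chosen for the example. The strict parser attaches whatever it managed to parse to the error it raises, and the wrapper returns that partial result instead of propagating the failure.

```python
from dataclasses import dataclass

# Hypothetical reasoning delimiters, chosen only for this sketch.
OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"


@dataclass
class ParsedMessage:
    reasoning: str = ""
    content: str = ""
    is_partial: bool = False  # True when the input ended before parsing finished


class ParseError(Exception):
    """Parsing failure that carries the last partial parse."""

    def __init__(self, pos: int, partial: ParsedMessage):
        super().__init__(f"Failed to parse input at pos {pos}")
        self.pos = pos
        self.partial = partial


def parse_strict(text: str) -> ParsedMessage:
    """Parse an optional reasoning block followed by content.

    Raises ParseError if a reasoning block is opened but never closed.
    """
    if not text.startswith(OPEN_TAG):
        return ParsedMessage(content=text)
    body = text[len(OPEN_TAG):]
    end = body.find(CLOSE_TAG)
    if end == -1:
        # Truncated input: remember what was parsed so far before failing.
        partial = ParsedMessage(reasoning=body, is_partial=True)
        raise ParseError(pos=len(text), partial=partial)
    return ParsedMessage(reasoning=body[:end], content=body[end + len(CLOSE_TAG):])


def parse_with_recovery(text: str) -> ParsedMessage:
    """Return the full parse, or the last partial parse on truncated input."""
    try:
        return parse_strict(text)
    except ParseError as err:
        return err.partial
```

With this shape, a caller can check `is_partial` to tell a truncated response from a complete one, rather than receiving an error for output that was cut off mid-reasoning.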

pwilkin requested review from ggerganov and ngxson as code owners March 7, 2026 18:22

aldehir (Contributor) commented Mar 7, 2026

I think we can just produce an incomplete AST, as we do during streaming. I'll add it to my PR.

Development

Successfully merging this pull request may close these issues.

Misc. bug: llama-server responds with error code 500 and "Failed to parse input at pos ..." message when max_tokens is reached
