fix: prevent Ollama provider from hanging on tool-calling requests #7723
DOsinga merged 6 commits into block:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 87518ea85a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
💡 Codex Review
Reviewed commit: a59f743a52
DOsinga
left a comment
LGTM — clean fix for the Ollama hang. The per-chunk timeout correctly skips the first token (avoiding false positives on slow models), and the apply_ollama_options changes properly strip stream_options and remap max_completion_tokens → options.num_predict. Tests are meaningful and cover the actual bug.
michaelneale
left a comment
yep - looks good. Do you mind resolving the merge conflicts and updating? Then we can get this in @fresh3nough
Not at all, I'll take care of it today.
Strip unsupported OpenAI-specific fields (stream_options, max_completion_tokens) from the request payload sent to Ollama's /v1/chat/completions endpoint. Convert max_completion_tokens to Ollama's native options.num_predict. Add a 30-second per-chunk streaming timeout so stalled Ollama connections fail fast instead of hanging for the full 600s timeout. Fixes: block#7715 Signed-off-by: fre <[email protected]>
Move the tokio::time::timeout from wrapping message_stream.next() (the output of response_to_streaming_message_ollama) to wrapping the raw FramedRead/LinesCodec line reads via a new with_line_timeout helper. This prevents false stalls during long tool-call generations where response_to_streaming_message_ollama buffers XML content internally without yielding messages, while still timing out dead connections. Signed-off-by: fresh3nough <[email protected]> Signed-off-by: fre <[email protected]>
Update with_line_timeout so the first SSE line is read without the 30s per-chunk timeout, letting time-to-first-token be governed by the request timeout. After the first line arrives, apply tokio::time::timeout to subsequent line reads to preserve dead-connection stall detection. Signed-off-by: fresh3nough <[email protected]> Signed-off-by: fre <[email protected]>
Force-pushed 19aec34 to d483e82
💡 Codex Review
Reviewed commit: d483e8269f
…options Non-reasoning models emit max_tokens while reasoning models emit max_completion_tokens. Both are unsupported by Ollama and should be converted to options.num_predict. Update tests to match the actual create_request output for non-reasoning models like llama3.1. Signed-off-by: fre <[email protected]>
Signed-off-by: fre <[email protected]>
should be good now @DOsinga @michaelneale
Signed-off-by: Douwe Osinga <[email protected]>
* main: (337 commits)
  fix: replace panics with user-friendly errors in CLI session builder (#7901)
  fix: read GOOSE_CONTEXT_LIMIT from config.yaml, not just env vars (#7900)
  fix: deliver truncation notice as separate content block (#7899)
  fix: use platform-appropriate commands in developer extension instructions (#7898)
  fix: replace any with proper SVG types in icon components (#7873)
  chore: remove debug console.log statements, stale comments, and dead code (#8142)
  feat: Gemini OAuth provider (#8129)
  chore(deps): bump picomatch from 2.3.1 to 2.3.2 in /documentation (#8123)
  feat: show installed skills in UI (#7910)
  fix(deps): gate keyring platform features behind target-specific deps (#8039)
  chore(deps): bump yaml from 2.8.2 to 2.8.3 in /evals/open-model-gym/suite (#8124)
  fix: strip message wrapper in CLI session title generation (#7996)
  fix(providers): fall back to configured models when models endpoint fetch fails (#7530)
  chore(deps): bump brace-expansion from 5.0.3 to 5.0.5 in /evals/open-model-gym/suite (#8139)
  fix: prevent Ollama provider from hanging on tool-calling requests (#7723)
  fix: VMware Tanzu Platform provider - bug fixes, streaming, UI improvements (#8126)
  feat: allow GOOSE_CLI_SHOW_THINKING to be set in config.yaml (#8097)
  fix: GitHub Copilot auth fails to open browser in Desktop app (#6957) (#8019)
  fix(ci): produce .tar.gz archives for Zed ACP registry compatibility (#8054)
  feat: add GOOSE_SHOW_FULL_OUTPUT config to disable tool output truncation (#7919)
  ...

# Conflicts:
#	crates/goose/src/providers/formats/openai.rs
…lock#7723) Signed-off-by: fre <[email protected]> Signed-off-by: fresh3nough <[email protected]> Signed-off-by: Douwe Osinga <[email protected]> Co-authored-by: Douwe Osinga <[email protected]> Signed-off-by: Cameron Yick <[email protected]>
Summary

Strip unsupported OpenAI-specific fields (`stream_options`, `max_completion_tokens`) from the request payload sent to Ollama's `/v1/chat/completions` endpoint. Convert `max_completion_tokens` to Ollama's native `options.num_predict`. Add a 30-second per-chunk streaming timeout so stalled Ollama connections fail fast instead of hanging for the full 600s timeout.

Problem

When goose sends a chat completion request to Ollama (especially with tool definitions), Ollama hangs indefinitely at max CPU. Direct `curl` and `ollama run` work fine. The root cause is that goose's Ollama provider reuses the generic OpenAI `create_request`, which emits fields Ollama does not support:
- `stream_options: {"include_usage": true}` -- not recognized by Ollama
- `max_completion_tokens` -- should be `num_predict` inside Ollama's `options` object

Changes
- `apply_ollama_options`: now strips `stream_options` and converts `max_completion_tokens` to `options.num_predict` before sending to Ollama
- `stream_ollama`: added a 30-second per-chunk timeout so stalled streams produce a clear error instead of hanging silently
- Tests: verify that `create_request` produces the problematic fields and that `apply_ollama_options` removes them correctly

Fixes #7715