
fix: prevent Ollama provider from hanging on tool-calling requests #7723

Merged
DOsinga merged 6 commits into block:main from fresh3nough:fix/ollama-hang-7715 on Mar 26, 2026

Conversation

@fresh3nough
Contributor

Summary

Strip unsupported OpenAI-specific fields (stream_options, max_completion_tokens) from the request payload sent to Ollama's /v1/chat/completions endpoint. Convert max_completion_tokens to Ollama's native options.num_predict. Add a 30-second per-chunk streaming timeout so stalled Ollama connections fail fast instead of hanging for the full 600s timeout.

Problem

When goose sends a chat completion request to Ollama (especially with tool definitions), Ollama hangs indefinitely at max CPU. Direct curl and ollama run work fine. The root cause is that goose's Ollama provider reuses the generic OpenAI create_request, which emits fields Ollama does not support:

  • stream_options: {"include_usage": true} -- not recognized by Ollama
  • max_completion_tokens -- should be num_predict inside Ollama's options object (see the payload sketch after this list)
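
For illustration, a hedged reconstruction of the kind of body the generic path produces (model name and values are made up; only the two flagged fields matter):

```rust
// Hypothetical payload shaped like the generic OpenAI create_request output;
// the two flagged fields are what trips up Ollama.
let payload = serde_json::json!({
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "list files"}],
    "tools": [/* tool definitions */],
    "stream": true,
    "stream_options": {"include_usage": true}, // not recognized by Ollama
    "max_completion_tokens": 1024              // should become options.num_predict
});
```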

Changes

  • apply_ollama_options: Now strips stream_options and converts max_completion_tokens to options.num_predict before sending to Ollama (see the sketch after this section)
  • stream_ollama: Added a 30-second per-chunk timeout so stalled streams produce a clear error instead of hanging silently
  • Tests: Added three new tests:
    • Bug reproduction: verifies create_request produces the problematic fields
    • Fix verification: verifies apply_ollama_options removes them correctly
    • Streaming timeout: verifies stalled streams produce an error within the timeout window

Fixes #7715
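
A minimal sketch of the cleanup, assuming a serde_json::Value payload. This is not the exact goose implementation, and it also folds in the max_tokens handling added in a later commit on this PR:

```rust
use serde_json::{json, Value};

/// Sketch of the field cleanup described above (not goose's actual code).
fn apply_ollama_options(payload: &mut Value) {
    if let Some(obj) = payload.as_object_mut() {
        // Ollama's /v1/chat/completions does not recognize this OpenAI field.
        obj.remove("stream_options");

        // Reasoning models emit max_completion_tokens, non-reasoning models
        // emit max_tokens; Ollama wants options.num_predict for both.
        let cap = obj
            .remove("max_completion_tokens")
            .or_else(|| obj.remove("max_tokens"));
        if let Some(cap) = cap {
            obj.entry("options")
                .or_insert_with(|| json!({}))
                .as_object_mut()
                .expect("options should be a JSON object")
                .insert("num_predict".to_string(), cap);
        }
    }
}
```

Applied just before the body is serialized, this keeps the rest of the OpenAI-shaped payload intact, so the provider can continue to reuse the generic create_request.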


@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87518ea85a

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a59f743a52

Collaborator

@DOsinga left a comment

LGTM — clean fix for the Ollama hang. The per-chunk timeout correctly skips the first token (avoiding false positives on slow models), and the apply_ollama_options changes properly strip stream_options and remap max_completion_tokens to options.num_predict. Tests are meaningful and cover the actual bug.

Collaborator

@michaelneale left a comment

yep - looks good. Do you mind resolving the merge conflict and updating? Then we can get this in, @fresh3nough

@fresh3nough
Contributor Author

Not at all, I'll take care of it today.

Strip unsupported OpenAI-specific fields (stream_options,
max_completion_tokens) from the request payload sent to Ollama's
/v1/chat/completions endpoint. Convert max_completion_tokens to
Ollama's native options.num_predict. Add a 30-second per-chunk
streaming timeout so stalled Ollama connections fail fast instead
of hanging for the full 600s timeout.

Fixes: block#7715
Signed-off-by: fre <[email protected]>

Move the tokio::time::timeout from wrapping message_stream.next()
(the output of response_to_streaming_message_ollama) to wrapping
the raw FramedRead/LinesCodec line reads via a new with_line_timeout
helper.

This prevents false stalls during long tool-call generations where
response_to_streaming_message_ollama buffers XML content internally
without yielding messages, while still timing out dead connections.

Signed-off-by: fresh3nough <[email protected]>
Signed-off-by: fre <[email protected]>

Update with_line_timeout so the first SSE line is read without the
30s per-chunk timeout, letting time-to-first-token be governed by the
request timeout.

After the first line arrives, apply tokio::time::timeout to subsequent
line reads to preserve dead-connection stall detection.

Signed-off-by: fresh3nough <[email protected]>
Signed-off-by: fre <[email protected]>
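
A sketch of the shape this helper might take, assuming the async-stream crate and anyhow for errors (the signature and error handling in goose's actual with_line_timeout may differ):

```rust
use std::time::Duration;

use futures::{Stream, StreamExt};
use tokio::time::timeout;

/// Sketch of the helper described above (assumed signature, not goose's code).
/// The first line is awaited with no per-chunk deadline, so time-to-first-token
/// stays governed by the overall request timeout; every later line must arrive
/// within `per_chunk` or the stream yields a stall error.
fn with_line_timeout<S, E>(
    mut lines: S,
    per_chunk: Duration,
) -> impl Stream<Item = Result<String, anyhow::Error>>
where
    S: Stream<Item = Result<String, E>> + Unpin,
    E: Into<anyhow::Error>,
{
    async_stream::try_stream! {
        let mut first = true;
        loop {
            let next = if first {
                lines.next().await
            } else {
                match timeout(per_chunk, lines.next()).await {
                    Ok(item) => item,
                    // The raw line reads stalled: treat it as a dead connection.
                    Err(_) => Err(anyhow::anyhow!(
                        "stream stalled: no data within {per_chunk:?}"
                    ))?,
                }
            };
            match next {
                Some(line) => {
                    first = false;
                    yield line.map_err(Into::into)?;
                }
                None => break, // upstream closed cleanly
            }
        }
    }
}
```

Because the deadline wraps the raw line reads rather than the parsed message stream, long tool-call generations that keep producing SSE lines never trip it, even while response_to_streaming_message_ollama buffers XML internally without yielding messages. Under the same assumptions, the stall test could look roughly like this:

```rust
#[tokio::test]
async fn stalled_stream_errors_within_timeout() {
    use futures::StreamExt; // for .chain() and .next()

    // One real line, then a stream that never yields again.
    let stalled = futures::stream::iter(vec![Ok::<_, anyhow::Error>("data: {}".to_string())])
        .chain(futures::stream::pending());
    let mut lines = Box::pin(with_line_timeout(stalled, Duration::from_millis(50)));

    assert!(lines.next().await.unwrap().is_ok());  // first line passes through
    assert!(lines.next().await.unwrap().is_err()); // then the stall error, fast
}
```
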
@fresh3nough force-pushed the fix/ollama-hang-7715 branch from 19aec34 to d483e82 on March 25, 2026 20:44

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d483e8269f

…options

Non-reasoning models emit max_tokens while reasoning models emit
max_completion_tokens. Both are unsupported by Ollama and should be
converted to options.num_predict. Update tests to match the actual
create_request output for non-reasoning models like llama3.1.

Signed-off-by: fre <[email protected]>
Signed-off-by: fre <[email protected]>
@fresh3nough
Contributor Author

Should be good now, @DOsinga @michaelneale

@DOsinga added this pull request to the merge queue Mar 26, 2026
@github-merge-queue (bot) removed this pull request from the merge queue due to a conflict with the base branch Mar 26, 2026
@DOsinga enabled auto-merge March 26, 2026 18:31
@DOsinga added this pull request to the merge queue Mar 26, 2026
Merged via the queue into block:main with commit 7ec07f2 Mar 26, 2026
22 checks passed
michaelneale added a commit that referenced this pull request Mar 26, 2026
* main: (337 commits)
  fix: replace panics with user-friendly errors in CLI session builder (#7901)
  fix: read GOOSE_CONTEXT_LIMIT from config.yaml, not just env vars (#7900)
  fix: deliver truncation notice as separate content block (#7899)
  fix: use platform-appropriate commands in developer extension instructions (#7898)
  fix: replace any with proper SVG types in icon components (#7873)
  chore: remove debug console.log statements, stale comments, and dead code (#8142)
  feat: Gemini OAuth provider (#8129)
  chore(deps): bump picomatch from 2.3.1 to 2.3.2 in /documentation (#8123)
  feat: show installed skills in UI (#7910)
  fix(deps): gate keyring platform features behind target-specific deps (#8039)
  chore(deps): bump yaml from 2.8.2 to 2.8.3 in /evals/open-model-gym/suite (#8124)
  fix: strip message wrapper in CLI session title generation (#7996)
  fix(providers): fall back to configured models when models endpoint fetch fails (#7530)
  chore(deps): bump brace-expansion from 5.0.3 to 5.0.5 in /evals/open-model-gym/suite (#8139)
  fix: prevent Ollama provider from hanging on tool-calling requests (#7723)
  fix: VMware Tanzu Platform provider - bug fixes, streaming, UI improvements (#8126)
  feat: allow GOOSE_CLI_SHOW_THINKING to be set in config.yaml (#8097)
  fix: GitHub Copilot auth fails to open browser in Desktop app (#6957) (#8019)
  fix(ci): produce .tar.gz archives for Zed ACP registry compatibility (#8054)
  feat: add GOOSE_SHOW_FULL_OUTPUT config to disable tool output truncation (#7919)
  ...

# Conflicts:
#	crates/goose/src/providers/formats/openai.rs
hydrosquall pushed a commit to hydrosquall/goose that referenced this pull request Mar 31, 2026
fix: prevent Ollama provider from hanging on tool-calling requests (block#7723)

Signed-off-by: fre <[email protected]>
Signed-off-by: fresh3nough <[email protected]>
Signed-off-by: Douwe Osinga <[email protected]>
Co-authored-by: Douwe Osinga <[email protected]>
Signed-off-by: Cameron Yick <[email protected]>
blackgirlbytes pushed a commit that referenced this pull request Apr 2, 2026
* main: (337 commits)
Development

Successfully merging this pull request may close these issues.

bad connection with to ollama container (#7715)

3 participants