
fix: provider test infrastructure#7738

Merged
codefromthecrypt merged 1 commit into main from acole/test-infra-fixes on Mar 9, 2026

Conversation

@codefromthecrypt
Collaborator

Summary

Fix Ollama, scheduler, and LiteLLM test failures and make it easy to add new providers like ACP.

  • Introduce ProviderTestConfig with with_llm_provider and with_agentic_provider for simpler, isolated setup.
  • Isolate each provider sub-test with its own Agent and session, removing extension re-registration and session ID complexity.
  • Make Ollama pass: the Ollama VL model is unreliable for tool calls, so split it into qwen3/qwen3-vl.
  • Cleanups: scheduler env leakage into the keyring; LiteLLM hardcoded host overwrite.

Type of Change

  • Bug fix
  • Refactor / Code quality
  • Tests

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

$ OLLAMA_HOST=http://localhost:11434 cargo test --test providers -- --nocapture

--snip--

test result: ok. 16 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 201.63s

============== Providers ==============
⏭️ Anthropic
⏭️ Azure
⏭️ Databricks
⏭️ Google
⏭️ LiteLLM
✅ Ollama
⏭️ OpenRouter
⏭️ SageMakerTgi
⏭️ Snowflake
⏭️ Xai
⏭️ aws_bedrock
✅ claude-code
✅ codex
⏭️ openai
=======================================


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4694a4397d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


@codefromthecrypt (Collaborator, Author) left a comment


impl notes for the curious


#[tokio::test]
async fn test_job_runs_on_schedule() {
let _guard = env_lock::lock_env([


This is the pattern we use to make sure the local goose config isn't read (e.g. my provider was set to tetrate, and this test ran trying to access its creds).

model_switch_name: Option<String>,
mcp_extension: ExtensionConfig,
impl ProviderTestConfig {
fn with_llm_provider(


LLM providers are all remote APIs, and they don't internalize tool calls the way agents do. This is how providers typically work in goose.

self
}

fn with_agentic_provider(name: &'static str, model_name: &'static str, binary: &str) -> Self {


Agentic providers are currently all binary-based (both CLI and ACP), even if they can also read env vars, so the guard for them is at least about the binary.

  • Right now we don't have the means to propagate session IDs to their backend LLM or MCP APIs, though it's possible one might support this via the corresponding specs.
  • Right now they aren't semantically compatible with smart approve, though this might change in the future.
  • We didn't run content-length tests on these in the past because they accept huge contexts, which results in extreme delays.
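A binary-based guard of this kind can be sketched as a PATH lookup. This is a hedged illustration, not goose's actual code (the function name is invented), and it assumes a POSIX-style PATH:

```rust
use std::env;

// Hypothetical sketch: an agentic provider's tests are gated on its CLI
// binary being present on PATH, since all current agentic providers are
// binary-based. Assumes a POSIX-style PATH (no `.exe` suffix handling).
fn binary_on_path(binary: &str) -> bool {
    env::var_os("PATH")
        .map(|paths| env::split_paths(&paths).any(|dir| dir.join(binary).is_file()))
        .unwrap_or(false)
}

fn main() {
    // `sh` should exist on any POSIX system; a made-up name should not.
    println!("sh on PATH: {}", binary_on_path("sh"));
    println!("bogus on PATH: {}", binary_on_path("definitely-not-a-real-binary"));
}
```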

}

impl ProviderFixture {
async fn setup(config: &ProviderTestConfig, mode: GooseMode) -> Result<Self> {


Before, it was really complicated in the tests themselves because they switched from the default auto mode to a specific mode. This isolates each test to its own provider instance, so it doesn't need to switch modes. If we later want a runtime mode-switch test, we could add one, but that's a different feature from testing a specific mode's behavior.

println!("===================");

let name_lower = self.name.to_lowercase();
if name_lower == "ollama" || name_lower == "openrouter" {


You can see this stuff has been high maintenance, hence moving to a feature-based test system instead of remembering to update names in various places.
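The feature-based idea can be sketched like this. The struct and flag names below are invented for illustration, not goose's actual config: the point is that tests branch on a capability declared once, rather than on string-matching provider names in several places:

```rust
// Hypothetical sketch: declare capabilities once on the test config
// instead of matching provider names like `if name_lower == "ollama"`.
struct TestCapabilities {
    name: &'static str,
    runs_content_length_tests: bool,
}

impl TestCapabilities {
    fn new(name: &'static str) -> Self {
        Self { name, runs_content_length_tests: true }
    }
    // Builder-style opt-out, set where the provider is configured.
    fn without_content_length_tests(mut self) -> Self {
        self.runs_content_length_tests = false;
        self
    }
}

fn main() {
    let caps = TestCapabilities::new("Ollama").without_content_length_tests();
    // Tests consult the flag rather than the provider name.
    if !caps.runs_content_length_tests {
        println!("{}: skipping content-length tests", caps.name);
    }
}
```

Adding a new provider then means declaring its capabilities in one place, with no scattered name lists to keep in sync.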

Permission::AllowOnce,
true,
"Use the get_code tool and output only its result.",
&format!("Write the word 'hello' to {}", test_file.path().display()),


Changed this because we add a read-only annotation to the MCP get_code tool, which means some agents don't route it through the permission flow, as they permit read-only tools without asking. All agents I've used can write files, so regardless of LLM (developer extension) or agent (claude etc.), we know this write to a new path will require a permission guard.

false,
)
.await
ProviderTestConfig::with_llm_provider("Ollama", "qwen3", &["OLLAMA_HOST"])


Split the image model from the default one because qwen3-vl is really bad at tool execution and fails routinely.

.await
ProviderTestConfig::with_llm_provider("Ollama", "qwen3", &["OLLAMA_HOST"])
.image_model("qwen3-vl")
// Above qwen3's 40960 context_length but small enough for Ollama's 600s timeout


This was really not intuitive to hunt down, but yeah, we were not checking anything except for the ContentLengthExceeded failure. The prior config actually failed too, just with a different error, from overloading the Ollama process!

true,
)
.await
ProviderTestConfig::with_agentic_provider("claude-code", CLAUDE_CODE_DEFAULT_MODEL, "claude")


Much less cruft this way, which is needed as there are many untested agentic providers.

@codefromthecrypt force-pushed the acole/test-infra-fixes branch from 4694a43 to ec385f2 on March 9, 2026 02:26

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec385f2f21


Fix Ollama, scheduler, and LiteLLM test failures and make it easy to
add new providers like ACP.

- Isolate each provider sub-test with its own Agent and session, removing
  extension re-registration and session ID complexity
- Ollama VL model unreliable for tool calls; split into qwen3/qwen3-vl
- Scheduler env leakage into keyring; LiteLLM hardcoded host overwrite

Signed-off-by: Adrian Cole <[email protected]>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a31fdd23d


@codefromthecrypt codefromthecrypt added this pull request to the merge queue Mar 9, 2026
Merged via the queue into main with commit 1f77839 Mar 9, 2026
21 checks passed
@codefromthecrypt codefromthecrypt deleted the acole/test-infra-fixes branch March 9, 2026 10:37
jh-block added a commit that referenced this pull request Mar 9, 2026
…deps

* origin/main: (34 commits)
  fix: reduce server log verbosity — skip session in instrument, defaul… (#7729)
  fix: provider test infrastructure (#7738)
  fix: sanitize streamable HTTP extension names derived from URLs (#7740)
  refactor: derive GooseMode string conversions with strum (#7706)
  docs: Add Spraay Batch Payments MCP Extension Tutorial (#7525)
  fix: flake.nix (#7224)
  delete goose web (#7696)
  Add @angiejones as CODEOWNER for documentation (#7711)
  Add MLflow integration guide (#7563)
  docs: LM Studio availability (#7698)
  feat: add Avian as an LLM provider (#7561)
  Adds `linux-mcp-server` to the goose registry (#6979)
  fix: add #[serde(default)] to description field on 4 ExtensionConfig variants (#7708)
  feat: combine TUI UX from alexhancock/tui-goodness with publishing config from jackamadeo/package-tui (#7683)
  chore: cleanup old sandbox (#7700)
  Correct windows artifact (#7699)
  gh fall back (#7695)
  fix: restore smart-approve mode (#7690)
  fix: make TLS configurable in goosed agent via GOOSE_TLS env var (#7686)
  Update to rmcp 1.1.0 (#7619)
  ...

# Conflicts:
#	Cargo.lock
wpfleger96 added a commit that referenced this pull request Mar 9, 2026
* origin/main: (21 commits)
  fix(ui-desktop): unify path resolution around GOOSE_PATH_ROOT (#7335)
  fix: pass OAuth scopes to DCR and extract granted_scopes from token response (#7571)
  fix: write to real file if config.yaml is symlink (#7669)
  fix: preserve pairings when stopping gateway (#7733)
  fix: reduce server log verbosity — skip session in instrument, defaul… (#7729)
  fix: provider test infrastructure (#7738)
  fix: sanitize streamable HTTP extension names derived from URLs (#7740)
  refactor: derive GooseMode string conversions with strum (#7706)
  docs: Add Spraay Batch Payments MCP Extension Tutorial (#7525)
  fix: flake.nix (#7224)
  delete goose web (#7696)
  Add @angiejones as CODEOWNER for documentation (#7711)
  Add MLflow integration guide (#7563)
  docs: LM Studio availability (#7698)
  feat: add Avian as an LLM provider (#7561)
  Adds `linux-mcp-server` to the goose registry (#6979)
  fix: add #[serde(default)] to description field on 4 ExtensionConfig variants (#7708)
  feat: combine TUI UX from alexhancock/tui-goodness with publishing config from jackamadeo/package-tui (#7683)
  chore: cleanup old sandbox (#7700)
  Correct windows artifact (#7699)
  ...
wpfleger96 added a commit that referenced this pull request Mar 9, 2026
…e-issue

* origin/main:
  fix(ui-desktop): unify path resolution around GOOSE_PATH_ROOT (#7335)
  fix: pass OAuth scopes to DCR and extract granted_scopes from token response (#7571)
  fix: write to real file if config.yaml is symlink (#7669)
  fix: preserve pairings when stopping gateway (#7733)
  fix: reduce server log verbosity — skip session in instrument, defaul… (#7729)
  fix: provider test infrastructure (#7738)
  fix: sanitize streamable HTTP extension names derived from URLs (#7740)
  refactor: derive GooseMode string conversions with strum (#7706)
  docs: Add Spraay Batch Payments MCP Extension Tutorial (#7525)
  fix: flake.nix (#7224)
  delete goose web (#7696)
  Add @angiejones as CODEOWNER for documentation (#7711)
  Add MLflow integration guide (#7563)
  docs: LM Studio availability (#7698)
  feat: add Avian as an LLM provider (#7561)
  Adds `linux-mcp-server` to the goose registry (#6979)
  fix: add #[serde(default)] to description field on 4 ExtensionConfig variants (#7708)