
fix: provider test infrastructure#7738

Merged
codefromthecrypt merged 1 commit into main from acole/test-infra-fixes on Mar 9, 2026

Conversation

@codefromthecrypt
Collaborator

Summary

Fix Ollama, scheduler, and LiteLLM test failures and make it easy to add new providers like ACP.

  • Introduce ProviderTestConfig with with_llm_provider and with_agentic_provider for simpler, isolated setup.
  • Isolate each provider sub-test with its own Agent and session, removing extension re-registration and session ID complexity.
  • Make Ollama pass: the Ollama VL model is unreliable for tool calls, so split it into qwen3/qwen3-vl.
  • Cleanups: scheduler env leakage into the keyring; LiteLLM hardcoded host overwrite.

Type of Change

  • Bug fix
  • Refactor / Code quality
  • Tests

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

$ OLLAMA_HOST=http://localhost:11434 cargo test --test providers -- --nocapture

--snip--

test result: ok. 16 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 201.63s

============== Providers ==============
⏭️ Anthropic
⏭️ Azure
⏭️ Databricks
⏭️ Google
⏭️ LiteLLM
✅ Ollama
⏭️ OpenRouter
⏭️ SageMakerTgi
⏭️ Snowflake
⏭️ Xai
⏭️ aws_bedrock
✅ claude-code
✅ codex
⏭️ openai
=======================================


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4694a4397d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


@codefromthecrypt (Collaborator, Author) left a comment


impl notes for the curious


#[tokio::test]
async fn test_job_runs_on_schedule() {
let _guard = env_lock::lock_env([


This is the pattern we use to make sure the local goose config isn't read (e.g. my provider was set to tetrate, and this test ran trying to access its creds).

model_switch_name: Option<String>,
mcp_extension: ExtensionConfig,
impl ProviderTestConfig {
fn with_llm_provider(


LLM providers are all remote APIs, and they don't internalize tool calls the way agents do. This is how providers typically work in goose.

self
}

fn with_agentic_provider(name: &'static str, model_name: &'static str, binary: &str) -> Self {


Agentic providers are currently all binary-based (both CLI and ACP), even if they can also read env vars, so the guard for them is at least about the binary.

  • Right now we don't have the means to propagate session IDs to their backend LLM or MCP APIs, though it's possible one might support this via the corresponding specs.
  • Right now they aren't semantically compatible with smart approve, though this might change in the future.
  • We didn't run content-length tests on these in the past because they accept huge contexts, which results in extreme delays.
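A binary-based guard of this kind can be sketched as a PATH lookup. This is a hedged illustration, not goose's actual code (the function name is invented), and it assumes a POSIX-style PATH:

```rust
use std::env;

// Hypothetical sketch: an agentic provider's tests are gated on its CLI
// binary being present on PATH, since all current agentic providers are
// binary-based. Assumes a POSIX-style PATH (no `.exe` suffix handling).
fn binary_on_path(binary: &str) -> bool {
    env::var_os("PATH")
        .map(|paths| env::split_paths(&paths).any(|dir| dir.join(binary).is_file()))
        .unwrap_or(false)
}

fn main() {
    // `sh` should exist on any POSIX system; a made-up name should not.
    println!("sh on PATH: {}", binary_on_path("sh"));
    println!("bogus on PATH: {}", binary_on_path("definitely-not-a-real-binary"));
}
```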

}

impl ProviderFixture {
async fn setup(config: &ProviderTestConfig, mode: GooseMode) -> Result<Self> {


Before, it was really complicated in the tests themselves because they switched from the default auto mode to a specific mode. This isolates each test to its own provider instance, so it doesn't need to switch modes. If we later want a runtime mode-switch test, we could add one, but that's a different feature from testing a specific mode's behavior.

println!("===================");

let name_lower = self.name.to_lowercase();
if name_lower == "ollama" || name_lower == "openrouter" {


You can see this stuff has been high maintenance, hence moving to a feature-based test system instead of remembering to update names in various places.
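The feature-based idea can be sketched like this. The struct and flag names below are invented for illustration, not goose's actual config: the point is that tests branch on a capability declared once, rather than on string-matching provider names in several places:

```rust
// Hypothetical sketch: declare capabilities once on the test config
// instead of matching provider names like `if name_lower == "ollama"`.
struct TestCapabilities {
    name: &'static str,
    runs_content_length_tests: bool,
}

impl TestCapabilities {
    fn new(name: &'static str) -> Self {
        Self { name, runs_content_length_tests: true }
    }
    // Builder-style opt-out, set where the provider is configured.
    fn without_content_length_tests(mut self) -> Self {
        self.runs_content_length_tests = false;
        self
    }
}

fn main() {
    let caps = TestCapabilities::new("Ollama").without_content_length_tests();
    // Tests consult the flag rather than the provider name.
    if !caps.runs_content_length_tests {
        println!("{}: skipping content-length tests", caps.name);
    }
}
```

Adding a new provider then means declaring its capabilities in one place, with no scattered name lists to keep in sync.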

Permission::AllowOnce,
true,
"Use the get_code tool and output only its result.",
&format!("Write the word 'hello' to {}", test_file.path().display()),


Changed this because we add a read-only annotation to the MCP get_code tool, which means some agents don't route it through the permission flow, as they permit read-only tools without asking. All agents I've used can write files, so regardless of LLM (developer extension) or agent (claude etc.), we know this write to a new path will require a permission guard.

false,
)
.await
ProviderTestConfig::with_llm_provider("Ollama", "qwen3", &["OLLAMA_HOST"])


Split the image model from the default one because qwen3-vl is really bad at tool execution and fails routinely.

.await
ProviderTestConfig::with_llm_provider("Ollama", "qwen3", &["OLLAMA_HOST"])
.image_model("qwen3-vl")
// Above qwen3's 40960 context_length but small enough for Ollama's 600s timeout


This was really not intuitive to hunt down, but yeah, we were not checking anything except for the ContentLengthExceeded failure. The prior config actually failed too, just with a different error, from overloading the Ollama process!

true,
)
.await
ProviderTestConfig::with_agentic_provider("claude-code", CLAUDE_CODE_DEFAULT_MODEL, "claude")


Much less cruft this way, which is needed as there are many untested agentic providers.

@codefromthecrypt force-pushed the acole/test-infra-fixes branch from 4694a43 to ec385f2 on March 9, 2026 02:26

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec385f2f21


Fix Ollama, scheduler, and LiteLLM test failures and make it easy to
add new providers like ACP.

- Isolate each provider sub-test with its own Agent and session, removing
  extension re-registration and session ID complexity
- Ollama VL model unreliable for tool calls; split into qwen3/qwen3-vl
- Scheduler env leakage into keyring; LiteLLM hardcoded host overwrite

Signed-off-by: Adrian Cole <[email protected]>

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0a31fdd23d


@codefromthecrypt codefromthecrypt added this pull request to the merge queue Mar 9, 2026
Merged via the queue into main with commit 1f77839 Mar 9, 2026
21 checks passed
@codefromthecrypt codefromthecrypt deleted the acole/test-infra-fixes branch March 9, 2026 10:37
jh-block added a commit that referenced this pull request Mar 9, 2026
…deps

* origin/main: (34 commits)
  fix: reduce server log verbosity — skip session in instrument, defaul… (#7729)
  fix: provider test infrastructure (#7738)
  fix: sanitize streamable HTTP extension names derived from URLs (#7740)
  refactor: derive GooseMode string conversions with strum (#7706)
  docs: Add Spraay Batch Payments MCP Extension Tutorial (#7525)
  fix: flake.nix (#7224)
  delete goose web (#7696)
  Add @angiejones as CODEOWNER for documentation (#7711)
  Add MLflow integration guide (#7563)
  docs: LM Studio availability (#7698)
  feat: add Avian as an LLM provider (#7561)
  Adds `linux-mcp-server` to the goose registry (#6979)
  fix: add #[serde(default)] to description field on 4 ExtensionConfig variants (#7708)
  feat: combine TUI UX from alexhancock/tui-goodness with publishing config from jackamadeo/package-tui (#7683)
  chore: cleanup old sandbox (#7700)
  Correct windows artifact (#7699)
  gh fall back (#7695)
  fix: restore smart-approve mode (#7690)
  fix: make TLS configurable in goosed agent via GOOSE_TLS env var (#7686)
  Update to rmcp 1.1.0 (#7619)
  ...

# Conflicts:
#	Cargo.lock
wpfleger96 added a commit that referenced this pull request Mar 9, 2026
* origin/main: (21 commits)
  fix(ui-desktop): unify path resolution around GOOSE_PATH_ROOT (#7335)
  fix: pass OAuth scopes to DCR and extract granted_scopes from token response (#7571)
  fix: write to real file if config.yaml is symlink (#7669)
  fix: preserve pairings when stopping gateway (#7733)
  fix: reduce server log verbosity — skip session in instrument, defaul… (#7729)
  fix: provider test infrastructure (#7738)
  fix: sanitize streamable HTTP extension names derived from URLs (#7740)
  refactor: derive GooseMode string conversions with strum (#7706)
  docs: Add Spraay Batch Payments MCP Extension Tutorial (#7525)
  fix: flake.nix (#7224)
  delete goose web (#7696)
  Add @angiejones as CODEOWNER for documentation (#7711)
  Add MLflow integration guide (#7563)
  docs: LM Studio availability (#7698)
  feat: add Avian as an LLM provider (#7561)
  Adds `linux-mcp-server` to the goose registry (#6979)
  fix: add #[serde(default)] to description field on 4 ExtensionConfig variants (#7708)
  feat: combine TUI UX from alexhancock/tui-goodness with publishing config from jackamadeo/package-tui (#7683)
  chore: cleanup old sandbox (#7700)
  Correct windows artifact (#7699)
  ...
wpfleger96 added a commit that referenced this pull request Mar 9, 2026
…e-issue

* origin/main:
  fix(ui-desktop): unify path resolution around GOOSE_PATH_ROOT (#7335)
  fix: pass OAuth scopes to DCR and extract granted_scopes from token response (#7571)
  fix: write to real file if config.yaml is symlink (#7669)
  fix: preserve pairings when stopping gateway (#7733)
  fix: reduce server log verbosity — skip session in instrument, defaul… (#7729)
  fix: provider test infrastructure (#7738)
  fix: sanitize streamable HTTP extension names derived from URLs (#7740)
  refactor: derive GooseMode string conversions with strum (#7706)
  docs: Add Spraay Batch Payments MCP Extension Tutorial (#7525)
  fix: flake.nix (#7224)
  delete goose web (#7696)
  Add @angiejones as CODEOWNER for documentation (#7711)
  Add MLflow integration guide (#7563)
  docs: LM Studio availability (#7698)
  feat: add Avian as an LLM provider (#7561)
  Adds `linux-mcp-server` to the goose registry (#6979)
  fix: add #[serde(default)] to description field on 4 ExtensionConfig variants (#7708)