fix: provider test infrastructure #7738
Conversation
Force-pushed from 5a2b86e to 4694a43
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4694a4397d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
codefromthecrypt left a comment:
impl notes for the curious
```rust
#[tokio::test]
async fn test_job_runs_on_schedule() {
    let _guard = env_lock::lock_env([
```
This is the pattern we use to make sure the local goose config isn't read (e.g. my provider was set to tetrate, and this test ran trying to access its creds).
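The `env_lock` helper itself isn't shown in this diff; a minimal sketch of the same idea (a hypothetical `EnvGuard`, not the PR's actual implementation) is a global mutex plus save-and-restore on drop, so tests that mutate process-wide env vars never interleave:

```rust
use std::sync::{Mutex, MutexGuard, OnceLock};

// Global mutex: env vars are process-wide, so env-mutating tests must serialize.
static ENV_MUTEX: OnceLock<Mutex<()>> = OnceLock::new();

/// Holds the lock, sets the given vars, and restores prior values on drop.
pub struct EnvGuard {
    _lock: MutexGuard<'static, ()>,
    saved: Vec<(String, Option<String>)>,
}

impl EnvGuard {
    pub fn lock_env<I, K, V>(vars: I) -> Self
    where
        I: IntoIterator<Item = (K, V)>,
        K: Into<String>,
        V: Into<String>,
    {
        let lock = ENV_MUTEX
            .get_or_init(|| Mutex::new(()))
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner());
        let mut saved = Vec::new();
        for (k, v) in vars {
            let (k, v) = (k.into(), v.into());
            saved.push((k.clone(), std::env::var(&k).ok()));
            // set_var is unsafe in Rust edition 2024 because of this exact
            // data race; the mutex above is what makes it sound here.
            unsafe { std::env::set_var(&k, v) };
        }
        EnvGuard { _lock: lock, saved }
    }
}

impl Drop for EnvGuard {
    fn drop(&mut self) {
        // Restore the old environment before releasing the lock.
        for (k, old) in self.saved.drain(..) {
            match old {
                Some(v) => unsafe { std::env::set_var(&k, v) },
                None => unsafe { std::env::remove_var(&k) },
            }
        }
    }
}
```

The restore-on-drop means a panicking test still cleans up after itself, which is why the guard pattern beats manual set/unset calls.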
```rust
model_switch_name: Option<String>,
mcp_extension: ExtensionConfig,
// ...
impl ProviderTestConfig {
    fn with_llm_provider(
```
LLM providers are all remote APIs, and they don't internalize tool calls the way agents do. This is how providers typically work in goose.
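A rough sketch of the constructor split being described (field names and types here are assumptions, not the PR's exact code): a remote LLM provider is guarded by credentials in the environment, while the builder keeps per-provider options like a separate image model:

```rust
/// Hypothetical sketch of the config described in the review comment.
#[derive(Debug, Clone)]
pub struct ProviderTestConfig {
    pub name: &'static str,
    pub model_name: &'static str,
    /// Env vars that must be set for a remote LLM provider to be testable.
    pub required_env: Vec<&'static str>,
    /// Binary that must be on PATH for an agentic (CLI/ACP) provider.
    pub required_binary: Option<String>,
    /// Optional separate model for image tests.
    pub image_model: Option<&'static str>,
}

impl ProviderTestConfig {
    /// Remote LLM provider: the guard is "are these env vars set".
    pub fn with_llm_provider(
        name: &'static str,
        model_name: &'static str,
        required_env: &[&'static str],
    ) -> Self {
        ProviderTestConfig {
            name,
            model_name,
            required_env: required_env.to_vec(),
            required_binary: None,
            image_model: None,
        }
    }

    /// Builder-style override for the image-capable model.
    pub fn image_model(mut self, model: &'static str) -> Self {
        self.image_model = Some(model);
        self
    }
}
```

The point of the two constructors is that the skip condition (env vars vs. binary on PATH) is captured at construction time instead of being re-derived inside each test.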
```rust
        self
    }

    fn with_agentic_provider(name: &'static str, model_name: &'static str, binary: &str) -> Self {
```
Agentic providers are currently all binary-based (both CLI and ACP), even if they can also read env vars, so the guard for them is at least about the binary.
- Right now we don't have a way to propagate session IDs to their backend LLM or MCP APIs, though one might support this via the corresponding specs.
- Right now they aren't semantically compatible with smart approve, though this might change in the future.
- We didn't run content-length tests on these in the past because they accept huge contexts, which results in extreme delays.
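Since the guard for agentic providers is "is the binary on PATH", that check can be sketched in plain std (this `binary_on_path` helper is illustrative, not the PR's code; it also doesn't check the executable bit, which a real guard might want):

```rust
/// Returns true if `binary` exists as a file in any PATH directory.
/// Sketch only: does not verify the file is actually executable.
pub fn binary_on_path(binary: &str) -> bool {
    std::env::var_os("PATH")
        .map(|paths| {
            // split_paths handles the platform separator (':' vs ';').
            std::env::split_paths(&paths).any(|dir| dir.join(binary).is_file())
        })
        .unwrap_or(false)
}
```

A test harness would call this once per agentic provider and skip (rather than fail) when the binary is absent, so CI machines without, say, the claude CLI still pass.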
```rust
}

impl ProviderFixture {
    async fn setup(config: &ProviderTestConfig, mode: GooseMode) -> Result<Self> {
```
Before, it was really complicated in the tests themselves because they switched from the default auto mode to a specific mode. This isolates each test to its own provider instance, so no test needs to switch modes. If we later want a runtime mode-switch test, we can add one, but that's a different feature than testing a specific mode's behavior.
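The shape of that isolation can be sketched as follows (a stand-in, not the PR's `ProviderFixture`: the real `setup` builds a fresh Agent and session per test, which an atomic counter merely simulates here):

```rust
/// Hypothetical mode enum mirroring the one referenced in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GooseMode {
    Auto,
    Approve,
    Chat,
}

/// Each test gets its own fixture, pinned to one mode for its whole lifetime.
pub struct ProviderFixture {
    pub mode: GooseMode,
    pub session_id: u64,
}

impl ProviderFixture {
    pub fn setup(mode: GooseMode) -> ProviderFixture {
        // In the real tests this would construct a fresh Agent and session;
        // the counter stands in for "every fixture is a distinct session".
        use std::sync::atomic::{AtomicU64, Ordering};
        static NEXT: AtomicU64 = AtomicU64::new(1);
        ProviderFixture {
            mode,
            session_id: NEXT.fetch_add(1, Ordering::Relaxed),
        }
    }
}
```

Because the mode is fixed at construction, a test for approve-mode behavior can never be polluted by a previous test having left the shared agent in auto.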
```rust
println!("===================");

let name_lower = self.name.to_lowercase();
if name_lower == "ollama" || name_lower == "openrouter" {
```
You can see this code has been high-maintenance, hence the move to a feature-based test system instead of remembering to update names in various places.
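"Feature-based" here might look something like the following (a sketch with made-up flag names, not the PR's actual structure): capabilities are declared once per provider, and each test gates on a flag rather than on a growing list of name comparisons:

```rust
/// Hypothetical capability flags, declared once per provider config.
#[derive(Debug, Clone, Copy, Default)]
pub struct ProviderFeatures {
    pub supports_images: bool,
    pub supports_smart_approve: bool,
    pub run_context_length_tests: bool,
}

/// One data-driven predicate replaces scattered
/// `if name_lower == "ollama" || name_lower == "openrouter"` checks.
pub fn should_run_image_test(features: &ProviderFeatures) -> bool {
    features.supports_images
}
```

Adding a new provider then means setting flags in one place instead of auditing every name-matching `if` across the test suite.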
```rust
Permission::AllowOnce,
true,
"Use the get_code tool and output only its result.",
&format!("Write the word 'hello' to {}", test_file.path().display()),
```
Changed this because we add the read-only attribute to the MCP get_code tool, and some agents permit read-only tools without asking, so the permission prompt never fires. All agents I've used can write files, so regardless of whether it's an LLM (developer extension) or an agent (claude etc.), we know a write to a new path will require a permission guard.
```rust
    false,
)
.await
// ...
ProviderTestConfig::with_llm_provider("Ollama", "qwen3", &["OLLAMA_HOST"])
```
Split the image model from the default one because qwen3-vl is really bad at tool execution and fails routinely.
```rust
.await
// ...
ProviderTestConfig::with_llm_provider("Ollama", "qwen3", &["OLLAMA_HOST"])
    .image_model("qwen3-vl")
// Above qwen3's 40960 context_length but small enough for Ollama's 600s timeout
```
This was really not intuitive to hunt down: we were only asserting that the request failed, not that it failed with ContextLengthExceeded. The prior config did fail, just with a different error, because it overloaded the Ollama process!
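The fix amounts to asserting the *kind* of failure, not merely that something failed. A sketch with a made-up error enum (the real error type in goose will differ):

```rust
/// Hypothetical provider error kinds for illustration.
#[derive(Debug, PartialEq, Eq)]
pub enum ProviderError {
    ContextLengthExceeded { limit: usize },
    Timeout,
    Other(String),
}

/// True only when the provider explicitly reported a context overflow.
/// A Timeout from overloading the server is a different failure and
/// must not make this test pass.
pub fn is_context_overflow(result: &Result<(), ProviderError>) -> bool {
    matches!(result, Err(ProviderError::ContextLengthExceeded { .. }))
}
```

With a broad "any error passes" assertion, the old oversized payload masked the real bug: Ollama was timing out, not rejecting the context.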
```rust
    true,
)
.await
// ...
ProviderTestConfig::with_agentic_provider("claude-code", CLAUDE_CODE_DEFAULT_MODEL, "claude")
```
Much less cruft this way, which matters because there are many untested agentic providers.
Force-pushed from 4694a43 to ec385f2
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ec385f2f21
Fix Ollama, scheduler, and LiteLLM test failures and make it easy to add new providers like ACP.
- Isolate each provider sub-test with its own Agent and session, removing extension re-registration and session ID complexity
- Ollama VL model unreliable for tool calls; split into qwen3/qwen3-vl
- Scheduler env leakage into keyring; LiteLLM hardcoded host overwrite

Signed-off-by: Adrian Cole <[email protected]>
Force-pushed from ec385f2 to 0a31fdd
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0a31fdd23d
Merged origin/main (34 commits) into …deps; conflicts in Cargo.lock.
Merged origin/main (21 commits).
Merged origin/main into …e-issue.
Summary
Fix Ollama, scheduler, and LiteLLM test failures and make it easy to add new providers like ACP.
`ProviderTestConfig` with `with_llm_provider` and `with_agentic_provider` for isolated and easier setup.
Type of Change
AI Assistance
Testing