fix(onboard): infer vision input for non-Azure custom models#51893

Open
widingmarcus-cyber wants to merge 5 commits into openclaw:main from widingmarcus-cyber:fix/onboard-custom-vision-51869

Conversation

@widingmarcus-cyber
Contributor

@widingmarcus-cyber commented Mar 21, 2026

Summary

  • infer likely vision-capable model IDs in onboard-custom for non-Azure custom providers
  • default inferred vision models to input: ["text", "image"] instead of hardcoded text-only
  • keep non-vision defaults unchanged
  • add regression test for claude-sonnet-4-6 on non-Azure custom providers

Why

openclaw onboard --custom currently writes non-Azure models with input: ["text"] only, which silently disables image attachments for vision-capable models unless users manually edit config.

Fixes #51869
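As a rough sketch of the intended defaulting (the helper and function names here are assumptions based on this description, not the actual implementation in src/commands/onboard-custom.ts):

```typescript
// Sketch only: the real pattern list is larger; this is a simplified stand-in.
function isLikelyVisionModel(modelId: string): boolean {
  const id = modelId.toLowerCase();
  return id.includes("claude-sonnet") || id.includes("gpt-4o") || id.includes("vision");
}

// Non-Azure custom models with a likely vision-capable ID default to
// ["text", "image"]; everything else keeps the old text-only default.
function defaultInputForCustomModel(modelId: string, isAzure: boolean): Array<"text" | "image"> {
  if (!isAzure && isLikelyVisionModel(modelId)) {
    return ["text", "image"];
  }
  return ["text"];
}
```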

Validation

  • corepack pnpm test src/commands/onboard-custom.test.ts

@openclaw-barnacle bot added the commands (Command implementations) label on Mar 21, 2026
@widingmarcus-cyber
Contributor Author

@greptile-apps please review


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d065eddee

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

```ts
return (
  id.includes("claude") ||
  id.includes("gpt-4o") ||
  id.includes("gpt-5") ||
```

P2 Badge Narrow the gpt-5 vision heuristic for custom models

This marks every non-Azure model ID containing gpt-5 as image-capable, but we already ship text-only GPT-5-family IDs such as openai-gpt-52 in src/agents/venice-models.ts:350-355. Because the runtime decides whether to forward image parts purely from model.input.includes("image") (src/agents/pi-embedded-runner/run/attempt.ts:1735 and src/agents/openai-ws-stream.ts:189-207), onboarding a custom proxy that serves one of those text-only IDs will now advertise image support and send input_image payloads that the upstream model cannot handle.
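The over-match is easy to see by comparing a bare substring check against a delimiter-bounded one (a sketch; openai-gpt-52 is the text-only ID cited above):

```typescript
// Bare substring check: also matches "openai-gpt-52", a text-only ID.
const bareGpt5Match = (id: string): boolean => id.includes("gpt-5");

// Delimiter-bounded check: "gpt-5" must end the ID or be followed by
// "-", ".", or "_", so "gpt-52" no longer matches but "gpt-5-mini" does.
const boundedGpt5Match = (id: string): boolean =>
  /(^|[^a-z0-9])gpt-5($|[-._])/i.test(id);
```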


@greptile-apps
Contributor

greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR fixes a longstanding gap in onboard --custom where non-Azure models with vision-capable IDs (e.g. claude-sonnet-4-6, gpt-4o, gemini-*) were always written with input: ["text"], silently breaking image attachments. It introduces an isLikelyVisionModel heuristic and wires it into both fresh onboards and re-onboards.

Key changes:

  • New isLikelyVisionModel function uses pattern-matching (string includes + bounded regexes) to identify likely vision-capable model IDs; non-matching models retain the ["text"] default.
  • Re-onboard upgrades are guarded by a name-sentinel: shouldUpgradeNonAzureVisionInput only upgrades configs whose name still matches the auto-generated "${modelId} (Custom Provider)" template, preserving deliberate user customisations.
  • Five new tests cover the new default, the text-only baseline, multi-model families, and both re-onboard paths (auto-generated vs. user-renamed).
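The name-sentinel gate described above can be sketched as follows (the type and function names here are assumptions; the real logic lives in src/commands/onboard-custom.ts):

```typescript
interface ModelConfigSketch {
  id: string;
  name: string;
  input: Array<"text" | "image">;
}

// Only upgrade a text-only config if its name still matches the
// auto-generated "${modelId} (Custom Provider)" template, which strongly
// suggests the user never hand-edited it.
function shouldUpgradeVisionInput(model: ModelConfigSketch, modelId: string): boolean {
  return (
    model.name === `${modelId} (Custom Provider)` &&
    model.input.length === 1 &&
    model.input[0] === "text"
  );
}
```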

All three concerns from the prior review round are addressed:

  1. The overly-broad id.includes("claude") catch-all was replaced with specific sub-family patterns.
  2. Silent re-onboard overwrite is now blocked for user-renamed configs via the name sentinel.
  3. The model.id !== modelId dead-code guard is not present in the implementation.

The heuristic is intentionally incomplete (e.g. o4, llava, phi-3-vision are not matched), which is an acknowledged and acceptable trade-off for this PR scope.

Confidence Score: 4/5

  • Safe to merge; the fix is well-scoped, all prior review concerns are addressed, and tests cover both the upgrade and preservation paths.
  • The implementation is correct and the three prior review concerns have been resolved. The one remaining area of mild fragility is the name-sentinel approach: if the auto-generated name template ever changes, old auto-generated configs will no longer be upgraded on re-onboard — but that is a forward-only concern, not a regression risk in this PR. The heuristic gaps (o4, llava, etc.) are acknowledged by the author and out of scope.
  • No files require special attention.

Reviews (5): Last reviewed commit: "refactor(onboard): drop unreachable mode..."

Comment on lines +56 to +68
```ts
function isLikelyVisionModel(modelId: string): boolean {
  const id = modelId.toLowerCase();
  return (
    id.includes("claude") ||
    id.includes("gpt-4o") ||
    id.includes("gpt-5") ||
    id.includes("gemini") ||
    id.includes("qwen-vl") ||
    id.includes("vision") ||
    id.includes("-vl") ||
    id.endsWith("vl")
  );
}
```

P2 Broad claude catch-all may over-match text-only models

id.includes("claude") matches every Claude variant, including known text-only deployments like claude-instant-1 and claude-2.0. Those models will now receive input: ["text", "image"] as the default even though they don't support images.

For a heuristic this is probably fine (better to default to vision-capable than to silently disable it for modern claude-3+ models), but it's worth noting. If you want tighter matching you could scope it to claude-3 and above:

```ts
id.includes("claude-3") ||
id.includes("claude-sonnet") ||
id.includes("claude-haiku") ||
id.includes("claude-opus") ||
```

Also note that llava, pixtral, phi-3-vision, and minicpm-v style models won't match the current heuristic — but that's probably acceptable given those are less common custom-provider deployments.
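A quick sketch of the tighter scoping suggested above (the helper name is assumed, not part of the PR):

```typescript
// Matches Claude 3+ families while excluding known text-only deployments
// such as claude-instant-1 and claude-2.0.
const isClaude3Plus = (id: string): boolean =>
  id.includes("claude-3") ||
  id.includes("claude-sonnet") ||
  id.includes("claude-haiku") ||
  id.includes("claude-opus");
```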


@widingmarcus-cyber
Contributor Author

@greptile-apps please review

Comment on lines +693 to 700
```ts
  }
  return model.input.length === 1 && model.input[0] === "text";
};

const mergedModels = hasModel
  ? existingModels.map((model) =>
      model.id === modelId
        ? {
```

P1 Re-onboard silently overwrites deliberate text-only config

shouldUpgradeNonAzureVisionInput only checks whether model.input is currently ["text"], but this is indistinguishable from a user who intentionally restricted a vision-capable model ID to text-only (e.g., because their custom deployment endpoint doesn't actually expose vision, or they want cost controls). Re-running onboard --custom will silently change their config with no opt-out.

A concrete failure path: user deploys claude-sonnet-4-6 behind a text-only proxy and explicitly sets input: ["text"]. After re-onboard, input becomes ["text", "image"], and any image attachment will now be forwarded to a deployment that rejects it at runtime.

Consider adding a user-settable sentinel — for example only upgrade configs that still carry the generated name: "${modelId} (Custom Provider)" pattern (strongly suggesting they were never hand-edited), or introduce an explicit inferredInput: true flag set at generation time that the upgrade path uses as the gate:

```ts
// only auto-upgrade if the input was inferred by onboard, not user-set
return model.input.length === 1 && model.input[0] === "text" && model._inferredInput === true;
```

Without such a guard the upgrade is a one-way destructive migration for users with intentionally restricted configs.


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dff2b282ef


Comment on lines +703 to +705
```ts
...(shouldUpgradeNonAzureVisionInput(model)
  ? ({ input: ["text", "image"] } as { input: Array<"text" | "image"> })
  : {}),
```

P2 Badge Preserve explicit text-only custom input on re-onboard

When a user has intentionally kept a non-Azure custom model text-only (for example because their proxy only accepts text for a Claude/Gemini-style ID), re-running openclaw onboard --custom now overwrites that choice to [text, image]. That regresses the existing “preserve user-customized fields” behavior covered in src/commands/onboard-custom.test.ts, and it has a real runtime effect because both src/agents/pi-embedded-runner/run/attempt.ts and src/agents/openai-ws-stream.ts decide whether to forward image parts from model.input. So a harmless API-key refresh can turn a working text-only custom provider into one that starts sending unsupported image payloads.


@widingmarcus-cyber
Contributor Author

@greptile-apps please review


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1866593a91


Comment on lines +60 to +62
```ts
id.includes("gpt-4o") ||
id.includes("gpt-5") ||
id.includes("gemini") ||
```

P2 Badge Extend the vision heuristic to GPT-4.1 and o-series

This matcher still leaves several mainstream OpenAI vision models text-only. buildCopilotModelDefinition() marks gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o1, o1-mini, and o3-mini as input: ["text", "image"] in src/providers/github-copilot-models.ts, and src/media-understanding/runner.vision-skip.test.ts treats gpt-4.1 as a native vision model. If a user onboards a custom OpenAI-compatible proxy serving one of those IDs, applyCustomApiConfig() still writes input: ["text"] because the heuristic only matches gpt-4o/gpt-5, so image attachments remain disabled for a common class of custom models this change is meant to fix.
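For the o-series in particular, a delimiter-bounded match avoids false positives from incidental "o1"/"o3" substrings (a sketch of the bounding approach, not the shipped code):

```typescript
// Matches "o1", "o1-mini", "o3-mini", etc., but not IDs like
// "solar-pro1" where "o1" appears inside another token.
const isOSeriesId = (id: string): boolean =>
  /(^|[^a-z0-9])o(1|3)($|[-._])/i.test(id);
```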


@widingmarcus-cyber
Contributor Author

@greptile-apps please review

```ts
  reasoning: false,
};
const shouldUpgradeNonAzureVisionInput = (model: ModelDefinitionConfig): boolean => {
  if (isAzure || !isLikelyVisionCapableModel || model.id !== modelId) {
```

P2 Redundant model.id !== modelId guard

shouldUpgradeNonAzureVisionInput is only ever called inside the model.id === modelId ? ... branch (line 706), so the model.id !== modelId clause will never be true. The guard is dead code and could mislead a future reader into thinking this function might be invoked in a broader context.

Suggested change:

```diff
-    if (isAzure || !isLikelyVisionCapableModel || model.id !== modelId) {
+    if (isAzure || !isLikelyVisionCapableModel) {
```


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bc8bd8c9ec


```ts
id.includes("gpt-4o") ||
id.includes("gpt-4.1") ||
/(^|[^a-z0-9])gpt-5($|[-._])/i.test(id) ||
/(^|[^a-z0-9])o(1|3)($|[-._])/i.test(id) ||
```

P2 Badge Include o4 models in the non-Azure vision matcher

This widens the OpenAI heuristic to o1/o3, but o4 is still excluded. The repo already treats o4-mini as image-capable in the Azure path (src/commands/onboard-custom.test.ts:472-490 expects input: ["text", "image"]), so a non-Azure OpenAI-compatible endpoint serving the same model will still be onboarded as text-only here. Because downstream image forwarding is gated on model.input.includes("image"), attachments stay disabled for that model family.


Comment on lines +56 to +60
```ts
function isLikelyVisionModel(modelId: string): boolean {
  const id = modelId.toLowerCase();
  return (
    id.includes("claude-3") ||
    id.includes("claude-sonnet") ||
```

P2 Badge Cover multimodal families we already ship in the catalog

The new heuristic still misses several image-capable IDs that are already first-class in our own catalogs. For example, moonshotai/Kimi-K2.5 and both meta-llama/Llama-4-* models are input: ["text", "image"] in src/agents/together-models.ts:21-33 and src/agents/together-models.ts:49-75, and unsloth/gemma-3-27b-it is marked the same in src/agents/chutes-models.ts:111-117. If a user onboards an OpenAI-compatible proxy exposing any of those IDs, this code still writes input: ["text"], so image attachments remain unavailable even though the upstream model supports vision.


@widingmarcus-cyber
Contributor Author

@greptile-apps please review

@aisle-research-bot

aisle-research-bot commented Mar 24, 2026

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

  1. 🟠 High: Heuristic vision auto-enable allows prompt-driven local image file exfiltration
  2. 🔵 Low: Azure deployment path injection via unencoded modelId in transformAzureUrl
Vulnerabilities

1. 🟠 Heuristic vision auto-enable allows prompt-driven local image file exfiltration

  • Severity: High
  • CWE: CWE-22
  • Location: src/commands/onboard-custom.ts:683-688

Description

applyCustomApiConfig() now heuristically marks certain custom models as vision-capable based only on modelId substrings/regexes. That flips model.input from ['text'] to ['text','image'] (and may auto-upgrade previously onboarded models).

Downstream, model.input.includes('image') is treated as authorization to run native prompt image loading: the runner scans user prompts for local image paths / file:// URLs and loads them from disk (loadWebMedia) into base64 blocks that are then sent to the model provider.

Impact if an attacker can influence the onboarded modelId (or induce re-onboarding) and can send prompts:

  • A crafted modelId containing e.g. vision, -vl, gemini, gpt-4.1, etc. will enable input: ['text','image'] even if the provider/model is not actually intended for multimodal.
  • Once enabled, prompt text containing /absolute/path/to/secret.png or file:///.../secret.jpg can cause the system to read local files (images) and transmit them to the LLM provider.
  • The default FS policy is not necessarily workspace-only; when tools.fs.workspaceOnly is false, assertSandboxPath will not restrict reads to the workspace.

Vulnerable code (heuristic auto-enable):

```ts
input: isLikelyVisionCapableModel
  ? (["text", "image"] as Array<"text" | "image">)
  : (["text"] as ["text"]),
```

Related sink (authorization by capability flag):

  • params.model.input?.includes("image") => enables detectAndLoadPromptImages() which loads local paths / file:// via loadWebMedia().

Recommendation

Do not infer vision capability from user-controlled modelId strings.

Recommended fixes:

  1. Default custom models to text-only unless the user explicitly opts into images.
  2. If you want auto-detection, base it on a trusted source (provider metadata / discovery API), not substring matching.
  3. Treat model.input as advisory and still gate native prompt image loading behind an explicit security setting (e.g., media.allowPromptFileImages=true) and enforce workspaceOnly=true unless explicitly overridden.

Example safer default:

```ts
// For custom providers, default to text-only.
// Users must explicitly set input=["text","image"] in config.
input: ["text"] as ["text"],
```

And in the image loader path, require explicit opt-in:

```ts
if (!cfg.media?.allowPromptLocalImages) return empty;
if (!resolveEffectiveToolFsWorkspaceOnly({ cfg, agentId })) throw new Error("...workspaceOnly required...");
```

2. 🔵 Azure deployment path injection via unencoded modelId in transformAzureUrl

  • Severity: Low
  • CWE: CWE-74
  • Location: src/commands/onboard-custom.ts:85-92

Description

The Azure URL helper transformAzureUrl concatenates the user-supplied modelId directly into a URL path segment without encoding or validation.

  • Input: modelId originates from onboarding input (prompt/CLI/config) and is later reused for verification requests.
  • Issue: modelId is appended as raw text in a URL string: .../openai/deployments/${modelId}.
  • Impact: A crafted modelId containing path-control characters (e.g., ../, //, ?, #) can alter the final request path once later processed by new URL(...), potentially escaping the intended /openai/deployments/<deployment> prefix and hitting unintended endpoints on the same Azure host. This is a form of path injection that can undermine assumptions about which Azure API path is contacted.

Vulnerable code:

```ts
return `${normalizedUrl}/openai/deployments/${modelId}`;
```

Recommendation

Treat modelId as a URL path segment and encode/validate it before concatenation.

  • Encode with encodeURIComponent (or stricter allowlist validation for Azure deployment names).
  • Build URLs with URL and explicit path joining to avoid .. normalization surprises.

Example fix:

```ts
function transformAzureUrl(baseUrl: string, modelId: string): string {
  const normalizedUrl = baseUrl.endsWith("/") ? baseUrl.slice(0, -1) : baseUrl;
  if (normalizedUrl.includes("/openai/deployments/")) return normalizedUrl;
  // Option A: encode as a single path segment
  const safeDeployment = encodeURIComponent(modelId);
  return `${normalizedUrl}/openai/deployments/${safeDeployment}`;
  // Option B (stricter): allowlist Azure deployment name characters
  // if (!/^[A-Za-z0-9._-]+$/.test(modelId)) throw new Error("Invalid deployment name");
}
```

This prevents ../, ?, #, and other reserved characters from altering the constructed endpoint path.
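The path escape and its fix can be demonstrated with the WHATWG URL parser (a sketch; the host and paths are illustrative, not taken from the repo):

```typescript
// Raw concatenation: "../" segments are collapsed during URL normalization,
// so the request escapes the /openai/deployments/ prefix entirely.
const raw = new URL("https://example.azure.com/openai/deployments/" + "../../admin");
// raw.pathname === "/admin"

// Encoded segment: "%2F" is not a path separator and "..%2F..%2Fadmin" is not
// a dot-segment, so the value stays confined under the intended prefix.
const encoded = new URL(
  "https://example.azure.com/openai/deployments/" + encodeURIComponent("../../admin"),
);
// encoded.pathname === "/openai/deployments/..%2F..%2Fadmin"
```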


Analyzed PR: #51893 at commit 9d9573c

Last updated on: 2026-03-24T21:45:58Z


Labels

commands Command implementations size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: onboard-custom hardcodes input: ["text"] for non-Azure custom providers, silently disabling image/vision support

1 participant