fix(onboard): infer vision input for non-Azure custom models #51893
widingmarcus-cyber wants to merge 5 commits into openclaw:main
Conversation
@greptile-apps please review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4d065eddee
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
src/commands/onboard-custom.ts
Outdated
```ts
return (
  id.includes("claude") ||
  id.includes("gpt-4o") ||
  id.includes("gpt-5") ||
```
**Narrow the `gpt-5` vision heuristic for custom models**
This marks every non-Azure model ID containing gpt-5 as image-capable, but we already ship text-only GPT-5-family IDs such as openai-gpt-52 in src/agents/venice-models.ts:350-355. Because the runtime decides whether to forward image parts purely from model.input.includes("image") (src/agents/pi-embedded-runner/run/attempt.ts:1735 and src/agents/openai-ws-stream.ts:189-207), onboarding a custom proxy that serves one of those text-only IDs will now advertise image support and send input_image payloads that the upstream model cannot handle.
**Greptile Summary**

This PR fixes a longstanding gap in … Key changes: …

All three concerns from the prior review round are addressed.

The heuristic is intentionally incomplete (e.g. …). Confidence Score: 4/5

Reviews (5): Last reviewed commit: "refactor(onboard): drop unreachable mode..."
```ts
function isLikelyVisionModel(modelId: string): boolean {
  const id = modelId.toLowerCase();
  return (
    id.includes("claude") ||
    id.includes("gpt-4o") ||
    id.includes("gpt-5") ||
    id.includes("gemini") ||
    id.includes("qwen-vl") ||
    id.includes("vision") ||
    id.includes("-vl") ||
    id.endsWith("vl")
  );
}
```
**Broad `claude` catch-all may over-match text-only models**
id.includes("claude") matches every Claude variant, including known text-only deployments like claude-instant-1 and claude-2.0. Those models will now receive input: ["text", "image"] as the default even though they don't support images.
For a heuristic this is probably fine (better to default to vision-capable than to silently disable it for modern claude-3+ models), but it's worth noting. If you want tighter matching you could scope it to claude-3 and above:
```ts
id.includes("claude-3") ||
id.includes("claude-sonnet") ||
id.includes("claude-haiku") ||
id.includes("claude-opus") ||
```

Also note that `llava`, `pixtral`, `phi-3-vision`, and `minicpm-v` style models won't match the current heuristic, but that's probably acceptable given those are less common custom-provider deployments.
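Taken together, the two notes above suggest a tightened matcher. A hedged sketch, assuming both the `claude-3+` scoping and a few of the open-weight vision families get folded in (the exact ID list is illustrative, not the repo's final code):

```typescript
// Sketch only: scope Claude matches to vision-capable generations and add
// common open-weight vision families (llava, pixtral); ID list is illustrative.
function isLikelyVisionModel(modelId: string): boolean {
  const id = modelId.toLowerCase();
  return (
    id.includes("claude-3") ||
    id.includes("claude-sonnet") ||
    id.includes("claude-haiku") ||
    id.includes("claude-opus") ||
    id.includes("gpt-4o") ||
    id.includes("gpt-5") ||
    id.includes("gemini") ||
    id.includes("qwen-vl") ||
    id.includes("llava") ||
    id.includes("pixtral") ||
    id.includes("vision") || // also catches phi-3-vision
    id.includes("-vl") ||
    id.endsWith("vl")
  );
}
```

With this shape, `claude-instant-1` and `claude-2.0` no longer match, while `claude-sonnet-4-6` and `llava-1.5-13b` do.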
@greptile-apps please review
```ts
  }
  return model.input.length === 1 && model.input[0] === "text";
};
```

```ts
const mergedModels = hasModel
  ? existingModels.map((model) =>
      model.id === modelId
        ? {
```
**Re-onboard silently overwrites deliberate text-only config**
shouldUpgradeNonAzureVisionInput only checks whether model.input is currently ["text"], but this is indistinguishable from a user who intentionally restricted a vision-capable model ID to text-only (e.g., because their custom deployment endpoint doesn't actually expose vision, or they want cost controls). Re-running onboard --custom will silently change their config with no opt-out.
A concrete failure path: user deploys claude-sonnet-4-6 behind a text-only proxy and explicitly sets input: ["text"]. After re-onboard, input becomes ["text", "image"], and any image attachment will now be forwarded to a deployment that rejects it at runtime.
Consider adding a user-settable sentinel — for example only upgrade configs that still carry the generated name: "${modelId} (Custom Provider)" pattern (strongly suggesting they were never hand-edited), or introduce an explicit inferredInput: true flag set at generation time that the upgrade path uses as the gate:
```ts
// only auto-upgrade if the input was inferred by onboard, not user-set
return model.input.length === 1 && model.input[0] === "text" && model._inferredInput === true;
```

Without such a guard the upgrade is a one-way destructive migration for users with intentionally restricted configs.
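The sentinel idea can be sketched concretely; `_inferredInput` is a hypothetical field name, and `ModelDefinitionConfig` is reduced to just the fields relevant here, so this is an illustration rather than the repo's actual types:

```typescript
// Minimal sketch: gate the auto-upgrade on an onboard-set sentinel so a
// hand-edited text-only config is never silently widened.
interface ModelDefinitionConfig {
  id: string;
  input: Array<"text" | "image">;
  _inferredInput?: boolean; // hypothetical flag, written only by onboard itself
}

function shouldUpgradeNonAzureVisionInput(model: ModelDefinitionConfig): boolean {
  // only auto-upgrade if the input was inferred by onboard, not user-set
  return (
    model.input.length === 1 &&
    model.input[0] === "text" &&
    model._inferredInput === true
  );
}
```

A config generated by onboard carries the flag and is eligible for upgrade; a hand-edited `input: ["text"]` without the flag is left alone.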
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dff2b282ef
```ts
...(shouldUpgradeNonAzureVisionInput(model)
  ? ({ input: ["text", "image"] } as { input: Array<"text" | "image"> })
  : {}),
```
**Preserve explicit text-only custom input on re-onboard**
When a user has intentionally kept a non-Azure custom model text-only (for example because their proxy only accepts text for a Claude/Gemini-style ID), re-running openclaw onboard --custom now overwrites that choice to [text, image]. That regresses the existing “preserve user-customized fields” behavior covered in src/commands/onboard-custom.test.ts, and it has a real runtime effect because both src/agents/pi-embedded-runner/run/attempt.ts and src/agents/openai-ws-stream.ts decide whether to forward image parts from model.input. So a harmless API-key refresh can turn a working text-only custom provider into one that starts sending unsupported image payloads.
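One way to express the preserve-user-customized-fields behavior this comment references: let any previously saved `input` win over the inferred default. The helper name and shape below are hypothetical:

```typescript
// Sketch: merge helper that keeps an existing model's input untouched and
// applies the inferred default only when no prior value exists.
type ModelInput = Array<"text" | "image">;

function mergeModelInput(
  existing: ModelInput | undefined,
  inferred: ModelInput,
): ModelInput {
  // a saved value (even a deliberate plain ["text"]) always wins on re-onboard
  return existing ?? inferred;
}
```

Under this rule an API-key refresh cannot flip a working text-only custom provider into one that forwards image payloads.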
@greptile-apps please review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1866593a91
```ts
id.includes("gpt-4o") ||
id.includes("gpt-5") ||
id.includes("gemini") ||
```
**Extend the vision heuristic to GPT-4.1 and o-series**
This matcher still leaves several mainstream OpenAI vision models text-only. buildCopilotModelDefinition() marks gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o1, o1-mini, and o3-mini as input: ["text", "image"] in src/providers/github-copilot-models.ts, and src/media-understanding/runner.vision-skip.test.ts treats gpt-4.1 as a native vision model. If a user onboards a custom OpenAI-compatible proxy serving one of those IDs, applyCustomApiConfig() still writes input: ["text"] because the heuristic only matches gpt-4o/gpt-5, so image attachments remain disabled for a common class of custom models this change is meant to fix.
@greptile-apps please review
src/commands/onboard-custom.ts
Outdated
```ts
  reasoning: false,
};
const shouldUpgradeNonAzureVisionInput = (model: ModelDefinitionConfig): boolean => {
  if (isAzure || !isLikelyVisionCapableModel || model.id !== modelId) {
```
**Redundant `model.id !== modelId` guard**

`shouldUpgradeNonAzureVisionInput` is only ever called inside the `model.id === modelId ? ...` branch (line 706), so the `model.id !== modelId` clause will never be true. The guard is dead code and could mislead a future reader into thinking this function might be invoked in a broader context.
```suggestion
  if (isAzure || !isLikelyVisionCapableModel) {
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bc8bd8c9ec
```ts
id.includes("gpt-4o") ||
id.includes("gpt-4.1") ||
/(^|[^a-z0-9])gpt-5($|[-._])/i.test(id) ||
/(^|[^a-z0-9])o(1|3)($|[-._])/i.test(id) ||
```
**Include `o4` models in the non-Azure vision matcher**
This widens the OpenAI heuristic to o1/o3, but o4 is still excluded. The repo already treats o4-mini as image-capable in the Azure path (src/commands/onboard-custom.test.ts:472-490 expects input: ["text", "image"]), so a non-Azure OpenAI-compatible endpoint serving the same model will still be onboarded as text-only here. Because downstream image forwarding is gated on model.input.includes("image"), attachments stay disabled for that model family.
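Assuming the alternation shown in the diff above, the widened matcher would only need `4` added; the boundary guards keep IDs like `gpt-4o` from matching the o-series pattern:

```typescript
// Hypothetical widened o-series matcher covering o1, o3, and o4 families.
// The (^|[^a-z0-9]) and ($|[-._]) guards prevent "gpt-4o" from matching,
// since its "o" is preceded by a digit and followed by nothing.
const O_SERIES_VISION = /(^|[^a-z0-9])o(1|3|4)($|[-._])/i;
```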
```ts
function isLikelyVisionModel(modelId: string): boolean {
  const id = modelId.toLowerCase();
  return (
    id.includes("claude-3") ||
    id.includes("claude-sonnet") ||
```
**Cover multimodal families we already ship in the catalog**
The new heuristic still misses several image-capable IDs that are already first-class in our own catalogs. For example, moonshotai/Kimi-K2.5 and both meta-llama/Llama-4-* models are input: ["text", "image"] in src/agents/together-models.ts:21-33 and src/agents/together-models.ts:49-75, and unsloth/gemma-3-27b-it is marked the same in src/agents/chutes-models.ts:111-117. If a user onboards an OpenAI-compatible proxy exposing any of those IDs, this code still writes input: ["text"], so image attachments remain unavailable even though the upstream model supports vision.
@greptile-apps please review
**🔒 Aisle Security Analysis**

We found 2 potential security issue(s) in this PR:

**Vulnerabilities**

**1. 🟠 Heuristic vision auto-enable allows prompt-driven local image file exfiltration**

Description: Downstream, … Impact if an attacker can influence the onboarded …

Vulnerable code (heuristic auto-enable):

```ts
input: isLikelyVisionCapableModel
  ? (["text", "image"] as Array<"text" | "image">)
  : (["text"] as ["text"]),
```

Related sink (authorization by capability flag): …
Recommendation: Do not infer vision capability from user-controlled … Recommended fixes: …

Example safer default:

```ts
// For custom providers, default to text-only.
// Users must explicitly set input=["text","image"] in config.
input: ["text"] as ["text"],
```

And in the image loader path, require explicit opt-in:

```ts
if (!cfg.media?.allowPromptLocalImages) return empty;
if (!resolveEffectiveToolFsWorkspaceOnly({ cfg, agentId })) throw new Error("...workspaceOnly required...");
```

**2. 🔵 Azure deployment path injection via unencoded modelId in transformAzureUrl**
Description: The Azure URL helper …

Vulnerable code:

```ts
return `${normalizedUrl}/openai/deployments/${modelId}`;
```

Recommendation: Treat …

Example fix:

```ts
function transformAzureUrl(baseUrl: string, modelId: string): string {
  const normalizedUrl = baseUrl.endsWith("/") ? baseUrl.slice(0, -1) : baseUrl;
  if (normalizedUrl.includes("/openai/deployments/")) return normalizedUrl;
  // Option A: encode as a single path segment
  const safeDeployment = encodeURIComponent(modelId);
  return `${normalizedUrl}/openai/deployments/${safeDeployment}`;
  // Option B (stricter): allowlist Azure deployment name characters
  // if (!/^[A-Za-z0-9._-]+$/.test(modelId)) throw new Error("Invalid deployment name");
}
```

This prevents …

Analyzed PR: #51893 at commit … Last updated on: 2026-03-24T21:45:58Z
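As a quick sanity check, the recommended encoding fix can be exercised standalone (the body mirrors Option A from the recommendation; a sketch, not the shipped code):

```typescript
// Sketch of the recommended fix: encode the deployment segment so a crafted
// modelId cannot inject extra path segments or query parameters into the URL.
function transformAzureUrl(baseUrl: string, modelId: string): string {
  const normalizedUrl = baseUrl.endsWith("/") ? baseUrl.slice(0, -1) : baseUrl;
  if (normalizedUrl.includes("/openai/deployments/")) return normalizedUrl;
  const safeDeployment = encodeURIComponent(modelId);
  return `${normalizedUrl}/openai/deployments/${safeDeployment}`;
}
```

A benign ID passes through unchanged, while a path-traversal-shaped ID ends up percent-encoded into a single harmless segment.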
**Summary**

- `onboard-custom` for non-Azure custom providers
- `input: ["text", "image"]` instead of hardcoded text-only
- `claude-sonnet-4-6` on non-Azure custom providers

**Why**

`openclaw onboard --custom` currently writes non-Azure models with `input: ["text"]` only, which silently disables image attachments for vision-capable models unless users manually edit config.

Fixes #51869

**Validation**

```
corepack pnpm test src/commands/onboard-custom.test.ts
```