
Onboarding: heuristic vision inputs for non-Azure custom models (#51869) #51965

Open
lurui1997 wants to merge 1 commit into openclaw:main from lurui1997:fix/51869-onboard-custom-vision-heuristic

Conversation

@lurui1997

Summary

See commit message on branch fix/51869-onboard-custom-vision-heuristic.

Test plan

  • CI / targeted tests as noted in commit

Fixes #51869

Made with Cursor

…claw#51869)

Mark common GPT-4o/5, o3/o4, Claude 3+, and Gemini-style model ids as
supporting image input when applying custom OpenAI-compatible provider config.

Made-with: Cursor
@openclaw-barnacle bot added the commands (Command implementations) and size: XS labels on Mar 22, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 22, 2026

Greptile Summary

This PR adds an isLikelyVisionCapableNonAzureModel heuristic to onboard-custom.ts so that well-known vision-capable model IDs (OpenAI GPT-4o/4.1/turbo/5+, o3/o4, Claude 3/4.5+, Gemini) automatically default to input: ["text", "image"] during fresh onboarding on non-Azure custom providers. Previously, all non-Azure custom models defaulted to input: ["text"], requiring manual config edits for multimodal use.

  • The vision heuristic is correctly scoped to the non-Azure branch of applyCustomApiConfig; existing user-customised model entries are preserved on re-onboard (consistent with prior behaviour).
  • The isClaude45OrHigher helper's third regex alternative (-[5-9](?:\b|[.-])) can false-positive on claude-3-5-* ids due to the non-greedy quantifier consuming 3 and landing on -5. This is currently harmless at the call site (the \bclaude-3\b guard fires first), but the function's stated semantic contract is broken in isolation and could mislead future callers.
  • o3/o4 are now explicitly recognised as vision-capable, but their reasoning flag stays false on the non-Azure path — inconsistent with the Azure branch which already marks them as reasoning models via isLikelyReasoningModel.
  • The single new test exercises the happy path (gpt-4o-mini) but does not cover the Claude, Gemini, o3/o4, or negative branches of the heuristic.
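For context, the heuristic described above would look roughly like this (a sketch: the function name matches the PR, but the patterns are reconstructed from this summary rather than copied from onboard-custom.ts):

```typescript
// Sketch of the vision heuristic described in the summary above.
// The patterns are reconstructions, not the PR's exact regexes.
function isLikelyVisionCapableNonAzureModel(modelId: string): boolean {
  // OpenAI: gpt-4o / gpt-4.1 / gpt-4-turbo, and gpt-5 or later
  if (/\bgpt-4(?:o|\.1|-turbo)\b|\bgpt-(?:[5-9]|\d{2,})\b/i.test(modelId)) return true;
  // OpenAI o-series ids that also accept image input
  if (/\b(?:o3|o4)\b/i.test(modelId)) return true;
  // Anthropic: Claude 3.x, and 4.5 or later
  if (/\bclaude-3\b/i.test(modelId) || /\bclaude-[^\s/]*?4[.-]?5\b/i.test(modelId)) return true;
  // Google: any Gemini-style id
  if (/\bgemini\b/i.test(modelId)) return true;
  return false;
}
```

Anything matching falls through to input: ["text", "image"]; everything else keeps the old input: ["text"] default.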

Confidence Score: 4/5

  • Safe to merge after addressing the isClaude45OrHigher regex false-positive; the o3/o4 reasoning flag inconsistency and thin test coverage are non-blocking follow-ups.
  • The change is small and well-contained in the non-Azure onboarding path. The heuristic logic is correct for all tested cases and the existing re-onboard preservation tests remain green. The isClaude45OrHigher third-alternative false positive is a P1 concern because the function's contract is demonstrably wrong (it matches claude-3-5-sonnet), but it does not produce an incorrect end result today due to the surrounding guard. The o3/o4 reasoning gap and limited test coverage are P2 improvements rather than blockers.
src/commands/onboard-custom.ts: isClaude45OrHigher regex and o3/o4 reasoning handling; src/commands/onboard-custom.test.ts: additional heuristic branch coverage.

Last reviewed commit: "Onboarding: heuristi..."

Comment on lines +56 to +60

```ts
function isClaude45OrHigher(id: string): boolean {
  // Match claude-*-4-5+, claude-*-45+, claude-*4.5+, or future 5.x+ majors.
  return /\bclaude-[^\s/]*?(?:-4-?(?:[5-9]|[1-9]\d)\b|4\.(?:[5-9]|[1-9]\d)\b|-[5-9](?:\b|[.-]))/i.test(
    id,
  );
```

P1: `isClaude45OrHigher` third alternative matches `claude-3-5-*` models

The third regex alternative `-[5-9](?:\b|[.-])` can fire for `claude-3-5-sonnet` because the non-greedy `[^\s/]*?` consumes `3` and then `-5` satisfies `-[5-9]`, followed by `-` which satisfies `[.-]`. In practice this is harmless here because the call site guards with `\bclaude-3\b || isClaude45OrHigher(…)` (so the result is still `true`), but the function's semantic contract ("4-5 or higher") no longer holds in isolation. Anchoring the third alternative with a preceding version separator makes the intent explicit:

Suggested change:

```ts
// Before
function isClaude45OrHigher(id: string): boolean {
  // Match claude-*-4-5+, claude-*-45+, claude-*4.5+, or future 5.x+ majors.
  return /\bclaude-[^\s/]*?(?:-4-?(?:[5-9]|[1-9]\d)\b|4\.(?:[5-9]|[1-9]\d)\b|-[5-9](?:\b|[.-]))/i.test(
    id,
  );

// After
function isClaude45OrHigher(id: string): boolean {
  // Match claude-*-4-5+, claude-*-45+, claude-*4.5+, or future 5.x+ majors.
  return /\bclaude-[^\s/]*?(?:-4-?(?:[5-9]|[1-9]\d)\b|4\.(?:[5-9]|[1-9]\d)\b|-(?:[5-9])(?:\b|[.-]))/i.test(
    id,
  );
}
```

Alternatively, the simplest fix is to only test the third branch when the version number directly follows `claude-`:

```ts
/\bclaude-(?:[a-z]+-)*(?:4-?(?:[5-9]|[1-9]\d)|[5-9])(?:\b|[.-])/i
```

Either way, the edge case should be covered by a dedicated test for `claude-3-5-sonnet` to confirm it is *not* matched by `isClaude45OrHigher` in isolation.
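The false positive is easy to demonstrate in isolation. The snippet below (a standalone check, not part of the PR) runs the current regex and the anchored variant quoted in this comment against the problem ids:

```typescript
// The regex currently in the diff and the anchored alternative suggested
// above, copied verbatim from this review comment.
const current =
  /\bclaude-[^\s/]*?(?:-4-?(?:[5-9]|[1-9]\d)\b|4\.(?:[5-9]|[1-9]\d)\b|-[5-9](?:\b|[.-]))/i;
const anchored = /\bclaude-(?:[a-z]+-)*(?:4-?(?:[5-9]|[1-9]\d)|[5-9])(?:\b|[.-])/i;

// In "claude-3-5-sonnet", the lazy [^\s/]*? consumes "3", "-5" then
// satisfies -[5-9], and the trailing "-" satisfies [.-].
console.log(current.test("claude-3-5-sonnet"));  // true (the false positive)
console.log(anchored.test("claude-3-5-sonnet")); // false
console.log(current.test("claude-opus-4-5"));    // true
console.log(anchored.test("claude-opus-4-5"));   // true
```

The anchored variant still accepts genuine 4.5+ ids while rejecting the 3.5 family.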


Comment on lines +68 to +70

```ts
if (/\b(o3|o4)\b/i.test(modelId)) {
  return true;
}
```

P2: `o3`/`o4` vision flag inconsistent with Azure `reasoning` detection

This PR explicitly adds `o3` and `o4` to the vision heuristic, but the non-Azure `nextModel` block always sets `reasoning: false`. The Azure branch already identifies these as reasoning models:

```ts
// Azure branch (line 671)
const isLikelyReasoningModel = isAzure && /\b(o[134]|gpt-([5-9]|\d{2,}))\b/i.test(modelId);
```

A user who onboards an `o3` or `o4` model on a non-Azure custom endpoint (e.g., a self-hosted proxy or a third-party provider) would therefore get `reasoning: false` despite the model being a reasoning model. Consider mirroring the Azure logic for non-Azure as well, or at minimum document the intentional gap in a comment.
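A minimal sketch of the mirrored check, assuming the Azure regex quoted above is simply applied without the `isAzure` gate (the standalone helper shape here is hypothetical):

```typescript
// Hypothetical helper: the Azure branch's reasoning pattern, applied
// unconditionally so non-Azure custom endpoints get the same detection.
function isLikelyReasoningModel(modelId: string): boolean {
  // o1 / o3 / o4 families, and gpt-5 or later
  return /\b(o[134]|gpt-([5-9]|\d{2,}))\b/i.test(modelId);
}
```

The non-Azure `nextModel` block could then set `reasoning: isLikelyReasoningModel(modelId)` instead of a hardcoded `false`.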


Comment on lines +597 to +609

```ts
it("sets image input for likely vision non-Azure OpenAI-compatible models", () => {
  const result = applyCustomApiConfig({
    config: {},
    baseUrl: "https://llm.example.com/v1",
    modelId: "gpt-4o-mini",
    compatibility: "openai",
    apiKey: "key123",
    providerId: "custom",
  });
  const model = result.config.models?.providers?.custom?.models?.[0];
  expect(model?.input).toEqual(["text", "image"]);
  expect(model?.reasoning).toBe(false);
});
```

P2: Test only exercises one of four heuristic branches

The new test only verifies `gpt-4o-mini` (the OpenAI branch). Consider adding cases for the other branches so regressions surface immediately:

  • A Claude 3 model (e.g. `claude-3-sonnet`) → `["text", "image"]`
  • A `claude-opus-4-5`-style model → `["text", "image"]`
  • A Gemini model (e.g. `gemini-1.5-pro`) → `["text", "image"]`
  • An `o3-mini` model → `["text", "image"]`
  • A generic custom model (e.g. `my-llm`) → `["text"]` (negative case)


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30e0039cb2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

```ts
if (/\b(o3|o4)\b/i.test(modelId)) {
  return true;
}
if (/\bclaude-3\b/i.test(modelId) || isClaude45OrHigher(modelId)) {
```

P1: Recognize Claude 4.x IDs as image-capable

This branch only treats `claude-3*` and Claude 4.5+ names as multimodal, so onboarding `claude-sonnet-4`, `claude-opus-4`, or `claude-opus-4-1-*` on a non-Azure custom endpoint still stores `input: ["text"]`. Downstream image flows gate on `model.input.includes("image")` (for example `src/media-understanding/providers/image.ts:56`), which means those Anthropic 4/4.1 models will continue rejecting image uploads in exactly the scenario this change is trying to fix.
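One possible shape for a widened check (a sketch with a hypothetical name, not code from the PR) is a single pattern that accepts any Claude major version from 3 upward, which covers the `claude-sonnet-4` and `claude-opus-4-1-*` ids flagged above:

```typescript
// Hypothetical widened check: accept any Claude id whose major version is
// 3 or higher, so claude-sonnet-4, claude-opus-4, and claude-opus-4-1-*
// are treated as image-capable alongside claude-3* and 4.5+.
function isVisionCapableClaude(modelId: string): boolean {
  return /\bclaude-(?:[a-z]+-)*[3-9]\b/i.test(modelId);
}
```

This intentionally excludes `claude-2` and `claude-instant` ids, which predate vision support.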



Labels

commands (Command implementations), size: XS

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: onboard-custom hardcodes input: ["text"] for non-Azure custom providers, silently disabling image/vision support

1 participant