
refactor(agents): unify canonical codex model facts#38173

Closed
who96 wants to merge 3 commits into openclaw:main from who96:codex/model-registry-phase1

Conversation


@who96 who96 commented Mar 6, 2026

User Impact

Users trying newer Codex models such as openai-codex/gpt-5.4 could see capability drift between what the catalog listed and what runtime actually used, especially around reasoning/image capability and forward-compat fallback behavior. This phase makes catalog and runtime agree on the same canonical codex facts.

Summary

Context

Test Plan

  • pnpm exec vitest run src/agents/model-catalog.test.ts src/agents/pi-embedded-runner/model.test.ts src/agents/model-selection.test.ts
  • pnpm tsgo

Related: #37804 (RFC: treat models.providers-injected models as first-class forward-compat models)
Ref: #37623 (openai-codex/gpt-5.4 configurable but not supported in runtime)

@openclaw-barnacle openclaw-barnacle bot added agents Agent runtime and tooling size: M labels Mar 6, 2026

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR completes phase 1 of RFC #37804 by extracting all canonical codex forward-compat model facts into a single model-facts.ts module and making both model-catalog and the pi-embedded-runner runtime resolution consume the same source of truth. The refactor is clean and well-tested; the previous gap where reasoning and input were silently inherited from a stale base model (flagged in a prior review round) is now correctly fixed.

Key observations:

  • loadCanonicalFactsModule() in both test files has a misleading .catch(() => null) guard: if the dynamic import throws, the expect(...).toBeTypeOf("function") call inside the if block will fail the test rather than skipping it — the return null below is unreachable dead code.
  • applyCanonicalForwardCompatCatalogEntries correctly applies reasoning and input from canonical facts, but never consults fallbackContextWindow/fallbackMaxTokens. For derived models like gpt-5.3-codex-spark whose canonical facts have no explicit contextWindow/maxTokens, the catalog entry will have undefined for those fields if the base model also lacks them, even though the canonical struct carries safe fallback values.
  • gpt-5.3-codex has no catalogBaseModelId, so applyCanonicalForwardCompatCatalogEntries skips it entirely; its canonical reasoning: true has no effect on the catalog. This is consistent with the intent (the base model comes from the SDK), but means catalog and canonical facts can silently disagree for the base model.
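The third observation — catalog and canonical facts silently disagreeing for the base model — could be caught by a small consistency check in tests. A minimal sketch, with hypothetical simplified shapes standing in for the PR's real catalog/facts types:

```typescript
// Hypothetical simplified shapes; the real types live in model-facts.ts
// and the catalog module of this PR.
interface CanonicalFacts {
  id: string;
  reasoning: boolean;
  input: string[];
}

interface CatalogEntry {
  id: string;
  reasoning: boolean;
  input: string[];
}

// Report which capability fields of a catalog entry contradict its
// canonical facts. An empty result means the two sources agree.
function findCapabilityDrift(entry: CatalogEntry, facts: CanonicalFacts): string[] {
  const drift: string[] = [];
  if (entry.reasoning !== facts.reasoning) drift.push("reasoning");
  const entryInput = [...entry.input].sort().join(",");
  const factsInput = [...facts.input].sort().join(",");
  if (entryInput !== factsInput) drift.push("input");
  return drift;
}
```

A guard test asserting `findCapabilityDrift(...)` is empty for every model that appears in both sources would make the base-model gap loud instead of silent.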

Confidence Score: 4/5

  • Safe to merge; the core refactor is correct and well-covered by tests, with only minor test-helper and catalog-fallback gaps.
  • The logic changes are correct: reasoning and input are now explicitly set from canonical facts in both catalog and runtime paths. The new tests verify gpt-5.4 end-to-end across both consumers. The two flagged issues (dead return null in test helpers, unused fallbackContextWindow in the catalog path) are non-critical and don't affect runtime correctness in the common case.
  • src/agents/model-facts.ts — verify whether fallbackContextWindow/fallbackMaxTokens should be applied as a floor in applyCanonicalForwardCompatCatalogEntries.

Comments Outside Diff (2)

  1. src/agents/model-catalog.test.ts, line 11-18 (link)

    return null is dead code — import failure causes a test failure, not a skip

    The .catch(() => null) pattern implies that a missing model-facts.js module is tolerated gracefully, but that's not what happens. When the import throws, module becomes null, so module?.getCanonicalForwardCompatModelFacts is undefined. The condition !undefined is true, so execution enters the if block and calls expect(undefined).toBeTypeOf("function") — which throws a test-failure assertion. The return null on the next line is unreachable dead code.

    If graceful skipping on a missing module is the intent, use Vitest's it.skipIf / it.runIf at the test level instead:

    const factsModule = await import("./model-facts.js").catch(() => null);
    const hasFactsModule = typeof factsModule?.getCanonicalForwardCompatModelFacts === "function";
    
    it.skipIf(!hasFactsModule)("matches the canonical codex facts layer for gpt-5.4", async () => { ... });

    If the intent is always to assert the module exists (fail loudly), remove the .catch(() => null) so the rejection is unambiguous:

    async function loadCanonicalFactsModule() {
      const module = await import("./model-facts.js");
      expect(module.getCanonicalForwardCompatModelFacts).toBeTypeOf("function");
      return module;
    }

    The same issue exists in src/agents/pi-embedded-runner/model.test.ts lines 19–27.

  2. src/agents/model-facts.ts, line 134-142 (link)

    fallbackContextWindow/fallbackMaxTokens are defined but never used in the catalog path

    CanonicalForwardCompatModelFacts has fallbackContextWindow and fallbackMaxTokens specifically for situations where no runtime-discovered value exists. The runtime path in pi-embedded-runner/model.ts (lines 69–70) uses them as the hard floor when no template model is available. However, applyCanonicalForwardCompatCatalogEntries never consults them.

    For derived models where facts.contextWindow/facts.maxTokens are undefined (e.g. gpt-5.3-codex-spark), the catalog entry inherits whatever the base model has. If the base gpt-5.3-codex is missing those fields, the derived spark entry also ends up with contextWindow: undefined even though canonical facts define a safe fallback.

    Consider applying the fallback values when the base model provides no context window:

    models.push({
      ...baseModel,
      id: facts.id,
      name: facts.name,
      reasoning: facts.reasoning,
      input: [...facts.input],
      contextWindow:
        typeof facts.contextWindow === "number"
          ? facts.contextWindow
          : (baseModel.contextWindow ?? facts.fallbackContextWindow),
      maxTokens:
        typeof facts.maxTokens === "number"
          ? facts.maxTokens
          : (baseModel.maxTokens ?? facts.fallbackMaxTokens),
    } as T);

Last reviewed commit: 8b666a6

Comment on lines +134 to +140
models.push({
...baseModel,
id: facts.id,
name: facts.name,
...(typeof facts.contextWindow === "number" ? { contextWindow: facts.contextWindow } : {}),
...(typeof facts.maxTokens === "number" ? { maxTokens: facts.maxTokens } : {}),
} as T);

reasoning and input canonical facts not applied to catalog entries

applyCanonicalForwardCompatCatalogEntries overrides contextWindow and maxTokens from canonical facts but silently inherits reasoning and input from the base model via the spread. This diverges from the full runtime fallback path (lines 58-69 in pi-embedded-runner/model.ts), which correctly uses facts.reasoning and facts.input.

Concrete example: the gpt-5.3-codex-spark canonical facts declare input: ["text", "image"] (model-facts.ts line 84), but the catalog test mocks the base gpt-5.3-codex with input: ["text"] only (model-catalog.test.ts line 111). When applyCanonicalForwardCompatCatalogEntries spreads the base model, the derived spark entry will inherit input: ["text"], silently contradicting the canonical fact. The test doesn't assert on input for the spark entry (lines 131-133), so this gap goes undetected. The same issue applies to gpt-5.4: the test doesn't verify that reasoning and input match the canonical facts (lines 177-183).

Consider applying canonical reasoning and input explicitly, just as they are applied in the full runtime fallback:

Suggested change
models.push({
...baseModel,
id: facts.id,
name: facts.name,
...(typeof facts.contextWindow === "number" ? { contextWindow: facts.contextWindow } : {}),
...(typeof facts.maxTokens === "number" ? { maxTokens: facts.maxTokens } : {}),
} as T);
models.push({
...baseModel,
id: facts.id,
name: facts.name,
reasoning: facts.reasoning,
input: [...facts.input],
...(typeof facts.contextWindow === "number" ? { contextWindow: facts.contextWindow } : {}),
...(typeof facts.maxTokens === "number" ? { maxTokens: facts.maxTokens } : {}),
} as T);

who96 (Author) replied:

Addressed in 61aa0b365d8fd63f660c7b2e97e3b0ffbf28e8d5. That commit applies canonical reasoning and input to catalog-derived entries and extends the spark / gpt-5.4 assertions to lock the behavior.


who96 commented Mar 6, 2026

@greptile-apps review - New commit applies canonical reasoning and input facts explicitly in applyCanonicalForwardCompatCatalogEntries, matching the runtime fallback path. Test assertions for spark and gpt-5.4 now verify reasoning and input values.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61aa0b365d


Comment on lines +53 to +54
...(typeof facts.contextWindow === "number" ? { contextWindow: facts.contextWindow } : {}),
...(typeof facts.maxTokens === "number" ? { maxTokens: facts.maxTokens } : {}),

P2 Badge Apply canonical reasoning/input in codex template fallback

When a Codex fallback is built from a discovered template, this branch only overrides contextWindow/maxTokens, so reasoning and input still come from the template model. If the template metadata is stale (for example, reports text-only or non-reasoning), resolveModel("openai-codex", "gpt-5.4", ...) will return capabilities that contradict the canonical facts layer, which can disable thinking/vision behavior for these forward-compat models.
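The fix the comment asks for can be sketched as a merge that takes capabilities from the canonical facts while still inheriting limits from the discovered template. A minimal sketch with hypothetical simplified types (the real builder lives in pi-embedded-runner/model.ts):

```typescript
type ModelInput = "text" | "image";

// Hypothetical simplified shapes for illustration only.
interface CanonicalFacts {
  id: string;
  name: string;
  reasoning: boolean;
  input: ModelInput[];
  contextWindow?: number;
  maxTokens?: number;
}

interface RuntimeModel {
  id: string;
  name: string;
  reasoning: boolean;
  input: ModelInput[];
  contextWindow?: number;
  maxTokens?: number;
}

// Build the forward-compat fallback from a discovered template model.
// Capability fields (reasoning, input) always come from canonical facts,
// so stale template metadata cannot disable thinking/vision; limits are
// overridden only when the facts define them explicitly.
function buildForwardCompatFromTemplate(
  template: RuntimeModel,
  facts: CanonicalFacts,
): RuntimeModel {
  return {
    ...template,
    id: facts.id,
    name: facts.name,
    reasoning: facts.reasoning,
    input: [...facts.input],
    ...(typeof facts.contextWindow === "number" ? { contextWindow: facts.contextWindow } : {}),
    ...(typeof facts.maxTokens === "number" ? { maxTokens: facts.maxTokens } : {}),
  };
}
```

With this shape, a text-only, non-reasoning template still yields a gpt-5.4 entry whose `reasoning` and `input` match the canonical facts layer.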


who96 (Author) replied:

Addressed in 8b666a6d8bb695ba601a14daaa07a8f19f12e1c5. That change renames the helper and applies canonical reasoning and input in the template fallback path so runtime fallback matches the canonical facts layer.


who96 commented Mar 6, 2026

@greptile-apps review - New commit 8b666a6 renames resolveOpenAICodexGpt53FallbackModel to resolveOpenAICodexForwardCompatModel and applies canonical reasoning/input in the template fallback path, addressing both Greptile comments.


who96 commented Mar 12, 2026

Closing — the core gpt-5.4 forward-compat fixes have landed via #37876, #38736, #39753, #39902, and #40160. The remaining reasoning-default and NO_REPLY work is covered by #37940 and #38232 respectively.

@who96 who96 closed this Mar 12, 2026
