
refactor(agents): unify canonical codex model facts#38173

Closed
who96 wants to merge 3 commits into openclaw:main from who96:codex/model-registry-phase1

Conversation


@who96 who96 commented Mar 6, 2026

User Impact

Users trying newer Codex models such as openai-codex/gpt-5.4 could see capability drift between what the catalog listed and what runtime actually used, especially around reasoning/image capability and forward-compat fallback behavior. This phase makes catalog and runtime agree on the same canonical codex facts.

Summary

Context

Test Plan

  • pnpm exec vitest run src/agents/model-catalog.test.ts src/agents/pi-embedded-runner/model.test.ts src/agents/model-selection.test.ts
  • pnpm tsgo

Related: #37804 (RFC: treat models.providers-injected models as first-class forward-compat models)
Ref: #37623 (openai-codex/gpt-5.4 configurable but not supported in runtime)

@openclaw-barnacle openclaw-barnacle bot added agents Agent runtime and tooling size: M labels Mar 6, 2026

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR completes phase 1 of RFC #37804 by extracting all canonical codex forward-compat model facts into a single model-facts.ts module and making both model-catalog and the pi-embedded-runner runtime resolution consume the same source of truth. The refactor is clean and well-tested; the previous gap where reasoning and input were silently inherited from a stale base model (flagged in a prior review round) is now correctly fixed.

Key observations:

  • loadCanonicalFactsModule() in both test files has a misleading .catch(() => null) guard: if the dynamic import throws, the expect(...).toBeTypeOf("function") call inside the if block will fail the test rather than skipping it — the return null below is unreachable dead code.
  • applyCanonicalForwardCompatCatalogEntries correctly applies reasoning and input from canonical facts, but never consults fallbackContextWindow/fallbackMaxTokens. For derived models like gpt-5.3-codex-spark whose canonical facts have no explicit contextWindow/maxTokens, the catalog entry will have undefined for those fields if the base model also lacks them, even though the canonical struct carries safe fallback values.
  • gpt-5.3-codex has no catalogBaseModelId, so applyCanonicalForwardCompatCatalogEntries skips it entirely; its canonical reasoning: true has no effect on the catalog. This is consistent with the intent (the base model comes from the SDK), but means catalog and canonical facts can silently disagree for the base model.
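The third observation — catalog and canonical facts silently disagreeing for the base model — could be caught by a small consistency check in tests. A minimal sketch, with hypothetical simplified shapes standing in for the PR's real catalog/facts types:

```typescript
// Hypothetical simplified shapes; the real types live in model-facts.ts
// and the catalog module of this PR.
interface CanonicalFacts {
  id: string;
  reasoning: boolean;
  input: string[];
}

interface CatalogEntry {
  id: string;
  reasoning: boolean;
  input: string[];
}

// Report which capability fields of a catalog entry contradict its
// canonical facts. An empty result means the two sources agree.
function findCapabilityDrift(entry: CatalogEntry, facts: CanonicalFacts): string[] {
  const drift: string[] = [];
  if (entry.reasoning !== facts.reasoning) drift.push("reasoning");
  const entryInput = [...entry.input].sort().join(",");
  const factsInput = [...facts.input].sort().join(",");
  if (entryInput !== factsInput) drift.push("input");
  return drift;
}
```

A guard test asserting `findCapabilityDrift(...)` is empty for every model that appears in both sources would make the base-model gap loud instead of silent.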

Confidence Score: 4/5

  • Safe to merge; the core refactor is correct and well-covered by tests, with only minor test-helper and catalog-fallback gaps.
  • The logic changes are correct: reasoning and input are now explicitly set from canonical facts in both catalog and runtime paths. The new tests verify gpt-5.4 end-to-end across both consumers. The two flagged issues (dead return null in test helpers, unused fallbackContextWindow in the catalog path) are non-critical and don't affect runtime correctness in the common case.
  • src/agents/model-facts.ts — verify whether fallbackContextWindow/fallbackMaxTokens should be applied as a floor in applyCanonicalForwardCompatCatalogEntries.

Comments Outside Diff (2)

  1. src/agents/model-catalog.test.ts, line 11-18 (link)

    return null is dead code — import failure causes a test failure, not a skip

    The .catch(() => null) pattern implies that a missing model-facts.js module is tolerated gracefully, but that's not what happens. When the import throws, module becomes null, so module?.getCanonicalForwardCompatModelFacts is undefined. The condition !undefined is true, so execution enters the if block and calls expect(undefined).toBeTypeOf("function") — which throws a test-failure assertion. The return null on the next line is unreachable dead code.

    If graceful skipping on a missing module is the intent, use Vitest's it.skipIf / it.runIf at the test level instead:

    const factsModule = await import("./model-facts.js").catch(() => null);
    const hasFactsModule = typeof factsModule?.getCanonicalForwardCompatModelFacts === "function";
    
    it.skipIf(!hasFactsModule)("matches the canonical codex facts layer for gpt-5.4", async () => { ... });

    If the intent is always to assert the module exists (fail loudly), remove the .catch(() => null) so the rejection is unambiguous:

    async function loadCanonicalFactsModule() {
      const module = await import("./model-facts.js");
      expect(module.getCanonicalForwardCompatModelFacts).toBeTypeOf("function");
      return module;
    }

    The same issue exists in src/agents/pi-embedded-runner/model.test.ts lines 19–27.

  2. src/agents/model-facts.ts, line 134-142 (link)

    fallbackContextWindow/fallbackMaxTokens are defined but never used in the catalog path

    CanonicalForwardCompatModelFacts has fallbackContextWindow and fallbackMaxTokens specifically for situations where no runtime-discovered value exists. The runtime path in pi-embedded-runner/model.ts (lines 69–70) uses them as the hard floor when no template model is available. However, applyCanonicalForwardCompatCatalogEntries never consults them.

    For derived models where facts.contextWindow/facts.maxTokens are undefined (e.g. gpt-5.3-codex-spark), the catalog entry inherits whatever the base model has. If the base gpt-5.3-codex is missing those fields, the derived spark entry also ends up with contextWindow: undefined even though canonical facts define a safe fallback.

    Consider applying the fallback values when the base model provides no context window:

    models.push({
      ...baseModel,
      id: facts.id,
      name: facts.name,
      reasoning: facts.reasoning,
      input: [...facts.input],
      contextWindow:
        typeof facts.contextWindow === "number"
          ? facts.contextWindow
          : (baseModel.contextWindow ?? facts.fallbackContextWindow),
      maxTokens:
        typeof facts.maxTokens === "number"
          ? facts.maxTokens
          : (baseModel.maxTokens ?? facts.fallbackMaxTokens),
    } as T);

Last reviewed commit: 8b666a6

Comment on lines +134 to +140
models.push({
...baseModel,
id: facts.id,
name: facts.name,
...(typeof facts.contextWindow === "number" ? { contextWindow: facts.contextWindow } : {}),
...(typeof facts.maxTokens === "number" ? { maxTokens: facts.maxTokens } : {}),
} as T);

reasoning and input canonical facts not applied to catalog entries

applyCanonicalForwardCompatCatalogEntries overrides contextWindow and maxTokens from canonical facts but silently inherits reasoning and input from the base model via the spread. This diverges from the full runtime fallback path (lines 58-69 in pi-embedded-runner/model.ts), which correctly uses facts.reasoning and facts.input.

Concrete example: the gpt-5.3-codex-spark canonical facts declare input: ["text", "image"] (model-facts.ts line 84), but the catalog test mocks the base gpt-5.3-codex with input: ["text"] only (model-catalog.test.ts line 111). When applyCanonicalForwardCompatCatalogEntries spreads the base model, the derived spark entry will inherit input: ["text"], silently contradicting the canonical fact. The test doesn't assert on input for the spark entry (lines 131-133), so this gap goes undetected. The same issue applies to gpt-5.4: the test doesn't verify that reasoning and input match the canonical facts (lines 177-183).

Consider applying canonical reasoning and input explicitly, just as they are applied in the full runtime fallback:

Suggested change
models.push({
...baseModel,
id: facts.id,
name: facts.name,
...(typeof facts.contextWindow === "number" ? { contextWindow: facts.contextWindow } : {}),
...(typeof facts.maxTokens === "number" ? { maxTokens: facts.maxTokens } : {}),
} as T);
models.push({
...baseModel,
id: facts.id,
name: facts.name,
reasoning: facts.reasoning,
input: [...facts.input],
...(typeof facts.contextWindow === "number" ? { contextWindow: facts.contextWindow } : {}),
...(typeof facts.maxTokens === "number" ? { maxTokens: facts.maxTokens } : {}),
} as T);

who96 (Author) replied:

Addressed in 61aa0b365d8fd63f660c7b2e97e3b0ffbf28e8d5. That commit applies canonical reasoning and input to catalog-derived entries and extends the spark / gpt-5.4 assertions to lock the behavior.


who96 commented Mar 6, 2026

@greptile-apps review - New commit applies canonical reasoning and input facts explicitly in applyCanonicalForwardCompatCatalogEntries, matching the runtime fallback path. Test assertions for spark and gpt-5.4 now verify reasoning and input values.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 61aa0b365d


Comment on lines +53 to +54
...(typeof facts.contextWindow === "number" ? { contextWindow: facts.contextWindow } : {}),
...(typeof facts.maxTokens === "number" ? { maxTokens: facts.maxTokens } : {}),

P2 Badge Apply canonical reasoning/input in codex template fallback

When a Codex fallback is built from a discovered template, this branch only overrides contextWindow/maxTokens, so reasoning and input still come from the template model. If the template metadata is stale (for example, reports text-only or non-reasoning), resolveModel("openai-codex", "gpt-5.4", ...) will return capabilities that contradict the canonical facts layer, which can disable thinking/vision behavior for these forward-compat models.
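The fix the comment asks for can be sketched as a merge that takes capabilities from the canonical facts while still inheriting limits from the discovered template. A minimal sketch with hypothetical simplified types (the real builder lives in pi-embedded-runner/model.ts):

```typescript
type ModelInput = "text" | "image";

// Hypothetical simplified shapes for illustration only.
interface CanonicalFacts {
  id: string;
  name: string;
  reasoning: boolean;
  input: ModelInput[];
  contextWindow?: number;
  maxTokens?: number;
}

interface RuntimeModel {
  id: string;
  name: string;
  reasoning: boolean;
  input: ModelInput[];
  contextWindow?: number;
  maxTokens?: number;
}

// Build the forward-compat fallback from a discovered template model.
// Capability fields (reasoning, input) always come from canonical facts,
// so stale template metadata cannot disable thinking/vision; limits are
// overridden only when the facts define them explicitly.
function buildForwardCompatFromTemplate(
  template: RuntimeModel,
  facts: CanonicalFacts,
): RuntimeModel {
  return {
    ...template,
    id: facts.id,
    name: facts.name,
    reasoning: facts.reasoning,
    input: [...facts.input],
    ...(typeof facts.contextWindow === "number" ? { contextWindow: facts.contextWindow } : {}),
    ...(typeof facts.maxTokens === "number" ? { maxTokens: facts.maxTokens } : {}),
  };
}
```

With this shape, a text-only, non-reasoning template still yields a gpt-5.4 entry whose `reasoning` and `input` match the canonical facts layer.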


who96 (Author) replied:

Addressed in 8b666a6d8bb695ba601a14daaa07a8f19f12e1c5. That change renames the helper and applies canonical reasoning and input in the template fallback path so runtime fallback matches the canonical facts layer.


who96 commented Mar 6, 2026

@greptile-apps review - New commit 8b666a6 renames resolveOpenAICodexGpt53FallbackModel to resolveOpenAICodexForwardCompatModel and applies canonical reasoning/input in the template fallback path, addressing both Greptile comments.


who96 commented Mar 12, 2026

Closing — the core gpt-5.4 forward-compat fixes have landed via #37876, #38736, #39753, #39902, and #40160. The remaining reasoning-default and NO_REPLY work is covered by #37940 and #38232 respectively.

@who96 who96 closed this Mar 12, 2026
