Skip to content

fix(model-capabilities): harden canonical alias guardrails#2834

Merged
code-yeongyu merged 3 commits intocode-yeongyu:devfrom
RaviTharuma:feat/model-capabilities-canonical-guardrails
Mar 25, 2026
Merged

fix(model-capabilities): harden canonical alias guardrails#2834
code-yeongyu merged 3 commits intocode-yeongyu:devfrom
RaviTharuma:feat/model-capabilities-canonical-guardrails

Conversation

@RaviTharuma
Copy link
Copy Markdown
Contributor

@RaviTharuma RaviTharuma commented Mar 25, 2026

Summary

  • add model-capability guardrail validation for alias drift and built-in requirement drift
  • replace the broad Claude -thinking pattern alias with an explicit legacy alias for claude-opus-4-6-thinking
  • drop the stale gpt-5.3-codex-spark remap now that it exists in the bundled snapshot
  • add a maintainer-facing maintenance note and run the guardrail suite in the scheduled models.dev refresh workflow

Why

The current models.dev integration is much better than the old hardcoded family logic, but the remaining risk is silent alias drift over time. This change makes that drift fail loudly in CI instead of quietly turning into long-term maintenance debt.

Verification

  • bun run test:model-capabilities
  • bun test src/plugin/chat-params.test.ts src/hooks/model-fallback/hook.test.ts src/plugin/event.model-fallback.test.ts src/features/background-agent/manager.test.ts --bail
  • bunx tsc --noEmit
  • bun run build

Summary by cubic

Hardened model capability alias guardrails to keep built‑in requirements canonical and catch alias drift or collisions with the bundled models.dev snapshot. Aligned Gemini tier aliases to gemini-3.1-pro, replaced the Claude “thinking” pattern with a single explicit legacy alias, and removed a stale GPT remap.

  • New Features

    • Added guardrail checks for alias target existence, alias/snapshot collisions, and canonical built‑in models; runs in the scheduled refresh workflow.
    • Added a maintainer doc outlining alias policy and maintenance steps.
  • Refactors

    • Replaced the -thinking pattern with an explicit legacy alias claude-opus-4-6-thinkingclaude-opus-4-6, and aligned Gemini tier aliases to gemini-3.1-pro.
    • Hardened alias lookup to avoid resolving prototype keys as aliases; removed the gpt-5.3-codex-spark alias now covered by the bundled snapshot.
    • Isolated Atlas SQLite test mocks to avoid shared barrel pollution.

Written for commit ce877ec. Summary will update on new commits.

@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@RaviTharuma
Copy link
Copy Markdown
Contributor Author

@code-yeongyu this is the follow-up hardening pass for the models.dev capability work.

The key change is not more aliasing, but tighter maintenance guardrails:

  • exact aliases are now inspectable and validated against the bundled snapshot
  • the broad Claude -thinking pattern alias is gone; it is now a single explicit legacy alias for claude-opus-4-6-thinking
  • the stale gpt-5.3-codex-spark remap was removed because it is now a real snapshot model
  • the scheduled snapshot refresh workflow now runs the guardrail suite before opening an automation PR

This should make future alias/snapshot drift fail loudly in CI instead of quietly accumulating maintenance debt.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 8 files

Confidence score: 2/5

  • High-confidence, high-severity issue in src/shared/model-capability-aliases.ts: removing the gpt-5.3-codex-spark alias is likely to break model resolution for Opencode compatibility.
  • Because the alias is missing from the bundled Opencode snapshot, this change introduces a concrete regression risk rather than a minor cleanup concern.
  • Pay close attention to src/shared/model-capability-aliases.ts - ensure gpt-5.3-codex-spark remains mapped so model lookup does not fail.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/shared/model-capability-aliases.ts">

<violation number="1" location="src/shared/model-capability-aliases.ts:22">
P1: Custom agent: **Opencode Compatibility**

Do not remove the `gpt-5.3-codex-spark` alias. It does not exist in the bundled Opencode snapshot, so removing this alias will cause model resolution to fail.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@RaviTharuma
Copy link
Copy Markdown
Contributor Author

@cubic-dev-ai

check my replies and recheck everything

@cubic-dev-ai
Copy link
Copy Markdown

cubic-dev-ai bot commented Mar 25, 2026

@cubic-dev-ai

check my replies and recheck everything

@RaviTharuma I have started the AI code review. It will take a few minutes to complete.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 8 files

Confidence score: 3/5

  • There is concrete regression risk in src/shared/model-capability-aliases.ts: the canonical ID still points to gemini-3.1-pro-preview, while upstream Opencode expects gemini-3.1-pro, which can break compatibility and model resolution behavior.
  • A prototype-backed dictionary lookup in src/shared/model-capability-aliases.ts may match inherited keys (like constructor), potentially returning an undefined canonicalModelID for invalid aliases.
  • Given the high-confidence compatibility drift (8/10) plus a user-impacting alias-resolution edge case (6/10), this is mergeable only with moderate caution until these mappings/lookups are corrected.
  • Pay close attention to src/shared/model-capability-aliases.ts - canonical model mapping drift and unsafe direct dictionary lookup can cause incorrect or undefined alias resolution.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/shared/model-capability-aliases.ts">

<violation number="1" location="src/shared/model-capability-aliases.ts:24">
P1: Custom agent: **Opencode Compatibility**

Update the canonical model ID to `gemini-3.1-pro`. The upstream Opencode repository no longer includes a `gemini-3.1-pro-preview` model, making this a drift from the canonical definitions.</violation>

<violation number="2" location="src/shared/model-capability-aliases.ts:30">
P1: Custom agent: **Opencode Compatibility**

Update the canonical model ID to `gemini-3.1-pro` to match the current upstream Opencode definitions.</violation>

<violation number="3" location="src/shared/model-capability-aliases.ts:67">
P2: Direct lookup on a prototype-backed dictionary can match inherited keys (e.g., `constructor`), causing invalid alias resolution and returning `canonicalModelID` as `undefined`.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

@RaviTharuma RaviTharuma force-pushed the feat/model-capabilities-canonical-guardrails branch from 5cf548e to 5043cc2 Compare March 25, 2026 21:13
@RaviTharuma
Copy link
Copy Markdown
Contributor Author

@code-yeongyu @cubic-dev-ai I rebased this PR onto current dev after #2832 merged, then pushed ec20a82b to address the new review follow-up.

Changes in the new head:

  • Gemini 3.1 tier aliases now normalize to gemini-3.1-pro, which matches current canonical OmO IDs on dev
  • exact alias lookup now uses a Map, with a regression test covering constructor
  • targeted model-capability/doctor tests and bunx tsc --noEmit passed locally after the update

All current cubic threads on this PR are replied to and resolved.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 4 files (changes from recent commits).

Requires human review: Narrowing the Claude thinking alias from a broad pattern to a single exact alias (claude-opus-4-6-thinking) could cause regressions for other Claude models using that suffix.

@RaviTharuma
Copy link
Copy Markdown
Contributor Author

@code-yeongyu I tracked the remaining red test check down to Atlas test isolation, not the model-capability code. src/hooks/atlas/session-last-agent.sqlite.test.ts was mocking the entire ../../shared barrel, which can hide createInternalAgentTextPart from other Atlas test files depending on Bun module load order. I narrowed that test to mock only the three concrete modules it needs and pushed ce877ec0.

Re-verified locally on the new head:

  • bun test src/plugin-handlers
  • bun test src/hooks/atlas --bail
  • bunx tsc --noEmit
  • git diff --check

This should let the PR rerun cleanly on GitHub without touching product behavior.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 1 file (changes from recent commits).

Auto-approved: This PR improves stability by adding explicit guardrails for model aliases and canonical IDs. It includes comprehensive tests to prevent regressions in model resolution.

@RaviTharuma
Copy link
Copy Markdown
Contributor Author

@code-yeongyu current head ce877ec0 is now fully green and approved on my side:

  • mergeStateStatus: CLEAN
  • CI: test, typecheck, build, actionlint, cla, GitGuardian, and cubic all passed
  • all review threads on this PR are resolved

This should be ready for normal maintainer merge whenever you are happy with it.

@code-yeongyu
Copy link
Copy Markdown
Owner

Architecture review:

Ravi's concern (#2835) is valid. The explicit alias approach in this PR loses generality — a future Claude variant won't auto-match the pattern anymore.

However: This is the right trade-off for this specific layer. Reason:

  1. Aliases should be boring — not a general-purpose transformation engine. Explicit rules force us to ship known variants consciously.

  2. The real gain is layering — if we adopt Ravi's proposed split (execution model ID vs. canonical capability ID), pattern aliases can still live at the edge/intake layer (provider-specific model parsing), while canonical aliases stay small + guardrailed in the core.

  3. fix(model-capabilities): harden canonical alias guardrails #2834 is correct as-is for the guardrail layer. It hardenes this specific boundary.

Recommendation:

This PR is clean. The discussion is the right follow-up.

@code-yeongyu code-yeongyu merged commit 02ab83f into code-yeongyu:dev Mar 25, 2026
9 checks passed
@RaviTharuma
Copy link
Copy Markdown
Contributor Author

RaviTharuma commented Mar 25, 2026

Aliases should be boring — not a general-purpose transformation engine. Explicit rules force us to ship known variants consciously.

@code-yeongyu this is exactly wha i am not sure about.
should we build it dynamically so that we don't have to maintain it and avoid the debt.

or should we make it simple and more static and accept the maintenance debt?

i am not sure what the right way is, that's why i also opened a discussion.

because it's a philosophical decision that will also affect other refactors and additions.

this is your decision 👍

code-yeongyu added a commit that referenced this pull request Mar 26, 2026
Move from hardcoded exact aliases to pattern-based canonicalization:

- Populate PATTERN_ALIAS_RULES with regex patterns for:
  - Claude thinking variants (claude-opus-4-6-thinking → claude-opus-4-6)
  - Gemini tier suffixes (gemini-3.1-pro-{high,low} → gemini-3.1-pro)
- Add stripProviderPrefixForAliasLookup() for provider-prefixed models
  (anthropic/claude-sonnet-4-6 → claude-sonnet-4-6 for capability lookup)
- Preserve requestedModelID (with prefix) for API transport
- Reduce EXACT_ALIAS_RULES to exceptional cases only
  (gemini-3-pro-{high,low} → gemini-3-pro-preview)
- Comprehensive test coverage for patterns, prefix stripping, negatives

Addresses Discussion #2835 (pattern matching architecture)
Related to PR #2834 (alias guardrails)

41 targeted tests pass, 4467 full suite tests pass, tsc clean.
@RaviTharuma RaviTharuma deleted the feat/model-capabilities-canonical-guardrails branch March 28, 2026 08:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants