Skip to content

Fix contextWindow collision across providers#9382

Closed
wbuntu wants to merge 4 commits intoopenclaw:mainfrom
wbuntu:fix/contextwindow-provider-collision
Closed

Fix contextWindow collision across providers#9382
wbuntu wants to merge 4 commits intoopenclaw:mainfrom
wbuntu:fix/contextwindow-provider-collision

Conversation

@wbuntu
Copy link
Copy Markdown

@wbuntu wbuntu commented Feb 5, 2026

Problem

When different providers expose the same model id (e.g. bigmodel/glm-4.7 and ark/glm-4.7), OpenClaw's context window inference could cache contextWindow by bare model id and effectively let one provider overwrite the other's value.

This can cause the wrong contextTokens to be persisted/displayed/used.

Root cause

lookupContextTokens populated an in-memory cache keyed only by model.id from the pi-coding-agent model registry. Provider was not part of the key, so ids could collide across providers.

Fix

  • Cache context windows by fully-qualified ref (<provider>/<id>) when provider is available.
  • Keep a bare-id cache only for ids that are unambiguous across providers.
  • Pass provider through call sites that already know it (status/session defaults/agent run persistence/cron followups).

Verification

A minimal container repro with fake providers that both define glm-4.7 but with different contextWindow values confirms that:

  • lookupContextTokens("glm-4.7", "bigmodel") returns 200000
  • lookupContextTokens("glm-4.7", "ark") returns 128000

and no longer collides.

Greptile Overview

Greptile Summary

This PR updates context-window inference to avoid collisions when multiple providers share the same model id. It changes the in-memory cache in src/agents/context.ts to key context windows by fully-qualified provider/model refs when available, while keeping a fallback bare-id cache only for ids that are unambiguous across providers. Call sites that already know the provider (auto-reply runner, status, session store updates, cron isolated agent, gateway defaults, sessions command) now pass the provider into lookupContextTokens() so the correct context window is persisted and displayed per provider.

Confidence Score: 3/5

  • Reasonably safe to merge after addressing remaining provider-less lookups
  • Core change (cache by provider/model ref) is straightforward and call sites were updated, but several existing code paths still call lookupContextTokens() without a provider, which can reintroduce incorrect context windows or return undefined for ambiguous ids. There is also a key normalization inconsistency for ref strings containing whitespace.
  • src/auto-reply/status.ts, src/commands/status.summary.ts, src/commands/sessions.ts, src/auto-reply/reply/model-selection.ts, src/auto-reply/reply/memory-flush.ts, src/auto-reply/reply/directive-handling.persist.ts, src/agents/context.ts

(2/5) Greptile learns from your feedback when you react with thumbs up/down!

Greptile Overview

Greptile Summary

This PR updates context-window inference to avoid collisions when multiple providers share the same model id. It changes the in-memory cache in src/agents/context.ts to key context windows by fully-qualified provider/model refs when available, while keeping a fallback bare-id cache only for ids that are unambiguous across providers. Call sites that already know the provider (agent run persistence, followups, status, sessions, gateway defaults, cron isolated agent) now pass the provider into lookupContextTokens() so the correct context window is persisted and displayed per provider.

Confidence Score: 4/5

  • This PR is likely safe to merge once the already-noted ref-key normalization mismatch is addressed.
  • The core change (caching context windows by provider/model ref and only using bare ids when unambiguous) is localized and the updated call sites pass provider through correctly. Remaining provider-less lookup paths exist, but they were not modified in this PR; the main merge blocker is the normalization mismatch already raised in prior threads.
  • src/agents/context.ts

Copilot AI review requested due to automatic review settings February 5, 2026 05:12
@openclaw-barnacle openclaw-barnacle bot added gateway Gateway runtime commands Command implementations agents Agent runtime and tooling labels Feb 5, 2026
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +75 to +78
// If caller passed a fully-qualified ref already.
if (raw.includes("/")) {
return MODEL_CACHE_BY_REF.get(raw);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ref key normalization mismatch

lookupContextTokens() treats any modelIdOrRef containing / as a fully-qualified ref and does MODEL_CACHE_BY_REF.get(raw) without normalizing whitespace/case, but the cache keys are built via modelRefKey(provider.trim(), id) during load. If callers pass refs with extra spaces (e.g. from config/user input), lookups will fail even though the model exists. Consider normalizing the ref path (trim segments / maybe normalize provider) before get() to match how keys are stored.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/context.ts
Line: 75:78

Comment:
**Ref key normalization mismatch**

`lookupContextTokens()` treats any `modelIdOrRef` containing `/` as a fully-qualified ref and does `MODEL_CACHE_BY_REF.get(raw)` without normalizing whitespace/case, but the cache keys are built via `modelRefKey(provider.trim(), id)` during load. If callers pass refs with extra spaces (e.g. from config/user input), lookups will fail even though the model exists. Consider normalizing the ref path (trim segments / maybe normalize provider) before `get()` to match how keys are stored.

How can I resolve this? If you propose a fix, please make it concise.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Feb 5, 2026

Additional Comments (1)

src/auto-reply/status.ts
Provider-less lookup reintroduces collision

In the transcript-usage fallback, contextTokens = lookupContextTokens(logUsage.model) ?? contextTokens; drops the provider, so if logUsage.model is a bare id that exists across providers, this can still pick the wrong cached context window (or now return undefined if ambiguous). This undermines the PR’s goal when includeTranscriptUsage is enabled and the session’s provider differs.

Also appears in src/commands/status.summary.ts:77-101, src/commands/sessions.ts:193-231, src/auto-reply/reply/model-selection.ts:504-510, src/auto-reply/reply/memory-flush.ts:62-68, src/auto-reply/reply/directive-handling.persist.ts:223-227.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/auto-reply/status.ts
Line: 313:316

Comment:
**Provider-less lookup reintroduces collision**

In the transcript-usage fallback, `contextTokens = lookupContextTokens(logUsage.model) ?? contextTokens;` drops the provider, so if `logUsage.model` is a bare id that exists across providers, this can still pick the wrong cached context window (or now return `undefined` if ambiguous). This undermines the PR’s goal when `includeTranscriptUsage` is enabled and the session’s provider differs.

Also appears in `src/commands/status.summary.ts:77-101`, `src/commands/sessions.ts:193-231`, `src/auto-reply/reply/model-selection.ts:504-510`, `src/auto-reply/reply/memory-flush.ts:62-68`, `src/auto-reply/reply/directive-handling.persist.ts:223-227`.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug where OpenClaw's context window cache could collide when different providers expose the same model ID. The previous implementation cached context windows only by model ID, allowing one provider's value to overwrite another's.

Changes:

  • Refactored context window caching to use two caches: one keyed by fully-qualified model references (provider/id) and one for bare IDs that are unambiguous across providers
  • Updated all relevant call sites to pass provider information to lookupContextTokens
  • Added collision detection logic to identify model IDs with conflicting context window values across providers

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/agents/context.ts Core fix: implements dual-cache strategy with provider-qualified keys and ambiguity detection
src/gateway/session-utils.ts Passes provider to context token lookup for session defaults
src/cron/isolated-agent/run.ts Passes provider to context token lookup for cron agent runs
src/commands/sessions.ts Passes provider to context token lookup for sessions command
src/commands/agent/session-store.ts Passes provider to context token lookup when updating session store
src/auto-reply/status.ts Passes provider to context token lookup in status message building
src/auto-reply/reply/followup-runner.ts Passes provider to context token lookup for followup runs
src/auto-reply/reply/agent-runner.ts Passes provider to context token lookup for agent runs

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +196 to +199
const modelUsed = runResult.meta.agentMeta?.model ?? fallbackModel ?? defaultModel;
const contextTokensUsed =
agentCfgContextTokens ??
lookupContextTokens(modelUsed) ??
lookupContextTokens(modelUsed, fallbackProvider) ??
Copy link

Copilot AI Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent provider resolution compared to other files. In agent-runner.ts (line 370), cron/isolated-agent/run.ts (line 388), and commands/agent/session-store.ts (line 41), the providerUsed is determined by checking runResult.meta.agentMeta?.provider first before falling back to fallbackProvider. However, in this file, fallbackProvider is used directly without checking if runResult.meta.agentMeta?.provider exists.

For consistency and correctness, consider adding a providerUsed variable similar to line 196's modelUsed pattern.

Copilot uses AI. Check for mistakes.
lookupContextTokens previously keyed context windows by bare model id, which breaks when different providers expose the same id (e.g. bigmodel/glm-4.7 vs ark/glm-4.7). Cache by provider/id and only fall back to bare ids when unambiguous.
@wbuntu wbuntu force-pushed the fix/contextwindow-provider-collision branch from 77e55e0 to 894a4c7 Compare February 5, 2026 05:22
- Avoid single-line if statements without braces (curly)
- Remove unused import in cron runner
@wbuntu
Copy link
Copy Markdown
Author

wbuntu commented Feb 5, 2026

macOS runner still queued/running on CI (macos-app build/lint/test + checks-macos). Waiting for macOS capacity.

@wbuntu wbuntu marked this pull request as draft February 12, 2026 01:49
@wbuntu wbuntu marked this pull request as ready for review February 12, 2026 01:50
@sebslight
Copy link
Copy Markdown
Member

Closing as duplicate of #14744. If this is incorrect, comment and we can reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling commands Command implementations gateway Gateway runtime

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants