Context window cache uses min-across-providers for same model id, causing incorrect /status display #35976
Description
Summary
/status shows an incorrect (lower) context window for models that appear under multiple providers in the built-in catalog. The context window cache in src/agents/context.ts uses only the model id as cache key, so when the same model id exists under different providers with different context limits, the smallest value wins.
Reproduction
- Configure an agent with google-gemini-cli/gemini-3.1-pro-preview as primary model
- Run /status
- Context line shows 0/128k instead of 0/1024k
Root Cause
In src/agents/context.ts, the cache population loop iterates over all providers and uses model.id as the cache key:
// L44-48
const existing = params.cache.get(model.id);
// When multiple providers expose the same model id with different limits,
// prefer the smaller window so token budgeting is fail-safe (no overestimation).
if (existing === undefined || contextWindow < existing) {
params.cache.set(model.id, contextWindow);
}
gemini-3.1-pro-preview appears in the pi-ai built-in catalog under multiple providers:
- github-copilot → contextWindow: 128,000
- google-gemini-cli → contextWindow: 1,048,576
Since the cache takes the minimum, 128k is stored.
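The collision can be reproduced in isolation. The following is a minimal sketch, not the actual code in src/agents/context.ts: the CatalogEntry type and the populateByModelId function are hypothetical stand-ins, and only the two catalog values and the min-wins comparison are taken from this report.

```typescript
// Hypothetical, simplified stand-in for a catalog entry (not the real pi-ai type).
type CatalogEntry = { provider: string; id: string; contextWindow: number };

// The two entries described in this issue.
const catalog: CatalogEntry[] = [
  { provider: "github-copilot", id: "gemini-3.1-pro-preview", contextWindow: 128_000 },
  { provider: "google-gemini-cli", id: "gemini-3.1-pro-preview", contextWindow: 1_048_576 },
];

// Mirrors the min-wins logic quoted above: the cache is keyed by model id only.
function populateByModelId(entries: CatalogEntry[]): Map<string, number> {
  const cache = new Map<string, number>();
  for (const model of entries) {
    const existing = cache.get(model.id);
    if (existing === undefined || model.contextWindow < existing) {
      cache.set(model.id, model.contextWindow);
    }
  }
  return cache;
}

const flatCache = populateByModelId(catalog);
// Both providers collapse into one entry, and the smaller window wins.
console.log(flatCache.get("gemini-3.1-pro-preview")); // 128000
```

The result is the same regardless of the order in which providers are iterated, which is why the google-gemini-cli value can never surface through this cache.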
Impact
- Display only: /status shows the wrong context window
- No runtime impact: pi-embedded-runner/model.ts resolves contextWindow from the correct provider config at runtime, so actual context capacity and compaction thresholds are unaffected
Workaround
Manually set contextWindow in config to override the cache:
{
"agents": {
"defaults": {
"models": {
"google-gemini-cli/gemini-3.1-pro-preview": {
"contextWindow": 1048576
}
}
}
}
}
Suggested Fix
Change the cache key from model.id to provider/model.id (or equivalent composite key) so that different providers with different context limits for the same model id do not collide. The lookupContextTokens caller would then need to pass the provider as well.
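A rough sketch of the composite-key approach, under stated assumptions: ProviderModel, contextCacheKey, populateContextCache, and lookupContextTokensByProvider are hypothetical names chosen for illustration, and the real lookupContextTokens signature may differ.

```typescript
// Hypothetical, simplified stand-in for a catalog entry.
type ProviderModel = { provider: string; id: string; contextWindow: number };

// Hypothetical helper: builds the composite "provider/modelId" key.
function contextCacheKey(provider: string, modelId: string): string {
  return `${provider}/${modelId}`;
}

// Each provider/model pair gets its own entry, so no min-across-providers is needed.
function populateContextCache(entries: ProviderModel[]): Map<string, number> {
  const cache = new Map<string, number>();
  for (const model of entries) {
    cache.set(contextCacheKey(model.provider, model.id), model.contextWindow);
  }
  return cache;
}

// Assumed shape of a provider-aware lookup; the caller must now supply the provider.
function lookupContextTokensByProvider(
  cache: Map<string, number>,
  provider: string,
  modelId: string,
): number | undefined {
  return cache.get(contextCacheKey(provider, modelId));
}

const providerCache = populateContextCache([
  { provider: "github-copilot", id: "gemini-3.1-pro-preview", contextWindow: 128_000 },
  { provider: "google-gemini-cli", id: "gemini-3.1-pro-preview", contextWindow: 1_048_576 },
]);

console.log(lookupContextTokensByProvider(providerCache, "google-gemini-cli", "gemini-3.1-pro-preview")); // 1048576
console.log(lookupContextTokensByProvider(providerCache, "github-copilot", "gemini-3.1-pro-preview")); // 128000
```

One design point to settle is what callers without a known provider should receive; keeping a separate model-id-only fallback with the existing fail-safe minimum would preserve the current behavior for them.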
Alternatively, resolveContextTokensForModel could consult the model registry directly (which is provider-aware) instead of the flat cache.
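The registry-based alternative might look roughly like the sketch below. The ModelRegistryLike interface, its find method, and resolveContextTokensSketch are assumptions made for illustration only; the actual model registry API and the real resolveContextTokensForModel signature are not shown in this report.

```typescript
// Hypothetical shapes; the real registry types are not shown in this issue.
interface RegistryModel {
  provider: string;
  id: string;
  contextWindow?: number;
}

interface ModelRegistryLike {
  find(provider: string, modelId: string): RegistryModel | undefined;
}

// Prefer the provider-aware registry; fall back to the flat, model-id-keyed cache.
function resolveContextTokensSketch(
  registry: ModelRegistryLike,
  flatContextCache: Map<string, number>,
  provider: string,
  modelId: string,
): number | undefined {
  const fromRegistry = registry.find(provider, modelId)?.contextWindow;
  return fromRegistry ?? flatContextCache.get(modelId);
}

// Tiny in-memory registry used only to exercise the sketch.
const registryModels: RegistryModel[] = [
  { provider: "github-copilot", id: "gemini-3.1-pro-preview", contextWindow: 128_000 },
  { provider: "google-gemini-cli", id: "gemini-3.1-pro-preview", contextWindow: 1_048_576 },
];
const demoRegistry: ModelRegistryLike = {
  find: (provider, modelId) =>
    registryModels.find((m) => m.provider === provider && m.id === modelId),
};

// The flat cache still holds the collided minimum from the current behavior.
const collidedCache = new Map<string, number>([["gemini-3.1-pro-preview", 128_000]]);

console.log(
  resolveContextTokensSketch(demoRegistry, collidedCache, "google-gemini-cli", "gemini-3.1-pro-preview"),
); // 1048576
```

This variant leaves the existing cache untouched and only changes which source is consulted first, at the cost of a registry lookup on each resolution.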