
Conversation

@daniel-lxs daniel-lxs commented Nov 26, 2025

Summary

Fixes #9597

This PR fixes a race condition in the model cache that caused "model ID not found" errors when the API returned an empty response.

Root Cause

When the OpenRouter API (or other providers) failed silently and returned {} instead of throwing an error, the empty response was being cached to both memory and disk. This overwrote valid cached data, causing subsequent model lookups to fail with "model not found" and fall back to defaults.

Changes

1. Empty response protection

  • getModels(): Only cache non-empty API responses (>0 models)
  • refreshModels(): Return the existing cache when the API returns an empty response (see the sketch below)
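
A minimal sketch of the guard in getModels(), assuming placeholder helpers (fetchModelsFromProvider, readDiskCache, writeDiskCache) rather than the actual modelCache.ts internals:

```typescript
// Sketch only: the helper functions below are illustrative stand-ins,
// not the real modelCache.ts implementation.
type ModelRecord = Record<string, { contextWindow?: number; maxTokens?: number }>

const memoryCache = new Map<string, ModelRecord>()

async function fetchModelsFromProvider(provider: string): Promise<ModelRecord> {
	return {} // placeholder for the provider-specific fetcher
}
async function readDiskCache(provider: string): Promise<ModelRecord | undefined> {
	return undefined // placeholder for the disk-cache read
}
async function writeDiskCache(provider: string, models: ModelRecord): Promise<void> {
	// placeholder for the disk-cache write
}

export async function getModels(provider: string): Promise<ModelRecord> {
	const cached = memoryCache.get(provider)
	if (cached && Object.keys(cached).length > 0) {
		return cached
	}

	const fresh = await fetchModelsFromProvider(provider)

	// Only cache non-empty responses so a silent `{}` failure
	// cannot overwrite previously valid data in memory or on disk.
	if (Object.keys(fresh).length > 0) {
		memoryCache.set(provider, fresh)
		await writeDiskCache(provider, fresh)
		return fresh
	}

	// Empty response: fall back to whatever was cached before.
	return cached ?? (await readDiskCache(provider)) ?? {}
}
```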

2. In-flight request deduplication

Added an inFlightRefresh Map to track ongoing refresh requests. When multiple calls to refreshModels() happen concurrently for the same provider, they now share the same promise instead of racing against each other.
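
A sketch of the promise-sharing pattern, again with placeholder names (doRefresh stands in for the real fetch-and-cache path in modelCache.ts):

```typescript
// Sketch of in-flight deduplication; names are illustrative, not the real internals.
type ModelRecord = Record<string, unknown>

async function doRefresh(provider: string): Promise<ModelRecord> {
	return {} // placeholder for the real fetch-and-cache path
}

const inFlightRefresh = new Map<string, Promise<ModelRecord>>()

export async function refreshModels(provider: string): Promise<ModelRecord> {
	// Reuse the promise of an already-running refresh for this provider
	// instead of starting a second, racing request.
	const pending = inFlightRefresh.get(provider)
	if (pending) {
		return pending
	}

	const refresh = doRefresh(provider).finally(() => {
		// Clear the entry whether the refresh succeeded or failed,
		// so later calls can start a fresh request.
		inFlightRefresh.delete(provider)
	})

	inFlightRefresh.set(provider, refresh)
	return refresh
}
```

With this pattern, concurrent callers all await the same promise, so at most one refresh request per provider is in flight at a time.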

3. Telemetry

Added a new MODEL_CACHE_EMPTY_RESPONSE telemetry event to track when the API returns an empty response, which helps identify problematic API behavior in production.
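
For illustration, the capture could sit in the empty-response branch; the telemetry client shape below is an assumption, not the extension's actual telemetry API:

```typescript
// Hedged sketch: the event name matches this PR, but the client interface
// and `telemetry` instance are assumed stand-ins.
interface TelemetryClient {
	capture(event: string, properties?: Record<string, unknown>): void
}

declare const telemetry: TelemetryClient

function onEmptyModelResponse(provider: string): void {
	// Record that the provider returned an empty model list so the
	// frequency of this failure mode can be tracked in production.
	telemetry.capture("MODEL_CACHE_EMPTY_RESPONSE", { provider })
}
```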

Testing

  • Added 8 new test cases covering empty response handling
  • All 99 tests in the fetchers test suite pass
  • All 21 tests in modelCache.spec.ts pass

Files Changed

  • packages/types/src/telemetry.ts - Added telemetry event
  • src/api/providers/fetchers/modelCache.ts - Core fix
  • src/api/providers/fetchers/__tests__/modelCache.spec.ts - Tests

Important

Fixes caching of empty API responses in modelCache.ts and adds telemetry for tracking, with concurrency improvements and new tests.

  • Behavior:
    • getModels(): Only caches non-empty API responses.
    • refreshModels(): Returns existing cache if API response is empty.
  • Concurrency:
    • Introduces inFlightRefresh Map to deduplicate concurrent refreshModels() calls.
  • Telemetry:
    • Adds MODEL_CACHE_EMPTY_RESPONSE event in telemetry.ts to track empty API responses.
  • Testing:
    • Adds 8 test cases for empty response handling in modelCache.spec.ts.
  • Files Changed:
    • telemetry.ts: Adds telemetry event.
    • modelCache.ts: Implements core fix and concurrency handling.
    • modelCache.spec.ts: Adds tests.

This description was created by Ellipsis for c02fe6b.

- Added protection against caching empty API responses in getModels() and refreshModels()
- Added in-flight request tracking to prevent concurrent refresh calls from racing
- Added telemetry event MODEL_CACHE_EMPTY_RESPONSE to track these occurrences
- Added comprehensive tests for empty cache protection

Fixes #9597
@dosubot dosubot bot added the size:L (This PR changes 100-499 lines, ignoring generated files) and bug (Something isn't working) labels on Nov 26, 2025
@roomote

roomote bot commented Nov 26, 2025


All issues resolved. The error logging has been successfully added to the refreshModels() catch block.

  • Add error logging in refreshModels() catch block to aid debugging when API failures occur

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

@hannesrudolph hannesrudolph added the Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.) label on Nov 26, 2025
Addresses PR review feedback to improve production debugging.
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Review] in Roo Code Roadmap Nov 26, 2025
@hannesrudolph hannesrudolph added the PR - Needs Review label and removed the Issue/PR - Triage label on Nov 26, 2025
@dosubot dosubot bot added the lgtm (This PR has been approved by a maintainer) label on Nov 26, 2025
@mrubens mrubens merged commit f388919 into main Nov 26, 2025
16 checks passed
@mrubens mrubens deleted the fix/model-cache-empty-response-9597 branch November 26, 2025 21:59
@github-project-automation github-project-automation bot moved this from PR [Needs Review] to Done in Roo Code Roadmap Nov 26, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Nov 26, 2025
mini2s added a commit to zgsm-ai/costrict that referenced this pull request Nov 27, 2025
* fix: include mcpServers in getState() for auto-approval (RooCodeInc#9199)

* fix: replace rate-limited badges with badgen.net (RooCodeInc#9200)

* Batch settings updates from the webview to the extension host (RooCodeInc#9165)

Co-authored-by: Roo Code <[email protected]>

* fix: Apply updated API profile settings when provider/model unchanged (RooCodeInc#9208) (RooCodeInc#9210)

fix: apply updated API profile settings when provider/model unchanged (RooCodeInc#9208)

* fix: migrate Issue Fixer to REST + ProjectsV2 (RooCodeInc#9207)

* fix(issue-fixer): migrate to REST for issue/comments and add ProjectsV2; remove Projects Classic mentions

* Update .roo/rules-issue-fixer/4_github_cli_usage.xml

Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com>

* Update .roo/rules-issue-fixer/4_github_cli_usage.xml

Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com>

---------

Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com>

* Migrate conversation continuity to plugin-side encrypted reasoning items (Responses API) (RooCodeInc#9203)

* Migrate conversation continuity to plugin-side encrypted reasoning items (Responses API)

Summary
We moved continuity off OpenAI servers and now maintain conversation state locally by persisting and replaying encrypted reasoning items. Requests are stateless (store=false) while retaining the performance/caching benefits of the Responses API.

Why
This aligns with how Roo manages context and simplifies our Responses API implementation while keeping all the benefits of continuity, caching, and latency improvements.

What changed
- All OpenAI models now use the Responses API; system instructions are passed via the top-level instructions field; requests include store=false and include=["reasoning.encrypted_content"].
- We persist encrypted reasoning items (type: "reasoning", encrypted_content, optional id) into API history and replay them on subsequent turns.
- Reasoning summaries default to summary: "auto" when supported; text.verbosity only when supported.
- Atomic persistence via safeWriteJson.

Removed
- previous_response_id flows, suppressPreviousResponseId/skipPrevResponseIdOnce, persistGpt5Metadata(), and GPT‑5 response ID metadata in UI messages.

Kept
- taskId and mode metadata for cross-provider features.

Result
- ZDR-friendly, stateless continuity with equal or better performance and a simpler codepath.

* fix(webview): remove unused metadata prop from ReasoningBlock render

* Responses API: retain response id for troubleshooting (not continuity)

Continuity is stateless via encrypted reasoning items that we persist and replay. We now capture the top-level response id in OpenAiNativeHandler and persist the assistant message id into api_conversation_history.json solely for debugging/correlation with provider logs; it is not used for continuity or control flow.

Also: silence request-body debug logging to avoid leaking prompts.

* remove DEPRECATED tests

* chore: remove unused Task types file to satisfy knip CI

* fix(task): properly type cleanConversationHistory and createMessage args in Task to address Dan's review

* chore: add changeset for v3.31.2 (RooCodeInc#9216)

* Changeset version bump (RooCodeInc#9217)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Matt Rubens <[email protected]>

* rename: sliding-window -> context-management; truncateConversationIfNeeded -> manageContext (RooCodeInc#9206)

* Fix: Roo Anthropic input token normalization (avoid double-count) (RooCodeInc#9224)

* OpenAI Native: gate encrypted_content include; remove gpt-5-chat-latest verbosity flag (fixes RooCodeInc#9225) (RooCodeInc#9231)

openai-native: include reasoning.encrypted_content only when reasoningEffort is set; prevent Responses API error on non-reasoning models. types: remove supportsVerbosity from gpt-5-chat-latest to avoid invalid verbosity error. Fixes RooCodeInc#9225

* docs: remove Contributors section from README files (RooCodeInc#9198)

Co-authored-by: Roo Code <[email protected]>

* Release v3.31.3 (RooCodeInc#9232)

* Changeset version bump (RooCodeInc#9233)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Matt Rubens <[email protected]>

* Add native tool call support (RooCodeInc#9159)

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>

* Consistently use Package.name for better support of the nightly app (RooCodeInc#9240)

* fix: resolve 400 error with native tools on OpenRouter (RooCodeInc#9238)

* fix: change tool_choice from required to auto for native protocol (RooCodeInc#9242)

* docs: include PR numbers in release guide (RooCodeInc#9236)

* Add enum support to configuration schema (RooCodeInc#9247)

* refactor(task): switch to <feedback> wrapper to prevent focus drift after context-management event (condense/truncate) (RooCodeInc#9237)

* refactor(task): wrap initial user message in <feedback> instead of <task> to prevent focus drift after context-management

Rationale: After a successful context-management event, framing the next user block as feedback reduces model focus drift. Mentions parsing already supports <feedback>, and tool flows (attemptCompletion, responses) are aligned. No change to loop/persistence.

* refactor(mentions): drop <task> parsing; standardize on <feedback>; update tests

* fix: Filter native tools by mode restrictions (RooCodeInc#9246)

* fix: filter native tools by mode restrictions

Native tools are now filtered based on mode restrictions before being sent to the API, matching the behavior of XML tools. Previously, all native tools were sent to the API regardless of mode, causing the model to attempt using disallowed tools.

Changes:
- Created filterNativeToolsForMode() and filterMcpToolsForMode() utility functions
- Extracted filtering logic from Task.ts into dedicated module
- Applied same filtering approach used for XML tools in system prompt
- Added comprehensive test coverage (10 tests)

Impact:
- Model only sees tools allowed by current mode
- No more failed tool attempts due to mode restrictions
- Consistent behavior between XML and Native protocols
- Better UX with appropriate tool suggestions per mode

* refactor: eliminate repetitive tool checking using group-based approach

- Add getAvailableToolsInGroup() helper to check tools by group instead of individually
- Refactor filterNativeToolsForMode() to reuse getToolsForMode() instead of duplicating logic
- Simplify capabilities.ts by using group-based checks (60% reduction)
- Refactor rules.ts to use group helper (56% reduction)
- Remove debug console.log statements
- Update tests and snapshots

Benefits:
- Eliminates code duplication
- Leverages existing TOOL_GROUPS structure
- More maintainable - new tools in groups work automatically
- All tests passing (26/26)

* fix: add fallback to default mode when mode config not found

Ensures the agent always has functional tools even if:
- A custom mode is deleted while tasks still reference it
- Mode configuration becomes corrupted
- An invalid mode slug is provided

Without this fallback, the agent would have zero tools (not even
ask_followup_question or attempt_completion), completely breaking it.

* Fix broken share button (RooCodeInc#9253)

fix(webview-ui): make Share button popover work by forwarding ref in LucideIconButton

- Convert LucideIconButton to forwardRef so Radix PopoverTrigger(asChild) receives a focusable element
- Enables Share popover and shareCurrentTask flow
- Verified with ShareButton/TaskActions Vitest suites

* Add GPT-5.1 models and clean up reasoning effort logic (RooCodeInc#9252)

* Reasoning effort: capability-driven; add disable/none/minimal; remove GPT-5 minimal special-casing; document UI semantics; remove temporary logs

* Remove Unused supportsReasoningNone

* Roo reasoning: omit field on 'disable'; UI: do not flip enableReasoningEffort when selecting 'disable'

* Update packages/types/src/model.ts

Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com>

* Update webview-ui/src/components/settings/SimpleThinkingBudget.tsx

Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com>

---------

Co-authored-by: roomote[bot] <219738659+roomote[bot]@users.noreply.github.com>

* fix: make line_ranges optional in read_file tool schema (RooCodeInc#9254)

The OpenAI tool schema required both 'path' and 'line_ranges' in FileEntry,
but the TypeScript type definition marks lineRanges as optional. This caused
the AI to fail when trying to read files without specifying line_ranges.

Changes:
- Updated read_file tool schema to only require 'path' parameter
- line_ranges remains available but optional, matching TypeScript types
- Aligns with implementation which treats lineRanges as optional throughout

Fixes issue where read_file tool kept failing with missing parameters.

* fix: prevent consecutive user messages on streaming retry (RooCodeInc#9249)

* feat(openai): OpenAI Responses: model-driven prompt caching and generic reasoning options refactor (RooCodeInc#9259)

* revert out of scope changes from RooCodeInc#9252 (RooCodeInc#9258)

* Revert "refactor(task): switch to <feedback> wrapper to prevent focus drift after context-management event (condense/truncate)" (RooCodeInc#9261)

* Release v3.32.0 (RooCodeInc#9264)

* Changeset version bump (RooCodeInc#9265)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Matt Rubens <[email protected]>

* [FIX] Fix OpenAI Native handling of encrypted reasoning blocks to prevent error when condensing (RooCodeInc#9263)

* fix: prevent duplicate tool_result blocks in native protocol mode for read_file (RooCodeInc#9272)

When read_file encountered errors (e.g., file not found), it would call
handleError() which internally calls pushToolResult(), then continue to
call pushToolResult() again with the final XML. In native protocol mode,
this created two tool_result blocks with the same tool_call_id, causing
400 errors on subsequent API calls.

This fix replaces handleError() with task.say() for error notifications.
The agent still receives error details through the XML in the single
final pushToolResult() call.

This change works for both protocols:
- Native: Only one tool_result per tool_call_id (fixes duplicate issue)
- XML: Only one text block with complete XML (cleaner than before)

Agent visibility preserved: Errors are included in the XML response
sent to the agent via pushToolResult().

Tests: All 44 tests passing. Updated test to verify say() is called.

* Fix duplicate tool blocks causing 'tool has already been used' error (RooCodeInc#9275)

* feat(openai-native): add abort controller for request cancellation (RooCodeInc#9276)

* Disable XML parser for native tool protocol (