Skip to content

Conversation

@daniel-lxs
Copy link
Member

@daniel-lxs daniel-lxs commented Dec 9, 2025

Problem

Tool_result blocks sometimes had incorrect tool_use_id values, causing the Anthropic API to reject requests with errors like:

ApiError: Expected toolResult blocks at messages.6.content for the following Ids: toolu_ABC, but found: toolu_XYZ

This could happen due to:

  • Race conditions during streaming
  • Message editing scenarios
  • Resume/delegation scenarios

Solution

Added centralized validation in addToApiConversationHistory() that:

  • Validates tool_result IDs against tool_use IDs from the previous assistant message
  • Fixes mismatches using position-based matching (first tool_result → first tool_use, etc.)
  • Logs warnings for debugging when corrections are made

Changes

  • Added src/core/task/validateToolResultIds.ts with centralized validation logic
  • Added src/core/task/__tests__/validateToolResultIds.spec.ts with 12 comprehensive tests
  • Modified src/core/task/Task.ts to call validation in addToApiConversationHistory()
  • Simplified src/core/assistant-message/presentAssistantMessage.ts by removing redundant source-level validation

Performance

The validation adds negligible overhead (~0.1-0.5ms) compared to API call times (500ms-30s). It only runs on user messages containing tool_result blocks.

Testing

All 4589 tests pass.


Important

Add validation and correction for tool_result IDs in user messages to match previous assistant tool_use IDs, with telemetry and tests.

  • Core / Validation
    • Add validateAndFixToolResultIds() and ToolResultIdMismatchError in validateToolResultIds.ts to validate/correct tool_result.tool_use_id by position against prior assistant tool_use IDs; reports mismatches via telemetry.
  • Task Integration
    • In Task.addToApiConversationHistory() and flushPendingToolResultsToHistory(), run validation on user messages before pushing to apiConversationHistory.
  • Testing
    • Add validateToolResultIds.spec.ts with comprehensive cases (matching, mismatched, extra/missing items, non-tool content, string content, telemetry).

This description was created by Ellipsis for 11fdfd8. You can customize this summary. It will automatically update as commits are pushed.



Note

Validates and auto-corrects tool_result.tool_use_id values against the previous assistant tool_use blocks before persisting user messages, with telemetry and tests.

  • Core / Validation
    • Add validateAndFixToolResultIds() and ToolResultIdMismatchError in src/core/task/validateToolResultIds.ts to validate/correct tool_result.tool_use_id by position against prior assistant tool_use IDs; reports mismatches via telemetry.
  • Task Integration
    • In Task.addToApiConversationHistory() and flushPendingToolResultsToHistory(), run validation on user messages before pushing to apiConversationHistory.
  • Testing
    • Add src/core/task/__tests__/validateToolResultIds.spec.ts with comprehensive cases (matching, mismatched, extra/missing items, non-tool content, string content, telemetry).

Written by Cursor Bugbot for commit c07b9f9. This will update automatically on new commits. Configure here.

Fixes ApiError: Expected toolResult blocks for mismatched IDs

## Problem
Tool_result blocks sometimes had incorrect tool_use_id values, causing
the Anthropic API to reject requests with errors like:
"Expected toolResult blocks at messages.6.content for the following Ids:
toolu_ABC, but found: toolu_XYZ"

## Solution
Added centralized validation in addToApiConversationHistory() that:
- Validates tool_result IDs against tool_use IDs from previous assistant message
- Fixes mismatches using position-based matching
- Logs warnings for debugging when corrections are made

## Changes
- Added src/core/task/validateToolResultIds.ts with validation logic
- Added src/core/task/__tests__/validateToolResultIds.spec.ts (12 tests)
- Modified src/core/task/Task.ts to call validation
- Simplified src/core/assistant-message/presentAssistantMessage.ts
  by removing redundant source-level validation
@daniel-lxs daniel-lxs requested review from cte, jr and mrubens as code owners December 9, 2025 18:22
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. bug Something isn't working labels Dec 9, 2025
@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Dec 9, 2025
@roomote
Copy link
Contributor

roomote bot commented Dec 9, 2025

Rooviewer Clock   See task on Roo Cloud

Review complete. No issues found.

This PR adds centralized validation to correct tool_result ID mismatches before API requests. The implementation is well-structured with comprehensive test coverage (12 tests) and handles edge cases gracefully. The position-based matching strategy is appropriate for the stated use cases (race conditions, message editing, resume/delegation scenarios).

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

@roomote
Copy link
Contributor

roomote bot commented Dec 9, 2025

Rooviewer Clock   See task on Roo Cloud

Review complete. No issues found.

This commit (11fdfd8) fixes a subtle bug in position-based matching when duplicate tool_use_ids are present:

  • Changed from findIndex (matches by ID) to indexOf (matches by reference)
  • Ensures correct position-based matching when multiple tool_results share the same incorrect ID
  • Well-documented with clear comments explaining the rationale

All previous review comments remain resolved:

  • console.warn calls replaced with PostHog telemetry via TelemetryService.instance.captureException()
  • findLastIndex utility provides efficient O(n) backwards iteration without memory overhead
  • Code is DRY with the lookup logic centralized in the validation function
Previous reviews

Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.

@cte
Copy link
Collaborator

cte commented Dec 9, 2025

I think in general it would be nice to fully understand the root cause of this, and a solid repro case would help with that. Short of that, we should considering emitting an event to Posthog every time we detect that a correct is needed.

- Replace .slice().reverse().find() with direct backwards loop
- Avoids creating two intermediate arrays for large conversation histories
- Applied to both addToApiConversationHistory and flushPendingToolResultsToHistory
- Replace manual backwards loop with findLastIndex in both
  addToApiConversationHistory and flushPendingToolResultsToHistory
- Improves clarity and consistency with existing code patterns
@daniel-lxs daniel-lxs force-pushed the fix/tool-result-id-validation branch from f9952b0 to ddf88d8 Compare December 9, 2025 22:23
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Review] in Roo Code Roadmap Dec 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working lgtm This PR has been approved by a maintainer PR - Needs Review size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

6 participants