
@hannesrudolph
Collaborator

@hannesrudolph hannesrudolph commented Dec 8, 2025

Summary

Implements two related improvements to the evals system:

  1. Throttle token usage updates - Reduces TaskTokenUsageUpdated emissions by ~80-90% via 2-second throttling
  2. Stream tool usage stats - Makes tool usage data available in real-time, not just on task completion

Changes

Part 1: Token Usage Throttling

  • Added throttle state to Task class (TOKEN_USAGE_EMIT_INTERVAL_MS = 2000)
  • Modified saveClineMessages() to only emit when 2+ seconds have elapsed
  • Force final emission on task completion/abort to capture latest stats
  • Reduces Redis/SSE/DB load during streaming (10-50+ events → 2-5 events per request)
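The throttle described above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: only `TOKEN_USAGE_EMIT_INTERVAL_MS` comes from the description, while `ThrottledUsageEmitter`, `update`, `emitFinal`, and the injected clock are hypothetical names, and emitting the very first update immediately is an assumption.

```typescript
// Interval taken from the PR description; everything else here is hypothetical.
const TOKEN_USAGE_EMIT_INTERVAL_MS = 2000

type TokenUsage = { totalTokensIn: number; totalTokensOut: number }

class ThrottledUsageEmitter {
  private lastEmitAt: number | undefined = undefined
  private pending: TokenUsage | undefined = undefined
  private readonly emit: (usage: TokenUsage) => void
  private readonly now: () => number

  constructor(emit: (usage: TokenUsage) => void, now: () => number = () => Date.now()) {
    this.emit = emit
    this.now = now
  }

  // Called on every save; emits only if the interval has elapsed since the last emission.
  update(usage: TokenUsage): void {
    this.pending = usage
    const t = this.now()
    if (this.lastEmitAt === undefined || t - this.lastEmitAt >= TOKEN_USAGE_EMIT_INTERVAL_MS) {
      this.flush(t)
    }
  }

  // Called on completion or abort so the most recent stats are not dropped.
  emitFinal(): void {
    this.flush(this.now())
  }

  private flush(t: number): void {
    if (this.pending === undefined) return
    this.emit(this.pending)
    this.pending = undefined
    this.lastEmitAt = t
  }
}
```

With an injected clock, a burst of updates inside one interval produces a single emission, and `emitFinal()` flushes whatever is still pending.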

Part 2: Streaming Tool Usage

  • Updated TaskTokenUsageUpdated event signature to include toolUsage parameter
  • Updated all event handlers (Task, API, ClineProvider) to forward tool usage
  • Modified evals backend to save tool usage on every update (not just completion)
  • Updated web-evals UI to display live tool stats for running tasks
  • Tool stats now preserved on task abort/timeout
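To illustrate the widened event signature, here is a small sketch of an update that carries `toolUsage` next to `tokenUsage` and is forwarded through intermediate layers. The `UsageEvents` class, the `forwardUsage` helper, and the shapes of `TokenUsage` and `ToolUsage` are assumptions made for the example; the real types in the codebase may differ.

```typescript
// Assumed shapes, for illustration only.
type TokenUsage = { totalTokensIn: number; totalTokensOut: number }
type ToolUsage = Record<string, { attempts: number; failures: number }>

// Listener signature with the extra toolUsage parameter.
type UsageListener = (taskId: string, tokenUsage: TokenUsage, toolUsage: ToolUsage) => void

// Hypothetical minimal event source standing in for Task, ClineProvider, or API.
class UsageEvents {
  private listeners: UsageListener[] = []

  on(listener: UsageListener): void {
    this.listeners.push(listener)
  }

  emit(taskId: string, tokenUsage: TokenUsage, toolUsage: ToolUsage): void {
    for (const listener of this.listeners) {
      listener(taskId, tokenUsage, toolUsage)
    }
  }
}

// Re-emit every update from an inner layer on an outer layer without dropping toolUsage.
function forwardUsage(from: UsageEvents, to: UsageEvents): void {
  from.on((taskId, tokenUsage, toolUsage) => to.emit(taskId, tokenUsage, toolUsage))
}
```

Chaining `forwardUsage(task, provider)` and `forwardUsage(provider, api)` lets a consumer subscribed at the outermost layer see tool stats on every update rather than only at completion.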

Testing

  • Added comprehensive unit tests for throttle logic (Task.throttle.test.ts)
  • All 357 existing tests pass
  • TypeScript compilation successful

Expected Results

  • ~80-90% reduction in TaskTokenUsageUpdated emissions
  • Real-time tool visibility in evals UI
  • No tool data loss on task abort/timeout
  • Reduced system load (Redis publish, SSE broadcast, DB writes)

Important

This PR introduces token usage throttling and real-time tool usage stats streaming, reducing system load and enhancing task visibility.

  • Behavior:
    • Throttle TaskTokenUsageUpdated emissions in Task class to every 2 seconds, reducing emissions by 80-90%.
    • Stream tool usage stats in real-time, updating TaskTokenUsageUpdated event signature to include toolUsage.
    • Ensure final token usage emission on task completion or abort.
  • Classes and Functions:
    • Modify Task class to include debouncedEmitTokenUsage for throttling in Task.ts.
    • Update ClineProvider and API classes to handle new TaskTokenUsageUpdated event signature.
    • Add hasToolUsageChanged() function in getApiMetrics.ts.
  • Testing:
    • Add unit tests for throttling logic in Task.throttle.test.ts.
    • Ensure all existing tests pass and TypeScript compilation is successful.

This description was created by Ellipsis for 452a0f1.

@dosubot bot added the size:XL (This PR changes 500-999 lines, ignoring generated files.) and Enhancement (New feature or request) labels Dec 8, 2025
@roomote
Contributor

roomote bot commented Dec 8, 2025


Re-review complete for commits through f89798b163d4a3d7b4743482ae6abd162ddfaf9f; no new issues identified in this pass.

  • Ensure toolColumns recompute when streaming tool usage Map mutates (for example, include usageUpdatedAt in the dependency list).


@hannesrudolph added the Issue/PR - Triage label (New issue. Needs quick review to confirm validity and assign labels.) Dec 8, 2025
@hannesrudolph hannesrudolph force-pushed the feat/streaming-tool-stats-token-throttle branch from e310dbd to e52cd96 Compare December 8, 2025 22:14
@hannesrudolph hannesrudolph moved this from Triage to PR [Needs Review] in Roo Code Roadmap Dec 9, 2025
@hannesrudolph added the PR - Needs Review label and removed the Issue/PR - Triage label Dec 9, 2025
- Throttle TaskTokenUsageUpdated emissions to 2-second intervals (~80-90% reduction)
- Stream toolUsage alongside tokenUsage in real-time updates
- Force final emission on task completion/abort to capture latest stats
- Update evals UI to display live tool stats for running tasks
- Add comprehensive unit tests for throttle logic

This reduces system load (Redis/SSE/DB) while providing real-time tool visibility
and preventing data loss on task timeout/abort.
- Add ToolUsage type import and use proper types in ClineProvider event handlers
- Make emitFinalTokenUsageUpdate() public in Task.ts for reuse
- Use emitFinalTokenUsageUpdate() in AttemptCompletionTool for consistent behavior
- Add final token usage emission in handlePartial for completion with commands
- Add hasToolUsageChanged() helper function to getApiMetrics.ts
- Add toolUsageSnapshot property to Task.ts to track tool usage changes
- Update saveClineMessages() to emit when either token OR tool usage changes
- Update emitFinalTokenUsageUpdate() to also check tool usage changes
- Add comprehensive tests for tool usage change detection

This ensures final tool usage stats are captured on task abort even if
token usage hasn't changed (e.g., when task is aborted before API request
completes but tools were already executed).
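The change-detection helper mentioned above might look something like this sketch. The name `hasToolUsageChanged` appears in the commit message, but the signature, the `ToolUsage` shape, and the handling of a missing snapshot shown here are assumptions.

```typescript
// Assumed shapes, for illustration only.
type ToolStats = { attempts: number; failures: number }
type ToolUsage = Record<string, ToolStats>

// Returns true when the current tool usage differs from the last emitted snapshot.
function hasToolUsageChanged(current: ToolUsage, snapshot: ToolUsage | undefined): boolean {
  const currentNames = Object.keys(current)
  if (snapshot === undefined) {
    // Assumption: with no snapshot yet, any recorded tool counts as a change.
    return currentNames.length > 0
  }
  if (currentNames.length !== Object.keys(snapshot).length) {
    return true
  }
  return currentNames.some((name) => {
    const before: ToolStats | undefined = snapshot[name]
    const after: ToolStats | undefined = current[name]
    return (
      before === undefined ||
      after === undefined ||
      before.attempts !== after.attempts ||
      before.failures !== after.failures
    )
  })
}
```

A check like this lets the final emission fire when only tool counts moved, which is the abort-before-API-response case described above.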
…tedAt in toolColumns deps

- Add emitFinalTokenUsageUpdate mock to clineC and clineB in nested-delegation-resume test
- Add usageUpdatedAt to toolColumns useMemo dependency array for proper recomputation when streaming tool usage updates

When tasks time out, there is a race condition where the DB might not yet have
the latest stats when the frontend refetches. This fix adds fallback logic
to use streaming values (from the in-memory Maps) when DB values are
empty or missing.

Changes:
- taskMetrics useMemo: prefer DB values but fall back to streaming if empty
- toolColumns useMemo: prefer DB values but fall back to streaming if empty
- stats useMemo: same fallback logic for aggregate tool usage
- tool cells in table: prefer DB values but fall back to streaming if missing
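The fallback rule in the list above (prefer DB values, use streaming values when the DB value is empty or missing) can be sketched as a small pure function. `resolveToolUsage`, `hasEntries`, the numeric task id, and the `ToolUsage` shape are hypothetical and chosen only for this example.

```typescript
// Assumed shape, for illustration only.
type ToolUsage = Record<string, { attempts: number; failures: number }>

// True when a usage object exists and records at least one tool.
function hasEntries(usage: ToolUsage | null | undefined): usage is ToolUsage {
  return usage !== null && usage !== undefined && Object.keys(usage).length > 0
}

// Prefer the persisted value; fall back to the in-memory streaming Map otherwise.
function resolveToolUsage(
  taskId: number,
  dbUsage: ToolUsage | null | undefined,
  streaming: Map<number, ToolUsage>,
): ToolUsage {
  if (hasEntries(dbUsage)) {
    return dbUsage
  }
  return streaming.get(taskId) ?? {}
}
```

Keeping the rule in one function would let each `useMemo` and the table cells share the same precedence, though the PR may well inline the logic instead.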
After a task is aborted due to timeout, the extension rehydrates the task
with a new instance. This new instance has empty toolUsage, and if it emits
any TaskTokenUsageUpdated events, they would overwrite the final metrics
that were saved before the abort.

This fix adds a check to ignore TaskTokenUsageUpdated events once
taskAbortedAt is set, preserving the final metrics from before the abort.
Instead of ignoring TaskTokenUsageUpdated events after TaskAborted,
accumulate tool usage data using a MAX strategy. This ensures:
- Empty rehydrated data won't overwrite existing: max(5, 0) = 5
- Legitimate restart with additional work is captured: max(5, 8) = 8

This approach is more robust than simply ignoring post-abort events,
as it handles both spurious rehydration and legitimate restart scenarios.
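A minimal sketch of such a MAX merge, assuming the maximum is taken per tool and per counter. `mergeToolUsageMax` and the `ToolUsage` shape are hypothetical and not taken from the PR.

```typescript
// Assumed shapes, for illustration only.
type ToolStats = { attempts: number; failures: number }
type ToolUsage = Record<string, ToolStats>

// Merge two tool usage records, keeping the larger count for each tool and counter.
function mergeToolUsageMax(existing: ToolUsage, incoming: ToolUsage): ToolUsage {
  const merged: ToolUsage = {}
  // Duplicate names are harmless: the second pass writes the same maxima again.
  const names = Object.keys(existing).concat(Object.keys(incoming))
  for (const name of names) {
    const a: ToolStats | undefined = existing[name]
    const b: ToolStats | undefined = incoming[name]
    merged[name] = {
      attempts: Math.max(a?.attempts ?? 0, b?.attempts ?? 0),
      failures: Math.max(a?.failures ?? 0, b?.failures ?? 0),
    }
  }
  return merged
}
```

Merging saved stats of 5 attempts with an empty rehydrated record keeps 5, while merging with a later record of 8 attempts yields 8, matching the two cases listed above.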
@mrubens mrubens deleted the feat/streaming-tool-stats-token-throttle branch December 9, 2025 03:40
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Dec 9, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Review] to Done in Roo Code Roadmap Dec 9, 2025

Labels

  • Enhancement (New feature or request)
  • lgtm (This PR has been approved by a maintainer)
  • PR - Needs Review
  • size:XL (This PR changes 500-999 lines, ignoring generated files.)

Projects

Archived in project

3 participants