Skip to content

[MOD-12320] Ensure full profile output on timeout with RETURN policy#7345

Merged
meiravgri merged 12 commits intomasterfrom
meiravg_return_profile_ontimeout
Nov 19, 2025
Merged

[MOD-12320] Ensure full profile output on timeout with RETURN policy#7345
meiravgri merged 12 commits intomasterfrom
meiravg_return_profile_ontimeout

Conversation

@meiravgri
Copy link
Collaborator

@meiravgri meiravgri commented Nov 12, 2025

Overview

Previously, shard profile data was omitted on timeout because it's only sent in the final CURSOR READ response. This PR ensures that profile data from all shards is collected and returned even when a timeout occurs during profiling with RETURN policy.

Implementation: Introduces a new internal FT.CURSOR PROFILE command that is sent to shards (instead of CURSOR READ) when a timeout occurs with RETURN policy during profiling, allowing the coordinator to retrieve profile data from all shards.

Execution Flow

Normal profiling flow (no timeout):

  1. Coordinator sends FT.AGGREGATE with PROFILE to shards
  2. Shards return partial results via CURSOR READ responses
  3. Final CURSOR READ (cursor_id=0) includes profile data
  4. Coordinator collects and prints all shard profiles

New timeout handling flow:

  1. Shard execution (sendChunk):

    • Detects timeout during execution
    • Sets req->has_timedout = true to preserve timeout state for later FT.CURSOR PROFILE call
    • Returns partial results with cursor_id (not 0)
  2. Coordinator thread (rpnetNextsendChunk):

    • Detects timeout (via coordinator timer or shard warning in RESP3)
    • Returns RS_RESULT_TIMEDOUT
    • Sets MRIteratorCallback_SetTimedOut() on iterator context only if detected by coordinator timer
  3. UV thread (netCursorCallback):

    • For RESP3 + profiling: may also detect timeout warning in shard reply and set MRIteratorCallback_SetTimedOut()
    • Calls getCursorCommand() to determine next command
  4. UV thread (getCursorCommand):

    • Checks timeout flag
    • If timeout + profiling: switches command from CURSOR READCURSOR PROFILE
    • Sends FT.CURSOR PROFILE {index} {CID} to shard
  5. Shard execution (RSCursorCommand with PROFILE):

    • Retrieves cursor with preserved has_timedout state
    • Calls sendChunk_ReplyOnly_EmptyResults() which uses createProfilePrinterCtx()
    • Returns an empty results containing the profile data with timeout information
    • Purges cursor
  6. Coordinator thread (printAggProfile):

    • Drains remaining replies from channel (the CURSOR PROFILE responses)

Note

Ensures complete profile output on timeout by adding FT.CURSOR PROFILE and wiring timeout propagation across shards, coordinator, and reply handling.

  • Profile/Timeout handling:
    • Add AREQ::has_timedout and set it during execution; use new createProfilePrinterCtx() everywhere profile is printed (incl. empty-results replies).
    • RESP3 warning parsing marks iterator as timed out; coordinator now preserves and uses timeout state.
  • New command:
    • Implement FT.CURSOR PROFILE {index} {CID} to fetch profile after timeout and purge cursor; RESP2/RESP3 supported.
  • Coordinator changes:
    • On profiling timeout, switch follow-up shard command from CURSOR READ to CURSOR PROFILE (MRRootCommand adds C_PROFILE).
    • printAggProfile drains pending replies and logs shard/profile count; uses iterator helpers MRIterator_GetChannelSize() and MRIterator_GetNumShards().
    • Expose getNextReply(RPNet*) for draining.
  • RMR/MRIterator:
    • Add iterator helpers MRIterator_GetChannelSize, MRIterator_GetNumShards and export them; minor command mutation/reset fixes when switching to PROFILE.
  • Tests:
    • Add cursor profile validations and coordinator profile timeout tests; assert shard profiles count and errors for invalid PROFILE usage.

Written by Cursor Bugbot for commit f8845f0. This will update automatically on new commits. Configure here.

req->has_timedout

FT.CURSOR PROFILE

while (getNextReply(rpnet)) to printAggProfile

expose getNextReply

set timeout in netCursor Callabck
send ft.cursor profile it timeout
@meiravgri meiravgri changed the title initial imp [MOD-12320] initial imp Nov 13, 2025
getNextReply read profile when cursor is depleted
use sendChunk_ReplyOnly_EmptyResults in cursor PROFILE
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: RESP3 Timeout State Tracking Inconsistency

In sendChunk_Resp3, the has_timedout local variable is computed but req->has_timedout is never set, unlike in sendChunk_Resp2 where it's set on line 467. This inconsistency means timeout state won't be properly tracked for RESP3 queries, potentially affecting cursor profile commands that rely on req->has_timedout.

src/aggregate/aggregate_exec.c#L606-L611

cursor_done = (rc != RS_RESULT_OK
&& !(rc == RS_RESULT_TIMEDOUT
&& req->reqConfig.timeoutPolicy == TimeoutPolicy_Return));
bool has_timedout = (rc == RS_RESULT_TIMEDOUT) || hasTimeoutError(qctx->err);

Fix in Cursor Fix in Web


…allback

netCursorCallback: fix set timeout from reply

expose MRIterator_GetChannelSize

done check if its not md->rootCommand = C_PROFILE
@meiravgri meiravgri changed the title [MOD-12320] initial imp [MOD-12320] Return shards profile on timeout Nov 16, 2025
@codecov
Copy link

codecov bot commented Nov 16, 2025

Codecov Report

❌ Patch coverage is 94.80519% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.15%. Comparing base (5e5f483) to head (f8845f0).
⚠️ Report is 33 commits behind head on master.

Files with missing lines Patch % Lines
src/coord/hybrid/dist_utils.c 92.00% 2 Missing ⚠️
src/aggregate/aggregate_exec.c 97.22% 1 Missing ⚠️
src/coord/dist_aggregate.c 85.71% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7345      +/-   ##
==========================================
- Coverage   85.17%   85.15%   -0.03%     
==========================================
  Files         346      346              
  Lines       53027    53106      +79     
  Branches    13734    13773      +39     
==========================================
+ Hits        45167    45220      +53     
- Misses       7666     7692      +26     
  Partials      194      194              
Flag Coverage Δ
flow 84.57% <94.80%> (-0.01%) ⬇️
unit 52.29% <1.29%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@meiravgri meiravgri changed the title [MOD-12320] Return shards profile on timeout [MOD-12320] Ensure full profile output on timeout with RETURN policy Nov 16, 2025
@meiravgri meiravgri requested a review from GuyAv46 November 16, 2025 11:45
Copy link
Collaborator

@GuyAv46 GuyAv46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NICE

Cursors_Purge->Cursors_Free

free reply

netCursorCallback: move timeout check after the profile validation

add test_cursor_profile
use it->len for num shards
@meiravgri meiravgri requested a review from GuyAv46 November 17, 2025 10:41
GuyAv46
GuyAv46 previously approved these changes Nov 18, 2025
@meiravgri meiravgri requested a review from GuyAv46 November 18, 2025 08:40
@meiravgri meiravgri enabled auto-merge November 18, 2025 08:41
@meiravgri meiravgri added this pull request to the merge queue Nov 18, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 18, 2025
@meiravgri meiravgri added this pull request to the merge queue Nov 18, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 18, 2025
@meiravgri meiravgri added this pull request to the merge queue Nov 19, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 19, 2025
@meiravgri meiravgri added this pull request to the merge queue Nov 19, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 19, 2025
@meiravgri meiravgri added this pull request to the merge queue Nov 19, 2025
Merged via the queue into master with commit 38408d7 Nov 19, 2025
25 of 27 checks passed
@meiravgri meiravgri deleted the meiravg_return_profile_ontimeout branch November 19, 2025 20:41
@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.8, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.8
git worktree add -d .worktree/backport-7345-to-2.8 origin/2.8
cd .worktree/backport-7345-to-2.8
git switch --create backport-7345-to-2.8
git cherry-pick -x 38408d7d4623b4a2401874a18f6a044e670a0a87

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.10, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.10
git worktree add -d .worktree/backport-7345-to-2.10 origin/2.10
cd .worktree/backport-7345-to-2.10
git switch --create backport-7345-to-2.10
git cherry-pick -x 38408d7d4623b4a2401874a18f6a044e670a0a87

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 8.2, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 8.2
git worktree add -d .worktree/backport-7345-to-8.2 origin/8.2
cd .worktree/backport-7345-to-8.2
git switch --create backport-7345-to-8.2
git cherry-pick -x 38408d7d4623b4a2401874a18f6a044e670a0a87

redisearch-backport-pull-request bot pushed a commit that referenced this pull request Nov 19, 2025
…7345)

* initial imp

req->has_timedout

FT.CURSOR PROFILE

while (getNextReply(rpnet)) to printAggProfile

expose getNextReply

set timeout in netCursor Callabck
send ft.cursor profile it timeout

* don't send C_DEL if profile
getNextReply read profile when cursor is depleted
use sendChunk_ReplyOnly_EmptyResults in cursor PROFILE

* sendChunk_ReplyOnly_EmptyResults: move createProfilePrinterCtx to a callback

netCursorCallback: fix set timeout from reply

expose MRIterator_GetChannelSize

done check if its not md->rootCommand = C_PROFILE

* cleanups

* address review:

Cursors_Purge->Cursors_Free

free reply

netCursorCallback: move timeout check after the profile validation

add test_cursor_profile

* fix

* fix test

* address review:

use it->len for num shards

* take spec ref

* remove return statement

* Update src/aggregate/aggregate_exec.c

Co-authored-by: GuyAv46 <[email protected]>

---------

Co-authored-by: GuyAv46 <[email protected]>
(cherry picked from commit 38408d7)
@redisearch-backport-pull-request
Copy link
Contributor

Successfully created backport PR for 8.4:

github-merge-queue bot pushed a commit that referenced this pull request Nov 20, 2025
…olicy (#7431)

* [MOD-12320] Ensure full profile output on timeout with RETURN policy (#7345)

* initial imp

req->has_timedout

FT.CURSOR PROFILE

while (getNextReply(rpnet)) to printAggProfile

expose getNextReply

set timeout in netCursor Callabck
send ft.cursor profile it timeout

* don't send C_DEL if profile
getNextReply read profile when cursor is depleted
use sendChunk_ReplyOnly_EmptyResults in cursor PROFILE

* sendChunk_ReplyOnly_EmptyResults: move createProfilePrinterCtx to a callback

netCursorCallback: fix set timeout from reply

expose MRIterator_GetChannelSize

done check if its not md->rootCommand = C_PROFILE

* cleanups

* address review:

Cursors_Purge->Cursors_Free

free reply

netCursorCallback: move timeout check after the profile validation

add test_cursor_profile

* fix

* fix test

* address review:

use it->len for num shards

* take spec ref

* remove return statement

* Update src/aggregate/aggregate_exec.c

Co-authored-by: GuyAv46 <[email protected]>

---------

Co-authored-by: GuyAv46 <[email protected]>
(cherry picked from commit 38408d7)

* fix error code

---------

Co-authored-by: meiravgri <[email protected]>
Co-authored-by: GuyAv46 <[email protected]>
Co-authored-by: meiravgri <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants