Skip to content

[MOD-12640] Handle warnings in empty FT.AGGREGATE replies (cluster)#7855

Merged
meiravgri merged 13 commits intomasterfrom
meiravg_handle_warnings_on_empty_reply_agg_cluster
Dec 23, 2025
Merged

[MOD-12640] Handle warnings in empty FT.AGGREGATE replies (cluster)#7855
meiravgri merged 13 commits intomasterfrom
meiravg_handle_warnings_on_empty_reply_agg_cluster

Conversation

@meiravgri
Copy link
Collaborator

@meiravgri meiravgri commented Dec 21, 2025

Fix coordinator ignoring warnings from FT.AGGREGATE when shards return empty results in cluster mode (RESP3).

Problem

When shards returned empty results with warnings (e.g., timeout, max prefix expansions, indexing OOM), the coordinator discarded the reply without processing warnings:

// Before the fix - in getNextReply()
if (MRReply_Length(rows) <= empty_rows_len) {
    RedisModule_Log(RSDummyContext, "verbose", "An empty reply was received from a shard");
    MRReply_Free(root);
    root = NULL;
    rows = NULL;
    meta = NULL;  // Warnings in meta were never extracted!
}

The reply was freed and nullified without checking for warnings in the metadata.

processWarningsAndCleanup() Function

New helper function that:

  1. Extracts warnings from RESP3 metadata (nc->current.meta)
  2. Sets appropriate error flags for each warning type:
    • QUERY_ERROR_CODE_TIMED_OUT → returns RS_RESULT_TIMEDOUT
    • QUERY_WMAXPREFIXEXPANSIONS → sets max prefix expansion warning
    • QUERY_WOOM_SHARD → sets OOM warning
    • QUERY_WINDEXING_FAILURE → sets bgScanOOM flag
  3. Frees the reply and resets state

Solution

Call processWarningsAndCleanup() for empty replies (not just when all rows are consumed):

// After the fix
if (MRReply_Length(rows) <= empty_rows_len) {
    RedisModule_Log(RSDummyContext, "verbose", "An empty reply was received from a shard");
    int ret = processWarningsAndCleanup(nc, nc->cmd.protocol == 3);
    if (ret == RS_RESULT_TIMEDOUT) {
      return RS_RESULT_TIMEDOUT;
    }
}

Note

Fixes warning propagation for empty replies in cluster (RESP3)

  • Introduces processWarningsAndCleanup() in rpnet.c and calls it when shard replies are empty and when a batch is fully consumed; extracts RESP3 warning metadata (timeout → RS_RESULT_TIMEDOUT, max-prefix, query OOM, indexing OOM), then frees reply and resets state.
  • Ensures nc->current.{root,rows,meta} are set before empty-reply handling.
  • In aggregate_debug.c, only force a coordinator timeout for TIMEOUT_AFTER_N 0 INTERNAL_ONLY under RESP2; RESP3 now relies on propagated warnings. Header docs updated to clarify RESP3 vs RESP2 behavior.
  • Tests: adds test_empty_reply_warnings.py; updates test_info_modules.py and test_query_oom.py to assert warnings/metrics for FT.AGGREGATE and PROFILE under RESP3 (and RESP2 expectations where applicable).

Written by Cursor Bugbot for commit 0bfa9e8. This will update automatically on new commits. Configure here.

use at the end of the reply
or when empty reply is recieved
@meiravgri meiravgri changed the title introduce setWarnings [MOD-12640] introduce setWarnings Dec 21, 2025
@codecov
Copy link

codecov bot commented Dec 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.06%. Comparing base (e6d5b86) to head (0bfa9e8).
⚠️ Report is 8 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7855      +/-   ##
==========================================
- Coverage   84.16%   84.06%   -0.10%     
==========================================
  Files         360      360              
  Lines       56242    56295      +53     
  Branches    15686    15717      +31     
==========================================
- Hits        47335    47326       -9     
- Misses       8690     8753      +63     
+ Partials      217      216       -1     
Flag Coverage Δ
flow 85.03% <100.00%> (+0.04%) ⬆️
unit 52.08% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

testEmptyReplyTimeoutWarningAggregate get stuck if we remove enforcing coord timeout

testEmptyReplyTimeoutWarningProfileAggregate
we dont get warning
@meiravgri meiravgri changed the title [MOD-12640] introduce setWarnings [MOD-12640] Handle warnings in empty FT.AGGREGATE replies (cluster) Dec 22, 2025
@meiravgri meiravgri requested a review from lerman25 December 23, 2025 13:46
@meiravgri meiravgri enabled auto-merge December 23, 2025 15:47
@meiravgri meiravgri added this pull request to the merge queue Dec 23, 2025
Merged via the queue into master with commit 701f719 Dec 23, 2025
32 checks passed
@meiravgri meiravgri deleted the meiravg_handle_warnings_on_empty_reply_agg_cluster branch December 23, 2025 17:53
@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 8.2, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 8.2
git worktree add -d .worktree/backport-7855-to-8.2 origin/8.2
cd .worktree/backport-7855-to-8.2
git switch --create backport-7855-to-8.2
git cherry-pick -x 701f7190c77d1854e7c23d2b9d6b7b308ef4ce0b

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 8.4, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 8.4
git worktree add -d .worktree/backport-7855-to-8.4 origin/8.4
cd .worktree/backport-7855-to-8.4
git switch --create backport-7855-to-8.4
git cherry-pick -x 701f7190c77d1854e7c23d2b9d6b7b308ef4ce0b

meiravgri added a commit that referenced this pull request Dec 23, 2025
…7855)

* introduce setWarnings

use at the end of the reply
or when empty reply is recieved

* fixes

* cleanups

fix test

* fix force timeout in debug

* add tests:
testEmptyReplyTimeoutWarningAggregate get stuck if we remove enforcing coord timeout

testEmptyReplyTimeoutWarningProfileAggregate
we dont get warning

* more tests

* move to file

* fix type

* remove comment

* rename test file

* add testEmptyReplyQueryOomWarning
address todos

* trmobr skip duplication

* check waning exact

(cherry picked from commit 701f719)
meiravgri added a commit that referenced this pull request Dec 24, 2025
…7855)

* introduce setWarnings

use at the end of the reply
or when empty reply is recieved

* fixes

* cleanups

fix test

* fix force timeout in debug

* add tests:
testEmptyReplyTimeoutWarningAggregate get stuck if we remove enforcing coord timeout

testEmptyReplyTimeoutWarningProfileAggregate
we dont get warning

* more tests

* move to file

* fix type

* remove comment

* rename test file

* add testEmptyReplyQueryOomWarning
address todos

* trmobr skip duplication

* check waning exact

(cherry picked from commit 701f719)
github-merge-queue bot pushed a commit that referenced this pull request Dec 24, 2025
…ter) (#7872)

[MOD-12640] Handle warnings in empty FT.AGGREGATE replies (cluster) (#7855)

* introduce setWarnings

use at the end of the reply
or when empty reply is recieved

* fixes

* cleanups

fix test

* fix force timeout in debug

* add tests:
testEmptyReplyTimeoutWarningAggregate get stuck if we remove enforcing coord timeout

testEmptyReplyTimeoutWarningProfileAggregate
we dont get warning

* more tests

* move to file

* fix type

* remove comment

* rename test file

* add testEmptyReplyQueryOomWarning
address todos

* trmobr skip duplication

* check waning exact

(cherry picked from commit 701f719)
github-merge-queue bot pushed a commit that referenced this pull request Dec 24, 2025
…ter) (#7855) (#7873)

* [MOD-12640] Handle warnings in empty FT.AGGREGATE replies (cluster) (#7855)

* introduce setWarnings

use at the end of the reply
or when empty reply is recieved

* fixes

* cleanups

fix test

* fix force timeout in debug

* add tests:
testEmptyReplyTimeoutWarningAggregate get stuck if we remove enforcing coord timeout

testEmptyReplyTimeoutWarningProfileAggregate
we dont get warning

* more tests

* move to file

* fix type

* remove comment

* rename test file

* add testEmptyReplyQueryOomWarning
address todos

* trmobr skip duplication

* check waning exact

(cherry picked from commit 701f719)

* fix if

* reomve query oom test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants