Skip to content

[2.8] CP TIMEOUT related bugs & Query Debug Mechanism#5788

Merged
meiravgri merged 4 commits into2.8from
backport-5734-to-2.8
Mar 20, 2025
Merged

[2.8] CP TIMEOUT related bugs & Query Debug Mechanism#5788
meiravgri merged 4 commits into2.8from
backport-5734-to-2.8

Conversation

@meiravgri
Copy link
Collaborator

@meiravgri meiravgri commented Mar 18, 2025

Cherry-pick PR #5734 (2.10 → 2.8)

fixing:
MOD-8483, MOD-8482, MOD-8515, MOD-8606, MOD-8794

debug query mechanism mod:
MOD-8605

Background

This PR cherry-picks PR #5734 to 2.8, which previously cherry-picked PR #5554 to 2.10, since 2.10 is structurally closer to 2.8 than master.

However, 2.8 does not support FT.DEBUG in coordinator mode, so the query debug mechanism is only available in standalone mode. This required removing certain coordinator-related code that was included in the original PR.


Key Adjustments for 2.8

Discarded Code

  • Coordinator changes: Removed all modifications to coord/src/module.c.
  • Debug commands:
    • Support only FT.SEARCH / FT.AGGREGATE commands.
    • Removed support for _FT.SEARCH / _FT.AGGREGATE commands (used for coordinator-shard communication).
  • Distributed aggregation: Removed split distributed command in dist_aggregate.c.
  • Internal-only options:
    • Removed COORDINATOR_FORCED_TIMEOUT.
    • Removed isClusterCoord.

Test Adjustments

Test Name Change
testTimeoutPartialWithEmptyResults Removed: Required simulating a scenario where the coordinator reaches getNextReply before the timeout elapsed, while the shard returns empty results—this is not possible to control without the debug mechanism, which 2.8 lacks.
test_resp3: test_error_with_partial_results In cluster, timeout verification only if document count < expected. In SA, stable debug mechanism is used.
test_timeout Skipped on cluster: Only checks a bug in the sorter, fine to test in SA.
test_debug_commands:TestQueryDebugCommands Skipped on cluster, added messages, Removed _MT tests as WORKER_THREADS can't be configured after env setup in 2.8, making them messy, and MT isn't critical for this scenario.
verifyResultsResp3 Moved from est_debug_commands:TestQueryDebugCommands to common tests.

- discard all coord/srd/module.c changed
- debug_command support only FT. commands, and not _FT. commands
- discard internal_only option, including COORDINATOR_FORCED_TIMEOUT
isClusterCoord
- dist aggregate: discard split distrubeted commd
tests:
- removed testTimeoutPartialWithEmptyResults as it should test the bug with the debug command in cluster mode, but this command is not supported

test_resp3: test_error_with_partial_results give a flaky version for cluster and a stable one for SA

WIP: test_timeout, test_cursors: ,odify to clster as well
log it didn't find RPINDEX
@codecov
Copy link

codecov bot commented Mar 18, 2025

Codecov Report

Attention: Patch coverage is 98.37838% with 3 lines in your changes missing coverage. Please review.

Project coverage is 86.09%. Comparing base (eb7efa6) to head (f7903d4).
Report is 22 commits behind head on 2.8.

Files with missing lines Patch % Lines
src/aggregate/aggregate_debug.c 97.77% 1 Missing ⚠️
src/aggregate/aggregate_exec.c 98.24% 1 Missing ⚠️
src/result_processor.c 98.03% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              2.8    #5788      +/-   ##
==========================================
+ Coverage   86.06%   86.09%   +0.03%     
==========================================
  Files         200      201       +1     
  Lines       34386    34532     +146     
==========================================
+ Hits        29593    29731     +138     
- Misses       4793     4801       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

test timeout - skip on cluster
test_resp3:test_error_with_partial_results handle cluster

test_cursors:testTimeoutPartialWithEmptyResults : NOT MERGED

test_debug_commands: add messages

move verifyResultsResp3 to common
@meiravgri
Copy link
Collaborator Author

here as a reminder to CP the follow up PR:
#5784 (comment)

@meiravgri meiravgri changed the title not merged: debug mechansim in cluster mode. i.e: [2.8] CP TIMEOUT related bugs & Query Debug Mechanism Mar 19, 2025
Copy link
Collaborator

@alonre24 alonre24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewd compared to the backport to 2.10 and the expected diffs described in the PR description. All looks as expected

@meiravgri meiravgri added this pull request to the merge queue Mar 20, 2025
Merged via the queue into 2.8 with commit e3553ef Mar 20, 2025
24 of 37 checks passed
@meiravgri meiravgri deleted the backport-5734-to-2.8 branch March 20, 2025 10:49
@meiravgri
Copy link
Collaborator Author

/backport

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.6, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.6
git worktree add -d .worktree/backport-5788-to-2.6 origin/2.6
cd .worktree/backport-5788-to-2.6
git switch --create backport-5788-to-2.6
git cherry-pick -x e3553ef7ac4889c2684e6b0a893bc5252f7e9e0b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants