Skip to content

[8.2] [MOD-11155] Create basic query pause debug command#7579

Merged
meiravgri merged 4 commits into8.2from
backport-6750-to-8.2
Dec 1, 2025
Merged

[8.2] [MOD-11155] Create basic query pause debug command#7579
meiravgri merged 4 commits into8.2from
backport-6750-to-8.2

Conversation

@meiravgri
Copy link
Collaborator

@meiravgri meiravgri commented Nov 30, 2025

backport #6750 to 8.2


Note

Adds PAUSE_* debug args to insert a pause RP before/after specific processors with cluster-aware behavior, plus FT.DEBUG QUERY_CONTROLLER to resume/check/inspect, with validations and comprehensive tests.

  • Debug query controls:
    • Add PAUSE_AFTER_RP_N <RP_TYPE> <N> [INTERNAL_ONLY] and PAUSE_BEFORE_RP_N <RP_TYPE> <N> [INTERNAL_ONLY] in _FT.DEBUG for FT.SEARCH/FT.AGGREGATE.
    • Implement RP_PAUSE (RPPauseAfterCount) and PipelineAddPauseRPcount(...) to inject a pause RP before/after the first matching RP_TYPE (validates type/existence, disallows last-RP after-insert).
    • Introduce QueryDebugCtx (single paused query), with APIs to set/get pause state and the active debug RP.
    • Extend parsing/validation: require workers (or coordinator), enforce INTERNAL_ONLY semantics for FT.AGGREGATE, forbid CRASH with INTERNAL_ONLY, and error on standalone INTERNAL_ONLY.
    • Add StringToRPType(...) and expand ResultProcessorType with RP_PAUSE.
  • New debug commands:
    • FT.DEBUG QUERY_CONTROLLER subcommands: SET_PAUSE_RP_RESUME, GET_IS_RP_PAUSED, PRINT_RP_STREAM to control and inspect paused queries.
  • Timeout/crash handling tweaks:
    • Keep TIMEOUT_AFTER_N logic; force coordinator timeout for N==0 with disabled timeouts; validate argument combinations.
  • Tests and helpers:
    • New helpers (e.g., parseDebugQueryCommandArgs, pause/resume utilities) and extensive SA/cluster tests for pause insertion, controller ops, RP stream printing, and error cases.

Written by Cursor Bugbot for commit b5e1720. This will update automatically on new commits. Configure here.

* feat: add unified memory consumption checker for Redis

* feat: implement unified memory consumption checker for Redis

* refactor: move MIN_NOT_0 macro definition to redis_mem_info.c

* fix: include minmax.h for MIN_NOT_0 macro definition

* feat: add debug logging for memory usage check in ForkGC

* feat: add QueryDebugCtx struct for debugging queries

* fix: correct memory usage check and improve memory ratio calculation

* fix: update debug command structures for improved clarity and functionality

* reorder result_processor.h s.t Debug processors are at the end

* reorder result_processor.c s.t Debug processors are at the end

* fix: reorder ResultProcessorType enum values for clarity

* fix: categorize debug-only result processors for better readability

* add basic RP controller debug  command

* restructer parseAndCompile

* basic query pause

* remove condition as it might create a race

* add RP before type function

* remove problematic checks

* Add basic pytests

* Support insert debug rp before/after specific rp type

* test insert debug rp and print rp stream

* Add PRINT_RP_STREAM command to FT.DEBUG QUERY_CONTROLLER

* Skip cluster test

* fix error message

* add active queries check

* Don't forget the @

* cover invalid rp type

* Remove redundant function declaration and add comment

* fix TBD comment

* change to return found

* return error if RP is missing from stream, change to 1 return statement

* add new options documentation to aggregate_debug.h

* remove doube whitespace

* space before var name

* remove PAUSE_AFTER_N

* change order of args in pause before/after

* Test cluster

* cluster test

* confrim query in bg

* Cover error when ran without workers

* Guard against insert after Index RP

* fixing edge case

* fix leak

* Apply suggestions from code review

Co-authored-by: meiravgri <[email protected]>

* Add error handling for resume command when no query is paused

* Validate only 1 query is able to run at each time

* comment

* Add assertion in RPPauseAfterCount_Free to validate debugRP state before clearing

* Enhance debug documentation for PAUSE_AFTER_RP_N and PAUSE_BEFORE_RP_N commands to clarify RP type restrictions and timeout behavior.

* Remove unused field

* add rp str to error msg

* remvoe "IN" from error msg

* fix test

* Fix memory management in RPPauseAfterCount_Free by ensuring rm_free is called consistently.

* Global debug ctx API

* change expected doc number

* Check active queries on all shards

* Refactor test_query_controller_pause_and_resume to streamline index creation

* Change mechanism in ft.aggregate in cluster

* Test coord pause piepline

* Enhance error handling for INTERNAL_ONLY usage in debug commands

* Round 1

* Remove air conditionar

* remove revach

* cover invalid count

* Move debug RPs to end of file, add debug pause RP

(cherry picked from commit 583fee9)

* Adjust to new context

(cherry picked from commit ed69f6b)

* remove duplicate

* Re-add debug

* remove not thread safe check

---------

Co-authored-by: meiravgri <[email protected]>
(cherry picked from commit 9d344ab)
@meiravgri meiravgri changed the title Backport-6750-to-8.2 [8.2] [MOD-11155] Create basic query pause debug command Nov 30, 2025
@meiravgri meiravgri requested a review from lerman25 November 30, 2025 13:55
@meiravgri meiravgri enabled auto-merge November 30, 2025 14:10
@codecov
Copy link

codecov bot commented Nov 30, 2025

Codecov Report

❌ Patch coverage is 95.34884% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.48%. Comparing base (5a41e64) to head (b5e1720).
⚠️ Report is 7 commits behind head on 8.2.

Files with missing lines Patch % Lines
src/aggregate/aggregate_debug.c 84.84% 5 Missing ⚠️
src/debug_commands.c 92.85% 4 Missing ⚠️
src/result_processor.c 99.20% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              8.2    #7579      +/-   ##
==========================================
- Coverage   89.48%   89.48%   -0.01%     
==========================================
  Files         253      253              
  Lines       41009    41197     +188     
  Branches     3725     3725              
==========================================
+ Hits        36697    36864     +167     
- Misses       4263     4284      +21     
  Partials       49       49              
Flag Coverage Δ
flow 82.15% <95.34%> (-0.51%) ⬇️
unit 47.45% <0.00%> (-0.23%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

lerman25
lerman25 previously approved these changes Dec 1, 2025
@meiravgri meiravgri added this pull request to the merge queue Dec 1, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Dec 1, 2025
@meiravgri meiravgri added this pull request to the merge queue Dec 1, 2025
github-merge-queue bot pushed a commit that referenced this pull request Dec 1, 2025
* [MOD-11155] Create basic query pause debug command (#6750)

* feat: add unified memory consumption checker for Redis

* feat: implement unified memory consumption checker for Redis

* refactor: move MIN_NOT_0 macro definition to redis_mem_info.c

* fix: include minmax.h for MIN_NOT_0 macro definition

* feat: add debug logging for memory usage check in ForkGC

* feat: add QueryDebugCtx struct for debugging queries

* fix: correct memory usage check and improve memory ratio calculation

* fix: update debug command structures for improved clarity and functionality

* reorder result_processor.h s.t Debug processors are at the end

* reorder result_processor.c s.t Debug processors are at the end

* fix: reorder ResultProcessorType enum values for clarity

* fix: categorize debug-only result processors for better readability

* add basic RP controller debug  command

* restructer parseAndCompile

* basic query pause

* remove condition as it might create a race

* add RP before type function

* remove problematic checks

* Add basic pytests

* Support insert debug rp before/after specific rp type

* test insert debug rp and print rp stream

* Add PRINT_RP_STREAM command to FT.DEBUG QUERY_CONTROLLER

* Skip cluster test

* fix error message

* add active queries check

* Don't forget the @

* cover invalid rp type

* Remove redundant function declaration and add comment

* fix TBD comment

* change to return found

* return error if RP is missing from stream, change to 1 return statement

* add new options documentation to aggregate_debug.h

* remove doube whitespace

* space before var name

* remove PAUSE_AFTER_N

* change order of args in pause before/after

* Test cluster

* cluster test

* confrim query in bg

* Cover error when ran without workers

* Guard against insert after Index RP

* fixing edge case

* fix leak

* Apply suggestions from code review

Co-authored-by: meiravgri <[email protected]>

* Add error handling for resume command when no query is paused

* Validate only 1 query is able to run at each time

* comment

* Add assertion in RPPauseAfterCount_Free to validate debugRP state before clearing

* Enhance debug documentation for PAUSE_AFTER_RP_N and PAUSE_BEFORE_RP_N commands to clarify RP type restrictions and timeout behavior.

* Remove unused field

* add rp str to error msg

* remvoe "IN" from error msg

* fix test

* Fix memory management in RPPauseAfterCount_Free by ensuring rm_free is called consistently.

* Global debug ctx API

* change expected doc number

* Check active queries on all shards

* Refactor test_query_controller_pause_and_resume to streamline index creation

* Change mechanism in ft.aggregate in cluster

* Test coord pause piepline

* Enhance error handling for INTERNAL_ONLY usage in debug commands

* Round 1

* Remove air conditionar

* remove revach

* cover invalid count

* Move debug RPs to end of file, add debug pause RP

(cherry picked from commit 583fee9)

* Adjust to new context

(cherry picked from commit ed69f6b)

* remove duplicate

* Re-add debug

* remove not thread safe check

---------

Co-authored-by: meiravgri <[email protected]>
(cherry picked from commit 9d344ab)

* fix compilation

* fix
@meiravgri meiravgri removed this pull request from the merge queue due to a manual request Dec 1, 2025
@meiravgri meiravgri enabled auto-merge December 1, 2025 08:17
if (internal_only) {
QueryError_SetError(status, QUERY_EPARSEARGS,
"INTERNAL_ONLY must be used with TIMEOUT_AFTER_N");
return REDISMODULE_OK;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Pause arguments silently ignored when timeout is specified

When TIMEOUT_AFTER_N and PAUSE_BEFORE_RP_N/PAUSE_AFTER_RP_N are both specified, the early return REDISMODULE_OK at line 133 causes the pause arguments to be silently ignored. The function returns after processing timeout, never reaching the pause handling code at line 138. Users specifying both arguments would expect either both features to work together or an error indicating they're incompatible, not one argument being silently discarded.

Additional Locations (1)

Fix in Cursor Fix in Web

@meiravgri meiravgri added this pull request to the merge queue Dec 1, 2025
Merged via the queue into 8.2 with commit e2b405d Dec 1, 2025
23 checks passed
@meiravgri meiravgri deleted the backport-6750-to-8.2 branch December 1, 2025 10:13
@meiravgri
Copy link
Collaborator Author

/backport

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.10, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.10
git worktree add -d .worktree/backport-7579-to-2.10 origin/2.10
cd .worktree/backport-7579-to-2.10
git switch --create backport-7579-to-2.10
git cherry-pick -x e2b405da83c679a356fffde3b506b0ba6b44cbf1

meiravgri added a commit that referenced this pull request Dec 4, 2025
github-merge-queue bot pushed a commit that referenced this pull request Dec 8, 2025
* [MOD-12694] [MOD-12069] Add active_worker_threads metric (#7538)

WIP: needs #7579 backport

* wrapp reading from workers only if mt_build

fix test

* fix test

* review
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants