Skip to content

[MOD-12416] Create basis for tracking Query Errors and track syntax and args errors#7386

Merged
lerman25 merged 11 commits intomasterfrom
omer-query-error-metrics-base
Nov 19, 2025
Merged

[MOD-12416] Create basis for tracking Query Errors and track syntax and args errors#7386
lerman25 merged 11 commits intomasterfrom
omer-query-error-metrics-base

Conversation

@lerman25
Copy link
Collaborator

@lerman25 lerman25 commented Nov 17, 2025

First PR from a series of PRs, aiming to extend the current info metric tracked in RediSearch.
This PR creates the basis for tracking specific query error, starting with QUERY_ERROR_CODE_ADD_ARGS and QUERY_ERROR_CODE_PARSE_ARGS.
It introduces QueryErrorsGlobalStats struct, which keep counters for each error we want to track, and
introduces QueryErrorsGlobalStats_UpdateError function, which receives QueryErrorCode, and update the relevant counter (if the code is tracked).
In cluster setting, some args error might be detected in the coord, while others in the shards, so this PR creates individual counter for coord and SA/shards. This separation might also help debug which specific error case was triggered.


Note

Adds global query error counters (syntax/args) split by shard vs coordinator, wires them into SEARCH/AGGREGATE/HYBRID error paths, and surfaces them in INFO MODULES with tests.

  • Stats/Infra:
    • Introduce QueryErrorsGlobalStats and QueryErrorsGlobalStats_UpdateError(...) to count query errors by type (syntax, parse_args) and origin (shard vs coordinator).
    • Extend QueriesGlobalStats/TotalGlobalStats_GetQueryStats() to include these counters.
  • Error accounting hooks:
    • Update error paths in aggregate_exec.c, dist_aggregate.c, dist_hybrid.c, hybrid_exec.c, and module.c to increment the appropriate shard/coordinator counters before replying.
  • INFO exposure:
    • Add new INFO MODULES fields under warnings_and_errors and coordinator_warnings_and_errors:
      • shard_total_query_errors_syntax, shard_total_query_errors_arguments
      • coord_total_query_errors_syntax, coord_total_query_errors_arguments
  • Tests:
    • Add comprehensive SA/cluster tests verifying counters for FT.SEARCH, FT.AGGREGATE, and FT.HYBRID across syntax/args errors and that successful queries don’t change metrics.

Written by Cursor Bugbot for commit 7f998a9. This will update automatically on new commits. Configure here.

@lerman25 lerman25 marked this pull request as ready for review November 17, 2025 09:40
@lerman25 lerman25 requested a review from meiravgri November 17, 2025 09:41
@codecov
Copy link

codecov bot commented Nov 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.12%. Comparing base (a28144d) to head (7f998a9).
⚠️ Report is 14 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7386      +/-   ##
==========================================
- Coverage   85.22%   85.12%   -0.10%     
==========================================
  Files         346      346              
  Lines       52978    53073      +95     
  Branches    13731    13773      +42     
==========================================
+ Hits        45150    45180      +30     
- Misses       7634     7699      +65     
  Partials      194      194              
Flag Coverage Δ
flow 84.54% <100.00%> (-0.07%) ⬇️
unit 52.32% <0.00%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@meiravgri meiravgri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Comment on lines +796 to +800
for shardId in range(1, self.env.shardsCount + 1):
shard_conn = self.env.getConnection(shardId)
info_dict = info_modules_to_dict(shard_conn)
args_error_count = info_dict[WARN_ERR_SECTION][ARGS_ERROR_SHARD_METRIC]
self.env.assertEqual(args_error_count, '0', message=f"Shard {shardId} has wrong initial args error count")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider moving both ARGS_ERROR_SHARD_METRIC and SYNTAX_ERROR_SHARD_METRIC initialization to the init function under the same loop

Copy link
Collaborator Author

@lerman25 lerman25 Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check might be redundant, but if we keep it, I think this is the correct place for this check.
We want to make sure that the other tests in the class didn't add a error to the count, by mistake (Like counting an arg error in the metric because of a syntax error).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not to align the SA tests and check it was increased by 1? also, If you want to test that the args error doesn't increase syntax error (which i agree we should), it should be part of this test, and not the other way around

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added _verify_metrics_not_changed, that checks all field are not changed

coord_syntax_error_count = info_dict[COORD_WARN_ERR_SECTION][SYNTAX_ERROR_COORD_METRIC]
self.env.assertEqual(coord_syntax_error_count, '1')

def test_args_errors_cluster(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should cover here also params error coming from the coordinator, for example invalid limit, for all query types (i think that the fact that FT.SEARCH with invalid limit arg is not a coord error is a bug. you can use a negative number for testing coordintor error)

(cherry picked from commit 4c31dc3)
@lerman25 lerman25 force-pushed the omer-query-error-metrics-base branch from 4c31dc3 to ac85838 Compare November 18, 2025 09:58
@lerman25 lerman25 force-pushed the omer-query-error-metrics-base branch from 61269c0 to a5584f4 Compare November 18, 2025 10:46
self.env.assertEqual(coord_args_error_count, '0')

#### Commented out due to a bug (MOD-12465), when fixed, please enable
#Note that all the next coord_args_error_count checks should be updated as well
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#Note that all the next coord_args_error_count checks should be updated as well
# Note that all the next coord_args_error_count checks should be updated as well

Comment on lines 818 to 819
if False:
# Test args errors that are counted in the coord
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe instead of comment out compare to the current result so once it is fixed it will fail

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, arnt there other param errors we can test?

Copy link
Collaborator Author

@lerman25 lerman25 Nov 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, arnt there other param errors we can test?

Used missing DIALECT param, invokes args error on coord

self.env.assertEqual(coord_args_error_count, '1')

# Test args errors in aggregate
# All args errors in FT.AGGREGATE are counted on the coordinator
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you sure?

QueryError_SetError(status, QUERY_ERROR_CODE_PARSE_ARGS, "Count accepts 0 values only");

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried reaching this line with ft.aggregate idx hello_world groupby 1 @text reduce count 1 as count
but it reaches an error before that (in the coord)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is really not that clear in regards what is only caught by the shard,
so I changed the comment to
# All args errors in FT.AGGREGATE should be (de facto) counted on the coordinator
If we can find one that can, I will add it

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Malformed RESP3 Map Protocol Replies

In sendChunk_ReplyOnly_HybridEmptyResults, the warnings section uses RedisModule_Reply_SimpleString(reply, "warnings") instead of RedisModule_ReplyKV_Array(reply, "warnings"). This breaks RESP3 map protocol because warnings is being treated as a simple string value instead of a key-value pair. The subsequent RedisModule_Reply_Array and RedisModule_Reply_SimpleString calls will be malformed within the map structure.

src/hybrid/hybrid_exec.c#L499-L510

HybridRequest_Execute(hreq, ctx, sctx);
} else if (HybridRequest_StartCursors(hybrid_ref, ctx, status, depleteInBackground) != REDISMODULE_OK) {
return REDISMODULE_ERR;
}
freeHybridParams(hybridParams);
return REDISMODULE_OK;
}
// Background execution functions implementation
static blockedClientHybridCtx *blockedClientHybridCtx_New(StrongRef hybrid_ref,
HybridPipelineParams *hybridParams,

Fix in Cursor Fix in Web


@lerman25 lerman25 marked this pull request as draft November 18, 2025 14:49
@lerman25 lerman25 marked this pull request as ready for review November 18, 2025 14:49
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Broken RESP3 Map Structure

The sendChunk_ReplyOnly_HybridEmptyResults function incorrectly uses RedisModule_Reply_SimpleString(reply, "warnings") instead of RedisModule_ReplyKV_Array(reply, "warning") when building the RESP3 map response. This breaks the map structure and doesn't properly create a key-value pair, causing the warnings field to be incorrectly formatted in the response. The implementation should match the corresponding code in sendChunk_hybrid which correctly uses RedisModule_ReplyKV_Array.

src/hybrid/hybrid_exec.c#L461-L463

}
}
replyWithCursors(replyCtx, cursors);

Fix in Cursor Fix in Web


@lerman25 lerman25 added this pull request to the merge queue Nov 19, 2025
Merged via the queue into master with commit bf41b7f Nov 19, 2025
25 of 27 checks passed
@lerman25 lerman25 deleted the omer-query-error-metrics-base branch November 19, 2025 13:39
@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.8, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.8
git worktree add -d .worktree/backport-7386-to-2.8 origin/2.8
cd .worktree/backport-7386-to-2.8
git switch --create backport-7386-to-2.8
git cherry-pick -x bf41b7f0ae6d6a9f6625fd9435a6fb23816fef3b

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.10, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.10
git worktree add -d .worktree/backport-7386-to-2.10 origin/2.10
cd .worktree/backport-7386-to-2.10
git switch --create backport-7386-to-2.10
git cherry-pick -x bf41b7f0ae6d6a9f6625fd9435a6fb23816fef3b

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 8.2, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 8.2
git worktree add -d .worktree/backport-7386-to-8.2 origin/8.2
cd .worktree/backport-7386-to-8.2
git switch --create backport-7386-to-8.2
git cherry-pick -x bf41b7f0ae6d6a9f6625fd9435a6fb23816fef3b

redisearch-backport-pull-request bot pushed a commit that referenced this pull request Nov 19, 2025
…nd args errors (#7386)

* Query error metric with coord base

* remove coord - not used yet

* Count syntax and arg errors

* Revert "remove coord - not used yet"

This reverts commit 8cff9f0.

* Seperate into shard error and coord error

* tests

* Treat SA as shard

* Meirav fixups

(cherry picked from commit 4c31dc3)

* Add future test

* fix hybrid SA test

* Refractor previous dict check

(cherry picked from commit bf41b7f)
@redisearch-backport-pull-request
Copy link
Contributor

github-merge-queue bot pushed a commit that referenced this pull request Nov 25, 2025
…ntax and args errors (#7422)

* [MOD-12416] Create basis for tracking Query Errors and track syntax and args errors (#7386)

* Query error metric with coord base

* remove coord - not used yet

* Count syntax and arg errors

* Revert "remove coord - not used yet"

This reverts commit 8cff9f0.

* Seperate into shard error and coord error

* tests

* Treat SA as shard

* Meirav fixups

(cherry picked from commit 4c31dc3)

* Add future test

* fix hybrid SA test

* Refractor previous dict check

(cherry picked from commit bf41b7f)

* Replace to c enum

---------

Co-authored-by: lerman25 <[email protected]>
Co-authored-by: lerman25 <[email protected]>
lerman25 added a commit that referenced this pull request Nov 30, 2025
…and args errors ([#7422](#7422))

* [MOD-12416] Create basis for tracking Query Errors and track syntax and args errors ([#7386](#7386))

* Query error metric with coord base

* remove coord - not used yet

* Count syntax and arg errors

* Revert "remove coord - not used yet"

This reverts commit 8cff9f0.

* Seperate into shard error and coord error

* tests

* Treat SA as shard

* Meirav fixups

(cherry picked from commit 4c31dc3)

* Add future test

* fix hybrid SA test

* Refractor previous dict check

(cherry picked from commit bf41b7f)

* Replace to c enum

---------

Co-authored-by: lerman25 <[email protected]>
Co-authored-by: lerman25 <[email protected]>
(cherry picked from commit 958b361)
lerman25 added a commit that referenced this pull request Nov 30, 2025
…and args errors ([[#7422](https://github.com/RediSearch/RediSearch/issues/7422)](https://github.com/RediSearch/RediSearch/issues/7422))

* [MOD-12416] Create basis for tracking Query Errors and track syntax and args errors ([[#7386](https://github.com/RediSearch/RediSearch/issues/7386)](https://github.com/RediSearch/RediSearch/issues/7386))

* Query error metric with coord base

* remove coord - not used yet

* Count syntax and arg errors

* Revert "remove coord - not used yet"

This reverts commit 8cff9f0.

* Seperate into shard error and coord error

* tests

* Treat SA as shard

* Meirav fixups

(cherry picked from commit 4c31dc3)

* Add future test

* fix hybrid SA test

* Refractor previous dict check

(cherry picked from commit bf41b7f)

* Replace to c enum

---------

Co-authored-by: lerman25 <[email protected]>
Co-authored-by: lerman25 <[email protected]>
(cherry picked from commit 958b361)
github-merge-queue bot pushed a commit that referenced this pull request Nov 30, 2025
…ntax … (#7577)

[MOD-12416] Create basis for tracking Query Errors and track syntax and args errors ([[#7422](https://github.com/RediSearch/RediSearch/issues/7422)](https://github.com/RediSearch/RediSearch/issues/7422))

* [MOD-12416] Create basis for tracking Query Errors and track syntax and args errors ([[#7386](https://github.com/RediSearch/RediSearch/issues/7386)](https://github.com/RediSearch/RediSearch/issues/7386))

* Query error metric with coord base

* remove coord - not used yet

* Count syntax and arg errors

* Revert "remove coord - not used yet"

This reverts commit 8cff9f0.

* Seperate into shard error and coord error

* tests

* Treat SA as shard

* Meirav fixups

(cherry picked from commit 4c31dc3)

* Add future test

* fix hybrid SA test

* Refractor previous dict check

(cherry picked from commit bf41b7f)

* Replace to c enum

---------



(cherry picked from commit 958b361)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants