Skip to content

[8.2] [MOD-12419] Track Timeout errors and warnings in info#7614

Merged
lerman25 merged 4 commits into8.2from
backport-7414-to-8.2
Dec 3, 2025
Merged

[8.2] [MOD-12419] Track Timeout errors and warnings in info#7614
lerman25 merged 4 commits into8.2from
backport-7414-to-8.2

Conversation

@lerman25
Copy link
Collaborator

@lerman25 lerman25 commented Dec 2, 2025

backport #7414 to 8.2


Note

Adds tracking and exposure of timeout errors/warnings for queries (shard and coordinator) and updates error handling to use explicit codes.

  • Global Stats/Errors:
    • Add timeout counters to QueryErrorsGlobalStats and QueryWarningGlobalStats, with helpers to update/read them.
    • Treat standalone as coordinator for error/warning attribution.
    • Increment timeout error/warning stats across SEARCH/AGGREGATE execution paths (resp2/resp3, cursors, coordinator reducers).
    • Parse error/warning codes from messages (QueryError_GetCodeFromMessage, QueryWarningCode_*).
  • Coordinator/Execution:
    • Update distributed aggregate/search to decide error returns by code and to record timeout errors; record warnings from shard replies.
    • On RESP3 search results, iterate warning array and track each warning.
  • INFO Output:
    • Expose new metrics: shard_total_query_errors_timeout, coord_total_query_errors_timeout, shard_total_query_warnings_timeout, coord_total_query_warnings_timeout.
  • Misc:
    • Adjust error attribution in execCommandCommon based on shard count.
    • Tests extended for standalone/cluster to validate timeout error/warning metrics; add helpers to change timeout policy.

Written by Cursor Bugbot for commit d64876c. This will update automatically on new commits. Configure here.

* [MOD-12419] Track Timeout errors and warnings in info ([#7414](#7414))

* Track timeout in sendchunk resp2

* Track timeout warning in sendSearchResults

* Track timeout error in searchResultReducer

(cherry picked from commit fafe1dc)

* Track timeout in sendChunk_hybrid

(cherry picked from commit 8529cb9)

* test timeout metrics

(cherry picked from commit c56a0d9)

* fix isCoord check

* Add query warning code and add function and fields needed to track

(cherry picked from commit a414641)

* Track timeout in sendchunk resp3

(cherry picked from commit 6853bb9)

* readd skip

* Update syntax and args error to new SA as cluster

* format and enrico comment

* Change test to N=0 with Internal only (not working so revert afterwards)

* Revert "Change test to N=0 with Internal only (not working so revert afterwards)"

This reverts commit 829ac53.

* meirav comments

* Stablize tests

* Add resp3 test

* _disable_ hybrid sa timeout

* Make test robust

* fixup! Make test robust

* remove limits

(cherry picked from commit 6eee260)

* Align to 8.4

(cherry picked from commit ac92116)
@github-actions github-actions bot added the size:M label Dec 2, 2025
meiravgri
meiravgri previously approved these changes Dec 2, 2025
shard_conn = env.getConnection(shardId)
_verify_metrics_not_changed(env, shard_conn, before_info_dicts[shardId], tested_in_this_test)

# @skip(cluster=False)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Commented-out decorator accidentally committed to test file

A commented-out decorator # @skip(cluster=False) was added before the test_active_io_threads_stats function. This appears to be a remnant from testing or debugging that was accidentally left in the codebase. Commented-out decorators serve no functional purpose and add noise to the code.

Fix in Cursor Fix in Web

return (!errStr
|| RSGlobalConfig.requestConfigParams.timeoutPolicy == TimeoutPolicy_Fail
|| strcmp(errStr, "Timeout limit was reached"));
// TODO - get RequestConfig ptr as parameter instead of global config
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the comment relevant?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

meiravgri
meiravgri previously approved these changes Dec 2, 2025
@lerman25 lerman25 enabled auto-merge December 2, 2025 15:14
meiravgri
meiravgri previously approved these changes Dec 3, 2025
@lerman25 lerman25 force-pushed the backport-7414-to-8.2 branch from f7c045b to d64876c Compare December 3, 2025 13:12
}
if (rc == RS_RESULT_TIMEDOUT) {
// Track warnings in global statistics
QueryWarningsGlobalStats_UpdateWarning(QUERY_WARNING_CODE_TIMED_OUT, 1, !IsInternal(req));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: RESP3 timeout warning tracking misses certain timeout conditions

The RESP3 code path (sendChunk_Resp3) tracks timeout warnings only when rc == RS_RESULT_TIMEDOUT, while the RESP2 code path (sendChunk_Resp2) uses the broader has_timedout condition which includes (rc == RS_RESULT_TIMEDOUT) || hasTimeoutError(req->qiter.err). This inconsistency means timeouts recorded in req->qiter.err but not reflected in rc will be tracked in RESP2 but missed in RESP3. The RESP3 warning tracking at line 662 uses a narrower check than the has_timedout variable computed at line 678.

Additional Locations (1)

Fix in Cursor Fix in Web

@lerman25 lerman25 added this pull request to the merge queue Dec 3, 2025
@codecov
Copy link

codecov bot commented Dec 3, 2025

Codecov Report

❌ Patch coverage is 90.90909% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.51%. Comparing base (0aa1ecf) to head (d64876c).
⚠️ Report is 20 commits behind head on 8.2.

Files with missing lines Patch % Lines
src/query_error.c 78.94% 4 Missing ⚠️
src/module.c 88.88% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              8.2    #7614      +/-   ##
==========================================
- Coverage   89.54%   89.51%   -0.04%     
==========================================
  Files         253      253              
  Lines       41236    41290      +54     
  Branches     3725     3725              
==========================================
+ Hits        36924    36959      +35     
- Misses       4263     4282      +19     
  Partials       49       49              
Flag Coverage Δ
flow 82.21% <90.90%> (-0.65%) ⬇️
unit 47.42% <0.00%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Merged via the queue into 8.2 with commit c922ddc Dec 3, 2025
23 checks passed
@lerman25 lerman25 deleted the backport-7414-to-8.2 branch December 3, 2025 15:18
@lerman25
Copy link
Collaborator Author

/backport

@redisearch-backport-pull-request
Copy link
Contributor

Backport failed for 2.10, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.10
git worktree add -d .worktree/backport-7614-to-2.10 origin/2.10
cd .worktree/backport-7614-to-2.10
git switch --create backport-7614-to-2.10
git cherry-pick -x c922ddcb8b7b4d859a60d22b2d882ebc6ba1c058

alonre24 added a commit to alonre24/redis that referenced this pull request Jan 26, 2026
**Bug Fixes:**

* [redis#7219](RediSearch/RediSearch#7219) Fix a concurrency issue on Reducer in `FT.AGGREGATE`
* [redis#7255](RediSearch/RediSearch#7255) Fix `BM25STD` underflow wraparound
* [redis#7275](RediSearch/RediSearch#7275) Report used memory as `unsigned long long` to avoid underflows
* [redis#7264](RediSearch/RediSearch#7264) Fix `totalDocsLen` updates
* [redis#6995](RediSearch/RediSearch#6995) Do not fanout `FT.INFO` to replicas
* [redis#7350](RediSearch/RediSearch#7350) Fix `FT.CREATE` failure with LeanVec parameters on non-Intel architectures
* [redis#7384](RediSearch/RediSearch#7384) Fix index load from RDB temporary memory overhead
* [redis#7459](RediSearch/RediSearch#7459) Fix Fork GC potential double-free on error path
* [redis#7458](RediSearch/RediSearch#7458) Fix a GC performence regression
* [redis#7470](RediSearch/RediSearch#7470) Avoid draining worker thread pool from FLUSH callback to avoid deadlocks
* [redis#7554](RediSearch/RediSearch#7554) Handle the case where `SCORE` is sent alone without extra fields (coordinator)
* [redis#7685](RediSearch/RediSearch#7685) Fix cursor logical leak
* [redis#7794](RediSearch/RediSearch#7794) Fix `cmp_strings()` to correctly handle binary data with embedded NULLs in TOLIST reducer in FT.AGGREGATE
* [redis#7873](RediSearch/RediSearch#7873) Handle warnings in empty `FT.AGGREGATE` replies (cluster)
* [redis#7886](RediSearch/RediSearch#7886) Remove non-TEXT fields from the spec keys dictionary
* [redis#7904](RediSearch/RediSearch#7904) Refactor keys dictionary handling
* [redis#7901](RediSearch/RediSearch#7901) Support multiple warnings in reply
* [redis#8083](RediSearch/RediSearch#8083) Fix incorrect FULLTEXT field metric counts
* [redis#8153](RediSearch/RediSearch#8153) Fix configuration registration issues

**Improvements:**

* [redis#7154](RediSearch/RediSearch#7154) `FT.AGGREGATE` can return Background Indexing OOM warnings
* [redis#7083](RediSearch/RediSearch#7083) Add the default text scorer as a configuration option
* [redis#7341](RediSearch/RediSearch#7341) Rename `FT.PROFILE` counter fields
* [redis#7436](RediSearch/RediSearch#7436) Enhance `FT.PROFILE` with vector search execution details
* [redis#7435](RediSearch/RediSearch#7435) Ensure full `FT.PROFILE` output on timeout with RETURN policy
* [redis#7534](RediSearch/RediSearch#7534) Reduce the number of worker threads asynchronously to avoid deadlocks during queries
* [redis#7614](RediSearch/RediSearch#7614) Track timeout warnings and errors in INFO
* [redis#7646](RediSearch/RediSearch#7646) Track `maxprefixexpansions` warnings and errors in INFO
* [redis#7577](RediSearch/RediSearch#7577) Track query syntax/argument errors (basis for query error tracking)
* [redis#7737](RediSearch/RediSearch#7737) Add `Internal cursor reads` metric to cluster `FT.PROFILE` output
* [redis#7759](RediSearch/RediSearch#7759) Extend indexing metrics
* [redis#7710](RediSearch/RediSearch#7710) Support `WITHCOUNT` keyword in `FT.AGGREGATE`
* [redis#7957](RediSearch/RediSearch#7957) Persist query warnings across cursor reads
* [redis#8054](RediSearch/RediSearch#8054) Add logging for index-related commands
* [redis#8151](RediSearch/RediSearch#8151) Fix shard total profile time reporting in `FT.PROFILE`
* [redis#8103](RediSearch/RediSearch#8103) Output current thread IndexSpec information on crash
YaacovHazan pushed a commit to redis/redis that referenced this pull request Jan 26, 2026
**Bug Fixes:**

* [#7219](RediSearch/RediSearch#7219) Fix a
concurrency issue on Reducer in `FT.AGGREGATE`
* [#7255](RediSearch/RediSearch#7255) Fix
`BM25STD` underflow wraparound
* [#7275](RediSearch/RediSearch#7275) Report
used memory as `unsigned long long` to avoid underflows
* [#7264](RediSearch/RediSearch#7264) Fix
`totalDocsLen` updates
* [#6995](RediSearch/RediSearch#6995) Do not
fanout `FT.INFO` to replicas
* [#7350](RediSearch/RediSearch#7350) Fix
`FT.CREATE` failure with LeanVec parameters on non-Intel architectures
* [#7694](RediSearch/RediSearch#7694) Use
asynchronous jobs for GC in SVS to accelerate execution
* [#7384](RediSearch/RediSearch#7384) Fix index
load from RDB temporary memory overhead
* [#7459](RediSearch/RediSearch#7459) Fix Fork
GC potential double-free on error path
* [#7458](RediSearch/RediSearch#7458) Fix a GC
performence regression
* [#7470](RediSearch/RediSearch#7470) Avoid
draining worker thread pool from FLUSH callback to avoid deadlocks
* [#7554](RediSearch/RediSearch#7554) Handle the
case where `SCORE` is sent alone without extra fields (coordinator)
* [#7685](RediSearch/RediSearch#7685) Fix cursor
logical leak
* [#7794](RediSearch/RediSearch#7794) Fix
`cmp_strings()` to correctly handle binary data with embedded NULLs in
TOLIST reducer in FT.AGGREGATE
* [#7873](RediSearch/RediSearch#7873) Handle
warnings in empty `FT.AGGREGATE` replies (cluster)
* [#7886](RediSearch/RediSearch#7886) Remove
non-TEXT fields from the spec keys dictionary
* [#7904](RediSearch/RediSearch#7904) Refactor
keys dictionary handling
* [#7901](RediSearch/RediSearch#7901) Support
multiple warnings in reply
* [#8083](RediSearch/RediSearch#8083) Fix
incorrect FULLTEXT field metric counts
* [#8153](RediSearch/RediSearch#8153) Fix
configuration registration issues

**Improvements:**

* [#7154](RediSearch/RediSearch#7154)
`FT.AGGREGATE` can return Background Indexing OOM warnings
* [#7083](RediSearch/RediSearch#7083) Add the
default text scorer as a configuration option
* [#7341](RediSearch/RediSearch#7341) Rename
`FT.PROFILE` counter fields
* [#7436](RediSearch/RediSearch#7436) Enhance
`FT.PROFILE` with vector search execution details
* [#7435](RediSearch/RediSearch#7435) Ensure
full `FT.PROFILE` output on timeout with RETURN policy
* [#7534](RediSearch/RediSearch#7534) Reduce the
number of worker threads asynchronously to avoid deadlocks during
queries
* [#7614](RediSearch/RediSearch#7614) Track
timeout warnings and errors in INFO
* [#7646](RediSearch/RediSearch#7646) Track
`maxprefixexpansions` warnings and errors in INFO
* [#7577](RediSearch/RediSearch#7577) Track
query syntax/argument errors (basis for query error tracking)
* [#7737](RediSearch/RediSearch#7737) Add
`Internal cursor reads` metric to cluster `FT.PROFILE` output
* [#7759](RediSearch/RediSearch#7759) Extend
indexing metrics
* [#7710](RediSearch/RediSearch#7710) Support
`WITHCOUNT` keyword in `FT.AGGREGATE`
* [#7957](RediSearch/RediSearch#7957) Persist
query warnings across cursor reads
* [#8054](RediSearch/RediSearch#8054) Add
logging for index-related commands
* [#8151](RediSearch/RediSearch#8151) Fix shard
total profile time reporting in `FT.PROFILE`
* [#8103](RediSearch/RediSearch#8103) Output
current thread IndexSpec information on crash
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants