[8.4] [MOD-12419] Track Timeout errors and warnings in info#7507
[8.4] [MOD-12419] Track Timeout errors and warnings in info#7507
Conversation
There was a problem hiding this comment.
❌ Jit has detected 8 important findings in this PR that you should review.
Jit encountered an internal error and cannot comment on each finding.
You can ask a Jit admin to comment #jit_ignore_all on this PR to ignore the findings.
Here are the findings in this PR:
- Security Control: Static Code Analysis Semgrep Pro
- Type: Yaml.Github-Actions.Security.Run-Shell-Injection.Run-Shell-Injection
- Description: Using variable interpolation
${{...}}withgithubcontext data in arun:step could allow an attacker to inject their own code into the runner. This would allow them to steal secrets and code.githubcontext data can have arbitrary user input and should be treated as untrusted. Instead, use an intermediate environment variable withenv:to store the data and use the environment variable in therun:script. Be sure to use double-quotes the environment variable, like this: "$ENVVAR". - Severity: HIGH
- Learn More: Link
- Filename: .github/workflows/task-test.yml
- Lines: 396-401
- Security Control: Static Code Analysis Semgrep Pro
- Type: Yaml.Github-Actions.Security.Run-Shell-Injection.Run-Shell-Injection
- Description: Using variable interpolation
${{...}}withgithubcontext data in arun:step could allow an attacker to inject their own code into the runner. This would allow them to steal secrets and code.githubcontext data can have arbitrary user input and should be treated as untrusted. Instead, use an intermediate environment variable withenv:to store the data and use the environment variable in therun:script. Be sure to use double-quotes the environment variable, like this: "$ENVVAR". - Severity: HIGH
- Learn More: Link
- Filename: .github/workflows/generate-basic-matrix.yml
- Lines: 39-74
- Security Control: Static Code Analysis Semgrep Pro
- Type: Yaml.Github-Actions.Security.Run-Shell-Injection.Run-Shell-Injection
- Description: Using variable interpolation
${{...}}withgithubcontext data in arun:step could allow an attacker to inject their own code into the runner. This would allow them to steal secrets and code.githubcontext data can have arbitrary user input and should be treated as untrusted. Instead, use an intermediate environment variable withenv:to store the data and use the environment variable in therun:script. Be sure to use double-quotes the environment variable, like this: "$ENVVAR". - Severity: HIGH
- Learn More: Link
- Filename: .github/workflows/flow-build-artifacts.yml
- Lines: 69-90
- Security Control: Static Code Analysis Semgrep Pro
- Type: Yaml.Github-Actions.Security.Run-Shell-Injection.Run-Shell-Injection
- Description: Using variable interpolation
${{...}}withgithubcontext data in arun:step could allow an attacker to inject their own code into the runner. This would allow them to steal secrets and code.githubcontext data can have arbitrary user input and should be treated as untrusted. Instead, use an intermediate environment variable withenv:to store the data and use the environment variable in therun:script. Be sure to use double-quotes the environment variable, like this: "$ENVVAR". - Severity: HIGH
- Learn More: Link
- Filename: .github/workflows/task-get-config.yml
- Lines: 47-271
- Security Control: Static Code Analysis Semgrep Pro
- Type: Yaml.Github-Actions.Security.Run-Shell-Injection.Run-Shell-Injection
- Description: Using variable interpolation
${{...}}withgithubcontext data in arun:step could allow an attacker to inject their own code into the runner. This would allow them to steal secrets and code.githubcontext data can have arbitrary user input and should be treated as untrusted. Instead, use an intermediate environment variable withenv:to store the data and use the environment variable in therun:script. Be sure to use double-quotes the environment variable, like this: "$ENVVAR". - Severity: HIGH
- Learn More: Link
- Filename: .github/workflows/task-build-artifacts.yml
- Lines: 312-326
- Security Control: Static Code Analysis Semgrep Pro
- Type: Yaml.Github-Actions.Security.Run-Shell-Injection.Run-Shell-Injection
- Description: Using variable interpolation
${{...}}withgithubcontext data in arun:step could allow an attacker to inject their own code into the runner. This would allow them to steal secrets and code.githubcontext data can have arbitrary user input and should be treated as untrusted. Instead, use an intermediate environment variable withenv:to store the data and use the environment variable in therun:script. Be sure to use double-quotes the environment variable, like this: "$ENVVAR". - Severity: HIGH
- Learn More: Link
- Filename: .github/workflows/task-test.yml
- Lines: 387-390
- Security Control: Static Code Analysis Semgrep Pro
- Type: Yaml.Github-Actions.Security.Run-Shell-Injection.Run-Shell-Injection
- Description: Using variable interpolation
${{...}}withgithubcontext data in arun:step could allow an attacker to inject their own code into the runner. This would allow them to steal secrets and code.githubcontext data can have arbitrary user input and should be treated as untrusted. Instead, use an intermediate environment variable withenv:to store the data and use the environment variable in therun:script. Be sure to use double-quotes the environment variable, like this: "$ENVVAR". - Severity: HIGH
- Learn More: Link
- Filename: .github/workflows/generate-matrix.yml
- Lines: 43-112
- Security Control: Static Code Analysis Semgrep Pro
- Type: Yaml.Github-Actions.Security.Run-Shell-Injection.Run-Shell-Injection
- Description: Using variable interpolation
${{...}}withgithubcontext data in arun:step could allow an attacker to inject their own code into the runner. This would allow them to steal secrets and code.githubcontext data can have arbitrary user input and should be treated as untrusted. Instead, use an intermediate environment variable withenv:to store the data and use the environment variable in therun:script. Be sure to use double-quotes the environment variable, like this: "$ENVVAR". - Severity: HIGH
- Learn More: Link
- Filename: .github/workflows/task-test.yml
- Lines: 80-101
|
|
||
| if (ShouldReplyWithError(QueryError_GetCode(rp->parent->err), req->reqConfig.timeoutPolicy, IsProfile(req))) { | ||
| // Track errors in global statistics | ||
| QueryErrorsGlobalStats_UpdateError(QueryError_GetCode(rp->parent->err), 1, !IsInternal(req)); |
There was a problem hiding this comment.
Bug: Inconsistent error source in statistics tracking
The error tracking on line 532 uses rp->parent->err to get the error code, but line 533 immediately uses qctx->err to reply with the error message. This inconsistency means the wrong error code may be tracked in global statistics. Line 532 should use QueryError_GetCode(qctx->err) instead of QueryError_GetCode(rp->parent->err) to match the pattern used in line 403 (RESP2 version) and to be consistent with the error being replied on line 533.
| MRReply *currentWarning = MRReply_ArrayElement(rCtx->warning, i); | ||
| const char *warning_str = MRReply_String(currentWarning, NULL); | ||
| QueryWarningCode warningCode = QueryWarningCode_GetCodeFromMessage(warning_str); | ||
| QueryWarningsGlobalStats_UpdateWarning(warningCode, 1, COORD_ERR_WARN); |
There was a problem hiding this comment.
Bug: Shard warnings tracked as coordinator warnings
Warnings received from shards are incorrectly tracked as coordinator warnings. The code iterates over rCtx->warning which contains warnings from shard replies, but tracks them using COORD_ERR_WARN (true). Since these warnings originated from shards, they should be tracked as shard warnings using SHARD_ERR_WARN (false) instead. This causes incorrect statistics where shard-level warnings are counted as coordinator-level warnings.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 8.4 #7507 +/- ##
==========================================
- Coverage 85.92% 85.90% -0.03%
==========================================
Files 330 330
Lines 52558 52613 +55
Branches 12001 12001
==========================================
+ Hits 45161 45196 +35
- Misses 7230 7250 +20
Partials 167 167
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| // If an error occurred, or a timeout in strict mode - return a simple error | ||
| if (ShouldReplyWithError(QueryError_GetCode(rp->parent->err), req->reqConfig.timeoutPolicy, IsProfile(req))) { | ||
| // Track errors in global statistics | ||
| QueryErrorsGlobalStats_UpdateError(QueryError_GetCode(qctx->err), 1, !IsInternal(req)); |
There was a problem hiding this comment.
Bug: Inconsistent error object used for tracking
The condition on line 401 checks QueryError_GetCode(rp->parent->err) to determine if an error should be replied, but line 403 tracks QueryError_GetCode(qctx->err) instead. This inconsistency means the tracked error code may differ from the one that triggered the error reply, potentially causing incorrect statistics. Line 532 correctly uses rp->parent->err for both checking and tracking, showing the intended pattern.
* Track timeout in sendchunk resp2 * Track timeout warning in sendSearchResults * Track timeout error in searchResultReducer (cherry picked from commit fafe1dc) * Track timeout in sendChunk_hybrid (cherry picked from commit 8529cb9) * test timeout metrics (cherry picked from commit c56a0d9) * fix isCoord check * Add query warning code and add function and fields needed to track (cherry picked from commit a414641) * Track timeout in sendchunk resp3 (cherry picked from commit 6853bb9) * readd skip * Update syntax and args error to new SA as cluster * format and enrico comment * Change test to N=0 with Internal only (not working so revert afterwards) * Revert "Change test to N=0 with Internal only (not working so revert afterwards)" This reverts commit 829ac53. * meirav comments * Stablize tests * Add resp3 test * _disable_ hybrid sa timeout * Make test robust * fixup! Make test robust * remove limits (cherry picked from commit 6eee260)
557b6a8 to
3026209
Compare
| // Reply warning | ||
| MR_ReplyWithMRReply(reply, currentWarning); | ||
| } | ||
| RedisModule_Reply_ArrayEnd(reply); |
There was a problem hiding this comment.
Bug: Missing array wrapper for OOM warning reply
When req->queryOOM is true, the code replies with a simple string directly without wrapping it in an array. The warning field expects an array value (as seen in the rCtx->warning branch that starts an array at line 2696 and the else branch that replies with an empty array at line 2714). This causes a protocol mismatch where a string is sent instead of an array containing the warning string.
| RedisModule_Reply_SimpleString(reply, QUERY_WOOM_COORD); | ||
| } | ||
| if (rc == RS_RESULT_TIMEDOUT) { | ||
| // Track warnings in global statistics |
There was a problem hiding this comment.
Bug: OOM warnings not tracked in global statistics
When QueryError_HasQueryOOMWarning returns true or req->queryOOM is set, the code replies with QUERY_WOOM_COORD warning but doesn't track it in global statistics using QueryWarningsGlobalStats_UpdateWarning. This is inconsistent with timeout warnings which are tracked at line 614 in aggregate_exec.c and line 62 in hybrid_exec.c. The same tracking should be added for OOM warnings to maintain consistency and accurate statistics.
Additional Locations (2)
|
/backport |
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 8.2
git worktree add -d .worktree/backport-7507-to-8.2 origin/8.2
cd .worktree/backport-7507-to-8.2
git switch --create backport-7507-to-8.2
git cherry-pick -x ac921169c63c4bac7176ee3501ff36e1baed2a17 |
* [MOD-12419] Track Timeout errors and warnings in info ([#7414](#7414)) * Track timeout in sendchunk resp2 * Track timeout warning in sendSearchResults * Track timeout error in searchResultReducer (cherry picked from commit fafe1dc) * Track timeout in sendChunk_hybrid (cherry picked from commit 8529cb9) * test timeout metrics (cherry picked from commit c56a0d9) * fix isCoord check * Add query warning code and add function and fields needed to track (cherry picked from commit a414641) * Track timeout in sendchunk resp3 (cherry picked from commit 6853bb9) * readd skip * Update syntax and args error to new SA as cluster * format and enrico comment * Change test to N=0 with Internal only (not working so revert afterwards) * Revert "Change test to N=0 with Internal only (not working so revert afterwards)" This reverts commit 829ac53. * meirav comments * Stablize tests * Add resp3 test * _disable_ hybrid sa timeout * Make test robust * fixup! Make test robust * remove limits (cherry picked from commit 6eee260) * Align to 8.4 (cherry picked from commit ac92116)
* [MOD-12419] Track Timeout errors and warnings in info ([#7507](#7507)) * [MOD-12419] Track Timeout errors and warnings in info ([#7414](#7414)) * Track timeout in sendchunk resp2 * Track timeout warning in sendSearchResults * Track timeout error in searchResultReducer (cherry picked from commit fafe1dc) * Track timeout in sendChunk_hybrid (cherry picked from commit 8529cb9) * test timeout metrics (cherry picked from commit c56a0d9) * fix isCoord check * Add query warning code and add function and fields needed to track (cherry picked from commit a414641) * Track timeout in sendchunk resp3 (cherry picked from commit 6853bb9) * readd skip * Update syntax and args error to new SA as cluster * format and enrico comment * Change test to N=0 with Internal only (not working so revert afterwards) * Revert "Change test to N=0 with Internal only (not working so revert afterwards)" This reverts commit 829ac53. * meirav comments * Stablize tests * Add resp3 test * _disable_ hybrid sa timeout * Make test robust * fixup! Make test robust * remove limits (cherry picked from commit 6eee260) * Align to 8.4 (cherry picked from commit ac92116) * Fixes * Backport more logic
**Bug Fixes:** * [redis#7385](RediSearch/RediSearch#7385) Fix high temporary memory consumption when loading multiple search indexes from RDB * [redis#7430](RediSearch/RediSearch#7430) Fix a potential deadlock in `FT.HYBRID` in cluster mode during updates. * [redis#7454](RediSearch/RediSearch#7454) Fix a garbage collection performence regression * [redis#7460](RediSearch/RediSearch#7460) Fix potential double-free in Fork GC error paths * [redis#7455](RediSearch/RediSearch#7455) Fix internal cursors not being deleted promptly in cluster mode * [redis#7667](RediSearch/RediSearch#7667) Fix a cursor logical leak upon dropping the index * [redis#7796](RediSearch/RediSearch#7796) Fix a potential use-after-free when removing connections * [redis#7792](RediSearch/RediSearch#7792) Fix string comparison for binary data with embedded NULLs in TOLIST reducer in FT.AGGREGATE * [redis#7823](RediSearch/RediSearch#7823) Update `FT.HYBRID` to accept vector blobs only via parameters * [redis#7903](RediSearch/RediSearch#7903) Fix a memory leak in Hybrid ASM * [redis#8052](RediSearch/RediSearch#8052) Fix `FT.HYBRID` behavior when used with `LOAD *` * [redis#8082](RediSearch/RediSearch#8082) Fix incorrect FULLTEXT field metric counts * [redis#8089](RediSearch/RediSearch#8089) Fix an edge case in `CLUSTERSET` handling * [redis#8152](RediSearch/RediSearch#8152) Fix configuration registration issues **Improvements:** * [redis#7427](RediSearch/RediSearch#7427) Enhance `FT.PROFILE` with vector search execution details * [redis#7431](RediSearch/RediSearch#7431) Ensure full `FT.PROFILE` output is returned on timeout with RETURN policy * [redis#7507](RediSearch/RediSearch#7507) Track timeout warnings and errors in INFO * [redis#7576](RediSearch/RediSearch#7576) Track OOM warnings and errors in INFO * [redis#7612](RediSearch/RediSearch#7612) Track `maxprefixexpansions` warnings and errors in INFO * [redis#7960](RediSearch/RediSearch#7960) Persist query warnings across cursor reads * [redis#7551](RediSearch/RediSearch#7551), [redis#7616](RediSearch/RediSearch#7616), [redis#7622](RediSearch/RediSearch#7622), [redis#7625](RediSearch/RediSearch#7625) Add runtime thread and pending-jobs metrics * [redis#7589](RediSearch/RediSearch#7589) Support multiple slot ranges in `search.CLUSTERSET` * [redis#7707](RediSearch/RediSearch#7707) Add `WITHCOUNT` support to `FT.AGGREGATE` * [redis#7862](RediSearch/RediSearch#7862) Add support for subquery `COUNT` in `FT.HYBRID` * [redis#8087](RediSearch/RediSearch#8087) Add warnings when cursor results may be affected by ASM and expose ASM warnings in `FT.PROFILE` * [redis#8049](RediSearch/RediSearch#8049) Add logging for index-related commands * [redis#8150](RediSearch/RediSearch#8150) Fix shard total profile time reporting in `FT.PROFILE`
**Bug Fixes:** * [#7385](RediSearch/RediSearch#7385) Fix high temporary memory consumption when loading multiple search indexes from RDB * [#7430](RediSearch/RediSearch#7430) Fix a potential deadlock in `FT.HYBRID` in cluster mode during updates. * [#7454](RediSearch/RediSearch#7454) Fix a garbage collection performence regression * [#7460](RediSearch/RediSearch#7460) Fix potential double-free in Fork GC error paths * [#7455](RediSearch/RediSearch#7455) Fix internal cursors not being deleted promptly in cluster mode * [#7667](RediSearch/RediSearch#7667) Fix a cursor logical leak upon dropping the index * [#7796](RediSearch/RediSearch#7796) Fix a potential use-after-free when removing connections * [#7792](RediSearch/RediSearch#7792) Fix string comparison for binary data with embedded NULLs in TOLIST reducer in FT.AGGREGATE * [#7704](RediSearch/RediSearch#7704) Use asynchronous jobs for GC in SVS to accelerate execution * [#7823](RediSearch/RediSearch#7823) Update `FT.HYBRID` to accept vector blobs only via parameters * [#7903](RediSearch/RediSearch#7903) Fix a memory leak in Hybrid ASM * [#8052](RediSearch/RediSearch#8052) Fix `FT.HYBRID` behavior when used with `LOAD *` * [#8082](RediSearch/RediSearch#8082) Fix incorrect FULLTEXT field metric counts * [#8089](RediSearch/RediSearch#8089) Fix an edge case in `CLUSTERSET` handling * [#8152](RediSearch/RediSearch#8152) Fix configuration registration issues **Improvements:** * [#7427](RediSearch/RediSearch#7427) Enhance `FT.PROFILE` with vector search execution details * [#7431](RediSearch/RediSearch#7431) Ensure full `FT.PROFILE` output is returned on timeout with RETURN policy * [#7507](RediSearch/RediSearch#7507) Track timeout warnings and errors in INFO * [#7576](RediSearch/RediSearch#7576) Track OOM warnings and errors in INFO * [#7612](RediSearch/RediSearch#7612) Track `maxprefixexpansions` warnings and errors in INFO * [#7960](RediSearch/RediSearch#7960) Persist query warnings across cursor reads * [#7551](RediSearch/RediSearch#7551), [#7616](RediSearch/RediSearch#7616), [#7622](RediSearch/RediSearch#7622), [#7625](RediSearch/RediSearch#7625) Add runtime thread and pending-jobs metrics * [#7589](RediSearch/RediSearch#7589) Support multiple slot ranges in `search.CLUSTERSET` * [#7707](RediSearch/RediSearch#7707) Add `WITHCOUNT` support to `FT.AGGREGATE` * [#7862](RediSearch/RediSearch#7862) Add support for subquery `COUNT` in `FT.HYBRID` * [#8087](RediSearch/RediSearch#8087) Add warnings when cursor results may be affected by ASM and expose ASM warnings in `FT.PROFILE` * [#8049](RediSearch/RediSearch#8049) Add logging for index-related commands * [#8150](RediSearch/RediSearch#8150) Fix shard total profile time reporting in `FT.PROFILE`
backport of #7414 to 8.4
Note
Adds global tracking of query timeout errors and warnings (coord/shard), wires counting into SEARCH/AGGREGATE/HYBRID paths, and surfaces new metrics in INFO; includes warning code utilities and tests.
QueryErrorsGlobalStats_UpdateError(QUERY_ETIMEDOUT, ...)and timeout warnings via newQueryWarningsGlobalStats_UpdateWarning(...).query_error.{h,c}.aggregate_exec.c,hybrid_exec.c, and coordinator inmodule.cto update global stats on errors/timeouts and to count warnings emitted in replies (RESP2/RESP3).TotalGlobalStats_GetQueryStatsandglobal_stats.{h,c}with shard/coord timeout counters for errors and warnings.info_redis.c:shard_total_query_errors_timeout,coord_total_query_errors_timeout,shard_total_query_warnings_timeout,coord_total_query_warnings_timeout.tests/pytests/test_info_modules.pyto validate new metrics in standalone/cluster (RESP2/RESP3).tests/pytests/common.pyto change timeout policy across shards.Written by Cursor Bugbot for commit 3026209. This will update automatically on new commits. Configure here.