Prevent bad memory access when NOTOUCH client gets unblocked by uriyage · Pull Request #2347 · valkey-io/valkey

uriyage · 2025-07-13T14:52:28Z

Fix missing check for executing client in lookupKey function

Issue

The lookupKey function in db.c accesses server.executing_client->cmd->proc without first verifying that server.executing_client is not NULL. This was introduced in #1499 where the check for executing client was added without verifying it could be null.

The server crashes with a null pointer dereference when the current_client's flag.no_touch is set.

27719 valkey-server *
/lib64/libpthread.so.0(+0x118e0)[0x7f34cb96a8e0]
src/valkey-server 127.0.0.1:21113(lookupKey+0xf5)[0x4a14b7]
src/valkey-server 127.0.0.1:21113(lookupKeyReadWithFlags+0x50)[0x4a15fc]
src/valkey-server 127.0.0.1:21113[0x52b8f1]
src/valkey-server 127.0.0.1:21113(handleClientsBlockedOnKeys+0xa5)[0x52b16f]
src/valkey-server 127.0.0.1:21113(processCommand+0xf1e)[0x4712c9]
src/valkey-server 127.0.0.1:21113(processCommandAndResetClient+0x35)[0x490fd5]
src/valkey-server 127.0.0.1:21113(processInputBuffer+0xe1)[0x4912e5]
src/valkey-server 127.0.0.1:21113(readQueryFromClient+0x8c)[0x49177b]
src/valkey-server 127.0.0.1:21113[0x57daa6]
src/valkey-server 127.0.0.1:21113[0x57e280]
src/valkey-server 127.0.0.1:21113(aeProcessEvents+0x261)[0x45b259]
src/valkey-server 127.0.0.1:21113(aeMain+0x2a)[0x45b450]
src/valkey-server 127.0.0.1:21113(main+0xd43)[0x479bf6]
/lib64/libc.so.6(__libc_start_main+0xea)[0x7f34cb5cd13a]
src/valkey-server 127.0.0.1:21113(_start+0x2a)[0x454e3a]

Fix

Added a null check for server.executing_client before attempting to dereference it:

Tests

Added a regression test in tests/unit/type/list.tcl.

Signed-off-by: Uri Yagelnik <[email protected]>

codecov · 2025-07-13T15:07:40Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.55%. Comparing base (ba7334c) to head (803c18f).
⚠️ Report is 1 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #2347      +/-   ##
============================================
- Coverage     71.56%   71.55%   -0.02%     
============================================
  Files           125      125              
  Lines         69213    69214       +1     
============================================
- Hits          49534    49523      -11     
- Misses        19679    19691      +12

Files with missing lines	Coverage Δ
src/db.c	`90.47% <100.00%> (+<0.01%)`	⬆️

... and 14 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

PingXie

Thanks @uriyage

src/db.c

zuiderkwast · 2025-07-14T22:18:56Z

Backport to 8.1 since #1499 was released in 8.1, right?

zuiderkwast · 2025-07-14T22:21:16Z

For the release notes, the title should mention when it would happen, e.g. no-touch + blocking commands.

ranshid · 2025-08-06T13:11:07Z

Oooops - seems like a fix for that was also inserted as part of #2089 - I recall we had this issue which was blocking us during fuzzer testing and we just fixed it in order to progress and probably missed.
I still think this PR adds more tests and should be used. @uriyage do you want to drive it to the last mile?

Signed-off-by: Ran Shidlansik <[email protected]>

src/db.c

Signed-off-by: Ran Shidlansik <[email protected]>

zuiderkwast · 2025-08-06T14:11:09Z

If we want to backport it, we need to re-apply the actual fix too, i.e. the first commit in this PR.

ranshid · 2025-08-06T14:14:58Z

If we want to backport it, we need to re-apply the actual fix too, i.e. the first commit in this PR.

It still has the fix as part of the commit. I think we can cherry-pick it as is

Fix missing check for executing client in `lookupKey` function ### Issue The `lookupKey` function in db.c accesses `server.executing_client->cmd->proc` without first verifying that `server.executing_client` is not NULL. This was introduced in valkey-io#1499 where the check for executing client was added without verifying it could be null. The server crashes with a null pointer dereference when the current_client's flag.no_touch is set. ``` 27719 valkey-server * /lib64/libpthread.so.0(+0x118e0)[0x7f34cb96a8e0] src/valkey-server 127.0.0.1:21113(lookupKey+0xf5)[0x4a14b7] src/valkey-server 127.0.0.1:21113(lookupKeyReadWithFlags+0x50)[0x4a15fc] src/valkey-server 127.0.0.1:21113[0x52b8f1] src/valkey-server 127.0.0.1:21113(handleClientsBlockedOnKeys+0xa5)[0x52b16f] src/valkey-server 127.0.0.1:21113(processCommand+0xf1e)[0x4712c9] src/valkey-server 127.0.0.1:21113(processCommandAndResetClient+0x35)[0x490fd5] src/valkey-server 127.0.0.1:21113(processInputBuffer+0xe1)[0x4912e5] src/valkey-server 127.0.0.1:21113(readQueryFromClient+0x8c)[0x49177b] src/valkey-server 127.0.0.1:21113[0x57daa6] src/valkey-server 127.0.0.1:21113[0x57e280] src/valkey-server 127.0.0.1:21113(aeProcessEvents+0x261)[0x45b259] src/valkey-server 127.0.0.1:21113(aeMain+0x2a)[0x45b450] src/valkey-server 127.0.0.1:21113(main+0xd43)[0x479bf6] /lib64/libc.so.6(__libc_start_main+0xea)[0x7f34cb5cd13a] src/valkey-server 127.0.0.1:21113(_start+0x2a)[0x454e3a] ``` ### Fix Added a null check for `server.executing_client` before attempting to dereference it: ### Tests Added a regression test in tests/unit/type/list.tcl. --------- Signed-off-by: Uri Yagelnik <[email protected]> Signed-off-by: Ran Shidlansik <[email protected]> Co-authored-by: Ran Shidlansik <[email protected]>

Fix missing check for executing client in `lookupKey` function The `lookupKey` function in db.c accesses `server.executing_client->cmd->proc` without first verifying that `server.executing_client` is not NULL. This was introduced in valkey-io#1499 where the check for executing client was added without verifying it could be null. The server crashes with a null pointer dereference when the current_client's flag.no_touch is set. ``` 27719 valkey-server * /lib64/libpthread.so.0(+0x118e0)[0x7f34cb96a8e0] src/valkey-server 127.0.0.1:21113(lookupKey+0xf5)[0x4a14b7] src/valkey-server 127.0.0.1:21113(lookupKeyReadWithFlags+0x50)[0x4a15fc] src/valkey-server 127.0.0.1:21113[0x52b8f1] src/valkey-server 127.0.0.1:21113(handleClientsBlockedOnKeys+0xa5)[0x52b16f] src/valkey-server 127.0.0.1:21113(processCommand+0xf1e)[0x4712c9] src/valkey-server 127.0.0.1:21113(processCommandAndResetClient+0x35)[0x490fd5] src/valkey-server 127.0.0.1:21113(processInputBuffer+0xe1)[0x4912e5] src/valkey-server 127.0.0.1:21113(readQueryFromClient+0x8c)[0x49177b] src/valkey-server 127.0.0.1:21113[0x57daa6] src/valkey-server 127.0.0.1:21113[0x57e280] src/valkey-server 127.0.0.1:21113(aeProcessEvents+0x261)[0x45b259] src/valkey-server 127.0.0.1:21113(aeMain+0x2a)[0x45b450] src/valkey-server 127.0.0.1:21113(main+0xd43)[0x479bf6] /lib64/libc.so.6(__libc_start_main+0xea)[0x7f34cb5cd13a] src/valkey-server 127.0.0.1:21113(_start+0x2a)[0x454e3a] ``` Added a null check for `server.executing_client` before attempting to dereference it: Added a regression test in tests/unit/type/list.tcl. --------- Signed-off-by: Uri Yagelnik <[email protected]> Signed-off-by: Ran Shidlansik <[email protected]> Co-authored-by: Ran Shidlansik <[email protected]>

Fix missing check for executing client in `lookupKey` function The `lookupKey` function in db.c accesses `server.executing_client->cmd->proc` without first verifying that `server.executing_client` is not NULL. This was introduced in #1499 where the check for executing client was added without verifying it could be null. The server crashes with a null pointer dereference when the current_client's flag.no_touch is set. ``` 27719 valkey-server * /lib64/libpthread.so.0(+0x118e0)[0x7f34cb96a8e0] src/valkey-server 127.0.0.1:21113(lookupKey+0xf5)[0x4a14b7] src/valkey-server 127.0.0.1:21113(lookupKeyReadWithFlags+0x50)[0x4a15fc] src/valkey-server 127.0.0.1:21113[0x52b8f1] src/valkey-server 127.0.0.1:21113(handleClientsBlockedOnKeys+0xa5)[0x52b16f] src/valkey-server 127.0.0.1:21113(processCommand+0xf1e)[0x4712c9] src/valkey-server 127.0.0.1:21113(processCommandAndResetClient+0x35)[0x490fd5] src/valkey-server 127.0.0.1:21113(processInputBuffer+0xe1)[0x4912e5] src/valkey-server 127.0.0.1:21113(readQueryFromClient+0x8c)[0x49177b] src/valkey-server 127.0.0.1:21113[0x57daa6] src/valkey-server 127.0.0.1:21113[0x57e280] src/valkey-server 127.0.0.1:21113(aeProcessEvents+0x261)[0x45b259] src/valkey-server 127.0.0.1:21113(aeMain+0x2a)[0x45b450] src/valkey-server 127.0.0.1:21113(main+0xd43)[0x479bf6] /lib64/libc.so.6(__libc_start_main+0xea)[0x7f34cb5cd13a] src/valkey-server 127.0.0.1:21113(_start+0x2a)[0x454e3a] ``` Added a null check for `server.executing_client` before attempting to dereference it: Added a regression test in tests/unit/type/list.tcl. --------- Signed-off-by: Uri Yagelnik <[email protected]> Signed-off-by: Ran Shidlansik <[email protected]> Co-authored-by: Ran Shidlansik <[email protected]>

This PR is based on: valkey-io/valkey#2347 This was introduced in #13512 The server crashes with a null pointer dereference when lookupKey() is called from handleClientsBlockedOnKey(). The crash occurs because server.executing_client is NULL, but the code attempts to access server.executing_client->cmd->proc without checking. **Crash scenario:** Client 1 enables CLIENT NO-TOUCH Client 2 blocks on BRPOP mylist 0 Client 1 executes RPUSH mylist elem When unblocking Client 2, lookupKey() dereferences NULL server.executing_client → crash **Solution** Added proper null checks before dereferencing server.executing_client: Check if LOOKUP_NOTOUCH flag is already set before attempting to modify it Verify both server.current_client and server.executing_client are not NULL before accessing their members Maintain the TOUCH command exception for scripts **Testing** Added regression test in tests/unit/type/list.tcl that reproduces and verifies the fix for this crash scenario. This fix is based on valkey-io/valkey#2347 Co-authored-by: Uri Yagelnik <[email protected]> Co-authored-by: Ran Shidlansik <[email protected]>

## Add Fuzzing Capability to Valkey ### Overview This PR adds a fuzzing capability to Valkey, allowing developers and users to stress test their Valkey deployments with randomly generated commands. The fuzzer is integrated with the existing valkey-benchmark tool, making it easy to use without requiring additional dependencies. ### Key Features • **Command Generator**: Automatically generates Valkey commands by retrieving command information directly from the server • **Two Fuzzing Modes**: - normal: Generates only valid commands, doesn't modify server configurations - aggressive: Includes malformed commands and allows CONFIG SET operations • **Multi-threaded Testing**: Each client runs in a dedicated thread to maximize interaction between clients and enable testing of complicated scenarios • **Integration with valkey-benchmark**: Uses the existing CLI interface ### Implementation Details • Added new files: - `fuzzer_command_generator.h/c`: Dynamically generates valkey commands. - `fuzzer_client.c`: Orchestrate all the client threads, report test progress, and handle errors. • Modified existing files: - valkey-benchmark.c: Added fuzzing mode options and integration ### Command Generation Approach The fuzzer dynamically retrieves command information from the server, allowing it to adapt to different Valkey versions and custom modules. Since the command information generated from JSON files is sometimes limited, not all generated commands will be valid, but approximately 95% valid command generation is achieved. It is important to generate valid commands to cover as much code path as possible and not just the invalid command/args path. The fuzzer prioritizes generating syntactically and semantically correct commands to ensure thorough testing of the server's core functionality, while still including a small percentage of invalid commands in `aggressive` mode to test error handling paths #### Config modification For CONFIG SET command, the situation is more complex as the server currently provides limited information through CONFIG GET *. Some hardcoded logic is implemented that will need to be modified in the future. Ideally, the server should provide self-inspection commands to retrieve config keys-values with their properties (enum values, modifiability status, etc.). ### Issue Detection The fuzzer is designed to identify several types of issues: • Server crashes • Server memory corruptions / memory leaks(when compiled with ASAN) • Server unresponsiveness • Server malformed replies For unresponsiveness detection, command timeout limits are implemented to ensure no command blocks for excessive periods. If a server doesn't respond within 30 seconds, the fuzzer signals that something is wrong. ### Proven Effectiveness When running against the latest unstable version, the fuzzer has already identified several issues, demonstrating its effectiveness: * #2111 * #2112 * #2109 * #2113 * #2108 * #2137 * #2106 * #2347 * #2973 * #2974 ### How to Use Run the fuzzer using the valkey-benchmark tool with the --fuzz flag: ```bash # Basic usage (10000 commands 1000 commands per client, 10 clients) ./src/valkey-benchmark --fuzz -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With aggressive fuzzing mode ./src/valkey-benchmark --fuzz --fuzz-level aggressive -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With detailed logging ./src/valkey-benchmark --fuzz --fuzz-log-level debug -h 127.0.0.1 -p 6379 -n 10000 -c 10 ``` The fuzzer supports existing valkey-benchmark options, including TLS and cluster mode configuration. --------- Signed-off-by: Uri Yagelnik <[email protected]>

## Add Fuzzing Capability to Valkey ### Overview This PR adds a fuzzing capability to Valkey, allowing developers and users to stress test their Valkey deployments with randomly generated commands. The fuzzer is integrated with the existing valkey-benchmark tool, making it easy to use without requiring additional dependencies. ### Key Features • **Command Generator**: Automatically generates Valkey commands by retrieving command information directly from the server • **Two Fuzzing Modes**: - normal: Generates only valid commands, doesn't modify server configurations - aggressive: Includes malformed commands and allows CONFIG SET operations • **Multi-threaded Testing**: Each client runs in a dedicated thread to maximize interaction between clients and enable testing of complicated scenarios • **Integration with valkey-benchmark**: Uses the existing CLI interface ### Implementation Details • Added new files: - `fuzzer_command_generator.h/c`: Dynamically generates valkey commands. - `fuzzer_client.c`: Orchestrate all the client threads, report test progress, and handle errors. • Modified existing files: - valkey-benchmark.c: Added fuzzing mode options and integration ### Command Generation Approach The fuzzer dynamically retrieves command information from the server, allowing it to adapt to different Valkey versions and custom modules. Since the command information generated from JSON files is sometimes limited, not all generated commands will be valid, but approximately 95% valid command generation is achieved. It is important to generate valid commands to cover as much code path as possible and not just the invalid command/args path. The fuzzer prioritizes generating syntactically and semantically correct commands to ensure thorough testing of the server's core functionality, while still including a small percentage of invalid commands in `aggressive` mode to test error handling paths #### Config modification For CONFIG SET command, the situation is more complex as the server currently provides limited information through CONFIG GET *. Some hardcoded logic is implemented that will need to be modified in the future. Ideally, the server should provide self-inspection commands to retrieve config keys-values with their properties (enum values, modifiability status, etc.). ### Issue Detection The fuzzer is designed to identify several types of issues: • Server crashes • Server memory corruptions / memory leaks(when compiled with ASAN) • Server unresponsiveness • Server malformed replies For unresponsiveness detection, command timeout limits are implemented to ensure no command blocks for excessive periods. If a server doesn't respond within 30 seconds, the fuzzer signals that something is wrong. ### Proven Effectiveness When running against the latest unstable version, the fuzzer has already identified several issues, demonstrating its effectiveness: * valkey-io#2111 * valkey-io#2112 * valkey-io#2109 * valkey-io#2113 * valkey-io#2108 * valkey-io#2137 * valkey-io#2106 * valkey-io#2347 * valkey-io#2973 * valkey-io#2974 ### How to Use Run the fuzzer using the valkey-benchmark tool with the --fuzz flag: ```bash # Basic usage (10000 commands 1000 commands per client, 10 clients) ./src/valkey-benchmark --fuzz -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With aggressive fuzzing mode ./src/valkey-benchmark --fuzz --fuzz-level aggressive -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With detailed logging ./src/valkey-benchmark --fuzz --fuzz-log-level debug -h 127.0.0.1 -p 6379 -n 10000 -c 10 ``` The fuzzer supports existing valkey-benchmark options, including TLS and cluster mode configuration. --------- Signed-off-by: Uri Yagelnik <[email protected]>

Fix missing check for executing client

99aebb3

Signed-off-by: Uri Yagelnik <[email protected]>

uriyage mentioned this pull request Jul 13, 2025

Valkey Fuzzer #2340

Merged

uriyage changed the title ~~Fix missing check for executing client~~ [CRASH] Fix missing check for executing client Jul 13, 2025

PingXie approved these changes Jul 13, 2025

View reviewed changes

src/db.c Outdated Show resolved Hide resolved

gmbnomis reviewed Jul 13, 2025

View reviewed changes

src/db.c Show resolved Hide resolved

gmbnomis approved these changes Jul 13, 2025

View reviewed changes

ranshid approved these changes Jul 14, 2025

View reviewed changes

enjoy-binbin approved these changes Jul 14, 2025

View reviewed changes

zuiderkwast added this to Valkey 8.1 Jul 14, 2025

zuiderkwast added the release-notes This issue should get a line item in the release notes label Jul 14, 2025

ranshid added this to Valkey 9.0 Aug 6, 2025

Merge branch 'unstable' into fix-missing-check-for-executing-client

de05091

Signed-off-by: Ran Shidlansik <[email protected]>

ranshid reviewed Aug 6, 2025

View reviewed changes

src/db.c Outdated Show resolved Hide resolved

Update src/db.c

803c18f

Signed-off-by: Ran Shidlansik <[email protected]>

ranshid merged commit fd8270a into valkey-io:unstable Aug 6, 2025
61 of 62 checks passed

github-project-automation bot moved this to Done in Valkey 9.0 Aug 6, 2025

github-project-automation bot moved this to To be backported in Valkey 8.1 Aug 6, 2025

ranshid changed the title ~~[CRASH] Fix missing check for executing client~~ Prevent bad memory access when NOTOUCH client gets unblocked Aug 6, 2025

ranshid moved this from To be backported to In Progress in Valkey 8.1 Sep 30, 2025

ranshid moved this from In Progress to 8.1.4 in Valkey 8.1 Sep 30, 2025

ranshid moved this from 8.1.4 to To be backported in Valkey 8.1 Sep 30, 2025

zuiderkwast moved this from To be backported to 8.1.4 in Valkey 8.1 Oct 1, 2025

moticless mentioned this pull request Oct 8, 2025

Fix crash in lookupKey() when executing_client is NULL redis/redis#14415

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent bad memory access when NOTOUCH client gets unblocked#2347

Prevent bad memory access when NOTOUCH client gets unblocked#2347
ranshid merged 3 commits intovalkey-io:unstablefrom
uriyage:fix-missing-check-for-executing-client

uriyage commented Jul 13, 2025

Uh oh!

codecov bot commented Jul 13, 2025 •

edited

Loading

Uh oh!

PingXie left a comment

Uh oh!

Uh oh!

Uh oh!

zuiderkwast commented Jul 14, 2025

Uh oh!

zuiderkwast commented Jul 14, 2025

Uh oh!

ranshid commented Aug 6, 2025

Uh oh!

Uh oh!

Uh oh!

zuiderkwast commented Aug 6, 2025

Uh oh!

ranshid commented Aug 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

uriyage commented Jul 13, 2025

Issue

Fix

Tests

Uh oh!

codecov bot commented Jul 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

PingXie left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

zuiderkwast commented Jul 14, 2025

Uh oh!

zuiderkwast commented Jul 14, 2025

Uh oh!

ranshid commented Aug 6, 2025

Uh oh!

Uh oh!

Uh oh!

zuiderkwast commented Aug 6, 2025

Uh oh!

ranshid commented Aug 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

codecov bot commented Jul 13, 2025 •

edited

Loading