Prevent bad memory access when NOTOUCH client gets unblocked#2347
Conversation
Signed-off-by: Uri Yagelnik <[email protected]>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #2347 +/- ##
============================================
- Coverage 71.56% 71.55% -0.02%
============================================
Files 125 125
Lines 69213 69214 +1
============================================
- Hits 49534 49523 -11
- Misses 19679 19691 +12
🚀 New features to boost your workflow:
|
|
Backport to 8.1 since #1499 was released in 8.1, right? |
|
For the release notes, the title should mention when it would happen, e.g. no-touch + blocking commands. |
|
Oooops - seems like a fix for that was also inserted as part of #2089 - I recall we had this issue which was blocking us during fuzzer testing and we just fixed it in order to progress and probably missed. |
Signed-off-by: Ran Shidlansik <[email protected]>
Signed-off-by: Ran Shidlansik <[email protected]>
|
If we want to backport it, we need to re-apply the actual fix too, i.e. the first commit in this PR. |
It still has the fix as part of the commit. I think we can cherry-pick it as is |
Fix missing check for executing client in `lookupKey` function ### Issue The `lookupKey` function in db.c accesses `server.executing_client->cmd->proc` without first verifying that `server.executing_client` is not NULL. This was introduced in valkey-io#1499 where the check for executing client was added without verifying it could be null. The server crashes with a null pointer dereference when the current_client's flag.no_touch is set. ``` 27719 valkey-server * /lib64/libpthread.so.0(+0x118e0)[0x7f34cb96a8e0] src/valkey-server 127.0.0.1:21113(lookupKey+0xf5)[0x4a14b7] src/valkey-server 127.0.0.1:21113(lookupKeyReadWithFlags+0x50)[0x4a15fc] src/valkey-server 127.0.0.1:21113[0x52b8f1] src/valkey-server 127.0.0.1:21113(handleClientsBlockedOnKeys+0xa5)[0x52b16f] src/valkey-server 127.0.0.1:21113(processCommand+0xf1e)[0x4712c9] src/valkey-server 127.0.0.1:21113(processCommandAndResetClient+0x35)[0x490fd5] src/valkey-server 127.0.0.1:21113(processInputBuffer+0xe1)[0x4912e5] src/valkey-server 127.0.0.1:21113(readQueryFromClient+0x8c)[0x49177b] src/valkey-server 127.0.0.1:21113[0x57daa6] src/valkey-server 127.0.0.1:21113[0x57e280] src/valkey-server 127.0.0.1:21113(aeProcessEvents+0x261)[0x45b259] src/valkey-server 127.0.0.1:21113(aeMain+0x2a)[0x45b450] src/valkey-server 127.0.0.1:21113(main+0xd43)[0x479bf6] /lib64/libc.so.6(__libc_start_main+0xea)[0x7f34cb5cd13a] src/valkey-server 127.0.0.1:21113(_start+0x2a)[0x454e3a] ``` ### Fix Added a null check for `server.executing_client` before attempting to dereference it: ### Tests Added a regression test in tests/unit/type/list.tcl. --------- Signed-off-by: Uri Yagelnik <[email protected]> Signed-off-by: Ran Shidlansik <[email protected]> Co-authored-by: Ran Shidlansik <[email protected]>
Fix missing check for executing client in `lookupKey` function The `lookupKey` function in db.c accesses `server.executing_client->cmd->proc` without first verifying that `server.executing_client` is not NULL. This was introduced in valkey-io#1499 where the check for executing client was added without verifying it could be null. The server crashes with a null pointer dereference when the current_client's flag.no_touch is set. ``` 27719 valkey-server * /lib64/libpthread.so.0(+0x118e0)[0x7f34cb96a8e0] src/valkey-server 127.0.0.1:21113(lookupKey+0xf5)[0x4a14b7] src/valkey-server 127.0.0.1:21113(lookupKeyReadWithFlags+0x50)[0x4a15fc] src/valkey-server 127.0.0.1:21113[0x52b8f1] src/valkey-server 127.0.0.1:21113(handleClientsBlockedOnKeys+0xa5)[0x52b16f] src/valkey-server 127.0.0.1:21113(processCommand+0xf1e)[0x4712c9] src/valkey-server 127.0.0.1:21113(processCommandAndResetClient+0x35)[0x490fd5] src/valkey-server 127.0.0.1:21113(processInputBuffer+0xe1)[0x4912e5] src/valkey-server 127.0.0.1:21113(readQueryFromClient+0x8c)[0x49177b] src/valkey-server 127.0.0.1:21113[0x57daa6] src/valkey-server 127.0.0.1:21113[0x57e280] src/valkey-server 127.0.0.1:21113(aeProcessEvents+0x261)[0x45b259] src/valkey-server 127.0.0.1:21113(aeMain+0x2a)[0x45b450] src/valkey-server 127.0.0.1:21113(main+0xd43)[0x479bf6] /lib64/libc.so.6(__libc_start_main+0xea)[0x7f34cb5cd13a] src/valkey-server 127.0.0.1:21113(_start+0x2a)[0x454e3a] ``` Added a null check for `server.executing_client` before attempting to dereference it: Added a regression test in tests/unit/type/list.tcl. --------- Signed-off-by: Uri Yagelnik <[email protected]> Signed-off-by: Ran Shidlansik <[email protected]> Co-authored-by: Ran Shidlansik <[email protected]>
Fix missing check for executing client in `lookupKey` function The `lookupKey` function in db.c accesses `server.executing_client->cmd->proc` without first verifying that `server.executing_client` is not NULL. This was introduced in #1499 where the check for executing client was added without verifying it could be null. The server crashes with a null pointer dereference when the current_client's flag.no_touch is set. ``` 27719 valkey-server * /lib64/libpthread.so.0(+0x118e0)[0x7f34cb96a8e0] src/valkey-server 127.0.0.1:21113(lookupKey+0xf5)[0x4a14b7] src/valkey-server 127.0.0.1:21113(lookupKeyReadWithFlags+0x50)[0x4a15fc] src/valkey-server 127.0.0.1:21113[0x52b8f1] src/valkey-server 127.0.0.1:21113(handleClientsBlockedOnKeys+0xa5)[0x52b16f] src/valkey-server 127.0.0.1:21113(processCommand+0xf1e)[0x4712c9] src/valkey-server 127.0.0.1:21113(processCommandAndResetClient+0x35)[0x490fd5] src/valkey-server 127.0.0.1:21113(processInputBuffer+0xe1)[0x4912e5] src/valkey-server 127.0.0.1:21113(readQueryFromClient+0x8c)[0x49177b] src/valkey-server 127.0.0.1:21113[0x57daa6] src/valkey-server 127.0.0.1:21113[0x57e280] src/valkey-server 127.0.0.1:21113(aeProcessEvents+0x261)[0x45b259] src/valkey-server 127.0.0.1:21113(aeMain+0x2a)[0x45b450] src/valkey-server 127.0.0.1:21113(main+0xd43)[0x479bf6] /lib64/libc.so.6(__libc_start_main+0xea)[0x7f34cb5cd13a] src/valkey-server 127.0.0.1:21113(_start+0x2a)[0x454e3a] ``` Added a null check for `server.executing_client` before attempting to dereference it: Added a regression test in tests/unit/type/list.tcl. --------- Signed-off-by: Uri Yagelnik <[email protected]> Signed-off-by: Ran Shidlansik <[email protected]> Co-authored-by: Ran Shidlansik <[email protected]>
This PR is based on: valkey-io/valkey#2347 This was introduced in #13512 The server crashes with a null pointer dereference when lookupKey() is called from handleClientsBlockedOnKey(). The crash occurs because server.executing_client is NULL, but the code attempts to access server.executing_client->cmd->proc without checking. **Crash scenario:** Client 1 enables CLIENT NO-TOUCH Client 2 blocks on BRPOP mylist 0 Client 1 executes RPUSH mylist elem When unblocking Client 2, lookupKey() dereferences NULL server.executing_client → crash **Solution** Added proper null checks before dereferencing server.executing_client: Check if LOOKUP_NOTOUCH flag is already set before attempting to modify it Verify both server.current_client and server.executing_client are not NULL before accessing their members Maintain the TOUCH command exception for scripts **Testing** Added regression test in tests/unit/type/list.tcl that reproduces and verifies the fix for this crash scenario. This fix is based on valkey-io/valkey#2347 Co-authored-by: Uri Yagelnik <[email protected]> Co-authored-by: Ran Shidlansik <[email protected]>
## Add Fuzzing Capability to Valkey ### Overview This PR adds a fuzzing capability to Valkey, allowing developers and users to stress test their Valkey deployments with randomly generated commands. The fuzzer is integrated with the existing valkey-benchmark tool, making it easy to use without requiring additional dependencies. ### Key Features • **Command Generator**: Automatically generates Valkey commands by retrieving command information directly from the server • **Two Fuzzing Modes**: - normal: Generates only valid commands, doesn't modify server configurations - aggressive: Includes malformed commands and allows CONFIG SET operations • **Multi-threaded Testing**: Each client runs in a dedicated thread to maximize interaction between clients and enable testing of complicated scenarios • **Integration with valkey-benchmark**: Uses the existing CLI interface ### Implementation Details • Added new files: - `fuzzer_command_generator.h/c`: Dynamically generates valkey commands. - `fuzzer_client.c`: Orchestrate all the client threads, report test progress, and handle errors. • Modified existing files: - valkey-benchmark.c: Added fuzzing mode options and integration ### Command Generation Approach The fuzzer dynamically retrieves command information from the server, allowing it to adapt to different Valkey versions and custom modules. Since the command information generated from JSON files is sometimes limited, not all generated commands will be valid, but approximately 95% valid command generation is achieved. It is important to generate valid commands to cover as much code path as possible and not just the invalid command/args path. The fuzzer prioritizes generating syntactically and semantically correct commands to ensure thorough testing of the server's core functionality, while still including a small percentage of invalid commands in `aggressive` mode to test error handling paths #### Config modification For CONFIG SET command, the situation is more complex as the server currently provides limited information through CONFIG GET *. Some hardcoded logic is implemented that will need to be modified in the future. Ideally, the server should provide self-inspection commands to retrieve config keys-values with their properties (enum values, modifiability status, etc.). ### Issue Detection The fuzzer is designed to identify several types of issues: • Server crashes • Server memory corruptions / memory leaks(when compiled with ASAN) • Server unresponsiveness • Server malformed replies For unresponsiveness detection, command timeout limits are implemented to ensure no command blocks for excessive periods. If a server doesn't respond within 30 seconds, the fuzzer signals that something is wrong. ### Proven Effectiveness When running against the latest unstable version, the fuzzer has already identified several issues, demonstrating its effectiveness: * #2111 * #2112 * #2109 * #2113 * #2108 * #2137 * #2106 * #2347 * #2973 * #2974 ### How to Use Run the fuzzer using the valkey-benchmark tool with the --fuzz flag: ```bash # Basic usage (10000 commands 1000 commands per client, 10 clients) ./src/valkey-benchmark --fuzz -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With aggressive fuzzing mode ./src/valkey-benchmark --fuzz --fuzz-level aggressive -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With detailed logging ./src/valkey-benchmark --fuzz --fuzz-log-level debug -h 127.0.0.1 -p 6379 -n 10000 -c 10 ``` The fuzzer supports existing valkey-benchmark options, including TLS and cluster mode configuration. --------- Signed-off-by: Uri Yagelnik <[email protected]>
## Add Fuzzing Capability to Valkey ### Overview This PR adds a fuzzing capability to Valkey, allowing developers and users to stress test their Valkey deployments with randomly generated commands. The fuzzer is integrated with the existing valkey-benchmark tool, making it easy to use without requiring additional dependencies. ### Key Features • **Command Generator**: Automatically generates Valkey commands by retrieving command information directly from the server • **Two Fuzzing Modes**: - normal: Generates only valid commands, doesn't modify server configurations - aggressive: Includes malformed commands and allows CONFIG SET operations • **Multi-threaded Testing**: Each client runs in a dedicated thread to maximize interaction between clients and enable testing of complicated scenarios • **Integration with valkey-benchmark**: Uses the existing CLI interface ### Implementation Details • Added new files: - `fuzzer_command_generator.h/c`: Dynamically generates valkey commands. - `fuzzer_client.c`: Orchestrate all the client threads, report test progress, and handle errors. • Modified existing files: - valkey-benchmark.c: Added fuzzing mode options and integration ### Command Generation Approach The fuzzer dynamically retrieves command information from the server, allowing it to adapt to different Valkey versions and custom modules. Since the command information generated from JSON files is sometimes limited, not all generated commands will be valid, but approximately 95% valid command generation is achieved. It is important to generate valid commands to cover as much code path as possible and not just the invalid command/args path. The fuzzer prioritizes generating syntactically and semantically correct commands to ensure thorough testing of the server's core functionality, while still including a small percentage of invalid commands in `aggressive` mode to test error handling paths #### Config modification For CONFIG SET command, the situation is more complex as the server currently provides limited information through CONFIG GET *. Some hardcoded logic is implemented that will need to be modified in the future. Ideally, the server should provide self-inspection commands to retrieve config keys-values with their properties (enum values, modifiability status, etc.). ### Issue Detection The fuzzer is designed to identify several types of issues: • Server crashes • Server memory corruptions / memory leaks(when compiled with ASAN) • Server unresponsiveness • Server malformed replies For unresponsiveness detection, command timeout limits are implemented to ensure no command blocks for excessive periods. If a server doesn't respond within 30 seconds, the fuzzer signals that something is wrong. ### Proven Effectiveness When running against the latest unstable version, the fuzzer has already identified several issues, demonstrating its effectiveness: * valkey-io#2111 * valkey-io#2112 * valkey-io#2109 * valkey-io#2113 * valkey-io#2108 * valkey-io#2137 * valkey-io#2106 * valkey-io#2347 * valkey-io#2973 * valkey-io#2974 ### How to Use Run the fuzzer using the valkey-benchmark tool with the --fuzz flag: ```bash # Basic usage (10000 commands 1000 commands per client, 10 clients) ./src/valkey-benchmark --fuzz -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With aggressive fuzzing mode ./src/valkey-benchmark --fuzz --fuzz-level aggressive -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With detailed logging ./src/valkey-benchmark --fuzz --fuzz-log-level debug -h 127.0.0.1 -p 6379 -n 10000 -c 10 ``` The fuzzer supports existing valkey-benchmark options, including TLS and cluster mode configuration. --------- Signed-off-by: Uri Yagelnik <[email protected]>
Fix missing check for executing client in
lookupKeyfunctionIssue
The
lookupKeyfunction in db.c accessesserver.executing_client->cmd->procwithout first verifying thatserver.executing_clientis not NULL. This was introduced in #1499 where the check for executing client was added without verifying it could be null.The server crashes with a null pointer dereference when the current_client's flag.no_touch is set.
Fix
Added a null check for
server.executing_clientbefore attempting to dereference it:Tests
Added a regression test in tests/unit/type/list.tcl.