Fix a case of out of bound read when cluster node ID is provided with wrong length in CLUSTER FORGET#2108
Conversation
… wrong length In case we issue a cluster forget with a node ID which has a smaller length than 40 bytes, we can read over the allocated string when checking the blacklist. This will mainly impact memory inspection codes and should not introduce any real bug. Example: ``` ==30858==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000024e3a at pc 0x7f9b9fe7821a bp 0x7ffeb5712980 sp 0x7ffeb5712128 READ of size 40 at 0x603000024e3a thread T0 ``` Signed-off-by: Ran Shidlansik <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #2108 +/- ##
============================================
+ Coverage 71.24% 71.26% +0.01%
============================================
Files 122 122
Lines 66030 66049 +19
============================================
+ Hits 47045 47071 +26
+ Misses 18985 18978 -7
🚀 New features to boost your workflow:
|
1. add sanity check in blacklist existance check 2. modify the test. it can still be caught by sanitizer test. Signed-off-by: Ran Shidlansik <[email protected]>
cb2e515 to
1fe26df
Compare
src/cluster_legacy.c
Outdated
| clusterBlacklistCleanup(); | ||
|
|
||
| /* Sanity check. In case the provided node ID length is wrong we can bail early. */ | ||
| if (len != CLUSTER_NAMELEN) |
There was a problem hiding this comment.
this check is a bit odd. rather, I think nodeid's length being 40 bytes is part of the clusterBlacklistExists contract and it is the caller who is responsible for validating the nodeid's length.
There was a problem hiding this comment.
So I agree there is some "undocumented" contract here, which is why this is simply a sanity check which is meant to avoid allocating new string and searching the dictionary for no good reason. TBH we can drop this check and just search the dictionary (were we will not find it).
The reason for placing the check logic here was to make the code more defensive.
allow backlist search to handle any node-id size Signed-off-by: Ran Shidlansik <[email protected]>
| catch {r cluster forget 1} err | ||
| set _ $err | ||
| } {*ERR Unknown node*} |
There was a problem hiding this comment.
For future reference, you can do this in a one liner:
assert_error {*ERR Unknown node*} {r cluster forget 1}
The current pattern is useful primarily when the last command is already returning something.
… wrong length in CLUSTER FORGET (valkey-io#2108) In case we issue a cluster forget with a node ID which has a smaller length than 40 bytes, we can read over the allocated string when checking the blacklist. This will mainly impact memory inspection codes and should not introduce any real bug. Example: ``` ==30858==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000024e3a at pc 0x7f9b9fe7821a bp 0x7ffeb5712980 sp 0x7ffeb5712128 READ of size 40 at 0x603000024e3a thread T0 ``` --------- Signed-off-by: Ran Shidlansik <[email protected]> Signed-off-by: chzhoo <[email protected]>
… wrong length in CLUSTER FORGET (valkey-io#2108) In case we issue a cluster forget with a node ID which has a smaller length than 40 bytes, we can read over the allocated string when checking the blacklist. This will mainly impact memory inspection codes and should not introduce any real bug. Example: ``` ==30858==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000024e3a at pc 0x7f9b9fe7821a bp 0x7ffeb5712980 sp 0x7ffeb5712128 READ of size 40 at 0x603000024e3a thread T0 ``` --------- Signed-off-by: Ran Shidlansik <[email protected]> Signed-off-by: shanwan1 <[email protected]>
#14417) Fix a heap-buffer-overflow vulnerability in the `CLUSTER FORGET` command when provided with a node ID shorter than the expected 40 bytes. When `CLUSTER FORGET` is called with a node ID that has a length smaller than `CLUSTER_NAMELEN` (40 bytes), the `clusterBlacklistExists()` function would read beyond the allocated string buffer. This occurs because the function always attempted to read exactly 40 bytes via `sdsnewlen(nodeid, CLUSTER_NAMELEN)`. Changes: - Added a `size_t len` parameter to `clusterBlacklistExists()` to use the actual string length - Updated all call sites to pass the appropriate length - Added a test case to verify the fix - Test added to verify that `CLUSTER FORGET` with an invalid short node ID returns an appropriate error. This PR is based on: valkey-io/valkey#2108 Co-authored-by: Ran Shidlansik <[email protected]>
## Add Fuzzing Capability to Valkey ### Overview This PR adds a fuzzing capability to Valkey, allowing developers and users to stress test their Valkey deployments with randomly generated commands. The fuzzer is integrated with the existing valkey-benchmark tool, making it easy to use without requiring additional dependencies. ### Key Features • **Command Generator**: Automatically generates Valkey commands by retrieving command information directly from the server • **Two Fuzzing Modes**: - normal: Generates only valid commands, doesn't modify server configurations - aggressive: Includes malformed commands and allows CONFIG SET operations • **Multi-threaded Testing**: Each client runs in a dedicated thread to maximize interaction between clients and enable testing of complicated scenarios • **Integration with valkey-benchmark**: Uses the existing CLI interface ### Implementation Details • Added new files: - `fuzzer_command_generator.h/c`: Dynamically generates valkey commands. - `fuzzer_client.c`: Orchestrate all the client threads, report test progress, and handle errors. • Modified existing files: - valkey-benchmark.c: Added fuzzing mode options and integration ### Command Generation Approach The fuzzer dynamically retrieves command information from the server, allowing it to adapt to different Valkey versions and custom modules. Since the command information generated from JSON files is sometimes limited, not all generated commands will be valid, but approximately 95% valid command generation is achieved. It is important to generate valid commands to cover as much code path as possible and not just the invalid command/args path. The fuzzer prioritizes generating syntactically and semantically correct commands to ensure thorough testing of the server's core functionality, while still including a small percentage of invalid commands in `aggressive` mode to test error handling paths #### Config modification For CONFIG SET command, the situation is more complex as the server currently provides limited information through CONFIG GET *. Some hardcoded logic is implemented that will need to be modified in the future. Ideally, the server should provide self-inspection commands to retrieve config keys-values with their properties (enum values, modifiability status, etc.). ### Issue Detection The fuzzer is designed to identify several types of issues: • Server crashes • Server memory corruptions / memory leaks(when compiled with ASAN) • Server unresponsiveness • Server malformed replies For unresponsiveness detection, command timeout limits are implemented to ensure no command blocks for excessive periods. If a server doesn't respond within 30 seconds, the fuzzer signals that something is wrong. ### Proven Effectiveness When running against the latest unstable version, the fuzzer has already identified several issues, demonstrating its effectiveness: * #2111 * #2112 * #2109 * #2113 * #2108 * #2137 * #2106 * #2347 * #2973 * #2974 ### How to Use Run the fuzzer using the valkey-benchmark tool with the --fuzz flag: ```bash # Basic usage (10000 commands 1000 commands per client, 10 clients) ./src/valkey-benchmark --fuzz -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With aggressive fuzzing mode ./src/valkey-benchmark --fuzz --fuzz-level aggressive -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With detailed logging ./src/valkey-benchmark --fuzz --fuzz-log-level debug -h 127.0.0.1 -p 6379 -n 10000 -c 10 ``` The fuzzer supports existing valkey-benchmark options, including TLS and cluster mode configuration. --------- Signed-off-by: Uri Yagelnik <[email protected]>
## Add Fuzzing Capability to Valkey ### Overview This PR adds a fuzzing capability to Valkey, allowing developers and users to stress test their Valkey deployments with randomly generated commands. The fuzzer is integrated with the existing valkey-benchmark tool, making it easy to use without requiring additional dependencies. ### Key Features • **Command Generator**: Automatically generates Valkey commands by retrieving command information directly from the server • **Two Fuzzing Modes**: - normal: Generates only valid commands, doesn't modify server configurations - aggressive: Includes malformed commands and allows CONFIG SET operations • **Multi-threaded Testing**: Each client runs in a dedicated thread to maximize interaction between clients and enable testing of complicated scenarios • **Integration with valkey-benchmark**: Uses the existing CLI interface ### Implementation Details • Added new files: - `fuzzer_command_generator.h/c`: Dynamically generates valkey commands. - `fuzzer_client.c`: Orchestrate all the client threads, report test progress, and handle errors. • Modified existing files: - valkey-benchmark.c: Added fuzzing mode options and integration ### Command Generation Approach The fuzzer dynamically retrieves command information from the server, allowing it to adapt to different Valkey versions and custom modules. Since the command information generated from JSON files is sometimes limited, not all generated commands will be valid, but approximately 95% valid command generation is achieved. It is important to generate valid commands to cover as much code path as possible and not just the invalid command/args path. The fuzzer prioritizes generating syntactically and semantically correct commands to ensure thorough testing of the server's core functionality, while still including a small percentage of invalid commands in `aggressive` mode to test error handling paths #### Config modification For CONFIG SET command, the situation is more complex as the server currently provides limited information through CONFIG GET *. Some hardcoded logic is implemented that will need to be modified in the future. Ideally, the server should provide self-inspection commands to retrieve config keys-values with their properties (enum values, modifiability status, etc.). ### Issue Detection The fuzzer is designed to identify several types of issues: • Server crashes • Server memory corruptions / memory leaks(when compiled with ASAN) • Server unresponsiveness • Server malformed replies For unresponsiveness detection, command timeout limits are implemented to ensure no command blocks for excessive periods. If a server doesn't respond within 30 seconds, the fuzzer signals that something is wrong. ### Proven Effectiveness When running against the latest unstable version, the fuzzer has already identified several issues, demonstrating its effectiveness: * valkey-io#2111 * valkey-io#2112 * valkey-io#2109 * valkey-io#2113 * valkey-io#2108 * valkey-io#2137 * valkey-io#2106 * valkey-io#2347 * valkey-io#2973 * valkey-io#2974 ### How to Use Run the fuzzer using the valkey-benchmark tool with the --fuzz flag: ```bash # Basic usage (10000 commands 1000 commands per client, 10 clients) ./src/valkey-benchmark --fuzz -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With aggressive fuzzing mode ./src/valkey-benchmark --fuzz --fuzz-level aggressive -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With detailed logging ./src/valkey-benchmark --fuzz --fuzz-log-level debug -h 127.0.0.1 -p 6379 -n 10000 -c 10 ``` The fuzzer supports existing valkey-benchmark options, including TLS and cluster mode configuration. --------- Signed-off-by: Uri Yagelnik <[email protected]>
In case we issue a cluster forget with a node ID which has a smaller length than 40 bytes, we can read over the allocated string when checking the blacklist.
This will mainly impact memory inspection codes and should not introduce any real bug.
Example: