Fix a case of out of bound read when cluster node ID is provided with wrong length in CLUSTER FORGET by ranshid · Pull Request #2108 · valkey-io/valkey

ranshid · 2025-05-20T18:26:55Z

In case we issue a cluster forget with a node ID which has a smaller length than 40 bytes, we can read over the allocated string when checking the blacklist.

This will mainly impact memory inspection codes and should not introduce any real bug.

Example:

==30858==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000024e3a at pc 0x7f9b9fe7821a bp 0x7ffeb5712980 sp 0x7ffeb5712128
READ of size 40 at 0x603000024e3a thread T0

… wrong length In case we issue a cluster forget with a node ID which has a smaller length than 40 bytes, we can read over the allocated string when checking the blacklist. This will mainly impact memory inspection codes and should not introduce any real bug. Example: ``` ==30858==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000024e3a at pc 0x7f9b9fe7821a bp 0x7ffeb5712980 sp 0x7ffeb5712128 READ of size 40 at 0x603000024e3a thread T0 ``` Signed-off-by: Ran Shidlansik <[email protected]>

src/cluster_legacy.c

codecov · 2025-05-20T18:42:48Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.26%. Comparing base (1531b44) to head (f9765d1).
Report is 20 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #2108      +/-   ##
============================================
+ Coverage     71.24%   71.26%   +0.01%     
============================================
  Files           122      122              
  Lines         66030    66049      +19     
============================================
+ Hits          47045    47071      +26     
+ Misses        18985    18978       -7

Files with missing lines	Coverage Δ
src/cluster_legacy.c	`86.75% <100.00%> (+0.14%)`	⬆️

... and 20 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

1. add sanity check in blacklist existance check 2. modify the test. it can still be caught by sanitizer test. Signed-off-by: Ran Shidlansik <[email protected]>

PingXie · 2025-05-26T04:59:35Z

src/cluster_legacy.c

+    clusterBlacklistCleanup();
+
+    /* Sanity check. In case the provided node ID length is wrong we can bail early. */
+    if (len != CLUSTER_NAMELEN)


this check is a bit odd. rather, I think nodeid's length being 40 bytes is part of the clusterBlacklistExists contract and it is the caller who is responsible for validating the nodeid's length.

So I agree there is some "undocumented" contract here, which is why this is simply a sanity check which is meant to avoid allocating new string and searching the dictionary for no good reason. TBH we can drop this check and just search the dictionary (were we will not find it).
The reason for placing the check logic here was to make the code more defensive.

src/cluster_legacy.c

allow backlist search to handle any node-id size Signed-off-by: Ran Shidlansik <[email protected]>

madolson · 2025-05-28T02:23:59Z

tests/unit/other.tcl

+         catch {r cluster forget 1} err
+         set _ $err
+    } {*ERR Unknown node*} 


For future reference, you can do this in a one liner:

assert_error {*ERR Unknown node*} {r cluster forget 1}

The current pattern is useful primarily when the last command is already returning something.

… wrong length in CLUSTER FORGET (valkey-io#2108) In case we issue a cluster forget with a node ID which has a smaller length than 40 bytes, we can read over the allocated string when checking the blacklist. This will mainly impact memory inspection codes and should not introduce any real bug. Example: ``` ==30858==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000024e3a at pc 0x7f9b9fe7821a bp 0x7ffeb5712980 sp 0x7ffeb5712128 READ of size 40 at 0x603000024e3a thread T0 ``` --------- Signed-off-by: Ran Shidlansik <[email protected]> Signed-off-by: chzhoo <[email protected]>

… wrong length in CLUSTER FORGET (valkey-io#2108) In case we issue a cluster forget with a node ID which has a smaller length than 40 bytes, we can read over the allocated string when checking the blacklist. This will mainly impact memory inspection codes and should not introduce any real bug. Example: ``` ==30858==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000024e3a at pc 0x7f9b9fe7821a bp 0x7ffeb5712980 sp 0x7ffeb5712128 READ of size 40 at 0x603000024e3a thread T0 ``` --------- Signed-off-by: Ran Shidlansik <[email protected]> Signed-off-by: shanwan1 <[email protected]>

#14417) Fix a heap-buffer-overflow vulnerability in the `CLUSTER FORGET` command when provided with a node ID shorter than the expected 40 bytes. When `CLUSTER FORGET` is called with a node ID that has a length smaller than `CLUSTER_NAMELEN` (40 bytes), the `clusterBlacklistExists()` function would read beyond the allocated string buffer. This occurs because the function always attempted to read exactly 40 bytes via `sdsnewlen(nodeid, CLUSTER_NAMELEN)`. Changes: - Added a `size_t len` parameter to `clusterBlacklistExists()` to use the actual string length - Updated all call sites to pass the appropriate length - Added a test case to verify the fix - Test added to verify that `CLUSTER FORGET` with an invalid short node ID returns an appropriate error. This PR is based on: valkey-io/valkey#2108 Co-authored-by: Ran Shidlansik <[email protected]>

## Add Fuzzing Capability to Valkey ### Overview This PR adds a fuzzing capability to Valkey, allowing developers and users to stress test their Valkey deployments with randomly generated commands. The fuzzer is integrated with the existing valkey-benchmark tool, making it easy to use without requiring additional dependencies. ### Key Features • **Command Generator**: Automatically generates Valkey commands by retrieving command information directly from the server • **Two Fuzzing Modes**: - normal: Generates only valid commands, doesn't modify server configurations - aggressive: Includes malformed commands and allows CONFIG SET operations • **Multi-threaded Testing**: Each client runs in a dedicated thread to maximize interaction between clients and enable testing of complicated scenarios • **Integration with valkey-benchmark**: Uses the existing CLI interface ### Implementation Details • Added new files: - `fuzzer_command_generator.h/c`: Dynamically generates valkey commands. - `fuzzer_client.c`: Orchestrate all the client threads, report test progress, and handle errors. • Modified existing files: - valkey-benchmark.c: Added fuzzing mode options and integration ### Command Generation Approach The fuzzer dynamically retrieves command information from the server, allowing it to adapt to different Valkey versions and custom modules. Since the command information generated from JSON files is sometimes limited, not all generated commands will be valid, but approximately 95% valid command generation is achieved. It is important to generate valid commands to cover as much code path as possible and not just the invalid command/args path. The fuzzer prioritizes generating syntactically and semantically correct commands to ensure thorough testing of the server's core functionality, while still including a small percentage of invalid commands in `aggressive` mode to test error handling paths #### Config modification For CONFIG SET command, the situation is more complex as the server currently provides limited information through CONFIG GET *. Some hardcoded logic is implemented that will need to be modified in the future. Ideally, the server should provide self-inspection commands to retrieve config keys-values with their properties (enum values, modifiability status, etc.). ### Issue Detection The fuzzer is designed to identify several types of issues: • Server crashes • Server memory corruptions / memory leaks(when compiled with ASAN) • Server unresponsiveness • Server malformed replies For unresponsiveness detection, command timeout limits are implemented to ensure no command blocks for excessive periods. If a server doesn't respond within 30 seconds, the fuzzer signals that something is wrong. ### Proven Effectiveness When running against the latest unstable version, the fuzzer has already identified several issues, demonstrating its effectiveness: * #2111 * #2112 * #2109 * #2113 * #2108 * #2137 * #2106 * #2347 * #2973 * #2974 ### How to Use Run the fuzzer using the valkey-benchmark tool with the --fuzz flag: ```bash # Basic usage (10000 commands 1000 commands per client, 10 clients) ./src/valkey-benchmark --fuzz -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With aggressive fuzzing mode ./src/valkey-benchmark --fuzz --fuzz-level aggressive -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With detailed logging ./src/valkey-benchmark --fuzz --fuzz-log-level debug -h 127.0.0.1 -p 6379 -n 10000 -c 10 ``` The fuzzer supports existing valkey-benchmark options, including TLS and cluster mode configuration. --------- Signed-off-by: Uri Yagelnik <[email protected]>

## Add Fuzzing Capability to Valkey ### Overview This PR adds a fuzzing capability to Valkey, allowing developers and users to stress test their Valkey deployments with randomly generated commands. The fuzzer is integrated with the existing valkey-benchmark tool, making it easy to use without requiring additional dependencies. ### Key Features • **Command Generator**: Automatically generates Valkey commands by retrieving command information directly from the server • **Two Fuzzing Modes**: - normal: Generates only valid commands, doesn't modify server configurations - aggressive: Includes malformed commands and allows CONFIG SET operations • **Multi-threaded Testing**: Each client runs in a dedicated thread to maximize interaction between clients and enable testing of complicated scenarios • **Integration with valkey-benchmark**: Uses the existing CLI interface ### Implementation Details • Added new files: - `fuzzer_command_generator.h/c`: Dynamically generates valkey commands. - `fuzzer_client.c`: Orchestrate all the client threads, report test progress, and handle errors. • Modified existing files: - valkey-benchmark.c: Added fuzzing mode options and integration ### Command Generation Approach The fuzzer dynamically retrieves command information from the server, allowing it to adapt to different Valkey versions and custom modules. Since the command information generated from JSON files is sometimes limited, not all generated commands will be valid, but approximately 95% valid command generation is achieved. It is important to generate valid commands to cover as much code path as possible and not just the invalid command/args path. The fuzzer prioritizes generating syntactically and semantically correct commands to ensure thorough testing of the server's core functionality, while still including a small percentage of invalid commands in `aggressive` mode to test error handling paths #### Config modification For CONFIG SET command, the situation is more complex as the server currently provides limited information through CONFIG GET *. Some hardcoded logic is implemented that will need to be modified in the future. Ideally, the server should provide self-inspection commands to retrieve config keys-values with their properties (enum values, modifiability status, etc.). ### Issue Detection The fuzzer is designed to identify several types of issues: • Server crashes • Server memory corruptions / memory leaks(when compiled with ASAN) • Server unresponsiveness • Server malformed replies For unresponsiveness detection, command timeout limits are implemented to ensure no command blocks for excessive periods. If a server doesn't respond within 30 seconds, the fuzzer signals that something is wrong. ### Proven Effectiveness When running against the latest unstable version, the fuzzer has already identified several issues, demonstrating its effectiveness: * valkey-io#2111 * valkey-io#2112 * valkey-io#2109 * valkey-io#2113 * valkey-io#2108 * valkey-io#2137 * valkey-io#2106 * valkey-io#2347 * valkey-io#2973 * valkey-io#2974 ### How to Use Run the fuzzer using the valkey-benchmark tool with the --fuzz flag: ```bash # Basic usage (10000 commands 1000 commands per client, 10 clients) ./src/valkey-benchmark --fuzz -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With aggressive fuzzing mode ./src/valkey-benchmark --fuzz --fuzz-level aggressive -h 127.0.0.1 -p 6379 -n 10000 -c 10 # With detailed logging ./src/valkey-benchmark --fuzz --fuzz-log-level debug -h 127.0.0.1 -p 6379 -n 10000 -c 10 ``` The fuzzer supports existing valkey-benchmark options, including TLS and cluster mode configuration. --------- Signed-off-by: Uri Yagelnik <[email protected]>

ranshid marked this pull request as ready for review May 20, 2025 18:27

ranshid requested review from madolson and uriyage May 20, 2025 18:27

ranshid commented May 20, 2025

View reviewed changes

src/cluster_legacy.c Outdated Show resolved Hide resolved

uriyage approved these changes May 21, 2025

View reviewed changes

fix by PR comments:

1fe26df

1. add sanity check in blacklist existance check 2. modify the test. it can still be caught by sanitizer test. Signed-off-by: Ran Shidlansik <[email protected]>

ranshid force-pushed the fix-cluster-forget-bad-node-id branch from cb2e515 to 1fe26df Compare May 25, 2025 14:02

enjoy-binbin approved these changes May 26, 2025

View reviewed changes

enjoy-binbin changed the title ~~[BUG] Fix a case of out of bound read when cluster node ID is provided with wrong length~~ Fix a case of out of bound read when cluster node ID is provided with wrong length in CLUSTER FORGET May 26, 2025

PingXie reviewed May 26, 2025

View reviewed changes

src/cluster_legacy.c Show resolved Hide resolved

enjoy-binbin reviewed May 26, 2025

View reviewed changes

src/cluster_legacy.c Show resolved Hide resolved

suggest from latest PR comments:

f9765d1

allow backlist search to handle any node-id size Signed-off-by: Ran Shidlansik <[email protected]>

PingXie approved these changes May 26, 2025

View reviewed changes

PingXie merged commit 3ceae81 into valkey-io:unstable May 26, 2025
51 checks passed

madolson reviewed May 28, 2025

View reviewed changes

uriyage mentioned this pull request Jul 10, 2025

Valkey Fuzzer #2340

Merged

moticless mentioned this pull request Oct 8, 2025

Fix heap-buffer-overflow in CLUSTER FORGET with invalid node ID length redis/redis#14417

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix a case of out of bound read when cluster node ID is provided with wrong length in CLUSTER FORGET#2108

Fix a case of out of bound read when cluster node ID is provided with wrong length in CLUSTER FORGET#2108
PingXie merged 3 commits intovalkey-io:unstablefrom
ranshid:fix-cluster-forget-bad-node-id

ranshid commented May 20, 2025

Uh oh!

Uh oh!

codecov bot commented May 20, 2025 •

edited

Loading

Uh oh!

PingXie May 26, 2025

Uh oh!

ranshid May 26, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

madolson May 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

ranshid commented May 20, 2025

Uh oh!

Uh oh!

codecov bot commented May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

PingXie May 26, 2025

Choose a reason for hiding this comment

Uh oh!

ranshid May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

madolson May 28, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

codecov bot commented May 20, 2025 •

edited

Loading

ranshid May 26, 2025 •

edited

Loading