Skip to content

Fix a case of out of bound read when cluster node ID is provided with wrong length in CLUSTER FORGET#2108

Merged
PingXie merged 3 commits intovalkey-io:unstablefrom
ranshid:fix-cluster-forget-bad-node-id
May 26, 2025
Merged

Fix a case of out of bound read when cluster node ID is provided with wrong length in CLUSTER FORGET#2108
PingXie merged 3 commits intovalkey-io:unstablefrom
ranshid:fix-cluster-forget-bad-node-id

Conversation

@ranshid
Copy link
Member

@ranshid ranshid commented May 20, 2025

In case we issue a cluster forget with a node ID which has a smaller length than 40 bytes, we can read over the allocated string when checking the blacklist.

This will mainly impact memory inspection codes and should not introduce any real bug.

Example:

==30858==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000024e3a at pc 0x7f9b9fe7821a bp 0x7ffeb5712980 sp 0x7ffeb5712128
READ of size 40 at 0x603000024e3a thread T0

… wrong length

In case we issue a cluster forget with a node ID which has a smaller length than
40 bytes, we can read over the allocated string when checking the blacklist.

This will mainly impact memory inspection codes and should not introduce any real bug.

Example:

```
==30858==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000024e3a at pc 0x7f9b9fe7821a bp 0x7ffeb5712980 sp 0x7ffeb5712128
READ of size 40 at 0x603000024e3a thread T0
```

Signed-off-by: Ran Shidlansik <[email protected]>
@ranshid ranshid marked this pull request as ready for review May 20, 2025 18:27
@ranshid ranshid requested review from madolson and uriyage May 20, 2025 18:27
@codecov
Copy link

codecov bot commented May 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.26%. Comparing base (1531b44) to head (f9765d1).
Report is 20 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2108      +/-   ##
============================================
+ Coverage     71.24%   71.26%   +0.01%     
============================================
  Files           122      122              
  Lines         66030    66049      +19     
============================================
+ Hits          47045    47071      +26     
+ Misses        18985    18978       -7     
Files with missing lines Coverage Δ
src/cluster_legacy.c 86.75% <100.00%> (+0.14%) ⬆️

... and 20 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

1. add sanity check in blacklist existance check
2. modify the test. it can still be caught by sanitizer test.

Signed-off-by: Ran Shidlansik <[email protected]>
@ranshid ranshid force-pushed the fix-cluster-forget-bad-node-id branch from cb2e515 to 1fe26df Compare May 25, 2025 14:02
@enjoy-binbin enjoy-binbin changed the title [BUG] Fix a case of out of bound read when cluster node ID is provided with wrong length Fix a case of out of bound read when cluster node ID is provided with wrong length in CLUSTER FORGET May 26, 2025
clusterBlacklistCleanup();

/* Sanity check. In case the provided node ID length is wrong we can bail early. */
if (len != CLUSTER_NAMELEN)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this check is a bit odd. rather, I think nodeid's length being 40 bytes is part of the clusterBlacklistExists contract and it is the caller who is responsible for validating the nodeid's length.

Copy link
Member Author

@ranshid ranshid May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I agree there is some "undocumented" contract here, which is why this is simply a sanity check which is meant to avoid allocating new string and searching the dictionary for no good reason. TBH we can drop this check and just search the dictionary (were we will not find it).
The reason for placing the check logic here was to make the code more defensive.

allow backlist search to handle any node-id size

Signed-off-by: Ran Shidlansik <[email protected]>
@PingXie PingXie merged commit 3ceae81 into valkey-io:unstable May 26, 2025
51 checks passed
Comment on lines +556 to +558
catch {r cluster forget 1} err
set _ $err
} {*ERR Unknown node*}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference, you can do this in a one liner:

assert_error {*ERR Unknown node*} {r cluster forget 1}

The current pattern is useful primarily when the last command is already returning something.

chzhoo pushed a commit to chzhoo/valkey that referenced this pull request Jun 12, 2025
… wrong length in CLUSTER FORGET (valkey-io#2108)

In case we issue a cluster forget with a node ID which has a smaller
length than 40 bytes, we can read over the allocated string when
checking the blacklist.

This will mainly impact memory inspection codes and should not introduce
any real bug.

Example:

```
==30858==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000024e3a at pc 0x7f9b9fe7821a bp 0x7ffeb5712980 sp 0x7ffeb5712128
READ of size 40 at 0x603000024e3a thread T0
```

---------

Signed-off-by: Ran Shidlansik <[email protected]>
Signed-off-by: chzhoo <[email protected]>
shanwan1 pushed a commit to shanwan1/valkey that referenced this pull request Jun 13, 2025
… wrong length in CLUSTER FORGET (valkey-io#2108)

In case we issue a cluster forget with a node ID which has a smaller
length than 40 bytes, we can read over the allocated string when
checking the blacklist.

This will mainly impact memory inspection codes and should not introduce
any real bug.

Example:

```
==30858==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000024e3a at pc 0x7f9b9fe7821a bp 0x7ffeb5712980 sp 0x7ffeb5712128
READ of size 40 at 0x603000024e3a thread T0
```

---------

Signed-off-by: Ran Shidlansik <[email protected]>
Signed-off-by: shanwan1 <[email protected]>
@uriyage uriyage mentioned this pull request Jul 10, 2025
moticless added a commit to redis/redis that referenced this pull request Oct 9, 2025
#14417)

Fix a heap-buffer-overflow vulnerability in the `CLUSTER FORGET` command
when provided with a node ID shorter than the expected 40 bytes.
When `CLUSTER FORGET` is called with a node ID that has a length smaller
than `CLUSTER_NAMELEN` (40 bytes), the `clusterBlacklistExists()`
function would read beyond the allocated string buffer. This occurs
because the function always attempted to read exactly 40 bytes via
`sdsnewlen(nodeid, CLUSTER_NAMELEN)`.

Changes:
- Added a `size_t len` parameter to `clusterBlacklistExists()` to use
the actual string length
- Updated all call sites to pass the appropriate length
- Added a test case to verify the fix
- Test added to verify that `CLUSTER FORGET` with an invalid short node
ID returns an appropriate error.

This PR is based on: valkey-io/valkey#2108
Co-authored-by: Ran Shidlansik <[email protected]>
ranshid pushed a commit that referenced this pull request Jan 1, 2026
## Add Fuzzing Capability to Valkey

### Overview
This PR adds a fuzzing capability to Valkey, allowing developers and
users to stress test their Valkey deployments with randomly generated
commands. The fuzzer is integrated with the existing valkey-benchmark
tool, making it easy to use without requiring additional dependencies.

### Key Features
• **Command Generator**: Automatically generates Valkey commands by
retrieving command information directly from the server
• **Two Fuzzing Modes**:
- normal: Generates only valid commands, doesn't modify server
configurations
- aggressive: Includes malformed commands and allows CONFIG SET
operations

• **Multi-threaded Testing**: Each client runs in a dedicated thread to
maximize interaction between clients and enable testing of complicated
scenarios

• **Integration with valkey-benchmark**: Uses the existing CLI interface

### Implementation Details
• Added new files:
- `fuzzer_command_generator.h/c`: Dynamically generates valkey commands.
- `fuzzer_client.c`: Orchestrate all the client threads, report test
progress, and handle errors.

• Modified existing files:
  - valkey-benchmark.c: Added fuzzing mode options and integration

### Command Generation Approach
The fuzzer dynamically retrieves command information from the server,
allowing it to adapt to different Valkey versions and custom modules.
Since the command information generated from JSON files is sometimes
limited, not all generated commands will be valid, but approximately 95%
valid command generation is achieved.

It is important to generate valid commands to cover as much code path as
possible and not just the invalid command/args path. The fuzzer
prioritizes generating syntactically and semantically correct commands
to ensure thorough testing of the server's core functionality, while
still including a small percentage of invalid commands in `aggressive`
mode to test error handling paths

#### Config modification
For CONFIG SET command, the situation is more complex as the server
currently provides limited information through CONFIG GET *. Some
hardcoded logic is implemented that will need to be modified in the
future. Ideally, the server should provide self-inspection commands to
retrieve config keys-values with their properties (enum values,
modifiability status, etc.).

### Issue Detection
The fuzzer is designed to identify several types of issues:
• Server crashes
• Server memory corruptions / memory leaks(when compiled with ASAN)
• Server unresponsiveness
• Server malformed replies

For unresponsiveness detection, command timeout limits are implemented
to ensure no command blocks for excessive periods. If a server doesn't
respond within 30 seconds, the fuzzer signals that something is wrong.

### Proven Effectiveness
When running against the latest unstable version, the fuzzer has already
identified several issues, demonstrating its effectiveness:
* #2111
* #2112
* #2109
* #2113
* #2108
* #2137
* #2106
* #2347
* #2973
* #2974
### How to Use
Run the fuzzer using the valkey-benchmark tool with the --fuzz flag:

```bash
# Basic usage (10000 commands 1000 commands per client, 10 clients)
./src/valkey-benchmark --fuzz -h 127.0.0.1 -p 6379 -n 10000 -c 10

# With aggressive fuzzing mode
./src/valkey-benchmark --fuzz --fuzz-level aggressive -h 127.0.0.1 -p 6379 -n 10000 -c 10

# With detailed logging
./src/valkey-benchmark --fuzz --fuzz-log-level debug -h 127.0.0.1 -p 6379 -n 10000 -c 10
```

The fuzzer supports existing valkey-benchmark options, including TLS and
cluster mode configuration.

---------

Signed-off-by: Uri Yagelnik <[email protected]>
jdheyburn pushed a commit to jdheyburn/valkey that referenced this pull request Jan 8, 2026
## Add Fuzzing Capability to Valkey

### Overview
This PR adds a fuzzing capability to Valkey, allowing developers and
users to stress test their Valkey deployments with randomly generated
commands. The fuzzer is integrated with the existing valkey-benchmark
tool, making it easy to use without requiring additional dependencies.

### Key Features
• **Command Generator**: Automatically generates Valkey commands by
retrieving command information directly from the server
• **Two Fuzzing Modes**:
- normal: Generates only valid commands, doesn't modify server
configurations
- aggressive: Includes malformed commands and allows CONFIG SET
operations

• **Multi-threaded Testing**: Each client runs in a dedicated thread to
maximize interaction between clients and enable testing of complicated
scenarios

• **Integration with valkey-benchmark**: Uses the existing CLI interface

### Implementation Details
• Added new files:
- `fuzzer_command_generator.h/c`: Dynamically generates valkey commands.
- `fuzzer_client.c`: Orchestrate all the client threads, report test
progress, and handle errors.

• Modified existing files:
  - valkey-benchmark.c: Added fuzzing mode options and integration

### Command Generation Approach
The fuzzer dynamically retrieves command information from the server,
allowing it to adapt to different Valkey versions and custom modules.
Since the command information generated from JSON files is sometimes
limited, not all generated commands will be valid, but approximately 95%
valid command generation is achieved.

It is important to generate valid commands to cover as much code path as
possible and not just the invalid command/args path. The fuzzer
prioritizes generating syntactically and semantically correct commands
to ensure thorough testing of the server's core functionality, while
still including a small percentage of invalid commands in `aggressive`
mode to test error handling paths

#### Config modification
For CONFIG SET command, the situation is more complex as the server
currently provides limited information through CONFIG GET *. Some
hardcoded logic is implemented that will need to be modified in the
future. Ideally, the server should provide self-inspection commands to
retrieve config keys-values with their properties (enum values,
modifiability status, etc.).

### Issue Detection
The fuzzer is designed to identify several types of issues:
• Server crashes
• Server memory corruptions / memory leaks(when compiled with ASAN)
• Server unresponsiveness
• Server malformed replies

For unresponsiveness detection, command timeout limits are implemented
to ensure no command blocks for excessive periods. If a server doesn't
respond within 30 seconds, the fuzzer signals that something is wrong.

### Proven Effectiveness
When running against the latest unstable version, the fuzzer has already
identified several issues, demonstrating its effectiveness:
* valkey-io#2111
* valkey-io#2112
* valkey-io#2109
* valkey-io#2113
* valkey-io#2108
* valkey-io#2137
* valkey-io#2106
* valkey-io#2347
* valkey-io#2973
* valkey-io#2974
### How to Use
Run the fuzzer using the valkey-benchmark tool with the --fuzz flag:

```bash
# Basic usage (10000 commands 1000 commands per client, 10 clients)
./src/valkey-benchmark --fuzz -h 127.0.0.1 -p 6379 -n 10000 -c 10

# With aggressive fuzzing mode
./src/valkey-benchmark --fuzz --fuzz-level aggressive -h 127.0.0.1 -p 6379 -n 10000 -c 10

# With detailed logging
./src/valkey-benchmark --fuzz --fuzz-log-level debug -h 127.0.0.1 -p 6379 -n 10000 -c 10
```

The fuzzer supports existing valkey-benchmark options, including TLS and
cluster mode configuration.

---------

Signed-off-by: Uri Yagelnik <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants