Skip to content

Prevent bad memory access when NOTOUCH client gets unblocked#2347

Merged
ranshid merged 3 commits intovalkey-io:unstablefrom
uriyage:fix-missing-check-for-executing-client
Aug 6, 2025
Merged

Prevent bad memory access when NOTOUCH client gets unblocked#2347
ranshid merged 3 commits intovalkey-io:unstablefrom
uriyage:fix-missing-check-for-executing-client

Conversation

@uriyage
Copy link
Contributor

@uriyage uriyage commented Jul 13, 2025

Fix missing check for executing client in lookupKey function

Issue

The lookupKey function in db.c accesses server.executing_client->cmd->proc without first verifying that server.executing_client is not NULL. This was introduced in #1499 where the check for executing client was added without verifying it could be null.

The server crashes with a null pointer dereference when the current_client's flag.no_touch is set.

27719 valkey-server *
/lib64/libpthread.so.0(+0x118e0)[0x7f34cb96a8e0]
src/valkey-server 127.0.0.1:21113(lookupKey+0xf5)[0x4a14b7]
src/valkey-server 127.0.0.1:21113(lookupKeyReadWithFlags+0x50)[0x4a15fc]
src/valkey-server 127.0.0.1:21113[0x52b8f1]
src/valkey-server 127.0.0.1:21113(handleClientsBlockedOnKeys+0xa5)[0x52b16f]
src/valkey-server 127.0.0.1:21113(processCommand+0xf1e)[0x4712c9]
src/valkey-server 127.0.0.1:21113(processCommandAndResetClient+0x35)[0x490fd5]
src/valkey-server 127.0.0.1:21113(processInputBuffer+0xe1)[0x4912e5]
src/valkey-server 127.0.0.1:21113(readQueryFromClient+0x8c)[0x49177b]
src/valkey-server 127.0.0.1:21113[0x57daa6]
src/valkey-server 127.0.0.1:21113[0x57e280]
src/valkey-server 127.0.0.1:21113(aeProcessEvents+0x261)[0x45b259]
src/valkey-server 127.0.0.1:21113(aeMain+0x2a)[0x45b450]
src/valkey-server 127.0.0.1:21113(main+0xd43)[0x479bf6]
/lib64/libc.so.6(__libc_start_main+0xea)[0x7f34cb5cd13a]
src/valkey-server 127.0.0.1:21113(_start+0x2a)[0x454e3a]

Fix

Added a null check for server.executing_client before attempting to dereference it:

Tests

Added a regression test in tests/unit/type/list.tcl.

@uriyage uriyage mentioned this pull request Jul 13, 2025
@uriyage uriyage changed the title Fix missing check for executing client [CRASH] Fix missing check for executing client Jul 13, 2025
@codecov
Copy link

codecov bot commented Jul 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 71.55%. Comparing base (ba7334c) to head (803c18f).
⚠️ Report is 1 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2347      +/-   ##
============================================
- Coverage     71.56%   71.55%   -0.02%     
============================================
  Files           125      125              
  Lines         69213    69214       +1     
============================================
- Hits          49534    49523      -11     
- Misses        19679    19691      +12     
Files with missing lines Coverage Δ
src/db.c 90.47% <100.00%> (+<0.01%) ⬆️

... and 14 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Member

@PingXie PingXie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @uriyage

@zuiderkwast
Copy link
Contributor

Backport to 8.1 since #1499 was released in 8.1, right?

@zuiderkwast
Copy link
Contributor

For the release notes, the title should mention when it would happen, e.g. no-touch + blocking commands.

@zuiderkwast zuiderkwast added the release-notes This issue should get a line item in the release notes label Jul 14, 2025
@ranshid
Copy link
Member

ranshid commented Aug 6, 2025

Oooops - seems like a fix for that was also inserted as part of #2089 - I recall we had this issue which was blocking us during fuzzer testing and we just fixed it in order to progress and probably missed.
I still think this PR adds more tests and should be used. @uriyage do you want to drive it to the last mile?

Signed-off-by: Ran Shidlansik <[email protected]>
@ranshid ranshid merged commit fd8270a into valkey-io:unstable Aug 6, 2025
61 of 62 checks passed
@github-project-automation github-project-automation bot moved this to To be backported in Valkey 8.1 Aug 6, 2025
@ranshid ranshid changed the title [CRASH] Fix missing check for executing client Prevent bad memory access when NOTOUCH client gets unblocked Aug 6, 2025
@zuiderkwast
Copy link
Contributor

If we want to backport it, we need to re-apply the actual fix too, i.e. the first commit in this PR.

@ranshid
Copy link
Member

ranshid commented Aug 6, 2025

If we want to backport it, we need to re-apply the actual fix too, i.e. the first commit in this PR.

It still has the fix as part of the commit. I think we can cherry-pick it as is

allenss-amazon pushed a commit to allenss-amazon/valkey-core that referenced this pull request Aug 19, 2025
Fix missing check for executing client in `lookupKey` function

### Issue
The `lookupKey` function in db.c accesses
`server.executing_client->cmd->proc` without first verifying that
`server.executing_client` is not NULL. This was introduced in valkey-io#1499
where the check for executing client was added without verifying it
could be null.

The server crashes with a null pointer dereference when the
current_client's flag.no_touch is set.
```
27719 valkey-server *
/lib64/libpthread.so.0(+0x118e0)[0x7f34cb96a8e0]
src/valkey-server 127.0.0.1:21113(lookupKey+0xf5)[0x4a14b7]
src/valkey-server 127.0.0.1:21113(lookupKeyReadWithFlags+0x50)[0x4a15fc]
src/valkey-server 127.0.0.1:21113[0x52b8f1]
src/valkey-server 127.0.0.1:21113(handleClientsBlockedOnKeys+0xa5)[0x52b16f]
src/valkey-server 127.0.0.1:21113(processCommand+0xf1e)[0x4712c9]
src/valkey-server 127.0.0.1:21113(processCommandAndResetClient+0x35)[0x490fd5]
src/valkey-server 127.0.0.1:21113(processInputBuffer+0xe1)[0x4912e5]
src/valkey-server 127.0.0.1:21113(readQueryFromClient+0x8c)[0x49177b]
src/valkey-server 127.0.0.1:21113[0x57daa6]
src/valkey-server 127.0.0.1:21113[0x57e280]
src/valkey-server 127.0.0.1:21113(aeProcessEvents+0x261)[0x45b259]
src/valkey-server 127.0.0.1:21113(aeMain+0x2a)[0x45b450]
src/valkey-server 127.0.0.1:21113(main+0xd43)[0x479bf6]
/lib64/libc.so.6(__libc_start_main+0xea)[0x7f34cb5cd13a]
src/valkey-server 127.0.0.1:21113(_start+0x2a)[0x454e3a]
```

### Fix
Added a null check for `server.executing_client` before attempting to
dereference it:

### Tests
Added a regression test in tests/unit/type/list.tcl.

---------

Signed-off-by: Uri Yagelnik <[email protected]>
Signed-off-by: Ran Shidlansik <[email protected]>
Co-authored-by: Ran Shidlansik <[email protected]>
@ranshid ranshid moved this from To be backported to In Progress in Valkey 8.1 Sep 30, 2025
ranshid added a commit to ranshid/valkey that referenced this pull request Sep 30, 2025
Fix missing check for executing client in `lookupKey` function

The `lookupKey` function in db.c accesses
`server.executing_client->cmd->proc` without first verifying that
`server.executing_client` is not NULL. This was introduced in valkey-io#1499
where the check for executing client was added without verifying it
could be null.

The server crashes with a null pointer dereference when the
current_client's flag.no_touch is set.
```
27719 valkey-server *
/lib64/libpthread.so.0(+0x118e0)[0x7f34cb96a8e0]
src/valkey-server 127.0.0.1:21113(lookupKey+0xf5)[0x4a14b7]
src/valkey-server 127.0.0.1:21113(lookupKeyReadWithFlags+0x50)[0x4a15fc]
src/valkey-server 127.0.0.1:21113[0x52b8f1]
src/valkey-server 127.0.0.1:21113(handleClientsBlockedOnKeys+0xa5)[0x52b16f]
src/valkey-server 127.0.0.1:21113(processCommand+0xf1e)[0x4712c9]
src/valkey-server 127.0.0.1:21113(processCommandAndResetClient+0x35)[0x490fd5]
src/valkey-server 127.0.0.1:21113(processInputBuffer+0xe1)[0x4912e5]
src/valkey-server 127.0.0.1:21113(readQueryFromClient+0x8c)[0x49177b]
src/valkey-server 127.0.0.1:21113[0x57daa6]
src/valkey-server 127.0.0.1:21113[0x57e280]
src/valkey-server 127.0.0.1:21113(aeProcessEvents+0x261)[0x45b259]
src/valkey-server 127.0.0.1:21113(aeMain+0x2a)[0x45b450]
src/valkey-server 127.0.0.1:21113(main+0xd43)[0x479bf6]
/lib64/libc.so.6(__libc_start_main+0xea)[0x7f34cb5cd13a]
src/valkey-server 127.0.0.1:21113(_start+0x2a)[0x454e3a]
```

Added a null check for `server.executing_client` before attempting to
dereference it:

Added a regression test in tests/unit/type/list.tcl.

---------

Signed-off-by: Uri Yagelnik <[email protected]>
Signed-off-by: Ran Shidlansik <[email protected]>
Co-authored-by: Ran Shidlansik <[email protected]>
@ranshid ranshid moved this from In Progress to 8.1.4 in Valkey 8.1 Sep 30, 2025
@ranshid ranshid moved this from 8.1.4 to To be backported in Valkey 8.1 Sep 30, 2025
zuiderkwast pushed a commit that referenced this pull request Oct 1, 2025
Fix missing check for executing client in `lookupKey` function

The `lookupKey` function in db.c accesses
`server.executing_client->cmd->proc` without first verifying that
`server.executing_client` is not NULL. This was introduced in #1499
where the check for executing client was added without verifying it
could be null.

The server crashes with a null pointer dereference when the
current_client's flag.no_touch is set.
```
27719 valkey-server *
/lib64/libpthread.so.0(+0x118e0)[0x7f34cb96a8e0]
src/valkey-server 127.0.0.1:21113(lookupKey+0xf5)[0x4a14b7]
src/valkey-server 127.0.0.1:21113(lookupKeyReadWithFlags+0x50)[0x4a15fc]
src/valkey-server 127.0.0.1:21113[0x52b8f1]
src/valkey-server 127.0.0.1:21113(handleClientsBlockedOnKeys+0xa5)[0x52b16f]
src/valkey-server 127.0.0.1:21113(processCommand+0xf1e)[0x4712c9]
src/valkey-server 127.0.0.1:21113(processCommandAndResetClient+0x35)[0x490fd5]
src/valkey-server 127.0.0.1:21113(processInputBuffer+0xe1)[0x4912e5]
src/valkey-server 127.0.0.1:21113(readQueryFromClient+0x8c)[0x49177b]
src/valkey-server 127.0.0.1:21113[0x57daa6]
src/valkey-server 127.0.0.1:21113[0x57e280]
src/valkey-server 127.0.0.1:21113(aeProcessEvents+0x261)[0x45b259]
src/valkey-server 127.0.0.1:21113(aeMain+0x2a)[0x45b450]
src/valkey-server 127.0.0.1:21113(main+0xd43)[0x479bf6]
/lib64/libc.so.6(__libc_start_main+0xea)[0x7f34cb5cd13a]
src/valkey-server 127.0.0.1:21113(_start+0x2a)[0x454e3a]
```

Added a null check for `server.executing_client` before attempting to
dereference it:

Added a regression test in tests/unit/type/list.tcl.

---------

Signed-off-by: Uri Yagelnik <[email protected]>
Signed-off-by: Ran Shidlansik <[email protected]>
Co-authored-by: Ran Shidlansik <[email protected]>
@zuiderkwast zuiderkwast moved this from To be backported to 8.1.4 in Valkey 8.1 Oct 1, 2025
moticless added a commit to redis/redis that referenced this pull request Oct 13, 2025
This PR is based on:
valkey-io/valkey#2347

This was introduced in #13512

The server crashes with a null pointer dereference when lookupKey() is
called from handleClientsBlockedOnKey(). The crash occurs because
server.executing_client is NULL, but the code attempts to access
server.executing_client->cmd->proc without checking.

**Crash scenario:**
Client 1 enables CLIENT NO-TOUCH
Client 2 blocks on BRPOP mylist 0
Client 1 executes RPUSH mylist elem
When unblocking Client 2, lookupKey() dereferences NULL
server.executing_client → crash

**Solution**
Added proper null checks before dereferencing server.executing_client:
Check if LOOKUP_NOTOUCH flag is already set before attempting to modify
it
Verify both server.current_client and server.executing_client are not
NULL before accessing their members
Maintain the TOUCH command exception for scripts

**Testing**
Added regression test in tests/unit/type/list.tcl that reproduces and
verifies the fix for this crash scenario.

This fix is based on valkey-io/valkey#2347

Co-authored-by: Uri Yagelnik <[email protected]>
Co-authored-by: Ran Shidlansik <[email protected]>
ranshid pushed a commit that referenced this pull request Jan 1, 2026
## Add Fuzzing Capability to Valkey

### Overview
This PR adds a fuzzing capability to Valkey, allowing developers and
users to stress test their Valkey deployments with randomly generated
commands. The fuzzer is integrated with the existing valkey-benchmark
tool, making it easy to use without requiring additional dependencies.

### Key Features
• **Command Generator**: Automatically generates Valkey commands by
retrieving command information directly from the server
• **Two Fuzzing Modes**:
- normal: Generates only valid commands, doesn't modify server
configurations
- aggressive: Includes malformed commands and allows CONFIG SET
operations

• **Multi-threaded Testing**: Each client runs in a dedicated thread to
maximize interaction between clients and enable testing of complicated
scenarios

• **Integration with valkey-benchmark**: Uses the existing CLI interface

### Implementation Details
• Added new files:
- `fuzzer_command_generator.h/c`: Dynamically generates valkey commands.
- `fuzzer_client.c`: Orchestrate all the client threads, report test
progress, and handle errors.

• Modified existing files:
  - valkey-benchmark.c: Added fuzzing mode options and integration

### Command Generation Approach
The fuzzer dynamically retrieves command information from the server,
allowing it to adapt to different Valkey versions and custom modules.
Since the command information generated from JSON files is sometimes
limited, not all generated commands will be valid, but approximately 95%
valid command generation is achieved.

It is important to generate valid commands to cover as much code path as
possible and not just the invalid command/args path. The fuzzer
prioritizes generating syntactically and semantically correct commands
to ensure thorough testing of the server's core functionality, while
still including a small percentage of invalid commands in `aggressive`
mode to test error handling paths

#### Config modification
For CONFIG SET command, the situation is more complex as the server
currently provides limited information through CONFIG GET *. Some
hardcoded logic is implemented that will need to be modified in the
future. Ideally, the server should provide self-inspection commands to
retrieve config keys-values with their properties (enum values,
modifiability status, etc.).

### Issue Detection
The fuzzer is designed to identify several types of issues:
• Server crashes
• Server memory corruptions / memory leaks(when compiled with ASAN)
• Server unresponsiveness
• Server malformed replies

For unresponsiveness detection, command timeout limits are implemented
to ensure no command blocks for excessive periods. If a server doesn't
respond within 30 seconds, the fuzzer signals that something is wrong.

### Proven Effectiveness
When running against the latest unstable version, the fuzzer has already
identified several issues, demonstrating its effectiveness:
* #2111
* #2112
* #2109
* #2113
* #2108
* #2137
* #2106
* #2347
* #2973
* #2974
### How to Use
Run the fuzzer using the valkey-benchmark tool with the --fuzz flag:

```bash
# Basic usage (10000 commands 1000 commands per client, 10 clients)
./src/valkey-benchmark --fuzz -h 127.0.0.1 -p 6379 -n 10000 -c 10

# With aggressive fuzzing mode
./src/valkey-benchmark --fuzz --fuzz-level aggressive -h 127.0.0.1 -p 6379 -n 10000 -c 10

# With detailed logging
./src/valkey-benchmark --fuzz --fuzz-log-level debug -h 127.0.0.1 -p 6379 -n 10000 -c 10
```

The fuzzer supports existing valkey-benchmark options, including TLS and
cluster mode configuration.

---------

Signed-off-by: Uri Yagelnik <[email protected]>
jdheyburn pushed a commit to jdheyburn/valkey that referenced this pull request Jan 8, 2026
## Add Fuzzing Capability to Valkey

### Overview
This PR adds a fuzzing capability to Valkey, allowing developers and
users to stress test their Valkey deployments with randomly generated
commands. The fuzzer is integrated with the existing valkey-benchmark
tool, making it easy to use without requiring additional dependencies.

### Key Features
• **Command Generator**: Automatically generates Valkey commands by
retrieving command information directly from the server
• **Two Fuzzing Modes**:
- normal: Generates only valid commands, doesn't modify server
configurations
- aggressive: Includes malformed commands and allows CONFIG SET
operations

• **Multi-threaded Testing**: Each client runs in a dedicated thread to
maximize interaction between clients and enable testing of complicated
scenarios

• **Integration with valkey-benchmark**: Uses the existing CLI interface

### Implementation Details
• Added new files:
- `fuzzer_command_generator.h/c`: Dynamically generates valkey commands.
- `fuzzer_client.c`: Orchestrate all the client threads, report test
progress, and handle errors.

• Modified existing files:
  - valkey-benchmark.c: Added fuzzing mode options and integration

### Command Generation Approach
The fuzzer dynamically retrieves command information from the server,
allowing it to adapt to different Valkey versions and custom modules.
Since the command information generated from JSON files is sometimes
limited, not all generated commands will be valid, but approximately 95%
valid command generation is achieved.

It is important to generate valid commands to cover as much code path as
possible and not just the invalid command/args path. The fuzzer
prioritizes generating syntactically and semantically correct commands
to ensure thorough testing of the server's core functionality, while
still including a small percentage of invalid commands in `aggressive`
mode to test error handling paths

#### Config modification
For CONFIG SET command, the situation is more complex as the server
currently provides limited information through CONFIG GET *. Some
hardcoded logic is implemented that will need to be modified in the
future. Ideally, the server should provide self-inspection commands to
retrieve config keys-values with their properties (enum values,
modifiability status, etc.).

### Issue Detection
The fuzzer is designed to identify several types of issues:
• Server crashes
• Server memory corruptions / memory leaks(when compiled with ASAN)
• Server unresponsiveness
• Server malformed replies

For unresponsiveness detection, command timeout limits are implemented
to ensure no command blocks for excessive periods. If a server doesn't
respond within 30 seconds, the fuzzer signals that something is wrong.

### Proven Effectiveness
When running against the latest unstable version, the fuzzer has already
identified several issues, demonstrating its effectiveness:
* valkey-io#2111
* valkey-io#2112
* valkey-io#2109
* valkey-io#2113
* valkey-io#2108
* valkey-io#2137
* valkey-io#2106
* valkey-io#2347
* valkey-io#2973
* valkey-io#2974
### How to Use
Run the fuzzer using the valkey-benchmark tool with the --fuzz flag:

```bash
# Basic usage (10000 commands 1000 commands per client, 10 clients)
./src/valkey-benchmark --fuzz -h 127.0.0.1 -p 6379 -n 10000 -c 10

# With aggressive fuzzing mode
./src/valkey-benchmark --fuzz --fuzz-level aggressive -h 127.0.0.1 -p 6379 -n 10000 -c 10

# With detailed logging
./src/valkey-benchmark --fuzz --fuzz-log-level debug -h 127.0.0.1 -p 6379 -n 10000 -c 10
```

The fuzzer supports existing valkey-benchmark options, including TLS and
cluster mode configuration.

---------

Signed-off-by: Uri Yagelnik <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes This issue should get a line item in the release notes

Projects

Status: 8.1.4
Status: Done

Development

Successfully merging this pull request may close these issues.

6 participants