Prevent active defrag from triggering during replicaof db flush#14274
Prevent active defrag from triggering during replicaof db flush#14274sundb merged 10 commits intoredis:unstablefrom
Conversation
🎉 Snyk checks have passed. No issues have been found so far.✅ security/snyk check is complete. No issues have been found. (View Details) ✅ license/snyk check is complete. No issues have been found. (View Details) |
|
New Issues (8)Checkmarx found the following issues in this Pull Request
Fixed Issues (4)Great job! The following issues were fixed in this Pull Request
|
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Pull Request Overview
This PR fixes a critical bug where active defragmentation could be triggered during replica database flush operations, causing crashes due to concurrent database modifications. The fix temporarily disables active defrag before emptying the database and restores the setting afterward.
- Prevents active defrag from running during
replicationEmptyDbCallback()execution - Adds comprehensive test coverage to verify the fix works correctly
- Resolves crashes caused by defrag modifying the database while it's being emptied
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/replication.c | Adds temporary disabling of active defrag around database flush operation |
| tests/unit/memefficiency.tcl | Adds test case to reproduce and verify the fix for issue #14267 |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
i see #13495 is part of 8.2, so do we really need to backport that to 8.0? |
|
odd. the PR is part of the 8.2 project.. |
This PR fixes two crashes due to the defragmentation of the Lua script, which were by #13108 1. During long-running Lua script execution, active defragmentation may be triggered, causing the luaScript structure to be reallocated to a new memory location, then we access `l->node`(may be reallocatedd) after script execution to update the Lua LRU list. In this PR, we don't defrag during blocked scripts, so we don't mess up the LRU update when the script ends. Note that defrag is now only permitted during loading. This PR also reverts the changes made by #14274. 2. Forgot to update the Lua LUR list node's value. Since `lua_scripts_lru_list` node stores a pointer to the `lua_script`'s key, we also need to update `node->value` when the key is reallocated. In this PR, after performing defragmentation on a Lua script, if the script is in the LRU list, its reference in the LRU list will be unconditionally updated.
This PR fixes two crashes due to the defragmentation of the Lua script, which were by redis#13108 1. During long-running Lua script execution, active defragmentation may be triggered, causing the luaScript structure to be reallocated to a new memory location, then we access `l->node`(may be reallocatedd) after script execution to update the Lua LRU list. In this PR, we don't defrag during blocked scripts, so we don't mess up the LRU update when the script ends. Note that defrag is now only permitted during loading. This PR also reverts the changes made by redis#14274. 2. Forgot to update the Lua LUR list node's value. Since `lua_scripts_lru_list` node stores a pointer to the `lua_script`'s key, we also need to update `node->value` when the key is reallocated. In this PR, after performing defragmentation on a Lua script, if the script is in the LRU list, its reference in the LRU list will be unconditionally updated.
This PR fixes two crashes due to the defragmentation of the Lua script, which were by redis#13108 1. During long-running Lua script execution, active defragmentation may be triggered, causing the luaScript structure to be reallocated to a new memory location, then we access `l->node`(may be reallocatedd) after script execution to update the Lua LRU list. In this PR, we don't defrag during blocked scripts, so we don't mess up the LRU update when the script ends. Note that defrag is now only permitted during loading. This PR also reverts the changes made by redis#14274. 2. Forgot to update the Lua LUR list node's value. Since `lua_scripts_lru_list` node stores a pointer to the `lua_script`'s key, we also need to update `node->value` when the key is reallocated. In this PR, after performing defragmentation on a Lua script, if the script is in the LRU list, its reference in the LRU list will be unconditionally updated.
This PR fixes two crashes due to the defragmentation of the Lua script, which were by redis#13108 1. During long-running Lua script execution, active defragmentation may be triggered, causing the luaScript structure to be reallocated to a new memory location, then we access `l->node`(may be reallocatedd) after script execution to update the Lua LRU list. In this PR, we don't defrag during blocked scripts, so we don't mess up the LRU update when the script ends. Note that defrag is now only permitted during loading. This PR also reverts the changes made by redis#14274. 2. Forgot to update the Lua LUR list node's value. Since `lua_scripts_lru_list` node stores a pointer to the `lua_script`'s key, we also need to update `node->value` when the key is reallocated. In this PR, after performing defragmentation on a Lua script, if the script is in the LRU list, its reference in the LRU list will be unconditionally updated.
…s#14274) Fix redis#14267 This bug was introduced by redis#13495 ### Summary When a replica clears a large database, it periodically calls processEventsWhileBlocked() in the replicationEmptyDbCallback() callback during the key deletion process. If defragmentation is enabled, this means that active defrag can be triggered while the database is being deleted. The defragmentation process may also modify the database at this time, which could lead to crashes when the database is accessed after defragmentation. Code Path: ``` replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked() ``` ### Solution This PR temporarily disables active defrag before emptying the database, then restores the active defrag setting after the empty is complete. --------- Co-authored-by: Copilot <[email protected]>
…s#14274) Fix redis#14267 This bug was introduced by redis#13495 ### Summary When a replica clears a large database, it periodically calls processEventsWhileBlocked() in the replicationEmptyDbCallback() callback during the key deletion process. If defragmentation is enabled, this means that active defrag can be triggered while the database is being deleted. The defragmentation process may also modify the database at this time, which could lead to crashes when the database is accessed after defragmentation. Code Path: ``` replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked() ``` ### Solution This PR temporarily disables active defrag before emptying the database, then restores the active defrag setting after the empty is complete. --------- Co-authored-by: Copilot <[email protected]>
…s#14274) Fix redis#14267 This bug was introduced by redis#13495 ### Summary When a replica clears a large database, it periodically calls processEventsWhileBlocked() in the replicationEmptyDbCallback() callback during the key deletion process. If defragmentation is enabled, this means that active defrag can be triggered while the database is being deleted. The defragmentation process may also modify the database at this time, which could lead to crashes when the database is accessed after defragmentation. Code Path: ``` replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked() ``` ### Solution This PR temporarily disables active defrag before emptying the database, then restores the active defrag setting after the empty is complete. --------- Co-authored-by: Copilot <[email protected]>
…s#14274) Fix redis#14267 This bug was introduced by redis#13495 ### Summary When a replica clears a large database, it periodically calls processEventsWhileBlocked() in the replicationEmptyDbCallback() callback during the key deletion process. If defragmentation is enabled, this means that active defrag can be triggered while the database is being deleted. The defragmentation process may also modify the database at this time, which could lead to crashes when the database is accessed after defragmentation. Code Path: ``` replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked() ``` ### Solution This PR temporarily disables active defrag before emptying the database, then restores the active defrag setting after the empty is complete. --------- Co-authored-by: Copilot <[email protected]>
…s#14274) Fix redis#14267 This bug was introduced by redis#13495 ### Summary When a replica clears a large database, it periodically calls processEventsWhileBlocked() in the replicationEmptyDbCallback() callback during the key deletion process. If defragmentation is enabled, this means that active defrag can be triggered while the database is being deleted. The defragmentation process may also modify the database at this time, which could lead to crashes when the database is accessed after defragmentation. Code Path: ``` replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked() ``` ### Solution This PR temporarily disables active defrag before emptying the database, then restores the active defrag setting after the empty is complete. --------- Co-authored-by: Copilot <[email protected]>
…s#14274) Fix redis#14267 This bug was introduced by redis#13495 ### Summary When a replica clears a large database, it periodically calls processEventsWhileBlocked() in the replicationEmptyDbCallback() callback during the key deletion process. If defragmentation is enabled, this means that active defrag can be triggered while the database is being deleted. The defragmentation process may also modify the database at this time, which could lead to crashes when the database is accessed after defragmentation. Code Path: ``` replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked() ``` ### Solution This PR temporarily disables active defrag before emptying the database, then restores the active defrag setting after the empty is complete. --------- Co-authored-by: Copilot <[email protected]>
…s#14274) Fix redis#14267 This bug was introduced by redis#13495 ### Summary When a replica clears a large database, it periodically calls processEventsWhileBlocked() in the replicationEmptyDbCallback() callback during the key deletion process. If defragmentation is enabled, this means that active defrag can be triggered while the database is being deleted. The defragmentation process may also modify the database at this time, which could lead to crashes when the database is accessed after defragmentation. Code Path: ``` replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked() ``` ### Solution This PR temporarily disables active defrag before emptying the database, then restores the active defrag setting after the empty is complete. --------- Co-authored-by: Copilot <[email protected]>
|
@sundb is this fix available in any of the 8.4.x releases (or later)? |
|
@kaplanben this bug was fixed since 8.0. |
|
@sundb just to clarify: are you saying that #14319 fixes the crashes related to active defrag triggered replicaof db flushes? The fix you posted looks like it addresses the cases of defrag triggered by lua scripts. We think we're seeing the same issue, but we're seeing it on replicas, not via lua scripts. |
|
@kaplanben #14319 is also applicable to this fix, so it fixes both the issues of replica and the lua script. |



Fix #14267
This bug was introduced by #13495
Summary
When a replica clears a large database, it periodically calls processEventsWhileBlocked() in the replicationEmptyDbCallback() callback during the key deletion process.
If defragmentation is enabled, this means that active defrag can be triggered while the database is being deleted.
The defragmentation process may also modify the database at this time, which could lead to crashes when the database is accessed after defragmentation.
Code Path:
Solution
This PR temporarily disables active defrag before emptying the database, then restores the active defrag setting after the empty is complete.
Crash Report