Skip to content

Backport: Optimize RDB load performance and fix cluster mode resizing on replica side (#1199)#1328

Merged
zuiderkwast merged 1 commit intovalkey-io:8.0from
naglera:8.0
Apr 15, 2025
Merged

Backport: Optimize RDB load performance and fix cluster mode resizing on replica side (#1199)#1328
zuiderkwast merged 1 commit intovalkey-io:8.0from
naglera:8.0

Conversation

@naglera
Copy link
Contributor

@naglera naglera commented Nov 20, 2024

This PR addresses two issues:

  1. Performance Degradation Fix - Resolves a significant performance issue during RDB load on replica nodes.
  • The problem was causing replicas to rehash multiple times during the load process. Local testing demonstrated up to 50% degradation in BGSAVE time.
  • The problem occurs when the replica tries to expand pre-created slot dictionaries. This operation fails quietly, resulting in undetected performance issues.
  • This fix aims to optimize the RDB load process and restore expected performance levels.
  1. Bug fix when reading RDB_OPCODE_RESIZEDB in Valkey 8.0 cluster mode-
  • Use the shard's master slots count when processing this opcode, as clusterNodeCoversSlot is not initialized for the currently syncing replica.
  • Previously, this problem went unnoticed because RDB_OPCODE_RESIZEDB had no practical impact (due to 1).

These improvements will enhance overall system performance and ensure smoother upgrades to Valkey 8.0 in the future.

Testing:

  • Conducted local tests to verify the performance improvement during RDB load.
  • Verified that ignoring RDB_OPCODE_RESIZEDB does not negatively impact functionality in the current version.

(cherry picked from commit c5012cc)

…a side (valkey-io#1199)

This PR addresses two issues:

1. Performance Degradation Fix - Resolves a significant performance
issue during RDB load on replica nodes.
- The problem was causing replicas to rehash multiple times during the
load process. Local testing demonstrated up to 50% degradation in BGSAVE
time.
- The problem occurs when the replica tries to expand pre-created slot
dictionaries. This operation fails quietly, resulting in undetected
performance issues.
- This fix aims to optimize the RDB load process and restore expected
performance levels.

2. Bug fix when reading `RDB_OPCODE_RESIZEDB` in Valkey 8.0 cluster
mode-
- Use the shard's master slots count when processing this opcode, as
`clusterNodeCoversSlot` is not initialized for the currently syncing
replica.
- Previously, this problem went unnoticed because `RDB_OPCODE_RESIZEDB`
had no practical impact (due to 1).

These improvements will enhance overall system performance and ensure
smoother upgrades to Valkey 8.0 in the future.

Testing:
- Conducted local tests to verify the performance improvement during RDB
load.
- Verified that ignoring `RDB_OPCODE_RESIZEDB` does not negatively
impact functionality in the current version.

Signed-off-by: naglera <[email protected]>
Co-authored-by: Binbin <[email protected]>
(cherry picked from commit c5012cc)
Signed-off-by: naglera <[email protected]>
@codecov
Copy link

codecov bot commented Nov 20, 2024

Codecov Report

Attention: Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 70.62%. Comparing base (4fbab57) to head (bb40d19).
Report is 35 commits behind head on 8.0.

Files with missing lines Patch % Lines
src/db.c 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##              8.0    #1328      +/-   ##
==========================================
- Coverage   70.66%   70.62%   -0.04%     
==========================================
  Files         114      114              
  Lines       61681    62951    +1270     
==========================================
+ Hits        43585    44458     +873     
- Misses      18096    18493     +397     
Files with missing lines Coverage Δ
src/kvstore.c 96.27% <100.00%> (+0.13%) ⬆️
src/db.c 88.50% <0.00%> (+0.08%) ⬆️

... and 90 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xbasel xbasel self-requested a review November 20, 2024 18:05
@zuiderkwast zuiderkwast changed the title Optimize RDB load performance and fix cluster mode resizing on replica side (#1199) Backport: Optimize RDB load performance and fix cluster mode resizing on replica side (#1199) Nov 20, 2024
@zuiderkwast zuiderkwast merged commit 9aeb848 into valkey-io:8.0 Apr 15, 2025
@github-project-automation github-project-automation bot moved this to To be backported in Valkey 8.0 Apr 15, 2025
@zuiderkwast zuiderkwast moved this from To be backported to Done in Valkey 8.0 Apr 15, 2025
@zuiderkwast zuiderkwast added the release-notes This issue should get a line item in the release notes label Apr 15, 2025
@murphyjacob4 murphyjacob4 moved this from Done to 8.0.3 in Valkey 8.0 Apr 22, 2025
enjoy-binbin added a commit to enjoy-binbin/valkey that referenced this pull request Aug 11, 2025
…e the hashtable

If we want to expand kvstoreHashtableExpand, we need to make sure the
hashtable exists. Currently, when processing RDB slot-info, our expand
has no effect because the hashtable does not exist (we initialize it only
when we need it).
```
if (server.cluster_enabled) {
    /* In cluster mode we resize individual slot specific dictionaries based on the number of keys that
     * slot holds. */
    kvstoreHashtableExpand(db->keys, slot_id, slot_size);
    kvstoreHashtableExpand(db->expires, slot_id, expires_slot_size);
    should_expand_db = 0;
}
```

We also update kvstoreExpand to use the kvstoreHashtableExpand to make
sure there is only one code path. Also see valkey-io#1328 for more details.

Signed-off-by: Binbin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes This issue should get a line item in the release notes

Projects

Status: 8.0.3

Development

Successfully merging this pull request may close these issues.

3 participants