Fix bugs related to synchronous TLS connections blocking#837
Merged
madolson merged 23 commits intovalkey-io:unstablefrom Aug 12, 2024
Merged
Fix bugs related to synchronous TLS connections blocking#837madolson merged 23 commits intovalkey-io:unstablefrom
madolson merged 23 commits intovalkey-io:unstablefrom
Conversation
Signed-off-by: naglera <[email protected]>
ranshid
reviewed
Jul 29, 2024
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #837 +/- ##
============================================
- Coverage 70.38% 70.23% -0.16%
============================================
Files 112 112
Lines 61462 61470 +8
============================================
- Hits 43261 43173 -88
- Misses 18201 18297 +96
|
ranshid
reviewed
Jul 29, 2024
Co-authored-by: ranshid <[email protected]> Signed-off-by: naglera <[email protected]>
Signed-off-by: naglera <[email protected]>
Member
|
@naglera please fix the issue headline as it now include 2 fixes. please also add explanation for the fixes in the top comment |
Signed-off-by: naglera <[email protected]>
Signed-off-by: naglera <[email protected]>
Member
|
There are some new errors I haven't seen before in the extended tests: https://github.com/valkey-io/valkey/actions/runs/10146979329/job/28056324528?pr=837 |
Signed-off-by: naglera <[email protected]>
On busy hosts, rdb-key-save-delay may pause the process for more then expected due to long recurreing context switches Signed-off-by: naglera <[email protected]>
…lly succeed" This reverts commit 10c73e5. Signed-off-by: naglera <[email protected]>
This reverts commit 5f1cd17. Signed-off-by: naglera <[email protected]>
1.Replica recover rdb-connection killed 2.Replica recover main-connection killed We ca only catch bgsave while in progress or expect replica to compleat the sync but not both. Using rdb-key-save-delay is not predictable enough when machine has high load. Signed-off-by: naglera <[email protected]>
Signed-off-by: naglera <[email protected]>
Signed-off-by: naglera <[email protected]>
1. Test replica's buffer limit reached 2. dual-channel-replication fails when primary diskless disabled In both cases we should not count on rdb-key-save-delay to be precise on machines with high load. Signed-off-by: naglera <[email protected]>
Test replica unable to join dual channel replication sync after started we should not count on rdb-key-save-delay to be precise on machines with high load. Signed-off-by: naglera <[email protected]>
…rite. Dual-channel file descriptors are set to blocking mode in the main thread, followed by a blocking write. This sets the file descriptors to non-blocking if TLS is used (see `connTLSSyncWrite()`). The child process runs in blocking mode. If a write operation fails to write the entire buffer, SSL returns `SSL_ERROR_WANT_WRITE`. This error is ignored, causing the replica to fail in loading the RDB. This change sets file descriptors after the blocking write. Signed-off-by: xbasel <[email protected]>
Signed-off-by: Madelyn Olson <[email protected]>
Signed-off-by: naglera <[email protected]>
Fix "Test dual-channel-replication primary reject set-rdb-client after client killed". When replica is paused rdb child process can't recognize connection closed. Need to resume the replica in order to fail the sync. Signed-off-by: naglera <[email protected]>
When using blocking connection we can't normally close the connection at the main process context, since it may block the main proc if the replica is not responding. We are also unable to skip this since otherwise child process will continue the save. Signed-off-by: naglera <[email protected]>
- dual-channel-replication fails when primary diskless disabled - Replica recover rdb-connection killed In both tests we should make child proces sleep for shorter intervals so the save will be terminated on time Signed-off-by: naglera <[email protected]>
This reverts commit 75ff15d. Signed-off-by: naglera <[email protected]>
…nsfer error Currently lastbgsave_status is used in bgsave or disk-replication, and the target is the disk. In valkey-io#60, we update it when transfer error, i think it is mainly used in tests, so we can use log to replace it. It changes lastbgsave_status to err in this case, but it is strange that it does not set ok or err in the above if and the following else. Also noted this will affect stop-writes-on-bgsave-error. Signed-off-by: Binbin <[email protected]>
dual-channel-replication fails when primary diskless disabled - we should wait for bgproc to exit Signed-off-by: naglera <[email protected]>
madolson
reviewed
Aug 12, 2024
madolson
approved these changes
Aug 12, 2024
mapleFU
pushed a commit
to mapleFU/valkey
that referenced
this pull request
Aug 21, 2024
- Fix TLS bug where connection were shutdown by primary's main process while the child process was still writing- causing main process to be blocked. - TLS connection fix -file descriptors are set to blocking mode in the main thread, followed by a blocking write. This sets the file descriptors to non-blocking if TLS is used (see `connTLSSyncWrite()`) (@xbasel). - Improve the reliability of dual-channel tests. Modify the pause mechanism to verify process status directly, rather than relying on log. - Ensure that `server.repl_offset` and `server.replid` are updated correctly when dual channel synchronization completes successfully. Thist led to failures in replication tests that validate replication IDs or compare replication offsets. --------- Signed-off-by: naglera <[email protected]> Signed-off-by: naglera <[email protected]> Signed-off-by: xbasel <[email protected]> Signed-off-by: Madelyn Olson <[email protected]> Signed-off-by: Binbin <[email protected]> Co-authored-by: ranshid <[email protected]> Co-authored-by: xbasel <[email protected]> Co-authored-by: Madelyn Olson <[email protected]> Co-authored-by: Binbin <[email protected]> Signed-off-by: mwish <[email protected]>
mapleFU
pushed a commit
to mapleFU/valkey
that referenced
this pull request
Aug 22, 2024
- Fix TLS bug where connection were shutdown by primary's main process while the child process was still writing- causing main process to be blocked. - TLS connection fix -file descriptors are set to blocking mode in the main thread, followed by a blocking write. This sets the file descriptors to non-blocking if TLS is used (see `connTLSSyncWrite()`) (@xbasel). - Improve the reliability of dual-channel tests. Modify the pause mechanism to verify process status directly, rather than relying on log. - Ensure that `server.repl_offset` and `server.replid` are updated correctly when dual channel synchronization completes successfully. Thist led to failures in replication tests that validate replication IDs or compare replication offsets. --------- Signed-off-by: naglera <[email protected]> Signed-off-by: naglera <[email protected]> Signed-off-by: xbasel <[email protected]> Signed-off-by: Madelyn Olson <[email protected]> Signed-off-by: Binbin <[email protected]> Co-authored-by: ranshid <[email protected]> Co-authored-by: xbasel <[email protected]> Co-authored-by: Madelyn Olson <[email protected]> Co-authored-by: Binbin <[email protected]> Signed-off-by: mwish <[email protected]>
This was referenced Aug 27, 2024
madolson
added a commit
that referenced
this pull request
Sep 2, 2024
- Fix TLS bug where connection were shutdown by primary's main process while the child process was still writing- causing main process to be blocked. - TLS connection fix -file descriptors are set to blocking mode in the main thread, followed by a blocking write. This sets the file descriptors to non-blocking if TLS is used (see `connTLSSyncWrite()`) (@xbasel). - Improve the reliability of dual-channel tests. Modify the pause mechanism to verify process status directly, rather than relying on log. - Ensure that `server.repl_offset` and `server.replid` are updated correctly when dual channel synchronization completes successfully. Thist led to failures in replication tests that validate replication IDs or compare replication offsets. --------- Signed-off-by: naglera <[email protected]> Signed-off-by: naglera <[email protected]> Signed-off-by: xbasel <[email protected]> Signed-off-by: Madelyn Olson <[email protected]> Signed-off-by: Binbin <[email protected]> Co-authored-by: ranshid <[email protected]> Co-authored-by: xbasel <[email protected]> Co-authored-by: Madelyn Olson <[email protected]> Co-authored-by: Binbin <[email protected]>
madolson
added a commit
that referenced
this pull request
Sep 3, 2024
- Fix TLS bug where connection were shutdown by primary's main process while the child process was still writing- causing main process to be blocked. - TLS connection fix -file descriptors are set to blocking mode in the main thread, followed by a blocking write. This sets the file descriptors to non-blocking if TLS is used (see `connTLSSyncWrite()`) (@xbasel). - Improve the reliability of dual-channel tests. Modify the pause mechanism to verify process status directly, rather than relying on log. - Ensure that `server.repl_offset` and `server.replid` are updated correctly when dual channel synchronization completes successfully. Thist led to failures in replication tests that validate replication IDs or compare replication offsets. --------- Signed-off-by: naglera <[email protected]> Signed-off-by: naglera <[email protected]> Signed-off-by: xbasel <[email protected]> Signed-off-by: Madelyn Olson <[email protected]> Signed-off-by: Binbin <[email protected]> Co-authored-by: ranshid <[email protected]> Co-authored-by: xbasel <[email protected]> Co-authored-by: Madelyn Olson <[email protected]> Co-authored-by: Binbin <[email protected]>
ranshid
pushed a commit
that referenced
this pull request
Nov 21, 2024
This change prevents unintended side effects on connection state and improves consistency with non-TLS sync operations. For example, when invoking `connTLSSyncRead` with a blocking file descriptor, the mode is switched to non-blocking upon `connTLSSyncRead` exit. If the code assumes the file descriptor remains blocking and calls the normal `read` expecting it to block, it may result in a short read. This caused a crash in dual-channel, which was fixed in this PR by relocating `connBlock()`: #837 Signed-off-by: xbasel <[email protected]>
tezc
added a commit
to redis/redis
that referenced
this pull request
Jan 13, 2025
This PR is based on: #12109 valkey-io/valkey#60 Closes: #11678 **Motivation** During a full sync, when master is delivering RDB to the replica, incoming write commands are kept in a replication buffer in order to be sent to the replica once RDB delivery is completed. If RDB delivery takes a long time, it might create memory pressure on master. Also, once a replica connection accumulates replication data which is larger than output buffer limits, master will kill replica connection. This may cause a replication failure. The main benefit of the rdb channel replication is streaming incoming commands in parallel to the RDB delivery. This approach shifts replication stream buffering to the replica and reduces load on master. We do this by opening another connection for RDB delivery. The main channel on replica will be receiving replication stream while rdb channel is receiving the RDB. This feature also helps to reduce master's main process CPU load. By opening a dedicated connection for the RDB transfer, the bgsave process has access to the new connection and it will stream RDB directly to the replicas. Before this change, due to TLS connection restriction, the bgsave process was writing RDB bytes to a pipe and the main process was forwarding it to the replica. This is no longer necessary, the main process can avoid these expensive socket read/write syscalls. It also means RDB delivery to replica will be faster as it avoids this step. In summary, replication will be faster and master's performance during full syncs will improve. **Implementation steps** 1. When replica connects to the master, it sends 'rdb-channel-repl' as part of capability exchange to let master to know replica supports rdb channel. 2. When replica lacks sufficient data for PSYNC, master sends +RDBCHANNELSYNC reply with replica's client id. As the next step, the replica opens a new connection (rdb-channel) and configures it against the master with the appropriate capabilities and requirements. It also sends given client id back to master over rdbchannel, so that master can associate these channels. (initial replica connection will be referred as main-channel) Then, replica requests fullsync using the RDB channel. 3. Prior to forking, master attaches the replica's main channel to the replication backlog to deliver replication stream starting at the snapshot end offset. 4. The master main process sends replication stream via the main channel, while the bgsave process sends the RDB directly to the replica via the rdb-channel. Replica accumulates replication stream in a local buffer, while the RDB is being loaded into the memory. 5. Once the replica completes loading the rdb, it drops the rdb channel and streams the accumulated replication stream into the db. Sync is completed. **Some details** - Currently, rdbchannel replication is supported only if `repl-diskless-sync` is enabled on master. Otherwise, replication will happen over a single connection as in before. - On replica, there is a limit to replication stream buffering. Replica uses a new config `replica-full-sync-buffer-limit` to limit number of bytes to accumulate. If it is not set, replica inherits `client-output-buffer-limit <replica>` hard limit config. If we reach this limit, replica stops accumulating. This is not a failure scenario though. Further accumulation will happen on master side. Depending on the configured limits on master, master may kill the replica connection. **API changes in INFO output:** 1. New replica state: `send_bulk_and_stream`. Indicates full sync is still in progress for this replica. It is receiving replication stream and rdb in parallel. ``` slave0:ip=127.0.0.1,port=5002,state=send_bulk_and_stream,offset=0,lag=0 ``` Replica state changes in steps: - First, replica sends psync and receives +RDBCHANNELSYNC :`state=wait_bgsave` - After replica connects with rdbchannel and delivery starts: `state=send_bulk_and_stream` - After full sync: `state=online` 2. On replica side, replication stream buffering metrics: - replica_full_sync_buffer_size: Currently accumulated replication stream data in bytes. - replica_full_sync_buffer_peak: Peak number of bytes that this instance accumulated in the lifetime of the process. ``` replica_full_sync_buffer_size:20485 replica_full_sync_buffer_peak:1048560 ``` **API changes in CLIENT LIST** In `client list` output, rdbchannel clients will have 'C' flag in addition to 'S' replica flag: ``` id=11 addr=127.0.0.1:39108 laddr=127.0.0.1:5001 fd=14 name= age=5 idle=5 flags=SC db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1920 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=0 ``` **Config changes:** - `replica-full-sync-buffer-limit`: Controls how much replication data replica can accumulate during rdbchannel replication. If it is not set, a value of 0 means replica will inherit `client-output-buffer-limit <replica>` hard limit config to limit accumulated data. - `repl-rdb-channel` config is added as a hidden config. This is mostly for testing as we need to support both rdbchannel replication and the older single connection replication (to keep compatibility with older versions and rdbchannel replication will not be enabled if repl-diskless-sync is not enabled). it affects both the master (not to respond to rdb channel requests), and the replica (not to declare capability) **Internal API changes:** Changes that were introduced to Redis replication: - New replication capability is added to replconf command: `capa rdb-channel-repl`. Indicates replica is capable of rdb channel replication. Replica sends it when it connects to master along with other capabilities. - If replica needs fullsync, master replies `+RDBCHANNELSYNC <client-id>` to the replica's PSYNC request. - When replica opens rdbchannel connection, as part of replconf command, it sends `rdb-channel 1` to let master know this is rdb channel. Also, it sends `main-ch-client-id <client-id>` as part of replconf command so master can associate channels. **Testing:** As rdbchannel replication is enabled by default, we run whole test suite with it. Though, as we need to support both rdbchannel and single connection replication, we'll be running some tests twice with `repl-rdb-channel yes/no` config. **Replica state diagram** ``` * * Replica state machine * * * Main channel state * ┌───────────────────┐ * │RECEIVE_PING_REPLY │ * └────────┬──────────┘ * │ +PONG * ┌────────▼──────────┐ * │SEND_HANDSHAKE │ RDB channel state * └────────┬──────────┘ ┌───────────────────────────────┐ * │+OK ┌───► RDB_CH_SEND_HANDSHAKE │ * ┌────────▼──────────┐ │ └──────────────┬────────────────┘ * │RECEIVE_AUTH_REPLY │ │ REPLCONF main-ch-client-id <clientid> * └────────┬──────────┘ │ ┌──────────────▼────────────────┐ * │+OK │ │ RDB_CH_RECEIVE_AUTH_REPLY │ * ┌────────▼──────────┐ │ └──────────────┬────────────────┘ * │RECEIVE_PORT_REPLY │ │ │ +OK * └────────┬──────────┘ │ ┌──────────────▼────────────────┐ * │+OK │ │ RDB_CH_RECEIVE_REPLCONF_REPLY│ * ┌────────▼──────────┐ │ └──────────────┬────────────────┘ * │RECEIVE_IP_REPLY │ │ │ +OK * └────────┬──────────┘ │ ┌──────────────▼────────────────┐ * │+OK │ │ RDB_CH_RECEIVE_FULLRESYNC │ * ┌────────▼──────────┐ │ └──────────────┬────────────────┘ * │RECEIVE_CAPA_REPLY │ │ │+FULLRESYNC * └────────┬──────────┘ │ │Rdb delivery * │ │ ┌──────────────▼────────────────┐ * ┌────────▼──────────┐ │ │ RDB_CH_RDB_LOADING │ * │SEND_PSYNC │ │ └──────────────┬────────────────┘ * └─┬─────────────────┘ │ │ Done loading * │PSYNC (use cached-master) │ │ * ┌─▼─────────────────┐ │ │ * │RECEIVE_PSYNC_REPLY│ │ ┌────────────►│ Replica streams replication * └─┬─────────────────┘ │ │ │ buffer into memory * │ │ │ │ * │+RDBCHANNELSYNC client-id │ │ │ * ├──────┬───────────────────┘ │ │ * │ │ Main channel │ │ * │ │ accumulates repl data │ │ * │ ┌──▼────────────────┐ │ ┌───────▼───────────┐ * │ │ REPL_TRANSFER ├───────┘ │ CONNECTED │ * │ └───────────────────┘ └────▲───▲──────────┘ * │ │ │ * │ │ │ * │ +FULLRESYNC ┌───────────────────┐ │ │ * ├────────────────► REPL_TRANSFER ├────┘ │ * │ └───────────────────┘ │ * │ +CONTINUE │ * └──────────────────────────────────────────────┘ */ ``` ----- This PR also contains changes and ideas from: valkey-io/valkey#837 valkey-io/valkey#1173 valkey-io/valkey#804 valkey-io/valkey#945 valkey-io/valkey#989 --------- Co-authored-by: Yuan Wang <[email protected]> Co-authored-by: debing.sun <[email protected]> Co-authored-by: Moti Cohen <[email protected]> Co-authored-by: naglera <[email protected]> Co-authored-by: Amit Nagler <[email protected]> Co-authored-by: Madelyn Olson <[email protected]> Co-authored-by: Binbin <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]> Co-authored-by: Ping Xie <[email protected]> Co-authored-by: Ran Shidlansik <[email protected]> Co-authored-by: ranshid <[email protected]> Co-authored-by: xbasel <[email protected]>
funny-dog
pushed a commit
to funny-dog/redis
that referenced
this pull request
Sep 17, 2025
This PR is based on: redis#12109 valkey-io/valkey#60 Closes: redis#11678 **Motivation** During a full sync, when master is delivering RDB to the replica, incoming write commands are kept in a replication buffer in order to be sent to the replica once RDB delivery is completed. If RDB delivery takes a long time, it might create memory pressure on master. Also, once a replica connection accumulates replication data which is larger than output buffer limits, master will kill replica connection. This may cause a replication failure. The main benefit of the rdb channel replication is streaming incoming commands in parallel to the RDB delivery. This approach shifts replication stream buffering to the replica and reduces load on master. We do this by opening another connection for RDB delivery. The main channel on replica will be receiving replication stream while rdb channel is receiving the RDB. This feature also helps to reduce master's main process CPU load. By opening a dedicated connection for the RDB transfer, the bgsave process has access to the new connection and it will stream RDB directly to the replicas. Before this change, due to TLS connection restriction, the bgsave process was writing RDB bytes to a pipe and the main process was forwarding it to the replica. This is no longer necessary, the main process can avoid these expensive socket read/write syscalls. It also means RDB delivery to replica will be faster as it avoids this step. In summary, replication will be faster and master's performance during full syncs will improve. **Implementation steps** 1. When replica connects to the master, it sends 'rdb-channel-repl' as part of capability exchange to let master to know replica supports rdb channel. 2. When replica lacks sufficient data for PSYNC, master sends +RDBCHANNELSYNC reply with replica's client id. As the next step, the replica opens a new connection (rdb-channel) and configures it against the master with the appropriate capabilities and requirements. It also sends given client id back to master over rdbchannel, so that master can associate these channels. (initial replica connection will be referred as main-channel) Then, replica requests fullsync using the RDB channel. 3. Prior to forking, master attaches the replica's main channel to the replication backlog to deliver replication stream starting at the snapshot end offset. 4. The master main process sends replication stream via the main channel, while the bgsave process sends the RDB directly to the replica via the rdb-channel. Replica accumulates replication stream in a local buffer, while the RDB is being loaded into the memory. 5. Once the replica completes loading the rdb, it drops the rdb channel and streams the accumulated replication stream into the db. Sync is completed. **Some details** - Currently, rdbchannel replication is supported only if `repl-diskless-sync` is enabled on master. Otherwise, replication will happen over a single connection as in before. - On replica, there is a limit to replication stream buffering. Replica uses a new config `replica-full-sync-buffer-limit` to limit number of bytes to accumulate. If it is not set, replica inherits `client-output-buffer-limit <replica>` hard limit config. If we reach this limit, replica stops accumulating. This is not a failure scenario though. Further accumulation will happen on master side. Depending on the configured limits on master, master may kill the replica connection. **API changes in INFO output:** 1. New replica state: `send_bulk_and_stream`. Indicates full sync is still in progress for this replica. It is receiving replication stream and rdb in parallel. ``` slave0:ip=127.0.0.1,port=5002,state=send_bulk_and_stream,offset=0,lag=0 ``` Replica state changes in steps: - First, replica sends psync and receives +RDBCHANNELSYNC :`state=wait_bgsave` - After replica connects with rdbchannel and delivery starts: `state=send_bulk_and_stream` - After full sync: `state=online` 2. On replica side, replication stream buffering metrics: - replica_full_sync_buffer_size: Currently accumulated replication stream data in bytes. - replica_full_sync_buffer_peak: Peak number of bytes that this instance accumulated in the lifetime of the process. ``` replica_full_sync_buffer_size:20485 replica_full_sync_buffer_peak:1048560 ``` **API changes in CLIENT LIST** In `client list` output, rdbchannel clients will have 'C' flag in addition to 'S' replica flag: ``` id=11 addr=127.0.0.1:39108 laddr=127.0.0.1:5001 fd=14 name= age=5 idle=5 flags=SC db=0 sub=0 psub=0 ssub=0 multi=-1 watch=0 qbuf=0 qbuf-free=0 argv-mem=0 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=1920 events=r cmd=psync user=default redir=-1 resp=2 lib-name= lib-ver= io-thread=0 ``` **Config changes:** - `replica-full-sync-buffer-limit`: Controls how much replication data replica can accumulate during rdbchannel replication. If it is not set, a value of 0 means replica will inherit `client-output-buffer-limit <replica>` hard limit config to limit accumulated data. - `repl-rdb-channel` config is added as a hidden config. This is mostly for testing as we need to support both rdbchannel replication and the older single connection replication (to keep compatibility with older versions and rdbchannel replication will not be enabled if repl-diskless-sync is not enabled). it affects both the master (not to respond to rdb channel requests), and the replica (not to declare capability) **Internal API changes:** Changes that were introduced to Redis replication: - New replication capability is added to replconf command: `capa rdb-channel-repl`. Indicates replica is capable of rdb channel replication. Replica sends it when it connects to master along with other capabilities. - If replica needs fullsync, master replies `+RDBCHANNELSYNC <client-id>` to the replica's PSYNC request. - When replica opens rdbchannel connection, as part of replconf command, it sends `rdb-channel 1` to let master know this is rdb channel. Also, it sends `main-ch-client-id <client-id>` as part of replconf command so master can associate channels. **Testing:** As rdbchannel replication is enabled by default, we run whole test suite with it. Though, as we need to support both rdbchannel and single connection replication, we'll be running some tests twice with `repl-rdb-channel yes/no` config. **Replica state diagram** ``` * * Replica state machine * * * Main channel state * ┌───────────────────┐ * │RECEIVE_PING_REPLY │ * └────────┬──────────┘ * │ +PONG * ┌────────▼──────────┐ * │SEND_HANDSHAKE │ RDB channel state * └────────┬──────────┘ ┌───────────────────────────────┐ * │+OK ┌───► RDB_CH_SEND_HANDSHAKE │ * ┌────────▼──────────┐ │ └──────────────┬────────────────┘ * │RECEIVE_AUTH_REPLY │ │ REPLCONF main-ch-client-id <clientid> * └────────┬──────────┘ │ ┌──────────────▼────────────────┐ * │+OK │ │ RDB_CH_RECEIVE_AUTH_REPLY │ * ┌────────▼──────────┐ │ └──────────────┬────────────────┘ * │RECEIVE_PORT_REPLY │ │ │ +OK * └────────┬──────────┘ │ ┌──────────────▼────────────────┐ * │+OK │ │ RDB_CH_RECEIVE_REPLCONF_REPLY│ * ┌────────▼──────────┐ │ └──────────────┬────────────────┘ * │RECEIVE_IP_REPLY │ │ │ +OK * └────────┬──────────┘ │ ┌──────────────▼────────────────┐ * │+OK │ │ RDB_CH_RECEIVE_FULLRESYNC │ * ┌────────▼──────────┐ │ └──────────────┬────────────────┘ * │RECEIVE_CAPA_REPLY │ │ │+FULLRESYNC * └────────┬──────────┘ │ │Rdb delivery * │ │ ┌──────────────▼────────────────┐ * ┌────────▼──────────┐ │ │ RDB_CH_RDB_LOADING │ * │SEND_PSYNC │ │ └──────────────┬────────────────┘ * └─┬─────────────────┘ │ │ Done loading * │PSYNC (use cached-master) │ │ * ┌─▼─────────────────┐ │ │ * │RECEIVE_PSYNC_REPLY│ │ ┌────────────►│ Replica streams replication * └─┬─────────────────┘ │ │ │ buffer into memory * │ │ │ │ * │+RDBCHANNELSYNC client-id │ │ │ * ├──────┬───────────────────┘ │ │ * │ │ Main channel │ │ * │ │ accumulates repl data │ │ * │ ┌──▼────────────────┐ │ ┌───────▼───────────┐ * │ │ REPL_TRANSFER ├───────┘ │ CONNECTED │ * │ └───────────────────┘ └────▲───▲──────────┘ * │ │ │ * │ │ │ * │ +FULLRESYNC ┌───────────────────┐ │ │ * ├────────────────► REPL_TRANSFER ├────┘ │ * │ └───────────────────┘ │ * │ +CONTINUE │ * └──────────────────────────────────────────────┘ */ ``` ----- This PR also contains changes and ideas from: valkey-io/valkey#837 valkey-io/valkey#1173 valkey-io/valkey#804 valkey-io/valkey#945 valkey-io/valkey#989 --------- Co-authored-by: Yuan Wang <[email protected]> Co-authored-by: debing.sun <[email protected]> Co-authored-by: Moti Cohen <[email protected]> Co-authored-by: naglera <[email protected]> Co-authored-by: Amit Nagler <[email protected]> Co-authored-by: Madelyn Olson <[email protected]> Co-authored-by: Binbin <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]> Co-authored-by: Ping Xie <[email protected]> Co-authored-by: Ran Shidlansik <[email protected]> Co-authored-by: ranshid <[email protected]> Co-authored-by: xbasel <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
connTLSSyncWrite()) (@xbasel).server.repl_offsetandserver.replidare updated correctly when dual channel synchronization completes successfully. Thist led to failures in replication tests that validate replication IDs or compare replication offsets.