Skip to content

[WIP] replication: handling PINGs as out-of-band data#8440

Open
soloestoy wants to merge 3 commits intoredis:unstablefrom
soloestoy:ping-out-of-repl-band
Open

[WIP] replication: handling PINGs as out-of-band data#8440
soloestoy wants to merge 3 commits intoredis:unstablefrom
soloestoy:ping-out-of-repl-band

Conversation

@soloestoy
Copy link
Contributor

@soloestoy soloestoy commented Feb 2, 2021

This feature tries to handle PINGs as out-of-band data in replication stream. The main idea and implementation is simple:

  1. master just propagate PINGs to replica client's reply buffer, but do not feed replication backlog, then the replication offset would not be affected.

  2. when replica receive PINGs, just execute it to keep alive, but do not increase the master client's reploff, then the replica can have the same offset with master. And do not proxy the PINGs to it's sub-replicas, just remove PINGs from replication stream.

  3. now replica do not proxy PINGs, so it should send PINGs to sub-replicas by itself.

  4. master accept replica only when it support this feature, or the replication offset will be wrong, so replica need send replconf capa ping-out-of-band to master.

@soloestoy soloestoy requested a review from oranagra February 2, 2021 12:44
In master, do not feed PINGs to replication backlog,
then it would not affect replication offset, but still
send PINGs to replicas to keep alive.

In replica, do not increase offset when receive PINGs,
then replica can have same offset with master.
@soloestoy soloestoy force-pushed the ping-out-of-repl-band branch from 4a2d6e7 to f430a20 Compare February 2, 2021 12:50
@oranagra
Copy link
Member

oranagra commented Feb 2, 2021

@soloestoy i don't think this is the right approach.
it reminds me the meaningful-offset feature of early 6.0 releases too much.

i don't like the fact that the master decides not to just write use sporadic writes directly to the slaves bypassing the replication buffers, and that the replicas just delete certain commands from their replication offset, and that these two behaviors are not necessarily in sync (i.e. the master can mistakenly write a ping into the replication buffers and the replica will decide to exclude it on it's side causing repl-offset mismatch).

i think the right approach which was discussed a few times in the past (don't remember where) is to use multiplexing of several distinct byte streams:

imagine that every payload that is transferred on the socket between the master and replicas carries a type field and a length.
so when the master writes data to the replica it can flag each payload as one of:

  1. command stream - what we have today (which is incrementing the replication offset, and is written into the replication backlog)
  2. chit chat - e.g. pings replconf and other configuration exchange
  3. full sync - this is the rdb content that's generated by bgsave

one of the main advantages of this approach is that it let's us multiplex the replica output buffers to the replica during the rdb transfer, so that we don't have to buffer them in the master (who is also suffering from CoW).

the key difference between this design and what this PR currently has is that this is a major change to the replication protocol (maybe harder to design, and implement in a backwards compatible way), but it's a structured protocol.
the current code relies on the coincidence that pings will be treated the same on both sides (in each by a different trigger).

@ShooterIT FYI (i remember recently discussing that with you)

@soloestoy
Copy link
Contributor Author

soloestoy commented Feb 2, 2021

@oranagra I agree multiplexing replication packet is better, the key point is same, distinguish data stream and control stream. This PR is just like a demo, and you see I marked it as WIP, the PING command flag out-of-replication is just like type filed. As you said we need design a new protocol using in replication indeed.

@madolson
Copy link
Contributor

madolson commented Feb 2, 2021

I agree that I don't think this is a building block to a long term solution. This really only solves the singular problem of having meaningless data in the replication stream, like pings, but doesn't allow us to solve some other interesting problems. We could have the conversation here, but I think an issue to think through the design would be more useful.

@enjoy-binbin
Copy link
Contributor

may be related, ping goes into the replication stream, one test fails

test-freebsd

*** [err]: FLUSHDB / FLUSHALL should replicate in tests/integration/replication.tcl
2022-11-10T00:30:17.0558340Z Expected 'ping' to match 'flushall' (context: type source line 861 file /Users/runner/work/redis/redis/tests/test_helper.tcl cmd {assert_match [lindex $patterns $j] [read_from_replication_stream $s]} proc ::assert_replication_stream level 1) 

@oranagra
Copy link
Member

@enjoy-binbin for now, we just need to change the tests to avoid this issue in affected tests. see some tests that change repl-ping-replica-period

@enjoy-binbin
Copy link
Contributor

enjoy-binbin commented Nov 13, 2022

odd, i did see in attach_to_replication_stream, it will set repl-ping-replica-period to 3600

the reason and the fix is in #11609

@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

5 participants