Update 6.0 devel #2

Merged
kasiawasiuta merged 139 commits into memKeyDB:6.0-devel from
kasiawasiuta:update-6.0-devel
Mar 11, 2020

Conversation

@kasiawasiuta
Member

Rebase 6.0-devel to current 6.0 branch

@kasiawasiuta kasiawasiuta requested a review from jschmieg March 10, 2020 13:51
yz1509 and others added 28 commits March 11, 2020 09:42
Instead of 512, use the defined max from networking.c
The function adjustOpenFilesLimit() has an implicit parameter, server.maxclients.
It aims to adjust the maximum file descriptor number according to server.maxclients
on a best-effort basis, meaning the resulting "bestlimit" may be lower than "maxfiles" but greater than "oldlimit".
When we try to increase "maxclients" using the CONFIG SET command, we could raise the maximum
file descriptor number to a bigger value without calling aeResizeSetSize at the same time.
As more and more clients later connect to the server, the allocated fds keep growing
and eventually exceed the events size of aeEventLoop.events. When new nodes join the cluster,
a new link is created together with a new fd, but the return value of aeCreateFileEvent was not
checked. In that case, we end up with a non-null "link" whose associated fd is not
registered.

So when we dynamically set "maxclients" we could reach an inconsistency between the maximum file
descriptor number of the process and server.maxclients, and this could later cause a cluster link and link
fd inconsistency.

While setting "maxclients" dynamically, we now consider it failed when the resulting "maxclients" is not
the same as requested. When that happens we try to restore the previous maximum file descriptor number,
so that server.maxclients can keep acting as a guard as before.
This commit solves the following bug:
127.0.0.1:6379> XGROUP CREATE x grp $ MKSTREAM
OK
127.0.0.1:6379> XADD x 666 f v
"666-0"
127.0.0.1:6379> XREADGROUP GROUP grp Alice BLOCK 0 STREAMS x >
1) 1) "x"
   2) 1) 1) "666-0"
         2) 1) "f"
            2) "v"
127.0.0.1:6379> XADD x 667 f v
"667-0"
127.0.0.1:6379> XDEL x 667
(integer) 1
127.0.0.1:6379> XREADGROUP GROUP grp Alice BLOCK 0 STREAMS x >
1) 1) "x"
   2) (empty array)

The root cause is that we use s->last_id in streamCompareID,
while we should use the last *valid* ID instead.
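
A standalone toy sketch of that root cause (the IDs mirror the transcript above; the types are simplified stand-ins for Redis's stream structures, not the actual t_stream.c code):

#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t ms, seq; } streamID;

/* Same ordering semantics as Redis's streamCompareID(). */
static int streamCompareID(const streamID *a, const streamID *b) {
    if (a->ms != b->ms) return a->ms < b->ms ? -1 : 1;
    if (a->seq != b->seq) return a->seq < b->seq ? -1 : 1;
    return 0;
}

int main(void) {
    streamID last_id       = {667, 0};  /* advanced by XADD x 667 f v       */
    streamID last_valid_id = {666, 0};  /* 667-0 was XDEL-ed afterwards     */
    streamID group_cursor  = {666, 0};  /* the group already consumed 666-0 */

    /* Buggy check: thinks there is something new, so the blocked XREADGROUP
     * is served immediately with an empty array. */
    printf("buggy: serve=%d\n", streamCompareID(&group_cursor, &last_id) < 0);
    /* Fixed check: nothing valid is left, so the client keeps blocking. */
    printf("fixed: serve=%d\n", streamCompareID(&group_cursor, &last_valid_id) < 0);
    return 0;
}
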
We exit later, so no actual bug is fixed, but it is more correct.

See redis#6054, thanks to @ShooterIT for finding the issue.
The directive tls-prefer-server-cipher is actually tls-prefer-server-ciphers in config.c, which results in the failed directive call shown below. This pull request adds the "s" in "ciphers" so that the directive is properly recognized by config.c.

ubuntu@ip-172-31-16-31:~/redis$ src/redis-server ./redis.conf 

*** FATAL CONFIG FILE ERROR ***
Reading the configuration file, at line 200
>>> 'tls-prefer-server-cipher yes'
Bad directive or wrong number of arguments
So the error message `ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context` will become
`ERR 'get' command submitted, but only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context`
If a blocked module client times out (or disconnects, is unblocked
by the CLIENT command, etc.) we need to call moduleUnblockClient
in order to free the memory allocated by the module sub-system
and the blocked-client private data (see the sketch below).

Other changes:
Made the blockedonkeys.tcl tests a bit more aggressive in order
to smoke out potential memory leaks.
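
A standalone, simplified sketch of that cleanup rule (types and names are stand-ins, not the Redis module API): whichever path unblocks the client must also run the module's free-privdata hook, otherwise the private data leaks.

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    void *module_privdata;                 /* allocated by the module */
    void (*free_privdata)(void *privdata); /* module-provided destructor */
} BlockedClient;

/* Common unblock path: always give the module a chance to free its data. */
static void unblockClient(BlockedClient *bc) {
    if (bc->free_privdata && bc->module_privdata) {
        bc->free_privdata(bc->module_privdata);
        bc->module_privdata = NULL;
    }
}

/* Timeout, disconnect and CLIENT UNBLOCK all funnel through the same path. */
static void handleTimeout(BlockedClient *bc) {
    unblockClient(bc);
}

int main(void) {
    BlockedClient bc = { malloc(64), free };
    handleTimeout(&bc);
    printf("privdata freed: %s\n", bc.module_privdata == NULL ? "yes" : "no");
    return 0;
}
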
antirez and others added 23 commits March 11, 2020 09:47
This makes it simpler to help people who post this kind of error
on the mailing list or other help forums, because sometimes the
directive looks well spelled, but the version of Redis they are using
does not support it.
The callback approach we took is very efficient: the module can filter
keys without building any list or cloning strings, and it can
also read data from the key's value. But if the user tries to re-open
the key, or any other key, this can cause dict rehashing (dictFind does
that), which is very bad to do from inside dictScan.

This commit protects the dict from doing any rehashing during the scan, and
also warns the user not to attempt any writes or command calls from
within the callback, for fear of unexpected side effects and crashes.
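
A standalone, simplified sketch of that protection (stand-in types, not the actual dict.c code): while the scan callback runs, a pause counter keeps any incremental rehash step from firing, so a lookup issued from inside the callback cannot move buckets under the scan.

#include <stdio.h>

typedef struct {
    int pauserehash;   /* >0 means rehashing is temporarily forbidden */
    int rehashidx;     /* -1 when no rehash is in progress */
} dict;

static void dictRehashStep(dict *d) {
    if (d->pauserehash > 0 || d->rehashidx == -1) return; /* protected */
    d->rehashidx++;    /* ...would move one bucket to the new table... */
}

/* A lookup that would normally advance rehashing a little. */
static void dictFind(dict *d) { dictRehashStep(d); }

static void dictScanWithCallback(dict *d, void (*cb)(dict *)) {
    d->pauserehash++;  /* protect the whole scan */
    cb(d);             /* the callback may now call dictFind() safely */
    d->pauserehash--;
}

static void callback(dict *d) { dictFind(d); }

int main(void) {
    dict d = { 0, 0 };
    dictScanWithCallback(&d, callback);
    printf("rehash steps taken during scan: %d\n", d.rehashidx); /* stays 0 */
    return 0;
}
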
Now that we may use it more often (ACL), these excessive calls to malloc
and free can become an overhead.
The idea is that very few commands have a lot of keys, and when that
happens the allocation time becomes negligible.
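
A standalone sketch of the pattern (keysResult and KEYS_BUFFER_SIZE are illustrative names, not the exact definitions used in Redis): keep a small inline buffer inside the result object and only fall back to malloc() for the rare command with more keys than that.

#include <stdlib.h>
#include <string.h>

#define KEYS_BUFFER_SIZE 256   /* assumed to cover almost every command */

typedef struct {
    int keysbuf[KEYS_BUFFER_SIZE]; /* inline storage for key positions */
    int *keys;                     /* points to keysbuf or a heap block */
    int numkeys;
    int size;
} keysResult;

static void keysResultInit(keysResult *r) {
    r->keys = r->keysbuf;
    r->numkeys = 0;
    r->size = KEYS_BUFFER_SIZE;
}

static int *keysResultGrow(keysResult *r, int numkeys) {
    if (numkeys <= r->size) return r->keys;        /* common case: no malloc */
    int *heap = malloc(sizeof(int) * numkeys);     /* rare case: big command */
    memcpy(heap, r->keys, sizeof(int) * r->numkeys);
    if (r->keys != r->keysbuf) free(r->keys);
    r->keys = heap;
    r->size = numkeys;
    return r->keys;
}

static void keysResultFree(keysResult *r) {
    if (r->keys != r->keysbuf) free(r->keys);      /* inline buffer needs no free */
}

int main(void) {
    keysResult r;
    keysResultInit(&r);
    keysResultGrow(&r, 3);   /* stays on the inline buffer, no allocation */
    r.keys[0] = 1; r.numkeys = 1;
    keysResultFree(&r);
    return 0;
}
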
LRU_CYCLE_PERIOD is defined, but not used.
Modified doc files to match the project name and objectives
@kasiawasiuta kasiawasiuta force-pushed the update-6.0-devel branch 3 times, most recently from 46403e1 to 5d6c109 on March 11, 2020 09:12
@kasiawasiuta kasiawasiuta merged commit efec3ad into memKeyDB:6.0-devel Mar 11, 2020
kasiawasiuta pushed a commit that referenced this pull request Jul 8, 2020
Now both the master and the replicas keep track of the last replication offset
that contains meaningful data (ignoring the trailing pings), and both
trim that tail from the replication backlog and from the offset
they use for psync.

The implication is that if someone missed some pings, or even has
excessive pings that the promoted replica has, it'll still be able to
psync (avoid a full sync).

The downside (which was already committed) is that replicas running old
code may fail to psync, since the promoted replica trims pings from its
backlog.

This commit adds a test that reproduces several cases of promotions and
demotions with stale and non-stale pings.

Background:
The meaningful offset on the master was added recently to solve a problem where
the master is left all alone, injecting PINGs into its backlog when no one is
listening, and then gets demoted and tries to replicate from a replica that didn't
have any of the PINGs (or at least not the last ones).

However, consider this case:
Master A has two replicas (B and C) replicating directly from it.
There's no traffic at all, and also no network issues, just many pings in the
tail of the backlog. Now B gets promoted, A becomes a replica of B, and C
remains a replica of A. When A gets demoted, it trims the pings from its
backlog and successfully replicates from B. However, C is still aware of
these PINGs; when it disconnects and re-connects to A, it'll ask for something
that's no longer in the backlog (since A trimmed the tail of its backlog),
and be forced to do a full sync (something it didn't have to do before the
meaningful offset fix).

Besides that, the psync2 test was always failing randomly here and there; it
turns out the reason was PINGs. Investigating it shows the following scenario:

cycle 1: redis #1 is the master, and all the rest are direct replicas of #1
cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is a replica of #1
Now we see that when #1 is demoted it prints:
17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference)
17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964).
17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master.
and when #3 connects to the demoted #2, #2 says:
17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964

So the issue here is that the meaningful offset feature saved the day for the
demoted master (since it needs to sync from a replica that didn't get the last
ping), but it didn't help one of the other replicas, which did get the last ping.
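
A standalone, simplified sketch of that bookkeeping (stand-in types and names, not the actual replication.c code): record the last meaningful offset alongside the normal replication offset, and on demotion drop the trailing-PING bytes from both the offset used for PSYNC and the backlog length.

#include <stdio.h>

typedef struct {
    long long master_repl_offset;   /* everything written, PINGs included  */
    long long meaningful_offset;    /* last offset holding non-PING data   */
    long long backlog_histlen;      /* bytes currently kept in the backlog */
} replState;

static void feedBacklog(replState *s, long long bytes, int is_ping) {
    s->master_repl_offset += bytes;
    s->backlog_histlen += bytes;
    if (!is_ping) s->meaningful_offset = s->master_repl_offset;
}

/* On demotion, forget the trailing PINGs so the PSYNC request and the
 * backlog agree with what the promoted replica kept. */
static void demoteToReplica(replState *s) {
    long long ping_tail = s->master_repl_offset - s->meaningful_offset;
    s->master_repl_offset = s->meaningful_offset;
    s->backlog_histlen -= ping_tail;
}

int main(void) {
    replState s = {0, 0, 0};
    feedBacklog(&s, 3929963, 0);   /* real traffic (offsets as in the log)  */
    feedBacklog(&s, 14, 1);        /* the final PINGs (14 bytes difference) */
    demoteToReplica(&s);
    printf("psync request offset %lld, backlog %lld bytes\n",
           s.master_repl_offset + 1, s.backlog_histlen);
    return 0;
}
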
michalbiesek pushed a commit that referenced this pull request Jul 28, 2021