Update 6.0 devel #2

Merged
kasiawasiuta merged 139 commits into memKeyDB:6.0-devel from
kasiawasiuta:update-6.0-devel
Mar 11, 2020

Conversation

@kasiawasiuta
Member

Rebase 6.0-devel to current 6.0 branch

@kasiawasiuta kasiawasiuta requested a review from jschmieg March 10, 2020 13:51
yz1509 and others added 28 commits March 11, 2020 09:42
Instead of 512, use the defined max from networking.c
The function adjustOpenFilesLimit() has an implicit parameter, server.maxclients.
It aims to adjust the maximum file descriptor number according to server.maxclients
on a best-effort basis, meaning the resulting "bestlimit" may be lower than "maxfiles" but greater than "oldlimit".
When we try to increase "maxclients" using the CONFIG SET command, we could raise the maximum
file descriptor number to a bigger value without calling aeResizeSetSize at the same time.
As more and more clients later connect to the server, the allocated fds keep growing
and eventually exceed the events size of aeEventLoop.events. When new nodes join the cluster,
a new link is created together with a new fd, but the return value of aeCreateFileEvent was not
checked. In that case, we end up with a non-null "link" whose associated fd is not
registered.

So when we dynamically set "maxclients" we could reach an inconsistency between the maximum file
descriptor number of the process and server.maxclients, and this could later cause a cluster link and link
fd inconsistency.

While setting "maxclients" dynamically, we now consider it failed when the resulting "maxclients" is not
the same as requested. When that happens we try to restore the previous maximum file descriptor number,
so that server.maxclients can keep acting as a guard as before.
This commit solves the following bug:
127.0.0.1:6379> XGROUP CREATE x grp $ MKSTREAM
OK
127.0.0.1:6379> XADD x 666 f v
"666-0"
127.0.0.1:6379> XREADGROUP GROUP grp Alice BLOCK 0 STREAMS x >
1) 1) "x"
   2) 1) 1) "666-0"
         2) 1) "f"
            2) "v"
127.0.0.1:6379> XADD x 667 f v
"667-0"
127.0.0.1:6379> XDEL x 667
(integer) 1
127.0.0.1:6379> XREADGROUP GROUP grp Alice BLOCK 0 STREAMS x >
1) 1) "x"
   2) (empty array)

The root cause is that we use s->last_id in streamCompareID,
while we should use the last *valid* ID instead.
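
A standalone toy sketch of that root cause (the IDs mirror the transcript above; the types are simplified stand-ins for Redis's stream structures, not the actual t_stream.c code):

#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t ms, seq; } streamID;

/* Same ordering semantics as Redis's streamCompareID(). */
static int streamCompareID(const streamID *a, const streamID *b) {
    if (a->ms != b->ms) return a->ms < b->ms ? -1 : 1;
    if (a->seq != b->seq) return a->seq < b->seq ? -1 : 1;
    return 0;
}

int main(void) {
    streamID last_id       = {667, 0};  /* advanced by XADD x 667 f v       */
    streamID last_valid_id = {666, 0};  /* 667-0 was XDEL-ed afterwards     */
    streamID group_cursor  = {666, 0};  /* the group already consumed 666-0 */

    /* Buggy check: thinks there is something new, so the blocked XREADGROUP
     * is served immediately with an empty array. */
    printf("buggy: serve=%d\n", streamCompareID(&group_cursor, &last_id) < 0);
    /* Fixed check: nothing valid is left, so the client keeps blocking. */
    printf("fixed: serve=%d\n", streamCompareID(&group_cursor, &last_valid_id) < 0);
    return 0;
}
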
We exit later, so no actual bug is fixed, but it is more correct.

See redis#6054, thanks to @ShooterIT for finding the issue.
The directive tls-prefer-server-cipher is actually tls-prefer-server-ciphers in config.c, which results in the failed directive call shown below. This pull request adds the "s" in "ciphers" so that the directive is properly recognized by config.c.

ubuntu@ip-172-31-16-31:~/redis$ src/redis-server ./redis.conf 

*** FATAL CONFIG FILE ERROR ***
Reading the configuration file, at line 200
>>> 'tls-prefer-server-cipher yes'
Bad directive or wrong number of arguments
So the error message `ERR only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context` will become
`ERR 'get' command submitted, but only (P)SUBSCRIBE / (P)UNSUBSCRIBE / PING / QUIT allowed in this context`
If a blocked module client times out (or disconnects, is unblocked
by the CLIENT command, etc.) we need to call moduleUnblockClient
in order to free the memory allocated by the module sub-system
and the blocked-client private data (see the sketch below).

Other changes:
Made the blockedonkeys.tcl tests a bit more aggressive in order
to smoke out potential memory leaks.
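
A standalone, simplified sketch of that cleanup rule (types and names are stand-ins, not the Redis module API): whichever path unblocks the client must also run the module's free-privdata hook, otherwise the private data leaks.

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    void *module_privdata;                 /* allocated by the module */
    void (*free_privdata)(void *privdata); /* module-provided destructor */
} BlockedClient;

/* Common unblock path: always give the module a chance to free its data. */
static void unblockClient(BlockedClient *bc) {
    if (bc->free_privdata && bc->module_privdata) {
        bc->free_privdata(bc->module_privdata);
        bc->module_privdata = NULL;
    }
}

/* Timeout, disconnect and CLIENT UNBLOCK all funnel through the same path. */
static void handleTimeout(BlockedClient *bc) {
    unblockClient(bc);
}

int main(void) {
    BlockedClient bc = { malloc(64), free };
    handleTimeout(&bc);
    printf("privdata freed: %s\n", bc.module_privdata == NULL ? "yes" : "no");
    return 0;
}
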
antirez and others added 23 commits March 11, 2020 09:47
This makes it simpler to help people who post this kind of error
on the mailing list or other help forums, because sometimes the
directive looks well spelled, but the version of Redis they are using
does not support it.
The callback approach we took is very efficient: the module can filter
keys without building any list or cloning strings, and it can
also read data from the key's value. But if the user tries to re-open
the key, or any other key, this can cause dict rehashing (dictFind does
that), which is very bad to do from inside dictScan.

This commit protects the dict from doing any rehashing during the scan, and
also warns the user not to attempt any writes or command calls from
within the callback, for fear of unexpected side effects and crashes.
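
A standalone, simplified sketch of that protection (stand-in types, not the actual dict.c code): while the scan callback runs, a pause counter keeps any incremental rehash step from firing, so a lookup issued from inside the callback cannot move buckets under the scan.

#include <stdio.h>

typedef struct {
    int pauserehash;   /* >0 means rehashing is temporarily forbidden */
    int rehashidx;     /* -1 when no rehash is in progress */
} dict;

static void dictRehashStep(dict *d) {
    if (d->pauserehash > 0 || d->rehashidx == -1) return; /* protected */
    d->rehashidx++;    /* ...would move one bucket to the new table... */
}

/* A lookup that would normally advance rehashing a little. */
static void dictFind(dict *d) { dictRehashStep(d); }

static void dictScanWithCallback(dict *d, void (*cb)(dict *)) {
    d->pauserehash++;  /* protect the whole scan */
    cb(d);             /* the callback may now call dictFind() safely */
    d->pauserehash--;
}

static void callback(dict *d) { dictFind(d); }

int main(void) {
    dict d = { 0, 0 };
    dictScanWithCallback(&d, callback);
    printf("rehash steps taken during scan: %d\n", d.rehashidx); /* stays 0 */
    return 0;
}
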
Now that we may use it more often (ACL), these excessive calls to malloc
and free can become an overhead.
The idea is that very few commands have a lot of keys, and when that
happens the allocation time becomes negligible.
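
A standalone sketch of the pattern (keysResult and KEYS_BUFFER_SIZE are illustrative names, not the exact definitions used in Redis): keep a small inline buffer inside the result object and only fall back to malloc() for the rare command with more keys than that.

#include <stdlib.h>
#include <string.h>

#define KEYS_BUFFER_SIZE 256   /* assumed to cover almost every command */

typedef struct {
    int keysbuf[KEYS_BUFFER_SIZE]; /* inline storage for key positions */
    int *keys;                     /* points to keysbuf or a heap block */
    int numkeys;
    int size;
} keysResult;

static void keysResultInit(keysResult *r) {
    r->keys = r->keysbuf;
    r->numkeys = 0;
    r->size = KEYS_BUFFER_SIZE;
}

static int *keysResultGrow(keysResult *r, int numkeys) {
    if (numkeys <= r->size) return r->keys;        /* common case: no malloc */
    int *heap = malloc(sizeof(int) * numkeys);     /* rare case: big command */
    memcpy(heap, r->keys, sizeof(int) * r->numkeys);
    if (r->keys != r->keysbuf) free(r->keys);
    r->keys = heap;
    r->size = numkeys;
    return r->keys;
}

static void keysResultFree(keysResult *r) {
    if (r->keys != r->keysbuf) free(r->keys);      /* inline buffer needs no free */
}

int main(void) {
    keysResult r;
    keysResultInit(&r);
    keysResultGrow(&r, 3);   /* stays on the inline buffer, no allocation */
    r.keys[0] = 1; r.numkeys = 1;
    keysResultFree(&r);
    return 0;
}
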
LRU_CYCLE_PERIOD is defined, but not used.
Modified doc files to match the project name and objectives
@kasiawasiuta kasiawasiuta force-pushed the update-6.0-devel branch 3 times, most recently from 46403e1 to 5d6c109 on March 11, 2020 09:12
@kasiawasiuta kasiawasiuta merged commit efec3ad into memKeyDB:6.0-devel Mar 11, 2020
kasiawasiuta pushed a commit that referenced this pull request Jul 8, 2020
Now both the master and the replicas keep track of the last replication offset
that contains meaningful data (ignoring the trailing pings), and both
trim that tail from the replication backlog and from the offset
they use for psync.

The implication is that if someone missed some pings, or even has
excessive pings that the promoted replica has, it'll still be able to
psync (avoid a full sync).

The downside (which was already committed) is that replicas running old
code may fail to psync, since the promoted replica trims pings from its
backlog.

This commit adds a test that reproduces several cases of promotions and
demotions with stale and non-stale pings.

Background:
The meaningful offset on the master was added recently to solve a problem where
the master is left all alone, injecting PINGs into its backlog when no one is
listening, and then gets demoted and tries to replicate from a replica that didn't
have any of the PINGs (or at least not the last ones).

However, consider this case:
Master A has two replicas (B and C) replicating directly from it.
There's no traffic at all, and also no network issues, just many pings in the
tail of the backlog. Now B gets promoted, A becomes a replica of B, and C
remains a replica of A. When A gets demoted, it trims the pings from its
backlog and successfully replicates from B. However, C is still aware of
these PINGs; when it disconnects and re-connects to A, it'll ask for something
that's no longer in the backlog (since A trimmed the tail of its backlog),
and be forced to do a full sync (something it didn't have to do before the
meaningful offset fix).

Besides that, the psync2 test was always failing randomly here and there; it
turns out the reason was PINGs. Investigating it shows the following scenario:

cycle 1: redis #1 is the master, and all the rest are direct replicas of #1
cycle 2: redis #2 is promoted to master, #1 is a replica of #2 and #3 is a replica of #1
Now we see that when #1 is demoted it prints:
17339:S 21 Apr 2020 11:16:38.523 * Using the meaningful offset 3929963 instead of 3929977 to exclude the final PINGs (14 bytes difference)
17339:S 21 Apr 2020 11:16:39.391 * Trying a partial resynchronization (request e2b3f8817735fdfe5fa4626766daa938b61419e5:3929964).
17339:S 21 Apr 2020 11:16:39.392 * Successful partial resynchronization with master.
and when #3 connects to the demoted #2, #2 says:
17339:S 21 Apr 2020 11:16:40.084 * Partial resynchronization not accepted: Requested offset for secondary ID was 3929978, but I can reply up to 3929964

So the issue here is that the meaningful offset feature saved the day for the
demoted master (since it needs to sync from a replica that didn't get the last
ping), but it didn't help one of the other replicas, which did get the last ping.
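
A standalone, simplified sketch of that bookkeeping (stand-in types and names, not the actual replication.c code): record the last meaningful offset alongside the normal replication offset, and on demotion drop the trailing-PING bytes from both the offset used for PSYNC and the backlog length.

#include <stdio.h>

typedef struct {
    long long master_repl_offset;   /* everything written, PINGs included  */
    long long meaningful_offset;    /* last offset holding non-PING data   */
    long long backlog_histlen;      /* bytes currently kept in the backlog */
} replState;

static void feedBacklog(replState *s, long long bytes, int is_ping) {
    s->master_repl_offset += bytes;
    s->backlog_histlen += bytes;
    if (!is_ping) s->meaningful_offset = s->master_repl_offset;
}

/* On demotion, forget the trailing PINGs so the PSYNC request and the
 * backlog agree with what the promoted replica kept. */
static void demoteToReplica(replState *s) {
    long long ping_tail = s->master_repl_offset - s->meaningful_offset;
    s->master_repl_offset = s->meaningful_offset;
    s->backlog_histlen -= ping_tail;
}

int main(void) {
    replState s = {0, 0, 0};
    feedBacklog(&s, 3929963, 0);   /* real traffic (offsets as in the log)  */
    feedBacklog(&s, 14, 1);        /* the final PINGs (14 bytes difference) */
    demoteToReplica(&s);
    printf("psync request offset %lld, backlog %lld bytes\n",
           s.master_repl_offset + 1, s.backlog_histlen);
    return 0;
}
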
michalbiesek pushed a commit that referenced this pull request Jul 28, 2021