Skip to content

Annotate module API functions in redismodule.h for use with -fno-common#6900

Merged
oranagra merged 9 commits intoredis:unstablefrom
natoscott:fno-common
Aug 14, 2020
Merged

Annotate module API functions in redismodule.h for use with -fno-common#6900
oranagra merged 9 commits intoredis:unstablefrom
natoscott:fno-common

Conversation

@natoscott
Copy link
Contributor

In order to keep the redismodule.h self-contained but still usable with
gcc v10 and later, annotate each API function tentative definition with
the common attribute. This avoids the 'multiple definition' errors
modules will otherwise see for all API functions at link time.

Further details at https://gcc.gnu.org/gcc-10/porting_to.html

In order to keep the redismodule.h self-contained but still usable with
gcc v10 and later, annotate each API function tentative definition with
the __common__ attribute.  This avoids the 'multiple definition' errors
modules will otherwise see for all API functions at link time.

Further details at https://gcc.gnu.org/gcc-10/porting_to.html
@agmt
Copy link
Contributor

agmt commented May 26, 2020

Is it a problem for other compilers? If so, what about using extern (for all non-main files which must have some marker like #define REDISMODULE_ADDITIONAL_FILE)?

@natoscott
Copy link
Contributor Author

Is it a problem for other compilers? If so, what about using extern (for all non-main files which must have some marker like #define REDISMODULE_ADDITIONAL_FILE)?

@agmt the situation with this header is more complicated than 'extern' allows us to resolve, unfortunately.

The header is designed to be completely standalone such that the header file actually creates the functions its describing within the modules object files that use those functions. There is no function someplace else that we can declare an 'extern' reference to as a result, and AIUI it was the intent of the author of this header that this not be needed.

I'm not sure if it affects other compilers - seems like not, so far, none have this new strictness gcc-10 has now introduced by default. The annotations I added simply end up making the header provide its previous behaviour, for gcc.

natoscott added a commit to natoscott/redisraft that referenced this pull request Jul 1, 2020
Annotate module API functions in redismodule.h for use with -fno-common.

Relates to redis/redis#6900
@yossigo
Copy link
Collaborator

yossigo commented Jul 2, 2020

@natoscott Thanks for reporting this! This issue with redismodule.h already came up in the past a few times and at least in some cases was solved in other ways. For example: https://github.com/RediSearch/RediSearch/blob/master/src/redismodule.h

I tend to think the "correct" solution is really to properly define all function pointers as extern, and require the user to explicitly invoke a macro that handles actual definition in a compilation unit of choice. This would have avoided the problem in the first place, but will break compatibility of course.

If we do stick with your suggestion as a simpler and backwards compatible solution for GCC 10.x, I'd at least consider creating some macro for the attribute settings to have better control, disable it on compilers that don't need or worse, don't accept it, etc.

@natoscott
Copy link
Contributor Author

natoscott commented Jul 3, 2020

@yossigo I agree re the technically correct solution, but tend to think backwards compatibility should win out here now, with so many modules written and since its relatively easily achieved.

I've updated the redisraft PR - if that redismodule.h seems OK, I'll resolve the conflicts and update this PR too.

Related: RedisLabs/redisraft#58

natoscott added 2 commits July 3, 2020 11:02
Turn the existing __attribute__ ((unused)), ((__common__)) and ((print))
annotations into conditional macros for any compilers not accepting this
syntax.  These macros only expand to API annotations under gcc.
@natoscott
Copy link
Contributor Author

I've updated the redisraft PR - if that redismodule.h seems OK, I'll resolve the conflicts and update this PR too.

I went ahead and merged in all the recent changes to redismodule.h then recreated the original change, along with the conditional macros discussed earlier.

@agmt
Copy link
Contributor

agmt commented Jul 3, 2020

@natoscott I don't see any backward-compatibility issue if extern will appear only in case if it's explicitly defined. All existing code, which just includes redismodule.h, remains to add variables into object file (no extern).

@yossigo
Copy link
Collaborator

yossigo commented Jul 28, 2020

@natoscott Thanks, this looks much better now!

I think in the long run (maybe for 7.0) we should really fix this properly and provide a way for users to specify where vars are defined. I don't consider backwards compatibility a big issue here because:

  1. It only affects compilation - ABI remains intact.
  2. At the moment users generally copy redismodule.h over so if things do break, they're likely to break due to an explicit upgrade step.

I tend to think of this as a good interim solution.
@oranagra any other thoughts on this matter?

@yossigo yossigo requested a review from oranagra July 28, 2020 19:06
@oranagra
Copy link
Member

this topic was discussed in:

I voiced my opinion in the last one, if you want to see what i dislike.
And the suggestion i gave there is somewhat similar to this PR is headed.
So i'd like to suggest a few changes.

my concerns:

  • I don't want to change these lines again and again (i care about the blame log and merge conflicts), so if we're gonna change all of them, let's have them ready for whatever comes in the future, i.e. i'd like to add another define that's used before the function declaration, so that we can use it to add extern.
  • I rather the C file that includes this header be the one that sets the decorators (be it extern, __attribute__ or whatever), so even if we do want to add some attributes out of the box, we should have them defined only if not already defined, so that if the file that did the include wants to use something else, we let it have its way.
  • maybe this is the opportunity to get rid of the REDISMODULE_API_FUNC "function" macro. personally the parenthesis are giving my brain parsing error. reading that line, it's not a C syntax. my brain does seem to be able to skip simple defines like the REDISMODULE_ATTR_COMMON that's added here.

So what i would suggest is either this:

#if !defined REDIS_MODULE_PRE_API and defined REDISMODULE_API_DECLARE
#define REDIS_MODULE_PRE_API extern
#endif

REDIS_MODULE_PRE_API void *(*RedisModule_Alloc)(size_t bytes) REDISMODULE_POST_API;

So that one C file includes the header without any define, and then it creates the function pointers (this is backwards compatible with projects that had only one C file interacting with redis), And other C files that just need forward declarations, define REDISMODULE_API_DECLARE before including the redismodule.h.

or if we wanna go with the common or weak attributes, do this:

#if !defined REDISMODULE_POST_API
#define REDISMODULE_POST_API __attribute__((__common__))
#endif

REDIS_MODULE_PRE_API void *(*RedisModule_Alloc)(size_t bytes) REDISMODULE_POST_API;

either way, the project that includes the header can revert to other ways since it has full control on the decorators.

p.s. obviously add a similar trick for REDISMODULE_ATTR_UNUSED and REDISMODULE_ATTR_PRINTF.

@natoscott
Copy link
Contributor Author

@yossigo @oranagra

this topic was discussed in:

Hmm, what a problem this whole area has become. :(

So i'd like to suggest a few changes.

I think these can be catered for relatively simply - see the suggestion below.

* I don't want to change these lines again and again (i care about the blame log and merge conflicts), so if we're gonna change all of them, let's have them ready for whatever comes in the future, i.e. i'd like to add another define that's used before the function declaration, so that we can use it to add `extern`.

Agreed.

* I rather the C file that includes this header be the one that sets the decorators (be it `extern`, `__attribute__` or whatever), so even if we do want to add some attributes out of the box, we should have them defined only if not already defined, so that if the file that did the include wants to use something else, we let it have its way.

Agreed.

* maybe this is the opportunity to get rid of the `REDISMODULE_API_FUNC` "function" macro. 

Strictly speaking, removing it breaks backwards compatibility (someone, somewhere will have used this macro). I've kept it for now.

#if !defined REDIS_MODULE_PRE_API and defined REDISMODULE_API_DECLARE
#define REDIS_MODULE_PRE_API extern
#endif

Here's another way with less line noise IMO, but keeping 100% backward compatibility. REDISMODULE_API can be defined outside of the header to add the 'extern', and REDISMODULE_ATTR_COMMON can be defined to nothing earlier too.

+++ b/redismodule.h
@@ -383,11 +383,15 @@ typedef long long mstime_t;
 /* Macro definitions specific to individual compilers */
 #ifdef __GNUC__
 #define REDISMODULE_ATTR_UNUSED __attribute__((unused))
+#ifndef REDISMODULE_ATTR_COMMON
 #define REDISMODULE_ATTR_COMMON __attribute__((__common__))
+#endif
 #define REDISMODULE_ATTR_PRINTF(idx,cnt) __attribute__((format(printf,idx,cnt)))
 #else
 #define REDISMODULE_ATTR_UNUSED
+#ifndef REDISMODULE_ATTR_COMMON
 #define REDISMODULE_ATTR_COMMON
+#endif
 #define REDISMODULE_ATTR_PRINTF(idx,cnt)
 #endif
 
@@ -448,13 +452,17 @@ typedef struct RedisModuleTypeMethods {
     RedisModule_GetApi("RedisModule_" #name, ((void **)&RedisModule_ ## name))
 
 #define REDISMODULE_API_FUNC(x) (*x)
+#ifndef REDISMODULE_API
+#define REDISMODULE_API(type,x) type REDISMODULE_API_FUNC(x)
+#endif
 
+REDISMODULE_API(void *, RedisModule_Alloc)(size_t bytes) REDISMODULE_ATTR_COMMON;
+REDISMODULE_API(void *, RedisModule_Realloc)(void *ptr, size_t bytes) REDISMODULE_ATTR_COMMON;
+REDISMODULE_API(void, RedisModule_Free)(void *ptr) REDISMODULE_ATTR_COMMON;
+REDISMODULE_API(void *, RedisModule_Calloc)(size_t nmemb, size_t size) REDISMODULE_ATTR_COMMON;
+REDISMODULE_API(char *, RedisModule_Strdup)(const char *str) REDISMODULE_ATTR_COMMON;

/* [...snip other similar APIs, but here's one with multiple attributes...] */

+REDISMODULE_API(void, RedisModule_Log)(RedisModuleCtx *ctx, const char *level, const char *fmt, ...) REDISMODULE_ATTR_PRINTF(3,4) REDISMODULE_ATTR_COMMON;

p.s. obviously add a similar trick for REDISMODULE_ATTR_UNUSED and REDISMODULE_ATTR_PRINTF.

I'm not quite sure what you have in mind there (not obvious to me) - see the RedisModule_Log API example above. I tend to think we must have separate macros for each trailing attribute - some of which could be defined away, but it's also more readable.

If this approach is generally acceptable, I'll make the change throughout the header and update the PR.

@oranagra
Copy link
Member

@natoscott i personally really dislike these function style macros which take return type and name. IMHO it makes the reading of these lines more difficult. the C syntax for function pointers is complicated enough, so i rather stick to it and not add any new parenthesis and macros that generate the whole thing.
Personally i think that adding a PRE and POST defines doesn't diverge these lines from C syntax and more readable to me.

The reason i insist on PRE and POST and not COMMON or EXTERN defines is because the user can add any decorators he wants (i.e. weak attribute instead of common).

Regarding the UNUSED and PRINTF defines, what i meant is that they should be defined only if not already defined. so that we don't force them on the user, and he has a way to disable or modify them from the C file.

regarding the REDISMODULE_API_FUNC macro, i really don't see what it can be used for. if anyone has any idea, please enlighten me.
@itamarhaber can you please run a quick grep to see if any of the modules you're familiar with messes with this define?
p.s. like Yossi said, even if we break one module, i don't think that's that bad (unless the removal of this macro induces some limitation that cannot be solved without it).

@oranagra
Copy link
Member

oranagra commented Aug 4, 2020

@yossigo please share your opinion on my proposal.
@itamarhaber can you check about any module explicitly messing with REDISMODULE_API_FUNC?

@itamarhaber
Copy link
Member

@Oran grep and github search do not seem to return anything to worry about.

Provide a pre- and post- macro for every API function, as
discussed in redis#6900; it
appears there are unlikely to be any compatibility issues
removing REDISMODULE_API_FUNC so its gone in the interest
of keeping the function declarations readable.
@natoscott
Copy link
Contributor Author

I've updated the code with before/after macros, allowing them to be set outside the header, and defaulting to values that maintain the previous behaviour. As requested, the REDISMODULE_API_FUNC is removed despite the backwards compatibility issue this introduces.

oranagra
oranagra previously approved these changes Aug 5, 2020
@oranagra
Copy link
Member

oranagra commented Aug 5, 2020

thanks @natoscott LGTM.
just to be on the safe side, can you please test to see how this behaves in a dummy project composed of multiple source files that interact with the API (both C and C++, with new and old GCC).

The C++ files would have to do extern "C" and that's their burden IMHO. but i'm wondering if we should make any other adjustments to make it work better for C projects out of the box, and also make sure that whatever needs to be adjusted can be done from outside the header file, or at the very least without modifying every API line.

@itamarhaber
Copy link
Member

@sewenew would you mind, if you have the time, to take a look and see whether your CPP modules work as expected with this change.

@sewenew
Copy link

sewenew commented Aug 11, 2020

@itamarhaber This seems not work for C++. It seems __common__ is a C specific attribute. I tried to compile my C++ module with this new redismodule.h file. It failed to compile and report the following errors:

redis-protobuf/src/sw/redis-protobuf/redismodule.h:460:59: error: '__common__' attribute is not supported in C++
REDISMODULE_API void * (*RedisModule_Alloc)(size_t bytes) REDISMODULE_ATTR;
                                                          ^
redis-protobuf/src/sw/redis-protobuf/redismodule.h:457:26: note: expanded from macro 'REDISMODULE_ATTR'
#define REDISMODULE_ATTR REDISMODULE_ATTR_COMMON
                         ^
redis-protobuf/src/sw/redis-protobuf/redismodule.h:386:48: note: expanded from macro 'REDISMODULE_ATTR_COMMON'
#define REDISMODULE_ATTR_COMMON __attribute__((__common__))
                                               ^

My compiler is Apple clang version 11.0.3 (clang-1103.0.32.62). I don't have a GCC 10 compiler so far, and cannot test with it.

Regards

@sewenew
Copy link

sewenew commented Aug 11, 2020

In fact, the old redismodule.h does not work with C++ either. C++ strictly distinguishes definition and declaration, and the compiler always complains about multiple definition errors.

In order to solve the problem, I split redismodule.h into a .h file and .cpp file. Declare these function pointers as extern in header file, and define them in the source file. You can check redismodule.h and redismodule.cpp for detail.

Regards

@oranagra
Copy link
Member

@sewenew the point is that with the new header you can do what you did without editing the header file.

in one C file you do:

#define REDISMODULE_API
#define REDISMODULE_ATTR
#include "redismodule.h

and in the others you do:

#define REDISMODULE_API extern
#define REDISMODULE_ATTR
#include "redismodule.h"

i'm hoping that for C projects that use multiple files the default header would be enough with no need to define anything.

@sewenew
Copy link

sewenew commented Aug 12, 2020

@oranagra Thanks for the explanation!

I followed your instruction, and it works with two minor fixes:

  1. There's a redefinition of RedisModule_Alloc on line 461 of the new redismodule.h file, and need to be removed.

  2. REDISMODULE_ATTR_PRINTF doesn't work with C++ variadic template, since there's no format string as parameter in this case (check this for reference). I use the following code to fix the problem. FYI.

 /* Macro definitions specific to individual compilers */
 #ifdef __GNUC__
 #define REDISMODULE_ATTR_UNUSED __attribute__((unused))
 #define REDISMODULE_ATTR_COMMON __attribute__((__common__))
+#ifndef REDISMODULE_ATTR_PRINTF
 #define REDISMODULE_ATTR_PRINTF(idx,cnt) __attribute__((format(printf,idx,cnt)))
+#endif
 #else
 #define REDISMODULE_ATTR_UNUSED
 #define REDISMODULE_ATTR_COMMON
 #define REDISMODULE_ATTR_PRINTF(idx,cnt)
 #endif

So that I can use something like the following to disable it:

 #define REDISMODULE_API
 #define REDISMODULE_ATTR
+#define REDISMODULE_ATTR_PRINTF(idx,cnt)
 #include "redismodule.h"

Regards

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i suggest to put all if these defines only if not already defines.
allows greater flexibility, also for other compilers, let's let people define what they want.

Comment on lines 383 to 392
/* Macro definitions specific to individual compilers */
#ifdef __GNUC__
#define REDISMODULE_ATTR_UNUSED __attribute__((unused))
#define REDISMODULE_ATTR_COMMON __attribute__((__common__))
#define REDISMODULE_ATTR_PRINTF(idx,cnt) __attribute__((format(printf,idx,cnt)))
#else
#define REDISMODULE_ATTR_UNUSED
#define REDISMODULE_ATTR_COMMON
#define REDISMODULE_ATTR_PRINTF(idx,cnt)
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* Macro definitions specific to individual compilers */
#ifdef __GNUC__
#define REDISMODULE_ATTR_UNUSED __attribute__((unused))
#define REDISMODULE_ATTR_COMMON __attribute__((__common__))
#define REDISMODULE_ATTR_PRINTF(idx,cnt) __attribute__((format(printf,idx,cnt)))
#else
#define REDISMODULE_ATTR_UNUSED
#define REDISMODULE_ATTR_COMMON
#define REDISMODULE_ATTR_PRINTF(idx,cnt)
#endif
/* Macro definitions specific to individual compilers */
#ifdef __GNUC__
# ifndef REDISMODULE_ATTR_UNUSED
# define REDISMODULE_ATTR_UNUSED __attribute__((unused))
# endif
# ifndef REDISMODULE_ATTR_COMMON
# define REDISMODULE_ATTR_COMMON __attribute__((__common__))
# endif
# ifndef REDISMODULE_ATTR_PRINTF
# define REDISMODULE_ATTR_PRINTF(idx,cnt) __attribute__((format(printf,idx,cnt)))
# endif
#else
# ifndef REDISMODULE_ATTR_UNUSED
# define REDISMODULE_ATTR_UNUSED
# endif
# ifndef REDISMODULE_ATTR_COMMON
# define REDISMODULE_ATTR_COMMON
# endif
# ifndef REDISMODULE_ATTR_PRINTF
# define REDISMODULE_ATTR_PRINTF(idx,cnt)
# endif
#endif

@natoscott
Copy link
Contributor Author

@oranagra Thanks for the explanation!

I followed your instruction, and it works with two minor fixes:

1. There's a redefinition of `RedisModule_Alloc` on [line 461](https://github.com/natoscott/redis/blob/fno-common/src/redismodule.h#L461) of the new redismodule.h file, and need to be removed.

Thanks - fixed.

2. `REDISMODULE_ATTR_PRINTF` doesn't work with [C++ variadic template](https://en.wikipedia.org/wiki/Variadic_template), since there's no format string as parameter in this case (check [this](https://stackoverflow.com/questions/12573968/how-to-use-gccs-printf-format-attribute-with-c11-variadic-templates) for reference). I use the following code to fix the problem. FYI.

This doesn't sound right to me - AIUI the stackoverflow link is not the same thing as what we're doing here. In that post the OP describes attempts to annotate the C++ code to match the C code its calling, its not about annotation on the underlying C code (which is what we're doing here).

It's important that the RedisModule_Log function is passed a constant string for the formatter - not just to be able to check the parameter types but also to ensure we are never in the situation that user/input data might be used as the formatting string, as that can cause crashes and/or security problems.

Can you post the warnings/errors you see, and explain a bit more how your C++ API / wrapper code ends up calling RedisModule_Log? (maybe with some sample code?) If the C++ code is producing the final formatted string (?) you might be able to just pass "%s" as the first parameter.

This is all quite off-topic for the original issue being solved here - making the header C++ friendly should be a separate PR.

So that I can use something like the following to disable it:

IMO the suggestions to 'solve' this are increasingly ugly and allow possibly dangerous ways of calling the RedisModule_Log routine that would otherwise have been perfectly well protected. The case for such changes should be argued in a separate PR and not piggy-backed onto this unrelated issue.

Thanks.

@sewenew
Copy link

sewenew commented Aug 12, 2020

@natoscott Yes, I can pass a format string, e.g. %s, to work around it. Not a big deal.

Regards

It seems like it does not require it when compiling C, and complains
about it not supported in C++.
@yossigo
Copy link
Collaborator

yossigo commented Aug 13, 2020

Looks good to me now, I've tested it on gcc 4.x, 9.x, 10.x (C and C++) and clang 10.x. It seems like clang (like gcc <= 9.x) doesn't need any special attribute for C and doesn't support common for C++, so I've added a check for that.

@yossigo yossigo requested a review from oranagra August 13, 2020 20:19
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yossigo there is still no option for the outside user to modify the printf attribute.
for instance when the compiler is not GCC, we force it to be defined as empty, overriding anything that may have been provided from outside, and unlike the other attribute for which we let the user replace the define that uses it, for that one we use the original define in the individual APISs.

in my eyes, all of these should be defined only if not already defined, especially considering that attributes are non standard, and i don't see the disadvantage in doing so.
that said, i suppose it's a petty comment, considering that modules always carry their own header and modifying these lines won't cause major merge conflicts (not per API line)

so personally i would vote for something along the lines of #6900 (comment)
but since you tested it and saw no problem with all the common compilers, if you feel that's not needed, go ahead and merge.

oranagra
oranagra previously approved these changes Aug 14, 2020
Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yossigo there is still no option for the outside user to modify the printf attribute.
for instance when the compiler is not GCC, we force it to be defined as empty, overriding anything that may have been provided from outside, and unlike the other attribute for which we let the user replace the define that uses it, for that one we use the original define in the individual APISs.

in my eyes, all of these should be defined only if not already defined, especially considering that attributes are non standard, and i don't see the disadvantage in doing so.
that said, i suppose it's a petty comment, considering that modules always carry their own header and modifying these lines won't cause major merge conflicts (not per API line)

so personally i would vote for something along the lines of #6900 (comment)
but since you tested it and saw no problem with all the common compilers, if you feel that's not needed, go ahead and merge.

Provide greater flexibility in case specific environments/compilers
require additional tweaking outside the header file.
@oranagra oranagra merged commit 11cd983 into redis:unstable Aug 14, 2020
@oranagra
Copy link
Member

@natoscott thank you for suggesting it, and thank you for keeping up with my requests.
I hope we'll have a smoother sailing from here on. this header file was causing pain for many.

JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Aug 21, 2020
…on (redis#6900)

In order to keep the redismodule.h self-contained but still usable with
gcc v10 and later, annotate each API function tentative definition with
the __common__ attribute.  This avoids the 'multiple definition' errors
modules will otherwise see for all API functions at link time.

Further details at gcc.gnu.org/gcc-10/porting_to.html

Turn the existing __attribute__ ((unused)), ((__common__)) and ((print))
annotations into conditional macros for any compilers not accepting this
syntax.  These macros only expand to API annotations under gcc.

Provide a pre- and post- macro for every API function, so that they can
be defined differently by the file that includes redismodule.h.

Removing REDISMODULE_API_FUNC in the interest of keeping the function
declarations readable.

Co-authored-by: Yossi Gottlieb <[email protected]>
Co-authored-by: Oran Agra <[email protected]>
whq4123 added a commit to whq4123/redis that referenced this pull request Aug 22, 2020
* fix server crash for STRALGO command

* using moreargs variable

* EAGAIN for tls during diskless load

* Disconnect chained replicas when the replica performs PSYNC with the master always to avoid replication offset mismatch between master and chained replicas.

* Fix #7306 less aggressively.

Citing from the issue:

btw I suggest we change this fix to something else:
* We revert the fix.
* We add a call that disconnects chained replicas in the place where we trim the replica (that is a master i this case) offset.
This way we can avoid disconnections when there is no trimming of the backlog.

Note that we now want to disconnect replicas asynchronously in
disconnectSlaves(), because it's in general safer now that we can call
it from freeClient(). Otherwise for instance the command:

    CLIENT KILL TYPE master

May crash: clientCommand() starts running the linked of of clients,
looking for clients to kill. However it finds the master, kills it
calling freeClient(), but this in turn calls replicationCacheMaster()
that may also call disconnectSlaves() now. So the linked list iterator
of the clientCommand() will no longer be valid.

* Make disconnectSlaves() synchronous in the base case.

Otherwise we run into that:

Backtrace:
src/redis-server 127.0.0.1:21322(logStackTrace+0x45)[0x479035]
src/redis-server 127.0.0.1:21322(sigsegvHandler+0xb9)[0x4797f9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fd373c5e390]
src/redis-server 127.0.0.1:21322(_serverAssert+0x6a)[0x47660a]
src/redis-server 127.0.0.1:21322(freeReplicationBacklog+0x42)[0x451282]
src/redis-server 127.0.0.1:21322[0x4552d4]
src/redis-server 127.0.0.1:21322[0x4c5593]
src/redis-server 127.0.0.1:21322(aeProcessEvents+0x2e6)[0x42e786]
src/redis-server 127.0.0.1:21322(aeMain+0x1d)[0x42eb0d]
src/redis-server 127.0.0.1:21322(main+0x4c5)[0x42b145]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fd3738a3830]
src/redis-server 127.0.0.1:21322(_start+0x29)[0x42b409]

Since we disconnect all the replicas and free the replication backlog in
certain replication paths, and the code that will free the replication
backlog expects that no replica is connected.

However we still need to free the replicas asynchronously in certain
cases, as documented in the top comment of disconnectSlaves().

* add CI for 32bit build

* PSYNC2: second_replid_offset should be real meaningful offset

After adjustMeaningfulReplOffset(), all the other related variable
should be updated, including server.second_replid_offset.

Or the old version redis like 5.0 may receive wrong data from
replication stream, cause redis 5.0 can sync with redis 6.0,
but doesn't know meaningful offset.

* Clarify what is happening in PR #7320.

* Test: PSYNC2 test can now show server logs.

* tests: each test client work on a distinct port range

apparently when running tests in parallel (the default of --clients 16),
there's a chance for two tests to use the same port.
specifically, one test might shutdown a master and still have the
replica up, and then another test will re-use the port number of master
for another master, and then that replica will connect to the master of
the other test.

this can cause a master to count too many full syncs and fail a test if
we run the tests with --single integration/psync2 --loop --stop

see Probmem 2 in #7314

* avoid using sendfile if tls-replication is enabled

this obviously broke the tests, but went unnoticed so far since tls
wasn't often tested.

* daily CI test with tls

* Replication: log backlog creation event.

* fix clusters mixing accidentally by gossip

`clusterStartHandshake` will start hand handshake
and eventually send CLUSTER MEET message, which is strictly prohibited
in the REDIS CLUSTER SPEC.
Only system administrator can initiate CLUSTER MEET message.
Futher, according to the SPEC, rather than IP/PORT pairs, only nodeid
can be trusted.

* Set a protocol error if master use the inline protocol.

We want to react a bit more aggressively if we sense that the master is
sending us some corrupted stream. By setting the protocol error we both
ensure that the replica will disconnect, and avoid caching the master so
that a full SYNC will be required. This is protective against
replication bugs.

* Remove the meaningful offset feature.

After a closer look, the Redis core devleopers all believe that this was
too fragile, caused many bugs that we didn't expect and that were very
hard to track. Better to find an alternative solution that is simpler.

* Remove the PSYNC2 meaningful offset test.

* Another meaningful offset test removed.

* Fix TLS certificate loading for chained certificates.

This impacts client verification for chained certificates (such as Lets
Encrypt certificates). Client Verify requires the full chain in order to
properly verify the certificate.

* tests: find_available_port start search from next port

i.e. don't start the search from scratch hitting the used ones again.
this will also reduce the likelihood of collisions (if there are any
left) by increasing the time until we re-use a port we did use in the
past.

* Drop useless line from replicationCacheMaster().

* 32bit CI needs to build modules correctly

* revive meaningful offset tests

* adjust revived meaningful offset tests

these tests create several edge cases that are otherwise uncovered (at
least not consistently) by the test suite, so although they're no longer
testing what they were meant to test, it's still a good idea to keep
them in hope that they'll expose some issue in the future.

* Replication: showLatestBacklog() refactored out.

* Test: add the tracking unit as default.

* Test: take PSYNC2 test master timeout high during switch.

This will likely avoid false positives due to trailing pings.

* Fix handling of special chars in ACL LOAD.

Now it is also possible for ACL SETUSER to accept empty strings
as valid operations (doing nothing), so for instance

    ACL SETUSER myuser ""

Will have just the effect of creating a user in the default state.

This should fix #7329.

* fix pingoff  test race

* donot free protected client in freeClientsInAsyncFreeQueue

related #7234

* AOF: append origin SET if no expire option

* Revert "avoid using sendfile if tls-replication is enabled"

This reverts commit b9abecfc4c97db01fa6f09180c68a92ea5974ac1.

* Revert "Implements sendfile for redis."

This reverts commit 9cf500a3f67e4e2ce51414c354e3472faf095d5b.

* return the correct proto version
HELLO should return the current proto version, while the code hardcoded
3

* Avoid rejecting WATCH / UNWATCH, like MULTI/EXEC/DISCARD

Much like MULTI/EXEC/DISCARD, the WATCH and UNWATCH are not actually
operating on the database or server state, but instead operate on the
client state. the client may send them all in one long pipeline and check
all the responses only at the end, so failing them may lead to a
mismatch between the client state on the server and the one on the
client end, and execute the wrong commands (ones that were meant to be
discarded)

the watched keys are not actually stored in the client struct, but they
are in fact part of the client state. for instance, they're not cleared
or moved in SWAPDB or FLUSHDB.

* Don't queue commands in an already aborted MULTI state

* fix disconnectSlaves, to try to free each slave.

the recent change in that loop (iteration rather than waiting for it to
be empty) was intended to avoid an endless loop in case some slave would
refuse to be freed.

but the lookup of the first client remained, which would have caused it
to try the first one again and again instead of moving on.

* fix server crash in STRALGO command

* Temporary fix for #7353 issue about EVAL during -BUSY.

* Adapt EVAL+busy script test to new behavior.

* LRANK: Add command (the command will be renamed LPOS).

The `LRANK` command returns the index (position) of a given element
within a list. Using the `direction` argument it is possible to specify
going from head to tail (acending, 1) or from tail to head (decending,
-1). Only the first found index is returend. The complexity is O(N).

When using lists as a queue it can be of interest at what position a
given element is, for instance to monitor a job processing through a
work queue. This came up within the Python `rq` project which is based
on Redis[0].

[0]: https://github.com/rq/rq/issues/1197

Signed-off-by: Paul Spooren <[email protected]>

* LPOS: implement the final design.

* LPOS: update to latest proposal.

See https://gist.github.com/antirez/3591c5096bc79cad8b5a992e08304f48

* LPOS: tests + crash fix.

* fix memory leak

* help.h updated.

* Fix LCS object type checking. Related to #7379.

* Fix RM_ScanKey module api not to return int encoded strings

The scan key module API provides the scan callback with the current
field name and value (if it exists). Those arguments are RedisModuleString*
which means it supposes to point to robj which is encoded as a string.
Using createStringObjectFromLongLong function might return robj that
points to an integer and so break a module that tries for example to
use RedisModule_StringPtrLen on the given field/value.

The PR introduces a fix that uses the createObject function and sdsfromlonglong function.
Using those function promise that the field and value pass to the to the
scan callback will be Strings.

The PR also changes the Scan test module to use RedisModule_StringPtrLen
to catch the issue. without this, the issue is hidden because
RedisModule_ReplyWithString knows to handle integer encoding of the
given robj (RedisModuleString).

The PR also introduces a new test to verify the issue is solved.

* cluster.c remove if of clusterSendFail in markNodeAsFailingIfNeeded

* Tracking: fix enableBcastTrackingForPrefix() invalid sdslen() call.

Related to #7387.

* Use cluster connections too, to limit maxclients.

See #7401.

* fix comments in listpack.c

* ensure SHUTDOWN_NOSAVE in Sentinel mode

- enforcing of SHUTDOWN_NOSAVE flag in one place to make it consitent
  when running in Sentinel mode

* Fix comments in function raxLowWalk of listpack.c

* fix memory leak in sentinel connection sharing

* Clarify maxclients and cluster in conf. Remove myself too.

* Fix BITFIELD i64 type handling, see #7417.

* Include cluster.h for getClusterConnectionsCount().

* EXEC always fails with EXECABORT and multi-state is cleared

In order to support the use of multi-exec in pipeline, it is important that
MULTI and EXEC are never rejected and it is easy for the client to know if the
connection is still in multi state.

It was easy to make sure MULTI and DISCARD never fail (done by previous
commits) since these only change the client state and don't do any actual
change in the server, but EXEC is a different story.

Since in the past, it was possible for clients to handle some EXEC errors and
retry the EXEC, we now can't affort to return any error on EXEC other than
EXECABORT, which now carries with it the real reason for the abort too.

Other fixes in this commit:
- Some checks that where performed at the time of queuing need to be re-
  validated when EXEC runs, for instance if the transaction contains writes
  commands, it needs to be aborted. there was one check that was already done
  in execCommand (-READONLY), but other checks where missing: -OOM, -MISCONF,
  -NOREPLICAS, -MASTERDOWN
- When a command is rejected by processCommand it was rejected with addReply,
  which was not recognized as an error in case the bad command came from the
  master. this will enable to count or MONITOR these errors in the future.
- make it easier for tests to create additional (non deferred) clients.
- add tests for the fixes of this commit.

* updated copyright year

Changed "2015" to "2020"

* LPOS: option FIRST renamed RANK.

* Update comment to clarify change in #7398.

* BITOP: propagate only when it really SET or DEL targetkey (#5783)

For example:
    BITOP not targetkey sourcekey

If targetkey and sourcekey doesn't exist, BITOP has no effect,
we do not propagate it, thus can save aof and replica flow.

* change references to the github repo location (#7479)

* tests/valgrind: don't use debug restart (#7404)

* tests/valgrind: don't use debug restart

DEBUG REATART causes two issues:
1. it uses execve which replaces the original process and valgrind doesn't
   have a chance to check for errors, so leaks go unreported.
2. valgrind report invalid calls to close() which we're unable to resolve.

So now the tests use restart_server mechanism in the tests, that terminates
the old server and starts a new one, new PID, but same stdout, stderr.

since the stderr can contain two or more valgrind report, it is not enough
to just check for the absence of leaks, we also need to check for some known
errors, we do both, and fail if we either find an error, or can't find a
report saying there are no leaks.

other changes:
- when killing a server that was already terminated we check for leaks too.
- adding DEBUG LEAK which was used to test it.
- adding --trace-children to valgrind, although no longer needed.
- since the stdout contains two or more runs, we need slightly different way
  of checking if the new process is up (explicitly looking for the new PID)
- move the code that handles --wait-server to happen earlier (before
  watching the startup message in the log), and serve the restarted server too.

* squashme - CR fixes

* stabilize tests that look for log lines (#7367)

tests were sensitive to additional log lines appearing in the log
causing the search to come empty handed.

instead of just looking for the n last log lines, capture the log lines
before performing the action, and then search from that offset.

* skip a test that uses +inf on valgrind (#7440)

On some platforms strtold("+inf") with valgrind returns a non-inf result

[err]: INCRBYFLOAT does not allow NaN or Infinity in tests/unit/type/incr.tcl
Expected 'ERR*would produce*' to equal or match '1189731495357231765085759.....'

* defrag.c activeDefragSdsListAndDict when defrag sdsele, We can't use (#7492)

it to calculate hash, we should use newsds.

* RESTORE ABSTTL won't store expired keys into the db (#7472)

Similarly to EXPIREAT with TTL in the past, which implicitly deletes the
key and return success, RESTORE should not store key that are already
expired into the db.
When used together with REPLACE it should emit a DEL to keyspace
notification and replication stream.

* redis-cli --bigkeys fixed to handle non-printable key names

* redis-cli --hotkeys fixed to handle non-printable key names

* TLS: Add missing redis-cli options. (#7456)

* Tests: fix and reintroduce redis-cli tests.

These tests have been broken and disabled for 10 years now!

* TLS: add remaining redis-cli support.

This adds support for the redis-cli --pipe, --rdb and --replica options
previously unsupported in --tls mode.

* Fix writeConn().

* Use pkg-config to properly detect libssl and libcrypto libraries (#7452)

* TLS: Ignore client cert when tls-auth-clients off. (#7457)

* TLS: Session caching configuration support. (#7420)

* TLS: Session caching configuration support.
* TLS: Remove redundant config initialization.

* Add missing latency-monitor tcl test to test_helper.tcl. (#6782)

* Fix typo in deps README (#7500)

* fix: typo in CI job name (#7466)

* fix benchmark in cluster mode fails to authenticate (#7488)

Co-authored-by: Oran Agra <[email protected]> (styling)

* STORE variants: SINTER,SUNION,SDIFF,ZUNION use setKey instead of dbDelete+dbAdd (#7489)

one of the differences (other than consistent code with SORT, GEORADIUS), is that the LFU of the old key is retained.

* fix description about ziplist, the code is ok (#6318)

* fix description about ZIP_BIG_PREVLEN(the code is ok), it's similar to
antirez#4705

* fix description about ziplist entry encoding field (the code is ok),
the max length should be 2^32 - 1 when encoding is 5 bytes

* update release scripts for new hosts, and CI to run more tests (#7480)

* update daily CI to include cluster and sentinel tests
* update daily CI to run when creating a new release
* update release scripts to work on the new redis.io hosts

* runtest --stop pause stops before terminating the redis server (#7513)

in the majority of the cases (on this rarely used feature) we want to
stop and be able to connect to the shard with redis-cli.
since these are two different processes interracting with the tty we
need to stop both, and we'll have to hit enter twice, but it's not that
bad considering it is rarely used.

* fix recently added time sensitive tests failing with valgrind (#7512)

interestingly the latency monitor test fails because valgrind is slow
enough so that the time inside PEXPIREAT command from the moment of
the first mstime() call to get the basetime until checkAlreadyExpired
calls mstime() again is more than 1ms, and that test was too sensitive.

using this opportunity to speed up the test (unrelated to the failure)
the fix is just the longer time passed to PEXPIRE.

* RESTORE ABSTTL skip expired keys - leak (#7511)

* Replica always reports master's config epoch in CLUSTER NODES output. (#7235)

* Fix out of update help info in tcl tests. (#7516)

Before this commit, the output of "./runtest-cluster --help" is incorrect.
After this commit, the format of the following 3 output is consistent:
./runtest --help
./runtest-cluster --help
./runtest-sentinel --help

* redis-cli tests, fix valgrind timing issue (#7519)

this test when run with valgrind on github actions takes 160 seconds

* diskless master disconnect replicas when rdb child failed (#7518)

in case the rdb child failed, crashed or terminated unexpectedly redis
would have marked the replica clients with repl_put_online_on_ack and
then kill them only after a minute when no ack was received.

it would not stream anything to these connections, so the only effect of
this bug is a delay of 1 minute in the replicas attempt to re-connect.

* Refactor RM_KeyType() by using macro. (#7486)

* Fix command help for unexpected options (#7476)

* correct error msg for num connections reaching maxclients in cluster mode (#7444)

* Add registers dump support for Apple silicon (#7453)

Export following environment variables before building on macOS on Apple silicon

export ARCH_FLAGS="-arch arm64"
export SDK_NAME=macosx
export SDK_PATH=$(xcrun --show-sdk-path --sdk $SDK_NAME)
export CFLAGS="$ARCH_FLAGS -isysroot $SDK_PATH -I$SDK_PATH/usr/include"
export CXXFLAGS=$CFLAGS
export LDFLAGS="$ARCH_FLAGS"
export CC="$(xcrun -sdk $SDK_PATH --find clang) $CFLAGS"
export CXX="$(xcrun -sdk $SDK_PATH --find clang++) $CXXFLAGS"
export LD="$(xcrun -sdk $SDK_PATH --find ld) $LDFLAGS"

make
make test
..
All tests passed without errors!

Backtrack logging assumes x86 and required updating

* Notify systemd on sentinel startup (#7168)

Co-authored-by: Daniel Murnane <[email protected]>

* Send null for invalidate on flush (#7469)

* Stream avoid duplicate parse id (#7450)

* Support passing stack allocated module strings to moduleCreateArgvFromUserFormat (#7528)

Specifically, the key passed to the module aof_rewrite callback is a stack allocated robj. When passing it to RedisModule_EmitAOF (with appropriate "s" fmt string) redis used to panic when trying to inc the ref count of the stack allocated robj. Now support such robjs by coying them to a new heap robj. This doesn't affect performance because using the alternative "c" or "b" format strings also copies the input to a new heap robj.

* Adds GitHub issue templates (#7468)

Co-authored-by: Oran Agra <[email protected]>
Co-authored-by: yoav-steinberg <[email protected]>

* Adds SHA256SUM to redis-stable tarball upload

* GitHub Actions workflows - use latest version of actions/checkout (#7534)

* Refactor streamAppendItem() by deleting redundancy condition. (#7487)

It will never happen that "lp != NULL && lp_bytes >= server.stream_node_max_bytes".
Assume that "lp != NULL && lp_bytes >= server.stream_node_max_bytes",
we got the following conditions:
a. lp != NULL
b. lp_bytes >= server.stream_node_max_bytes

If server.stream_node_max_bytes is 0, given condition a, condition b is always satisfied
If server.stream_node_max_bytes is not 0, given condition a and condition b, the codes just a
	few lines above set lp to NULL, a controdiction with condition a

So that condition b is recundant. We could delete it safely.

* Run daily CI on PRs to release a branch (#7535)

* Fix cluster redirect for module command with no firstkey. (#7539)

Before this commit, processCommand() did not notice that cmd could be a module command
which declared `getkeys-api` and handled it for the purpose of cluster redirect it
as if it doesn't use any keys.

This commit fixed it by reusing the codes in addReplyCommand().

* replication: need handle -NOPERM error after send ping (#7538)

* add missing caching command in client help (#7399)

* Add missing calls to raxStop (#7532)

Since the dynamic allocations in raxIterator are only used for deep walks, memory
leak due to missing call to raxStop can only happen for rax with key names longer
than 32 bytes.

Out of all the missing calls, the only ones that may lead to a leak are the rax
for consumer groups and consumers, and these were only in AOFRW and rdbSave, which
normally only happen in fork or at shutdown.

* Fix deprecated tail syntax in tests (#7543)

* Clarification on the bug that was fixed in PR #7539. (#7541)

Before that PR, processCommand() did not notice that cmd could be a module
command in which case getkeys_proc member has a different meaning.

The outcome was that a module command which doesn't take any key names in its
arguments (similar to SLOWLOG) would be handled as if it might have key name arguments
(similar to MEMORY), would consider cluster redirect but will end up with 0 keys
after an excessive call to getKeysFromCommand, and eventually do the right thing.

* Fixes to release scripts (#7547)

* Tests: drop TCL 8.6 dependency. (#7548)

This re-implements the redis-cli --pipe test so it no longer depends on a close feature available only in TCL 8.6.

Basically what this test does is run redis-cli --pipe, generates a bunch of commands and pipes them through redis-cli, and inspects the result in both Redis and the redis-cli output.

To do that, we need to close stdin for redis-cli to indicate we're done so it can flush its buffers and exit. TCL has bi-directional channels can only offers a way to "one-way close" a channel with TCL 8.6. To work around that, we now generate the commands into a file and feed that file to redis-cli directly.

As we're writing to an actual file, the number of commands is now reduced.

* testsuite may leave servers alive on error (#7549)

in cases where you have
test name {
  start_server {
    start_server {
      assert
    }
  }
}

the exception will be thrown to the test proc, and the servers are
supposed to be killed on the way out. but it seems there was always a
bug of not cleaning the server stack, and recently (#7404) we started
relying on that stack in order to kill them, so with that bug sometimes
we would have tried to kill the same server twice, and leave one alive.

luckly, in most cases the pattern is:
start_server {
  test name {
  }
}

* Properly reset errno for rdbLoad (#7542)

* Fix typo in deps/README.md (#7553)

* Outdent github issues guidelines

* Add contribution guidelines for vulnerability reports

* Fix harmless bug in rioConnRead (#7557)

this code is in use only if the master is disk-based, and the replica is
diskless. In this case we use a buffered reader, but we must avoid reading
past the rdb file, into the command stream. which Luckly rdb.c doesn't
really attempt to do (it knows how much it should read).

When rioConnRead detects that the extra buffering attempt reaches beyond
the read limit it should read less, but if the caller actually requested
more, then it should return with an error rather than a short read. the
bug would have resulted in short read.

in order to fix it, the code must consider the real requested size, and
not the extra buffering size.

* This PR introduces a new loaded keyspace event (#7536)

Co-authored-by: Oran Agra <[email protected]>
Co-authored-by: Itamar Haber <[email protected]>

* Stabilize bgsave test that sometimes fails with valgrind (#7559)

on ci.redis.io the test fails a lot, reporting that bgsave didn't end.
increaseing the timeout we wait for that bgsave to get aborted.
in addition to that, i also verify that it indeed got aborted by
checking that the save counter wasn't reset.

add another test to verify that a successful bgsave indeed resets the
change counter.

* more strict check in rioConnRead (#7564)

* Fix prepareForShutdown function declaration (#7566)

* TLS: support cluster/replication without tls-port.

Initialize and configure OpenSSL even when tls-port is not used, because
we may still have tls-cluster or tls-replication.

Also, make sure to reconfigure OpenSSL when these parameters are changed
as TLS could have been enabled for the first time.

* Daily github action: run cluster and sentinel tests with tls (#7575)

* Add optional tls verification (#7502)

Adds an `optional` value to the previously boolean `tls-auth-clients` configuration keyword.

Co-authored-by: Yossi Gottlieb <[email protected]>

* Fix failing tests due to issues with wait_for_log_message (#7572)

- the test now waits for specific set of log messages rather than wait for
  timeout looking for just one message.
- we don't wanna sample the current length of the log after an action, due
  to a race, we need to start the search from the line number of the last
  message we where waiting for.
- when attempting to trigger a full sync, use multi-exec to avoid a race
  where the replica manages to re-connect before we completed the set of
  actions that should force a full sync.
- fix verify_log_message which was broken and unused

* TLS: Propagate and handle SSL_new() failures. (#7576)

The connection API may create an accepted connection object in an error
state, and callers are expected to check it before attempting to use it.

Co-authored-by: mrpre <[email protected]>

* Fix TLS cluster tests. (#7578)

Fix consistency test added in af5167b7f without considering TLS
redis-cli configuration.

* fix leak in error handling of debug populate command (#7062)

valsize was not modified during the for loop below instead of getting from c->argv[4], therefore there is no need to put inside the for loop.. Moreover, putting the check outside loop will also avoid memory leaking, decrRefCount(key) should be called in the original code if we put the check in for loop

* Add SignalModifiedKey hook in XGROUP CREATE with MKSTREAM option (#7562)

* Avoid an out-of-bounds read in the redis-sentinel (#7443)

The Redis sentinel would crash with a segfault after a few minutes
because it tried to read from a page without read permissions. Check up
front whether the sds is long enough to contain redis:slave or
redis:master before memcmp() as is done everywhere else in
sentinelRefreshInstanceInfo().

Bug report and commit message from Theo Buehler. Fix from Nam Nguyen.

Co-authored-by: Nam Nguyen <[email protected]>

* Show threading configuration in INFO output (#7446)

Co-authored-by: Oran Agra <[email protected]>

* Clarify  RM_BlockClient() error condition. (#6093)

* Call propagate instead of writing directly to AOF/replicas (#6658)

Use higher-level API to funnel all generic propagation through
single function call.

* broadcast a PONG message when slot's migration is over, which may reduce the moved request of clients (#7571)

* Fix running single test 14-consistency-check.tcl (#7587)

* CI: Add daily CentOS 7.x jobs. (#7582)

* Remove dead code from update_zmalloc_stat_alloc (#7589)

this seems like leftover from before 6eb51bf

* module hook for master link up missing on successful psync (#7584)

besides, hooks test was time sensitive. when the replica managed to
reconnect quickly after the client kill, the test would fail

* Fix test-centos7-tls daily job. (#7598)

* remove duplicate semicolon (#7438)

* fix new rdb test failing on timing issues (#7604)

apparenlty on github actions sometimes 500ms is not enough

* Add a ZMSCORE command returning an array of scores. (#7593)

Syntax: `ZMSCORE KEY MEMBER [MEMBER ...]`

This is an extension of #2359
amended by Tyson Andre to work with the changed unstable API,
add more tests, and consistently return an array.

- It seemed as if it would be more likely to get reviewed
  after updating the implementation.

Currently, multi commands or lua scripting to call zscore multiple times
would almost definitely be less efficient than a native ZMSCORE
for the following reasons:

- Need to fetch the set from the string every time instead of reusing the C
  pointer.
- Using pipelining or multi-commands would result in more bytes sent by
  the client for the repeated `ZMSCORE KEY` sections.
- Need to specially encode the data and decode it from the client
  for lua-based solutions.
- The fastest solution I've seen for large sets(thousands or millions)
  involves lua and a variadic ZADD, then a ZINTERSECT, then a ZRANGE 0 -1,
  then UNLINK of a temporary set (or lua). This is still inefficient.

Co-authored-by: Tyson Andre <[email protected]>

* Fix tests/cluster/cluster.tcl about wrong usage of lrange. (#6702)

* add force option to 'create-cluster create' script call (#7612)

* reintroduce REDISCLI_CLUSTER_YES env variable in redis-cli

the variable was introduced only in the 5.0 branch in #5879 bc6c1c40db

* redis-cli --cluster-yes - negate force flag for clarity

this internal flag is there so that some commands do not comply to `--cluster-yes`

* [Redis-benchmark] Support zset type

* Assertion and panic, print crash log without generating SIGSEGV

This makes it possible to add tests that generate assertions, and run
them with valgrind, making sure that there are no memory violations
prior to the assertion.

New config options:
- crash-log-enabled - can be disabled for cleaner core dumps
- crash-memcheck-enabled - useful for faster termination after a crash
- use-exit-on-panic - to be used by the test suite so that valgrind can
  detect leaks and memory corruptions

Other changes:
- Crash log is printed even on system that dont HAVE_BACKTRACE, i.e. in
  both SIGSEGV and assert / panic
- Assertion and panic won't print registers and code around EIP (which
  was useless), but will do fast memory test (which may still indicate
  that the assertion was due to memory corrpution)

I had to reshuffle code in order to re-use it, so i extracted come code
into function without actually doing any changes to the code:
- logServerInfo
- logModulesInfo
- doFastMemoryTest (with the exception of it being conditional)
- dumpCodeAroundEIP

changes to the crash report on segfault:
- logRegisters is called right after the stack trace (before info) done
  just in order to have more re-usable code
- stack trace skips the first two items on the stack (the crash log and
  signal handler functions)

* Fix potential race in bugReportStart

this race would only happen when two threads paniced at the same time,
and even then the only consequence is some extra log lines.

race reported in #7391

* Accelerate diskless master connections, and general re-connections (#6271)

Diskless master has some inherent latencies.
1) fork starts with delay from cron rather than immediately
2) replica is put online only after an ACK. but the ACK
   was sent only once a second.
3) but even if it would arrive immediately, it will not
   register in case cron didn't yet detect that the fork is done.

Besides that, when a replica disconnects, it doesn't immediately
attempts to re-connect, it waits for replication cron (one per second).
in case it was already online, it may be important to try to re-connect
as soon as possible, so that the backlog at the master doesn't vanish.

In case it disconnected during rdb transfer, one can argue that it's
not very important to re-connect immediately, but this is needed for the
"diskless loading short read" test to be able to run 100 iterations in 5
seconds, rather than 3 (waiting for replication cron re-connection)

changes in this commit:
1) sync command starts a fork immediately if no sync_delay is configured
2) replica sends REPLCONF ACK when done reading the rdb (rather than on 1s cron)
3) when a replica unexpectedly disconnets, it immediately tries to
   re-connect rather than waiting 1s
4) when when a child exits, if there is another replica waiting, we spawn a new
   one right away, instead of waiting for 1s replicationCron.
5) added a call to connectWithMaster from replicationSetMaster. which is called
   from the REPLICAOF command but also in 3 places in cluster.c, in all of
   these the connection attempt will now be immediate instead of delayed by 1
   second.

side note:
we can add a call to rdbPipeReadHandler in replconfCommand when getting
a REPLCONF ACK from the replica to solve a race where the replica got
the entire rdb and EOF marker before we detected that the pipe was
closed.
in the test i did see this race happens in one about of some 300 runs,
but i concluded that this race is unlikely in real life (where the
replica is on another host and we're more likely to first detect the
pipe was closed.
the test runs 100 iterations in 3 seconds, so in some cases it'll take 4
seconds instead (waiting for another REPLCONF ACK).

Removing unneeded startBgsaveForReplication from updateSlavesWaitingForBgsave
Now that CheckChildrenDone is calling the new replicationStartPendingFork
(extracted from serverCron) there's actually no need to call
startBgsaveForReplication from updateSlavesWaitingForBgsave anymore,
since as soon as updateSlavesWaitingForBgsave returns, CheckChildrenDone is
calling replicationStartPendingFork that handles that anyway.
The code in updateSlavesWaitingForBgsave had a bug in which it ignored
repl-diskless-sync-delay, but removing that code shows that this bug was
hiding another bug, which is that the max_idle should have used >= and
not >, this one second delay has a big impact on my new test.

* Remove hiredis so we can add it as a subtree

* Squashed 'deps/hiredis/' content from commit 39de5267c

git-subtree-dir: deps/hiredis
git-subtree-split: 39de5267c092859b4cab4bdf79081e9634b70e39

* fix migration's broadcast PONG message, after the slot modification (#7590)

* remove superfluous else block (#7620)

The else block would be executed when newlen == 0 and in the case memmove won't be called, so there's no need to set start.

* Fix applying zero offset to null pointer when creating moduleFreeContextReusedClient (#7323)

Before this fix we where attempting to select a db before creating db the DB, see: #7323

This issue doesn't seem to have any implications, since the selected DB index is 0,
the db pointer remains NULL, and will later be correctly set before using this dummy
client for the first time.

As we know, we call 'moduleInitModulesSystem()' before 'initServer()'. We will allocate
memory for server.db in 'initServer', but we call 'createClient()' that will call 'selectDb()'
in 'moduleInitModulesSystem()', before the databases where created. Instead, we should call
'createClient()' for moduleFreeContextReusedClient after 'initServer()'.

* fix memory leak in ACLLoadFromFile error handling (#7623)

* [Redis-benchmark] Remove zrem test, add zpopmin test

* Create PUSH handlers in redis-cli

Add logic to redis-cli to display RESP3 PUSH messages when we detect
STDOUT is a tty, with an optional command-line argument to override
the default behavior.

The new argument:  --show-pushes <yn>

Examples:

$ redis-cli -3 --show-pushes no
$ echo "client tracking on\nget k1\nset k1 v1"| redis-cli -3 --show-pushes y

* Print error info if failed opening config file (#6943)

* Optimize calls to mstime in trackInstantaneousMetric() (#6472)

* add sentinel command help (#7579)

* Reduce the probability of failure when start redis in runtest-cluster #7554 (#7635)

When runtest-cluster, at first, we need to create a cluster use spawn_instance,
a port which is not used is choosen, however sometimes we can't run server on
the port. possibley due to a race with another process taking it first.
such as redis/redis/runs/896537490. It may be due to the machine problem or
In order to reduce the probability of failure when start redis in
runtest-cluster, we attemp to use another port when find server do not start up.

Co-authored-by: Oran Agra <[email protected]>
Co-authored-by: yanhui13 <[email protected]>

* see #7544, added RedisModule_HoldString api. (#7577)

Added RedisModule_HoldString that either returns a
shallow copy of the given String (by increasing
the String ref count) or a new deep copy of String
in case its not possible to get a shallow copy.

Co-authored-by: Itamar Haber <[email protected]>

* fixed a typo in redis.conf (#7636)

* see #7250, fix signature of RedisModule_DeauthenticateAndCloseClient (#7645)

In redismodule.h, RedisModule_DeauthenticateAndCloseClient returns void
`void REDISMODULE_API_FUNC(RedisModule_DeauthenticateAndCloseClient)(RedisModuleCtx *ctx, uint64_t client_id);`
But in module.c, RM_DeauthenticateAndCloseClient returns int
`int RM_DeauthenticateAndCloseClient(RedisModuleCtx *ctx, uint64_t client_id)`

It it safe to change return value from `void` to `int` from the user's perspective.

* Removes dead code (#7642)

Appears to be handled by server.stream_node_max_bytes in reality.

* Avoid redundant calls to signalKeyAsReady (#7625)

signalKeyAsReady has some overhead (namely dictFind) so we should
only call it when there are clients blocked on the relevant type (BLOCKED_*)

* Prevent dictRehashMilliseconds from rehashing while a safe iterator is present (#5948)

* Fix comment about ACLGetCommandPerm()

* Run daily workflow on main repo only (no forks). (#7646)

* Implement SMISMEMBER key member [member ...] (#7615)

This is a rebased version of #3078 originally by shaharmor
with the following patches by TysonAndre made after rebasing
to work with the updated C API:

1. Add 2 more unit tests
   (wrong argument count error message, integer over 64 bits)
2. Use addReplyArrayLen instead of addReplyMultiBulkLen.
3. Undo changes to src/help.h - for the ZMSCORE PR,
   I heard those should instead be automatically
   generated from the redis-doc repo if it gets updated

Motivations:

- Example use case: Client code to efficiently check if each element of a set
  of 1000 items is a member of a set of 10 million items.
  (Similar to reasons for working on #7593)
- HMGET and ZMSCORE already exist. This may lead to developers deciding
  to implement functionality that's best suited to a regular set with a
  data type of sorted set or hash map instead, for the multi-get support.

Currently, multi commands or lua scripting to call sismember multiple times
would almost definitely be less efficient than a native smismember
for the following reasons:

- Need to fetch the set from the string every time
  instead of reusing the C pointer.
- Using pipelining or multi-commands would result in more bytes sent
  and received by the client for the repeated SISMEMBER KEY sections.
- Need to specially encode the data and decode it from the client
  for lua-based solutions.
- Proposed solutions using Lua or SADD/SDIFF could trigger writes to
  memory, which is undesirable on a redis replica server
  or when commands get replicated to replicas.

Co-Authored-By: Shahar Mor <[email protected]>
Co-Authored-By: Tyson Andre <[email protected]>

* Start redis after network is online (#7639)

The two lines allow systemd to start redis.service after the network is online. Only after the network is online that Redis could bind to IP address other than 127.0.0.1 during initial boot up process.

* Correct error message of runtest-cluster and runtest-moduleapi (#7647)

* using proto-max-bulk-len in checkStringLength for SETRANGE and APPEND

* config: proto-max-bulk-len must be 1mb or greater

* CLIENT_MASTER should ignore server.proto_max_bulk_len

* Fix unidentical function declaration in bio.c.  lazyfree.c: lazyfreeFreeSlotsMapFromBioThread (#7228)

* redis-benchmark: fix wrong random key for hset (#4895)

* allow --pattern to be used along with --bigkeys (#3586)

Adds --pattern option to cli's --bigkeys, --hotkeys & --scan modes

* Adds redis-cli and redis-benchmark dependencies for `make test` target

Obsoletes the need to run `make` before `make test`.

* Test:Fix invalid cases in hash.tcl and dump.tcl (#4611)

* Replace usage of wrongtypeerr with helper (#7633)

* Replace usage of wrongtypeerr with helper

* Fixed timer warning (#5953)

* fix misleading typo hasActiveChildProcess doc comment (#7588)

and a misspell in rax.c

* Add oom-score-adj configuration option to control Linux OOM killer. (#1690)

Add Linux kernel OOM killer control option.

This adds the ability to control the Linux OOM killer oom_score_adj
parameter for all Redis processes, depending on the process role (i.e.
master, replica, background child).

A oom-score-adj global boolean flag control this feature. In addition,
specific values can be configured using oom-score-adj-values if
additional tuning is required.

* Set the initial seed for random() (#5679)

* wait command optimization (#7333)

Client that issued WAIT last will most likely have the highest replication offset, so imagine a probably common case where all clients are waiting for the same number of replicas. we prefer the loop to start from the last client (one waiting for the highest offset), so that the optimization in the function will call replicationCountAcksByOffset for each client until it found a good one, and stop calling it for the rest of the clients.
the way the loop was implemented would mean that in such case it is likely to call replicationCountAcksByOffset for all clients.

Note: the change from > to >= is not directly related to the above.

Co-authored-by: 曹正斌 <[email protected]>

* Annotate module API functions in redismodule.h for use with -fno-common (#6900)

In order to keep the redismodule.h self-contained but still usable with
gcc v10 and later, annotate each API function tentative definition with
the __common__ attribute.  This avoids the 'multiple definition' errors
modules will otherwise see for all API functions at link time.

Further details at gcc.gnu.org/gcc-10/porting_to.html

Turn the existing __attribute__ ((unused)), ((__common__)) and ((print))
annotations into conditional macros for any compilers not accepting this
syntax.  These macros only expand to API annotations under gcc.

Provide a pre- and post- macro for every API function, so that they can
be defined differently by the file that includes redismodule.h.

Removing REDISMODULE_API_FUNC in the interest of keeping the function
declarations readable.

Co-authored-by: Yossi Gottlieb <[email protected]>
Co-authored-by: Oran Agra <[email protected]>

* Squashed 'deps/hiredis/' changes from d5b4c69b7..00272d669

00272d669 Rename sds calls so they don't conflict in Redis.

git-subtree-dir: deps/hiredis
git-subtree-split: 00272d669b11e96b8311d9bfe167c117f8887dd6

* Use Hiredis' sdscompat.h to map sds* calls to hi_sds*

* TLS: relax verification on CONFIG SET. (#7665)

Avoid re-configuring (and validating) SSL/TLS configuration on `CONFIG
SET` when TLS is not actively enabled for incoming connections, cluster
bus or replication.

This fixes failures when tests run without `--tls` on binaries that were
built with TLS support.

An additional benefit is that it's now possible to perform a multi-step
configuration process while TLS is disabled. The new configuration will
be verified and applied only when TLS is effectively enabled.

* Module API: fix missing RM_CLIENTINFO_FLAG_SSL. (#7666)

The `REDISMODULE_CLIENTINFO_FLAG_SSL` flag was already a part of the `RedisModuleClientInfo` structure but was not implemented.

* Trim trailing spaces in error replies coming from rejectCommand (#7668)

65a3307bc9 added rejectCommand which takes an robj reply and passes it
through addReplyErrorSafe to addReplyErrorLength.
The robj contains newline at it's end, but addReplyErrorSafe converts it
to spaces, and passes it to addReplyErrorLength which adds the protocol
newlines.

The result was that most error replies (like OOM) had extra two trailing
spaces in them.

* [module] using predefined REDISMODULE_NO_EXPIRE in RM_GetExpire (#7669)

It was already defined in the API header and the documentation, but not used by the implementation.

* edit auth failed message (#7648)

Edit auth failed message include user disabled case in hello command

* OOM Crash log include size of allocation attempt. (#7670)

Since users often post just the crash log in github issues, the log
print that's above it is missing.
No reason not to include the size in the panic message itself.

* PERSIST should signalModifiedKey (Like EXPIRE does) (#7671)

* Add comments on 'slave.repldboff' when use diskless replication (#7679)

* Fixed hset error since it's shared with hmset (#7678)

* Update clusterMsgDataPublish to clusterMsgModule (#7682)

Correcting the variable to clusterMsgModule.

* Fix flock cluster config may cause failure to restart after kill -9 (#7674)

After fork, the child process(redis-aof-rewrite) will get the fd opened
by the parent process(redis), when redis killed by kill -9, it will not
graceful exit(call prepareForShutdown()), so redis-aof-rewrite thread may still
alive, the fd(lock) will still be held by redis-aof-rewrite thread, and
redis restart will fail to get lock, means fail to start.

This issue was causing failures in the cluster tests in github actions.

Co-authored-by: Oran Agra <[email protected]>

* Modules: Invalidate saved_oparray after use (#7688)

We wanna avoid a chance of someone using the pointer in it after it'll be freed / realloced.

* RedisModuleEvent_LoadingProgress always at 100% progress (#7685)

It was also using the wrong struct, but luckily RedisModuleFlushInfo and RedisModuleLoadingProgress
are identical.

* use dictSlots for getting total slots number in dict (#7691)

* fix make warnings (#7692)

Co-authored-by: Salvatore Sanfilippo <[email protected]>
Co-authored-by: hwware <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
Co-authored-by: Qu Chen <[email protected]>
Co-authored-by: Oran Agra <[email protected]>
Co-authored-by: zhaozhao.zz <[email protected]>
Co-authored-by: Liu Zhen <[email protected]>
Co-authored-by: Kevin Fwu <[email protected]>
Co-authored-by: xhe <[email protected]>
Co-authored-by: Paul Spooren <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: root <[email protected]>
Co-authored-by: chenhui0212 <[email protected]>
Co-authored-by: Tomasz Poradowski <[email protected]>
Co-authored-by: Dave Nielsen <[email protected]>
Co-authored-by: zhaozhao.zz <[email protected]>
Co-authored-by: huangzhw <[email protected]>
Co-authored-by: Yossi Gottlieb <[email protected]>
Co-authored-by: James Hilliard <[email protected]>
Co-authored-by: WuYunlong <[email protected]>
Co-authored-by: Jiayuan Chen <[email protected]>
Co-authored-by: Abhishek Soni <[email protected]>
Co-authored-by: 马永泽 <[email protected]>
Co-authored-by: 杨博东 <[email protected]>
Co-authored-by: jimgreen2013 <[email protected]>
Co-authored-by: Qu Chen <[email protected]>
Co-authored-by: Developer-Ecosystem-Engineering <65677710+Developer-Ecosystem-Engineering@users.noreply.github.com>
Co-authored-by: dmurnane <[email protected]>
Co-authored-by: Daniel Murnane <[email protected]>
Co-authored-by: Luke Palmer <[email protected]>
Co-authored-by: yoav-steinberg <[email protected]>
Co-authored-by: Itamar Haber <[email protected]>
Co-authored-by: Scott Brenner <[email protected]>
Co-authored-by: Remi Collet <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
Co-authored-by: Sungho Hwang <[email protected]>
Co-authored-by: Brian P O'Rourke <[email protected]>
Co-authored-by: grishaf <[email protected]>
Co-authored-by: Yossi Gottlieb <[email protected]>
Co-authored-by: namtsui <[email protected]>
Co-authored-by: Nam Nguyen <[email protected]>
Co-authored-by: Arun Ranganathan <[email protected]>
Co-authored-by: Kevin McGehee <[email protected]>
Co-authored-by: fayadexinqing <[email protected]>
Co-authored-by: hujie <[email protected]>
Co-authored-by: Tyson Andre <[email protected]>
Co-authored-by: Tyson Andre <[email protected]>
Co-authored-by: Frank Meier <[email protected]>
Co-authored-by: Frank Meier <[email protected]>
Co-authored-by: ShooterIT <[email protected]>
Co-authored-by: michael-grunder <[email protected]>
Co-authored-by: xuannianz <[email protected]>
Co-authored-by: Wang Yuan <[email protected]>
Co-authored-by: yanhui13 <[email protected]>
Co-authored-by: Hamed Momeni <[email protected]>
Co-authored-by: 杨博东 <[email protected]>
Co-authored-by: Jim Brunner <[email protected]>
Co-authored-by: Rajat Pawar <[email protected]>
Co-authored-by: Shahar Mor <[email protected]>
Co-authored-by: YoongHM <[email protected]>
Co-authored-by: pingfan108 <[email protected]>
Co-authored-by: Muhammad Zahalqa <[email protected]>
Co-authored-by: Wagner Francisco Mezaroba <[email protected]>
Co-authored-by: Mota <[email protected]>
Co-authored-by: HarveyLiu <[email protected]>
Co-authored-by: RemRain <[email protected]>
Co-authored-by: caozb <[email protected]>
Co-authored-by: 曹正斌 <[email protected]>
Co-authored-by: Nathan Scott <[email protected]>
Co-authored-by: guybe7 <[email protected]>
Co-authored-by: Raghav Muddur <[email protected]>
Co-authored-by: huangzhw <[email protected]>
oranagra added a commit that referenced this pull request Sep 1, 2020
…on (#6900)

In order to keep the redismodule.h self-contained but still usable with
gcc v10 and later, annotate each API function tentative definition with
the __common__ attribute.  This avoids the 'multiple definition' errors
modules will otherwise see for all API functions at link time.

Further details at gcc.gnu.org/gcc-10/porting_to.html

Turn the existing __attribute__ ((unused)), ((__common__)) and ((print))
annotations into conditional macros for any compilers not accepting this
syntax.  These macros only expand to API annotations under gcc.

Provide a pre- and post- macro for every API function, so that they can
be defined differently by the file that includes redismodule.h.

Removing REDISMODULE_API_FUNC in the interest of keeping the function
declarations readable.

Co-authored-by: Yossi Gottlieb <[email protected]>
Co-authored-by: Oran Agra <[email protected]>
(cherry picked from commit 11cd983)
jschmieg pushed a commit to memKeyDB/memKeyDB that referenced this pull request Nov 6, 2020
…on (redis#6900)

In order to keep the redismodule.h self-contained but still usable with
gcc v10 and later, annotate each API function tentative definition with
the __common__ attribute.  This avoids the 'multiple definition' errors
modules will otherwise see for all API functions at link time.

Further details at gcc.gnu.org/gcc-10/porting_to.html

Turn the existing __attribute__ ((unused)), ((__common__)) and ((print))
annotations into conditional macros for any compilers not accepting this
syntax.  These macros only expand to API annotations under gcc.

Provide a pre- and post- macro for every API function, so that they can
be defined differently by the file that includes redismodule.h.

Removing REDISMODULE_API_FUNC in the interest of keeping the function
declarations readable.

Co-authored-by: Yossi Gottlieb <[email protected]>
Co-authored-by: Oran Agra <[email protected]>
(cherry picked from commit 11cd983)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants