Skip to content

pull from main branch#4

Merged
JimB123 merged 62 commits intoJimB123:unstablefrom
redis:unstable
Oct 5, 2020
Merged

pull from main branch#4
JimB123 merged 62 commits intoJimB123:unstablefrom
redis:unstable

Conversation

@JimB123
Copy link
Owner

@JimB123 JimB123 commented Oct 5, 2020

No description provided.

oranagra and others added 30 commits September 13, 2020 13:50
This test was nearly always failing on MacOS github actions.
This is because of bugs in the test that caused it to nearly always run
all 3 attempts and just look at the last one as the pass/fail creteria.

i.e. the test was nearly always running all 3 attempts and still sometimes
succeed. this is because the break condition was different than the test
completion condition.

The reason the test succeeded is because the break condition tested the
results of all 3 tests (PSETEX/PEXPIRE/PEXPIREAT), but the success check
at the end was only testing the result of PSETEX.

The reason the PEXPIREAT test nearly always failed is because it was
getting the current time wrong: getting the current second and loosing
the sub-section time, so the only chance for it to succeed is if it run
right when a certain second started.

Because i now get the time from redis, adding another round trip, i
added another 100ms to the PEXPIRE test to make it less fragile, and
also added many more attempts.

Adding many more attempts before failure to account for slow platforms,
github actions and valgrind
These tests started failing every day on http 404 (not being able to
install valgrind)
Before this commit, following command did not show --tls option:
./runtest-cluster --help
./runtest-sentinel --help
This commit makes stream object returning "stream" as encoding type in OBJECT ENCODING subcommand and DEBUG OBJECT command.

Till now, it would return "unknown"
Rather than blindly evicting until maxmemory limit is achieved, this
update adds a time limit to eviction.  While over the maxmemory limit,
eviction will process before each command AND as a timeProc when no
commands are running.

This will reduce the latency impact on many cases, especially pathological
cases like massive used memory increase during dict rehashing.

There is a risk that some other edge cases (like massive pipelined use
of MGET) could cause Redis memory usage to keep growing despite the
eviction attempts, so a new maxmemory-eviction-tenacity config is
introduced to let users mitigate that.
If one thread got SIGSEGV, function sigsegvHandler() would be triggered,
it would call bioKillThreads(). But call pthread_cancel() to cancel itself
would make it block. Also note that if SIGSEGV is caught by bio thread, it
should kill the main thread in order to give a positive report.
The fix in error handling of rdbGenericLoadStringObject is an actual bugfix
Redis 6.0 introduces I/O threads, it is so cool and efficient, we use C11
_Atomic to establish inter-thread synchronization without mutex. But the
compiler that must supports C11 _Atomic can compile redis code, that brings a
lot of inconvenience since some common platforms can't support by default such
as CentOS7, so we want to implement redis atomic type to make it more portable.

We have implemented our atomic variable for redis that only has 'relaxed'
operations in src/atomicvar.h, so we implement some operations with
'sequentially-consistent', just like the default behavior of C11 _Atomic that
can establish inter-thread synchronization. And we replace all uses of C11
_Atomic with redis atomic variable.

Our implementation of redis atomic variable uses C11 _Atomic, __atomic or
__sync macros if available, it supports most common platforms, and we will
detect automatically which feature we use. In Makefile we use a dummy file to
detect if the compiler supports C11 _Atomic. Now for gcc, we can compile redis
code theoretically if your gcc version is not less than 4.1.2(starts to support
__sync_xxx operations). Otherwise, we remove use mutex fallback to implement
redis atomic variable for performance and test. You will get compiling errors
if your compiler doesn't support all features of above.

For cover redis atomic variable tests, we add other CI jobs that build redis on
CentOS6 and CentOS7 and workflow daily jobs that run the tests on them.
For them, we just install gcc by default in order to cover different compiler
versions, gcc is 4.4.7 by default installation on CentOS6 and 4.8.5 on CentOS7.

We restore the feature that we can test redis with Helgrind to find data race
errors. But you need install Valgrind in the default path configuration firstly
before running your tests, since we use macros in helgrind.h to tell Helgrind
inter-thread happens-before relationship explicitly for avoiding false positives.
Please open an issue on github if you find data race errors relate to this commit.

Unrelated:
- Fix redefinition of typedef 'RedisModuleUserChangedFunc'
  For some old version compilers, they will report errors or warnings, if we
  re-define function type.
We're already using bg_unlink in several places to delete the rdb file in the background,
and avoid paying the cost of the deletion from our main thread.
This commit uses bg_unlink to remove the temporary rdb file in the background too.

However, in case we delete that rdb file just before exiting, we don't actually wait for the
background thread or the main thread to delete it, and just let the OS clean up after us.
i.e. we open the file, unlink it and exit with the fd still open.

Furthermore, rdbRemoveTempFile can be called from a thread and was using snprintf which is
not async-signal-safe, we now use ll2string instead.
The symbol base address is a const on this system.
445a4b6 introudced a makefile script that detects if the toolchain
supports c11, and it looked that it was passing on MacOS and fails on
Ubuntu, looks like Ubuntu's Dash was spawning a background process,
deleted foo.c before gcc tried to compile it.
Refine comment of makeThreadKillable().

This commit can be backported to 5.0, only if we also backport 8b70cb0.

Co-authored-by: Oran Agra <[email protected]>
This commit can be cherry picked to 6.0 only if we also cherry pick f866023.
jemalloc configure shows this:
    configure: WARNING: unrecognized options: --enable-cc-silence

The changelog of jemalloc 4.0 has:
  - Replace --enable-cc-silence with --disable-cc-silence to suppress spurious
    warnings by default.
Change `val` to `unsigned char` before being tested.
The fix is identical to the one that's been made in upstream jemalloc.

warning is:
src/malloc_io.c: In function ‘malloc_vsnprintf’:
src/malloc_io.c:369:2: warning: case label value exceeds maximum value for type
  369 |  case '?' | 0x80:      \
      |  ^~~~
src/malloc_io.c:581:5: note: in expansion of macro ‘GET_ARG_NUMERIC’
  581 |     GET_ARG_NUMERIC(val, 'p');
      |     ^~~~~~~~~~~~~~~
…op call (#7829)

This commit adds streamIteratorStop call in rewriteStreamObject function in some of the return statement. Although currently this will not cause memory leak since stream id is only 16 bytes long.
…nabled (#7819)

When all replicas waiting for a bgsave get disconnected (possibly due to output buffer limit),
It may be good to kill the bgsave child. in diskless replication it already happens, but in
disk-based, the child may still serve some purpose (for persistence).

By killing the child, we prevent it from eating COW memory in vain, and we also allow a new child fork sooner for the next full synchronization or bgsave.
We do that only if rdb persistence wasn't enabled in the configuration.

Btw, now, rdbRemoveTempFile in killRDBChild won't block server, so we can killRDBChild safely.
redis-check-rdb was unable to parse rdb files containing module aux data.

Co-authored-by: Oran Agra <[email protected]>
This happens only on diskless replicas when attempting to reconnect after 
failing to load an RDB file. It is more likely to occur with larger datasets.

After reconnection is initiated, replicationEmptyDbCallback() may get called 
and try to write to an unconnected socket. This triggered another issue where
the connection is put into an error state and the connect handler never gets
called. The problem is a regression introduced by commit c17e597.
this is very dangerous bug, but it looks like it didn't cause any harm.
mainly backtrace and register dump support.
gkorland and others added 29 commits September 24, 2020 12:45
Before this commit, we would have continued to add replies to the reply buffer even if client
output buffer limit is reached, so the used memory would keep increasing over the configured limit.
What's more, we shouldn’t write any reply to the client if it is set 'CLIENT_CLOSE_ASAP' flag
because that doesn't conform to its definition and we will close all clients flagged with
'CLIENT_CLOSE_ASAP' in ‘beforeSleep’.

Because of code execution order, before this, we may firstly write to part of the replies to
the socket before disconnecting it, but in fact, we may can’t send the full replies to clients
since OS socket buffer is limited. But this unexpected behavior makes some commands work well,
for instance ACL DELUSER, if the client deletes the current user, we need to send reply to client
and close the connection, but before, we close the client firstly and write the reply to reply
buffer. secondly, we shouldn't do this despite the fact it works well in most cases.

We add a flag 'CLIENT_CLOSE_AFTER_COMMAND' to mark clients, this flag means we will close the
client after executing commands and send all entire replies, so that we can write replies to
reply buffer during executing commands, send replies to clients, and close them later.

We also fix some implicit problems. If client output buffer limit is enforced in 'multi/exec',
all commands will be executed completely in redis and clients will not read any reply instead of
partial replies. Even more, if the client executes 'ACL deluser' the using user in 'multi/exec',
it will not read the replies after 'ACL deluser' just like before executing 'client kill' itself
in 'multi/exec'.

We added some tests for output buffer limit breach during multi-exec and using a pipeline of
many small commands rather than one with big response.

Co-authored-by: Oran Agra <[email protected]>
…7307)

When fclose would fail, the previous implementation would have attempted to do fclose again
this can in theory lead to segfault.

other changes:
check for non-zero return value as failure rather than a specific error code.
this doesn't fix a real bug, just a minor cleanup.
We should sync temp DB file before renaming as rdb_fsync_range does not use
flag `SYNC_FILE_RANGE_WAIT_AFTER`.

Refer to `Linux Programmer's Manual`:
SYNC_FILE_RANGE_WAIT_AFTER
    Wait upon write-out of all pages in the range after performing any write.
Make sure we handle short writes correctly, sync to disk after writing  and use
rename to make sure the replacement is actually atomic.
In any case of failure old configuration will remain in place.

Also, add some additional logging to make it easier to diagnose rewrite problems.
We may access and modify these two variables in signal handler function,
to guarantee them async-signal-safe, so we should set them to volatile
sig_atomic_t type.

It doesn't look like this could have caused any real issue, and it seems that
signals are handled in main thread on most platforms. But we want to follow C
and POSIX standard in signal handler function.
There's currently an issue with IO threads and gopher (issuing lookupKey from within the thread).
simply fix is to just not support it for now.
when slaveof config is "no one", reset any pre-existing config and resume.

also solve a memory leak if slaveof appears twice.
and fail loading if port number is out of range or not an integer.

Co-authored-by: caozhengbin <[email protected]>
Co-authored-by: Oran Agra <[email protected]>
Also stabilize new shutdown tests on slow machines (valgrind)
- The argument `u` in for `ar` is ignored (and generates warnings since `D` became the default.
  All it does is avoid updating unchanged objects (shouldn't have any impact on our build)
- Enable `LUA_USE_MKSTEMP` to force the use of `mkstemp()` instead of `tmpname()` (which is dead
  code in redis anyway).
- Remove unused variable `c` in `f_parser()`
- Removed misleadingly indented space in `luaL_loadfile()` and ``addfield()`

Co-authored-by: Oran Agra <[email protected]>
The tls-ca-cert or tls-ca-cert-dir configuration parameters are only
used when Redis needs to authenticate peer certificates, in one of these
scenarios:

1. Incoming clients or replicas, with `tls-auth-clients` enabled.
2. A replica authenticating the master's peer certificate.
3. Cluster nodes authenticating other nodes when establishing the bus
   protocol connection.
The client pointed to by the module context may in some cases be a fake
client. RM_Authenticate*() calls in this case would be ineffective but
appear to succeed, and this change fails them to make it easier to catch
such cases.
When REDISMODULE_EVENT_CLIENT_CHANGE events are delivered, modules may
want to mutate the client state (e.g. perform authentication).

This change links the module context with the real client rather than a
fake client for these events.
…s” (#7871)

PROBLEM:

[$rd1 read] reads invalidation messages one by one, so it's never going to see the second invalidation message produced after INCR b, whether or not it exists. Adding another read will block incase no invalidation message is produced.

FIX:

We switch the order of "INCR a" and "INCR b" - now "INCR b" comes first. We still only read the first invalidation message produces. If an invalidation message is wrongly produces for b - then it will be produced before that of a, since "INCR b" comes before "INCR a".

Co-authored-by: Nitai Caro <[email protected]>
The MEMORY command is used for debugging memory usage, so it should include internal
fragmentation, same as used_memory
…'s internal frag (#7875)

This commit has two aspects:
1) improve memory reporting for all the places that use sdsAllocSize to compute
   memory used by a string, in this case it'll include the internal fragmentation.
2) reduce the need for realloc calls by making the sds implicitly take over
   the internal fragmentation of the block it allocated.
Add optional GET parameter to SET command in order to set a new value to
a key and retrieve the old key value. With this change we can deprecate
`GETSET` command and use only the SET command with the GET parameter.
Adding -D option for redis-cli to control newline between command
responses in raw mode.

Also removing cleanup code before calling exit, just in order
to avoid adding more adding more cleanup code (redis doesn't
bother to release allocations before exit anyway)

Co-authored-by: Oran Agra <[email protected]>
track and report memory used by clients argv.
this is very usaful in case clients started sending a command and didn't
complete it. in which case the first args of the command are already
trimmed from the query buffer.

in an effort to avoid cache misses and overheads while keeping track of
these, i avoid calling sdsZmallocSize and instead use the sdslen /
bulk-len which can at least give some insight into the problem.

This memory is now added to the total clients memory usage, as well as
the client list.
@JimB123 JimB123 merged commit 2e6cacb into JimB123:unstable Oct 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.