Skip to content

An idea for a better testing framework#9

Closed
madolson wants to merge 243 commits intounstablefrom
new-tests
Closed

An idea for a better testing framework#9
madolson wants to merge 243 commits intounstablefrom
new-tests

Conversation

@madolson
Copy link
Owner

An idea to try to refactor the OSS unit testing framework to make it a bit more usable. A couple of key ideas:

  1. All of the testing code is now in a separate folder, instead of requiring a special build flag at the bottom fo the existing files.
  2. Unit tests don't require any custom build flags, it's all exposed through a new compiled binary that can be built without rebuilding the environment.
  3. Tried to refactor some of the tests to use the common assertion code, so we can track it.

valentinogeron and others added 30 commits July 17, 2022 08:47
…s#10975)

`--skipfile` can be repeated.
For example: ./runtests --skipfile file1.txt --skipfile file2.txt

Co-authored-by: Valentino Geron <[email protected]>
According to the Redis functions documentation, FCALL command format could be
FCALL function_name numberOfKeys [key1, key2, key3.....] [arg1, arg2, arg3.....]

So in the json file of fcall and fcall_ro, we should add optional for key and arg part.
Just like EVAL...

Co-authored-by: Binbin <[email protected]>
…edis#10991)

In the newly added cluster hostnames test, the primary is failing over during the reboot
for valgrind so we are validating the wrong node. This change just sets the replica to
prevent taking over, which seems to fix the test.

We could have also set the timeout higher, but it slows down the test.
If we do not use jemalloc (mostly with valgrind) and use an old compiler that does not support C11
we will get compilation error

Co-authored-by: Valentino Geron <[email protected]>
replace use of:
sprintf --> snprintf
strcpy/strncpy  --> redis_strlcpy
strcat/strncat  --> redis_strlcat

**why are we making this change?**
Much of the code uses some unsafe variants or deprecated buffer handling
functions.
While most cases are probably not presenting any issue on the known path
programming errors and unterminated strings might lead to potential
buffer overflows which are not covered by tests.

**As part of this PR we change**
1. added implementation for redis_strlcpy and redis_strlcat based on the strl implementation: https://linux.die.net/man/3/strl
2. change all occurrences of use of sprintf with use of snprintf
3. change occurrences of use of  strcpy/strncpy with redis_strlcpy
4. change occurrences of use of strcat/strncat with redis_strlcat
5. change the behavior of ll2string/ull2string/ld2string so that it will always place null
  termination ('\0') on the output buffer in the first index. this was done in order to make
  the use of these functions more safe in cases were the user will not check the output
  returned by them (for example in rdbRemoveTempFile)
6. we added a compiler directive to issue a deprecation error in case a use of
  sprintf/strcpy/strcat is found during compilation which will result in error during compile time.
  However keep in mind that since the deprecation attribute is not supported on all compilers,
  this is expected to fail during push workflows.


**NOTE:** while this is only an initial milestone. We might also consider
using the *_s implementation provided by the C11 Extensions (however not
yet widly supported). I would also suggest to start
looking at static code analyzers to track unsafe use cases.
For example LLVM clang checker supports security.insecureAPI.DeprecatedOrUnsafeBufferHandling
which can help locate unsafe function usage.
https://clang.llvm.org/docs/analyzer/checkers.html#security-insecureapi-deprecatedorunsafebufferhandling-c
The main reason not to onboard it at this stage is that the alternative
excepted by clang is to use the C11 extensions which are not always
supported by stdlib.
)

The temporary array for deleted entries reply of XAUTOCLAIM was
insufficient, but also in fact the COUNT argument should be used to
control the size of the reply, so instead of terminating the loop by
only counting the claimed entries, we'll count deleted entries as well.

Fix redis#10968
Addresses CVE-2022-31144
A timing issue like this was reported in freebsd daily CI:
```
*** [err]: Sanity test push cmd after resharding in tests/unit/cluster/cli.tcl
Expected 'CLUSTERDOWN The cluster is down' to match '*MOVED*'
```

We additionally wait for each node to reach a consensus on the cluster
state in wait_for_condition to avoid the cluster down error.

The fix just like redis#10495, quoting madolson's comment:
Cluster check just verifies the the config state is self-consistent,
waiting for cluster_state to be okay is an independent check that all
the nodes actually believe each other are healthy.

At the same time i noticed that unit/moduleapi/cluster.tcl has an exact
same test, may have the same problem, also modified it.
Following redis#10996, it forgot to modify RM_StringCompare in module.c

Modified RM_StringCompare, compareStringObjectsWithFlags,
compareStringObjects and collateStringObjects.
these are missing from the RO_ commands, present in the other ones.

Co-authored-by: Ubuntu <[email protected]>
As an outstanding part mentioned in redis#10737, we could just make the cluster config file and
ACL file saving done with a more safe and atomic pattern (write to temp file, fsync, rename, fsync dir).

The cluster config file uses an in-place overwrite and truncation (which was also used by the
main config file before redis#7824).
The ACL file is using the temp file and rename approach, but was missing an fsync.

Co-authored-by: 朱天 <[email protected]>
It seems it was overlooked when we first created the json files
Swap M and N in the complexity formula of [B]ZMPOP

Co-authored-by: Pavel Krush <[email protected]>
RM_SetAbsExpire and RM_GetAbsExpire were not actually operational since
they were introduced, due to omission in API registration.
In rewriteAppendOnlyFileBackground, after flushAppendOnlyFile(1),
and before openNewIncrAofForAppend, we should call redis_fsync
to fsync the aof file.

Because we may open a new INCR AOF in openNewIncrAofForAppend,
in the case of using everysec policy, the old AOF file may not
be fsynced in time (or even at all).

When using everysec, we don't want to pay the disk latency from
the main thread, so we will do a background fsync.

Adding a argument for bioCreateCloseJob, a `need_fsync` flag to
indicate that a fsync is required before the file is closed. So we will
fsync the old AOF file before we close it.

A cleanup, we make union become a union, since the free_* args and
the fd / fsync args are never used together.

Co-authored-by: Oran Agra <[email protected]>
…s#11036)

When failing to open the rdb file, there was no specific error printed (unlike a
corrupt file), so it was not clear what failed and why.
Gossip the cluster node blacklist in ping and pong messages.
This means that CLUSTER FORGET doesn't need to be sent to all nodes in a cluster.
It can be sent to one or more nodes and then be propagated to the rest of them.

For each blacklisted node, its node id and its remaining blacklist TTL is gossiped in a
cluster bus ping extension (introduced in redis#9530).
…ash. (redis#11032)

Fix redis#11030, use lua_rawget to avoid triggering metatables.

redis#11030 shows how return `_G` from the Lua script (either function or eval), cause the
Lua interpreter to Panic and the Redis processes to exit with error code 1.
Though return `_G` only panic on Redis 7 and 6.2.7, the underline issue exists on older
versions as well (6.0 and 6.2). The underline issue is returning a table with a metatable
such that the metatable raises an error.

The following example demonstrate the issue:
```
127.0.0.1:6379> eval "local a = {}; setmetatable(a,{__index=function() foo() end}) return a" 0
Error: Server closed the connection
```
```
PANIC: unprotected error in call to Lua API (user_script:1: Script attempted to access nonexistent global variable 'foo')
```

The Lua panic happened because when returning the result to the client, Redis needs to
introspect the returning table and transform the table into a resp. In order to scan the table,
Redis uses `lua_gettable` api which might trigger the metatable (if exists) and might raise an error.
This code is not running inside `pcall` (Lua protected call), so raising an error causes the
Lua to panic and exit. Notice that this is not a crash, its a Lua panic that exit with error code 1.

Returning `_G` panics on Redis 7 and 6.2.7 because on those versions `_G` has a metatable
that raises error when trying to fetch a none existing key.

### Solution

Instead of using `lua_gettable` that might raise error and cause the issue, use `lua_rawget`
that simply return the value from the table without triggering any metatable logic.
This is promised not to raise and error.

The downside of this solution is that it might be considered as breaking change, if someone
rely on metatable in the returned value. An alternative solution is to wrap this entire logic
with `pcall` (Lua protected call), this alternative require a much bigger refactoring.

### Back Porting

The same fix will work on older versions as well (6.2, 6.0). Notice that on those version,
the issue can cause Redis to crash if inside the metatable logic there is an attempt to accesses
Redis (`redis.call`). On 7.0, there is not crash and the `redis.call` is executed as if it was done
from inside the script itself.

### Tests

Tests was added the verify the fix
…is#11038)

When FLUSHDB / FLUSHALL / SWAPDB is inside MULTI / EXEC, the
client side tracking invalidation message was interleaved with transaction response.
The reason we do this is because in redis#11036, we added error
log message when failing to open RDB file for reading.
In loadDdataFromDisk we call rdbLoad and also check errno,
now the logging corrupts errno (reported in alpine daily).

It is not safe to rely on errno as we do today, so we change
the return value of rdbLoad function to enums, like we have
when loading an AOF.
RM_Microseconds
Return the wall-clock Unix time, in microseconds

RM_CachedMicroseconds
Returns a cached copy of the Unix time, in microseconds.
It is updated in the server cron job and before executing a command.
It is useful for complex call stacks, such as a command causing a
key space notification, causing a module to execute a RedisModule_Call,
causing another notification, etc.
It makes sense that all these callbacks would use the same clock.
The kill above is sometimes successful and sometimes already too late.
The PING in pysnc wrong offset test got rejected by bgsaveerr because
lastbgsave_status is C_ERR.

In theory, using diskless can avoid PING being affected, because when
the replica is dropped, we will kill the child with SIGUSR1, and this
will not affect lastbgsave_status.

Anyway, this kill is not particularly needed here, dropping the kill
is the best one, since we do have the waitForBgsave, so just let it
take care of the bgsave. No need for fast termination.
…s#11053)

In clusterMsgPingExtForgottenNode, sizeof(name) is CLUSTER_NAMELEN,
and sizeof(clusterMsgPingExtForgottenNode) is > CLUSTER_NAMELEN.
Doing a (name + sizeof(clusterMsgPingExtForgottenNode)) sanitizer
generates an out-of-bounds error which is a false positive in here
trackingHandlePendingKeyInvalidations should use proto.
…f not plain (redis#11050)

Before this change, if the module has an embedded string, then uses RedisModule_SaveString
and RedisModule_LoadString, the result would be a raw string instead of an embedded string.

Now the `RDB_LOAD_ENC` flag to `moduleLoadString` only affects integer encoding, but not
embedded strings (which still hold an sds in the robj ptr, so they're actually still raw strings for
anyone who reads them).

Co-authored-by: Valentino Geron <[email protected]>
--stop: Blocks once the first test fails.
--loop: Execute the specified set of tests forever.
It is useful when we debug some test failures.
The docs state that there is a new and an old argument format.
The current state of the arguments allows mixing the old and new format,
thus the need for two additional oneof blocks.
One for differentiating the new from the old format and then one to
allow setting multiple filters using the new format.
There is a -Wimplicit-function-declaration warning in here:
```
keyspace_events.c: In function ‘KeySpace_NotificationGeneric’:
keyspace_events.c:67:9: warning: implicit declaration of function ‘usleep’; did you mean ‘sleep’? [-Wimplicit-function-declaration]
   67 |         usleep(1);
      |         ^~~~~~
      |         sleep
```
CharlesChen888 and others added 27 commits April 19, 2023 15:41
Add an option "withscores" to ZRANK and ZREVRANK.

Add `[withscore]` option to both `zrank` and `zrevrank`, like this:
```
z[rev]rank key member [withscore]
```
redis#11548)

In replica, the key expired before master's `INCR` was arrived, so INCR
creates a new key in the replica and the test failed.
```
*** [err]: Replication of an expired key does not delete the expired key in tests/integration/replication-4.tcl
Expected '0' to be equal to '1' (context: type eval line 13 cmd {assert_equal 0 [$slave exists k]} proc ::test)
```

This test is very likely to do a false positive if the `wait_for_ofs_sync`
takes longer than the expiration time, so give it a few more chances.

The test was introduced in redis#9572.
redis-benchmark: when trying to get the CONFIG before benchmark,
avoid printing any warning on most errors (e.g. NOPERM error).
avoid aborting the benchmark on NOPERM.
keep the warning only when we abort the benchmark on a NOAUTH error
…tes of memory for sparse representation (redis#11438)

Before this PR, we use sdsMakeRoomFor() to expand the size of hyperloglog
string (sparse representation). And because sdsMakeRoomFor() uses a greedy
strategy (allocate about twice what we need), the memory we allocated for the
hyperloglog may be more than `server.hll_sparse_max_bytes` bytes.
The memory more than` server.hll_sparse_max_bytes` will be wasted.

In this pull request, tone down the greediness of the allocation growth, and also
make sure it'll never request more than `server.hll_sparse_max_bytes`.

This could in theory mean the size of the hyperloglog string is insufficient for the
increment we need, should be ok since in this case we promote the hyperloglog
to dense representation, an assertion was added to make sure.

This PR also add some tests and fixes some typo and indentation issues.
…me (redis#11521)

As being discussed in redis#10981 we see a degradation in performance
between v6.2 and v7.0 of Redis on the EVAL command. 

After profiling the current unstable branch we can see that we call the
expensive function evalCalcFunctionName twice. 

The current "fix" is to basically avoid calling evalCalcFunctionName and
even dictFind(lua_scripts) twice for the same command.
Instead we cache the current script's dictEntry (for both Eval and Functions)
in the current client so we don't have to repeat these calls.
The exception would be when doing an EVAL on a new script that's not yet
in the script cache. in that case we will call evalCalcFunctionName (and even
evalExtractShebangFlags) twice.

Co-authored-by: Oran Agra <[email protected]>
)

Add a new module event `RedisModule_Event_Key`, this event is fired
when a key is removed from the keyspace.
The event includes an open key that can be used for reading the key before
it is removed. Modules can also extract the key-name, and use RM_Open
or RM_Call to access key from within that event, but shouldn't modify anything
from within this event.

The following sub events are available:
  - `REDISMODULE_SUBEVENT_KEY_DELETED`
  - `REDISMODULE_SUBEVENT_KEY_EXPIRED`
  - `REDISMODULE_SUBEVENT_KEY_EVICTED`
  - `REDISMODULE_SUBEVENT_KEY_OVERWRITE`

The data pointer can be casted to a RedisModuleKeyInfo structure
with the following fields:
```
     RedisModuleKey *key;    // Opened Key
 ```

### internals

* We also add two dict functions:
  `dictTwoPhaseUnlinkFind` finds an element from the table, also get the plink of the entry.
  The entry is returned if the element is found. The user should later call `dictTwoPhaseUnlinkFree`
  with it in order to unlink and release it. Otherwise if the key is not found, NULL is returned.
  These two functions should be used in pair. `dictTwoPhaseUnlinkFind` pauses rehash and
  `dictTwoPhaseUnlinkFree` resumes rehash.
* We change `dbOverwrite` to `dbReplaceValue` which just replaces the value of the key and
  doesn't fire any events. The "overwrite" part (which emits events) is just when called from `setKey`,
  the other places that called dbOverwrite were ones that just update the value in-place (INCR*, SPOP,
  and dbUnshareStringValue). This should not have any real impact since `moduleNotifyKeyUnlink` and
  `signalDeletedKeyAsReady` wouldn't have mattered in these cases anyway (i.e. module keys and
  stream keys didn't have direct calls to dbOverwrite)
* since we allow doing RM_OpenKey from withing these callbacks, we temporarily disable lazy expiry.
* We also temporarily disable lazy expiry when we are in unlink/unlink2 callback and keyspace 
  notification callback.
* Move special definitions to the top of redismodule.h
  This is needed to resolve compilation errors with RedisModuleKeyInfoV1
  that carries a RedisModuleKey member.

Co-authored-by: Oran Agra <[email protected]>
1. "Fixed" the current code so that seen-time/idle actually refers to interaction
  attempts (as documented; breaking change)
2. Added active-time/inactive to refer to successful interaction (what
  seen-time/idle used to be)

At first, I tried to avoid changing the behavior of seen-time/idle but then realized
that, in this case, the odds are the people read the docs and implemented their
code based on the docs (which didn't match the behavior).
For the most part, that would work fine, except that issue redis#9996 was found.

I was working under the assumption that people relied on the docs, and for
the most part, it could have worked well enough. so instead of fixing the docs,
as I would usually do, I fixed the code to match the docs in this particular case.

Note that, in case the consumer has never read any entries, the values
for both "active-time" (XINFO FULL) and "inactive" (XINFO CONSUMERS) will
be -1, meaning here that the consumer was never active.

Note that seen/active time is only affected by XREADGROUP / X[AUTO]CLAIM, not
by XPENDING, XINFO, and other "read-only" stream CG commands (always has been,
even before this PR)

Other changes:
* Another behavioral change (arguably a bugfix) is that XREADGROUP and X[AUTO]CLAIM
  create the consumer regardless of whether it was able to perform some reading/claiming
* RDB format change to save the `active_time`, and set it to the same value of `seen_time` in old rdb files.
This test sets the master ping interval to 1 hour, in order to avoid
pings in the replicatoin stream incrementing the replication offset,
however, it didn't increase the repl-timeout so on slow machines
where the test took more than 60 seconds, the replicas would drop
and reconnect.

```
*** [err]: PSYNC2: Partial resync after restart using RDB aux fields in tests/integration/psync2.tcl
Replica didn't partial sync
```

The test would detect 4 additional partial syncs where it expects
only one.
…eplyToRedisReply() (redis#11556)

profiling EVALSHA\ we see that luaReplyToRedisReply takes 8.73% out of the
56.90% of luaCallFunction CPU cycles. 

Using addReplyStatusLength instead of directly composing the protocol to avoid
sdscatprintf and addReplySds ( which imply multiple sdslen calls ).

The new approach drops
luaReplyToRedisReply CPU cycles to 3.77%
…1560)

The test failed with ERR DUMP payload version or checksum are wrong.
And it only fails on CentOS, this is due to the fact that tcl8.5 does not correctly
parse the hexadecimal abbreviation. And in Ubuntu we are using tcl8.6.
* Remove duplicate code, propagating SSL errors into connection state.
* Add missing error handling in synchronous IO functions.
* Fix connection error reporting in some replication flows.
…nprintf %.4f (redis#11552)

GEODIST used snprintf("%.4f") for the reply using addReplyDoubleDistance,
which was slow. This PR optimizes it without breaking compatibility by following
the approach of ll2string with some changes to match the use case of distance
and precision. I.e. we multiply it by 10000 format it as an integer, and then add
a decimal point. This can achieve about 35% increase in the achievable ops/sec. 

Co-authored-by: Oran Agra <[email protected]>
…7.0 (redis#11541)

This mechanism aims to reduce calls to malloc and free when
preparing the arguments the script sends to redis commands.
This is a mechanism was originally implemented in 48c49c4
and 4f68655, and was removed in redis#10220 (thinking it's not needed
and that it has no impact), but it now turns out it was wrong, and it
indeed provides some 5% performance improvement.

The implementation is a little bit too simplistic, it assumes consecutive
calls use the same size in the same arg index, but that's arguably
sufficient since it's only aimed at caching very small things.

We could even consider always pre-allocating args to the full
LUA_CMD_OBJCACHE_MAX_LEN (64 bytes) rather than the right size for the argument,
that would increase the chance they'll be able to be re-used.
But in some way this is already happening since we're using
sdsalloc, which in turn uses s_malloc_usable and takes ownership
of the full side of the allocation, so we are padded to the allocator
bucket size.


Co-authored-by: Oran Agra <[email protected]>
Co-authored-by: sundb <[email protected]>
… diff is 0 (redis#11579)

This is take 2 of `GEOSEARCH BYBOX` optimizations based on haversine
distance formula when longitude diff is 0.
The first one was in redis#11535 . 

- Given longitude diff is 0 the asin(sqrt(a)) on the haversine is asin(sin(abs(u))).
- arcsin(sin(x)) equal to x when x ∈[−𝜋/2,𝜋/2]. 
- Given latitude is between [−𝜋/2,𝜋/2] we can simplifiy arcsin(sin(x)) to x.

On the sample dataset with 60M datapoints, we've measured 55% increase
in the achievable ops/sec.
There is a issue with --sentinel:
```
[root]# src/redis-server sentinel.conf --sentinel --loglevel verbose

*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 352
>>> 'sentinel "--loglevel" "verbose"'
Unrecognized sentinel configuration statement
```

This is because in redis#10660 (Redis 7.0.1), `--` prefix change break it.
In this PR, we will handle `--sentinel` the same as we did for `--save`
in redis#10866. i.e. it's a pseudo config option with no value.
…ed (redis#11559)

In most cases when a listpack or intset is converted to a dict, the conversion
is trigged when adding an element. The extra element is added after conversion
to dict (in all cases except when the conversion is triggered by
set-max-intset-entries being reached).

If set-max-listpack-entries is set to a power of two, let's say 128, when
adding the 129th element, the 128 element listpack is first converted to a dict
with a hashtable presized for 128 elements. After converting to dict, the 129th
element is added to the dict which immediately triggers incremental rehashing
to size 256.

This commit instead presizes the dict to one more element, with the assumption
that conversion to dict is followed by adding another element, so the dict
doesn't immediately need rehashing.

Co-authored-by: sundb <[email protected]>
Co-authored-by: Oran Agra <[email protected]>
… is disabled (redis#11348)

## Issue
During the client input/output buffer processing, the memory usage is
incrementally updated to keep track of clients going beyond a certain
threshold `maxmemory-clients` to be evicted. However, this additional
tracking activity leads to unnecessary CPU cycles wasted when no
client-eviction is required. It is applicable in two cases.

* `maxmemory-clients` is set to `0` which equates to no client eviction
  (applicable to all clients)
* `CLIENT NO-EVICT` flag is set to `ON` which equates to a particular
  client not applicable for eviction.  

## Solution
* Disable client memory usage tracking during the read/write flow when
  `maxmemory-clients` is set to `0` or `client no-evict` is `on`.
  The memory usage is tracked only during the `clientCron` i.e. it gets
  periodically updated.
* Cleanup the clients from the memory usage bucket when client eviction
  is disabled.
* When the maxmemory-clients config is enabled or disabled at runtime,
  we immediately update the memory usage buckets for all clients (tested
  scanning 80000 took some 20ms)

Benchmark shown that this can improve performance by about 5% in
certain situations.

Co-authored-by: Oran Agra <[email protected]>
When establishing an outgoing TLS connection using a hostname as a target, use TLS SNI extensions to include the hostname in use.
As Sentinel supports dynamic IP only when using hostnames, there
are few leftover addess comparison logic that doesn't take into
account that the IP might get change.

Co-authored-by: moticless <[email protected]>
From https://en.wikipedia.org/wiki/NaN#Display, it says
that apart from nan and -nan, we can also get NAN and even
nan(char-sequence) from libc.

In redis#11482, our conclusion was that we wanna normalize it in
Redis to a single nan type, like we already normalized inf.

For this, we also reverted the assert_match part of the test
added in redis#11506, using assert_equal to validate the changes.
There is overhead on Redis 7.0 EXPIRE command that is not present on 6.2.7. 

We could see that on the unstable profile there are around 7% of CPU cycles
spent on rewriteClientCommandVector that are not present on 6.2.7.
This was introduced in redis#8474.
This PR reduces the overhead by using 2X rewriteClientCommandArgument instead of
rewriteClientCommandVector. In this scenario rewriteClientCommandVector creates 4 arguments.
the above usage of rewriteClientCommandArgument reduces the overhead in half.

This PR should also improve PEXPIREAT performance by avoiding at all
rewriteClientCommandArgument usage. 

Co-authored-by: Oran Agra <[email protected]>
…11598)

We do defrag during AOF loading, but aim to detect fragmentation only
once a second, so this test aims to slow down the AOF loading and mimic
loading of a large file.
On fast machines the sleep, plus the actual work we did was insufficient
making it sleep longer so the test won't fail.

The error we used to get is this one:
Expected 0 > 100000 (context: type eval line 106 cmd {assert {$hits > 100000}} proc ::test)
…edis#11581)

In redis#11290, we added listpack encoding for SET object.
But forgot to support it in zuiFind, causes ZINTER, ZINTERSTORE,
ZINTERCARD, ZIDFF, ZDIFFSTORE to crash.
And forgot to support it in RM_ScanKey, causes it hang.

This PR add support SET listpack in zuiFind, and in RM_ScanKey.
And add tests for related commands to cover this case.

Other changes:
- There is no reason for zuiFind to go into the internals of the SET.
  It can simply use setTypeIsMember and don't care about encoding.
- Remove the `#include "intset.h"` from server.h reduce the chance of
  accidental intset API use.
- Move setTypeAddAux, setTypeRemoveAux and setTypeIsMemberAux
  interfaces to the header.
- In scanGenericCommand, use setTypeInitIterator and setTypeNext
  to handle OBJ_SET scan.
- In RM_ScanKey, improve hash scan mode, use lpGetValue like zset,
  they can share code and better performance.

The zuiFind part fixes redis#11578

Co-authored-by: Oran Agra <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
madolson pushed a commit that referenced this pull request Aug 10, 2023
…is missed cases to redis-server. (redis#12322)

Observed that the sanitizer reported memory leak as clean up is not done
before the process termination in negative/following cases:

**- when we passed '--invalid' as option to redis-server.**

```
 -vm:~/mem-leak-issue/redis$ ./src/redis-server --invalid

*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 2
>>> 'invalid'
Bad directive or wrong number of arguments

=================================================================
==865778==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x7f0985f65867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x558ec86686ec in ztrymalloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:117
    #2 0x558ec86686ec in ztrymalloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:135
    #3 0x558ec86686ec in ztryrealloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:276
    #4 0x558ec86686ec in zrealloc /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:327
    #5 0x558ec865dd7e in sdssplitargs /home/ubuntu/mem-leak-issue/redis/src/sds.c:1172
    #6 0x558ec87a1be7 in loadServerConfigFromString /home/ubuntu/mem-leak-issue/redis/src/config.c:472
    #7 0x558ec87a13b3 in loadServerConfig /home/ubuntu/mem-leak-issue/redis/src/config.c:718
    #8 0x558ec85e6f15 in main /home/ubuntu/mem-leak-issue/redis/src/server.c:7258
    #9 0x7f09856e5d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

SUMMARY: AddressSanitizer: 8 byte(s) leaked in 1 allocation(s).

```

**- when we pass '--port' as option and missed to add port number to redis-server.**

```
vm:~/mem-leak-issue/redis$ ./src/redis-server --port

*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 2
>>> 'port'
wrong number of arguments

=================================================================
==865846==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x7fdcdbb1f867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x557e8b04f6ec in ztrymalloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:117
    #2 0x557e8b04f6ec in ztrymalloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:135
    #3 0x557e8b04f6ec in ztryrealloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:276
    #4 0x557e8b04f6ec in zrealloc /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:327
    #5 0x557e8b044d7e in sdssplitargs /home/ubuntu/mem-leak-issue/redis/src/sds.c:1172
    #6 0x557e8b188be7 in loadServerConfigFromString /home/ubuntu/mem-leak-issue/redis/src/config.c:472
    #7 0x557e8b1883b3 in loadServerConfig /home/ubuntu/mem-leak-issue/redis/src/config.c:718
    #8 0x557e8afcdf15 in main /home/ubuntu/mem-leak-issue/redis/src/server.c:7258
    #9 0x7fdcdb29fd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

Indirect leak of 10 byte(s) in 1 object(s) allocated from:
    #0 0x7fdcdbb1fc18 in __interceptor_realloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:164
    #1 0x557e8b04f9aa in ztryrealloc_usable_internal /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:287
    #2 0x557e8b04f9aa in ztryrealloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:317
    #3 0x557e8b04f9aa in zrealloc_usable /home/ubuntu/mem-leak-issue/redis/src/zmalloc.c:342
    #4 0x557e8b033f90 in _sdsMakeRoomFor /home/ubuntu/mem-leak-issue/redis/src/sds.c:271
    #5 0x557e8b033f90 in sdsMakeRoomFor /home/ubuntu/mem-leak-issue/redis/src/sds.c:295
    #6 0x557e8b033f90 in sdscatlen /home/ubuntu/mem-leak-issue/redis/src/sds.c:486
    #7 0x557e8b044e1f in sdssplitargs /home/ubuntu/mem-leak-issue/redis/src/sds.c:1165
    #8 0x557e8b188be7 in loadServerConfigFromString /home/ubuntu/mem-leak-issue/redis/src/config.c:472
    #9 0x557e8b1883b3 in loadServerConfig /home/ubuntu/mem-leak-issue/redis/src/config.c:718
    #10 0x557e8afcdf15 in main /home/ubuntu/mem-leak-issue/redis/src/server.c:7258
    #11 0x7fdcdb29fd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58

SUMMARY: AddressSanitizer: 18 byte(s) leaked in 2 allocation(s).

```

As part analysis found that the sdsfreesplitres is not called when this condition checks are being hit.

Output after the fix:


```
vm:~/mem-leak-issue/redis$ ./src/redis-server --invalid

*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 2
>>> 'invalid'
Bad directive or wrong number of arguments
vm:~/mem-leak-issue/redis$

===========================================
vm:~/mem-leak-issue/redis$ ./src/redis-server --jdhg

*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 2
>>> 'jdhg'
Bad directive or wrong number of arguments

---------------------------------------------------------------------------
vm:~/mem-leak-issue/redis$ ./src/redis-server --port

*** FATAL CONFIG FILE ERROR (Redis 255.255.255) ***
Reading the configuration file, at line 2
>>> 'port'
wrong number of arguments
```

Co-authored-by: Oran Agra <[email protected]>
@madolson madolson closed this May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.