Release 7.0 RC2 by oranagra · Pull Request #10355 · redis/redis

oranagra · 2022-02-28T09:40:36Z

New Features

Add stream consumer group lag tracking and reporting (Add stream consumer group lag tracking and reporting #9127)
Add API for functions and eval Lua scripts to check ACL explicitly (acl check api for functions and eval #10220)

New user commands or command arguments

COMMAND GETKEYSANDFLAGS sub-command (Handle key-spec flags with modules #10237)
INFO command can take multiple section arguments (Make INFO command variadic #6891)
XGROUP CREATE and SETID: new ENTRIESREAD optional argument (Add stream consumer group lag tracking and reporting #9127)
XSETID new ENTRIESADDED and MAXDELETEDID optional arguments (Add stream consumer group lag tracking and reporting #9127)

Command replies that have been extended

XINFO reports consumer group lag and a few other fields (Add stream consumer group lag tracking and reporting #9127)
XAUTOCLAIM returns a new element with a list of deletes IDs (X[AUTO]CLAIM should skip deleted entries #10227)

Potentially Breaking Changes

X[AUTO]CLAIM skips deleted entries instead of replying with Nil, and deletes
them from the pending entry list (X[AUTO]CLAIM should skip deleted entries #10227)
Fix messed up error codes returned from EVAL scripts (Consistent erros returned from EVAL scripts #10218, Sort out the mess around Lua error messages and error stats #10329)
COMMAND INFO, Renames key-spec "CHANNEL" flag to be "NOT_KEY" (Implemented module getchannels api and renamed channel keyspec #10299)

Performance and resource utilization improvements

Reduce system calls and small packets for client replies (Reduce system calls of write for client->reply by introducing writev #9934)
Reduce memory usage of stale clients (introduce dynamic client reply buffer size - save memory on idle clients #9822)
Fix regression in Z[REV]RANGE commands (by-rank) introduced in Redis 6.2 (Optimization: Avoid deferred array reply on ZRANGE commands BYRANK #10337)

Changes in CLI tools

Adapt redis-check-aof tool for Multi Part AOF (Adapt redis-check-aof tool for Multi Part Aof #10061)
Enable redis-benchmark to use RESP3 protocol mode (Enable redis-benchmark to use RESP3 protocol mode #10335)

Platform / toolchain support related improvements

Fix OpenSSL 3.0.x related issues (Fix OpenSSL 3.0.x related issues. #10291)

INFO fields and introspection changes

COMMAND INFO key-specs has new variable_flags flag (Handle key-spec flags with modules #10237, Allow SET without GET arg on write-only ACL. Allow BITFIELD GET on read-only ACL #10148)
INFO stats: add aof_rewrites and rdb_snapshots counters (aof rewrite and rdb save counters in info #10178)
INFO stats: add reply_buffer_shrinks and reply_buffer_expends (introduce dynamic client reply buffer size - save memory on idle clients #9822)
INFO modules: add no-implicit-signal-modified module option (Fix RM_SetModuleOptions flag collision #10284)

Module API changes

Add RM_SetCommandInfo API to set command metadata for the new COMMAND
introspection features and ACL key permissions (Command info module API #10108)
Add RM_KeyAtPosWithFlags and RM_GetCommandKeysWithFlags APIs (Handle key-spec flags with modules #10237)
Add getchannels-api command flag and RM_IsChannelsPositionRequest,
RM_ChannelAtPosWithFlags APIs (Implemented module getchannels api and renamed channel keyspec #10299)
Change RM_ACLCheckChannelPermissions and RM_ACLCheckKeyPermissions APIs
(released in RC1) to take different flags (Implemented module getchannels api and renamed channel keyspec #10299)
Fix RM_SetModuleOptions flag collision. Bug in 7.0 RC1 header file, modules
that used OPTIONS_HANDLE_REPL_ASYNC_LOAD will mess up key invalidations (Fix RM_SetModuleOptions flag collision #10284)

Bug Fixes

Modules: Fix thread safety violation when a module thread adds an error reply,
broken in 6.2 (Fix and improve module error reply statistics #10278)
Lua: Fix Eval scripts active defrag, broken 7.0 in RC1 (Fix Eval scripts defrag (broken 7.0 in RC1) #10271)
Fix geo search bounding box check causing missing results (Fix geo search bounding box check causing missing results #10018)
Lua: Add checks for min-slave-* configs when evaluating Lua scripts and
Functions (Add check min-slave-* feature when evaluating Lua scripts and Functions #10160)
Modules: Prevent crashes and memory leaks when MODULE UNLOAD is used on module
with a pending timer (forbid module to unload when it holds ongoing timer #10187)
Fix error stats and failed command stats for blocked clients (Fix error stats and failed command stats for blocked clients #10309)
Lua/Modules: Fix missing and duplicate error stats for scripts and modules (Sort out the mess around Lua error messages and error stats #10329, Fix and improve module error reply statistics #10278)
Check target node is a primary during cluster setslot (Check target node is a primary during cluster setslot. #10277)
Fix key deletion not to invalidate WATCH when used on a logically expired key (Delete key doesn't dirty client who watched stale key #10256)
Sentinel: return an error if configuration save fails (Sentinel: return an error if configuration save fails #10151)
Sentinel: fix a free-after-use issue re-registering Sentinels (Sentinel: fix a free-after-use issue re-registering Sentinels. #10333)

This is done to avoid a crash when the timer fires after the module was unloaded. Or memory leaks in case we wanted to just ignore the timer. It'll cause the MODULE UNLOAD command to return with an error Co-authored-by: sundb <[email protected]>

update function help message, changed DESC->DESCRIPTION (doc was outdated)

…range command (#10128) add more test cases for addslotsrange and delslotsrange

The script which generates the markdown docs from module.c is updated to include the version in which each module API function was introduced. The script uses git tags to find this information. If git is not available or if we're not in a git repo, the 'since' is silently skipped. The line `**Available since:** (version)` is added after the function prototype Rename to utils/generate-module-api-doc.rb

…ns (#10160) Add check enough good slaves for write command when evaluating scripts. This check is made before the script is executed, if we have function flags, and per redis command if we don't. Co-authored-by: Phuc. Vo Trong <[email protected]> Co-authored-by: Oran Agra <[email protected]> Co-authored-by: Meir Shpilraien (Spielrein) <[email protected]>

When performing `SENTINEL SET`, Sentinel updates the local configuration file. Before this commit, failure to update the file would still result with an `+OK` reply. Now, a `-ERR Failed to save config file` error will be returned. Co-authored-by: Yossi Gottlieb <[email protected]>

Introduced in #10128

Change the sentinel config file to a directory in SENTINEL SET test. So it will now fail on the `rename` in `rewriteConfigOverwriteFile`. The test used to set the sentinel config file permissions to `000` to simulate failure. But it fails on centos7 / freebsd / alpine. (introduced in #10151) Other changes: 1. More error messages after the config rewrite failure. 2. Modify arg name `force_all` in `rewriteConfig` to `force_write`. (was rename in #9304) 3. Fix a typo in debug quicklist-packed-threshold, then -> than. (#9357)

Adds RM_SetCommandInfo, allowing modules to provide the following command info: * summary * complexity * since * history * hints * arity * key specs * args This information affects the output of `COMMAND`, `COMMAND INFO` and `COMMAND DOCS`, Cluster, ACL and is used to filter commands with the wrong number of arguments before the call reaches the module code. The recently added API functions for key specs (never released) are removed. A minimalist example would look like so: ```c RedisModuleCommand *mycmd = RedisModule_GetCommand(ctx,"mymodule.mycommand"); RedisModuleCommandInfo mycmd_info = { .version = REDISMODULE_COMMAND_INFO_VERSION, .arity = -5, .summary = "some description", }; if (RedisModule_SetCommandInfo(mycmd, &mycmd_info) == REDISMODULE_ERR) return REDISMODULE_ERR; ```` Notes: * All the provided information (including strings) is copied, not keeping references to the API input data. * The version field is actually a static struct that contains the sizes of the the structs used in arrays, so we can extend these in the future and old version will still be able to take the part they can support.

1. Update fcall.json and fcall_ro.json 2. Update command.c 3. Update help.h

…10043) This is a followup to #9656 and implements the following step mentioned in that PR: * When possible, extract all the help and completion tips from COMMAND DOCS (Redis 7.0 and up) * If COMMAND DOCS fails, use the static help.h compiled into redis-cli. * Supplement additional command names from COMMAND (pre-Redis 7.0) The last step is needed to add module command and other non-standard commands. This PR does not change the interactive hinting mechanism, which still uses only the param strings to provide somewhat unreliable and inconsistent command hints (see #8084). That task is left for a future PR. Co-authored-by: Oran Agra <[email protected]>

make sure the scripts are executable

`PSYNC replicationid str_offset` will crash the server. The reason is in `masterTryPartialResynchronization`, we will call `getLongLongFromObjectOrReply` check the offset. With a wrong offset, it will add a reply and then trigger a full SYNC and the client become a replica. So crash in `c->bufpos == 0 && listLength(c->reply) == 0`. In this commit, we check the psync_offset before entering function `masterTryPartialResynchronization`, and return. Regardless of that crash, accepting the sync, but also replying with an error would have corrupt the replication stream.

when we fail opening `/proc`, we need to close the log file fd.

So far we only tested attributes using readraw, not the resp parser caches them, so that after getting the reply, you can query them if you want.

In #9788, now we stores all persistent append-only files in a dedicated directory. The name of the directory is determined by the appenddirname configuration parameter in redis.conf. Now each node have a separate folder. Update create-cluster clean to clean this default directory.

Changes: 1. Adds the `redis.acl_check_cmd()` api to lua scripts. It can be used to check if the current user has permissions to execute a given command. The new function receives the command to check as an argument exactly like `redis.call()` receives the command to execute as an argument. 2. In the PR I unified the code used to convert lua arguments to redis argv arguments from both the new `redis.acl_check_cmd()` API and the `redis.[p]call()` API. This cleans up potential duplicate code. 3. While doing the refactoring in 2 I noticed there's an optimization to reduce allocation calls when parsing lua arguments into an `argv` array in the `redis.[p]call()` implementation. These optimizations were introduced years ago in 48c49c4 and 4f68655. It is unclear why this was added. The original commit message claims a 4% performance increase which I couldn't recreate and might not be worth it even if it did recreate. This PR removes that optimization. Following are details of the benchmark I did that couldn't reveal any performance improvements due to this optimization: ``` benchmark 1: src/redis-benchmark -P 500 -n 10000000 eval 'return redis.call("ping")' 0 benchmark 2: src/redis-benchmark -P 500 -r 1000 -n 1000000 eval 'return redis.call("mset","k1__rand_int__","v1__rand_int__","k2__rand_int__","v2__rand_int__","k3__rand_int__","v3__rand_int__","k4__rand_int__","v4__rand_int__")' 0 benchmark 3: src/redis-benchmark -P 500 -r 1000 -n 100000 eval "for i=1,100,1 do redis.call('set','kk'..i,'vv'..__rand_int__) end return redis.call('get','kk5')" 0 benchmark 4: src/redis-benchmark -P 500 -r 1000 -n 1000000 eval 'return redis.call("mset","k1__rand_int__","v1__rand_int__","k2__rand_int__","v2__rand_int__","k3__rand_int__","v3__rand_int__","k4__rand_int__","v4__rand_int__xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")' ``` I ran the benchmark on this branch with and without commit 68b7168 Results in requests per second: cmd | without optimization | without optimization 2nd run | with original optimization | with original optimization 2nd run -- | -- | -- | -- | -- 1 | 461233.34 | 477395.31 | 471098.16 | 469946.91 2 | 34774.14 | 35469.8 | 35149.38 | 34464.93 3 | 6390.59 | 6281.41 | 6146.28 | 6464.12 4 | 28005.71 | | 27965.77 | As you can see, different use cases showed identical or negligible performance differences. So finally I decided to chuck the original optimization and simplify the code.

…ry (#10250) Fix redis-cli with sentinel crash due to SENTINEL DEBUG missing summary Because SENTINEL DEBUG missing summary in its json file, with the change in #10043, the following assertion will fail. ``` [redis]# src/redis-cli -p 26379 redis-cli: redis-cli.c:678: cliInitCommandHelpEntry: Assertion `reply->type == 1' failed. ``` This commit add the summary and complexity for SENTINEL DEBUG, which introduced in #9291, and also improved the help message.

If summary or since is empty, we used to return NULL in COMMAND DOCS. Currently all redis commands will have these two fields. But not for module command, summary and since are optional for RM_SetCommandInfo. With the change in #10043, if a module command doesn't have the summary or since, redis-cli will crash (see #10250). In this commit, COMMAND DOCS avoid adding summary or since when they are missing.

…name (#10253)

The protocol error was caused by the buggy `writeHandler` in `redis-benchmark.c`, which didn't handle one of the cases, thereby repeating data, leading to protocol errors when the values being sent are very long. This PR fixes #10233, issue introduced by #7959

- add COMMAND GETKEYSANDFLAGS sub-command - add RM_KeyAtPosWithFlags and GetCommandKeysWithFlags - RM_KeyAtPos and RM_CreateCommand set flags requiring full access for keys - RM_CreateCommand set VARIABLE_FLAGS - expose `variable_flags` flag in COMMAND INFO key-specs - getKeysFromCommandWithSpecs prefers key-specs over getkeys-api - add tests for all of these

Fix #7021 #8924 #10198 # Intro Before this commit X[AUTO]CLAIM used to transfer deleted entries from one PEL to another, but reply with "nil" for every such entry (instead of the entry id). The idea (for XCLAIM) was that the caller could see this "nil", realize the entry no longer exists, and XACK it in order to remove it from PEL. The main problem with that approach is that it assumes there's a correlation between the index of the "id" arguments and the array indices, which there isn't (in case some of the input IDs to XCLAIM never existed/read): ``` 127.0.0.1:6379> XADD x 1 f1 v1 "1-0" 127.0.0.1:6379> XADD x 2 f1 v1 "2-0" 127.0.0.1:6379> XADD x 3 f1 v1 "3-0" 127.0.0.1:6379> XGROUP CREATE x grp 0 OK 127.0.0.1:6379> XREADGROUP GROUP grp Alice COUNT 2 STREAMS x > 1) 1) "x" 2) 1) 1) "1-0" 2) 1) "f1" 2) "v1" 2) 1) "2-0" 2) 1) "f1" 2) "v1" 127.0.0.1:6379> XDEL x 1 2 (integer) 2 127.0.0.1:6379> XCLAIM x grp Bob 0 0-99 1-0 1-99 2-0 1) (nil) 2) (nil) ``` # Changes Now, X[AUTO]CLAIM acts in the following way: 1. If one tries to claim a deleted entry, we delete it from the PEL we found it in (and the group PEL too). So de facto, such entry is not claimed, just cleared from PEL (since anyway it doesn't exist in the stream) 2. since we never claim deleted entries, X[AUTO]CLAIM will never return "nil" instead of an entry. 3. add a new element to XAUTOCLAIM's response (see below) # Knowing which entries were cleared from the PEL The caller may want to log any entries that were found in a PEL but deleted from the stream itself (it would suggest that there might be a bug in the application: trimming the stream while some entries were still no processed by the consumers) ## XCLAIM the set {XCLAIM input ids} - {XCLAIM returned ids} contains all the entry ids that were not claimed which means they were deleted (assuming the input contains only entries from some PEL). The user doesn't need to XACK them because XCLAIM had already deleted them from the source PEL. ## XAUTOCLAIM XAUTOCLAIM has a new element added to its reply: it's an array of all the deleted stream IDs it stumbled upon. This is somewhat of a breaking change since X[AUTO]CLAIM used to be able to reply with "nil" and now it can't... But since it was undocumented (and generally a bad idea to rely on it, as explained above) the breakage is not that bad.

This PR handles inconsistencies in errors returned from lua scripts. Details of the problem can be found in #10165. ### Changes - Remove double stack trace. It's enough that a stack trace is automatically added by the engine's error handler see https://github.com/redis/redis/blob/d0bc4fff18afdf9e5421cc88e23ffbb876ecaec3/src/function_lua.c#L472-L485 and https://github.com/redis/redis/blob/d0bc4fff18afdf9e5421cc88e23ffbb876ecaec3/src/eval.c#L243-L255 - Make sure all errors a preceded with an error code. Passing a simple string to `luaPushError()` will prepend it with a generic `ERR` error code. - Make sure lua error table doesn't include a RESP `-` error status. Lua stores redis error's as a lua table with a single `err` field and a string. When the string is translated back to RESP we add a `-` to it. See https://github.com/redis/redis/blob/d0bc4fff18afdf9e5421cc88e23ffbb876ecaec3/src/script_lua.c#L510-L517 So there's no need to store it in the lua table. ### Before & After ```diff --- <unnamed> +++ <unnamed> @@ -1,14 +1,14 @@ 1: config set maxmemory 1 2: +OK 3: eval "return redis.call('set','x','y')" 0 - 4: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: @user_script: 1: -OOM command not allowed when used memory > 'maxmemory'. + 4: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'. 5: eval "return redis.pcall('set','x','y')" 0 - 6: -@user_script: 1: -OOM command not allowed when used memory > 'maxmemory'. + 6: -OOM command not allowed when used memory > 'maxmemory'. 7: eval "return redis.call('select',99)" 0 8: -ERR Error running script (call to 4ad5abfc50bbccb484223905f9a16f09cd043ba8): @user_script:1: ERR DB index is out of range 9: eval "return redis.pcall('select',99)" 0 10: -ERR DB index is out of range 11: eval_ro "return redis.call('set','x','y')" 0 -12: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: @user_script: 1: Write commands are not allowed from read-only scripts. +12: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: ERR Write commands are not allowed from read-only scripts. 13: eval_ro "return redis.pcall('set','x','y')" 0 -14: -@user_script: 1: Write commands are not allowed from read-only scripts. +14: -ERR Write commands are not allowed from read-only scripts. ```

This is an enhancement for INFO command, previously INFO only support one argument for different info section , if user want to get more categories information, either perform INFO all / default or calling INFO for multiple times. **Description of the feature** The goal of adding this feature is to let the user retrieve multiple categories via the INFO command, and still avoid emitting the same section twice. A use case for this is like Redis Sentinel, which periodically calling INFO command to refresh info from monitored Master/Slaves, only Server and Replication part categories are used for parsing information. If the INFO command can return just enough categories that client side needs, it can save a lot of time for client side parsing it as well as network bandwidth. **Implementation** To share code between redis, sentinel, and other users of INFO (DEBUG and modules), we have a new `genInfoSectionDict` function that returns a dict and some boolean flags (e.g. `all`) to the caller (built from user input). Sentinel is later purging unwanted sections from that, and then it is forwarded to the info `genRedisInfoString`. **Usage Examples** INFO Server Replication INFO CPU Memory INFO default commandstats Co-authored-by: Oran Agra <[email protected]>

There's an assertion added recently to make sure that non-write commands don't use lookupKeyWrite, It was initially meant to be used only on read-only replicas, but we thought it'll not have enough coverage, so used it on the masters too. We now realize that in some cases this can cause issues for modules, so we remove the assert. Other than that, we also make sure not to force expireIfNeeded on read-only replicas. even if they somehow run a write command. See #9572 (comment)

Make sure the status return from loading multiple AOF files reflects the overall result, not just the one of the last file. When one of the AOF files succeeded to load, but the last AOF file was empty, the loadAppendOnlyFiles will return AOF_EMPTY. This commit changes this behavior, and return AOF_OK in that case. This can happen for example, when loading old AOF file, and no more commands processed, the manifest file will include base AOF file with data, and empty incr AOF file. Co-authored-by: chenyang8094 <[email protected]> Co-authored-by: Oran Agra <[email protected]>

Since we didn't copy the null terminator to temp_filepath, dirname could return the wrong result.

…ED doc flag to SLAVEOF, mark it as deprecated. (#10315) * Remove ALLOW_BUSY from REPLICAOF and add it to REPLCONF * mark SLAVEOF as deprecated

This implements the following main pieces of functionality: * Renames key spec "CHANNEL" to be "NOT_KEY", and update the documentation to indicate it's for cluster routing and not for any other key related purpose. * Add the getchannels-api, so that modules can now define commands that are subject to ACL channel permission checks. * Add 4 new flags that describe how a module interacts with a command (SUBSCRIBE, PUBLISH, UNSUBSCRIBE, and PATTERN). They are all technically composable, however not sure how a command could both subscribe and unsubscribe from a command at once, but didn't see a reason to add explicit validation there. * Add two new module apis RM_ChannelAtPosWithFlags and RM_IsChannelsPositionRequest to duplicate the functionality provided by the keys position APIs. * The RM_ACLCheckChannelPermissions (only released in 7.0 RC1) was changed to take flags rather than a boolean literal. * The RM_ACLCheckKeyPermissions (only released in 7.0 RC1) was changed to take flags corresponding to keyspecs instead of custom permission flags. These keyspec flags mimic the flags for ACLCheckChannelPermissions.

…nts (#9822) Current implementation simple idle client which serves no traffic still use ~17Kb of memory. this is mainly due to a fixed size reply buffer currently set to 16kb. We have encountered some cases in which the server operates in a low memory environments. In such cases a user who wishes to create large connection pools to support potential burst period, will exhaust a large amount of memory to maintain connected Idle clients. Some users may choose to "sacrifice" performance in order to save memory. This commit introduce a dynamic mechanism to shrink and expend the client reply buffer based on periodic observed peak. the algorithm works as follows: 1. each time a client reply buffer has been fully written, the last recorded peak is updated: new peak = MAX( last peak, current written size) 2. during clients cron we check for each client if the last observed peak was: a. matching the current buffer size - in which case we expend (resize) the buffer size by 100% b. less than half the buffer size - in which case we shrink the buffer size by 50% 3. In any case we will **not** resize the buffer in case: a. the current buffer peak is less then the current buffer usable size and higher than 1/2 the current buffer usable size b. the value of (current buffer usable size/2) is less than 1Kib c. the value of (current buffer usable size*2) is larger than 16Kib 4. the peak value is reset to the current buffer position once every **5** seconds. we maintain a new field in the client structure (buf_peak_last_reset_time) which is used to keep track of how long it passed since the last buffer peak reset. ### **Interface changes:** **CIENT LIST** - now contains 2 new extra fields: rbs= < the current size in bytes of the client reply buffer > rbp=< the current value in bytes of the last observed buffer peak position > **INFO STATS** - now contains 2 new statistics: reply_buffer_shrinks = < total number of buffer shrinks performed > reply_buffer_expends = < total number of buffer expends performed > Co-authored-by: Oran Agra <[email protected]> Co-authored-by: Yoav Steinberg <[email protected]>

When WATCH is called on a key that's already logically expired, avoid discarding the transaction when the keys is actually deleted. When WATCH is called, a flag is stored if the key is already expired at the time of watch. The expired key is not deleted, only checked. When a key is "touched", if it is deleted and it was already expired when a client watched it, the client is not marked as dirty. Co-authored-by: Oran Agra <[email protected]> Co-authored-by: zhaozhao.zz <[email protected]>

…9934) There are scenarios where it results in many small objects in the reply list, such as commands heavily using deferred array replies (`addReplyDeferredLen`). E.g. what COMMAND command and CLUSTER SLOTS used to do (see #10056, #7123), but also in case of a transaction or a pipeline of commands that use just one differed array reply. We used to have to run multiple loops along with multiple calls to `write()` to send data back to peer based on the current code, but by means of `writev()`, we can gather those scattered objects in reply list and include the static reply buffer as well, then send it by one system call, that ought to achieve higher performance. In the case of TLS, we simply check and concatenate buffers into one big buffer and send it away by one call to `connTLSWrite()`, if the amount of all buffers exceeds `NET_MAX_WRITES_PER_EVENT`, then invoke `connTLSWrite()` multiple times to avoid a huge massive of memory copies. Note that aside of reducing system calls, this change will also reduce the amount of small TCP packets sent.

argument was missing, affecting redis.io docs

The test will fail on slow machines (valgrind or FreeBsd). Because in #10256 when WATCH is called on a key that's already logically expired, we will add an `expired` flag, and we will skip it in `isWatchedKeyExpired` check. Apparently we need to increase the expiration time so that the key can not expire logically then the WATCH is called. Also added retries to make sure it doesn't fail. I suppose 100ms is enough in valgrind, tested locally, no need to retry.

In case HELLO message received from another sentinel, with same address like another instance registered in the past but with different runid. Then there was cumbersome logic to modify the instance the port to 0 to in order to mark as invalid and later on to delete it. But the deletion is happening during update of instances in such a way that we might end up accessing an instance that was deleted just before. Didn't find a good reason why to postpone the deletion action of an obsolete instance (deletion is taking place instantly, for other cases ) -> Lets delete at once There is a mixture of logic of Sentinel address update with the logic of deletion of Sentinels that match a given Address -> Split to two!

…10334) Avoid sprintf/ll2string on setDeferredAggregateLen()/addReplyLongLongWithPrefix() when we can used shared objects. In some pipelined workloads this achieves about 10% improvement. Co-authored-by: Oran Agra <[email protected]>

Adds the ability to track the lag of a consumer group (CG), that is, the number of entries yet-to-be-delivered from the stream. The proposed constant-time solution is in the spirit of "best-effort." Partially addresses #8737. ## Description of approach We add a new "entries_added" property to the stream. This starts at 0 for a new stream and is incremented by 1 with every `XADD`. It is essentially an all-time counter of the entries added to the stream. Given the stream's length and this counter value, we can trivially find the logical "entries_added" counter of the first ID if and only if the stream is contiguous. A fragmented stream contains one or more tombstones generated by `XDEL`s. The new "xdel_max_id" stream property tracks the latest tombstone. The CG also tracks its last delivered ID's as an "entries_read" counter and increments it independently when delivering new messages, unless the this read counter is invalid (-1 means invalid offset). When the CG's counter is available, the reported lag is the difference between added and read counters. Lastly, this also adds a "first_id" field to the stream structure in order to make looking it up cheaper in most cases. ## Limitations There are two cases in which the mechanism isn't able to track the lag. In these cases, `XINFO` replies with `null` in the "lag" field. The first case is when a CG is created with an arbitrary last delivered ID, that isn't "0-0", nor the first or the last entries of the stream. In this case, it is impossible to obtain a valid read counter (short of an O(N) operation). The second case is when there are one or more tombstones fragmenting the stream's entries range. In both cases, given enough time and assuming that the consumers are active (reading and lacking) and advancing, the CG should be able to catch up with the tip of the stream and report zero lag. Once that's achieved, lag tracking would resume as normal (until the next tombstone is set). ## API changes * `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]` for explicitly specifying the new CG's counter. * `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]` for specifying the CG's counter. * `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total number of entries added to the stream. * `XINFO` reports the current lag and logical read counter of CGs. * `XSETID` is an internal command that's used in replication/aof. It has been added with the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]` for propagating the CG's offset and maximal tombstone ID of the stream. ## The generic unsolved problem The current stream implementation doesn't provide an efficient way to obtain the approximate/exact size of a range of entries. While it could've been nice to have that ability (#5813) in general, let alone specifically in the context of CGs, the risk and complexities involved in such implementation are in all likelihood prohibitive. ## A refactoring note The `streamGetEdgeID` has been refactored to accommodate both the existing seek of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones` argument). Furthermore, this refactoring also migrated the seek logic to use the `streamIterator` (rather than `raxIterator`) that was, in turn, extended with the `skip_tombstones` Boolean struct field to control the emission of these. Co-authored-by: Guy Benoish <[email protected]> Co-authored-by: Oran Agra <[email protected]>

Add a comma, this would have resulted in missing newline in the message. Forgot to add in #9127

This is basically just a subtree pull of the latest (unreleased) hiredis. Unfortunately, the `sds -> hisds` patch was pulled as a subtree update from a remote branch rather than a local redis change. Because of that, it goes away on every subtree update. It is now applied as a local commit so it should survive in the future.

…10337) Avoid deferred array reply on genericZrangebyrankCommand() when consumer type is client. I.e. any ZRANGE / ZREVRNGE (when tank is used). This was a performance regression introduced in #7844 (v 6.2) mainly affecting pipelined workloads. Co-authored-by: Oran Agra <[email protected]>

Adds `-3` option to cause redis-benchmark to send a `HELLO 3` to it can benchmark the effects of RESP3 on the server.

This PR fix 2 issues on Lua scripting: * Server error reply statistics (some errors were counted twice). * Error code and error strings returning from scripts (error code was missing / misplaced). ## Statistics a Lua script user is considered part of the user application, a sophisticated transaction, so we want to count an error even if handled silently by the script, but when it is propagated outwards from the script we don't wanna count it twice. on the other hand, if the script decides to throw an error on its own (using `redis.error_reply`), we wanna count that too. Besides, we do count the `calls` in command statistics for the commands the script calls, we we should certainly also count `failed_calls`. So when a simple `eval "return redis.call('set','x','y')" 0` fails, it should count the failed call to both SET and EVAL, but the `errorstats` and `total_error_replies` should be counted only once. The PR changes the error object that is raised on errors. Instead of raising a simple Lua string, Redis will raise a Lua table in the following format: ``` { err='<error message (including error code)>', source='<User source file name>', line='<line where the error happned>', ignore_error_stats_update=true/false, } ``` The `luaPushError` function was modified to construct the new error table as describe above. The `luaRaiseError` was renamed to `luaError` and is now simply called `lua_error` to raise the table on the top of the Lua stack as the error object. The reason is that since its functionality is changed, in case some Redis branch / fork uses it, it's better to have a compilation error than a bug. The `source` and `line` fields are enriched by the error handler (if possible) and the `ignore_error_stats_update` is optional and if its not present then the default value is `false`. If `ignore_error_stats_update` is true, the error will not be counted on the error stats. When parsing Redis call reply, each error is translated to a Lua table on the format describe above and the `ignore_error_stats_update` field is set to `true` so we will not count errors twice (we counted this error when we invoke the command). The changes in this PR might have been considered as a breaking change for users that used Lua `pcall` function. Before, the error was a string and now its a table. To keep backward comparability the PR override the `pcall` implementation and extract the error message from the error table and return it. Example of the error stats update: ``` 127.0.0.1:6379> lpush l 1 (integer) 2 127.0.0.1:6379> eval "return redis.call('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value. script: e471b73f1ef44774987ab00bdf51f21fd9f7974a, on @user_script:1. 127.0.0.1:6379> info Errorstats # Errorstats errorstat_WRONGTYPE:count=1 127.0.0.1:6379> info commandstats # Commandstats cmdstat_eval:calls=1,usec=341,usec_per_call=341.00,rejected_calls=0,failed_calls=1 cmdstat_info:calls=1,usec=35,usec_per_call=35.00,rejected_calls=0,failed_calls=0 cmdstat_lpush:calls=1,usec=14,usec_per_call=14.00,rejected_calls=0,failed_calls=0 cmdstat_get:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=1 ``` ## error message We can now construct the error message (sent as a reply to the user) from the error table, so this solves issues where the error message was malformed and the error code appeared in the middle of the error message: ```diff 127.0.0.1:6379> eval "return redis.call('set','x','y')" 0 -(error) ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'. +(error) OOM command not allowed when used memory > 'maxmemory' @user_script:1. Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479) ``` ```diff 127.0.0.1:6379> eval "redis.call('get', 'l')" 0 -(error) ERR Error running script (call to f_8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1): @user_script:1: WRONGTYPE Operation against a key holding the wrong kind of value +(error) WRONGTYPE Operation against a key holding the wrong kind of value script: 8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1, on @user_script:1. ``` Notica that `redis.pcall` was not change: ``` 127.0.0.1:6379> eval "return redis.pcall('get', 'l')" 0 (error) WRONGTYPE Operation against a key holding the wrong kind of value ``` ## other notes Notice that Some commands (like GEOADD) changes the cmd variable on the client stats so we can not count on it to update the command stats. In order to be able to update those stats correctly we needed to promote `realcmd` variable to be located on the client struct. Tests was added and modified to verify the changes. Related PR's: #10279, #10218, #10278, #10309 Co-authored-by: Oran Agra <[email protected]>

* The type of node-id should be string, not integer. * Also improve the CLUSTER SETSLOT help message.

re-generate help.h from commands.json

…10354) After introducing #9822 need to prevent client reply buffer shrink to maintain correct client memory math. add needs:debug missing one one test. Co-authored-by: Oran Agra <[email protected]>

oranagra · 2022-02-28T10:26:14Z

triggered full CI manually: ~~https://github.com/redis/redis/actions/runs/1909704050~~ https://github.com/redis/redis/actions/runs/1909738746

warriorguo and others added 30 commits February 1, 2022 14:54

Fix src/modules build issue on OS X 11 (#10224)

948285d

Bump vmactions/freebsd-vm from 0.1.5 to 0.1.6 (#10219)

edc050c

update function help message, changed DESC->DESCRIPTION. (#10228)

8eb19cc

update function help message, changed DESC->DESCRIPTION (doc was outdated)

Add test case to improve code coverage for Addslotsrange and Delslots…

c9e1602

…range command (#10128) add more test cases for addslotsrange and delslotsrange

Fix cluster tests failing due to subcommand names (#10231)

d2fde2f

Introduced in #10128

Update FCALL/FCALL_RO summary and complexity (#10240)

e662f68

1. Update fcall.json and fcall_ro.json 2. Update command.c 3. Update help.h

fix file permissions for scripts in utils folder (#10241)

c42b163

make sure the scripts are executable

Add tests for ACL command error cases (#10183)

6ebb679

Fix file descriptor leak in memtest_test_linux_anonymous_maps (#4241)

91cc205

when we fail opening `/proc`, we need to close the log file fd.

add test suite infra to test RESP3 attributes (#10247)

98b3f52

So far we only tested attributes using readraw, not the resp parser caches them, so that after getting the reply, you can query them if you want.

Supplement define guards to prevent multiple inclusion (#10246)

d6e9cde

Update function-delete.json - argument is library name, not function …

9a0ab2f

…name (#10253)

YaacovHazan and others added 24 commits February 22, 2022 08:59

Fix path copy error and add more logs. (#10324)

4916d79

Since we didn't copy the null terminator to temp_filepath, dirname could return the wrong result.

Remove ALLOW_BUSY from REPLICAOF and add it to REPLCONF. Add DEPRECAT…

c4c68f5

…ED doc flag to SLAVEOF, mark it as deprecated. (#10315) * Remove ALLOW_BUSY from REPLICAOF and add it to REPLCONF * mark SLAVEOF as deprecated

Fix PUBSUB SHARDNUMSUB command json file (#10331)

de6be88

argument was missing, affecting redis.io docs

Fix XGROUP HELP message missing a newline (#10339)

b4079ab

Add a comma, this would have resulted in missing newline in the message. Forgot to add in #9127

Fixed typo in variable name (#10347)

35fccd8

Enable redis-benchmark to use RESP3 protocol mode (#10335)

9f30dd0

Adds `-3` option to cause redis-benchmark to send a `HELLO 3` to it can benchmark the effects of RESP3 on the server.

Make git ignore all files starting with appendonly.aof (#10351)

a6fd237

Fix node-id type in cluster-setslot (#10348)

ad7a627

* The type of node-id should be string, not integer. * Also improve the CLUSTER SETSLOT help message.

update redis-cli help.h (#10352)

fa5d900

re-generate help.h from commands.json

deflake client-eviction test "evict clients only until below limit" (#…

5860fa3

…10354) After introducing #9822 need to prevent client reply buffer shrink to maintain correct client memory math. add needs:debug missing one one test. Co-authored-by: Oran Agra <[email protected]>

Merge remote-tracking branch 'origin/unstable' into 7.0

a83e366

7.0 RC2

10dc57a

oranagra requested a review from a team February 28, 2022 09:40

yossigo approved these changes Feb 28, 2022

View reviewed changes

oranagra merged commit d2b5a57 into redis:7.0 Feb 28, 2022

oranagra deleted the release-7.0-rc2 branch February 28, 2022 13:35

Fusl mentioned this pull request Jun 18, 2025

Ensure empty error tables in scripts don't crash Valkey valkey-io/valkey#2229

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Release 7.0 RC2#10355

Release 7.0 RC2#10355
oranagra merged 84 commits intoredis:7.0from
oranagra:release-7.0-rc2

oranagra commented Feb 28, 2022

Uh oh!

oranagra commented Feb 28, 2022 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

oranagra commented Feb 28, 2022

New Features

New user commands or command arguments

Command replies that have been extended

Potentially Breaking Changes

Performance and resource utilization improvements

Changes in CLI tools

Platform / toolchain support related improvements

INFO fields and introspection changes

Module API changes

Bug Fixes

Uh oh!

oranagra commented Feb 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

oranagra commented Feb 28, 2022 •

edited

Loading