Release 7.0 RC2#10355
Merged
Conversation
This is done to avoid a crash when the timer fires after the module was unloaded. Or memory leaks in case we wanted to just ignore the timer. It'll cause the MODULE UNLOAD command to return with an error Co-authored-by: sundb <[email protected]>
update function help message, changed DESC->DESCRIPTION (doc was outdated)
…range command (#10128) add more test cases for addslotsrange and delslotsrange
The script which generates the markdown docs from module.c is updated to include the version in which each module API function was introduced. The script uses git tags to find this information. If git is not available or if we're not in a git repo, the 'since' is silently skipped. The line `**Available since:** (version)` is added after the function prototype Rename to utils/generate-module-api-doc.rb
…ns (#10160) Add check enough good slaves for write command when evaluating scripts. This check is made before the script is executed, if we have function flags, and per redis command if we don't. Co-authored-by: Phuc. Vo Trong <[email protected]> Co-authored-by: Oran Agra <[email protected]> Co-authored-by: Meir Shpilraien (Spielrein) <[email protected]>
When performing `SENTINEL SET`, Sentinel updates the local configuration file. Before this commit, failure to update the file would still result with an `+OK` reply. Now, a `-ERR Failed to save config file` error will be returned. Co-authored-by: Yossi Gottlieb <[email protected]>
Change the sentinel config file to a directory in SENTINEL SET test. So it will now fail on the `rename` in `rewriteConfigOverwriteFile`. The test used to set the sentinel config file permissions to `000` to simulate failure. But it fails on centos7 / freebsd / alpine. (introduced in #10151) Other changes: 1. More error messages after the config rewrite failure. 2. Modify arg name `force_all` in `rewriteConfig` to `force_write`. (was rename in #9304) 3. Fix a typo in debug quicklist-packed-threshold, then -> than. (#9357)
Adds RM_SetCommandInfo, allowing modules to provide the following command info:
* summary
* complexity
* since
* history
* hints
* arity
* key specs
* args
This information affects the output of `COMMAND`, `COMMAND INFO` and `COMMAND DOCS`,
Cluster, ACL and is used to filter commands with the wrong number of arguments before
the call reaches the module code.
The recently added API functions for key specs (never released) are removed.
A minimalist example would look like so:
```c
RedisModuleCommand *mycmd = RedisModule_GetCommand(ctx,"mymodule.mycommand");
RedisModuleCommandInfo mycmd_info = {
.version = REDISMODULE_COMMAND_INFO_VERSION,
.arity = -5,
.summary = "some description",
};
if (RedisModule_SetCommandInfo(mycmd, &mycmd_info) == REDISMODULE_ERR)
return REDISMODULE_ERR;
````
Notes:
* All the provided information (including strings) is copied, not keeping references to the API input data.
* The version field is actually a static struct that contains the sizes of the the structs used in arrays,
so we can extend these in the future and old version will still be able to take the part they can support.
1. Update fcall.json and fcall_ro.json 2. Update command.c 3. Update help.h
…10043) This is a followup to #9656 and implements the following step mentioned in that PR: * When possible, extract all the help and completion tips from COMMAND DOCS (Redis 7.0 and up) * If COMMAND DOCS fails, use the static help.h compiled into redis-cli. * Supplement additional command names from COMMAND (pre-Redis 7.0) The last step is needed to add module command and other non-standard commands. This PR does not change the interactive hinting mechanism, which still uses only the param strings to provide somewhat unreliable and inconsistent command hints (see #8084). That task is left for a future PR. Co-authored-by: Oran Agra <[email protected]>
make sure the scripts are executable
`PSYNC replicationid str_offset` will crash the server. The reason is in `masterTryPartialResynchronization`, we will call `getLongLongFromObjectOrReply` check the offset. With a wrong offset, it will add a reply and then trigger a full SYNC and the client become a replica. So crash in `c->bufpos == 0 && listLength(c->reply) == 0`. In this commit, we check the psync_offset before entering function `masterTryPartialResynchronization`, and return. Regardless of that crash, accepting the sync, but also replying with an error would have corrupt the replication stream.
when we fail opening `/proc`, we need to close the log file fd.
So far we only tested attributes using readraw, not the resp parser caches them, so that after getting the reply, you can query them if you want.
In #9788, now we stores all persistent append-only files in a dedicated directory. The name of the directory is determined by the appenddirname configuration parameter in redis.conf. Now each node have a separate folder. Update create-cluster clean to clean this default directory.
Changes: 1. Adds the `redis.acl_check_cmd()` api to lua scripts. It can be used to check if the current user has permissions to execute a given command. The new function receives the command to check as an argument exactly like `redis.call()` receives the command to execute as an argument. 2. In the PR I unified the code used to convert lua arguments to redis argv arguments from both the new `redis.acl_check_cmd()` API and the `redis.[p]call()` API. This cleans up potential duplicate code. 3. While doing the refactoring in 2 I noticed there's an optimization to reduce allocation calls when parsing lua arguments into an `argv` array in the `redis.[p]call()` implementation. These optimizations were introduced years ago in 48c49c4 and 4f68655. It is unclear why this was added. The original commit message claims a 4% performance increase which I couldn't recreate and might not be worth it even if it did recreate. This PR removes that optimization. Following are details of the benchmark I did that couldn't reveal any performance improvements due to this optimization: ``` benchmark 1: src/redis-benchmark -P 500 -n 10000000 eval 'return redis.call("ping")' 0 benchmark 2: src/redis-benchmark -P 500 -r 1000 -n 1000000 eval 'return redis.call("mset","k1__rand_int__","v1__rand_int__","k2__rand_int__","v2__rand_int__","k3__rand_int__","v3__rand_int__","k4__rand_int__","v4__rand_int__")' 0 benchmark 3: src/redis-benchmark -P 500 -r 1000 -n 100000 eval "for i=1,100,1 do redis.call('set','kk'..i,'vv'..__rand_int__) end return redis.call('get','kk5')" 0 benchmark 4: src/redis-benchmark -P 500 -r 1000 -n 1000000 eval 'return redis.call("mset","k1__rand_int__","v1__rand_int__","k2__rand_int__","v2__rand_int__","k3__rand_int__","v3__rand_int__","k4__rand_int__","v4__rand_int__xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")' ``` I ran the benchmark on this branch with and without commit 68b7168 Results in requests per second: cmd | without optimization | without optimization 2nd run | with original optimization | with original optimization 2nd run -- | -- | -- | -- | -- 1 | 461233.34 | 477395.31 | 471098.16 | 469946.91 2 | 34774.14 | 35469.8 | 35149.38 | 34464.93 3 | 6390.59 | 6281.41 | 6146.28 | 6464.12 4 | 28005.71 | | 27965.77 | As you can see, different use cases showed identical or negligible performance differences. So finally I decided to chuck the original optimization and simplify the code.
…ry (#10250) Fix redis-cli with sentinel crash due to SENTINEL DEBUG missing summary Because SENTINEL DEBUG missing summary in its json file, with the change in #10043, the following assertion will fail. ``` [redis]# src/redis-cli -p 26379 redis-cli: redis-cli.c:678: cliInitCommandHelpEntry: Assertion `reply->type == 1' failed. ``` This commit add the summary and complexity for SENTINEL DEBUG, which introduced in #9291, and also improved the help message.
If summary or since is empty, we used to return NULL in COMMAND DOCS. Currently all redis commands will have these two fields. But not for module command, summary and since are optional for RM_SetCommandInfo. With the change in #10043, if a module command doesn't have the summary or since, redis-cli will crash (see #10250). In this commit, COMMAND DOCS avoid adding summary or since when they are missing.
- add COMMAND GETKEYSANDFLAGS sub-command - add RM_KeyAtPosWithFlags and GetCommandKeysWithFlags - RM_KeyAtPos and RM_CreateCommand set flags requiring full access for keys - RM_CreateCommand set VARIABLE_FLAGS - expose `variable_flags` flag in COMMAND INFO key-specs - getKeysFromCommandWithSpecs prefers key-specs over getkeys-api - add tests for all of these
Fix #7021 #8924 #10198 # Intro Before this commit X[AUTO]CLAIM used to transfer deleted entries from one PEL to another, but reply with "nil" for every such entry (instead of the entry id). The idea (for XCLAIM) was that the caller could see this "nil", realize the entry no longer exists, and XACK it in order to remove it from PEL. The main problem with that approach is that it assumes there's a correlation between the index of the "id" arguments and the array indices, which there isn't (in case some of the input IDs to XCLAIM never existed/read): ``` 127.0.0.1:6379> XADD x 1 f1 v1 "1-0" 127.0.0.1:6379> XADD x 2 f1 v1 "2-0" 127.0.0.1:6379> XADD x 3 f1 v1 "3-0" 127.0.0.1:6379> XGROUP CREATE x grp 0 OK 127.0.0.1:6379> XREADGROUP GROUP grp Alice COUNT 2 STREAMS x > 1) 1) "x" 2) 1) 1) "1-0" 2) 1) "f1" 2) "v1" 2) 1) "2-0" 2) 1) "f1" 2) "v1" 127.0.0.1:6379> XDEL x 1 2 (integer) 2 127.0.0.1:6379> XCLAIM x grp Bob 0 0-99 1-0 1-99 2-0 1) (nil) 2) (nil) ``` # Changes Now, X[AUTO]CLAIM acts in the following way: 1. If one tries to claim a deleted entry, we delete it from the PEL we found it in (and the group PEL too). So de facto, such entry is not claimed, just cleared from PEL (since anyway it doesn't exist in the stream) 2. since we never claim deleted entries, X[AUTO]CLAIM will never return "nil" instead of an entry. 3. add a new element to XAUTOCLAIM's response (see below) # Knowing which entries were cleared from the PEL The caller may want to log any entries that were found in a PEL but deleted from the stream itself (it would suggest that there might be a bug in the application: trimming the stream while some entries were still no processed by the consumers) ## XCLAIM the set {XCLAIM input ids} - {XCLAIM returned ids} contains all the entry ids that were not claimed which means they were deleted (assuming the input contains only entries from some PEL). The user doesn't need to XACK them because XCLAIM had already deleted them from the source PEL. ## XAUTOCLAIM XAUTOCLAIM has a new element added to its reply: it's an array of all the deleted stream IDs it stumbled upon. This is somewhat of a breaking change since X[AUTO]CLAIM used to be able to reply with "nil" and now it can't... But since it was undocumented (and generally a bad idea to rely on it, as explained above) the breakage is not that bad.
This PR handles inconsistencies in errors returned from lua scripts. Details of the problem can be found in #10165. ### Changes - Remove double stack trace. It's enough that a stack trace is automatically added by the engine's error handler see https://github.com/redis/redis/blob/d0bc4fff18afdf9e5421cc88e23ffbb876ecaec3/src/function_lua.c#L472-L485 and https://github.com/redis/redis/blob/d0bc4fff18afdf9e5421cc88e23ffbb876ecaec3/src/eval.c#L243-L255 - Make sure all errors a preceded with an error code. Passing a simple string to `luaPushError()` will prepend it with a generic `ERR` error code. - Make sure lua error table doesn't include a RESP `-` error status. Lua stores redis error's as a lua table with a single `err` field and a string. When the string is translated back to RESP we add a `-` to it. See https://github.com/redis/redis/blob/d0bc4fff18afdf9e5421cc88e23ffbb876ecaec3/src/script_lua.c#L510-L517 So there's no need to store it in the lua table. ### Before & After ```diff --- <unnamed> +++ <unnamed> @@ -1,14 +1,14 @@ 1: config set maxmemory 1 2: +OK 3: eval "return redis.call('set','x','y')" 0 - 4: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: @user_script: 1: -OOM command not allowed when used memory > 'maxmemory'. + 4: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'. 5: eval "return redis.pcall('set','x','y')" 0 - 6: -@user_script: 1: -OOM command not allowed when used memory > 'maxmemory'. + 6: -OOM command not allowed when used memory > 'maxmemory'. 7: eval "return redis.call('select',99)" 0 8: -ERR Error running script (call to 4ad5abfc50bbccb484223905f9a16f09cd043ba8): @user_script:1: ERR DB index is out of range 9: eval "return redis.pcall('select',99)" 0 10: -ERR DB index is out of range 11: eval_ro "return redis.call('set','x','y')" 0 -12: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: @user_script: 1: Write commands are not allowed from read-only scripts. +12: -ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: ERR Write commands are not allowed from read-only scripts. 13: eval_ro "return redis.pcall('set','x','y')" 0 -14: -@user_script: 1: Write commands are not allowed from read-only scripts. +14: -ERR Write commands are not allowed from read-only scripts. ```
This is an enhancement for INFO command, previously INFO only support one argument for different info section , if user want to get more categories information, either perform INFO all / default or calling INFO for multiple times. **Description of the feature** The goal of adding this feature is to let the user retrieve multiple categories via the INFO command, and still avoid emitting the same section twice. A use case for this is like Redis Sentinel, which periodically calling INFO command to refresh info from monitored Master/Slaves, only Server and Replication part categories are used for parsing information. If the INFO command can return just enough categories that client side needs, it can save a lot of time for client side parsing it as well as network bandwidth. **Implementation** To share code between redis, sentinel, and other users of INFO (DEBUG and modules), we have a new `genInfoSectionDict` function that returns a dict and some boolean flags (e.g. `all`) to the caller (built from user input). Sentinel is later purging unwanted sections from that, and then it is forwarded to the info `genRedisInfoString`. **Usage Examples** INFO Server Replication INFO CPU Memory INFO default commandstats Co-authored-by: Oran Agra <[email protected]>
There's an assertion added recently to make sure that non-write commands don't use lookupKeyWrite, It was initially meant to be used only on read-only replicas, but we thought it'll not have enough coverage, so used it on the masters too. We now realize that in some cases this can cause issues for modules, so we remove the assert. Other than that, we also make sure not to force expireIfNeeded on read-only replicas. even if they somehow run a write command. See #9572 (comment)
Make sure the status return from loading multiple AOF files reflects the overall result, not just the one of the last file. When one of the AOF files succeeded to load, but the last AOF file was empty, the loadAppendOnlyFiles will return AOF_EMPTY. This commit changes this behavior, and return AOF_OK in that case. This can happen for example, when loading old AOF file, and no more commands processed, the manifest file will include base AOF file with data, and empty incr AOF file. Co-authored-by: chenyang8094 <[email protected]> Co-authored-by: Oran Agra <[email protected]>
Since we didn't copy the null terminator to temp_filepath, dirname could return the wrong result.
…ED doc flag to SLAVEOF, mark it as deprecated. (#10315) * Remove ALLOW_BUSY from REPLICAOF and add it to REPLCONF * mark SLAVEOF as deprecated
This implements the following main pieces of functionality: * Renames key spec "CHANNEL" to be "NOT_KEY", and update the documentation to indicate it's for cluster routing and not for any other key related purpose. * Add the getchannels-api, so that modules can now define commands that are subject to ACL channel permission checks. * Add 4 new flags that describe how a module interacts with a command (SUBSCRIBE, PUBLISH, UNSUBSCRIBE, and PATTERN). They are all technically composable, however not sure how a command could both subscribe and unsubscribe from a command at once, but didn't see a reason to add explicit validation there. * Add two new module apis RM_ChannelAtPosWithFlags and RM_IsChannelsPositionRequest to duplicate the functionality provided by the keys position APIs. * The RM_ACLCheckChannelPermissions (only released in 7.0 RC1) was changed to take flags rather than a boolean literal. * The RM_ACLCheckKeyPermissions (only released in 7.0 RC1) was changed to take flags corresponding to keyspecs instead of custom permission flags. These keyspec flags mimic the flags for ACLCheckChannelPermissions.
…nts (#9822) Current implementation simple idle client which serves no traffic still use ~17Kb of memory. this is mainly due to a fixed size reply buffer currently set to 16kb. We have encountered some cases in which the server operates in a low memory environments. In such cases a user who wishes to create large connection pools to support potential burst period, will exhaust a large amount of memory to maintain connected Idle clients. Some users may choose to "sacrifice" performance in order to save memory. This commit introduce a dynamic mechanism to shrink and expend the client reply buffer based on periodic observed peak. the algorithm works as follows: 1. each time a client reply buffer has been fully written, the last recorded peak is updated: new peak = MAX( last peak, current written size) 2. during clients cron we check for each client if the last observed peak was: a. matching the current buffer size - in which case we expend (resize) the buffer size by 100% b. less than half the buffer size - in which case we shrink the buffer size by 50% 3. In any case we will **not** resize the buffer in case: a. the current buffer peak is less then the current buffer usable size and higher than 1/2 the current buffer usable size b. the value of (current buffer usable size/2) is less than 1Kib c. the value of (current buffer usable size*2) is larger than 16Kib 4. the peak value is reset to the current buffer position once every **5** seconds. we maintain a new field in the client structure (buf_peak_last_reset_time) which is used to keep track of how long it passed since the last buffer peak reset. ### **Interface changes:** **CIENT LIST** - now contains 2 new extra fields: rbs= < the current size in bytes of the client reply buffer > rbp=< the current value in bytes of the last observed buffer peak position > **INFO STATS** - now contains 2 new statistics: reply_buffer_shrinks = < total number of buffer shrinks performed > reply_buffer_expends = < total number of buffer expends performed > Co-authored-by: Oran Agra <[email protected]> Co-authored-by: Yoav Steinberg <[email protected]>
When WATCH is called on a key that's already logically expired, avoid discarding the transaction when the keys is actually deleted. When WATCH is called, a flag is stored if the key is already expired at the time of watch. The expired key is not deleted, only checked. When a key is "touched", if it is deleted and it was already expired when a client watched it, the client is not marked as dirty. Co-authored-by: Oran Agra <[email protected]> Co-authored-by: zhaozhao.zz <[email protected]>
…9934) There are scenarios where it results in many small objects in the reply list, such as commands heavily using deferred array replies (`addReplyDeferredLen`). E.g. what COMMAND command and CLUSTER SLOTS used to do (see #10056, #7123), but also in case of a transaction or a pipeline of commands that use just one differed array reply. We used to have to run multiple loops along with multiple calls to `write()` to send data back to peer based on the current code, but by means of `writev()`, we can gather those scattered objects in reply list and include the static reply buffer as well, then send it by one system call, that ought to achieve higher performance. In the case of TLS, we simply check and concatenate buffers into one big buffer and send it away by one call to `connTLSWrite()`, if the amount of all buffers exceeds `NET_MAX_WRITES_PER_EVENT`, then invoke `connTLSWrite()` multiple times to avoid a huge massive of memory copies. Note that aside of reducing system calls, this change will also reduce the amount of small TCP packets sent.
argument was missing, affecting redis.io docs
The test will fail on slow machines (valgrind or FreeBsd). Because in #10256 when WATCH is called on a key that's already logically expired, we will add an `expired` flag, and we will skip it in `isWatchedKeyExpired` check. Apparently we need to increase the expiration time so that the key can not expire logically then the WATCH is called. Also added retries to make sure it doesn't fail. I suppose 100ms is enough in valgrind, tested locally, no need to retry.
In case HELLO message received from another sentinel, with same address like another instance registered in the past but with different runid. Then there was cumbersome logic to modify the instance the port to 0 to in order to mark as invalid and later on to delete it. But the deletion is happening during update of instances in such a way that we might end up accessing an instance that was deleted just before. Didn't find a good reason why to postpone the deletion action of an obsolete instance (deletion is taking place instantly, for other cases ) -> Lets delete at once There is a mixture of logic of Sentinel address update with the logic of deletion of Sentinels that match a given Address -> Split to two!
…10334) Avoid sprintf/ll2string on setDeferredAggregateLen()/addReplyLongLongWithPrefix() when we can used shared objects. In some pipelined workloads this achieves about 10% improvement. Co-authored-by: Oran Agra <[email protected]>
Adds the ability to track the lag of a consumer group (CG), that is, the number of entries yet-to-be-delivered from the stream. The proposed constant-time solution is in the spirit of "best-effort." Partially addresses #8737. ## Description of approach We add a new "entries_added" property to the stream. This starts at 0 for a new stream and is incremented by 1 with every `XADD`. It is essentially an all-time counter of the entries added to the stream. Given the stream's length and this counter value, we can trivially find the logical "entries_added" counter of the first ID if and only if the stream is contiguous. A fragmented stream contains one or more tombstones generated by `XDEL`s. The new "xdel_max_id" stream property tracks the latest tombstone. The CG also tracks its last delivered ID's as an "entries_read" counter and increments it independently when delivering new messages, unless the this read counter is invalid (-1 means invalid offset). When the CG's counter is available, the reported lag is the difference between added and read counters. Lastly, this also adds a "first_id" field to the stream structure in order to make looking it up cheaper in most cases. ## Limitations There are two cases in which the mechanism isn't able to track the lag. In these cases, `XINFO` replies with `null` in the "lag" field. The first case is when a CG is created with an arbitrary last delivered ID, that isn't "0-0", nor the first or the last entries of the stream. In this case, it is impossible to obtain a valid read counter (short of an O(N) operation). The second case is when there are one or more tombstones fragmenting the stream's entries range. In both cases, given enough time and assuming that the consumers are active (reading and lacking) and advancing, the CG should be able to catch up with the tip of the stream and report zero lag. Once that's achieved, lag tracking would resume as normal (until the next tombstone is set). ## API changes * `XGROUP CREATE` added with the optional named argument `[ENTRIESREAD entries-read]` for explicitly specifying the new CG's counter. * `XGROUP SETID` added with an optional positional argument `[ENTRIESREAD entries-read]` for specifying the CG's counter. * `XINFO` reports the maximal tombstone ID, the recorded first entry ID, and total number of entries added to the stream. * `XINFO` reports the current lag and logical read counter of CGs. * `XSETID` is an internal command that's used in replication/aof. It has been added with the optional positional arguments `[ENTRIESADDED entries-added] [MAXDELETEDID max-deleted-entry-id]` for propagating the CG's offset and maximal tombstone ID of the stream. ## The generic unsolved problem The current stream implementation doesn't provide an efficient way to obtain the approximate/exact size of a range of entries. While it could've been nice to have that ability (#5813) in general, let alone specifically in the context of CGs, the risk and complexities involved in such implementation are in all likelihood prohibitive. ## A refactoring note The `streamGetEdgeID` has been refactored to accommodate both the existing seek of any entry as well as seeking non-deleted entries (the addition of the `skip_tombstones` argument). Furthermore, this refactoring also migrated the seek logic to use the `streamIterator` (rather than `raxIterator`) that was, in turn, extended with the `skip_tombstones` Boolean struct field to control the emission of these. Co-authored-by: Guy Benoish <[email protected]> Co-authored-by: Oran Agra <[email protected]>
Add a comma, this would have resulted in missing newline in the message. Forgot to add in #9127
This is basically just a subtree pull of the latest (unreleased) hiredis. Unfortunately, the `sds -> hisds` patch was pulled as a subtree update from a remote branch rather than a local redis change. Because of that, it goes away on every subtree update. It is now applied as a local commit so it should survive in the future.
…10337) Avoid deferred array reply on genericZrangebyrankCommand() when consumer type is client. I.e. any ZRANGE / ZREVRNGE (when tank is used). This was a performance regression introduced in #7844 (v 6.2) mainly affecting pipelined workloads. Co-authored-by: Oran Agra <[email protected]>
Adds `-3` option to cause redis-benchmark to send a `HELLO 3` to it can benchmark the effects of RESP3 on the server.
This PR fix 2 issues on Lua scripting:
* Server error reply statistics (some errors were counted twice).
* Error code and error strings returning from scripts (error code was missing / misplaced).
## Statistics
a Lua script user is considered part of the user application, a sophisticated transaction,
so we want to count an error even if handled silently by the script, but when it is
propagated outwards from the script we don't wanna count it twice. on the other hand,
if the script decides to throw an error on its own (using `redis.error_reply`), we wanna
count that too.
Besides, we do count the `calls` in command statistics for the commands the script calls,
we we should certainly also count `failed_calls`.
So when a simple `eval "return redis.call('set','x','y')" 0` fails, it should count the failed call
to both SET and EVAL, but the `errorstats` and `total_error_replies` should be counted only once.
The PR changes the error object that is raised on errors. Instead of raising a simple Lua
string, Redis will raise a Lua table in the following format:
```
{
err='<error message (including error code)>',
source='<User source file name>',
line='<line where the error happned>',
ignore_error_stats_update=true/false,
}
```
The `luaPushError` function was modified to construct the new error table as describe above.
The `luaRaiseError` was renamed to `luaError` and is now simply called `lua_error` to raise
the table on the top of the Lua stack as the error object.
The reason is that since its functionality is changed, in case some Redis branch / fork uses it,
it's better to have a compilation error than a bug.
The `source` and `line` fields are enriched by the error handler (if possible) and the
`ignore_error_stats_update` is optional and if its not present then the default value is `false`.
If `ignore_error_stats_update` is true, the error will not be counted on the error stats.
When parsing Redis call reply, each error is translated to a Lua table on the format describe
above and the `ignore_error_stats_update` field is set to `true` so we will not count errors
twice (we counted this error when we invoke the command).
The changes in this PR might have been considered as a breaking change for users that used
Lua `pcall` function. Before, the error was a string and now its a table. To keep backward
comparability the PR override the `pcall` implementation and extract the error message from
the error table and return it.
Example of the error stats update:
```
127.0.0.1:6379> lpush l 1
(integer) 2
127.0.0.1:6379> eval "return redis.call('get', 'l')" 0
(error) WRONGTYPE Operation against a key holding the wrong kind of value. script: e471b73f1ef44774987ab00bdf51f21fd9f7974a, on @user_script:1.
127.0.0.1:6379> info Errorstats
# Errorstats
errorstat_WRONGTYPE:count=1
127.0.0.1:6379> info commandstats
# Commandstats
cmdstat_eval:calls=1,usec=341,usec_per_call=341.00,rejected_calls=0,failed_calls=1
cmdstat_info:calls=1,usec=35,usec_per_call=35.00,rejected_calls=0,failed_calls=0
cmdstat_lpush:calls=1,usec=14,usec_per_call=14.00,rejected_calls=0,failed_calls=0
cmdstat_get:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=1
```
## error message
We can now construct the error message (sent as a reply to the user) from the error table,
so this solves issues where the error message was malformed and the error code appeared
in the middle of the error message:
```diff
127.0.0.1:6379> eval "return redis.call('set','x','y')" 0
-(error) ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'.
+(error) OOM command not allowed when used memory > 'maxmemory' @user_script:1. Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479)
```
```diff
127.0.0.1:6379> eval "redis.call('get', 'l')" 0
-(error) ERR Error running script (call to f_8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1): @user_script:1: WRONGTYPE Operation against a key holding the wrong kind of value
+(error) WRONGTYPE Operation against a key holding the wrong kind of value script: 8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1, on @user_script:1.
```
Notica that `redis.pcall` was not change:
```
127.0.0.1:6379> eval "return redis.pcall('get', 'l')" 0
(error) WRONGTYPE Operation against a key holding the wrong kind of value
```
## other notes
Notice that Some commands (like GEOADD) changes the cmd variable on the client stats so we
can not count on it to update the command stats. In order to be able to update those stats correctly
we needed to promote `realcmd` variable to be located on the client struct.
Tests was added and modified to verify the changes.
Related PR's: #10279, #10218, #10278, #10309
Co-authored-by: Oran Agra <[email protected]>
* The type of node-id should be string, not integer. * Also improve the CLUSTER SETSLOT help message.
re-generate help.h from commands.json
…10354) After introducing #9822 need to prevent client reply buffer shrink to maintain correct client memory math. add needs:debug missing one one test. Co-authored-by: Oran Agra <[email protected]>
yossigo
approved these changes
Feb 28, 2022
Member
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
New Features
New user commands or command arguments
Command replies that have been extended
Potentially Breaking Changes
them from the pending entry list (X[AUTO]CLAIM should skip deleted entries #10227)
Performance and resource utilization improvements
Changes in CLI tools
Platform / toolchain support related improvements
INFO fields and introspection changes
Module API changes
introspection features and ACL key permissions (Command info module API #10108)
RM_ChannelAtPosWithFlags APIs (Implemented module getchannels api and renamed channel keyspec #10299)
(released in RC1) to take different flags (Implemented module getchannels api and renamed channel keyspec #10299)
that used OPTIONS_HANDLE_REPL_ASYNC_LOAD will mess up key invalidations (Fix RM_SetModuleOptions flag collision #10284)
Bug Fixes
broken in 6.2 (Fix and improve module error reply statistics #10278)
Functions (Add check min-slave-* feature when evaluating Lua scripts and Functions #10160)
with a pending timer (forbid module to unload when it holds ongoing timer #10187)