Skip to content

Sort out the mess around Lua error messages and error stats#10329

Merged
oranagra merged 9 commits intoredis:unstablefrom
MeirShpilraien:lua_error_handling
Feb 27, 2022
Merged

Sort out the mess around Lua error messages and error stats#10329
oranagra merged 9 commits intoredis:unstablefrom
MeirShpilraien:lua_error_handling

Conversation

@MeirShpilraien
Copy link

@MeirShpilraien MeirShpilraien commented Feb 22, 2022

Related PR's: #10279, #10218, #10278, #10309

This PR fix 2 issues on Lua scripting:

  • Server error reply statistics (some errors were counted twice).
  • Error code and error strings returning from scripts (error code was missing / misplaced).

Statistics

a Lua script user is considered part of the user application, a sophisticated transaction, so we want to count an error even if handled silently by the script, but when it is propagated outwards from the script we don't wanna count it twice. on the other hand, if the script decides to throw an error on its own (using redis.error_reply), we wanna count that too.
Besides, we do count the calls in command statistics for the commands the script calls, we we should certainly also count failed_calls.
So when a simple eval "return redis.call('set','x','y')" 0 fails, it should count the failed call to both SET and EVAL, but the errorstats and total_error_replies should be counted only once.

The PR changes the error object that is raised on errors. Instead of raising a simple Lua string, Redis will raise
a Lua table in the following format:

{
    err='<error message (including error code)>',
    source='<User source file name>',
    line='<line where the error happned>',
    ignore_error_stats_update=true/false,
}

The luaPushError function was modified to construct the new error table as describe above. The luaRaiseError was renamed to luaError and is now simply called lua_error to raise the table on the top of the Lua stack as the error object.
The reason is that since its functionality is changed, in case some Redis branch / fork uses it, it's better to have a compilation error than a bug.

The source and line fields are enriched by the error handler (if possible) and the ignore_error_stats_update
is optional and if its not present then the default value is false. If ignore_error_stats_update is true, the
error will not be counted on the error stats.

When parsing Redis call reply, each error is translated to a Lua table on the format describe above and the
ignore_error_stats_update field is set to true so we will not count errors twice (we counted this error
when we invoke the command).

The changes in this PR might have been considered as a breaking change for users that used Lua pcall function. Before, the error was a string and now its a table. To keep backward comparability the PR override the pcall implementation and extract the error message from the error table and return it.

Example of the error stats update:

127.0.0.1:6379> lpush l 1
(integer) 2
127.0.0.1:6379> eval "return redis.call('get', 'l')" 0
(error) WRONGTYPE Operation against a key holding the wrong kind of value. script: e471b73f1ef44774987ab00bdf51f21fd9f7974a, on @user_script:1.

127.0.0.1:6379> info Errorstats
# Errorstats
errorstat_WRONGTYPE:count=1

127.0.0.1:6379> info commandstats
# Commandstats
cmdstat_eval:calls=1,usec=341,usec_per_call=341.00,rejected_calls=0,failed_calls=1
cmdstat_info:calls=1,usec=35,usec_per_call=35.00,rejected_calls=0,failed_calls=0
cmdstat_lpush:calls=1,usec=14,usec_per_call=14.00,rejected_calls=0,failed_calls=0
cmdstat_get:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=1

error message

We can now construct the error message (sent as a reply to the user) from the error table, so this solves
issues where the error message was malformed and the error code appeared in the middle of the error message:

127.0.0.1:6379> eval "return redis.call('set','x','y')" 0
-(error) ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'.
+(error) OOM command not allowed when used memory > 'maxmemory' @user_script:1. Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479)
127.0.0.1:6379> eval "redis.call('get', 'l')" 0
-(error) ERR Error running script (call to f_8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1): @user_script:1: WRONGTYPE Operation against a key holding the wrong kind of value
+(error) WRONGTYPE Operation against a key holding the wrong kind of value script: 8a705cfb9fb09515bfe57ca2bd84a5caee2cbbd1, on @user_script:1.

Notica that redis.pcall was not change:

127.0.0.1:6379> eval "return redis.pcall('get', 'l')" 0
(error) WRONGTYPE Operation against a key holding the wrong kind of value

other notes

Notice that Some commands (like GEOADD) changes the cmd variable on the client stats so we can not count on it to update the command stats. In order to be able to update those stats correctly we needed to promote realcmd variable to be located on the client struct.

Tests was added and modified to verify the changes.

* Server error reply statistics.
* Error code and error strings returning from scripts.

The PR changes the error object that is raised on errors.
Instead of raising a simple Lua string, Redis will raise
a Lua table in the following format:

```
{
    err='<error message (without error code)>',
    err_code='<error code>',
    source='<User source file name>',
    line='<line where the error happned>',
    ignore_error_stats_update=true/false,
}
```

The `luaPushError` function was modified to construct
the new error table as describe above. The `luaRaiseError`
is now simply calls `lua_error` to raise the table on the
top of the Lua stack as the error object.

The `source` and `line` fields are enriched by the error
hanlder (if possible) and the `ignore_error_stats_update`
is optional and if its not present then the default value
is `false`. If `ignore_error_stats_update` is true, the
error will not be counted on the error stats.

When parsing Redis call reply, each error is translated
to a Lua table on the format describe above and the
`ignore_error_stats_update` field is set to `true` so
we will not count errors twice (we counted this error
when we invoke the command).

Example of the error stats update:

```
127.0.0.1:6379> lpush l 1
(integer) 2
127.0.0.1:6379> eval "return redis.call('get', 'l')" 0
(error) WRONGTYPE Operation against a key holding the wrong kind of value. script: e471b73f1ef44774987ab00bdf51f21fd9f7974a, on @user_script:1.
127.0.0.1:6379> info Errorstats
# Errorstats
errorstat_WRONGTYPE:count=1
127.0.0.1:6379> info commandstats
# Commandstats
cmdstat_eval:calls=1,usec=341,usec_per_call=341.00,rejected_calls=0,failed_calls=1
cmdstat_info:calls=1,usec=35,usec_per_call=35.00,rejected_calls=0,failed_calls=0
cmdstat_lpush:calls=1,usec=14,usec_per_call=14.00,rejected_calls=0,failed_calls=0
cmdstat_get:calls=1,usec=10,usec_per_call=10.00,rejected_calls=0,failed_calls=1
```

In addition we can now construct the error message (sent
as a reply to the user) from the error table, so this solves
issues where the error message was malformed and the error
code appeared in the middle of the error message:
```
127.0.0.1:6379> eval "return redis.call('set','x','y')" 0
-(error) ERR Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479): @user_script:1: OOM command not allowed when used memory > 'maxmemory'.
+(error) OOM command not allowed when used memory > 'maxmemory'.. @user_script:1. Error running script (call to 71e6319f97b0fe8bdfa1c5df3ce4489946dda479)
```

The changes in this PR might be considered as a breaking change
for users that used Lua `pcall` function. Before the error was a
strind and now its a table. To keep backward comparability the PR
overide the `pcall` implementation and extract the error message
from the error table and return it.

Tests was added and modified to verify the changes.
meir added 3 commits February 23, 2022 14:37
* Better comments.
* Propagate flags to `afterErrorReply` and move stats update out of scripts units.
* Renamed luaRaiseError -> luaError
* Improve tests
@yossigo yossigo changed the title Fix mess around Lua error messages and error stats Sort out the mess around Lua error messages and error stats Feb 23, 2022
Meir Shpilraien (Spielrein) and others added 3 commits February 24, 2022 12:24
The realcmd holds the original command that was executed by the client.
We need this field to update error stats, we can not update the error
stats directly on client->cmd because this field might change during
the command invocation (like in GEOADD for example).
1. Rename incrCommandFailedCalls -> incrCommandStatsOnError and return whether or
   not a stats value was updated
2. Revert change or `err` not including the error code.
@oranagra
Copy link
Member

@yossigo please take a look at the top comment and approve it.
i suppose we don't consider it a breaking change.
it should be listed in the release notes together with #10218 (same topic)

@oranagra oranagra added the release-notes indication that this issue needs to be mentioned in the release notes label Feb 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

3 participants