
Reintroduce lua argument cache in luaRedisGenericCommand removed in v7.0 #11541

Merged
oranagra merged 17 commits into redis:unstable from filipecosta90:lua.cmd.object.cache
Dec 5, 2022

Conversation

@filipecosta90
Contributor

@filipecosta90 filipecosta90 commented Nov 24, 2022

As discussed in #10981 (comment) this PR re-enables argv argument caching.

This mechanism aims to reduce calls to malloc and free when preparing the arguments the script sends to redis commands.
This mechanism was originally implemented in 48c49c4 and 4f68655, and was removed in #10220 (thinking it was not needed and had no impact), but it now turns out that was wrong, and it indeed provides some 5% performance improvement.

The implementation is a little bit too simplistic, it assumes consecutive calls use the same size in the same arg index, but that's arguably sufficient since it's only aimed at caching very small things.

We could even consider always pre-allocating args to the full LUA_CMD_OBJCACHE_MAX_LEN (64 bytes) rather than the right size for the argument; that would increase the chance they'll be able to be re-used.
But in some way this is already happening since we're using sdsalloc, which in turn uses s_malloc_usable and takes ownership of the full size of the allocation, so we are padded to the allocator bucket size.
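To make the mechanism concrete, here is a minimal, hypothetical C sketch of this kind of per-index small-argument cache. The names (`arg_alloc`, `arg_release`, `CACHE_MAX_LEN`) are illustrative, not the actual Redis identifiers: each argv slot remembers its last allocation, and a consecutive call at the same index reuses it only if the cached buffer is large enough; only small buffers are ever cached.

```c
/* Hypothetical sketch of a per-index small-argument cache, in the spirit
 * of the mechanism described above (not the actual Redis code). */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_MAX_ARGS 8
#define CACHE_MAX_LEN  64   /* mirrors the idea of LUA_CMD_OBJCACHE_MAX_LEN */

static char  *cached_obj[CACHE_MAX_ARGS];
static size_t cached_len[CACHE_MAX_ARGS];

/* Get a buffer for argument index j of length len, reusing the cached
 * buffer at that index when it is big enough. */
char *arg_alloc(int j, size_t len, int *reused) {
    if (j < CACHE_MAX_ARGS && cached_obj[j] && cached_len[j] >= len) {
        char *buf = cached_obj[j];
        cached_obj[j] = NULL;   /* take ownership out of the cache */
        *reused = 1;
        return buf;
    }
    *reused = 0;
    return malloc(len);
}

/* Return a buffer to the cache instead of freeing it, but only if it is
 * small; big buffers are freed so the cache never pins large memory. */
void arg_release(int j, char *buf, size_t len) {
    if (j < CACHE_MAX_ARGS && cached_obj[j] == NULL && len <= CACHE_MAX_LEN) {
        cached_obj[j] = buf;
        cached_len[j] = len;
    } else {
        free(buf);
    }
}
```

The same-index/same-size assumption is visible here: a call pattern that alternates sizes at a given index keeps missing the cache, which is exactly the weakness debated later in this thread.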

To easily test it:

tclsh tests/test_helper.tcl --single unit/scripting

Here's the impact using the following benchmark command:

memtier_benchmark --command="eval \"redis.call('hset', 'h1', 'k', 'v');redis.call('hset', 'h2', 'k', 'v');return redis.call('ping')\" 0" --hide-histogram --test-time 60 --pipeline 10 -x 1

v6.2.6 ~ 222K ops/sec -- 4930d19

ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Evals      223707.29         8.93841         8.83100        17.53500        21.37500     28400.34 
Totals     223707.29         8.93841         8.83100        17.53500        21.37500     28400.34

Unstable ~ 164K ops/sec -- abc345a

ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Evals      163821.59        12.20666        12.03100        24.06300        24.31900     20797.66 
Totals     163821.59        12.20666        12.03100        24.06300        24.31900     20797.66 

This PR ~ 184K ops/sec -- 6310ea8

ALL STATS
==================================================================================================
Type         Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
--------------------------------------------------------------------------------------------------
Evals      183052.96        10.92378        10.81500        21.37500        25.21500     23239.15 
Totals     183052.96        10.92378        10.81500        21.37500        25.21500     23239.15 

@filipecosta90 filipecosta90 added the action:run-benchmark Triggers the benchmark suite for this Pull Request label Nov 24, 2022
lua_args_cached_objects_len[j] >= obj_len)
{
sds s = lua_args_cached_objects[j]->ptr;
lua_argv[j] = lua_args_cached_objects[j];
Member

i now realize that this is affecting trimStringObjectIfNeeded.
i.e. if we run a SET command from a lua script, it'll get to tryObjectEncoding and possibly reallocate (memcpy) that string.
@uriyage this may mean that the assumption in #11508 that 32kb (BIG_ARG) is the only case where strings have excess space is wrong.
on the other hand, it means that the argv caching in this PR is sometimes not really achieving its purpose (for the SET command).

not sure what to do.. i don't particularly like that lame argument caching mechanism.
i wasn't actually aware of it until recently, and i see it only works efficiently if you re-use the same arguments sizes in the exact same argv indexes.

on the other hand, i'm not entirely comfortable with the BIG_ARG condition in trimStringObjectIfNeeded either.

Contributor


This is a bit ugly, but we can pass an argument to tryObjectEncoding to indicate whether we should check the BIG_ARG condition, depending on the client's CLIENT_SCRIPT flag.

@oranagra
Member

@filipecosta90 after reviewing this caching mechanism for the first time, i must say i don't like it.
it relies on the arguments to repeated calls from (various) scripts being of the same size in the same argv index.
i find it odd that Yoav wasn't able to detect that this optimization was useful, and we now see it caused a regression on a very specific change.

maybe we should attack this from a different angle, and instead of blindly re-introducing the cache, try to profile it and maybe come up with a different conclusion and design?

@filipecosta90
Contributor Author

maybe we should attack this from a different angle, and instead of blindly re-introducing the cache, try to profile it and maybe come up with a different conclusion and design?

Agree. Let's try to see what the profile gives us. I'll profile already with the change in #11521

@filipecosta90
Contributor Author

closing this one based on the above discussion

@oranagra
Member

i'd rather keep it open until we have another lead.

@oranagra oranagra reopened this Nov 29, 2022
@sundb
Collaborator

sundb commented Dec 3, 2022

@filipecosta90 The reason for the crash in CI:

    robj **argv = luaArgsToRedisArgv(lua, &argc);
    replaceClientCommandVector(...) <- c->argv will be freed

This will indirectly cause lua_argv to be freed, but lua_argv_size is not updated.
You can comment out zfree(c->argv); in freeClientArgv to confirm that it no longer crashes.

It seems that the same issue also occurs in 6.2.7.

@filipecosta90
Contributor Author

replaceClientCommandVector

@sundb have you managed to replicate it on 6.2.7? I can easily replicate the issue on this branch by running the following command twice, but not on 6.2:

eval 'redis.call("SET", "k1", "value")  local encoded = redis.call("DUMP", "k1")  redis.call("RESTORE", "k2", 1, encoded, "REPLACE")  redis.call("RESTORE", "k3", 1, encoded, "REPLACE") redis.call("EXPIREAT", "k2", 1)' 0

@oranagra
Member

oranagra commented Dec 3, 2022

i don't see any problem in 6.2 around this area.
the elements themselves are removed from the cache, and are later re-cached if needed (we don't keep the old ones).
and the argv array itself has a piece of code that notices that it was replaced, and works around that (if (c->argv != argv))

the problem with this PR is that it forgets to correctly set c->argv_len (which is a new thing in 7.0); that's what causes the crash here. it's not a use-after-free or double-free issue, it's just an array out-of-bounds issue.
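As a hypothetical illustration of the bug class being diagnosed here (not the actual Redis code), consider a struct that, like a 7.0 client, tracks both the argv pointer and its allocated capacity. Replacing the vector without also updating the capacity field leaves later code trusting a stale bound; the sketch shows the correct pattern where pointer and capacity change together.

```c
/* Illustrative sketch: a client-like struct whose argv capacity
 * (argv_len, new in 7.0) must be kept in sync with the pointer. */
#include <assert.h>
#include <stdlib.h>

typedef struct {
    char **argv;
    int argc;
    int argv_len;   /* allocated capacity of argv */
} fakeclient;

/* Correct: update the capacity together with the pointer.
 * Forgetting the argv_len assignment is the bug that crashed CI. */
void set_argv(fakeclient *c, char **argv, int argc) {
    free(c->argv);
    c->argv = argv;
    c->argc = argc;
    c->argv_len = argc;
}
```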

@oranagra
Member

oranagra commented Dec 3, 2022

i still don't like this mechanism. the approach seems silly and overly simplistic; it assumes arguments at the same indexes will be the same size in sequential commands.
i.e. it will not give the desired effect in many cases (e.g. SET key bigvalue and then SETRANGE key offset bigvalue)
also, it seems to violate the assumptions in #11508

i'd like to check why the allocations make such a big difference, and learn why scripting requires that and MULTI-EXEC doesn't.
if we need that, i'd like to create a mechanism that's a lot smarter, and doesn't rely on the same indexes being used.

@filipecosta90
Contributor Author

i still don't like this mechanism. the approach seems silly and overly simplistic; it assumes arguments at the same indexes will be the same size in sequential commands. i.e. it will not give the desired effect in many cases (e.g. SET key bigvalue and then SETRANGE key offset bigvalue) also, it seems to violate the assumptions in #11508

i'd like to check why the allocations make such a big difference, and learn why scripting requires that and MULTI-EXEC doesn't. if we need that, i'd like to create a mechanism that's a lot smarter, and doesn't rely on the same indexes being used.

@oranagra agree on the simplicity of the caching vs the added code complexity.
WRT "check why the allocations make such a big difference": within the luaRedisGenericCommand logic, 7.04% comes from luaArgsToRedisArgv (of which 5.5% is from zmalloc and zcalloc) + 4.55% from freeClientArgv.

[profiling image]

@sundb
Collaborator

sundb commented Dec 4, 2022

As i mentioned in this comment #10981 (comment)

  1. Fix invalid access in lpFind on corrupted listpack #9819 -1.5%
    We can force the hash to dict to check it.

  2. Redis Function #9780 -3
    I remember last time I found that luaSaveOnRegistry() was a relatively large overhead.

@oranagra
Member

oranagra commented Dec 4, 2022

  1. Fix invalid access in lpFind on corrupted listpack #9819 -1.5%
    We can force the hash to dict to check it.

@sundb i don't understand what you suggest?
to tune the redis config of our benchmark to use dict rather than listpack? (i.e. just for the sake of better comparison with 6.2)

  2. Redis Function #9780 -3
    I remember last time I found that luaSaveOnRegistry() was a relatively large overhead.

@MeirShpilraien maybe you can think of another, more performant alternative?

@sundb
Collaborator

sundb commented Dec 4, 2022

@sundb i don't understand what you suggest? to tune the redis config of our benchmark to use dict rather than listpack? (i.e. just for the sake of better comparison with 6.2)

Yes.

* scriptCall takes a client, and the caller is anyway messing with it,
  so no need to pass argv and argc; instead the caller can set
  them and other vars.
* properly set c->argv_len before the call
* no need to set c->argv_len_sum (only needed for real clients), and if
  we did need it, we need to update it during the building of argv.
* add some comments
* properly release the argv back to the cache on args parsing error
* add test coverage for the above
* add a call to freeClientArgv (who knows what it'll do one day)
@oranagra
Member

oranagra commented Dec 4, 2022

i thought about this mechanism a bit more, and performed some benchmarks.
eventually i concluded that it's not that bad that this mechanism is silly (assumes consecutive calls use the same size in the same arg index), since it's only aimed at caching very small things.

i'd even consider always pre-allocating args to the full LUA_CMD_OBJCACHE_MAX_LEN (64) rather than the right size; that would increase the chance they'll be able to be re-used.
I suppose that in some way this is already happening since we're using sdsalloc, which in turn uses s_malloc_usable and takes ownership of the full size of the allocation, so we are padded to the allocator bucket size.

i edited the code and changed a few things, see the commit comment.
let me know what you think and let's merge this.

@oranagra oranagra merged commit 2d80cd7 into redis:unstable Dec 5, 2022
@oranagra oranagra added the release-notes indication that this issue needs to be mentioned in the release notes label Dec 5, 2022
@oranagra oranagra mentioned this pull request Dec 11, 2022
oranagra added a commit that referenced this pull request Dec 12, 2022
…7.0 (#11541)

This mechanism aims to reduce calls to malloc and free when
preparing the arguments the script sends to redis commands.
This mechanism was originally implemented in 48c49c4
and 4f68655, and was removed in #10220 (thinking it was not needed
and had no impact), but it now turns out that was wrong, and it
indeed provides some 5% performance improvement.

The implementation is a little bit too simplistic, it assumes consecutive
calls use the same size in the same arg index, but that's arguably
sufficient since it's only aimed at caching very small things.

We could even consider always pre-allocating args to the full
LUA_CMD_OBJCACHE_MAX_LEN (64 bytes) rather than the right size for the argument,
that would increase the chance they'll be able to be re-used.
But in some way this is already happening since we're using
sdsalloc, which in turn uses s_malloc_usable and takes ownership
of the full size of the allocation, so we are padded to the allocator
bucket size.

Co-authored-by: Oran Agra <[email protected]>
Co-authored-by: sundb <[email protected]>
(cherry picked from commit 2d80cd7)
oranagra added a commit that referenced this pull request Jan 4, 2023
…libc realloc (#11652)

TLDR: solve a problem introduced in Redis 7.0.6 (#11541) with
RM_CommandFilterArgInsert being called from scripts, which can
lead to memory corruption.

Libc realloc can return the same pointer even if the size was changed. The code in
freeLuaRedisArgv had an assumption that if the pointer didn't change, then the
allocation didn't change, and the cache can still be reused.
However, if rewriteClientCommandArgument or RM_CommandFilterArgInsert were
used, it could be that we realloced the argv array, and the pointer didn't change, then
a consecutive command being executed from Lua can use that argv cache reaching
beyond its size.
This was actually only possible with modules, since the decision to realloc was based
on argc, rather than argv_len.
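A small hypothetical sketch of the realloc point above (names are illustrative, not the Redis code): realloc may grow an allocation in place and hand back the same pointer, so pointer equality says nothing about whether the capacity changed. The safe pattern is to track the capacity explicitly and compare that, never a saved pointer.

```c
/* Illustrative sketch: track capacity alongside the pointer, because
 * realloc returning the same pointer does not mean the size is unchanged. */
#include <assert.h>
#include <stdlib.h>

typedef struct {
    void  *ptr;
    size_t len;   /* capacity we handed out; compare this, not pointers */
} tracked_buf;

/* Grow a tracked buffer; any cached view of the buffer must be
 * validated against buf->len, never against a remembered pointer. */
void tracked_grow(tracked_buf *buf, size_t newlen) {
    buf->ptr = realloc(buf->ptr, newlen);
    buf->len = newlen;   /* capacity changes even if ptr doesn't */
}
```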
oranagra added a commit that referenced this pull request Jan 16, 2023
…libc realloc (#11652)

(cherry picked from commit c805212)
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
…7.0 (redis#11541)
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
…7.0 (redis#11541)
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
…libc realloc (redis#11652)

Labels

action:run-benchmark Triggers the benchmark suite for this Pull Request release-notes indication that this issue needs to be mentioned in the release notes

Projects

Status: Done

4 participants