Reintroduce lua argument cache in luaRedisGenericCommand removed in v7.0 #11541
Reintroduce lua argument cache in luaRedisGenericCommand removed in v7.0 #11541oranagra merged 17 commits intoredis:unstablefrom
Conversation
| lua_args_cached_objects_len[j] >= obj_len) | ||
| { | ||
| sds s = lua_args_cached_objects[j]->ptr; | ||
| lua_argv[j] = lua_args_cached_objects[j]; |
There was a problem hiding this comment.
i now realize that this is affecting trimStringObjectIfNeeded.
i.e. if we run a SET command from a lua script, it'll get to tryObjectEncoding and possibly reallocate (memcpy) that string.
@uriyage this may mean that the assumption in #11508 that 32kb (BIG_ARG) is the only case where strings have excess space is wrong.
on the other hand, it means that the argv caching in this PR is sometimes not really achieving it's purpose (for SET command).
not sure what to do.. i don't particularly like that lame argument caching mechanism.
i wasn't actually aware of it until recently, and i see it only works efficiently if you re-use the same arguments sizes in the exact same argv indexes.
on the other hand, i'm not entirely comfortable with the BIG_ARG condition in trimStringObjectIfNeeded either.
There was a problem hiding this comment.
This is a bit ugly, but we can pass an argument to tryObjectEncoding to indicate if we should check the BIG_ARG condition or not depending on the client's CLIENT_SCRIPT flag.
|
@filipecosta90 after reviewing this caching mechanism for the first time, i must say i don't like it. maybe we should attack this from a different angle, and instead of blindly re-introduce the cache, try to profile is and maybe come up with a different conclusion and design? |
Agree. Let's try to see what the profile gives us. I'll profile already with the change in #11521 |
|
closing this one based on the above discussion |
|
i'd rather keep it open until we have another lead. |
|
@filipecosta90 The reason for the crash in CI: robj **argv = luaArgsToRedisArgv(lua, &argc);
replaceClientCommandVector(...) <- c->argv will be freedThis will indirectly cause It seems that the same issue also occurs in 6.2.7. |
@sundb have you managed to replicate it on 6.2.7? I can easily replicate the issue on this branch running the following command twice but not on 6.2: |
|
i don't see any problem in 6.2 around these area. the problem with this PR is that it forgets to correctly set |
|
i still don't like this mechanism. the approach seems silly and overly simplistic, it assumes arguments at the same indexes will be the same size in sequential commands. i'd like to check why the allocations make such a big difference, and learn why scripting requires that and MULTI-EXEC doesn't. |
@oranagra agree with the simplicity of caching vs the added complexity of code. |
|
As i mentioned in this comment #10981 (comment)
|
@sundb i don't understand what you suggest?
@MeirShpilraien maybe you can think of another, more performant alternative? |
Yes. |
* scriptCall takes a client, and the caller is anyway messing with it, so no need to pass argv and argc, instead script the caller can set them and other vars. * properly set c->argv_len before the call * no need to set c->argv_len_sum (only needed for real clients), and if we did need it, we need to update it during the building of argv. * add some comments * properly release the argv back to the cache on args parsing error * add test coverage for the above * add a call to freeClientArgv (who knows what it'll do one day)
|
i thought about this mechanism a bit more, and performed some benchmarks. i'd even consider always pre-allocating args to the full LUA_CMD_OBJCACHE_MAX_LEN (64) rather than the right size, that would increase the chance they'll be able to be re-used. i edited the code and changed a few things, see the commit comment. |
Co-authored-by: sundb <[email protected]>
…7.0 (#11541) This mechanism aims to reduce calls to malloc and free when preparing the arguments the script sends to redis commands. This is a mechanism was originally implemented in 48c49c4 and 4f68655, and was removed in #10220 (thinking it's not needed and that it has no impact), but it now turns out it was wrong, and it indeed provides some 5% performance improvement. The implementation is a little bit too simplistic, it assumes consecutive calls use the same size in the same arg index, but that's arguably sufficient since it's only aimed at caching very small things. We could even consider always pre-allocating args to the full LUA_CMD_OBJCACHE_MAX_LEN (64 bytes) rather than the right size for the argument, that would increase the chance they'll be able to be re-used. But in some way this is already happening since we're using sdsalloc, which in turn uses s_malloc_usable and takes ownership of the full side of the allocation, so we are padded to the allocator bucket size. Co-authored-by: Oran Agra <[email protected]> Co-authored-by: sundb <[email protected]> (cherry picked from commit 2d80cd7)
…libc realloc (#11652) TLDR: solve a problem introduced in Redis 7.0.6 (#11541) with RM_CommandFilterArgInsert being called from scripts, which can lead to memory corruption. Libc realloc can return the same pointer even if the size was changed. The code in freeLuaRedisArgv had an assumption that if the pointer didn't change, then the allocation didn't change, and the cache can still be reused. However, if rewriteClientCommandArgument or RM_CommandFilterArgInsert were used, it could be that we realloced the argv array, and the pointer didn't change, then a consecutive command being executed from Lua can use that argv cache reaching beyond its size. This was actually only possible with modules, since the decision to realloc was based on argc, rather than argv_len.
…libc realloc (#11652) TLDR: solve a problem introduced in Redis 7.0.6 (#11541) with RM_CommandFilterArgInsert being called from scripts, which can lead to memory corruption. Libc realloc can return the same pointer even if the size was changed. The code in freeLuaRedisArgv had an assumption that if the pointer didn't change, then the allocation didn't change, and the cache can still be reused. However, if rewriteClientCommandArgument or RM_CommandFilterArgInsert were used, it could be that we realloced the argv array, and the pointer didn't change, then a consecutive command being executed from Lua can use that argv cache reaching beyond its size. This was actually only possible with modules, since the decision to realloc was based on argc, rather than argv_len. (cherry picked from commit c805212)
…7.0 (redis#11541) This mechanism aims to reduce calls to malloc and free when preparing the arguments the script sends to redis commands. This is a mechanism was originally implemented in 48c49c4 and 4f68655, and was removed in redis#10220 (thinking it's not needed and that it has no impact), but it now turns out it was wrong, and it indeed provides some 5% performance improvement. The implementation is a little bit too simplistic, it assumes consecutive calls use the same size in the same arg index, but that's arguably sufficient since it's only aimed at caching very small things. We could even consider always pre-allocating args to the full LUA_CMD_OBJCACHE_MAX_LEN (64 bytes) rather than the right size for the argument, that would increase the chance they'll be able to be re-used. But in some way this is already happening since we're using sdsalloc, which in turn uses s_malloc_usable and takes ownership of the full side of the allocation, so we are padded to the allocator bucket size. Co-authored-by: Oran Agra <[email protected]> Co-authored-by: sundb <[email protected]>
…7.0 (redis#11541) This mechanism aims to reduce calls to malloc and free when preparing the arguments the script sends to redis commands. This is a mechanism was originally implemented in 48c49c4 and 4f68655, and was removed in redis#10220 (thinking it's not needed and that it has no impact), but it now turns out it was wrong, and it indeed provides some 5% performance improvement. The implementation is a little bit too simplistic, it assumes consecutive calls use the same size in the same arg index, but that's arguably sufficient since it's only aimed at caching very small things. We could even consider always pre-allocating args to the full LUA_CMD_OBJCACHE_MAX_LEN (64 bytes) rather than the right size for the argument, that would increase the chance they'll be able to be re-used. But in some way this is already happening since we're using sdsalloc, which in turn uses s_malloc_usable and takes ownership of the full side of the allocation, so we are padded to the allocator bucket size. Co-authored-by: Oran Agra <[email protected]> Co-authored-by: sundb <[email protected]>
…libc realloc (redis#11652) TLDR: solve a problem introduced in Redis 7.0.6 (redis#11541) with RM_CommandFilterArgInsert being called from scripts, which can lead to memory corruption. Libc realloc can return the same pointer even if the size was changed. The code in freeLuaRedisArgv had an assumption that if the pointer didn't change, then the allocation didn't change, and the cache can still be reused. However, if rewriteClientCommandArgument or RM_CommandFilterArgInsert were used, it could be that we realloced the argv array, and the pointer didn't change, then a consecutive command being executed from Lua can use that argv cache reaching beyond its size. This was actually only possible with modules, since the decision to realloc was based on argc, rather than argv_len.

As discussed in #10981 (comment) this PR re-enables argv argument caching.
This mechanism aims to reduce calls to malloc and free when preparing the arguments the script sends to redis commands.
This is a mechanism was originally implemented in 48c49c4 and 4f68655, and was removed in #10220 (thinking it's not needed and that it has no impact), but it now turns out it was wrong, and it indeed provides some 5% performance improvement.
The implementation is a little bit too simplistic, it assumes consecutive calls use the same size in the same arg index, but that's arguably sufficient since it's only aimed at caching very small things.
We could even consider always pre-allocating args to the full LUA_CMD_OBJCACHE_MAX_LEN (64 bytes) rather than the right size for the argument, that would increase the chance they'll be able to be re-used.
But in some way this is already happening since we're using sdsalloc, which in turn uses s_malloc_usable and takes ownership of the full side of the allocation, so we are padded to the allocator bucket size.
To easily test it:
Here's the impact using the following benchmark command:
v6.2.6 ~ 222K ops/sec -- 4930d19
Unstable ~ 164K ops/sec -- abc345a
This PR ~ 184K ops/sec -- 6310ea8