
Optimize RM_Call by recycling the client #4623

Closed

dvirsky wants to merge 1 commit into redis:unstable from dvirsky:recycle_rm_call_client

Conversation

@dvirsky
Contributor

@dvirsky dvirsky commented Jan 22, 2018

This PR makes the client the module uses to perform RM_Call part of the module context. If repeated calls to RM_Call are made, we reuse the same client, only destroying it along with the module.

This achieves a huge performance gain for repeated RM_Calls. For comparison, here is an example of calling PING a million times with and without this patch.

Without:

127.0.0.1:6379> BENCH.CALL
OKAY
(3.48s)
127.0.0.1:6379> BENCH.CALL
OKAY
(3.42s)

After the patch is applied:

127.0.0.1:6379> BENCH.CALL
OKAY
(1.28s)
127.0.0.1:6379> BENCH.CALL
OKAY
(1.27s)

Notice that this only optimizes repeated calls within the same context, which is a rather common case but not the only one. If we just used the same client for ALL RM_Calls, we would optimize even the case of a module performing a single call. I wasn't sure how safe that is, so I went with the more conservative option. If you think we can recycle all the way - let's go for it!
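The recycling idea described above can be sketched in a few lines of self-contained C. This is not the actual patch or the real Redis structs; `ModuleCtx`, `Client`, `ctx_call` and `ctx_free` are illustrative names, assuming the client is created lazily on the first call and freed only with the context:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Client {
    int in_use;
} Client;

static int clients_created = 0;

static Client *create_client(void) {
    clients_created++;                 /* expensive in real Redis */
    return calloc(1, sizeof(Client));
}

typedef struct ModuleCtx {
    Client *reused_client;             /* NULL until the first call */
} ModuleCtx;

/* Before the patch, every call would create and free a client.
 * With the patch, the same client is recycled across calls. */
static void ctx_call(ModuleCtx *ctx) {
    if (ctx->reused_client == NULL)
        ctx->reused_client = create_client();
    Client *c = ctx->reused_client;
    c->in_use = 1;
    /* ... run the command with c ... */
    c->in_use = 0;                     /* reset state instead of freeing */
}

static void ctx_free(ModuleCtx *ctx) {
    free(ctx->reused_client);          /* destroyed with the context */
    ctx->reused_client = NULL;
}

/* How many clients do `ncalls` RM_Call-style calls cost in one context? */
static int created_after_calls(int ncalls) {
    clients_created = 0;
    ModuleCtx ctx = { NULL };
    for (int i = 0; i < ncalls; i++) ctx_call(&ctx);
    ctx_free(&ctx);
    return clients_created;
}
```

Under this sketch, a million calls pay the client-creation cost exactly once, which is where the ~2.7x speedup in the benchmark above would come from.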

@oranagra
Member

wow, a simple optimization with a great impact. i don't see any downsides.

@charsyam
Contributor

+1

@antirez
Contributor

antirez commented Jan 22, 2018

That looks awesome! Creating the client is quite some work... and btw, Lua also reuses the same client again and again. I can't see how this is going to create a problem; at most you may forget to reset some flag in the clients you are reusing, but this can be fixed while retaining the performance gain, if we find a problem. I'll review the patch just to see if I can find any potential state in the client we want to reset, and merge anyway (either I add a commit to reset such state or merge as it is). Thanks

@antirez
Contributor

antirez commented Jan 22, 2018

It's not related to this work, but at this point a good question could be: is client creation so expensive because there is no way to optimize it further, or are we doing too much work? For instance, does it make sense to initialize up front the many data structures that most clients will never use, or should we do that work on demand?
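The on-demand alternative raised here can be illustrated with a hedged sketch. The names (`LazyClient`, `pubsub_channels`, `Dict`) are stand-ins, not the real Redis client struct: the idea is simply that a per-client table, such as the Pub/Sub channel dictionary, is allocated only when the client first needs it:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Dict { int entries; } Dict;   /* stand-in for a real dict */

static int dicts_allocated = 0;

static Dict *dict_create(void) {
    dicts_allocated++;
    return calloc(1, sizeof(Dict));
}

typedef struct LazyClient {
    Dict *pubsub_channels;   /* NULL until the first subscribe */
} LazyClient;

static void lazy_client_init(LazyClient *c) {
    c->pubsub_channels = NULL;               /* no allocation at creation */
}

static void lazy_subscribe(LazyClient *c) {
    if (c->pubsub_channels == NULL)
        c->pubsub_channels = dict_create();  /* pay only when needed */
    c->pubsub_channels->entries++;
}

static void lazy_client_free(LazyClient *c) {
    free(c->pubsub_channels);
    c->pubsub_channels = NULL;
}

/* How many dict allocations do `nclients` clients cost when only the
 * first `nsubscribers` of them actually subscribe to a channel? */
static int alloc_cost(int nclients, int nsubscribers) {
    dicts_allocated = 0;
    for (int i = 0; i < nclients; i++) {
        LazyClient c;
        lazy_client_init(&c);
        if (i < nsubscribers) lazy_subscribe(&c);
        lazy_client_free(&c);
    }
    return dicts_allocated;
}
```

With eager initialization, a thousand clients would cost a thousand allocations regardless of use; lazily, only the subscribers pay.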

@dvirsky
Contributor Author

dvirsky commented Jan 22, 2018

@antirez also, can clients maybe be pooled and reused via a global pool? Reading the code, it looks like a lot of this stuff is not really needed, especially by a module client, and could either be lazily created or made opt-in via some flag, if we know for certain the client will not need it.

@antirez
Contributor

antirez commented Jan 22, 2018

Yep, first step will be profiling the function to understand where all this time is burned. Certain things are obviously done too aggressively, for instance the list of Pub/Sub channels: the dictionary should be created on demand as soon as we need to add something to it, and so forth.

Btw, back on your patch: I'll review ASAP and merge so we get this benefit to start. At a first glance, reusing the client forever may also be possible, since there should never be concurrent access, but I need to do a serious analysis. For instance, when we use threaded contexts we create a fake client just for the thread scope, so there is no danger of accessing the global client everybody is using. Similarly, given that Lua scripting is already "bug fixed", we can check how the shared client handling is performed there and what state is reset across calls. After all, there are many module commands doing just 1 or 2 RM_Call() calls, so having a single shared client may improve performance in a very significant way.

Of course the fact that we have such poor testing of the modules API does not make changing things in a safe way trivial :-( The good thing however is that testing whether serious modules like RediSearch still work gives at least a partial assurance.

@dvirsky
Contributor Author

dvirsky commented Jan 22, 2018

yeah, I suppose if valgrind doesn't find anything when running the entire search test suite, we are safe to assume that we're not leaking or anything.

@concretevitamin

@dvirsky this is awesome!

Re: your question earlier, my use case is basically the following

MyFancyCmd_RedisCommand() {
    ... // Some stuff, no calls.
    // Occurs once in this function.
    RedisModule_Call(ctx, "PUBLISH", channel, ...);
    ...
}

So there is only one Call in MyFancyCmd. However, there are many (millions) of calls to MyFancyCmd, with only a few tens of distinct channels. Is there an optimization opportunity here?

@antirez for this use case, any chance to expose a native, more efficient RedisModule_Publish?

@dvirsky
Contributor Author

dvirsky commented Jan 22, 2018

@concretevitamin I've replied to your private email with a workaround involving thread safe contexts, but it's a pretty complex one...

@dvirsky
Contributor Author

dvirsky commented Jan 22, 2018

FWIW I had a go at implementing a low level API for PUBLISH. https://github.com/antirez/redis/pull/4625

@concretevitamin

If there's any development on "After all, there are many module commands doing just 1 or 2 RM_Call() calls, so having a single shared client may improve performance in a very significant way" -- which I'm very much in favor of -- please update this thread :-)

@antirez
Contributor

antirez commented Jan 24, 2018

Hello @concretevitamin, when I review this PR (before the end of this week) I'll try to see if we can go with just a shared client for all the requests, which is optimal. However, while optimizing RM_Call() is a very worthwhile effort, in the long run we should end up with low-level APIs for almost anything... like the RM_Publish() PRed by @dvirsky. More info ASAP.

@concretevitamin

@antirez Thanks. I agree with most of your points, with one counter(?)example. Although we already have the low-level RM_StringSet, I've found it to have significant overhead due to RM_OpenKey and RM_CloseKey, which account for ~70% of the actual StringSet runtime.

This is why I'd really like to see a shared client for all Calls, to benchmark it against RM_Call("SET").

Contributor

@zuiderkwast zuiderkwast left a comment

This would be a great optimization for us. Much of the CPU is used for the allocations related to creating the fake client.

Can this be merged as-is?

@oranagra oranagra added this to the Next minor backlog milestone Feb 7, 2021
@oranagra
Member

oranagra commented Feb 7, 2021

i think merging it as is would be missing the opportunity to do something better.
i think this is a great improvement, but we do need to make sure it applies to modules that do only one call per context.
so the two possible ways to improve it are:

  1. find out what's so heavy about client creation and improve it for the benefit of everyone, not just modules
  2. create a cached client like the one used for Lua, that is used for all RM_Call (on behalf of different modules)
  • we'll need to reset some of the client fields and flags after each use (possibly reusing some of the logic of the RESET command)
  • we'll need to make sure that calls not made from within the main command execution pipeline (redis's main thread) are not using this cached client

maybe @filipecosta90 wanna look into the profiling part and / or @MeirShpilraien at the other part (considering you are a heavy user of these)
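The reset-after-use point above can be made concrete with a minimal sketch. `FakeClient` and its fields are illustrative, not the real Redis client struct; the assumption is that before a shared client is handed to the next RM_Call, any state the previous call left behind is cleared, similar in spirit to what the RESET command does:

```c
#include <assert.h>

typedef struct FakeClient {
    unsigned flags;      /* e.g. a CLIENT_MULTI-style flag bit */
    int multi_count;     /* queued commands of a half-open MULTI, if any */
    int reply_skip;      /* CLIENT REPLY SKIP-style state */
} FakeClient;

/* Clear everything a previous caller may have dirtied before the
 * shared client is reused for the next call. */
static void reset_shared_client(FakeClient *c) {
    c->flags = 0;        /* forget flags set by the previous caller */
    c->multi_count = 0;  /* discard any half-open transaction */
    c->reply_skip = 0;   /* restore normal reply behaviour */
}
```

The trade-off is that the reset must be exhaustive: any field that is forgotten here leaks state from one module's call into another's, which is exactly the class of bug antirez anticipated earlier in the thread.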

@MeirShpilraien

MeirShpilraien commented Feb 7, 2021

@oranagra the problem I see with one single client for all rm_call (like Lua) is recursive rm_call (a module that calls another module that calls another module...). So I think we need a pool of clients if we want to do it correctly... (or stay with the current optimization suggested in this PR)

Another question: I am not sure about your last point. Why can we not use the same shared client/client pool from a background thread? When we do RM_Call we take the GIL, so I do not see the problem. What am I missing?

@oranagra
Member

oranagra commented Feb 7, 2021

@MeirShpilraien i think i would be ok with making an optimization just for the first level of nesting, and having the second level create a client each time, or create one on the first call and destroy it when exiting (i.e. when RM_Call returns it destroys the second-level client if there is one, possibly with a stack of clients).
i think merging the optimization as it is would miss many cases of a single call to RM_Call, which are more common than nesting, so i'd rather optimize that and leave the other than not optimize either.

regarding background thread, i guess you're right. i thought about concurrent use of a client, but that can't happen since obviously the GIL will only be released by the main thread when the client isn't used.
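The "stack of clients" idea can be sketched as a small cache indexed by RM_Call nesting depth: level 0 (the common, non-nested case) always reuses a cached client, and deeper levels lazily get their own, so recursive module-to-module calls never share one. This is a hedged illustration with made-up names, not the code that eventually landed in #8516:

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_NESTING 16

typedef struct Client { int depth; } Client;

static int clients_created = 0;
static Client *client_cache[MAX_NESTING];
static int call_depth = 0;

/* Hand out the cached client for the current nesting level,
 * creating it lazily on first use at that depth. */
static Client *acquire_call_client(void) {
    assert(call_depth < MAX_NESTING);
    if (client_cache[call_depth] == NULL) {
        client_cache[call_depth] = calloc(1, sizeof(Client));
        clients_created++;
    }
    Client *c = client_cache[call_depth];
    c->depth = call_depth;
    call_depth++;
    return c;
}

static void release_call_client(Client *c) {
    (void)c;
    call_depth--;       /* the client stays cached for the next call */
}

/* Simulate a command that recurses `nest` more levels via RM_Call. */
static void simulated_call(int nest) {
    Client *c = acquire_call_client();
    if (nest > 0) simulated_call(nest - 1);
    release_call_client(c);
}

/* How many clients do `ncalls` top-level calls, each nesting `nest`
 * levels deep, end up creating? */
static int created_for(int ncalls, int nest) {
    clients_created = 0;
    for (int i = 0; i < MAX_NESTING; i++) {
        free(client_cache[i]);
        client_cache[i] = NULL;
    }
    for (int i = 0; i < ncalls; i++) simulated_call(nest);
    return clients_created;
}
```

In this sketch the cost is one client per nesting depth ever reached, rather than one per call, covering both the common single-call case and the recursive case @MeirShpilraien raised.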

@MeirShpilraien

@oranagra I agree, we just need to make sure we cover this recursive RM_Call case...

@oranagra
Member

handled by #8516

@oranagra oranagra closed this Feb 28, 2021

7 participants