
Optimize RM_Call by recycling the client #4623

Closed

dvirsky wants to merge 1 commit into redis:unstable from dvirsky:recycle_rm_call_client

Conversation

@dvirsky
Contributor

@dvirsky dvirsky commented Jan 22, 2018

This PR makes the client the module uses to perform RM_Call part of the module context. If repeated calls to RM_Call are made, we reuse the same client, only destroying it along with the module.

This achieves a huge performance gain for repeated RM_Calls. For comparison, here is an example of calling PING a million times with and without this patch.

Without:

127.0.0.1:6379> BENCH.CALL
OKAY
(3.48s)
127.0.0.1:6379> BENCH.CALL
OKAY
(3.42s)

After the patch is applied:

127.0.0.1:6379> BENCH.CALL
OKAY
(1.28s)
127.0.0.1:6379> BENCH.CALL
OKAY
(1.27s)

Notice that this only optimizes repeated calls within the same context, which is a rather common case but not the only one. If we just used the same client for ALL RM_Calls, we would optimize even the case of a module performing a single call. I wasn't sure how safe that is, so I went with the more conservative option. If you think we can recycle all the way - let's go for it!
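The recycling idea described above can be sketched in a few lines of self-contained C. This is not the actual patch or the real Redis structs; `ModuleCtx`, `Client`, `ctx_call` and `ctx_free` are illustrative names, assuming the client is created lazily on the first call and freed only with the context:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Client {
    int in_use;
} Client;

static int clients_created = 0;

static Client *create_client(void) {
    clients_created++;                 /* expensive in real Redis */
    return calloc(1, sizeof(Client));
}

typedef struct ModuleCtx {
    Client *reused_client;             /* NULL until the first call */
} ModuleCtx;

/* Before the patch, every call would create and free a client.
 * With the patch, the same client is recycled across calls. */
static void ctx_call(ModuleCtx *ctx) {
    if (ctx->reused_client == NULL)
        ctx->reused_client = create_client();
    Client *c = ctx->reused_client;
    c->in_use = 1;
    /* ... run the command with c ... */
    c->in_use = 0;                     /* reset state instead of freeing */
}

static void ctx_free(ModuleCtx *ctx) {
    free(ctx->reused_client);          /* destroyed with the context */
    ctx->reused_client = NULL;
}

/* How many clients do `ncalls` RM_Call-style calls cost in one context? */
static int created_after_calls(int ncalls) {
    clients_created = 0;
    ModuleCtx ctx = { NULL };
    for (int i = 0; i < ncalls; i++) ctx_call(&ctx);
    ctx_free(&ctx);
    return clients_created;
}
```

Under this sketch, a million calls pay the client-creation cost exactly once, which is where the ~2.7x speedup in the benchmark above would come from.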

@oranagra
Member

wow, a simple optimization with a great impact. i don't see any downsides.

@charsyam
Contributor

+1

@antirez
Contributor

antirez commented Jan 22, 2018

That looks awesome! Creating the client is quite some work... and btw, Lua also reuses the same client again and again. I can't see how this is going to create a problem; at most you may forget to reset some flag in the clients you are reusing, but this can be fixed while retaining the performance gain, if we find a problem. I'll review the patch just to see if I can find any potential state in the client we want to reset, and merge anyway (either I add a commit to reset such state or merge as it is). Thanks

@antirez
Contributor

antirez commented Jan 22, 2018

It's not related to this work, but at this point a good question could be: is client creation so expensive because there is no way to optimize it further, or are we doing too much work? For instance, does it make sense to initialize up front the many data structures that most clients will never use, or should we do that work on demand?
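The on-demand alternative raised here can be illustrated with a hedged sketch. The names (`LazyClient`, `pubsub_channels`, `Dict`) are stand-ins, not the real Redis client struct: the idea is simply that a per-client table, such as the Pub/Sub channel dictionary, is allocated only when the client first needs it:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Dict { int entries; } Dict;   /* stand-in for a real dict */

static int dicts_allocated = 0;

static Dict *dict_create(void) {
    dicts_allocated++;
    return calloc(1, sizeof(Dict));
}

typedef struct LazyClient {
    Dict *pubsub_channels;   /* NULL until the first subscribe */
} LazyClient;

static void lazy_client_init(LazyClient *c) {
    c->pubsub_channels = NULL;               /* no allocation at creation */
}

static void lazy_subscribe(LazyClient *c) {
    if (c->pubsub_channels == NULL)
        c->pubsub_channels = dict_create();  /* pay only when needed */
    c->pubsub_channels->entries++;
}

static void lazy_client_free(LazyClient *c) {
    free(c->pubsub_channels);
    c->pubsub_channels = NULL;
}

/* How many dict allocations do `nclients` clients cost when only the
 * first `nsubscribers` of them actually subscribe to a channel? */
static int alloc_cost(int nclients, int nsubscribers) {
    dicts_allocated = 0;
    for (int i = 0; i < nclients; i++) {
        LazyClient c;
        lazy_client_init(&c);
        if (i < nsubscribers) lazy_subscribe(&c);
        lazy_client_free(&c);
    }
    return dicts_allocated;
}
```

With eager initialization, a thousand clients would cost a thousand allocations regardless of use; lazily, only the subscribers pay.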

@dvirsky
Contributor Author

dvirsky commented Jan 22, 2018

@antirez also, can clients maybe be pooled and reused via a global pool? Reading the code, it looks like a lot of this stuff is not really needed, especially by a module client, and could either be lazily created or made opt-in via some flag, if we know for certain the client will not need it.

@antirez
Contributor

antirez commented Jan 22, 2018

Yep, first step will be profiling the function to understand where all this time is burned. Certain things are obviously done too aggressively, for instance the list of Pub/Sub channels: the dictionary should be created on demand as soon as we need to add something to it, and so forth.

Btw, back on your patch: I'll review ASAP and merge so we get this benefit to start. At a first glance, reusing the client forever may also be possible, since there should never be concurrent access, but I need to do a serious analysis. For instance, when we use threaded contexts we create a fake client just for the thread scope, so there is no danger of accessing the global client everybody is using. Similarly, given that Lua scripting is already "bug fixed", we can check how the shared client handling is performed there and what state is reset across calls. After all, there are many module commands doing just 1 or 2 RM_Call() calls, so having a single shared client may improve performance in a very significant way.

Of course the fact that we have such poor testing of the modules API does not make changing things in a safe way trivial :-( The good thing however is that testing whether serious modules like RediSearch still work gives at least a partial assurance.

@dvirsky
Contributor Author

dvirsky commented Jan 22, 2018

yeah, I suppose if valgrind doesn't find anything when running the entire search test suite, we are safe to assume that we're not leaking or anything.

@concretevitamin

@dvirsky this is awesome!

Re: your question earlier, my use case is basically the following

MyFancyCmd_RedisCommand() {
    ... // Some stuff, no calls.
    // Occurs once in this function.
    RedisModule_Call(ctx, "PUBLISH", channel, ...);
    ...
}

So there is only one Call in MyFancyCmd. However, there are many (millions) of calls to MyFancyCmd, with only a few tens of distinct channels. Is there an optimization opportunity here?

@antirez for this use case, any chance to expose a native, more efficient RedisModule_Publish?

@dvirsky
Contributor Author

dvirsky commented Jan 22, 2018

@concretevitamin I've replied to your private email with a workaround involving thread safe contexts, but it's a pretty complex one...

@dvirsky
Contributor Author

dvirsky commented Jan 22, 2018

FWIW I had a go at implementing a low level API for PUBLISH. https://github.com/antirez/redis/pull/4625

@concretevitamin

If there's any development on "After all, there are many module commands doing just 1 or 2 RM_Call() calls, so having a single shared client may improve performance in a very significant way" -- which I'm very much in favor of -- please update this thread :-)

@antirez
Contributor

antirez commented Jan 24, 2018

Hello @concretevitamin, when I review this PR (before the end of this week) I'll try to see if we can go with just a shared client for all the requests, which is optimal. However, while optimizing RM_Call() is a very worthwhile effort, in the long run we should end up with low-level APIs for almost anything... like the RM_Publish() PRed by @dvirsky. More info ASAP.

@concretevitamin

@antirez Thanks. I agree with most of your points, with one counter(?)example. Although we already have the low-level RM_StringSet, I've found it to have significant overhead due to RM_OpenKey and RM_CloseKey, which account for ~70% of the actual StringSet runtime.

This is why I'd really like to see a shared client for all Calls, to benchmark it against RM_Call("SET").

Contributor

@zuiderkwast zuiderkwast left a comment

This would be a great optimization for us. Much of the CPU is used for the allocations related to creating the fake client.

Can this be merged as-is?

@oranagra oranagra added this to the Next minor backlog milestone Feb 7, 2021
@oranagra
Member

oranagra commented Feb 7, 2021

i think merging it as is would be missing the opportunity to do something better.
i think this is a great improvement, but we do need to make sure it applies to modules that do only one call per context.
so the two possible ways to improve it are:

  1. find out what's so heavy about client creation and improve it for the benefit of everyone, not just modules
  2. create a cached client like the one used for Lua, that is used for all RM_Call (on behalf of different modules)
  • we'll need to reset some of the client fields and flags after each use (possibly reusing some of the logic of the RESET command)
  • we'll need to make sure that calls not made from within the main command execution pipeline (redis's main thread) are not using this cached client

maybe @filipecosta90 wanna look into the profiling part and / or @MeirShpilraien at the other part (considering you are a heavy user of these)
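The reset-after-use point above can be made concrete with a minimal sketch. `FakeClient` and its fields are illustrative, not the real Redis client struct; the assumption is that before a shared client is handed to the next RM_Call, any state the previous call left behind is cleared, similar in spirit to what the RESET command does:

```c
#include <assert.h>

typedef struct FakeClient {
    unsigned flags;      /* e.g. a CLIENT_MULTI-style flag bit */
    int multi_count;     /* queued commands of a half-open MULTI, if any */
    int reply_skip;      /* CLIENT REPLY SKIP-style state */
} FakeClient;

/* Clear everything a previous caller may have dirtied before the
 * shared client is reused for the next call. */
static void reset_shared_client(FakeClient *c) {
    c->flags = 0;        /* forget flags set by the previous caller */
    c->multi_count = 0;  /* discard any half-open transaction */
    c->reply_skip = 0;   /* restore normal reply behaviour */
}
```

The trade-off is that the reset must be exhaustive: any field that is forgotten here leaks state from one module's call into another's, which is exactly the class of bug antirez anticipated earlier in the thread.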

@MeirShpilraien

MeirShpilraien commented Feb 7, 2021

@oranagra the problem I see with one single client for all rm_call (like Lua) is recursive rm_call (a module that calls another module that calls another module...). So I think we need a pool of clients if we want to do it correctly... (or stay with the current optimization suggested in this PR)

Another question: I am not sure about your last point. Why can we not use the same shared client/client pool from a background thread? When we do RM_Call we take the GIL, so I do not see the problem. What am I missing?

@oranagra
Member

oranagra commented Feb 7, 2021

@MeirShpilraien i think i would be ok with making an optimization just for the first level of nesting, and having the second level create a client each time, or create one on the first call and destroy it when exiting (i.e. when RM_Call returns it destroys the second-level client if there is one, possibly with a stack of clients).
i think merging the optimization as it is would miss many cases of a single call to RM_Call, which are more common than nesting, so i'd rather optimize that and leave the other than not optimize either.

regarding background thread, i guess you're right. i thought about concurrent use of a client, but that can't happen since obviously the GIL will only be released by the main thread when the client isn't used.
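The "stack of clients" idea can be sketched as a small cache indexed by RM_Call nesting depth: level 0 (the common, non-nested case) always reuses a cached client, and deeper levels lazily get their own, so recursive module-to-module calls never share one. This is a hedged illustration with made-up names, not the code that eventually landed in #8516:

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_NESTING 16

typedef struct Client { int depth; } Client;

static int clients_created = 0;
static Client *client_cache[MAX_NESTING];
static int call_depth = 0;

/* Hand out the cached client for the current nesting level,
 * creating it lazily on first use at that depth. */
static Client *acquire_call_client(void) {
    assert(call_depth < MAX_NESTING);
    if (client_cache[call_depth] == NULL) {
        client_cache[call_depth] = calloc(1, sizeof(Client));
        clients_created++;
    }
    Client *c = client_cache[call_depth];
    c->depth = call_depth;
    call_depth++;
    return c;
}

static void release_call_client(Client *c) {
    (void)c;
    call_depth--;       /* the client stays cached for the next call */
}

/* Simulate a command that recurses `nest` more levels via RM_Call. */
static void simulated_call(int nest) {
    Client *c = acquire_call_client();
    if (nest > 0) simulated_call(nest - 1);
    release_call_client(c);
}

/* How many clients do `ncalls` top-level calls, each nesting `nest`
 * levels deep, end up creating? */
static int created_for(int ncalls, int nest) {
    clients_created = 0;
    for (int i = 0; i < MAX_NESTING; i++) {
        free(client_cache[i]);
        client_cache[i] = NULL;
    }
    for (int i = 0; i < ncalls; i++) simulated_call(nest);
    return clients_created;
}
```

In this sketch the cost is one client per nesting depth ever reached, rather than one per call, covering both the common single-call case and the recursive case @MeirShpilraien raised.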

@MeirShpilraien

@oranagra I agree, we just need to make sure we cover this recursive RM_Call case...

@oranagra
Member

handled by #8516

@oranagra oranagra closed this Feb 28, 2021

7 participants