Optimize RM_Call by recycling the client #4623
Conversation
|
wow, a simple optimization with a great impact. i don't see any downsides. |
|
+1 |
|
That looks awesome! Creating the client is quite some work... and Btw Lua also reuses the same client again and again. I can't see how this is going to create a problem, at max you may forget to reset some flag in the clients you are reusing, but this can be fixed anyway retaining the performance gain, if we find a problem. I'll review the patch just to see if I can find any potential state in the client we want to reset, and merge anyway (either I add the commit to reset such state or merge as it is). Thanks |
|
It's not related to this work, but at this point a good question is: is client creation so expensive because there is no way to optimize it further, or because we are doing too much work? For instance, does it make sense to initialize up front many data structures that most clients will never use, or should we do that work on demand? |
|
@antirez also, can clients maybe be pooled and reused with a global pool? But reading the code it looks like a lot of this stuff is not really needed, especially by a module client, and can either be lazily created, or made opt-in via some flag, if we know for certain the client will not need it. |
|
Yep, first step will be profiling the function to understand where all this time is burned. Certain things are obviously done too aggressively, for instance the list of Pub/Sub channels: the dictionary should be created on demand as soon as we need to add something to it, and so forth. Btw back on your patch, I'll review ASAP and merge so we get this benefit to start. At a first glance reusing the client forever may also be possible, since there should never be concurrent access, but I need to do a serious analysis. For instance when we use threaded contexts we create a fake client just for the thread scope, so there is no danger of accessing the global client everybody is using. Similarly, given that Lua scripting is already "bug fixed", we can check how the shared client handling is performed there and what state is reset across calls. After all there are many module commands doing just 1 or 2 RM_Call() calls, so having a single shared client may improve performance in a very significant way. Of course the fact that we have such poor testing of the modules API does not make changing things safely trivial :-( The good thing however is that testing whether serious modules like RediSearch still work gives us at least partial assurance. |
|
yeah, I suppose if valgrind doesn't find anything when running the entire search test suite, we are safe to assume that we're not leaking or anything. |
|
@dvirsky this is awesome! Re: your question earlier, my use case is basically the following:

```c
MyFancyCmd_RedisCommand() {
    ... // Some stuff, no calls.
    // Occurs once in this function.
    RedisModule_Call(ctx, "PUBLISH", channel, ...);
    ...
}
```

So there is only one `RM_Call` per invocation. @antirez, for this use case, any chance to expose a native, more efficient publish API? |
|
@concretevitamin I've replied to your private email with a workaround involving thread safe contexts, but it's a pretty complex one... |
|
FWIW I had a go at implementing a low level API for PUBLISH. https://github.com/antirez/redis/pull/4625 |
|
If there's any development on |
|
Hello @concretevitamin, when I review this PR (before the end of this week) I'll try to see if we can go with just a shared client for all the requests, which is optimal. However, while optimizing RM_Call() is a very worthwhile effort, in the long run we should end up with low level APIs for almost everything... like the RM_Publish() PRed by @dvirsky. More info ASAP. |
|
@antirez Thanks. I agree with most of your points, with one counter(?)example. Although we already have the low-level This is why I'd really like to see a shared client for all Call's, to benchmark against |
zuiderkwast left a comment:
This would be a great optimization for us. Much of the CPU is used for the allocations related to creating the fake client.
Can this be merged as-is?
|
i think merging it as is would be missing the opportunity to do something better.
maybe @filipecosta90 wanna look into the profiling part and / or @MeirShpilraien at the other part (considering you are a heavy user of these) |
|
@oranagra the problem I see with one single client for all rm_call (like Lua) is recursive rm_call (a module that calls another module that calls another module...). So I think we need a pool of clients if we want to do it correctly... (or stay with the current optimization suggested in this PR) Another question: I am not sure about your last point, why can we not use the same shared client / client pool from a background thread? When we do RM_Call we take the GIL, so I do not see the problem, what am I missing? |
|
@MeirShpilraien i think i would be ok with making an optimization just for the first level of nesting, and have the second level create a client each time, or create one on the first call and destroy it when exiting (i.e. when RM_Call returns it destroys the second-level client if there is one, possibly with a stack of clients). regarding the background thread, i guess you're right. i thought about concurrent use of a client, but that can't happen since obviously the GIL will only be released by the main thread when the client isn't used. |
|
@oranagra I agree, we just need to make sure we cover this recursive RM_Call case... |
|
handled by #8516 |
This PR makes the client used by the module to perform RM_Call part of the module context. If repeated calls to RM_Call are made, we reuse the same client, only destroying it along with the module.
This achieves a huge performance gain in repeated RM_Call's. For comparison, here is an example of calling PING a million times with and without this patch.
Without:
After the patch is applied:
Notice that this only optimizes repeated calls within the same context, which is a rather common case but not always the case. If we just used the same client for ALL RM_Call's, we would optimize even the case of a module performing a single call. I wasn't sure how safe that is, so I went with the more conservative option. If you think that we can recycle all the way - let's go for it!