Call trimStringObjectIfNeeded only when applicable, don't call realloc on noop.#11508
Call trimStringObjectIfNeeded only when applicable, don't call realloc on noop.#11508uriyage wants to merge 12 commits intoredis:unstablefrom
Conversation
|
IIRC there was a reason why i didn't modify this function. i'm actually surprised i can't find any place where i documented that reason, so maybe i'm wrong... I think the reason is that in some cases we call sdsRemoveFreeSpace when we see excessive free space and want to trim it. @uriyage WDYT? |
@oranagra I didn't find any place where we call I observed the issue as I noticed using perf that we call redundant Jemalloc realloc for every SET command from I fixed it with the following 2 steps:
I intended to open a PR for # 2 , but now I think we can do it in this PR, WDYT? |
|
i don't think i like the idea of #11509, i'll comment there as to why. in any case, i'd rather discuss all of these together in one PR... regarding
regarding 1, theoretically the recalloc call should be a NOP, if it doesn't change a bin size, it won't do any allocation, won't do any memory copying, and shouldn't even lock the arena mutex. regarding 2, one place where we can call sdsRemoveFreeSpace again and again is clientsCronResizeQueryBuffer. i.e. in large bins, we can have over 4kb of internal fragmentation. |
|
Jemalloc's realloc call is not a NOP as it should be theoretically. I profiled it with SET, 512 bytes value, pipeline mode, and I didn't notice the Per your comment regarding #11509, I agree this is less of an issue when using |
|
forgive me for being a bit out of focus (juggling between tasks). i don't like using je_nallocx, being non-standard (deep inside sds.c), but since you pointed out you experienced this problem only with jemalloc and not with libc, maybe it would be okay. |
|
@oranagra If we are to use |
|
@uriyage how about using je_nallocx just to avoid the realloc call, but keep other portions of sdsRemoveFreeSpace intact (i.e. set sds->alloc to the used size and not the allocated)? is that what you meat in your last message? |
|
@oranagra Since we agree that |
|
i don't like the use of the non-standard API, but considering the alternatives, i think that's what i'd go with. thanks! |
|
I will open a PR for it but I think we should reconsider the It seems logical to ask the opposite question: if we set sds->alloc in all sds functions to the allocation size, why should we deviate in There is only one argument for this deviation: it prevents possible repetitive calls from The counter arguments are: |
|
@uriyage I agree that having a unified logic which keep the sds->alloc to the usable size is correct, but as was indicated this will break some other optimizations. Same goes for sdsResize. In case of EmbeddedStringObject for example this is by design as it is a static string. |
|
I don't think it's a consistency issue. we can document it and as long as it doesn't cause bug, it's an okay design concept. when i did that (and forgot to document the reason), i don't think i was specifically thinking of clientsCronResizeQueryBuffer, but rather that i was afraid of the generic approach of someone trying to shrink a string if it has excess space, and wanted to avoid that, and wanted to avoid that conceptual problem and have any regression as a result of my change. what i didn't take into consideration back then, is places that shrink a string that was previously grown (i.e. i wanted to avoid repeated shrinking, but not non-repeated ones), and the fact that a single NOP realloc can be a considerable overhead.
why is that a memory efficiency issue? |
|
After running some benchmarks (see below), I can recall why I made the #11509 change. Regarding the On a related but different issue: In order to avoid code duplication I was wondering if we can refactor WDYT? Benchmarking: Benchmark Settings.
I tested both with libc and jemalloc. For Jemalloc I tested 3 scenarios:
For libc only 2 scenarios were relevant:
As can be seen with libc the call to trim makes little difference of 0.5% with 256 bytes (this may be explain with the memory access on sdslen call). Results:
256 bytes:
|
IIUC, the optimization here is not because we do it in processMultibulkBuffer instead of tryObjectEncoding, but because we don't do it on all strings (just on large ones), but on the other hand, we do it on all large strings, even ones not eventually retained in the db. the other alternative, is to store some flag indicating the origin of the string in the robj, or an indication of it's excess space in the sds (we have some spare bits). i'm not sure i'm happy with either of these options.
they're not exactly the same, i see sdsResize avoids downsizing the sds type, and sdsRemoveFreeSpace doesn't. |
…ly when applicable, dont call realloc on noop.
…Needed only when applicable, Dont call realloc on noop
…Needed only when applicable, Dont call realloc on noop
|
@oranagra following our conversation, I made the following changes:
|
…ved > PROTO_MBULK_BIG_ARG logic to trimStringObjectIfNeeded
oranagra
left a comment
There was a problem hiding this comment.
LGTM.
The CI failure is because the CU runs on the merge result with the target branch, so although there are no conflicts, on unstable we have an additional use of sdsResize, and that fails to build.
you'll need to merge unstable into your branch and add the missing argument..
| * To avoid repeated calls, the sds allocation size will be set to the requested size | ||
| * regardless of the actual allocation size. */ | ||
| sds sdsResize(sds s, size_t size) { | ||
| sds sdsResize(sds s, size_t size, int is_change_likely) { |
There was a problem hiding this comment.
not sure i like the name, how about immutable_string or would_regrow
it's true that passing it doesn't make it immutable, but i think the caller should pass 1 on strings that are not gonna change anymore (not that are unlikely to grow).
There was a problem hiding this comment.
immutable_string is a bit misleading since DB strings can still change (on APPEND, for example), so the only difference is the change likelihood between SET's strings and querybuf's string.
There was a problem hiding this comment.
@oranagra please LMK how to proceed, regarding the var name, and the module's RM_StringTruncate issue.
Thanks.
There was a problem hiding this comment.
bahh, let's keep it.. it's just an argument name for two small functions.
the worse that could happen if we find a better name, is that it'll be negated, and we'll need to fix all callers.
| /* If the string is too wasteful, reallocate it. */ | ||
| if (sdslen(key->value->ptr) < sdsavail(key->value->ptr)) | ||
| key->value->ptr = sdsRemoveFreeSpace(key->value->ptr); | ||
| key->value->ptr = sdsRemoveFreeSpace(key->value->ptr, 0); |
There was a problem hiding this comment.
maybe in this case we rather pass 1?
in the past calling sdsRemoveFreeSpace would have a chance to use TYPE_5, so passing 0 would keep the old behavior. but i'm not sure we know how modules use it, maybe they do intend to re-grow later?
@MeirShpilraien do you have any feedback on how you think modules use either RM_TrimStringAllocation or RM_StringTruncate?
There was a problem hiding this comment.
another related thought, when we introduced RM_TrimStringAllocation (calls trimStringObjectIfNeeded), we thought of trimming argv strings (as an alternative to the automatic trimming done in RedisModuleCommandDispatcher, we even documented it.
but maybe modules also call it on non-argv strings? in which case the new condition on BIG_ARG would cause damage?
There was a problem hiding this comment.
TestStringAppendAM() test is to use RM_TrimStringAllocation() for non-argv strings, maybe we could add a minimum reserved parameter to trimStringObjectIfNeeded(), if it's 0 then there is no limit.
There was a problem hiding this comment.
@MeirShpilraien do you have any feedback on how you think modules use either RM_TrimStringAllocation or RM_StringTruncate?
@oranagra I do not think any of our modules use it.
There was a problem hiding this comment.
it's hard to make a decision here, but i have a feeling the right thing is to keep the code as is.
i.e. assume that RM_TrimStringAllocation is used on argv strings, when deciding to store them in the db.
keep it's capability to use SDS_TYPE_5 since it's likely to be used on strings that are not gonna mutate (same goes for RM_StringTruncate, let's let it keep it's possibility for using TYPE_5 for now),
Also let's keep the condition that avoid extra work on small strings for RM_TrimStringAllocation, since if they are coming from argv, small ones will have no excess.
it's already documented in the module api what is it's purpose, so the only thing left is to document in the top comment of the PR the change it causes this API (not trimming small strings, in case they're not coming from argv)
|
@oranagra, |
|
you mean check that I don't think moving the check up to i have a feeling this will be easier if we start by implementing tests that reproduce these problems, before solving them. |
|
@oranagra, As you suggested I tried to add tests for all the use cases.
The later was a bit tricky as we trim the string only if it is greater than 44 ( I couldn't find a way to test the case of module-client where we have sds with free space which doesn't originated from the network since we don't have a module api call to setsdslen. |
|
isn't testing modules is as simple as a module creating a string an modifying it with RM_StringAppendBuffer (uses the greedy sdsMakeRoomFor)? p.s. i see the test filed on external mode, which re-uses an existing server and thus we have less control over what's in the lua cache. please look into it and see if we can overcome it, if not, maybe just skip that test in external mode. i started a full CI cycle to see how these tests behave on other malloc implementations https://github.com/redis/redis/actions/runs/3994464980 |
|
Thanks, I didn't notice it uses greedy - I wrote a test for the module case and as expected it failed as the check for We can either add a check for The external mode fails as we run it in cluster mode where we have few more bytes in the key memory usage I will update the test to account for this. |
|
i think i'm uncomfortable with just saying that modules need to take care of it themselves. I lean towards your suggestion of adding also, turns out i didn't trigger the CI correctly, here's another attempt: https://github.com/redis/redis/actions/runs/3996301870 |
|
i've discussed this with @guybe7 and he suggested that maybe when we introduce another (ugly) global (i.e. thinking about it more, i realize that it won't be the same, when a module uses RM_Call to execute EVAL. |
Thanks, Looks like when using realloc to trim memory with libc, the |
i don't remember where (if in this PR or another one), we recently had a discussion where it was suggested that sdsRemoveFreeSpace should set the real allocation size (use malloc_size, similarly to other sds function) in sdssetalloc rather than the requested size size. |
|
I looked for all the places where we access the
Looks like this logic has no effect because we delete the flags at
Here, in case the script_caller is a module client. I think we should change here the
Here we use the Given the above if we can always use the current_client, wdyt about removing the script_caller and instead of adding |
i think executing_client will give us more flexibility to do other things not just know the current mode, but i think some of the above mentioned issues cause complications removing script_caller and maybe should be dealt with an a separate PR. |
We discussed that in this issue #10728 (comment) some time ago, and we discussed it was likely not the right design but didn't fix it. I think I might have been concerned it has been around so long that it's basically a breaking change at this point. |
|
back then it wasn't in our direct path and we decide to leave it. but i think now (with executing_client) it is (in our direct path, so let's try to solve it). on the other hand, this change discussed in the recent comments is quite big and not directly related to the issue in this PR, so maybe we should split that to another cleanup PR. problem is that i'm not sure we can move on with this one without first solving the other, and the other seems like it can get complicated too. |
|
p.s. considering all these complications, and the fact the problem it fixes is just performance and not a buggy behavior, i don't think we can backport this one to 6.2 etc. |
|
i'll try to handle the executing_client thing and make a PR for that, let's keep this PR open until we see where it's going and if it can use that mechanism. meanwhile, since we realize that we'll want to backport parts of this improvement to 6.2 and 7.0 and this became too complicated, please issue another PR with the minimal and safe portions of this one (i.e. the je_nallocx) |
I think we need to get better at creating separate issues for these types of items. |
|
|
closing this one, replaced by #11817 |
As `sdsRemoveFreeSpace` have an impact on performance even if it is a no-op (see details at #11508). Only call the function when there is a possibility that the string contains free space. * For strings coming from the network, it's only if they're bigger than PROTO_MBULK_BIG_ARG * For strings coming from scripts, it's only if they're smaller than LUA_CMD_OBJCACHE_MAX_LEN * For strings coming from modules, it could be anything. Co-authored-by: Oran Agra <[email protected]> Co-authored-by: sundb <[email protected]>
As `sdsRemoveFreeSpace` have an impact on performance even if it is a no-op (see details at redis#11508). Only call the function when there is a possibility that the string contains free space. * For strings coming from the network, it's only if they're bigger than PROTO_MBULK_BIG_ARG * For strings coming from scripts, it's only if they're smaller than LUA_CMD_OBJCACHE_MAX_LEN * For strings coming from modules, it could be anything. Co-authored-by: Oran Agra <[email protected]> Co-authored-by: sundb <[email protected]>
In #7875, we change the sds alloc to be the usable allocation size in order to:
This change was not done in
sdsRemoveFreeSpace, resulting in inconsistency and unnecessary reallocations.