Moving client flags to a more cache friendly position within client struct#10697
Moving client flags to a more cache friendly position within client struct#10697oranagra merged 1 commit intoredis:unstablefrom
Conversation
oranagra
left a comment
There was a problem hiding this comment.
@filipecosta90 maybe we wanna use this opportunity to also check what happens when we move the reply buffers and query buffers to the top?
will check it. 👍 |
|
How about set the option |
good idea.
I think this way also can effectively reduce the call number of |
@uvletter if we get the follow counters while doing the benchmark we can observe that indeed the improvement comes from a reducal of stall cycles when compared to the total uops executed :
As you can check below, the stall cycles drop is where it matters the most ( stalls_mem_any ). Furthermore we now also execute more uops as expected ( given we're reducing the expensive waits on memory ). full details bellow: unstable 2bcd890 (merge-base ) this PR: |
|
@filipecosta90 are you gonna try some other combinations (like moving the reply/response buffer variables to the top too)? |
@oranagra I'll confirm with perf perf LOC that this is the best scenario. Give me until tomorrow EOD to confirm if this is already an ideal scenario |
|
i didn't mean to move the flags somewhere else, i mean to take querybuf, qb_pos, querybuf_peak, reply, buf, bufpos, buf_usable_size, etc all to the top |
|
@oranagra Following your suggestion re moving stuff around in the client struct I decided to run Cachegrind on the specific |
|
I used cachegrind with the specified benchmark above and saw the following:
I did some dirty modifications to the code both to reorder the fields in the client struct and also use stack based iterators and ran the benchmark locally. Then in cachegrind I couldn't find much more to improve. Running the benchmark again showed consistently better results after the optimization, although they are pretty mild: 3-4%. @filipecosta90 Maybe you'd like to try the patch below and run the benchmark on your setup? I fear maintaining the client struct cache friendly will be a pain, whatever optimization we do now will just break when we change the struct. I'm not sure @oranagra WDYT? Regarding using stack based iterators I think we should transtion this in all the code, we can use what's in the patch below as a guidline. There are many places internally in quicklist.c that use dynamic allocations for the iterators that need to be changed. |
|
i think we have no choice but periodically re-optimize the cache efficiency of the client struct. |
|
i'm merging this as a step forward, keeping #10460 open for now and posting a reference to the above comment by yoav. |
…truct (redis#10697) Move the client flags to a more cache friendly position within the client struct we regain the lost 2% of CPU cycles since v6.2 ( from 630532.57 to 647449.80 ops/sec ). These are due to higher rate of calls to getClientType due to changes in redis#9166 and redis#10020
Fixes #10648 .
Following @oranagra 's recommendation to move the client flags to a more cache friendly position within the client struct we indeed regain the lost 2% of CPU cycles since v6.2 ( from 630532.57 to 647449.80 ops/sec ).
to reproduce:
unstable 2bcd890 (merge-base )
this PR