Use CLZ in _dictNextExp to get the next power of two#12815
Merged
oranagra merged 1 commit intoredis:unstablefrom Nov 29, 2023
Merged
Use CLZ in _dictNextExp to get the next power of two#12815oranagra merged 1 commit intoredis:unstablefrom
oranagra merged 1 commit intoredis:unstablefrom
Conversation
In the past, we did not call _dictNextExp frequently. It was only called when the dictionary was expanded. Later, dictTypeExpandAllowed was introduced in redis#7954, which is 6.2. For the data dict and the expire dict, we can check maxmemory before actually expanding the dict. This is a good optimization to avoid maxmemory being exceeded due to the dict expansion. And in redis#11692, we moved the dictTypeExpandAllowed check before the threshold check, this caused a bit of performance degradation, every time a key is added to the dict, dictTypeExpandAllowed is called to check. The main reason for degradation is that in a large dict, we need to call _dictNextExp frequently, that is, every time we add a key, we need to call _dictNextExp once. Then the threshold is checked to see if the dict needs to be expanded. We can see that the order of checks here can be optimized. So we moved the dictTypeExpandAllowed check back to after the threshold check in redis#12789. In this way, before the dict is actually expanded (that is, before the threshold is reached), we will not do anything extra compared to before, that is, we will not call _dictNextExp frequently. But note we'll still hit the degradation when we over the thresholds. When the threshold is reached, because redis#7954, we may delay the dict expansion due to maxmemory limitations. In this case, we will call _dictNextExp every time we add a key during this period. This PR use CLZ in _dictNextExp to get the next power of two. CLZ (count leading zeros) can easily give you the next power of two. It should be noted that we have actually introduced the use of __builtin_clzl in redis#8687, which is 7.0. So i suppose all the platforms we use have it (even if the CPU doesn't have an instruction). We build 67108864 (2**26) keys through DEBUG POPULTE, which will use approximately 5.49G memory (used_memory:5898522936). If expansion is triggered, the additional hash table will consume approximately 1G memory (2 ** 27 * 8). So we set maxmemory to 6871947673 (that is, 6.4G), which will be less than 5.49G + 1G, so we will delay the dict rehash while addint the keys. After that, each time an element is added to the dict, an allow check will be performed, that is, we can frequently call _dictNextExp to test the comparison before and after the optimization. Using DEBUG HTSTATS 0 to check and make sure that our dict expansion is dealyed. Using `./src/redis-benchmark -P 100 -r 1000000000 -t set -n 5000000`, After ten rounds of testing: ``` unstable: this PR: 769585.94 816860.00 771724.00 818196.69 775674.81 822368.44 781983.12 822503.69 783576.25 828088.75 784190.75 828637.75 791389.69 829875.50 794659.94 835660.69 798212.00 830013.25 801153.62 833934.56 ``` We can see there is about 4-5% performance improvement in this case.
oranagra
approved these changes
Nov 29, 2023
Member
oranagra
left a comment
There was a problem hiding this comment.
thanks for the detailed commit comment.
can you please add a comment here about the status of the release branches, which one of them needs this or the previous PR, and how desperately they need it.
This was referenced Nov 29, 2023
Closed
Contributor
Author
|
6.0, no need to backport, because we did not introduce the allow check. 6.2 and 7.0 and 7.2, we can backport #12789, since it can improve set performance by 5% in big dict. (reduce the calls to allow check (reduce the calls to nextExp)) 6.2 and 7.0 and 7.2, we can backport this PR #12815, since we can avoid the degradation when we met the thresholds. |
oranagra
pushed a commit
that referenced
this pull request
Jan 9, 2024
In the past, we did not call _dictNextExp frequently. It was only called when the dictionary was expanded. Later, dictTypeExpandAllowed was introduced in #7954, which is 6.2. For the data dict and the expire dict, we can check maxmemory before actually expanding the dict. This is a good optimization to avoid maxmemory being exceeded due to the dict expansion. And in #11692, we moved the dictTypeExpandAllowed check before the threshold check, this caused a bit of performance degradation, every time a key is added to the dict, dictTypeExpandAllowed is called to check. The main reason for degradation is that in a large dict, we need to call _dictNextExp frequently, that is, every time we add a key, we need to call _dictNextExp once. Then the threshold is checked to see if the dict needs to be expanded. We can see that the order of checks here can be optimized. So we moved the dictTypeExpandAllowed check back to after the threshold check in #12789. In this way, before the dict is actually expanded (that is, before the threshold is reached), we will not do anything extra compared to before, that is, we will not call _dictNextExp frequently. But note we'll still hit the degradation when we over the thresholds. When the threshold is reached, because #7954, we may delay the dict expansion due to maxmemory limitations. In this case, we will call _dictNextExp every time we add a key during this period. This PR use CLZ in _dictNextExp to get the next power of two. CLZ (count leading zeros) can easily give you the next power of two. It should be noted that we have actually introduced the use of __builtin_clzl in #8687, which is 7.0. So i suppose all the platforms we use have it (even if the CPU doesn't have an instruction). We build 67108864 (2**26) keys through DEBUG POPULTE, which will use approximately 5.49G memory (used_memory:5898522936). If expansion is triggered, the additional hash table will consume approximately 1G memory (2 ** 27 * 8). So we set maxmemory to 6871947673 (that is, 6.4G), which will be less than 5.49G + 1G, so we will delay the dict rehash while addint the keys. After that, each time an element is added to the dict, an allow check will be performed, that is, we can frequently call _dictNextExp to test the comparison before and after the optimization. Using DEBUG HTSTATS 0 to check and make sure that our dict expansion is dealyed. Using `./src/redis-server redis.conf --save "" --maxmemory 6871947673`. Using `./src/redis-benchmark -P 100 -r 1000000000 -t set -n 5000000`. After ten rounds of testing: ``` unstable: this PR: 769585.94 816860.00 771724.00 818196.69 775674.81 822368.44 781983.12 822503.69 783576.25 828088.75 784190.75 828637.75 791389.69 829875.50 794659.94 835660.69 798212.00 830013.25 801153.62 833934.56 ``` We can see there is about 4-5% performance improvement in this case. (cherry picked from commit 22cc9b5)
oranagra
pushed a commit
that referenced
this pull request
Jan 9, 2024
In the past, we did not call _dictNextExp frequently. It was only called when the dictionary was expanded. Later, dictTypeExpandAllowed was introduced in #7954, which is 6.2. For the data dict and the expire dict, we can check maxmemory before actually expanding the dict. This is a good optimization to avoid maxmemory being exceeded due to the dict expansion. And in #11692, we moved the dictTypeExpandAllowed check before the threshold check, this caused a bit of performance degradation, every time a key is added to the dict, dictTypeExpandAllowed is called to check. The main reason for degradation is that in a large dict, we need to call _dictNextExp frequently, that is, every time we add a key, we need to call _dictNextExp once. Then the threshold is checked to see if the dict needs to be expanded. We can see that the order of checks here can be optimized. So we moved the dictTypeExpandAllowed check back to after the threshold check in #12789. In this way, before the dict is actually expanded (that is, before the threshold is reached), we will not do anything extra compared to before, that is, we will not call _dictNextExp frequently. But note we'll still hit the degradation when we over the thresholds. When the threshold is reached, because #7954, we may delay the dict expansion due to maxmemory limitations. In this case, we will call _dictNextExp every time we add a key during this period. This PR use CLZ in _dictNextExp to get the next power of two. CLZ (count leading zeros) can easily give you the next power of two. It should be noted that we have actually introduced the use of __builtin_clzl in #8687, which is 7.0. So i suppose all the platforms we use have it (even if the CPU doesn't have an instruction). We build 67108864 (2**26) keys through DEBUG POPULTE, which will use approximately 5.49G memory (used_memory:5898522936). If expansion is triggered, the additional hash table will consume approximately 1G memory (2 ** 27 * 8). So we set maxmemory to 6871947673 (that is, 6.4G), which will be less than 5.49G + 1G, so we will delay the dict rehash while addint the keys. After that, each time an element is added to the dict, an allow check will be performed, that is, we can frequently call _dictNextExp to test the comparison before and after the optimization. Using DEBUG HTSTATS 0 to check and make sure that our dict expansion is dealyed. Using `./src/redis-server redis.conf --save "" --maxmemory 6871947673`. Using `./src/redis-benchmark -P 100 -r 1000000000 -t set -n 5000000`. After ten rounds of testing: ``` unstable: this PR: 769585.94 816860.00 771724.00 818196.69 775674.81 822368.44 781983.12 822503.69 783576.25 828088.75 784190.75 828637.75 791389.69 829875.50 794659.94 835660.69 798212.00 830013.25 801153.62 833934.56 ``` We can see there is about 4-5% performance improvement in this case. (cherry picked from commit 22cc9b5)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In the past, we did not call _dictNextExp frequently. It was only
called when the dictionary was expanded.
Later, dictTypeExpandAllowed was introduced in #7954, which is 6.2.
For the data dict and the expire dict, we can check maxmemory before
actually expanding the dict. This is a good optimization to avoid
maxmemory being exceeded due to the dict expansion.
And in #11692, we moved the dictTypeExpandAllowed check before the
threshold check, this caused a bit of performance degradation, every
time a key is added to the dict, dictTypeExpandAllowed is called to check.
The main reason for degradation is that in a large dict, we need to
call _dictNextExp frequently, that is, every time we add a key, we
need to call _dictNextExp once. Then the threshold is checked to see
if the dict needs to be expanded. We can see that the order of checks
here can be optimized.
So we moved the dictTypeExpandAllowed check back to after the threshold
check in #12789. In this way, before the dict is actually expanded (that
is, before the threshold is reached), we will not do anything extra
compared to before, that is, we will not call _dictNextExp frequently.
But note we'll still hit the degradation when we over the thresholds.
When the threshold is reached, because #7954, we may delay the dict
expansion due to maxmemory limitations. In this case, we will call
_dictNextExp every time we add a key during this period.
This PR use CLZ in _dictNextExp to get the next power of two. CLZ (count
leading zeros) can easily give you the next power of two. It should be
noted that we have actually introduced the use of __builtin_clzl in #8687,
which is 7.0. So i suppose all the platforms we use have it (even if the
CPU doesn't have an instruction).
We build 67108864 (2**26) keys through DEBUG POPULTE, which will use
approximately 5.49G memory (used_memory:5898522936). If expansion is
triggered, the additional hash table will consume approximately 1G
memory (2 ** 27 * 8). So we set maxmemory to 6871947673 (that is, 6.4G),
which will be less than 5.49G + 1G, so we will delay the dict rehash
while adding the keys.
After that, each time an element is added to the dict, an allow check
will be performed, that is, we can frequently call _dictNextExp to test
the comparison before and after the optimization. Using DEBUG HTSTATS 0 to
check and make sure that our dict expansion is dealyed.
Using
./src/redis-server redis.conf --save "" --maxmemory 6871947673.Using
./src/redis-benchmark -P 100 -r 1000000000 -t set -n 5000000.After ten rounds of testing:
We can see there is about 4-5% performance improvement in this case.