Skip to content

performance and memory reporting improvement - sds take control of it's internal frag#7875

Merged
oranagra merged 1 commit intoredis:unstablefrom
oranagra:sds_internal_frag
Oct 2, 2020
Merged

performance and memory reporting improvement - sds take control of it's internal frag#7875
oranagra merged 1 commit intoredis:unstablefrom
oranagra:sds_internal_frag

Conversation

@oranagra
Copy link
Member

@oranagra oranagra commented Oct 1, 2020

This commit has two aspects:

  1. improve memory reporting for all the places that use sdsAllocSize to compute
    memory used by a string, in this case it'll include the internal fragmentation.
  2. reduce the need for realloc calls by making the sds implicitly take over
    the internal fragmentation of the block it allocated.

…s internal frag

This commit has two aspects:
1) improve memory reporting for all the places that use sdsAllocSize to compute
   memory used by a string, in this case it'll include the internal fragmentation.
2) reduce the need for realloc calls by making the sds implicitly take over
   the internal fragmentation of the block it allocated.
@oranagra oranagra requested review from soloestoy and yossigo October 1, 2020 08:35
@oranagra
Copy link
Member Author

oranagra commented Oct 1, 2020

This is another portion of #5159
Note that Salvatore indicated that he was gonna merge it.

@oranagra oranagra merged commit 3945a32 into redis:unstable Oct 2, 2020
@oranagra oranagra deleted the sds_internal_frag branch October 2, 2020 05:19
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Nov 4, 2020
…'s internal frag (redis#7875)

This commit has two aspects:
1) improve memory reporting for all the places that use sdsAllocSize to compute
   memory used by a string, in this case it'll include the internal fragmentation.
2) reduce the need for realloc calls by making the sds implicitly take over
   the internal fragmentation of the block it allocated.
@ShooterIT
Copy link
Member

I'm not sure, just a doubt, should we use the extra memory of malloc in glibc
https://man7.org/linux/man-pages/man3/malloc_usable_size.3.html

   The value returned by malloc_usable_size() may be greater than the
   requested size of the allocation because of alignment and minimum
   size constraints.  Although the excess bytes can be overwritten by
   the application without ill effects, this is not good programming
   practice: the number of excess bytes in an allocation depends on the
   underlying implementation.

   The main use of this function is for debugging and introspection.

@oranagra
Copy link
Member Author

@ShooterIT I think we can use these bytes. the standard says that it's implementation dependent but we didn't make any assumption that an allocation of 30 bytes will always give us 32, we ask the allocator and use just what it reports is safe to use.

I think we want to aim to create an efficient software, and if using these bytes yields a better / more efficient software (even if considered "not good programming practice") we should still do it. if on the other hand we find out that some allocators have implementation that causes this to be buggy, we should stop using it (at least on these specific allocators).

@oranagra oranagra mentioned this pull request Jan 13, 2021
oranagra added a commit that referenced this pull request May 20, 2021
In theory we should have used zmalloc_usable_size, but it may be slower,
and now after #7875, there's no actual difference.
So this change isn't making a dramatic change.

Co-authored-by: Oran Agra <[email protected]>
oranagra pushed a commit that referenced this pull request Jun 15, 2021
The initialize memory of `querybuf` is `PROTO_IOBUF_LEN(1024*16) * 2` (due to sdsMakeRoomFor being greedy), under `jemalloc`, the allocated memory will be 40k.
This will most likely result in the `querybuf` being resized when call `clientsCronResizeQueryBuffer` unless the client requests it fast enough.

Note that this bug existed even before #7875, since the condition for resizing includes the sds headers (32k+6).

## Changes
1. Use non-greedy sdsMakeRoomFor when allocating the initial query buffer (of 16k).
1. Also use non-greedy allocation when working with BIG_ARG (we won't use that extra space anyway)
2. in case we did use a greedy allocation, read as much as we can into the buffer we got (including internal frag), to reduce system calls.
3. introduce a dedicated constant for the shrinking (same value as before)
3. Add test for querybuf.
4. improve a maxmemory test by ignoring the effect of replica query buffers (can accumulate many ACKs on slow env)
5. improve a maxmemory by disabling slowlog (it will cause slight memory growth on slow env).
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 8, 2021
In theory we should have used zmalloc_usable_size, but it may be slower,
and now after redis#7875, there's no actual difference.
So this change isn't making a dramatic change.

Co-authored-by: Oran Agra <[email protected]>
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 8, 2021
The initialize memory of `querybuf` is `PROTO_IOBUF_LEN(1024*16) * 2` (due to sdsMakeRoomFor being greedy), under `jemalloc`, the allocated memory will be 40k.
This will most likely result in the `querybuf` being resized when call `clientsCronResizeQueryBuffer` unless the client requests it fast enough.

Note that this bug existed even before redis#7875, since the condition for resizing includes the sds headers (32k+6).

## Changes
1. Use non-greedy sdsMakeRoomFor when allocating the initial query buffer (of 16k).
1. Also use non-greedy allocation when working with BIG_ARG (we won't use that extra space anyway)
2. in case we did use a greedy allocation, read as much as we can into the buffer we got (including internal frag), to reduce system calls.
3. introduce a dedicated constant for the shrinking (same value as before)
3. Add test for querybuf.
4. improve a maxmemory test by ignoring the effect of replica query buffers (can accumulate many ACKs on slow env)
5. improve a maxmemory by disabling slowlog (it will cause slight memory growth on slow env).
oranagra added a commit that referenced this pull request Jan 31, 2023
In #7875 (Redis 6.2), we changed the sds alloc to be the usable allocation
size in order to:

> reduce the need for realloc calls by making the sds implicitly take over
the internal fragmentation

This change was done most sds functions, excluding `sdsRemoveFreeSpace` and
`sdsResize`, the reason is that in some places (e.g. clientsCronResizeQueryBuffer)
we call sdsRemoveFreeSpace when we see excessive free space and want to trim it.
so if we don't trim it exactly to size, the caller may still see excessive free space and
call it again and again.

However, this resulted in some excessive calls to realloc, even when there's no need
and it's gonna be a no-op (e.g. when reducing 15 bytes allocation to 13).

It turns out that a call for realloc with jemalloc can be expensive even if it ends up
doing nothing, so this PR adds a check using `je_nallocx`, which is cheap to avoid
the call for realloc.

in addition to that this PR unifies sdsResize and sdsRemoveFreeSpace into common
code. the difference between them was that sdsResize would avoid using SDS_TYPE_5,
since it want to keep the string ready to be resized again, while sdsRemoveFreeSpace
would permit using SDS_TYPE_5 and get an optimal memory consumption.
now both methods take a `would_regrow` argument that makes it more explicit.

the only actual impact of that is that in clientsCronResizeQueryBuffer we call both sdsResize
and sdsRemoveFreeSpace for in different cases, and we now prevent the use of SDS_TYPE_5 in both.

The new test that was added to cover this concern used to pass before this PR as well,
this PR is just a performance optimization and cleanup.

Benchmark:
`redis-benchmark -c 100 -t set  -d 512 -P 10  -n  100000000`
on i7-9850H with jemalloc, shows improvement from 1021k ops/sec to 1067k (average of 3 runs).
some 4.5% improvement.

Co-authored-by: Oran Agra <[email protected]>
oranagra added a commit that referenced this pull request Feb 28, 2023
In #7875 (Redis 6.2), we changed the sds alloc to be the usable allocation
size in order to:

> reduce the need for realloc calls by making the sds implicitly take over
the internal fragmentation

This change was done most sds functions, excluding `sdsRemoveFreeSpace` and
`sdsResize`, the reason is that in some places (e.g. clientsCronResizeQueryBuffer)
we call sdsRemoveFreeSpace when we see excessive free space and want to trim it.
so if we don't trim it exactly to size, the caller may still see excessive free space and
call it again and again.

However, this resulted in some excessive calls to realloc, even when there's no need
and it's gonna be a no-op (e.g. when reducing 15 bytes allocation to 13).

It turns out that a call for realloc with jemalloc can be expensive even if it ends up
doing nothing, so this PR adds a check using `je_nallocx`, which is cheap to avoid
the call for realloc.

in addition to that this PR unifies sdsResize and sdsRemoveFreeSpace into common
code. the difference between them was that sdsResize would avoid using SDS_TYPE_5,
since it want to keep the string ready to be resized again, while sdsRemoveFreeSpace
would permit using SDS_TYPE_5 and get an optimal memory consumption.
now both methods take a `would_regrow` argument that makes it more explicit.

the only actual impact of that is that in clientsCronResizeQueryBuffer we call both sdsResize
and sdsRemoveFreeSpace for in different cases, and we now prevent the use of SDS_TYPE_5 in both.

The new test that was added to cover this concern used to pass before this PR as well,
this PR is just a performance optimization and cleanup.

Benchmark:
`redis-benchmark -c 100 -t set  -d 512 -P 10  -n  100000000`
on i7-9850H with jemalloc, shows improvement from 1021k ops/sec to 1067k (average of 3 runs).
some 4.5% improvement.

Co-authored-by: Oran Agra <[email protected]>
(cherry picked from commit 46393f9)
(cherry picked from commit b12eeccddd9318a5d97a5aee2dad88999dfad53f)
oranagra added a commit that referenced this pull request Feb 28, 2023
In #7875 (Redis 6.2), we changed the sds alloc to be the usable allocation
size in order to:

> reduce the need for realloc calls by making the sds implicitly take over
the internal fragmentation

This change was done most sds functions, excluding `sdsRemoveFreeSpace` and
`sdsResize`, the reason is that in some places (e.g. clientsCronResizeQueryBuffer)
we call sdsRemoveFreeSpace when we see excessive free space and want to trim it.
so if we don't trim it exactly to size, the caller may still see excessive free space and
call it again and again.

However, this resulted in some excessive calls to realloc, even when there's no need
and it's gonna be a no-op (e.g. when reducing 15 bytes allocation to 13).

It turns out that a call for realloc with jemalloc can be expensive even if it ends up
doing nothing, so this PR adds a check using `je_nallocx`, which is cheap to avoid
the call for realloc.

in addition to that this PR unifies sdsResize and sdsRemoveFreeSpace into common
code. the difference between them was that sdsResize would avoid using SDS_TYPE_5,
since it want to keep the string ready to be resized again, while sdsRemoveFreeSpace
would permit using SDS_TYPE_5 and get an optimal memory consumption.
now both methods take a `would_regrow` argument that makes it more explicit.

the only actual impact of that is that in clientsCronResizeQueryBuffer we call both sdsResize
and sdsRemoveFreeSpace for in different cases, and we now prevent the use of SDS_TYPE_5 in both.

The new test that was added to cover this concern used to pass before this PR as well,
this PR is just a performance optimization and cleanup.

Benchmark:
`redis-benchmark -c 100 -t set  -d 512 -P 10  -n  100000000`
on i7-9850H with jemalloc, shows improvement from 1021k ops/sec to 1067k (average of 3 runs).
some 4.5% improvement.

Co-authored-by: Oran Agra <[email protected]>
(cherry picked from commit 46393f9)
enjoy-binbin pushed a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
In redis#7875 (Redis 6.2), we changed the sds alloc to be the usable allocation
size in order to:

> reduce the need for realloc calls by making the sds implicitly take over
the internal fragmentation

This change was done most sds functions, excluding `sdsRemoveFreeSpace` and
`sdsResize`, the reason is that in some places (e.g. clientsCronResizeQueryBuffer)
we call sdsRemoveFreeSpace when we see excessive free space and want to trim it.
so if we don't trim it exactly to size, the caller may still see excessive free space and
call it again and again.

However, this resulted in some excessive calls to realloc, even when there's no need
and it's gonna be a no-op (e.g. when reducing 15 bytes allocation to 13).

It turns out that a call for realloc with jemalloc can be expensive even if it ends up
doing nothing, so this PR adds a check using `je_nallocx`, which is cheap to avoid
the call for realloc.

in addition to that this PR unifies sdsResize and sdsRemoveFreeSpace into common
code. the difference between them was that sdsResize would avoid using SDS_TYPE_5,
since it want to keep the string ready to be resized again, while sdsRemoveFreeSpace
would permit using SDS_TYPE_5 and get an optimal memory consumption.
now both methods take a `would_regrow` argument that makes it more explicit.

the only actual impact of that is that in clientsCronResizeQueryBuffer we call both sdsResize
and sdsRemoveFreeSpace for in different cases, and we now prevent the use of SDS_TYPE_5 in both.

The new test that was added to cover this concern used to pass before this PR as well,
this PR is just a performance optimization and cleanup.

Benchmark:
`redis-benchmark -c 100 -t set  -d 512 -P 10  -n  100000000`
on i7-9850H with jemalloc, shows improvement from 1021k ops/sec to 1067k (average of 3 runs).
some 4.5% improvement.

Co-authored-by: Oran Agra <[email protected]>
warrick1016 pushed a commit to ctripcorp/Redis-On-Rocks that referenced this pull request Sep 2, 2025
The initialize memory of `querybuf` is `PROTO_IOBUF_LEN(1024*16) * 2` (due to sdsMakeRoomFor being greedy), under `jemalloc`, the allocated memory will be 40k.
This will most likely result in the `querybuf` being resized when call `clientsCronResizeQueryBuffer` unless the client requests it fast enough.

Note that this bug existed even before redis#7875, since the condition for resizing includes the sds headers (32k+6).

1. Use non-greedy sdsMakeRoomFor when allocating the initial query buffer (of 16k).
1. Also use non-greedy allocation when working with BIG_ARG (we won't use that extra space anyway)
2. in case we did use a greedy allocation, read as much as we can into the buffer we got (including internal frag), to reduce system calls.
3. introduce a dedicated constant for the shrinking (same value as before)
3. Add test for querybuf.
4. improve a maxmemory test by ignoring the effect of replica query buffers (can accumulate many ACKs on slow env)
5. improve a maxmemory by disabling slowlog (it will cause slight memory growth on slow env).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants