Use the extra bits in the flags for len/alloc.#7904
Use the extra bits in the flags for len/alloc.#7904hbina wants to merge 4 commits intoredis:unstablefrom
Conversation
f66e1eb to
e7812a2
Compare
Implementation notes: 1. For sds(len/setlen/etc) methods, we technically do not need to mask the for the extra bits whenever we we want to shift the given value because the caller of all of these functions should have guaranteed that the given `size_t` does not exceed what the current SDS_TYPE_* can handle. 2. I think this is all that needs to be done. There might be edge cases where code is accessing the values directly? Signed-off-by: Hanif Bin Ariffin <[email protected]>
Signed-off-by: Hanif Bin Ariffin <[email protected]>
Signed-off-by: Hanif Bin Ariffin <[email protected]>
95b35e6 to
cdcbbd6
Compare
|
@hbina i know certain specific string sizes which once had to use 16 bit header sds, will now fit into 8 bit header sds. but i'm not sure the extra complexity in the code justifies that improvement. let's consider the memory saving in this case (most significant one): for the bigger sizes it's even less rewarding (saving 4 bytes out of 65k string) |
|
@oranagra thanks for looking at this! I am new here >.<
I think (from reading above) you are misunderstanding what my change will do? Notice that I shift the additional bits in the flags to make it a power of (9, 10), (17, 18), and (33, 34) approximately.
I think the implementation can be made simpler by simply rethinking what bits are used for len/alloc. to As long its contained within this file and all access be done through helper functions...it should be fine. |
|
@hbina i may be missing your intent indeed (didn't carefully read the changes). this PR is about memory saving, right? if i'm still missing your point please correct me. |
|
@oranagra Oh okay, yes, I understand now. What you are claiming is that my changes will have diminishing returns in memory saving as the string gets bigger. Yes, I agree with that. But there's also other benefits from increasing the range of SDS because of this Lines 249 to 263 in aaacb8c While are are at it, and I don't want to open an issue just for this, what do you think about Line 979 in aaacb8c eagerly creating the strings instead of lazily doing so? I think it could be improved to do basically 0 wasted allocations. |
|
@hbina regarding this memory optimization, it's not only that it's diminishing as strings get bigger, even on the smallest string that it affects, the benefit is not that big (considering the size of the string). this is what we avoided going that way in the past. regarding size class conversion in sdsMakeRoomFor and also in sdsRemoveFreeSpace, i think this is an edge case that we don't want to consider. note that the above waste of bytes in order to reduce the need for size class conversion goes against the main topic of this PR (to reduce space). if we are worried about that conversion, the most important thing to do is completely drop the use of sds16 and sds32, and always use sds64, since a conversion of a big string is a lot more painful, and the waste of 16 bytes is not is not that bad. |
|
Sounds reasonable. Thanks for looking into this regardless :) |
**Bug Fixes:** * [redis#7219](RediSearch/RediSearch#7219) Fix a concurrency issue on Reducer in `FT.AGGREGATE` * [redis#7255](RediSearch/RediSearch#7255) Fix `BM25STD` underflow wraparound * [redis#7275](RediSearch/RediSearch#7275) Report used memory as `unsigned long long` to avoid underflows * [redis#7264](RediSearch/RediSearch#7264) Fix `totalDocsLen` updates * [redis#6995](RediSearch/RediSearch#6995) Do not fanout `FT.INFO` to replicas * [redis#7350](RediSearch/RediSearch#7350) Fix `FT.CREATE` failure with LeanVec parameters on non-Intel architectures * [redis#7384](RediSearch/RediSearch#7384) Fix index load from RDB temporary memory overhead * [redis#7459](RediSearch/RediSearch#7459) Fix Fork GC potential double-free on error path * [redis#7458](RediSearch/RediSearch#7458) Fix a GC performence regression * [redis#7470](RediSearch/RediSearch#7470) Avoid draining worker thread pool from FLUSH callback to avoid deadlocks * [redis#7554](RediSearch/RediSearch#7554) Handle the case where `SCORE` is sent alone without extra fields (coordinator) * [redis#7685](RediSearch/RediSearch#7685) Fix cursor logical leak * [redis#7794](RediSearch/RediSearch#7794) Fix `cmp_strings()` to correctly handle binary data with embedded NULLs in TOLIST reducer in FT.AGGREGATE * [redis#7873](RediSearch/RediSearch#7873) Handle warnings in empty `FT.AGGREGATE` replies (cluster) * [redis#7886](RediSearch/RediSearch#7886) Remove non-TEXT fields from the spec keys dictionary * [redis#7904](RediSearch/RediSearch#7904) Refactor keys dictionary handling * [redis#7901](RediSearch/RediSearch#7901) Support multiple warnings in reply * [redis#8083](RediSearch/RediSearch#8083) Fix incorrect FULLTEXT field metric counts * [redis#8153](RediSearch/RediSearch#8153) Fix configuration registration issues **Improvements:** * [redis#7154](RediSearch/RediSearch#7154) `FT.AGGREGATE` can return Background Indexing OOM warnings * [redis#7083](RediSearch/RediSearch#7083) Add the default text scorer as a configuration option * [redis#7341](RediSearch/RediSearch#7341) Rename `FT.PROFILE` counter fields * [redis#7436](RediSearch/RediSearch#7436) Enhance `FT.PROFILE` with vector search execution details * [redis#7435](RediSearch/RediSearch#7435) Ensure full `FT.PROFILE` output on timeout with RETURN policy * [redis#7534](RediSearch/RediSearch#7534) Reduce the number of worker threads asynchronously to avoid deadlocks during queries * [redis#7614](RediSearch/RediSearch#7614) Track timeout warnings and errors in INFO * [redis#7646](RediSearch/RediSearch#7646) Track `maxprefixexpansions` warnings and errors in INFO * [redis#7577](RediSearch/RediSearch#7577) Track query syntax/argument errors (basis for query error tracking) * [redis#7737](RediSearch/RediSearch#7737) Add `Internal cursor reads` metric to cluster `FT.PROFILE` output * [redis#7759](RediSearch/RediSearch#7759) Extend indexing metrics * [redis#7710](RediSearch/RediSearch#7710) Support `WITHCOUNT` keyword in `FT.AGGREGATE` * [redis#7957](RediSearch/RediSearch#7957) Persist query warnings across cursor reads * [redis#8054](RediSearch/RediSearch#8054) Add logging for index-related commands * [redis#8151](RediSearch/RediSearch#8151) Fix shard total profile time reporting in `FT.PROFILE` * [redis#8103](RediSearch/RediSearch#8103) Output current thread IndexSpec information on crash
**Bug Fixes:** * [#7219](RediSearch/RediSearch#7219) Fix a concurrency issue on Reducer in `FT.AGGREGATE` * [#7255](RediSearch/RediSearch#7255) Fix `BM25STD` underflow wraparound * [#7275](RediSearch/RediSearch#7275) Report used memory as `unsigned long long` to avoid underflows * [#7264](RediSearch/RediSearch#7264) Fix `totalDocsLen` updates * [#6995](RediSearch/RediSearch#6995) Do not fanout `FT.INFO` to replicas * [#7350](RediSearch/RediSearch#7350) Fix `FT.CREATE` failure with LeanVec parameters on non-Intel architectures * [#7694](RediSearch/RediSearch#7694) Use asynchronous jobs for GC in SVS to accelerate execution * [#7384](RediSearch/RediSearch#7384) Fix index load from RDB temporary memory overhead * [#7459](RediSearch/RediSearch#7459) Fix Fork GC potential double-free on error path * [#7458](RediSearch/RediSearch#7458) Fix a GC performence regression * [#7470](RediSearch/RediSearch#7470) Avoid draining worker thread pool from FLUSH callback to avoid deadlocks * [#7554](RediSearch/RediSearch#7554) Handle the case where `SCORE` is sent alone without extra fields (coordinator) * [#7685](RediSearch/RediSearch#7685) Fix cursor logical leak * [#7794](RediSearch/RediSearch#7794) Fix `cmp_strings()` to correctly handle binary data with embedded NULLs in TOLIST reducer in FT.AGGREGATE * [#7873](RediSearch/RediSearch#7873) Handle warnings in empty `FT.AGGREGATE` replies (cluster) * [#7886](RediSearch/RediSearch#7886) Remove non-TEXT fields from the spec keys dictionary * [#7904](RediSearch/RediSearch#7904) Refactor keys dictionary handling * [#7901](RediSearch/RediSearch#7901) Support multiple warnings in reply * [#8083](RediSearch/RediSearch#8083) Fix incorrect FULLTEXT field metric counts * [#8153](RediSearch/RediSearch#8153) Fix configuration registration issues **Improvements:** * [#7154](RediSearch/RediSearch#7154) `FT.AGGREGATE` can return Background Indexing OOM warnings * [#7083](RediSearch/RediSearch#7083) Add the default text scorer as a configuration option * [#7341](RediSearch/RediSearch#7341) Rename `FT.PROFILE` counter fields * [#7436](RediSearch/RediSearch#7436) Enhance `FT.PROFILE` with vector search execution details * [#7435](RediSearch/RediSearch#7435) Ensure full `FT.PROFILE` output on timeout with RETURN policy * [#7534](RediSearch/RediSearch#7534) Reduce the number of worker threads asynchronously to avoid deadlocks during queries * [#7614](RediSearch/RediSearch#7614) Track timeout warnings and errors in INFO * [#7646](RediSearch/RediSearch#7646) Track `maxprefixexpansions` warnings and errors in INFO * [#7577](RediSearch/RediSearch#7577) Track query syntax/argument errors (basis for query error tracking) * [#7737](RediSearch/RediSearch#7737) Add `Internal cursor reads` metric to cluster `FT.PROFILE` output * [#7759](RediSearch/RediSearch#7759) Extend indexing metrics * [#7710](RediSearch/RediSearch#7710) Support `WITHCOUNT` keyword in `FT.AGGREGATE` * [#7957](RediSearch/RediSearch#7957) Persist query warnings across cursor reads * [#8054](RediSearch/RediSearch#8054) Add logging for index-related commands * [#8151](RediSearch/RediSearch#8151) Fix shard total profile time reporting in `FT.PROFILE` * [#8103](RediSearch/RediSearch#8103) Output current thread IndexSpec information on crash
Implementation notes:
For sds(len/setlen/etc) methods, we technically do not need to mask the for the extra bits whenever we
we want to shift the given value because the caller of all of these functions should have guaranteed that
the given
size_tdoes not exceed what the current SDS_TYPE_* can handle.I think this is all that needs to be done. There might be edge cases where code is accessing the values directly?
Edit: Getting connection refused error for tests. Is that me? :V
Signed-off-by: Hanif Bin Ariffin [email protected]