Enhance dict to support arbitrary metadata carried in dictEntry#8210
Enhance dict to support arbitrary metadata carried in dictEntry#8210JimB123 wants to merge 6 commits intoredis:unstablefrom
Conversation
Rebase to latest OSS
rebase from redis
pull from main branch
|
@JimB123 thanks for submitting this PR. I wonder if this is a preparation to set the ground for a followup PR? i don't think we wanna merge this one on it's own (add capabilities that we're not using). P.S. I actually made a somewhat similar change to dict.c in the past, adding optional callbacks for allocation and release of the dictEntry. i did this for the purpose of keeping some allocation reuse pull for some case of rapid / repeated malloc and frees, but when i think of it such a change (the one i did) could serve your purpose too, and even allow the flexibility of allocating a different size of payload per entry rather than a constant tone. |
|
@oranagra There was two things we were considering open sourcing:
|
|
@madolson you mean when key names are > 16 bytes? anyway, how is that related to the current PR? |
|
@oranagra Yes I meant > 16 bytes. More precisely, a linked list is more efficient when the keys (stored on average within the radix tree) consume more than 16 bytes each, since a linked list can get away with 16 bytes (2 8 byte allocations for the previous and next entry). We've had several people who use GUIDs to store data, which are stored very inefficiently in the radix tree, complain about memory. The linked list pointers can be stored in the dict-entry metadata that is outlined here, which is how it relates. This wouldn't replace the RAX, since it will actually be more memory intensive for keys with lots of shared prefixes. It seems like a relatively small change that can optimize some workloads though. To your point, we can just leave this PR here until we have a real implementation. Also, the dict implementation is always less efficient than a linked list, since we need the key pointer, the entry pointer, the pointer to the entry, and the extra overhead from the buckets. |
|
@oranagra @madolson Memory-wise, a doubly linked list would always be more efficient than a RAX for this case. In most cases it would be significantly more efficient. The issue is that the RAX duplicates the key, which is already linked in the dictEntry. If there are commonalities in the keys, it will duplicate somewhat efficiently, but still, it is duplicated. In practice, this means (at minimum) an 8-byte end marker on each key and a 16-byte data node for the portion of the key which is different. This is true unless the key is the leading substring of a larger key. But even in the leading substring case, memory is wasted as each node must be rounded up to the allocation size of 8 bytes and a pointer must be added. I don't believe that it's possible to add entries to a Redis RAX using less than 16 bytes. Simple example - "home" and "homesick":
Now add "homesick"
This assumes 8-byte pointers. Of course if we were using 4-byte pointers, the DLL would be correspondingly smaller also. It also assumes that the RAX only carries keys (and no data) which is the case for |
|
There's something I'm missing here, or context not provided? I mean: IIRC the only places here redis uses a list are when it doesn't have a key, and doesn't perform searches. What use case are we discussing and how is it related to this PR? |
|
@oranagra The slot->key mapping table could be re-implemented as a linked list instead of a radix tree. The linked list is built from the dictEntrys. We only need to loop over the entire structure, so we don't need to maintain ordering. We also don't need O(1) lookups because it's being attached to the dictionary, which already provides that. It's an optimization that, as Jim points out, can save a lot of memory for use cases where the keys have very little shared prefixes. |
|
@zuiderkwast go for it! |
|
@madolson I had a late night coding session. :-) PR is up.
Actually, this rax concatenates the slot number (two bytes) before the key, so keys with a common prefix will only share space if they also belong to the same slot. |
|
Yeah, we can close this now that it's been merged through the entry-metadata PR. |
The Redis dictionary, as defined in
dict.h/dict.c, is a general purpose dictionary utility that should not be modified for custom changes in Redis. The dictionary uses adictEntrystructure to carry each key/value pair.This update allows the embedding of arbitrary metadata at the end of the
dictEntrystruct by allowing the allocation of a fixed amount of space at the end of eachdictEntryin a given dictionary.This is useful for modules that might like to maintain meta data along with a Redis defined structure which they can't modify. This allows the storage of the metadata without the overhead of another dictionary.
This is an optional capability specified through the
dictTypeand will not affect performance of the dictionary. It will only affect space (memory) for dictionaries where this is explicitly enabled.