Skip to content

Enhance dict to support arbitrary metadata carried in dictEntry#8210

Closed
JimB123 wants to merge 6 commits intoredis:unstablefrom
JimB123:dict
Closed

Enhance dict to support arbitrary metadata carried in dictEntry#8210
JimB123 wants to merge 6 commits intoredis:unstablefrom
JimB123:dict

Conversation

@JimB123
Copy link
Contributor

@JimB123 JimB123 commented Dec 17, 2020

The Redis dictionary, as defined in dict.h/dict.c, is a general purpose dictionary utility that should not be modified for custom changes in Redis. The dictionary uses a dictEntry structure to carry each key/value pair.

This update allows the embedding of arbitrary metadata at the end of the dictEntry struct by allowing the allocation of a fixed amount of space at the end of each dictEntry in a given dictionary.

This is useful for modules that might like to maintain meta data along with a Redis defined structure which they can't modify. This allows the storage of the metadata without the overhead of another dictionary.

This is an optional capability specified through the dictType and will not affect performance of the dictionary. It will only affect space (memory) for dictionaries where this is explicitly enabled.

@oranagra
Copy link
Member

@JimB123 thanks for submitting this PR.

I wonder if this is a preparation to set the ground for a followup PR? i don't think we wanna merge this one on it's own (add capabilities that we're not using).
Note that the "dict" capability that redis exposes for modules is based on rax.c, not dict.c

P.S. I actually made a somewhat similar change to dict.c in the past, adding optional callbacks for allocation and release of the dictEntry. i did this for the purpose of keeping some allocation reuse pull for some case of rapid / repeated malloc and frees, but when i think of it such a change (the one i did) could serve your purpose too, and even allow the flexibility of allocating a different size of payload per entry rather than a constant tone.

@madolson
Copy link
Contributor

madolson commented Dec 21, 2020

@oranagra There was two things we were considering open sourcing:

  1. A secret we haven't decided to tell the community yet. ;)
  2. Removing the slots-keys table, and instead replacing it with a linked list of all the entries. This is specifically a memory enhancement when the size of the keys < 16 bytes (which is usually the case), since the rax can use a lot of extra memory for large keys.

@oranagra
Copy link
Member

@madolson you mean when key names are > 16 bytes?
IIRC rax is in most cases more memory efficient than a dict / and certainly a list.
maybe i don't understand the case well enough, can you add more info?

anyway, how is that related to the current PR?

@madolson
Copy link
Contributor

@oranagra Yes I meant > 16 bytes. More precisely, a linked list is more efficient when the keys (stored on average within the radix tree) consume more than 16 bytes each, since a linked list can get away with 16 bytes (2 8 byte allocations for the previous and next entry). We've had several people who use GUIDs to store data, which are stored very inefficiently in the radix tree, complain about memory. The linked list pointers can be stored in the dict-entry metadata that is outlined here, which is how it relates.

This wouldn't replace the RAX, since it will actually be more memory intensive for keys with lots of shared prefixes. It seems like a relatively small change that can optimize some workloads though.

To your point, we can just leave this PR here until we have a real implementation.

Also, the dict implementation is always less efficient than a linked list, since we need the key pointer, the entry pointer, the pointer to the entry, and the extra overhead from the buckets.

@JimB123
Copy link
Contributor Author

JimB123 commented Jan 4, 2021

@oranagra @madolson Memory-wise, a doubly linked list would always be more efficient than a RAX for this case. In most cases it would be significantly more efficient.

The issue is that the RAX duplicates the key, which is already linked in the dictEntry. If there are commonalities in the keys, it will duplicate somewhat efficiently, but still, it is duplicated. In practice, this means (at minimum) an 8-byte end marker on each key and a 16-byte data node for the portion of the key which is different. This is true unless the key is the leading substring of a larger key. But even in the leading substring case, memory is wasted as each node must be rounded up to the allocation size of 8 bytes and a pointer must be added. I don't believe that it's possible to add entries to a Redis RAX using less than 16 bytes.

Simple example - "home" and "homesick":
"home" (by itself) gets stored as: "home" -> endMark.

  • "home" takes 16 bytes (including the pointer to the end mark)
  • the end marker is 8 bytes. Total = 24 bytes

Now add "homesick"
This gets stored as "home" -> "sick" -> endMark.

  • "home" takes 16 bytes (including the pointer to "sick")
  • "sick" takes 16 bytes (including the pointer to the end mark)
  • the end marker is 8 bytes. Total = 40 bytes for 2 overlapping entries.

This assumes 8-byte pointers. Of course if we were using 4-byte pointers, the DLL would be correspondingly smaller also. It also assumes that the RAX only carries keys (and no data) which is the case for slots_to_keys.

@oranagra
Copy link
Member

oranagra commented Jan 4, 2021

There's something I'm missing here, or context not provided?
Why are we comparing rax and linked-list?
Is there any use case where these are both applicable?

I mean:
Hash and rax have sane lookup complexity, and list is just a dumb O(N) search.
Hash and rax have key names, list doesn't necessarily.

IIRC the only places here redis uses a list are when it doesn't have a key, and doesn't perform searches.

What use case are we discussing and how is it related to this PR?

@madolson
Copy link
Contributor

madolson commented Jan 4, 2021

@oranagra The slot->key mapping table could be re-implemented as a linked list instead of a radix tree. The linked list is built from the dictEntrys.

We only need to loop over the entire structure, so we don't need to maintain ordering. We also don't need O(1) lookups because it's being attached to the dictionary, which already provides that. It's an optimization that, as Jim points out, can save a lot of memory for use cases where the keys have very little shared prefixes.

@zuiderkwast
Copy link
Contributor

@JimB123 @madolson did anyone start the rewrite of clusterState slots_to_keys to a linked list based on this? If not, I'll start working on that now.

@madolson
Copy link
Contributor

@zuiderkwast go for it!

@zuiderkwast
Copy link
Contributor

@madolson I had a late night coding session. :-) PR is up.

@JimB123

The issue is that the RAX duplicates the key, which is already linked in the dictEntry. If there are commonalities in the keys, it will duplicate somewhat efficiently, but still, it is duplicated.

Actually, this rax concatenates the slot number (two bytes) before the key, so keys with a common prefix will only share space if they also belong to the same slot.

@soloestoy soloestoy added the state:to-be-closed requesting the core team to close the issue label Sep 16, 2021
@madolson
Copy link
Contributor

madolson commented Sep 20, 2021

Yeah, we can close this now that it's been merged through the entry-metadata PR.

@madolson madolson closed this Sep 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

state:to-be-closed requesting the core team to close the issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants