Introduce HASH items expiration #2089
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## unstable #2089 +/- ##
============================================
- Coverage 71.49% 71.37% -0.13%
============================================
Files 123 125 +2
Lines 67487 69207 +1720
============================================
+ Hits 48251 49395 +1144
- Misses 19236 19812 +576
zuiderkwast
left a comment
I did a partial pass on this. I got to the hashtable callback and the entry abstraction. I didn't get to the actual field expiration logic in t_hash and the volatile set though. Need to continue another day.
rainsupreme
left a comment
This is a lot of work you've done! I've only had time for a partial review today, but I had a few comments/questions so far. The command schema and entry memory layout look good to me. It'll be interesting to see perf testing too! 😀
We are currently more focused on introducing the functionality and will turn to performance testing as soon as possible.
src/entry.c (Outdated)
    zfree(entryAllocPtr(entry));
}

/* Takes ownership of value, does not take ownership of field */
I will just remove that logic for now. It was meant for sets, but I am not sure it will remain that way.
First comment - 37 changed files??? Dang!
JimB123
left a comment
still reviewing. Posting Day 1. 😨
you still have another PR in the oven - it might be bigger than this :(
Thank you @rjd15372! TBH the entry is NOT the main focus of this PR. Most of the entry code is taken from the existing implementation of hashTypeEntry (with some changes). I think the really interesting part is the new commands themselves (HSETEX, HGETEX, HEXPIRE, etc.); this is where the complex logic is introduced. There is also the new volatile set API in t_hash.c (which I do not like that much), but we can focus on that in the PR introducing the volatile set.
This is needed due to changes presented in #2089 --------- Signed-off-by: Ran Shidlansik <[email protected]>
It would be great if next time, when doing the rebase merge, we include the PR number (like #2089) in the squashed commit message title. (I usually locate the PR web page based on the commit message title.)
Following new API presented in #2089, we might access out of bound memory in case of some illegal command input Signed-off-by: Ran Shidlansik <[email protected]>
related to: valkey-io/valkey#2089 --------- Signed-off-by: Ran Shidlansik <[email protected]> Co-authored-by: Viktor Söderqvist <[email protected]> Co-authored-by: Josh Soref <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
In #2089 we added a deferred logic for HGETALL since we cannot anticipate the size of the output as it may contain expired hash items which should not be included. As part of the work of #2022 this would greatly increase the time for HGETALL processing, thus we introduce this minor improvement to avoid using deferred reply in case the hash has NO volatile items. --------- Signed-off-by: Ran Shidlansik <[email protected]>
In the original implementation of Hash Field Expiration (#2089), the HSETEX command was implemented to report keyspace notifications only for performed changes. This is mostly aligned with other Hash commands (for example, HDEL will also not report an `hdel` event for items which do not exist). The HSETEX case is somewhat different and is more like the `HSET` case. During HSETEX, after the command validations pass, items are ALWAYS "added" to the object, even though they might not actually be added. The same applies when the hash object is empty or when none of the provided fields exist in the object (as reported [here](#2998)).
This PR changes the way `HSETEX` reports keyspace notifications so that:
1. `hset` notification will ALWAYS be reported if all command validations pass.
2. `hexpire` will be reported in case the command includes an expiration time (even a past time).
3. `hxpired` will be reported in case the provided expiration time is in the past (or 0).
4. `hdel` will be reported in case the hash exists (or was created as part of the command) and was left empty following the command execution.
5. We will always return '1' as the return value of an HSETEX command which passed all validations. Before that we returned 1 only if we applied the change across ALL the input fields, so in case some of them did not exist and a past time was set we would return 0.
Signed-off-by: Ran Shidlansik <[email protected]>
Co-authored-by: Jacob Murphy <[email protected]>
Closes #640
Summary
This PR introduces support for field-level expiration in Valkey hash types, making it possible for individual fields inside a hash to expire independently — creating what we call volatile fields.
This is the first of 3 PRs. This PR focuses on enabling the basic ability to set and modify hash field expiration, as well as persistence (AOF+RDB) and defrag.
The second PR, which introduces the new algorithm (volatile-set) to track volatile hash fields, is in the last stages of review. The current implementation in this PR (in volatile-set.h/c) is just a stub implementation and will be replaced by the second PR.
The third PR introduces the active expiration and defragmentation jobs.
For more high-level design details you can track the RFC PR: valkey-io/valkey-rfc#22.
Major decisions
Some major high-level decisions taken as part of this work:
We decided to copy the existing Redis API in order to maintain compatibility with existing clients.
We decided to avoid introducing lazy-expiration at this point, in order to reduce complexity and rely only on active-expiration for memory reclamation. This will require us to continue working on improving the active expiration job and potentially consider introducing lazy-expiration support later on.
Although the commands which add expiration to hash fields influence memory utilization (by allocating more memory for the expiration time and metadata), we decided to avoid adding the DENYOOM flag to these commands (HSETEX is an exception) in order to be better aligned with high-level key commands like `expire`.
Some hash type commands will produce unexpected results:
for the case:
The reported events are:
New entry type
This PR also modularizes and exposes the internal `hashTypeEntry` logic as a new standalone `entry.c/h` module. This new abstraction handles all aspects of field–value–expiry encoding using multiple memory layouts optimized for performance and memory efficiency.
An `entry` is an abstraction that represents a single field–value pair with optional expiration. Internally, Valkey uses different memory layouts for compactness and efficiency, chosen dynamically based on size and encoding constraints.
The entry pointer is the field sds, which lets us use an entry just like any sds. We encode the entry layout type in the field SDS header. Field type SDS_TYPE_5 doesn't have any spare bits to encode this, so we use it only for the first layout type.
Entry with embedded value, used for small sizes. The value is stored as
SDS_TYPE_8. The field can use any SDS type.
An entry can also have an expiration timestamp, which is the UNIX timestamp at which it expires.
For aligned fast access, we keep the expiry timestamp prior to the start of the sds header.
Entry with value pointer, used for larger fields and values. The field is SDS
type 8 or higher.
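To make the "entry pointer is the field, expiry sits before the header" idea concrete, here is a minimal toy sketch. It is not the `entry.c` implementation and does not use real SDS headers: the one-byte header, the flag bit, and all `toyEntry*` names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy layout (NOT the real SDS/entry encoding):
 *   [int64 expiry, only if flagged][1-byte header][field bytes][NUL]
 * The returned pointer addresses the field bytes, so callers can treat
 * the entry like a plain C string. */
#define TOY_HDR_SIZE 1
#define TOY_FLAG_HAS_EXPIRY 0x1

char *toyEntryCreate(const char *field, int has_expiry, int64_t expiry_ms) {
    assert(field != NULL);
    size_t flen = strlen(field);
    size_t prefix = has_expiry ? sizeof(int64_t) : 0;
    char *alloc = malloc(prefix + TOY_HDR_SIZE + flen + 1);
    if (alloc == NULL) return NULL;
    /* The timestamp lives at the very start of the allocation. */
    if (has_expiry) memcpy(alloc, &expiry_ms, sizeof(expiry_ms));
    alloc[prefix] = has_expiry ? TOY_FLAG_HAS_EXPIRY : 0;
    char *entry = alloc + prefix + TOY_HDR_SIZE;
    memcpy(entry, field, flen + 1);
    return entry;
}

int toyEntryHasExpiry(const char *entry) {
    /* The header byte sits immediately before the field bytes. */
    return (entry[-1] & TOY_FLAG_HAS_EXPIRY) != 0;
}

/* Returns the stored expiry, or -1 when the entry has none. */
int64_t toyEntryGetExpiry(const char *entry) {
    if (!toyEntryHasExpiry(entry)) return -1;
    int64_t expiry_ms;
    memcpy(&expiry_ms, entry - TOY_HDR_SIZE - sizeof(int64_t), sizeof(expiry_ms));
    return expiry_ms;
}

void toyEntryFree(char *entry) {
    size_t prefix = toyEntryHasExpiry(entry) ? sizeof(int64_t) : 0;
    free(entry - TOY_HDR_SIZE - prefix);
}
```

In this sketch an entry without expiration pays no extra bytes for the timestamp, and because the timestamp starts at the allocation base it is naturally aligned.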
The `entry.c/h` API provides methods to:
Supported Commands
This PR introduces new commands and extends existing ones to support field expiration:
Commands
The proposed API is nearly identical to the API provided by Redis (Redis 7.4 + 8.0). This is intentional, in order to avoid breaking client applications that have already opted to use hash item TTLs.
HSETEX
Synopsis
Set the value of one or more fields of a given hash key, and optionally set their expiration time or time-to-live (TTL).
The HSETEX command supports the following set of options:
`NX`: Only set the fields if the hash object does NOT exist.
`XX`: Only set the fields if the hash object does exist.
`FNX`: Only set the fields if none of them already exist.
`FXX`: Only set the fields if all of them already exist.
`EX seconds`: Set the specified expiration time in seconds.
`PX milliseconds`: Set the specified expiration time in milliseconds.
`EXAT unix-time-seconds`: Set the specified Unix time in seconds at which the fields will expire.
`PXAT unix-time-milliseconds`: Set the specified Unix time in milliseconds at which the fields will expire.
`KEEPTTL`: Retain the TTL associated with the fields.
The `EX`, `PX`, `EXAT`, `PXAT`, and `KEEPTTL` options are mutually exclusive.
HGETEX
Synopsis
Get the value of one or more fields of a given hash key and optionally set their expiration time or time-to-live (TTL).
The `HGETEX` command supports a set of options:
`EX seconds`: Set the specified expiration time, in seconds.
`PX milliseconds`: Set the specified expiration time, in milliseconds.
`EXAT unix-time-seconds`: Set the specified Unix time at which the fields will expire, in seconds.
`PXAT unix-time-milliseconds`: Set the specified Unix time at which the fields will expire, in milliseconds.
`PERSIST`: Remove the TTL associated with the fields.
The `EX`, `PX`, `EXAT`, `PXAT`, and `PERSIST` options are mutually exclusive.
HEXPIRE
Synopsis
Set an expiration (TTL or time to live) on one or more fields of a given hash key. You must specify at least one field. Field(s) will automatically be deleted from the hash key when their TTLs expire.
Field expirations will only be cleared by commands that delete or overwrite the contents of the hash fields, including `HDEL` and `HSET` commands. This means that all the operations that conceptually alter the value stored at a hash key's field without replacing it with a new one will leave the TTL untouched.
You can clear the TTL of a specific field by specifying 0 for the 'seconds' argument.
Note that calling `HEXPIRE`/`HPEXPIRE` with a time in the past will result in the hash field being deleted immediately.
The `HEXPIRE` command supports a set of options:
`NX`: For each specified field, set expiration only when the field has no expiration.
`XX`: For each specified field, set expiration only when the field has an existing expiration.
`GT`: For each specified field, set expiration only when the new expiration is greater than current one.
`LT`: For each specified field, set expiration only when the new expiration is less than current one.
HEXPIREAT
Synopsis
`HEXPIREAT` has the same effect and semantics as `HEXPIRE`, but instead of specifying the number of seconds for the TTL (time to live), it takes an absolute Unix timestamp in seconds since Unix epoch. A timestamp in the past will delete the field immediately.
The `HEXPIREAT` command supports a set of options:
`NX`: For each specified field, set expiration only when the field has no expiration.
`XX`: For each specified field, set expiration only when the field has an existing expiration.
`GT`: For each specified field, set expiration only when the new expiration is greater than current one.
`LT`: For each specified field, set expiration only when the new expiration is less than current one.
HPEXPIRE
Synopsis
This command works like `HEXPIRE`, but the expiration of a field is specified in milliseconds instead of seconds.
The `HPEXPIRE` command supports a set of options:
`NX`: For each specified field, set expiration only when the field has no expiration.
`XX`: For each specified field, set expiration only when the field has an existing expiration.
`GT`: For each specified field, set expiration only when the new expiration is greater than current one.
`LT`: For each specified field, set expiration only when the new expiration is less than current one.
HPEXPIREAT
Synopsis
`HPEXPIREAT` has the same effect and semantics as `HEXPIREAT`, but the Unix time at which the field will expire is specified in milliseconds since Unix epoch instead of seconds.
HPERSIST
Synopsis
Remove the existing expiration on a hash key's field(s), turning the field(s) from volatile (a field with expiration set) to persistent (a field that will never expire as no TTL (time to live) is associated).
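Looking back at the `NX`/`XX`/`GT`/`LT` options shared by `HEXPIRE`, `HEXPIREAT`, `HPEXPIRE` and `HPEXPIREAT` above, the per-field decision can be sketched as a small predicate. This is only an illustration: the enum, the `NO_EXPIRY` sentinel and the function name are hypothetical, and the handling of `GT`/`LT` for a field that has no TTL (treated here as an infinite TTL, as the key-level `EXPIRE` documentation describes) is an assumption rather than something this PR states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
typedef enum { COND_NONE, COND_NX, COND_XX, COND_GT, COND_LT } ExpireCond;

#define NO_EXPIRY (-1) /* hypothetical sentinel: field has no TTL */

/* Decide whether a new absolute expiry (in ms) should replace the current one.
 * Assumption: a field without TTL behaves like an infinite TTL for GT/LT. */
bool shouldApplyExpiry(ExpireCond cond, int64_t current_ms, int64_t new_ms) {
    bool has_ttl = (current_ms != NO_EXPIRY);
    switch (cond) {
    case COND_NONE: return true;
    case COND_NX: return !has_ttl;                       /* only if no expiration yet */
    case COND_XX: return has_ttl;                        /* only if already volatile */
    case COND_GT: return has_ttl && new_ms > current_ms; /* nothing exceeds "infinite" */
    case COND_LT: return !has_ttl || new_ms < current_ms;
    }
    return false;
}
```

Comparing absolute millisecond timestamps keeps one predicate usable for all four commands, whichever unit or relative/absolute form the client used.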
HSETEX
Synopsis
Similar to `HSET` but adds one or more hash fields that expire after specified number of seconds. By default, this command overwrites the values and expirations of specified fields that exist in the hash. If `NX` option is specified, the field data will not be overwritten. If `key` doesn't exist, a new Hash key is created.
The HSETEX command supports a set of options:
`NX`: For each specified field, set expiration only when the field has no expiration.
HTTL
Synopsis
Returns the remaining TTL (time to live) of a hash key's field(s) that have a set expiration. This introspection capability allows you to check how many seconds a given hash field will continue to be part of the hash key.
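As a rough illustration of how a remaining TTL can be derived from a stored absolute expiry, consider the sketch below. The function names and the `-1` "no expiration" sentinel are hypothetical, and rounding to the nearest second is an assumption borrowed from how key-level TTL reporting is commonly done; the PR text does not specify either.

```c
#include <stdint.h>

#define TTL_NO_EXPIRY (-1) /* hypothetical sentinel: field has no TTL */

/* Remaining time in milliseconds until expiry_ms, clamped at 0. */
int64_t remainingTtlMs(int64_t expiry_ms, int64_t now_ms) {
    if (expiry_ms == TTL_NO_EXPIRY) return TTL_NO_EXPIRY;
    int64_t ttl = expiry_ms - now_ms;
    return ttl < 0 ? 0 : ttl;
}

/* Same value in seconds. Assumption: round to the nearest second. */
int64_t remainingTtlSeconds(int64_t expiry_ms, int64_t now_ms) {
    int64_t ttl = remainingTtlMs(expiry_ms, now_ms);
    if (ttl == TTL_NO_EXPIRY) return TTL_NO_EXPIRY;
    return (ttl + 500) / 1000;
}
```

Storing a single absolute millisecond timestamp means the seconds and milliseconds views differ only in the final conversion.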
HPTTL
Like `HTTL`, this command returns the remaining TTL (time to live) of a field that has an expiration set, but in milliseconds instead of seconds.
HEXPIRETIME
Synopsis
Returns the absolute Unix timestamp in seconds since Unix epoch at which the given key's field(s) will expire.
HPEXPIRETIME
Synopsis
`HPEXPIRETIME` has the same semantics as `HEXPIRETIME`, but returns the absolute Unix expiration timestamp in milliseconds since Unix epoch instead of seconds.
Keyspace Notifications
This PR introduces new notification events to support field-level expiration:
hexpire
hexpired
hpersist
del
Note that we diverge from Redis in the cases where we emit the hexpired event.
For example:
given the following use case:
regarding the keyspace-notifications:
Redis reports:
However, in our current suggestion, Valkey will emit:
Propagation and Replication
Some commands (`HSETEX`, `HGETEX`, etc.) are not propagated as-is. Instead, their effects are propagated using:
`HDEL` (for expired fields)
`HPEXPIREAT` (for setting absolute expiration)
`HPERSIST` (for removing expiration)
This ensures compatibility with replication and AOF while maintaining consistent field-level expiry behavior.
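One way to see why propagating an absolute millisecond timestamp is attractive: a relative TTL would otherwise be re-evaluated against a different clock reading on the replica or at AOF load time. The sketch below normalizes any of the four expiration forms (`EX`, `PX`, `EXAT`, `PXAT`) into a single absolute millisecond value. The type and function names are hypothetical, and overflow checks are omitted for brevity.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
typedef enum { UNIT_SECONDS, UNIT_MILLISECONDS } TimeUnit;

/* Convert a client-supplied expiration into an absolute Unix time in ms.
 * is_relative is nonzero for EX/PX style values and zero for EXAT/PXAT.
 * Note: no overflow validation is performed in this sketch. */
int64_t toAbsoluteExpiryMs(int64_t value, TimeUnit unit, int is_relative, int64_t now_ms) {
    int64_t ms = (unit == UNIT_SECONDS) ? value * 1000 : value;
    return is_relative ? now_ms + ms : ms;
}
```

For instance, `EX 10` issued when the clock reads 1700000000000 ms normalizes to 1700000010000, which is the value a propagated `HPEXPIREAT` would carry in this sketch.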
Performance Comparison
Accumulated Backlog
[ ] Consider extending HSETEX with extra arguments: NX/XX so that it is possible to prevent adding/setting/mutating fields of a non-existent hash
[ ] Avoid loading expired fields when a non-preamble RDB is being loaded on a primary. This is an optimization to avoid loading fields which are already expired. It would also require us to propagate the HDEL to the replicas in case of RDBFLAGS_FEED_REPL. Note that it might require some refactoring:
1/ propagate the rdbflags and current time to rdbLoadObject. 2/ consider the case of restore and check_rdb etc...
For this reason I would like to avoid this optimization for the first drop.