Description
Apache Pinot has provided native upsert support since v0.6.0 (#4261). It allows users to modify existing records and has successfully onboarded many use cases. We have observed that Pinot upsert clusters usually have high heap memory usage. This is because the upsert metadata (primaryKeyIndexes and validDocIndexes) is stored on the heap of Pinot hosts. For use cases with high primary key cardinality, the heap usage of these upsert tables usually becomes the hardware resource bottleneck.
For some use cases, records that share a primary key are updated frequently during a time window, and after that window they are never updated again. In these use cases, each primary key has a lifecycle and becomes inactive once the window ends. Currently these primary keys do not expire until the table's retention period elapses, and until then they are kept in primaryKeyIndexes. We propose introducing a TTL (time-to-live) for Pinot primary keys. Primary keys will expire after the TTL, so inactive keys can be removed from the upsert metadata to save heap space.
A few challenges that we want to solve:
- snapshot management for validDocIndexes
- TTL implementation for primary keys in primaryKeyIndexes
- snapshot backup in the deep store
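
To make the TTL idea concrete, here is a minimal, single-threaded sketch of how expired primary keys could be dropped from an in-heap key map. It is not Pinot's actual upsert metadata code: the class `PrimaryKeyTtlSketch`, its methods, and the choice to measure expiry against the largest comparison time seen so far (rather than wall-clock time) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; the names here are not Pinot APIs. */
class PrimaryKeyTtlSketch {

  /** Where the latest record for a key lives, and its comparison time. */
  static final class RecordLocation {
    final int docId;
    final long comparisonTimeMs;

    RecordLocation(int docId, long comparisonTimeMs) {
      this.docId = docId;
      this.comparisonTimeMs = comparisonTimeMs;
    }
  }

  // Stand-in for primaryKeyIndexes: primary key -> latest record location.
  private final Map<String, RecordLocation> primaryKeyIndex = new HashMap<>();
  private final long ttlMs;
  // Assumption: the TTL watermark is derived from ingested data, not wall-clock time.
  private long largestSeenTimeMs = Long.MIN_VALUE;

  PrimaryKeyTtlSketch(long ttlMs) {
    this.ttlMs = ttlMs;
  }

  /** Keeps the record with the larger comparison time for the given key. */
  void upsert(String primaryKey, int docId, long comparisonTimeMs) {
    primaryKeyIndex.merge(
        primaryKey,
        new RecordLocation(docId, comparisonTimeMs),
        (oldLoc, newLoc) ->
            newLoc.comparisonTimeMs >= oldLoc.comparisonTimeMs ? newLoc : oldLoc);
    largestSeenTimeMs = Math.max(largestSeenTimeMs, comparisonTimeMs);
  }

  /** Removes keys whose latest record is older than the TTL; returns how many were removed. */
  int removeExpiredPrimaryKeys() {
    if (primaryKeyIndex.isEmpty()) {
      return 0;
    }
    long watermarkMs = largestSeenTimeMs - ttlMs;
    int sizeBefore = primaryKeyIndex.size();
    primaryKeyIndex.values().removeIf(loc -> loc.comparisonTimeMs < watermarkMs);
    return sizeBefore - primaryKeyIndex.size();
  }

  boolean contains(String primaryKey) {
    return primaryKeyIndex.containsKey(primaryKey);
  }

  int size() {
    return primaryKeyIndex.size();
  }
}
```

In this sketch, removing a key only frees its map entry. The record itself remains queryable, which is why snapshots of validDocIndexes still matter: after a restart, expired keys are no longer in the map, so the valid-document state cannot be rebuilt from it alone.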