Skip to content

Implement TTL support for Pinot upsert #9529

@deemoliu

Description

@deemoliu

Apache Pinot provides native support of Upsert since v0.6.0 (#4261), it allows users to modify existing records, and successfully onboard many use cases. We observed Pinot upsert clusters usually have high usage of heap memory. This is because the upsert metadata (primaryKeyIndexes and validDocIndexes), are stored in heap of pinot hosts. For use cases with high cardinality of primary keys, the heap usage of these upsert tables usually becomes the bottleneck of the hardware resource.

For some use cases, records that shared primary keys will get updates frequently during a time window, and after the time window, these records won’t get updated any more. In these use cases, each primary key has a lifecycle and will be deactivated after the time window. Currently these primary keys won’t expire until the retention days, and they will be kept in primaryKeyIndexes. We shall introduce TTL (time-to-live) for Pinot primary keys. Primary keys will expire after the TTL, and we can remove inactive keys from upsert metadata to save heap space.

Few Challenges that we want to solve.

  • snapshots management for validDocIndexes
  • implement TTL for primary keys in primaryKeyIndexes
  • snapshot backup in the deepstore.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions