
Questions about the stream-level consumption model #10996

@navina

Description


This issue is meant to open up a discussion around the following:

  1. For which use cases does it make sense to use the stream-level consumption model in Pinot?
  2. What semantics does the stream-level consumption model offer? E.g., how does data from the source get partitioned into Pinot tables? How is consumption monitored in this model? IIUC, the segment naming convention is also different?
  3. Some feature differences I have noticed are listed below (please correct me if I am mistaken; I am sure there are more):

     | Feature | HLC (high-level consumer) | LLC (low-level consumer) |
     |---|---|---|
     | Force commit | No | Yes |
     | Stream message metadata extraction | No (can potentially be extended) | Yes |
     | Ingestion throttling | No | Yes |

  4. Documentation about this usage and its guarantees is sparse. IIRC, a few examples in the OSS documentation used the high-level consumer. Users have mistakenly copied these samples with ConsumerType.HIGHLEVEL and ended up in long debugging sessions. One example is https://apache-pinot.slack.com/archives/CDRCA57FC/p1687987849496959?thread_ts=1687912445.703689&cid=CDRCA57FC. (When the original incident happened, we spent ~1-2 days debugging before realizing that the consumer type was high-level.)
  5. Known issues:
     • Deleting a high-level table does not clean up the ZK metadata (ideal state and segment store).
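
The misconfiguration in point 4 comes down to a single value in the table's stream configs, so it can be caught before a table is created. Below is a minimal Python sketch of such a pre-flight check. It assumes the stream configs live under `tableIndexConfig.streamConfigs` with a `streamType` entry and a `stream.<streamType>.consumer.type` key whose value is `highlevel` or `lowlevel`; newer configs may place stream configs elsewhere. The function names `consumer_types` and `uses_high_level_consumer` are hypothetical and not part of any Pinot tool.

```python
from typing import Any, Mapping, Set


def consumer_types(table_config: Mapping[str, Any]) -> Set[str]:
    """Return the lower-cased consumer types declared in a realtime table config.

    Assumption: streamConfigs sits under tableIndexConfig and the consumer type
    is keyed as "stream.<streamType>.consumer.type".
    """
    stream_configs = table_config.get("tableIndexConfig", {}).get("streamConfigs", {})
    stream_type = stream_configs.get("streamType")
    if not stream_type:
        return set()
    raw = stream_configs.get(f"stream.{stream_type}.consumer.type", "")
    # Split on commas in case several types are listed; drop blank entries.
    return {part.strip().lower() for part in str(raw).split(",") if part.strip()}


def uses_high_level_consumer(table_config: Mapping[str, Any]) -> bool:
    """True if the table is configured for the stream-level (high-level) consumer."""
    return "highlevel" in consumer_types(table_config)


if __name__ == "__main__":
    sample = {
        "tableName": "events",
        "tableType": "REALTIME",
        "tableIndexConfig": {
            "streamConfigs": {
                "streamType": "kafka",
                "stream.kafka.topic.name": "events",
                "stream.kafka.consumer.type": "highlevel",
            }
        },
    }
    if uses_high_level_consumer(sample):
        print(f"{sample['tableName']}: uses the high-level (stream-level) consumer")
```

The comparison is case-insensitive on purpose, since copied samples may spell the value as `HighLevel` or `HIGHLEVEL`.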

I would like to propose that we find a path to sunset the stream-level consumption model, but I don't want to proceed without understanding the questions above. Please help clarify.

I also see comments like "This can be removed once we remove HLC implementation from the code" (link), so I am assuming this topic has come up for discussion before :)
