This issue is meant to open a discussion around the following:
- For which use cases does it make sense to use the stream-level (HLC) consumption model in Pinot?
- What semantics does the stream-level consumption model offer? E.g., how does data from the source get partitioned into Pinot tables? How is consumption monitored in this model? IIUC, the segment naming convention is also different?
- Some feature differences I have noticed (please correct me if I am mistaken; I am sure there are more):
| Feature | HLC | LLC |
|---|---|---|
| Force commit | No | Yes |
| Stream Message metadata extraction | No (can potentially be extended) | Yes |
| Ingestion throttling | No | Yes |
- Documentation is sparse about this usage and its guarantees. IIRC, a few examples in the OSS documentation used the high-level consumer. Users have mistakenly used these samples with `ConsumerType.HIGHLEVEL` and ended up in long debugging sessions. One example is https://apache-pinot.slack.com/archives/CDRCA57FC/p1687987849496959?thread_ts=1687912445.703689&cid=CDRCA57FC (when the original incident happened, we spent ~1-2 days debugging before realizing that the stream type was high level).
- Known issues:
  - Deleting a high-level table does not clean up the ZK metadata (ideal state & segment store).
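
Since the debugging sessions mentioned above came from not noticing that a table was configured with the high-level consumer, a small pre-flight check on the table config could surface this early. The following is only a rough sketch, not part of Pinot: the function name `find_high_level_consumer_keys` is hypothetical, and it assumes that the consumer type is set through a stream config key ending in `.consumer.type` (e.g. `stream.kafka.consumer.type`) with the value `highlevel`, located either under `tableIndexConfig.streamConfigs` or under `ingestionConfig.streamIngestionConfig.streamConfigMaps`. Please verify these locations against the Pinot version in use.

```python
import json


def find_high_level_consumer_keys(table_config: dict) -> list[str]:
    """Return stream config keys whose consumer type is 'highlevel'.

    Hypothetical helper: the config locations and the 'highlevel' value
    checked here are assumptions about the table config layout.
    """
    stream_config_maps = []

    # Assumed location 1: a single map under tableIndexConfig.streamConfigs.
    index_config = table_config.get("tableIndexConfig") or {}
    legacy = index_config.get("streamConfigs")
    if isinstance(legacy, dict):
        stream_config_maps.append(legacy)

    # Assumed location 2: a list of maps under
    # ingestionConfig.streamIngestionConfig.streamConfigMaps.
    ingestion = table_config.get("ingestionConfig") or {}
    stream_ingestion = ingestion.get("streamIngestionConfig") or {}
    for config_map in stream_ingestion.get("streamConfigMaps") or []:
        if isinstance(config_map, dict):
            stream_config_maps.append(config_map)

    flagged = []
    for config in stream_config_maps:
        for key, value in config.items():
            is_consumer_type = key.endswith(".consumer.type")
            if is_consumer_type and str(value).strip().lower() == "highlevel":
                flagged.append(key)
    return flagged


# Illustrative table config fragment (made up for this example).
example = json.loads("""
{
  "tableName": "myTable_REALTIME",
  "tableIndexConfig": {
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "highlevel",
      "stream.kafka.topic.name": "myTopic"
    }
  }
}
""")

print(find_high_level_consumer_keys(example))
```

A check like this could run in CI or before a table is created, so that a sample config copied with the high-level consumer type is flagged before any data is ingested.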
I would like to propose that we find a path to sunset the stream-level consumption model, but I don't want to proceed without understanding the above questions. Please help clarify.
I also see comments like "This can be removed once we remove HLC implementation from the code" (link), so I assume this topic has come up for discussion before :)
mayankshriv