Description
Pinot is a distributed real-time OLAP engine that provides second-level data freshness by ingesting Kafka events and can manage months of historical data loaded from sources such as HDFS and schemaless. However, Pinot currently functions mostly as an append-only storage system. It does not allow existing records to be modified or deleted, with the exception of overriding all data within a time range using offline tables. This limits Pinot's applicability, because many use cases require updates to their data, either due to the nature of their events or for data correction and backfill. To extend Pinot's capabilities and serve more use cases, we are going to implement an upsert feature that allows users to update existing records in Pinot tables through their Kafka input stream.
Some initial requirements for the upsert project:
- Support only full updates to a Pinot event (no partial updates)
- Support only the Kafka-compatible queue ingestion model
- A single Pinot server/table can handle an ingestion rate of 10k messages/sec
- Each Pinot server can handle 1 billion records or 1 TB of storage
- Ingestion latency overhead compared to the non-upsert model: < 1 min
- Query latency overhead compared to the non-upsert model: < 50%
- Data retention: < 2 weeks
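
To make the "full update" requirement concrete, here is a minimal Python sketch of the intended semantics, not of how Pinot will implement it. It assumes each event carries a primary-key field and that a later event with the same key replaces the earlier record entirely. The function name `apply_full_upserts` and the sample fields (`order_id`, `status`, `amount`) are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Hashable, Iterable, List

Record = Dict[str, Any]


def apply_full_upserts(events: Iterable[Record], key_field: str) -> List[Record]:
    """Replay events in arrival order and keep one record per primary key.

    A later event with the same key replaces the whole earlier record;
    fields are never merged (full update only, no partial update).
    """
    latest: Dict[Hashable, Record] = {}
    for event in events:
        # Full replacement: the previous record for this key is discarded.
        latest[event[key_field]] = dict(event)
    return list(latest.values())


# Hypothetical stream: the third event updates the record with order_id 1.
events = [
    {"order_id": 1, "status": "created", "amount": 10},
    {"order_id": 2, "status": "created", "amount": 25},
    {"order_id": 1, "status": "paid"},
]

table = apply_full_upserts(events, "order_id")
for row in table:
    print(row)
```

Because the update is a full replacement, the record for `order_id` 1 no longer has an `amount` field after the third event. A producer that wants to keep a field must resend it with every update.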