Description
We have a partial-upsert table where the number of updates per key is quite high over a short period (thousands of updates for a key within an hour).
During such bursts, if we query for that particular primary key, we intermittently see no response from Pinot for it. I noticed this coincides with an update being received for that primary key (the query arrives within one second of a new record for that key).
After a few seconds, the record comes up again in the query response and everything works fine until the next overlap of query time and ingestion time.
I suspect this happens because, when the previous record lives in a different segment, we update the docId by removing it first and then adding it again:
Lines 303 to 308 in 168408a
```java
if (segment == currentSegment) {
  replaceDocId(validDocIds, queryableDocIds, currentDocId, newDocId, recordInfo);
} else {
  removeDocId(currentSegment, currentDocId);
  addDocId(validDocIds, queryableDocIds, newDocId, recordInfo);
}
```
This looks like a race condition where a query snapshot is taken between these two actions.
Is this expected behaviour? Is there a way to guarantee that at least the older record (if not the newer one) stays visible during this window?
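To illustrate the suspected race, here is a minimal single-threaded sketch. It uses `java.util.BitSet` as a stand-in for `ThreadSafeMutableRoaringBitmap`, and the docIds and method names are illustrative, not Pinot's actual code:

```java
import java.util.BitSet;

// Sketch of the suspected race window in the cross-segment upsert path.
// A query snapshot taken between removeDocId and addDocId sees neither
// the old nor the new doc for the primary key.
class UpsertRaceSketch {

  // Returns true if the key is invisible in a snapshot taken mid-update.
  static boolean keyInvisibleDuringUpdate() {
    BitSet validDocIds = new BitSet();
    int currentDocId = 5; // doc for the previous record (older segment)
    int newDocId = 42;    // doc for the incoming update
    validDocIds.set(currentDocId);

    // Step 1 of the else-branch: remove the old doc.
    validDocIds.clear(currentDocId);

    // A query arriving here snapshots the bitmap before step 2 runs.
    BitSet snapshot = (BitSet) validDocIds.clone();

    // Step 2: add the new doc. The snapshot never sees it.
    validDocIds.set(newDocId);

    // Neither doc is present in the snapshot, so the key "disappears".
    return !snapshot.get(currentDocId) && !snapshot.get(newDocId);
  }

  public static void main(String[] args) {
    System.out.println("key invisible in snapshot: " + keyInvisibleDuringUpdate());
  }
}
```

Under this model, any snapshot taken in that window returns an empty result for the key, which would match the intermittent missing responses described above.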
One possible solution could be to take a lock on validDocIds while updating and acquire the same lock before taking the snapshot in FilterPlanNode:
pinot/pinot-core/src/main/java/org/apache/pinot/core/plan/FilterPlanNode.java
Lines 88 to 99 in 168408a
```java
MutableRoaringBitmap queryableDocIdSnapshot = null;
if (!_queryContext.isSkipUpsert()) {
  ThreadSafeMutableRoaringBitmap queryableDocIds = _indexSegment.getQueryableDocIds();
  if (queryableDocIds != null) {
    queryableDocIdSnapshot = queryableDocIds.getMutableRoaringBitmap();
  } else {
    ThreadSafeMutableRoaringBitmap validDocIds = _indexSegment.getValidDocIds();
    if (validDocIds != null) {
      queryableDocIdSnapshot = validDocIds.getMutableRoaringBitmap();
    }
  }
}
```
But this could become a major issue in high-QPS / high-ingestion scenarios, where one side starves the other.
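The lock idea could be sketched roughly as follows, again using `BitSet` as a stand-in for the bitmap type; the class and method names here are hypothetical, not a proposed patch. The writer holds the write lock across the remove+add pair so that a snapshot can never observe the intermediate state:

```java
import java.util.BitSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: guard the valid-doc bitmap with a read-write lock
// so remove+add becomes atomic with respect to query snapshots.
class LockedDocIds {
  private final BitSet validDocIds = new BitSet();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Ingestion side: add a doc for a brand-new key.
  void add(int docId) {
    lock.writeLock().lock();
    try {
      validDocIds.set(docId);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Ingestion side: the cross-segment update path, done under one write lock.
  void replaceAcrossSegments(int currentDocId, int newDocId) {
    lock.writeLock().lock();
    try {
      validDocIds.clear(currentDocId); // removeDocId on the old segment
      validDocIds.set(newDocId);       // addDocId on the new segment
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Query side (FilterPlanNode analogue): snapshot under the read lock,
  // so it sees either the old doc or the new one, never neither.
  BitSet snapshot() {
    lock.readLock().lock();
    try {
      return (BitSet) validDocIds.clone();
    } finally {
      lock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    LockedDocIds docs = new LockedDocIds();
    docs.add(5);
    docs.replaceAcrossSegments(5, 42);
    System.out.println("snapshot has new doc: " + docs.snapshot().get(42));
  }
}
```

The starvation concern above applies here: under heavy ingestion the write lock is acquired for every update, and readers queue behind it (and vice versa).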
One more approach could be to take a clone of validDocIds before updating, apply the add/remove on the clone, and then swap it back into the original validDocIds. This can work for updates within the current consuming segment, but might not work when the previous record is in an old immutable segment, since the update then spans two segments' bitmaps that would need to change atomically together. It also has memory implications, as we would have to maintain a clone of validDocIds for every message. What if we did this in batches? That would reduce the memory footprint, but the implementation could become quite intrusive.
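For the single-segment case, the clone-and-swap idea could look like the following copy-on-write sketch. This is a hypothetical illustration (again with `BitSet` standing in for the bitmap type), and it assumes a single ingestion writer per partition; concurrent writers would lose updates without extra coordination:

```java
import java.util.BitSet;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical copy-on-write sketch: mutate a clone, then publish it
// atomically so readers always see a consistent bitmap.
class CopyOnWriteDocIds {
  private final AtomicReference<BitSet> validDocIds =
      new AtomicReference<>(new BitSet());

  // Single-writer assumption: clone, mutate, publish.
  void add(int docId) {
    BitSet copy = (BitSet) validDocIds.get().clone();
    copy.set(docId);
    validDocIds.set(copy);
  }

  void replaceInSegment(int currentDocId, int newDocId) {
    BitSet copy = (BitSet) validDocIds.get().clone();
    copy.clear(currentDocId);
    copy.set(newDocId);
    validDocIds.set(copy); // readers see old or new state, never in-between
  }

  // Query side: the returned bitmap is never mutated after publication,
  // so no lock is needed to read it.
  BitSet snapshot() {
    return validDocIds.get();
  }

  public static void main(String[] args) {
    CopyOnWriteDocIds docs = new CopyOnWriteDocIds();
    docs.add(5);
    docs.replaceInSegment(5, 42);
    System.out.println("snapshot has new doc: " + docs.snapshot().get(42));
  }
}
```

This avoids reader blocking entirely, at the cost of a full clone per update, which is exactly the memory concern raised above; batching updates before each publish would amortize that cost.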