-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Description
Issue Description
(Assume no sorted column is set)
Say we have two records for a primary-key in a consuming segment: R0, R1 (in Kafka order). Say both of these had the same comparison column value.
After ingesting these events, the metadata manager map will point to R1 as being the latest record (which is correct).
Then, during segment commit when the ImmutableSegment replaces the MutableSegment, we will end up pointing the upsert metadata manager map to the older record R0 for a brief moment, until the iterator reaches R1.
But when the ConcurrentMapPartitionUpsertMetadataManager#addOrReplaceSegment method is being called, the consuming segment may have already started consumption. If an event for the same primary key comes when the map was briefly pointing to R0, then the merged record will have wrong data.
Since this is a race-condition, this can manifest differently across the replicas and lead to even more issues. (use issue #12399 to discuss that).
How This was Discovered
I was trying to debug some other Partial Upsert issue, and had created a table in one of our clusters. I force committed a few times in an hour to get a bunch of segments, and was surprised to see that the replicas of the segments had diverged. This particular table keeps the comparison column values as 0.
On taking a deeper look I found these records.
- Correct data: https://gist.github.com/ankitsultana/280f6fbcb704f8305359e002055a83b8
- Incorrect data: https://gist.github.com/ankitsultana/4459d1dcd1ecc43bad7b4636b814a306
Possible Fix
One possible fix is that for Partial Upsert tables, we compare the docIds when the comparison column values match. However, this won't work when users have set a sorted column, so we need to first discuss about supporting sorted column with partial-upserts: #12397.
Another solution is to lock the map in addOrReplaceSegment (it is contending with realtime ingestion anyways).
This diff shows how the first proposal would look like: #12395