Contributor

@tibrewalpratik17 tibrewalpratik17 commented May 31, 2024

Labels: optimization, enhancement, upsert

Change 1

Inspired by @klsince's work in #12976.
This patch changes the doTakeSnapshot flow so that it no longer snapshots every segment in a given partition, only the segments that have been updated since the last snapshot was taken. This helps most when the number of segments per partition is high. The doTakeSnapshot workflow runs before a new consuming segment starts consumption, so any time it takes translates directly into ingestion lag.
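The idea in Change 1 can be sketched roughly as below. This is a minimal illustration, not the code in this PR: the class `IncrementalSnapshotSketch`, its fields, and its methods are all hypothetical names, and segments are modelled as plain strings rather than Pinot segment objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: persist snapshots only for segments changed since the last snapshot. */
final class IncrementalSnapshotSketch {
  // All segments tracked for one partition (hypothetical; stands in for the tracked segment set).
  private final List<String> trackedSegments = new ArrayList<>();
  // Segments whose valid doc ids changed since the last snapshot (hypothetical bookkeeping).
  private final Set<String> updatedSegments = ConcurrentHashMap.newKeySet();

  /** A newly added segment has never been snapshotted, so it starts out as updated. */
  void addSegment(String segmentName) {
    trackedSegments.add(segmentName);
    updatedSegments.add(segmentName);
  }

  /** Called whenever an upsert invalidates or validates a doc in the given segment. */
  void markUpdated(String segmentName) {
    updatedSegments.add(segmentName);
  }

  /** Returns the segments that would be persisted; untouched segments are skipped. */
  List<String> takeSnapshot() {
    List<String> persisted = new ArrayList<>();
    for (String segmentName : trackedSegments) {
      // remove() returns true only if the segment was marked as updated.
      if (updatedSegments.remove(segmentName)) {
        persisted.add(segmentName);
      }
    }
    return persisted;
  }
}
```

With many segments per partition and only a few receiving updates between commits, the loop persists a handful of snapshots instead of one per segment, which is where the reduction in pre-consumption lag would come from.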

Change 2

This patch also reorders the takeSnapshot and removeDeletedPrimaryKeys flows, running the latter before the former when deletedKeysTTL is set. This way, all keys and validDocIDs removed in removeDeletedPrimaryKeys are reflected in the snapshot immediately rather than one commit cycle later.
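To see why the order matters, here is a small model of the two orderings. It is a sketch under simplifying assumptions, not Pinot's implementation: `SnapshotOrderingSketch` and its members are hypothetical, valid docs are reduced to a set of primary keys, and a deleted key is assumed to stay valid until its TTL expires.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of one partition, used to compare the two flow orderings. */
final class SnapshotOrderingSketch {
  private final Set<String> validKeys = new HashSet<>();
  // Primary key -> time the delete marker was ingested (hypothetical bookkeeping).
  private final Map<String, Long> deleteTimes = new HashMap<>();
  private Set<String> lastSnapshot = new HashSet<>();

  void upsert(String key) {
    validKeys.add(key);
    deleteTimes.remove(key);
  }

  /** The delete marker itself stays valid until the TTL expires. */
  void delete(String key, long deletedAtMs) {
    validKeys.add(key);
    deleteTimes.put(key, deletedAtMs);
  }

  /** Drops keys whose delete marker is older than the TTL. */
  void removeDeletedPrimaryKeys(long nowMs, long ttlMs) {
    Iterator<Map.Entry<String, Long>> it = deleteTimes.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> entry = it.next();
      if (nowMs - entry.getValue() > ttlMs) {
        validKeys.remove(entry.getKey());
        it.remove();
      }
    }
  }

  void takeSnapshot() {
    lastSnapshot = new HashSet<>(validKeys);
  }

  /** Runs one commit cycle in either order and returns what the snapshot contains. */
  Set<String> commit(long nowMs, long ttlMs, boolean removeBeforeSnapshot) {
    if (removeBeforeSnapshot) {
      removeDeletedPrimaryKeys(nowMs, ttlMs);
      takeSnapshot();
    } else {
      takeSnapshot();
      removeDeletedPrimaryKeys(nowMs, ttlMs);
    }
    return lastSnapshot;
  }
}
```

In this model, snapshotting first leaves an expired key in the persisted snapshot until the next commit, while removing first makes the snapshot match the in-memory state in the same cycle.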

We were seeing scenarios where the snapshot flow took up to 30s for some tables.
Screenshot 2024-06-06 at 6 21 45 PM

Change 3

We now enable snapshotting during server restart for partial-upsert tables, before the first consuming segment starts. This was previously skipped on the assumption that not all segments are loaded at that point, but for partial-upsert tables we don't start consumption until all data is loaded. This saves one segment commit cycle of snapshotting when snapshots are newly enabled for a table or after a server restart.
The screenshot below shows a dip after a server restart; it takes one commit cycle for snapshots to recover.
Screenshot 2024-06-07 at 7 48 58 PM
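
The decision described in Change 3 could be expressed as a small predicate like the one below. This is only an illustration of the reasoning; `RestartSnapshotSketch`, `UpsertMode`, and `shouldSnapshot` are hypothetical names and do not mirror the actual method signatures touched by this PR.

```java
/** Hypothetical sketch of when a snapshot may be taken before a consuming segment starts. */
final class RestartSnapshotSketch {
  enum UpsertMode { FULL, PARTIAL }

  /**
   * @param snapshotEnabled whether snapshots are enabled for the table
   * @param firstConsumingSegmentAfterRestart true for the first consuming segment after a restart
   * @param mode the table's upsert mode
   */
  static boolean shouldSnapshot(boolean snapshotEnabled, boolean firstConsumingSegmentAfterRestart,
      UpsertMode mode) {
    if (!snapshotEnabled) {
      return false;
    }
    if (!firstConsumingSegmentAfterRestart) {
      // Later consuming segments: all segments are loaded by now, so snapshotting is safe.
      return true;
    }
    // Right after a restart, only partial-upsert tables are assumed to have all data loaded,
    // because they do not start consumption until loading has finished.
    return mode == UpsertMode.PARTIAL;
  }
}
```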

codecov-commenter commented May 31, 2024

Codecov Report

Attention: Patch coverage is 28.57143% with 20 lines in your changes missing coverage. Please review.

Project coverage is 62.11%. Comparing base (59551e4) to head (c0c2785).
Report is 608 commits behind head on master.

| Files | Patch % | Lines |
|---|---|---|
| ...cal/upsert/BasePartitionUpsertMetadataManager.java | 36.36% | 14 Missing ⚠️ |
| ...a/manager/realtime/RealtimeSegmentDataManager.java | 0.00% | 5 Missing ⚠️ |
| ...org/apache/pinot/spi/config/table/TableConfig.java | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13285      +/-   ##
============================================
+ Coverage     61.75%   62.11%   +0.35%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2548     +112     
  Lines        133233   139979    +6746     
  Branches      20636    21735    +1099     
============================================
+ Hits          82274    86941    +4667     
- Misses        44911    46447    +1536     
- Partials       6048     6591     +543     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | <0.01% <0.00%> (-0.01%) ⬇️ |
| integration | <0.01% <0.00%> (-0.01%) ⬇️ |
| integration1 | <0.01% <0.00%> (-0.01%) ⬇️ |
| integration2 | 0.00% <0.00%> (ø) |
| java-11 | 35.28% <0.00%> (-26.43%) ⬇️ |
| java-21 | 62.00% <28.57%> (+0.37%) ⬆️ |
| skip-bytebuffers-false | 62.09% <28.57%> (+0.34%) ⬆️ |
| skip-bytebuffers-true | 61.97% <28.57%> (+34.24%) ⬆️ |
| temurin | 62.11% <28.57%> (+0.35%) ⬆️ |
| unittests | 62.10% <28.57%> (+0.35%) ⬆️ |
| unittests1 | 46.67% <0.00%> (-0.22%) ⬇️ |
| unittests2 | 27.72% <28.57%> (-0.01%) ⬇️ |

@tibrewalpratik17 tibrewalpratik17 force-pushed the optimize_snapshotting branch from e30186b to e378495 Compare June 6, 2024 12:48
@tibrewalpratik17 tibrewalpratik17 marked this pull request as ready for review June 6, 2024 18:03
@tibrewalpratik17
Contributor Author

cc @klsince @Jackie-Jiang

@tibrewalpratik17 tibrewalpratik17 force-pushed the optimize_snapshotting branch from d5ca6cb to 78f2a2a Compare June 6, 2024 23:27
@tibrewalpratik17 tibrewalpratik17 requested a review from klsince June 8, 2024 18:12
@klsince klsince merged commit d91ad73 into apache:master Jun 11, 2024
@tibrewalpratik17 tibrewalpratik17 deleted the optimize_snapshotting branch June 14, 2024 21:23