Skip to content

Conversation

@itschrispeck
Copy link
Contributor

@itschrispeck itschrispeck commented Mar 28, 2024

Motivation/implementation doc: https://docs.google.com/document/d/1Leo4mQvR-6Gscseq50oqSnddX8Y3up6fnysKaly-eAU/edit?usp=sharing

Internally we've seen roughly 40-60% improvement in overall segment build time. The lower peaks are from a table/tenant with this change, the higher ingestion delay peaks are from an identical table in a tenant without this change:

image

Testing: deployed internally, local testing, validated basic pause/restart/reload operations on a table to ensure no regression in TextIndexHandler index build.

tags: ingestion performance

@codecov-commenter
Copy link

codecov-commenter commented Mar 28, 2024

Codecov Report

Attention: Patch coverage is 86.20690% with 16 lines in your changes are missing coverage. Please review.

Project coverage is 62.24%. Comparing base (59551e4) to head (2bddff1).
Report is 298 commits behind head on master.

Files Patch % Lines
...ment/creator/impl/text/LuceneTextIndexCreator.java 86.95% 7 Missing and 2 partials ⚠️
...me/impl/invertedindex/RealtimeLuceneTextIndex.java 42.85% 4 Missing ⚠️
...ot/segment/spi/creator/SegmentGeneratorConfig.java 50.00% 2 Missing ⚠️
.../pinot/segment/spi/index/mutable/MutableIndex.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #12744      +/-   ##
============================================
+ Coverage     61.75%   62.24%   +0.49%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2503      +67     
  Lines        133233   136497    +3264     
  Branches      20636    21117     +481     
============================================
+ Hits          82274    84964    +2690     
- Misses        44911    45248     +337     
- Partials       6048     6285     +237     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.18% <86.20%> (+0.47%) ⬆️
java-21 62.10% <86.20%> (+0.48%) ⬆️
skip-bytebuffers-false 62.22% <86.20%> (+0.47%) ⬆️
skip-bytebuffers-true 62.08% <86.20%> (+34.35%) ⬆️
temurin 62.24% <86.20%> (+0.49%) ⬆️
unittests 62.24% <86.20%> (+0.49%) ⬆️
unittests1 46.70% <10.34%> (-0.19%) ⬇️
unittests2 28.02% <75.86%> (+0.29%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@itschrispeck itschrispeck marked this pull request as draft March 28, 2024 23:24
@itschrispeck itschrispeck marked this pull request as ready for review April 1, 2024 19:54
* @param commit true if the index should be committed (at the end after all documents have
* been added), false if index should not be committed
* been added), false if index should not be
* @param sortedDocIds sortedDocIds from segment conversion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as my earlier comment on the naming. Also since this is a public method, we need more explanation and some example.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just log the error e?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't quite follow this, can you elaborate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant LOGGER.err("Failed.. .", _column, e);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments in line 65-66 seem to be self-conflicting: commit to disk is just one possible action? the implementation can actually choose other action for preparation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commit to disk should be the main mechanism for index reuse for any indexes using this path - disk is an intermediary. Other actions refers to actions a MutableIndex implementation might want to perform before writing some artifacts to disk

@itschrispeck itschrispeck requested a review from chenboat April 12, 2024 02:12
@klsince
Copy link
Contributor

klsince commented Jul 9, 2024

hi @itschrispeck is there a feature flag to disable this optimization?

We found the heap usage got higher than before after upgrading to recent code. From server logs, we found "Reusing the realtime lucene index for segment" which led me here. And from heap dump, we found lucene.index.SegmentCommitInfo was a top consumer of heap space, and they were mainly referenced by RealtimeLuceneTextIndex.

num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:       5481198      686979272  [B ([email protected])
   2:       9944219      477322512  java.util.HashMap ([email protected])
   3:       4496967      467684568  org.apache.lucene.index.SegmentCommitInfo
   4:       8873625      283956000  java.util.HashMap$Node ([email protected])
...

We didn't config TextIndex for those tables across the upgrades, so I'd assume this optimization was enabled by default. So I'd like to check if there is a feature flag to disable this, so we can validate if this had caused the higher heap usage.

If there is no such feature flag, I can try to add one for your review. I'm thinking to add one around here

// Optimization for realtime segment conversion
    if (dataSource instanceof RealtimeSegmentSegmentCreationDataSource) { <--- looks like hard coded to enable this optimization when committing mutable segment
      _config.setRealtimeConversion(true);
      _config.setConsumerDir(((RealtimeSegmentSegmentCreationDataSource) dataSource).getConsumerDir());
    }

cc @Jackie-Jiang

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants