Skip to content

Conversation

@Jackie-Jiang
Copy link
Contributor

Break FlushThresholdUpdater into 2 steps:

  1. Update the stats with committing segment when segment is fully committed
  2. Get threshold for the new consuming segment

This way pauseless consumption can update the stats properly.

@Jackie-Jiang
Copy link
Contributor Author

cc @KKcorps @9aman

@Jackie-Jiang Jackie-Jiang force-pushed the pauseless_size_based branch from 441f02e to 6d71ea7 Compare March 22, 2025 01:45
@codecov-commenter
Copy link

codecov-commenter commented Mar 22, 2025

Codecov Report

Attention: Patch coverage is 60.22099% with 72 lines in your changes missing coverage. Please review.

Project coverage is 63.58%. Comparing base (59551e4) to head (17469c0).
Report is 1908 commits behind head on master.

Files with missing lines Patch % Lines
.../core/realtime/PinotLLCRealtimeSegmentManager.java 31.14% 39 Missing and 3 partials ⚠️
...egment/SizeBasedSegmentFlushThresholdComputer.java 83.33% 7 Missing and 7 partials ⚠️
...g/apache/pinot/spi/utils/IngestionConfigUtils.java 40.00% 6 Missing and 3 partials ⚠️
...x/core/realtime/PauselessSegmentCompletionFSM.java 0.00% 2 Missing ⚠️
.../helix/core/realtime/SegmentCompletionManager.java 0.00% 2 Missing ⚠️
...r/validation/RealtimeSegmentValidationManager.java 0.00% 1 Missing ⚠️
...ata/manager/realtime/RealtimeTableDataManager.java 0.00% 1 Missing ⚠️
...ealtime/writer/StatelessRealtimeSegmentWriter.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #15347      +/-   ##
============================================
+ Coverage     61.75%   63.58%   +1.83%     
- Complexity      207     1374    +1167     
============================================
  Files          2436     2785     +349     
  Lines        133233   157211   +23978     
  Branches      20636    24137    +3501     
============================================
+ Hits          82274    99962   +17688     
- Misses        44911    49703    +4792     
- Partials       6048     7546    +1498     
Flag Coverage Δ
custom-integration1 100.00% <ø> (+99.99%) ⬆️
integration 100.00% <ø> (+99.99%) ⬆️
integration1 100.00% <ø> (+99.99%) ⬆️
integration2 0.00% <ø> (ø)
java-11 63.55% <60.22%> (+1.84%) ⬆️
java-21 63.50% <60.22%> (+1.88%) ⬆️
skip-bytebuffers-false 63.58% <60.22%> (+1.83%) ⬆️
skip-bytebuffers-true 63.48% <60.22%> (+35.75%) ⬆️
temurin 63.58% <60.22%> (+1.83%) ⬆️
unittests 63.58% <60.22%> (+1.83%) ⬆️
unittests1 56.09% <33.33%> (+9.20%) ⬆️
unittests2 34.21% <56.90%> (+6.47%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Jackie-Jiang Jackie-Jiang force-pushed the pauseless_size_based branch from 6d71ea7 to b3fb5e1 Compare March 22, 2025 08:43
}
if (tableConfig.getIndexingConfig() != null && tableConfig.getIndexingConfig().getStreamConfigs() != null) {
return Arrays.asList(tableConfig.getIndexingConfig().getStreamConfigs());
return List.of(tableConfig.getIndexingConfig().getStreamConfigs());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR: getStreamConfigMaps() seems to be performing validations on the streaamConfigMaps in case of multi-stream ingestion.

This should not be done in a getter. The validate function in the TableConfigUtils relies on this to perform these checks when the tableConfig is added/ updated. Ideally, there should be a separate function for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. We should address this in a separate PR

Preconditions.checkState(committingSegmentZKMetadata.getStatus() != Status.DONE,
"Segment status for segment: %s should not be DONE", segmentName);
Status status = committingSegmentZKMetadata.getStatus();
Preconditions.checkState(status == Status.IN_PROGRESS || status == Status.COMMITTING,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a check for pauseless enabled as well in addition to IN_PROGRESS. For pauseless tables, we should not jump directly to DONE from IN_PROGRESS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have the pauseless info at this level. The check is already performed in commitSegmentMetadataToDone().

List<PartitionGroupConsumptionStatus> currentPartitionGroupConsumptionStatusList =
offsetsHaveToChange
? Collections.emptyList() // offsets from metadata are not valid anymore; fetch for all partitions
offsetsHaveToChange ? Collections.emptyList()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary formatting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted (auto-reformat)

// offsets from metadata are not valid anymore; fetch for all partitions
: getPartitionGroupConsumptionStatusList(idealState, streamConfigs);
// FIXME: Right now, we assume topics are sharing same offset criteria
OffsetCriteria originalOffsetCriteria = streamConfigs.get(0).getOffsetCriteria();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can use getFirstStreamConfigMap method here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need all configs in other method, so it is better to just read all configs once

Copy link
Contributor

@KKcorps KKcorps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM logic wise! There are a lot of unnecessary formatting changes here though.

@Jackie-Jiang Jackie-Jiang force-pushed the pauseless_size_based branch from f91c40b to 17469c0 Compare March 25, 2025 17:12
@Jackie-Jiang Jackie-Jiang merged commit 3708622 into apache:master Mar 25, 2025
22 checks passed
@Jackie-Jiang Jackie-Jiang deleted the pauseless_size_based branch March 25, 2025 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants