Skip to content

Conversation

@saurabhd336
Copy link
Contributor

@saurabhd336 saurabhd336 commented Sep 9, 2022

#9332

New added controller rest API:

  • POST tables/{tableName}/timeBoundary: Set hybrid table query time boundary based on offline segments' metadata (max end time)
  • DELETE tables/{tableName}/timeBoundary: Remove the time boundary

New added server rest API:

  • GET tables/{tableNameWithType}/allSegmentsLoaded: Validates if the ideal state matches with the segment state on the server

@saurabhd336 saurabhd336 changed the title Allow setting custom time boundary for hybrid table queries Allow setting custom time boundary for hybrid table queries (WIP) Sep 9, 2022
@saurabhd336 saurabhd336 force-pushed the customTimeBoundary branch 3 times, most recently from 1f7f579 to d16f9cc Compare September 12, 2022 06:32
@saurabhd336 saurabhd336 changed the title Allow setting custom time boundary for hybrid table queries (WIP) Allow setting custom time boundary for hybrid table queries Sep 19, 2022
@codecov-commenter
Copy link

codecov-commenter commented Sep 19, 2022

Codecov Report

Merging #9356 (9fc8787) into master (421154f) will decrease coverage by 0.04%.
The diff coverage is 30.89%.

@@             Coverage Diff              @@
##             master    #9356      +/-   ##
============================================
- Coverage     69.96%   69.92%   -0.05%     
- Complexity     5197     5202       +5     
============================================
  Files          1924     1927       +3     
  Lines        102414   102693     +279     
  Branches      15539    15586      +47     
============================================
+ Hits          71655    71806     +151     
- Misses        25695    25830     +135     
+ Partials       5064     5057       -7     
Flag Coverage Δ
integration1 26.03% <23.57%> (+0.05%) ⬆️
integration2 24.64% <23.57%> (+0.04%) ⬆️
unittests1 67.23% <25.00%> (-0.02%) ⬇️
unittests2 15.57% <32.29%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../restlet/resources/TableSegmentValidationInfo.java 0.00% <0.00%> (ø)
...che/pinot/server/api/resources/TablesResource.java 38.88% <0.00%> (-5.10%) ⬇️
...va/org/apache/pinot/spi/utils/CommonConstants.java 24.32% <ø> (-0.68%) ⬇️
...oller/api/resources/PinotTableRestletResource.java 55.17% <5.76%> (-8.25%) ⬇️
...rg/apache/pinot/server/starter/ServerInstance.java 85.81% <50.00%> (-0.52%) ⬇️
...oker/routing/timeboundary/TimeBoundaryManager.java 91.83% <88.57%> (+0.47%) ⬆️
...che/pinot/broker/routing/BrokerRoutingManager.java 86.07% <100.00%> (ø)
...a/org/apache/pinot/common/metrics/BrokerGauge.java 94.11% <100.00%> (+0.36%) ⬆️
.../pinot/common/function/scalar/ObjectFunctions.java 66.66% <0.00%> (-20.84%) ⬇️
...ion/function/DistinctCountAggregationFunction.java 50.22% <0.00%> (-8.53%) ⬇️
... and 65 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@saurabhd336 saurabhd336 force-pushed the customTimeBoundary branch 2 times, most recently from ab772c7 to 4fb86e6 Compare September 19, 2022 10:36
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to change this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) We shouldn't need to keep them separate. We can directly update the _timeBoundaryInfo when the time is set in IS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured some tricky edge cases with that which could lead to wrong time boundaries being used especially during cases like concurrent segment refreshes along with enforced time boundary being set / unset. This felt like the safest..
We can discuss this offline

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. When the time boundary is explicitly set, we want to skip the _endTimeMsMap updates. Same for refreshSegment()

Copy link
Contributor Author

@saurabhd336 saurabhd336 Sep 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed to correctly update the metric that tracks the difference b/w current time boundary and max(end time across all segments)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) We can pass HelixManager through the ServerInstance. We already pass it in the ServerInstance constructor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest simplifying it. Same for the DELETE API

Suggested change
@Path("table/{tableName}/enforceQueryTimeBoundary")
@Path("table/{tableName}/timeBoundary")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input has to be raw table name, or the table validation will fail. Consider revising the parameter description to be more explicit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to check the state of the server in IS. For ONLINE, check the CRC; For CONSUMING, check if the segment exist; Skip other states.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest more explicit API, such as /allSegmentsLoaded

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional) I feel TIME_BOUNDARY is more concise. To be more specific, HYBRID_TABLE_TIME_BOUNDARY might also be good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@saurabhd336 saurabhd336 force-pushed the customTimeBoundary branch 5 times, most recently from b2711c5 to 0480ea1 Compare September 30, 2022 06:09
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly minor comments

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static final String QUERY_TIME_BOUNDARY = "HYBRID_TABLE_TIME_BOUNDARY";
public static final String HYBRID_TABLE_TIME_BOUNDARY = "HYBRID_TABLE_TIME_BOUNDARY";

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(format) shouldn't remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Comment on lines 533 to 534
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor)

Suggested change
@ApiOperation(value = "Validates if the ideal state matches with the segmentstate on this server", notes =
"Validates if the ideal state matches with the segmentstate on this server")
@ApiOperation(value = "Validates if the ideal state matches with the segment state on this server", notes =
"Validates if the ideal state matches with the segment state on this server")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider throwing proper exception when zkMetadata is null, or it will get NPE here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@Path("table/{tableName}/timeBoundary")
@Path("tables/{tableName}/timeBoundary")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) We don't need to synchronize on it because it can only be called in the synchronized method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always want to update the metrics. Currently when user refresh a segment, the metric won't be updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we also want to check if the explicitly set flag is the same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) Consider following the same convention of using -1 to represent the invalid value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is quite complicated. It will be good if we can simplify the logic, or put some comments explaining the logic under each if branch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't seem to find a way to simplify this without using 2 different TimeBoundaryInfo instances for explicitly set and derived time boundaries. Have added comments for now.

private final long _timeOffsetMs;
private final Map<String, Long> _endTimeMsMap = new HashMap<>();

private long _explicitlySetTimeBoundaryMs = INVALID_TIME_MS;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be marked @volatile

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to be because it is always under the synchronized block. Maybe worth adding a comment

@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes labels Oct 4, 2022
@Jackie-Jiang Jackie-Jiang merged commit 79c27e0 into apache:master Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants