-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Allow setting custom time boundary for hybrid table queries #9356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1f7f579 to
d16f9cc
Compare
d16f9cc to
6a45644
Compare
Codecov Report
@@ Coverage Diff @@
## master #9356 +/- ##
============================================
- Coverage 69.96% 69.92% -0.05%
- Complexity 5197 5202 +5
============================================
Files 1924 1927 +3
Lines 102414 102693 +279
Branches 15539 15586 +47
============================================
+ Hits 71655 71806 +151
- Misses 25695 25830 +135
+ Partials 5064 5057 -7
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
ab772c7 to
4fb86e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) We shouldn't need to keep them separate. We can directly update the _timeBoundaryInfo when the time is set in IS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured some tricky edge cases with that which could lead to wrong time boundaries being used especially during cases like concurrent segment refreshes along with enforced time boundary being set / unset. This felt like the safest..
We can discuss this offline
...t-broker/src/main/java/org/apache/pinot/broker/routing/timeboundary/TimeBoundaryManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. When the time boundary is explicitly set, we want to skip the _endTimeMsMap updates. Same for refreshSegment()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is needed to correctly update the metric that tracks the difference b/w current time boundary and max(end time across all segments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) We can pass HelixManager through the ServerInstance. We already pass it in the ServerInstance constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest simplifying it. Same for the DELETE API
| @Path("table/{tableName}/enforceQueryTimeBoundary") | |
| @Path("table/{tableName}/timeBoundary") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The input has to be raw table name, or the table validation will fail. Consider revising the parameter description to be more explicit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to check the state of the server in IS. For ONLINE, check the CRC; For CONSUMING, check if the segment exist; Skip other states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest more explicit API, such as /allSegmentsLoaded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(optional) I feel TIME_BOUNDARY is more concise. To be more specific, HYBRID_TABLE_TIME_BOUNDARY might also be good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
b2711c5 to
0480ea1
Compare
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly minor comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public static final String QUERY_TIME_BOUNDARY = "HYBRID_TABLE_TIME_BOUNDARY"; | |
| public static final String HYBRID_TABLE_TIME_BOUNDARY = "HYBRID_TABLE_TIME_BOUNDARY"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(format) shouldn't remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor)
| @ApiOperation(value = "Validates if the ideal state matches with the segmentstate on this server", notes = | |
| "Validates if the ideal state matches with the segmentstate on this server") | |
| @ApiOperation(value = "Validates if the ideal state matches with the segment state on this server", notes = | |
| "Validates if the ideal state matches with the segment state on this server") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider throwing proper exception when zkMetadata is null, or it will get NPE here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @Path("table/{tableName}/timeBoundary") | |
| @Path("tables/{tableName}/timeBoundary") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) We don't need to synchronize on it because it can only be called in the synchronized method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We always want to update the metrics. Currently when user refresh a segment, the metric won't be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we also want to check if the explicitly set flag is the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) Consider following the same convention of using -1 to represent the invalid value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is quite complicated. It will be good if we can simplify the logic, or put some comments explaining the logic under each if branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't seem to find a way to simplify this without using 2 different TimeBoundaryInfo instances for explicitly set and derived time boundaries. Have added comments for now.
a93441e to
0d92260
Compare
414eaf6 to
9fc8787
Compare
| private final long _timeOffsetMs; | ||
| private final Map<String, Long> _endTimeMsMap = new HashMap<>(); | ||
|
|
||
| private long _explicitlySetTimeBoundaryMs = INVALID_TIME_MS; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be marked @volatile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't need to be because it is always under the synchronized block. Maybe worth adding a comment
#9332
New added controller rest API:
POST tables/{tableName}/timeBoundary: Set hybrid table query time boundary based on offline segments' metadata (max end time)DELETE tables/{tableName}/timeBoundary: Remove the time boundaryNew added server rest API:
GET tables/{tableNameWithType}/allSegmentsLoaded: Validates if the ideal state matches with the segment state on the server