Skip to content

Conversation

@KKcorps
Copy link
Contributor

@KKcorps KKcorps commented Sep 16, 2022

Currently, if an user specifies an incorrect datetime format accidentally, we don't allow any way to fix the segment start and end time in zookeeper metadata.
This creates problem when the end times lie before the retention interval in which case those segments get removed. It also creates problem in case of time pruning during queries.

The PR aims to introduce a new API to address this issue. It is still Work in progress.

@KKcorps KKcorps force-pushed the fix_segment_datetime_api branch from 8957120 to 8b1088e Compare September 16, 2022 16:45
@KKcorps KKcorps force-pushed the fix_segment_datetime_api branch from c6a638f to 3db9e6c Compare September 20, 2022 10:40
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throw exception when table config is not available

@KKcorps KKcorps force-pushed the fix_segment_datetime_api branch from 3db9e6c to 466395b Compare September 22, 2022 11:19
@KKcorps KKcorps changed the title [WIP] Add a new API to fix segment date time in metadata Add a new API to fix segment date time in metadata Sep 23, 2022
@codecov-commenter
Copy link

codecov-commenter commented Sep 23, 2022

Codecov Report

Merging #9413 (90adbff) into master (69722b0) will decrease coverage by 34.58%.
The diff coverage is 48.33%.

@@              Coverage Diff              @@
##             master    #9413       +/-   ##
=============================================
- Coverage     69.83%   35.25%   -34.59%     
+ Complexity     4718      189     -4529     
=============================================
  Files          1902     1915       +13     
  Lines        101521   102049      +528     
  Branches      15411    15479       +68     
=============================================
- Hits          70899    35973    -34926     
- Misses        25623    63022    +37399     
+ Partials       4999     3054     -1945     
Flag Coverage Δ
integration1 26.13% <20.00%> (+0.13%) ⬆️
integration2 ?
unittests1 ?
unittests2 15.52% <38.33%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ler/api/resources/PinotSegmentRestletResource.java 20.00% <0.00%> (-18.71%) ⬇️
...va/org/apache/pinot/spi/utils/CommonConstants.java 0.00% <ø> (-25.00%) ⬇️
...not/common/metadata/segment/SegmentZKMetadata.java 81.04% <66.66%> (-5.35%) ⬇️
...ot/controller/helix/core/util/ZKMetadataUtils.java 87.87% <75.00%> (-10.04%) ⬇️
...ntroller/helix/core/PinotHelixResourceManager.java 70.05% <100.00%> (-1.55%) ⬇️
...in/java/org/apache/pinot/spi/utils/BytesUtils.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/spi/trace/BaseRecording.java 0.00% <0.00%> (-100.00%) ⬇️
...java/org/apache/pinot/spi/trace/NoOpRecording.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/config/table/FSTType.java 0.00% <0.00%> (-100.00%) ⬇️
...ava/org/apache/pinot/spi/config/user/RoleType.java 0.00% <0.00%> (-100.00%) ⬇️
... and 1151 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@KKcorps KKcorps force-pushed the fix_segment_datetime_api branch 2 times, most recently from ed749e4 to 9650f8e Compare September 25, 2022 03:57
@KKcorps KKcorps force-pushed the fix_segment_datetime_api branch from 9650f8e to 99ef712 Compare September 25, 2022 08:48
if (segmentMetadata.getTimeInterval() != null) {
segmentZKMetadata.setStartTime(segmentMetadata.getStartTime());
segmentZKMetadata.setEndTime(segmentMetadata.getEndTime());
segmentZKMetadata.setTimeUnit(segmentMetadata.getTimeUnit());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can change this to always put MILLISECONDS as the time unit (convert start/end time using the time unit), then we no longer need to worry about time unit when updating the time interval

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One reason why I'm suggesting this is that there is one corner case: if the original date time format is in SECONDS, then changed to MILLISECONDS, if we keep the time unit as the old one, we might lose precision

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Closing the PR #9472. I haven't removed the logic to convert timestamp using timeUnit since I am not sure about backward compatibility

@KKcorps KKcorps force-pushed the fix_segment_datetime_api branch from b4b9d7a to 440f6d3 Compare September 27, 2022 06:05
@KKcorps KKcorps force-pushed the fix_segment_datetime_api branch from 440f6d3 to f9e5bb1 Compare September 27, 2022 06:32
@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes labels Sep 28, 2022
@KKcorps KKcorps force-pushed the fix_segment_datetime_api branch from 9020520 to 965d857 Compare September 28, 2022 05:45
segmentZKMetadata.setStartTime(segmentMetadata.getStartTime());
segmentZKMetadata.setEndTime(segmentMetadata.getEndTime());
segmentZKMetadata.setTimeUnit(segmentMetadata.getTimeUnit());
long startTimeMs = TimeUnit.MILLISECONDS.convert(segmentMetadata.getStartTime(), segmentMetadata.getTimeUnit());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) start/end time can be easily read from the time interval without converting again

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KKcorps KKcorps force-pushed the fix_segment_datetime_api branch from 952d495 to cf42232 Compare September 29, 2022 00:27
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise

@KKcorps KKcorps force-pushed the fix_segment_datetime_api branch from 90adbff to 4d335f5 Compare September 29, 2022 02:51
@KKcorps KKcorps force-pushed the fix_segment_datetime_api branch from e56cf26 to 5427fa8 Compare September 29, 2022 09:52
@KKcorps KKcorps merged commit d2619c1 into apache:master Sep 29, 2022
61yao pushed a commit to 61yao/pinot that referenced this pull request Oct 3, 2022
* Add API to handle change in timestamp format

* Fix linting

* Add support for storing raw segment start and end time in metadata

* Do not store start end time in segment metadata seperately

* remove refresh flag

* Store new start/end time with proper time unit

* Move the API to segment resource

* Fix test failure

* Add test

* Cleanup: Remove duplicate methods

* Store start/end time in milliseconds in zookeeper

* Refactor: change method names and reduce scope of exceptions

* Remove redundant timeunit conversion

* Throw user errors and check for time column

* Fix Segment tests

Co-authored-by: Kartik Khare <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants