-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Cache Deleted Segment Names in Server to Avoid SegmentMissingError #9423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| */ | ||
| @Nullable | ||
| protected SegmentDataManager registerSegment(String segmentName, SegmentDataManager segmentDataManager) { | ||
| SegmentDataManager oldSegmentDataManager = _segmentDataManagerMap.put(segmentName, segmentDataManager); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may wanna make _segmentDataManagerMap private so concrete classes that extend BaseTableDataManager use the map only via the provided methods here.
Codecov Report
@@ Coverage Diff @@
## master #9423 +/- ##
============================================
+ Coverage 69.73% 69.78% +0.05%
- Complexity 5087 5094 +7
============================================
Files 1890 1890
Lines 100743 100780 +37
Branches 15349 15351 +2
============================================
+ Hits 70252 70333 +81
+ Misses 25506 25450 -56
- Partials 4985 4997 +12
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
jackjlli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it's a legitimate behavior that server finished segment deletion before controller updates the EV, as controller won't update it until it receives the message from servers saying the state transition is completed, not to mention brokers listen to the notification of EV changes to update routing table, which is the very last step.
General speaking I think the better way (or best practice) is to avoid querying the very last pieces of segments, by adding the time column in your filter clauses of the queries. If the Pinot table you are querying is a hybrid table, the oldest realtime segments won't be queried because of the correct time boundary maintained internally. Thus we won't see such exception even if we enable the exception returned feature that you implemented. Similarly, if we only query the realtime only or offline only table, we shouldn't query the oldest segments, as the behavior of them is harder to predict in a distributed world. Adding such cache will definitely work while it seems more like a hack to get around with it.
pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
Outdated
Show resolved
Hide resolved
| private final static String[] REQUIRED_KEYS = {INSTANCE_ID, INSTANCE_DATA_DIR, READ_MODE}; | ||
| private static final long DEFAULT_ERROR_CACHE_SIZE = 100L; | ||
| private static final int DEFAULT_DELETED_SEGMENTS_CACHE_SIZE = 10_000; | ||
| private static final int DEFAULT_DELETED_SEGMENTS_CACHE_TTL_MINUTES = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually the state transition doesn't take too much time (at most sub-seconds) on spreading the helix messages and notifications. If it really needs 10 minutes to converge, it probably means there is something wrong in your cluster and needs to be fixed.
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
Outdated
Show resolved
Hide resolved
This is discussed in #9371. It doesn't handle the scenario where a segment is manually deleted. Requiring users to add filters in the query isn't tenable, and the alternative is to automatically prune such segments in the Broker. But that's not doable right now since Broker doesn't watch TableConfig changes (retention may be changed by users). |
I don't think it's a good idea to manually delete segments when segments are being queried. The better way is to disable them before doing the deletion.
I'm not saying we should ask users to do that but your client system append the predicate in the filter clause.
I'm not sure I fully understand your statement here. Pruning segments has nothing to do with table config changes. And in fact table config is already cached in broker, and there is a watcher in TableCache to listen to the table config changes in ZK. |
For instance when a user changes the retention, it may take a while for the changes to be reflected in Broker segment pruners. @Jackie-Jiang: I think you mentioned that this scenario may cause an issue since Broker doesn't listen to table-config changes. Can you share those concerns here? |
pinot-core/src/main/java/org/apache/pinot/core/data/manager/BaseTableDataManager.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
Outdated
Show resolved
Hide resolved
pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
Outdated
Show resolved
Hide resolved
jackjlli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor but LGTM. Thanks for addressing it!
|
|
||
| long getStreamSegmentDownloadUntarRateLimit(); | ||
|
|
||
| int getDeletedSegmentsCacheSize(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: getRecentlyDeletedSegmentsCacheSize()
|
|
||
| int getDeletedSegmentsCacheSize(); | ||
|
|
||
| int getDeletedSegmentsCacheTtlMinutes(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
…pache#9423) * Cache Deleted Segment Names in Server to Avoid SegmentMissingError * self-review: remove unneeded call + change type for numMissingSegments * Address feedback * Add UT + Address Feedback
Added a UT. fixes #9371
cc: @chenboat @yupeng9 @Jackie-Jiang