Skip to content

Conversation

@saurabhd336
Copy link
Contributor

@saurabhd336 saurabhd336 commented Jun 3, 2022

This PR adds a controller API that can be used to get the current status of a 'reload segments' task. When a segment reload task is submitted, an identifier will be returned, and the task details will be stored into ZK. Users can then call the newly added API with the task ID, to get the status details of the reload task.
Another API has been added to get all historical reload task ids.

Design document: https://docs.google.com/document/d/1Eqn2FDDIhCr8G2JFlifs5FjT0LsVPfpPTpdJIJvorwI/edit?usp=sharing

@saurabhd336 saurabhd336 marked this pull request as draft June 3, 2022 12:00
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall approach looks good!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can consider adding a size limit to the cache, and once the entry count reaches the limit, we remove the earliest entries

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I replace this map with a LRU map instead then? I think that might achieve something close to this and we can specify max size when initialising the map

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this handle null status properly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It returns "null" string

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null response need to be properly handled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. The msgId should just be the ZNode id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the getMsgId does fetch the record's zk id but for some reason I was seeing this difference on server and controller side. I'll look into this more and update the PR accordingly if we are able to use msgId itself.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any update on this?

Copy link
Contributor Author

@saurabhd336 saurabhd336 Jun 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So helix, when sending a PARTICIPANT message, generates one message per participant to store into ZK. When doing that, it overrides the original message's id with a new random one
https://github.com/apache/helix/blob/master/helix-core/src/main/java/org/apache/helix/messaging/DefaultMessagingService.java#L199

It only retains the fields from the original message

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider storing the task id and reload info (table name / segment name) into a ZK node, and add another API to read the reload tasks in the cluster in case user loses the response of the reload call. We can also put other useful info there, e.g. number of messages sent, which can be included in the status check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that API just returns the list of active reload task ids and description like # of messages sent, table names etc?
Agree that might be useful. But should that list be cleaned up by some sort of periodic task? Or no need?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We may put some default retention to keep the reload info (e.g. 1 day). I don't think we need a periodic task to clean up the ZK node. Instead, for each reload call, when we add new entry, we can clean up the expired ones.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have added the API, anc the changes in realod API to store taskMetadata into ZK

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I could not find "retention" specification when creating ZNodes. Is there any example already in place where we might be doing that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've handled retention via explicitly cleaning up old tasks when new ones are created.

@saurabhd336
Copy link
Contributor Author

I've updated the PR to

  1. Add taskMeta to ZK.
  2. Add an API to return all tasks for a table.

Yet to add

  1. Retention on task ZNode
  2. Track status for single segment reload

I was hoping to get the approach validated first, for all segments API. Once we are aligned on this, I'll update the PR to include changes for single segment reload.

cc: @Jackie-Jiang @npawar

@saurabhd336 saurabhd336 changed the title (WIP) Add server API for task status Add controller API for reload segment task status Jun 15, 2022
@saurabhd336 saurabhd336 marked this pull request as ready for review June 15, 2022 05:57
@codecov-commenter
Copy link

codecov-commenter commented Jun 15, 2022

Codecov Report

Merging #8828 (0b3fc36) into master (cc8362f) will decrease coverage by 43.94%.
The diff coverage is 67.85%.

❗ Current head 0b3fc36 differs from pull request most recent head 966e46f. Consider uploading reports for the commit 966e46f to get more accurate results

@@              Coverage Diff              @@
##             master    #8828       +/-   ##
=============================================
- Coverage     68.80%   24.86%   -43.95%     
+ Complexity     4747       51     -4696     
=============================================
  Files          1842     1834        -8     
  Lines         97579    97437      -142     
  Branches      14716    14698       -18     
=============================================
- Hits          67141    24228    -42913     
- Misses        25705    70827    +45122     
+ Partials       4733     2382     -2351     
Flag Coverage Δ
integration1 ?
integration2 24.86% <67.85%> (?)
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...oller/api/resources/PinotTableRestletResource.java 27.72% <0.00%> (-36.82%) ⬇️
...segment/local/data/manager/SegmentDataManager.java 0.00% <0.00%> (-83.34%) ⬇️
...va/org/apache/pinot/spi/utils/CommonConstants.java 0.00% <0.00%> (-28.58%) ⬇️
...spi/utils/builder/ControllerRequestURLBuilder.java 0.00% <0.00%> (ø)
...ver/api/resources/ControllerJobStatusResource.java 60.00% <60.00%> (ø)
...ntroller/helix/core/PinotHelixResourceManager.java 31.75% <66.66%> (-35.55%) ⬇️
...ler/api/resources/PinotSegmentRestletResource.java 31.19% <70.10%> (+6.29%) ⬆️
...ache/pinot/common/metadata/ZKMetadataProvider.java 60.11% <100.00%> (-5.43%) ⬇️
...mmon/metadata/controllerjob/ControllerJobType.java 100.00% <100.00%> (ø)
...urces/ServerReloadControllerJobStatusResponse.java 100.00% <100.00%> (ø)
... and 1407 more

Help us with your feedback. Take ten seconds to tell us how you rate us.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any update on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest not keeping per-task level ZK metadata. It might create too many ZNode, and can be easily left over if not handled properly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static String constructPropertyStorePathForTaskResource(String resourceName) {
public static String constructPropertyStorePathForTask(String resourceName) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) revert the empty lines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this create a LRU cache?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@saurabhd336 saurabhd336 force-pushed the reloadStatus branch 2 times, most recently from 4823c44 to 1047e19 Compare June 16, 2022 09:03
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we name this something other than TASKS? Task is already a concept in helix task framework and as a result in minion, so I can see this getting very confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CONTROLLER_JOB?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, there's already a string association between TaskType and minion tasks like MergeRollup, R2O etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract this block into a common util to share across this and the above method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/tableName/tableNameWithType ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case of a hybrid table, these 2 lines would run twice and always be redundant for one of the tables. Curious, why you chose to not just have tableNameWithType in the params?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reload API accepts tableName and type is optional. Hence thought of keeping it same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But even the reload API, returns 1 jobId per tableType for hybrid table. I think expecting tableNameWithType for this API makes sense. Have changed

Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add segment name & tablename in the log message.

Will not repeat for other log messages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe most other APIs have table name and table type as independent parameters. We should keep all APIs that way, and move towards this model if that is not already the case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@mcvsubbu
Copy link
Contributor

Please change the description. This PR does much more than just get reload status. Earlier design for reload was to "fire and forget" (send a reload helix message and let servers handle it as they please). This one (if I understand right) changes the reload command to start a helix task.

have you considered what happens when controller is upgraded to the new version but the servers are running old version, and the reload command is issued? Please evaluate and add that to the PR description, and also mark this PR for release notes

@saurabhd336
Copy link
Contributor Author

@mcvsubbu So nothing about the core functionality wrt reloading segments has been changed here. It's still largely a fire and forget operation, using helix messages that are handled by servers. This PR just associates each reload op with a trackable id, writes some additional meta about the task into ZK, and provides APIs which can be used to 1) Get the status of a given realod op task id 2) Get a historical list of reload tasks.

@mcvsubbu
Copy link
Contributor

@mcvsubbu So nothing about the core functionality wrt reloading segments has been changed here. It's still largely a fire and forget operation, using helix messages that are handled by servers. This PR just associates each reload op with a trackable id, writes some additional meta about the task into ZK, and provides APIs which can be used to 1) Get the status of a given realod op task id 2) Get a historical list of reload tasks.

When you start using new znodes, then we need to worry about upgrades, cleaning up of znodes, etc. Please write up a design document on this before proceeding.

@saurabhd336
Copy link
Contributor Author

When you start using new znodes, then we need to worry about upgrades, cleaning up of znodes, etc. Please write up a design document on this before proceeding.

Sounds good. I'll share the design doc for this.

@saurabhd336
Copy link
Contributor Author

Copy link
Contributor

@npawar npawar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changed approach is clever and very light weight. It's just a little odd to think of this as "status of the exact reload id" because there's no correlation anymore between the exact fired reload and the status we're returning.
Are there any other operations that would set that load time? Like a resetSegment? (I think not, but just confirming). Or any other gotchas that need to be called out?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jobMetadata or just metadata?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename all "Task" in this file to "Job" (and also in PinotHelixResourceManager)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@saurabhd336
Copy link
Contributor Author

The changed approach is clever and very light weight. It's just a little odd to think of this as "status of the exact reload id" because there's no correlation anymore between the exact fired reload and the status we're returning. Are there any other operations that would set that load time? Like a resetSegment? (I think not, but just confirming). Or any other gotchas that need to be called out?

@npawar Yes and that's a idea with this change. Scenarios like server restarting in the middle of a reload op, a rebalance, etc can lead to this. Earlier with the tight coupling of task id -> status map at the server, there were issues with server restarts etc leading to missing status data at the server. With this approach, success of a reload op has simply been defined as the load time being > reload job submission time.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the reload message, we should also carry the reload submission time so that we don't need to reload the same segment multiple times when multiple reload is submitted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to check if the record exists. We can directly get() and check if the return is null. It can also prevent race condition.
When the record exist, we should track the record version (passing a Stat into the get()), and check the expected version when setting the record to prevent race condition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Should we be retrying in case of a version mismatch? For now I've not added retries, in line with other version based _propertyStore updates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return new SuccessResponse("Sent " + msgInfo + " reload messages");
return new SuccessResponse("Sent " + msgInfo.getLeft() + " reload messages");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to return the job id as part of the response.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then let's change the response message to include both job id and the messages sent, something like "Submitted reload job: %s, sent %d reload messages"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Added both the job id as well as success / failure status of ZK write

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest reflecting this back to the user, instead of just logging an error and return success. User won't be able to read the reload status

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Modified the response for this API

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor) _loadTimeMs for simplicity, same for the getter and setter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then let's change the response message to include both job id and the messages sent, something like "Submitted reload job: %s, sent %d reload messages"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make the response more clear to the user

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. Changed this to return both job id as well as zk meta write success / failure status

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will create one ZNode per table, which is too much IMO. We are just storing the recent controller jobs in the ZNode, and there shouldn't be a lot of them running in parallel, so we can have one single ZNode for all the tables. Keeping one single node will help track all the running jobs, and also clean up the old jobs.
We can also consider having one node per job type to isolate the metadata for different jobs. All the entries for the same job type should have the same format, which is easier to manage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. I suppose since the tableName is part of the meta being stored itself and the jobId is guaranteed to be unique, we need not have to keep tableName as a key? I've flattened it out

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall logic looks good.
We can consider adding a test into the OfflineClusterIntegrationTest where we have several places reloading the table

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) We usually put : for easier searching purpose, same for other places

Suggested change
LOGGER.error("Failed to add reload segment job meta into zookeeper for table {}, segment {}",
LOGGER.error("Failed to add reload segment job meta into zookeeper for table: {}, segment: {}",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Comment on lines 1992 to 1999
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can save one redundant ZK access and also avoid race condition, same for other places

Suggested change
if (_propertyStore.exists(controllerJobResourcePath, AccessOption.PERSISTENT)) {
ZNRecord taskResourceZnRecord = _propertyStore.get(controllerJobResourcePath, null, -1);
ZNRecord controllerJobRecord = _propertyStore.get(controllerJobResourcePath, null, -1);
if (controllerJobRecord != null) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helix may throw a ZkNoNodeException exception if options = -1. So I've handled that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annotate with @Nullable.
Let's add some javadoc for the public method. Same for other public methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added javadoc. jobId must not be be null?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this line into addNewReloadSegmentJob() and addNewReloadAllSegmentsJob() to be more readable. We want to keep the metadata creation part together

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit)

Suggested change
* Task ZK props
* Controller job ZK props

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one applies to all jobs, instead of just the reload jobs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually use camel case as the key name in the ZNode. We may also simplify the key as it is always under the context of controller job. E.g. jobType, jobId, tableName, submissionTimeMs, messagesCount, reloadSegmentName

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we handle this properly on the controller side? Should we return 0/0 instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With controller now filtering servers for the segment being queried using idealstate, this should be far less frequent.
And yes the controller makes the API calls using CompletionServiceHelper which handles non 200 responses correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel returning 0/0 might be preferred so that we can differentiate server not responding vs server not having the segment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this to return 0 / 0

@saurabhd336 saurabhd336 force-pushed the reloadStatus branch 2 times, most recently from e0ca1a4 to 5f0e293 Compare July 27, 2022 06:34
@Jackie-Jiang Jackie-Jiang added feature release-notes Referenced by PRs that need attention when compiling the next release notes labels Aug 2, 2022
@Jackie-Jiang Jackie-Jiang merged commit 3d2b6f3 into apache:master Aug 2, 2022
@Jackie-Jiang
Copy link
Contributor

Let's add a release-notes section in the PR description to list down the new rest APIs added

Comment on lines +669 to +670
ServerReloadControllerJobStatusResponse response =
JsonUtils.stringToObject(responseString, ServerReloadControllerJobStatusResponse.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we using ServerReloadControllerJobStatusResponse.class in JsonUtil instead of SegmentReloadStatusValue.class ? does it guarantee to be always compatible? this seems like a risky assumption consider controller and server are upgraded separately during rolling upgrade

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that this is risky assumption. We can change this to SegmentReloadStatusValue.class
But, I think since the agreed upon upgrade sequence is controller -> broker -> servers, if new fields are added to SegmentReloadStatusValue, in the controller we'd always need to ensure the newer fields are null checked before accessing. Since during upgrade server can be on older version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants