Skip to content

Conversation

@xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Apr 5, 2022

Description

  1. Add AdhocTaskConfig definition
  2. Add Adhoc minion task creation endpoint
  3. Add the implementation of SegmentGenerationAndPushTask for Adhoc task creation

Sample requests:

{
   "taskType": "SegmentGenerationAndPushTask",
   "tableName": "myTable",
   "taskName": "myTask-0",
   "taskConfigs": {
     "inputDirURI": "s3://my-bucket/my-file.json",
     "input.fs.className": "org.apache.pinot.plugin.filesystem.S3PinotFS",
     "input.fs.prop.accessKey": "<aws-access-key>",
     "input.fs.prop.secretKey": "<aws-secret-key>",
     "input.fs.prop.region": "us-west-2"
   }
 }

Release Notes

Adding Adhoc minion task execution endpoint in Pinot controller /tasks/execute.

Documentation

@mcvsubbu
Copy link
Contributor

mcvsubbu commented Apr 5, 2022

Please give me a couple of days, I would like to review this.

Meanwhile, if you can please mention some use cases I would appreciate it.

@xiangfu0 xiangfu0 force-pushed the adhoc_minion_task branch from b1068c7 to 471168c Compare April 5, 2022 17:09
@xiangfu0
Copy link
Contributor Author

xiangfu0 commented Apr 5, 2022

Please give me a couple of days, I would like to review this.

Meanwhile, if you can please mention some use cases I would appreciate it.

Thanks, @mcvsubbu, In short, the goal here is to add a new public endpoint to schedule ad-hoc minion tasks.

Current minion tasks require all tasks configs to be part of table configs, it makes sense for regularly scheduled jobs, but when I want to quickly test something, it's hard for me to bring in data using minion.

Once this feature is in place, we can also implement the feature of SQL INSERT INTO with async data insertion using pinot in the query console. This is to simplify the ingestion process and make pinot more DBA friendly.

E.g.

INSERT INTO [db.]table [(c1, c2, c3)] FROM INFILE file_name [COMPRESSION type] FORMAT format_name;

or

LOAD LABEL [db.]task (
     DATA INFILE("s3://my-s3-bucket/*/*.parquet") INTO TABLE `table`
)
WITH MINION minionTag (
  "input.fs.className": "org.apache.pinot.plugin.filesystem.S3PinotFS",
  "input.fs.prop.accessKey": "<my-access-key>",
  "input.fs.prop.secretKey": "<my-secret-key>",
  "input.fs.prop.region": "us-west-2"
);

@codecov-commenter
Copy link

codecov-commenter commented Apr 5, 2022

Codecov Report

Merging #8465 (862ad5c) into master (bb46e46) will increase coverage by 41.34%.
The diff coverage is 13.15%.

@@              Coverage Diff              @@
##             master    #8465       +/-   ##
=============================================
+ Coverage     29.21%   70.55%   +41.34%     
- Complexity        0     4287     +4287     
=============================================
  Files          1662     1679       +17     
  Lines         87290    87873      +583     
  Branches      13236    13305       +69     
=============================================
+ Hits          25501    61999    +36498     
+ Misses        59467    21557    -37910     
- Partials       2322     4317     +1995     
Flag Coverage Δ
integration1 26.93% <6.57%> (-0.01%) ⬇️
integration2 25.98% <6.57%> (+0.11%) ⬆️
unittests1 67.04% <71.42%> (?)
unittests2 14.06% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...roller/api/exception/NoTaskScheduledException.java 0.00% <0.00%> (ø)
...ller/api/exception/TaskAlreadyExistsException.java 0.00% <0.00%> (ø)
...roller/api/exception/UnknownTaskTypeException.java 0.00% <0.00%> (ø)
...roller/api/resources/PinotTaskRestletResource.java 0.00% <0.00%> (ø)
...controller/helix/core/minion/PinotTaskManager.java 58.17% <0.00%> (+30.07%) ⬆️
...helix/core/minion/generator/BaseTaskGenerator.java 85.71% <0.00%> (+15.71%) ⬆️
...elix/core/minion/generator/PinotTaskGenerator.java 33.33% <ø> (ø)
...andpush/SegmentGenerationAndPushTaskGenerator.java 2.16% <0.00%> (-0.78%) ⬇️
...lix/core/minion/PinotHelixTaskResourceManager.java 41.11% <66.66%> (+0.64%) ⬆️
.../apache/pinot/spi/config/task/AdhocTaskConfig.java 83.33% <83.33%> (ø)
... and 1185 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bb46e46...862ad5c. Read the comment docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when this API would be used? I see /tasks/{taskType}/taskstates is there.

Copy link
Contributor Author

@xiangfu0 xiangfu0 Apr 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have one API to show all the task states, across all task types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do that in the client? (i.e. fetch all tasks and then iterate through the types to get status)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal eventually is to convert those API responses to an in-memory table and you can run the below queries to see job status.

SHOW LOAD;
SHOW LOAD WHERE taskName = 'mytask-01';
SHOW LOAD WHERE state = 'CANCELLED';
SHOW LOAD WHERE state = 'LOADING';

I feel we may need a better data structure to hold the task status, e.g. job progress(12/35 COMPLETED, 5/35 IN_PROGRESS ), exception: xxxxx, time usage: 12mins, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove this API for now, we can add it later on with a better understanding of the task information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do that in the client? (i.e. fetch all tasks and then iterate through the types to get status)?

@xiangfu0 xiangfu0 force-pushed the adhoc_minion_task branch from 85138ea to 972f339 Compare April 6, 2022 00:21
@xiangfu0 xiangfu0 requested a review from mcvsubbu April 6, 2022 03:51
Copy link
Contributor

@klsince klsince left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xiangfu0
Copy link
Contributor Author

xiangfu0 commented Apr 7, 2022

@mcvsubbu can you take a look again ?

@xiangfu0 xiangfu0 requested a review from Jackie-Jiang April 8, 2022 01:22
@xiangfu0 xiangfu0 force-pushed the adhoc_minion_task branch 3 times, most recently from 84f3f66 to 326ede3 Compare April 8, 2022 20:56
@xiangfu0 xiangfu0 added the release-notes Referenced by PRs that need attention when compiling the next release notes label Apr 8, 2022
@xiangfu0 xiangfu0 force-pushed the adhoc_minion_task branch 2 times, most recently from 126f26b to 6b3db0a Compare April 8, 2022 22:27
@xiangfu0 xiangfu0 force-pushed the adhoc_minion_task branch 2 times, most recently from a4e54b8 to 1f53ee1 Compare April 11, 2022 23:44
1. Add adhoc task creation endpoint for PinotTaskRestletResource endpoint
2. Add adhoc minion task submission endpoint and the implementation of SegmentGenerationAndPushTask
@xiangfu0 xiangfu0 force-pushed the adhoc_minion_task branch 4 times, most recently from ff26801 to 5da7f98 Compare April 12, 2022 18:04
@xiangfu0 xiangfu0 requested a review from mcvsubbu April 12, 2022 18:24
Copy link
Contributor

@mcvsubbu mcvsubbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor change suggested.
thank you

@xiangfu0 xiangfu0 force-pushed the adhoc_minion_task branch from 5da7f98 to 862ad5c Compare April 13, 2022 21:09
@xiangfu0
Copy link
Contributor Author

Minor change suggested. thank you

Thanks for your comments!

@xiangfu0 xiangfu0 merged commit 58ffe94 into apache:master Apr 13, 2022
@xiangfu0 xiangfu0 deleted the adhoc_minion_task branch April 13, 2022 22:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants