Skip to content

Conversation

@KKcorps
Copy link
Contributor

@KKcorps KKcorps commented Apr 17, 2022

Add support for Spark 3.2.1 in Pinot.

Changes done

  • Create a seperate module for spark 3.x and rename existing to spark 2.x
  • Make spark dependencies as provided
  • Upgrade scala to 2.12 in spark 3.x plugin
  • Shade apache commons dependency

Verified in local with the following command from the documentation

export PINOT_VERSION=0.11.0-SNAPSHOT
export PINOT_DISTRIBUTION_DIR=~/Documents/Developer/pinot/build/

spark-3.2.1-bin-hadoop2.7/bin/spark-submit --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand --master local --deploy-mode client --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dplugins.include=pinot-csv -Dlog4j2.configurationFile=${PINOT_DISTRIBUTION_DIR}/conf/pinot-ingestion-job-log4j2.xml" --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/plugins-external/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-${PINOT_VERSION}-shaded.jar:${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" local://${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar -jobSpecFile ~/Workspace/Spark/spark_job_spec.yml

@codecov-commenter
Copy link

codecov-commenter commented Apr 17, 2022

Codecov Report

Merging #8560 (f347137) into master (e8f9d88) will decrease coverage by 5.68%.
The diff coverage is n/a.

❗ Current head f347137 differs from pull request most recent head c8e04a3. Consider uploading reports for the commit c8e04a3 to get more accurate results

@@             Coverage Diff              @@
##             master    #8560      +/-   ##
============================================
- Coverage     68.56%   62.87%   -5.69%     
  Complexity     4640     4640              
============================================
  Files          1741     1695      -46     
  Lines         91475    89477    -1998     
  Branches      13674    13451     -223     
============================================
- Hits          62724    56263    -6461     
- Misses        24363    29169    +4806     
+ Partials       4388     4045     -343     
Flag Coverage Δ
integration1 ?
unittests1 66.22% <ø> (-0.01%) ⬇️
unittests2 14.13% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...va/org/apache/pinot/core/routing/RoutingTable.java 0.00% <0.00%> (-100.00%) ⬇️
...va/org/apache/pinot/common/config/NettyConfig.java 0.00% <0.00%> (-100.00%) ⬇️
...a/org/apache/pinot/common/metrics/MinionMeter.java 0.00% <0.00%> (-100.00%) ⬇️
...g/apache/pinot/common/metrics/ControllerMeter.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/BrokerQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
.../apache/pinot/common/metrics/MinionQueryPhase.java 0.00% <0.00%> (-100.00%) ⬇️
...ache/pinot/server/access/AccessControlFactory.java 0.00% <0.00%> (-100.00%) ⬇️
...he/pinot/common/messages/TableDeletionMessage.java 0.00% <0.00%> (-100.00%) ⬇️
...pinot/core/data/manager/realtime/TimerService.java 0.00% <0.00%> (-100.00%) ⬇️
...pinot/minion/exception/TaskCancelledException.java 0.00% <0.00%> (-100.00%) ⬇️
... and 343 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e8f9d88...c8e04a3. Read the comment docs.

@KKcorps KKcorps marked this pull request as ready for review April 17, 2022 21:15
@KKcorps KKcorps added feature dependencies Pull requests that update a dependency file refactor labels Apr 17, 2022
@KKcorps KKcorps requested review from mayankshriv and xiangfu0 April 17, 2022 21:20
@KKcorps KKcorps changed the title Add support for Spark 3.x [WIP] Add support for Spark 3.x Apr 17, 2022
@KKcorps
Copy link
Contributor Author

KKcorps commented Apr 18, 2022

Addresses #7771

@KKcorps
Copy link
Contributor Author

KKcorps commented May 19, 2022

@xiangfu0 @Jackie-Jiang Seems like adding spark tests might not be possible in spark3.2 module. Reason being the conflicts in commons-lang3 and Jackson version.
The plugin works because of shading which is not enforced during test phase.

@KKcorps KKcorps requested review from Jackie-Jiang and removed request for mayankshriv May 19, 2022 12:06
@xiangfu0
Copy link
Contributor

@xiangfu0 @Jackie-Jiang Seems like adding spark tests might not be possible in spark3.2 module. Reason being the conflicts in commons-lang3 and Jackson version. The plugin works because of shading which is not enforced during test phase.

Then should you update the dependency version for commons-lang3 and Jackson version in the Spark 3.2 module?

@KKcorps
Copy link
Contributor Author

KKcorps commented May 20, 2022

Maven enforcer won't allow that right. Using different versions for a dependency. If it did, I would simply use the required versions here and shade them with spark3 prefix.

@xiangfu0
Copy link
Contributor

Maven enforcer won't allow that right. Using different versions for a dependency. If it did, I would simply use the required versions here and shade them with spark3 prefix.

You need to exclude those dependencies to make enforcer check pass.

@KKcorps
Copy link
Contributor Author

KKcorps commented May 23, 2022

Test works now but dependency section doesn't look nice.

@Jackie-Jiang Jackie-Jiang merged commit af42909 into apache:master Jun 8, 2022
@KKcorps KKcorps added the release-notes Referenced by PRs that need attention when compiling the next release notes label Jun 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file feature refactor release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants