[Spark] Instrument SparkSubmit to capture errors outside of SparkListener interface#5505

Merged
paul-laffon-dd merged 2 commits into master from paul.laffon/spark-capture-errors
Jul 3, 2023

@paul-laffon-dd
Contributor

What Does This Do

Instruments the SparkSubmit.runMain(...) function to capture all errors occurring in a Spark application.

The SparkSubmit class is used by Spark to launch the application in non-YARN/Mesos environments. The YARN case is already covered by instrumenting the ApplicationMaster.finish() function (see PR #5267).

Motivation

We currently use the SparkListener interface to receive events from Spark. However, this interface only captures errors that occur during Spark computations, not errors thrown in arbitrary user code:

import org.apache.spark.sql.SparkSession

def main(args: Array[String]): Unit = {
  val spark = SparkSession.builder.getOrCreate()
  val df = spark.read.parquet("...")

  // Captured: the exception is thrown inside a Spark computation (an action run by Spark)
  df.rdd.foreach(_ => throw new RuntimeException("Some exception"))

  // Currently not captured: the exception is thrown in user code, outside of any Spark computation
  throw new RuntimeException("Some exception")
}
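
To illustrate the idea of capturing errors at the application entry point rather than through listener events, here is a minimal sketch in plain Scala. It is not the instrumentation added by this PR: `CaptureSketch`, `CapturedError`, `runMain`, and `report` are hypothetical names, and a real tracer would tag the application span instead of calling a callback. The sketch only assumes that the launcher invokes the user's main and that any Throwable escaping it can be observed and rethrown.

```scala
// Hypothetical sketch: none of these names come from dd-trace-java or Spark.
object CaptureSketch {
  // What a tracer might attach to the application span as error tags.
  final case class CapturedError(errorType: String, message: String)

  // Runs the application's entry point, reports any Throwable, then rethrows
  // it so the launcher still sees the original failure.
  def runMain(userMain: () => Unit)(report: CapturedError => Unit): Unit =
    try userMain()
    catch {
      case t: Throwable =>
        report(CapturedError(t.getClass.getName, String.valueOf(t.getMessage)))
        throw t
    }

  def main(args: Array[String]): Unit = {
    var captured: Option[CapturedError] = None
    try runMain(() => throw new RuntimeException("Some exception"))(e => captured = Some(e))
    catch { case _: RuntimeException => () } // swallowed here only to keep the demo running
    println(captured)
  }
}
```

Because the wrapper sits around the whole entry point, it sees failures raised in user code as well as failures propagated from Spark computations, which is the gap the listener-based approach leaves open.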

@paul-laffon-dd paul-laffon-dd force-pushed the paul.laffon/spark-capture-errors branch from 5e6f0f4 to 1b86749 Compare June 30, 2023 15:03
@pr-commenter
pr-commenter Bot commented Jun 30, 2023

Benchmarks

Parameters

          Baseline                      Candidate
commit    1.18.0-SNAPSHOT~7b156c3fe5    1.18.0-SNAPSHOT~bea7977165
config    baseline                      candidate

Matching parameters

          Baseline    Candidate
module    Agent       Agent
parent    None        None

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 22 cases.

Comment on lines +192 to +197
span {
operationName "spark.application"
resourceName "spark.application"
spanType "spark"
errored true
parent()
Contributor

Should you test span tags like error message and type?

Contributor Author

Yes, those can be tested to validate that we are catching the right exception. The small caveat is that Spark may change this error type or message in future versions, but that should be a minor issue.

@PerfectSlayer PerfectSlayer added the "type: enhancement" and "inst: apache spark" labels Jun 30, 2023
@paul-laffon-dd paul-laffon-dd marked this pull request as ready for review July 3, 2023 09:36
@paul-laffon-dd paul-laffon-dd requested a review from a team as a code owner July 3, 2023 09:36
Contributor

@PerfectSlayer PerfectSlayer left a comment

LGTM. Don't forget to squash before merging 👍

@paul-laffon-dd paul-laffon-dd merged commit 9c0017f into master Jul 3, 2023
@paul-laffon-dd paul-laffon-dd deleted the paul.laffon/spark-capture-errors branch July 3, 2023 13:33
@github-actions github-actions Bot added this to the 1.18.0 milestone Jul 3, 2023