Conversation

@piaozhexiu

This patch attaches a SIGINT signal handler to JobWaiter while a job is running and detaches it as soon as the job finishes. When Ctrl-C is pressed during this window, spark-shell cancels the running job.

It's worth noting that the signal handler is only overridden while a job is running. So if Ctrl-C is pressed while spark-shell is idle, it still interrupts the JVM and exits the shell (same as now).

In addition, if Ctrl-C is pressed multiple times while a job is running, it interrupts the JVM and exits the shell. This avoids any case where the shell becomes unresponsive. E.g., consider this scenario: the user submits a job in the shell, but since the job lists a lot of files before running, it might hang in the driver. In this case, the user can interrupt the JVM and exit the shell by pressing Ctrl-C twice.

Here is a summary:

| # | case | before | after |
|---|------|--------|-------|
| 1 | shell is idle | Ctrl-C exits the shell | Ctrl-C exits the shell |
| 2 | job is submitted but not yet running (e.g. driver listing input files) | Ctrl-C exits the shell | Ctrl-C exits the shell [a] |
| 3 | job is submitted and running on the cluster | Ctrl-C exits the shell | Ctrl-C does not exit the shell but cancels the job [b] |

This patch basically improves case 3 by allowing the user to cancel a running job on the cluster without exiting the shell.

[a] The 1st Ctrl-C prints a log message saying `Cancelling running job. This might take some time, so be patient. Press Ctrl-C again to kill JVM.` The 2nd Ctrl-C exits the shell.
[b] While the job is running, Ctrl-C immediately cancels the job, and the prompt is returned to the user.
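The behavior described above can be sketched as follows. This is a hypothetical illustration, not the actual JobWaiter code; the class and method names are made up. It attaches a SIGINT handler, cancels the job on the first Ctrl-C, and defers to the previous handler (which kills the JVM) on a second Ctrl-C:

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

import sun.misc.{Signal, SignalHandler}

// Hypothetical sketch (illustrative names, not the real JobWaiter API):
// attach a SIGINT handler while a job runs, cancel the job on the first
// Ctrl-C, and fall back to the previous handler on a second Ctrl-C.
class SigintJobCanceller(cancelJob: () => Unit) {
  @volatile private var cancelRequested = false
  private var previous: SignalHandler = _

  def attach(): Unit = {
    previous = Signal.handle(new Signal("INT"), new SignalHandler {
      override def handle(sig: Signal): Unit = {
        if (!cancelRequested) {
          cancelRequested = true
          cancelJob() // first Ctrl-C: cancel the job, keep the shell alive
        } else {
          previous.handle(sig) // second Ctrl-C: defer to the old handler
        }
      }
    })
  }

  // Restore the previous handler once the job finishes.
  def detach(): Unit = Signal.handle(new Signal("INT"), previous)
}
```

A caller would wrap the job wait in `attach()`/`detach()` so the override is only active while a job is running, matching case 3 in the table above.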

@piaozhexiu piaozhexiu changed the title [SPARK-10001] [CORE] Ctrl-C in spark-shell doesn't kill running job [SPARK-10001] [CORE] Allow Ctrl-C in spark-shell to kill running job Aug 14, 2015
@piaozhexiu piaozhexiu closed this Aug 14, 2015
@piaozhexiu piaozhexiu reopened this Aug 14, 2015
@piaozhexiu
Author

Strange error in Jenkins. The build doesn't start for some reason.

@SparkQA

SparkQA commented Aug 15, 2015

Test build #40927 timed out for PR 8216 at commit 7342f1b after a configured wait of 175m.

@SparkQA

SparkQA commented Aug 15, 2015

Test build #1614 has finished for PR 8216 at commit c8a3ba8.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StringIndexerModel (

@dbtsai
Member

dbtsai commented Aug 15, 2015

Jenkins, test this please

@SparkQA

SparkQA commented Aug 15, 2015

Test build #40955 has finished for PR 8216 at commit c8a3ba8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

This dependency can cause both runtime and compile-time problems if the JVM/compiler used does not provide these classes. Does this work with other JVM implementations, such as OpenJDK and the IBM JDK?

I am in favor of this functionality, but I think we should be very careful using platform-specific code. Luckily this is not the first time people have had to solve this problem. JLine solves it in the following way:
https://github.com/jline/jline2/blob/master/src/main/java/jline/console/ConsoleReader.java#L257-L284

Author

@hvanhovell thank you very much for your comment. I understand your concern.

But Apache Hive and Facebook Presto both import sun.misc.Signal/SignalHandler here and here. I am positive they're heavily tested against other JDKs.

In addition, the JLine implementation that you're referring to also uses the same package via reflection in the end. All it does is add a safety guard for the case where sun.misc.Signal/SignalHandler might not exist, which is unlikely in any major JDK.
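The guard being discussed could look something like the following sketch (hypothetical helper names, not JLine's actual code): probe for the classes via reflection first, and keep the default Ctrl-C behavior if they are missing or installation fails.

```scala
import scala.util.Try

// Hypothetical guard in the spirit of JLine's approach: only install the
// handler if sun.misc.Signal/SignalHandler can actually be loaded, so JVMs
// without them keep the default Ctrl-C behavior instead of failing.
object SignalSupport {
  def available: Boolean =
    Try {
      Class.forName("sun.misc.Signal")
      Class.forName("sun.misc.SignalHandler")
    }.isSuccess

  // Returns true only if the classes exist and installation succeeded.
  def installIfPossible(install: () => Unit): Boolean =
    available && Try(install()).isSuccess
}
```

On HotSpot-based JDKs `available` is true, so this guard only changes behavior on JVMs that genuinely lack the `sun.misc` classes.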

Contributor

I don't really think there is an easy way around the sun.misc.Signal/SignalHandler approach.

I have to admit that my concern is moot if all major implementations support this (can anyone confirm?). I just like the JLine approach better because it doesn't cause compile-time or runtime errors when the classes cannot be found; it just falls back to the current behavior.

Author

@hvanhovell sure. I can update my patch to handle the case where the sun.misc.Signal/SignalHandler classes don't exist. Thank you!

Author

@hvanhovell I tried what you suggested here, but it makes the Jenkins build time out. It seems Java reflection adds overhead, so I had to revert it.

@piaozhexiu piaozhexiu force-pushed the SPARK-10001 branch 2 times, most recently from 9b1cca9 to 1cae5da Compare August 17, 2015 06:54
@SparkQA

SparkQA commented Aug 17, 2015

Test build #41011 timed out for PR 8216 at commit 1cae5da after a configured wait of 175m.

@SparkQA

SparkQA commented Aug 17, 2015

Test build #41025 has finished for PR 8216 at commit 9f6507c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 17, 2015

Test build #41037 has finished for PR 8216 at commit 166e9f6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@piaozhexiu
Author

Exception encountered when attempting to run a suite with class name: org.apache.spark.deploy.yarn.YarnClusterSuite *** ABORTED *** (1 second, 891 milliseconds)
  org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: ResourceManager failed to start. Final state is STOPPED

The test failure is because the mini YARN cluster failed to start the ResourceManager. I'll trigger another run of the tests.

@piaozhexiu piaozhexiu force-pushed the SPARK-10001 branch 2 times, most recently from e0cd5dd to c03bf07 Compare August 18, 2015 01:19
@SparkQA

SparkQA commented Aug 18, 2015

Test build #41083 has finished for PR 8216 at commit c03bf07.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StringIndexerModel (
    • class ElementwiseProduct(JavaTransformer, HasInputCol, HasOutputCol):
    • implicit class StringToColumn(val sc: StringContext)
    • case class FilterNode(condition: Expression, child: LocalNode) extends UnaryLocalNode
    • abstract class LocalNode extends TreeNode[LocalNode]
    • abstract class LeafLocalNode extends LocalNode
    • abstract class UnaryLocalNode extends LocalNode
    • case class ProjectNode(projectList: Seq[NamedExpression], child: LocalNode) extends UnaryLocalNode
    • case class SeqScanNode(output: Seq[Attribute], data: Seq[InternalRow]) extends LeafLocalNode
    • public final class UTF8String implements Comparable<UTF8String>, Externalizable

@piaozhexiu
Author

Hmm, not sure why a test case fails in YarnClusterSuite, but it seems to fail consistently. I'll investigate.

@piaozhexiu
Author

I believe the YarnClusterSuite failure is fixed by SPARK-10059 because I see the same error message (java.lang.NoClassDefFoundError: javax/servlet/jsp/JspFactory) in the log.

I'll rebase my patch to trigger another run of unit tests.

@SparkQA

SparkQA commented Aug 18, 2015

Test build #41148 has finished for PR 8216 at commit d3eabf0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 4, 2015

Test build #41991 has finished for PR 8216 at commit d3eabf0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 4, 2015

Test build #42012 has finished for PR 8216 at commit cb3b73f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Should we put this into a `finally` block?

Author

@davies thank you for reviewing!

My intention with Try was to tolerate any error around the signal handler, since it is attached on a best-effort basis. E.g., if attachSigintHandler fails for some reason, detachSigintHandler will fail too.

I rewrote this code using Try, Success, and Failure as follows-

  val attachTry = Try(attachSigintHandler())
  while (!_jobFinished) {
    this.wait()
  }
  attachTry match {
    case _: Success[_] => detachSigintHandler()
    case _: Failure[_] => // Ignore error. Signal handler is on a best effort basis.
  }

Hopefully it is clearer now.
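The pattern in the snippet above can be distilled into a small, self-contained helper (hypothetical names, not the actual JobWaiter code): detach only if attach succeeded, and swallow failures in both directions since the handler is best-effort.

```scala
import scala.util.Try

// Hypothetical distillation of the Try-based pattern above: run `body` with
// a best-effort handler attached; detach only if attach succeeded, and
// ignore failures on both attach and detach.
def withBestEffortHandler[T](attach: () => Unit, detach: () => Unit)(body: => T): T = {
  val attached = Try(attach())
  try {
    body
  } finally {
    // foreach runs only on Success, so a failed attach skips the detach.
    attached.foreach(_ => Try(detach()))
  }
}
```

Note that this also addresses the `finally` question: the body runs inside try/finally, so detach happens even if the wait itself throws.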

@davies
Contributor

davies commented Sep 10, 2015

@piaozhexiu This is cool, thanks! I've also wanted this for Python for a long time (could be a separate PR).

@davies
Contributor

davies commented Sep 10, 2015

@SparkQA

SparkQA commented Sep 12, 2015

Test build #42373 has finished for PR 8216 at commit 1ba208a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class IntersectNode(conf: SQLConf, left: LocalNode, right: LocalNode)
    • case class SampleNode(
    • case class TakeOrderedAndProjectNode(

@andrewor14
Contributor

retest this please
@davies we still want this right?

Contributor

What happens if the current thread is not the main thread?

@nezihyigitbasi
Contributor

@andrewor14 @davies guys any plans to get this in?

@hvanhovell
Contributor

@piaozhexiu could you update this PR?

@piaozhexiu
Author

@hvanhovell @nezihyigitbasi Looks like JobWaiter has changed since then. I haven't actively followed Spark recently. Can you guys take over from here? Thanks!

@holdenk
Contributor

holdenk commented Apr 19, 2016

cc @jodersky would this be something you would be interested in continuing on with?

@jodersky
Member

That would indeed be a cool feature to have! I'll give it a shot

@jodersky
Member

jodersky commented Apr 20, 2016

Should signal handling really be implemented in JobWaiter? I'm afraid that implementing it there could have unwanted consequences for non-shell operation: imagine a custom, standalone program; in that case it should terminate immediately on receiving a single SIGINT, regardless of current job execution.

IMO signal handling is best implemented at the shell level. Any thoughts?

@asfgit asfgit closed this in 8012793 Apr 22, 2016
srielau added a commit to srielau/spark that referenced this pull request Jan 12, 2026
Disabled Jekyll's smart_quotes feature to use standard ASCII apostrophes (')
instead of fancy Unicode quotes (‘ and ’) in generated HTML.

Change in _config.yml:
  kramdown:
    smart_quotes: ["apos", "apos", "quot", "quot"]

This ensures SQLSTATE codes like '02000' and '02xxx' render with standard
ASCII single quotes (&#39;) instead of curly quotes (&#8216; and &#8217;),
making them easier to copy/paste and more consistent with SQL syntax.

Affects all documentation, but particularly important for:
- SQLSTATE codes ('02000', '02xxx', '45000', etc.)
- String literals in code examples
- Technical identifiers that use quotes