-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-10001] [CORE] Allow Ctrl-C in spark-shell to kill running job #8216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7342f1b to
63f6ef8
Compare
63f6ef8 to
c8a3ba8
Compare
|
Strange error in jenkins. Build doesn't start for some reason. |
|
Test build #40927 timed out for PR 8216 at commit |
|
Test build #1614 has finished for PR 8216 at commit
|
|
Jenkins, test this please |
|
Test build #40955 has finished for PR 8216 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dependency can cause both Runtime and Compile time problems, if the used JVM/compiler does not provide these classes. Does this work with other JVM implementations, such as OpenJDK and IBM JDK?
I am in favor of this functionality, but I think we should be very careful using platform specific code. Luckily this is not the first time people had to solve this problem. JLine solves this in the following way:
https://github.com/jline/jline2/blob/master/src/main/java/jline/console/ConsoleReader.java#L257-L284
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hvanhovell thank you very much for your comment. I understand your concern.
But Apache Hive and Facebook Presto both import sun.misc.Signal/SignalHander here and here. I am positive they're heavily tested against any other JDKs.
In addition, JLine implementation that you're referring to also use the same package via reflection in the end. All it does is adding some safety guard for the case sun.misc.Signal/SignalHander might not exist, which is unlikely in any major JDKs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really think there is an easy way around the sun.misc.Signal/SignalHander approach.
I have to admit that my concern is moot if all major implementations support this (can anyone confirm this?). I just like the JLine approach better because this doesn't cause Compile/Run-time errors when classes cannot be found, it will just default to the current behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hvanhovell sure. I can update my patch to handle the case when sun.misc.Signal/SignalHander classes don't exit. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hvanhovell I tried what you suggested here, but that makes jenkins build time out. Seems Java reflection adds overheads, so I had to revert it.
9b1cca9 to
1cae5da
Compare
|
Test build #41011 timed out for PR 8216 at commit |
|
Test build #41025 has finished for PR 8216 at commit
|
|
Test build #41037 has finished for PR 8216 at commit
|
The test failure is because mini YARN cluster failed to start the RM. I'll trigger another run of tests. |
e0cd5dd to
c03bf07
Compare
|
Test build #41083 has finished for PR 8216 at commit
|
|
Hmm, not sure why a test case fails in YarnClusterSuite, but it seems failing constantly. I'll investigate it. |
|
I believe the YarnClusterSuite failure is fixed by SPARK-10059 because I see the same error message ( I'll rebase my patch to trigger another run of unit tests. |
c03bf07 to
d3eabf0
Compare
|
Test build #41148 has finished for PR 8216 at commit
|
|
Test build #41991 has finished for PR 8216 at commit
|
d3eabf0 to
cb3b73f
Compare
|
Test build #42012 has finished for PR 8216 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we put this into a final block?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@davies thank you for reviewing!
My intention for Try was to tolerate any error while detaching signal handler since this is on a best effort basis. For eg, if attachSignalHandler fails for some reason, detachSignalHandler will fail too.
I rewrote this code using Try, Success, and Failure as follows-
val attachTry = Try(attachSigintHandler())
while (!_jobFinished) {
this.wait()
}
attachTry match {
case _: Success[_] => detachSigintHandler()
case _: Failure[_] => // Ignore error. Signal handler is on a best effort basis.
}
Hopefully it is more clear now.
|
@piaozhexiu This is cool, thanks! I also want this for Python for a long time (could be a separate PR). |
This reverts commit 1cae5da93e44a82a6da7828203ce090e5afb1107.
61c2020 to
1ba208a
Compare
|
Test build #42373 has finished for PR 8216 at commit
|
|
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened if the current thread is not MainThread ?
|
@andrewor14 @davies guys any plans to get this in? |
|
@piaozhexiu could you update this PR? |
|
@hvanhovell @nezihyigitbasi Looks like JobWaiter has changed since. I haven't actively followed Spark recently. Can you guys take over from here? Thanks! |
|
cc @jodersky would this be something you would be interested in continuing on with? |
|
That would indeed be a cool feature to have! I'll give it a shot |
|
Should signal handling really be implemented in a JobWaiter? I'm afraid that implementing it there could have unwanted consequences for non-shell operation: Imagine implementing any custom, standalone program, in that case it should terminate on reception of a single SIGINT immediately, regardless of current job execution. IMO signal handling is best implemented on the shell level. Any thoughts? |
Disabled Jekyll's smart_quotes feature to use standard ASCII apostrophes (')
instead of fancy Unicode quotes (' and ') in generated HTML.
Change in _config.yml:
kramdown:
smart_quotes: ["apos", "apos", "quot", "quot"]
This ensures SQLSTATE codes like '02000' and '02xxx' render with standard
ASCII single quotes (&apache#39;) instead of curly quotes (&apache#8216; and &apache#8217;),
making them easier to copy/paste and more consistent with SQL syntax.
Affects all documentation, but particularly important for:
- SQLSTATE codes ('02000', '02xxx', '45000', etc.)
- String literals in code examples
- Technical identifiers that use quotes
This patch attaches a signal handler for SIGINT to JobWaiter when a job is running and detaches it as soon as the job is finished. When ctrl-c is pressed during this time, spark-shell cancels the running job.
It's worth noting that signal handler is only overridden when job is running. So if ctrl-c is pressed when spark-shell is idle, it will still interrupt JVM and exit the shell (same as now).
In addition, if ctrl-c is pressed multiple times when job is running, it will interrupt JVM and exit the shell. This is to avoid any case that the shell becomes unresponsive. For eg, consider this scenario. User submits a job in the shell. But since the job lists a lot of files before running, it might be hanging in the driver. In this case, user can interrupt JVM and exit the shell by ctrl-c'ing twice.
Here is summary-
This patch basically improves
case 3by allowing the user to cancel a running job on cluster w/o exiting the shell.[a] 1st ctrl-c will print a log message saying
Cancelling running job.. This might take some time, so be patient. Press Ctrl-C again to kill JVM.2nd ctrl-c will exit the shell.[b] when job is running, ctrl-c immediately cancels the job, and the prompt is returned to user.