Skip to content

Add capability for goroutine dumping on SIGQUIT/USR1#10786

Merged
thaJeztah merged 1 commit intomoby:masterfrom
estesp:add-goroutine-debug
May 12, 2015
Merged

Add capability for goroutine dumping on SIGQUIT/USR1#10786
thaJeztah merged 1 commit intomoby:masterfrom
estesp:add-goroutine-debug

Conversation

@estesp
Copy link
Copy Markdown
Contributor

@estesp estesp commented Feb 13, 2015

This allows "kill -QUIT pidof docker" to be used to dump the
goroutine stacktraces from the docker daemon to the debug log. The
SIGQUIT listener is only created if the daemon is started in debug mode.

Updated based on feedback -- 4/21:
Goroutine dumps are created on SIGQUIT (matching Go-lang builtin dumper), and SIGUSR1. The user signal 1 handler is not destructive: it only dumps goroutines without exiting the daemon. The quit handler is a hard-exit (no cleanup routine called) after dumping the goroutines to allow a method to exit a hung daemon after capturing debug information. Both handlers are installed without any reliance on "DEBUG" being set in the daemon. The routines are logged at "info" level, matching the default log-level set at start time. If the log-level is set to something more restrictive (warn or error), then the goroutine logging will not be visible. Potentially we might want to "escalate" the output to a higher log level if we think that will be a problem with usage.

Docker-DCO-1.1-Signed-off-by: Phil Estes [email protected] (github: estesp)

@tiborvass
Copy link
Copy Markdown
Contributor

Ping @vieux IIRC he either did it already or it was planned. Let me know

@SvenDowideit
Copy link
Copy Markdown
Contributor

can we please add this to the docs/source/contributing/devenvironment.md doc?

@estesp
Copy link
Copy Markdown
Contributor Author

estesp commented Feb 19, 2015

@tiborvass after talking to @vieux and testing it locally--this is built into Go itself now, and our pkg/signals setup when DEBUG is non-empty in the environment is already correctly allowing SIGQUIT to pass through to the Go handler (rather than set up a trap for it).

@SvenDowideit I could change this PR to add a note about the ability to dump stacktraces with SIGQUIT in the docs, or just close the PR without anything given this is a Go feature that Docker is allowing to work properly when DEBUG="" in the environment (which is set with -D properly)

@tiborvass
Copy link
Copy Markdown
Contributor

@vieux can you confirm that the only scenario where it makes sense to trap SIGQUIT is when attached to a container?
In all other scenarios, SIGQUIT should do what it's supposed to do: quit abruptly. I don't think it should depend on the "DEBUG" variable.

@vieux
Copy link
Copy Markdown
Contributor

vieux commented Mar 6, 2015

If I recall correctly, if you have container(s) running sigquit is carched because you have to press it 3 times to actually quit.

Do you remember this @tiborvass?

@tiborvass
Copy link
Copy Markdown
Contributor

@vieux I thought that's for SIGINT not SIGQUIT. SIGQUIT is Ctrl+\.

Comment thread docker/daemon.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if \n\n is necessary here, looks like log.Debugf should add EOL itself.

@LK4D4
Copy link
Copy Markdown
Contributor

LK4D4 commented Apr 1, 2015

@estesp Also on rebase note, that we changed all log for logrus.

Comment thread docker/daemon.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably makes sense to write one big log message. Because actually other messages can be between :)

@LK4D4
Copy link
Copy Markdown
Contributor

LK4D4 commented Apr 1, 2015

I tried and I like it.
ping @icecrime @tiborvass for move to code-review or to be silent forever.

@tiborvass
Copy link
Copy Markdown
Contributor

@estesp FYI currently SIGQUIT should be already handled (try Ctrl+\ on a daemon in the foreground).

Looking at the code now.

@tiborvass
Copy link
Copy Markdown
Contributor

NOT LGTM

@estesp this code is already available in pkg/signal/trap.go

Unfortunately it checks for the DEBUG envvar, which I believe we set when -D is present. Not sure what the best solution is, but this PR is not it :) Maybe we should remove that part from pkg/signal and handle it in daemon ?

If I'm missing something, let me know :)

@estesp
Copy link
Copy Markdown
Contributor Author

estesp commented Apr 1, 2015

@tiborvass yes.. I've been lazy to get back to this, but I confirmed last month with @vieux that there is no need to set up another handler as it is already happening in base Go lang debug traps.

Why I left the PR open was to either (a) clarify about DEBUG on/off and how that affects this capability (which it does because of Docker signal, and/or (b) add some docs to dev environment setup about goroutine dumping)

@tiborvass
Copy link
Copy Markdown
Contributor

@estesp IMO t should not be only on DEBUG.

I actually had a great chat with @gabrtv in Denver about this, and we agreed that we should probably have another signal handler that just dumps all the stack traces just like SIGQUIT would but without quitting. This would allow production environments to get a lot more feedback if they're ever stuck in a weird case.

A hacky solution would be to discriminate on DEBUG. If it's in debug mode, SIGQUIT quits, if not, then just dumps output without quitting. It's not very intuitive though cause SIGQUIT wouldn't quit.

However, I couldn't find conclusive documentation on what to do on SIGQUIT. I even found contradictions: some say we should clean up after SIGQUIT others say no, some say it's okay to ignore SIGQUIT, others say it should always terminate the process.

Also we should probably write both to stdout and to disk, the stack traces from the runtime package just like core dump.

What do you think?

Also golang/go@5146a93 can be interesting to read.

@estesp
Copy link
Copy Markdown
Contributor Author

estesp commented Apr 2, 2015

It would be straightforward to use something like SIGUSR1 as the trigger, and as you note, that doesn't get in the way of any "expected" behaviors for signals like SIGQUIT. Not to show my age and mention Java, but that was how Java did it's special core dump routine(s). :)

@tiborvass
Copy link
Copy Markdown
Contributor

@estesp i'm fine with SIGUSR1 :)

@gabrtv
Copy link
Copy Markdown

gabrtv commented Apr 2, 2015

👍 for a SIGUSR1 handler that dumps goroutines and other runtime debug info. This would be super helpful in troubleshooting production Docker daemons that seem to be leaking goroutines.

@icecrime
Copy link
Copy Markdown
Contributor

@estesp Can you please update the PR to use SIGUSR1? We'll proceed with code review then, thanks!

@tiborvass
Copy link
Copy Markdown
Contributor

Ping @estesp for SIGUSR1, I think this will be a 1.7 feature of its own :)

@estesp estesp force-pushed the add-goroutine-debug branch from d34d8e7 to b4452f6 Compare April 21, 2015 04:25
@estesp estesp force-pushed the add-goroutine-debug branch from b4452f6 to 3e8dcde Compare April 21, 2015 04:26
@estesp estesp changed the title Add debug capability for goroutine dumping on SIGQUIT Add capability for goroutine dumping on SIGQUIT/USR1 Apr 21, 2015
@estesp estesp force-pushed the add-goroutine-debug branch from 3e8dcde to 2f103d1 Compare April 21, 2015 04:41
@estesp
Copy link
Copy Markdown
Contributor Author

estesp commented Apr 21, 2015

@icecrime @tiborvass thanks for the reminders! I had almost forgotten about this. I've updated the code and the first comment with the current implementation based on the feedback. I realize from CI output that we will need a build tag for a separate trap.go for Windows as SIGUSR1 is not available there. I'll fix that tomorrow.

@estesp
Copy link
Copy Markdown
Contributor Author

estesp commented Apr 22, 2015

@icecrime @tiborvass this should be ready for review--I refactored to only set up a SIGUSR1 handler for build !windows

@icecrime
Copy link
Copy Markdown
Contributor

Thanks @estesp! Pushing to 2-needs-code-review.

Comment thread pkg/signal/debugtrap.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're missing a loop of some sort here, because the way it's written you can only dump the stack once here (unless that's the intent?).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, no one submitted the requirements document with a specific feature requests to dump more than once. 😢 Embarrassed.. my original code was correct but in all the refactoring I hastily recreated this part and forgot the loop.. fixed now!! :)

@estesp estesp force-pushed the add-goroutine-debug branch 2 times, most recently from f5dcb1c to f0b6296 Compare April 22, 2015 02:14
@icecrime icecrime removed the dco/yes label Apr 23, 2015
@jessfraz jessfraz added this to the 1.7.0 milestone Apr 24, 2015
@estesp estesp mentioned this pull request Apr 24, 2015
@LK4D4
Copy link
Copy Markdown
Contributor

LK4D4 commented May 5, 2015

ping @tiborvass
Ok for you?

@calavera
Copy link
Copy Markdown
Contributor

calavera commented May 7, 2015

code LGTM and I like this feature 👍

@jessfraz
Copy link
Copy Markdown
Contributor

jessfraz commented May 8, 2015

LGtm

@jessfraz
Copy link
Copy Markdown
Contributor

jessfraz commented May 8, 2015

ping @tiborvass ok with you? we will not merge without you liking it :)

@tiborvass
Copy link
Copy Markdown
Contributor

Behavior is good, reading code now

@crosbymichael
Copy link
Copy Markdown
Contributor

Does this really work? I thought you could only have one signal handler per process and here you are making two.

edit: maybe it's one signal handler per type of signal???

Comment thread pkg/signal/trap.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@estesp I agree with crosbymichael, why can't we just add syscall.SIGUSR1 to the list below? If we want it to be cross platform, we could make it in a const sigdump = syscall.SIGUSR1 and make it or const sigdump = syscall.Signal(-1) on windows maybe?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given our IRC discussion and the comments here ( https://golang.org/pkg/os/signal/#Notify ), I think it is fully reasonable to have a separate handler func for the SIGUSR1 channel notifier, and while the other method would work, I'm not sure it has value over this model.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://golang.org/pkg/os/signal/#Notify shows indeed that it doesn't matter, so it's okay

EDIT: Sorry I got github cached...

@tiborvass
Copy link
Copy Markdown
Contributor

LGTM

@tiborvass
Copy link
Copy Markdown
Contributor

@estesp Actually this package is starting to become pretty weird, can't we just move all the debugtrap stuff to docker/daemon.go ?

Add handler for SIGUSR1 based on feedback regarding when to dump
goroutine stacks.  This will also dump goroutine stack traces on SIGQUIT
followed by a hard-exit from the daemon.

Docker-DCO-1.1-Signed-off-by: Phil Estes <[email protected]> (github: estesp)
@estesp estesp force-pushed the add-goroutine-debug branch from f0b6296 to 95fcf76 Compare May 12, 2015 00:10
@tiborvass
Copy link
Copy Markdown
Contributor

@crosbymichael let me know if this is fine. We could put DumpStacks() to its own debug package... but i really wanna merge this.

@crosbymichael
Copy link
Copy Markdown
Contributor

LGTM

1 similar comment
@tiborvass
Copy link
Copy Markdown
Contributor

LGTM

thaJeztah added a commit that referenced this pull request May 12, 2015
Add capability for goroutine dumping on SIGQUIT/USR1
@thaJeztah thaJeztah merged commit a5007e5 into moby:master May 12, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.