Skip to content

More context for errors#38068

Merged
vdemeester merged 5 commits intomoby:masterfrom
kolyshkin:err
Dec 12, 2018
Merged

More context for errors#38068
vdemeester merged 5 commits intomoby:masterfrom
kolyshkin:err

Conversation

@kolyshkin
Copy link
Contributor

Some functions that moby uses return "raw" errors from the kernel. Wrap those to provide more error context to the caller. Please see individual commit descriptions for details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Done.

@AkihiroSuda
Copy link
Member

LGTM but a small nit.

Could you port this to containerd, because we are going to deprecate docker/pkg/mount pkg and use containerd

@kolyshkin
Copy link
Contributor Author

Could you port this to containerd, because we are going to deprecate docker/pkg/mount pkg and use containerd

Yes, and I was about to start working on that... but there's much more to be ported (mountinfo parser, for one thing).

@AkihiroSuda
Copy link
Member

05:28:00 pkg/mount/mount.go:41:1:warning: exported method MountError.Cause should have comment or be unexported (golint)
05:28:00 pkg/mount/mount.go:13:6:warning: type name will be used as mount.MountError by other packages, and that stutters; consider calling this Error (golint)

@kolyshkin
Copy link
Contributor Author

kolyshkin commented Oct 23, 2018

05:28:00 pkg/mount/mount.go:41:1:warning: exported method MountError.Cause should have comment or be unexported (golint)
05:28:00 pkg/mount/mount.go:13:6:warning: type name will be used as mount.MountError by other packages, and that stutters; consider calling this Error (golint)

Should be fixed now.

@thaJeztah
Copy link
Member

Failures look legit (at least, same failures on janky, power, and z)

03:41:31 --- FAIL: TestSubtreePrivate (0.04s)
03:41:31     sharedsubtree_linux_test.go:65: %!v(PANIC=runtime error: invalid memory address or nil pointer dereference)
03:41:31 --- FAIL: TestSubtreeSharedSlave (0.06s)
03:41:31     sharedsubtree_linux_test.go:249: %!v(PANIC=runtime error: invalid memory address or nil pointer dereference)
03:41:31 --- FAIL: TestSubtreeUnbindable (0.02s)
03:41:31     sharedsubtree_linux_test.go:330: mount /tmp/mount-tests/source:/tmp/mount-tests/target, flags: 0x1000: invalid argument
03:41:31 FAIL

@kolyshkin
Copy link
Contributor Author

I have now carefully audited all users of pkg/mount and made all the necessary changes (in most cases it is simplification of logging as the error already contains everything needed).

@kolyshkin
Copy link
Contributor Author

Failure in experimental CI is flaky test, TestServiceWithDefaultAddressPoolInit, #37836

Windows failure is unrelated

@kolyshkin
Copy link
Contributor Author

z failure is known flaky test, TestSwarmClusterRotateUnlockKey, see #33041

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here; perhaps s/mount failed/failed to mount/ ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this if is redundant now, and we can just return umountErr (perhaps you find the explicit if more clear though, so up to you 😄)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we'd want to keep zfs in the error message (to more easily find where to look for the issue?) or is this already logged with storagedriver=zfs ? 🤔 (in that case; ignore 😂)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be clear from the context -- after all, we only support one graph driver at the time, and its name is logged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep zfs in there (same with other graphdriver's error messages as well).

"error creating zfs mount" is much easier to search and find in code or to find other people reporting a similar error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, though, I don't think having unique error strings to aid developers in bug report analysis is a good strategy. It would be much better to provide some context in error messages (like file name and line number where the error happens). This is the way I was doing it back in my C days (using preprocessor macros). Alas, logrus doesn't do it. Perhaps it will get this functionality one day? Could be possible with debug.Stack() or smth.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"errors.WithStack" from "github.com/pkg/errors"... but still a good error message that signals where the error came from is important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we always want to ignore the EINVAL case (a similar check is in RecursiveUnmount); perhaps move this to unmount() and return nil if the error is due to sys call.EINVAL;

diff --git a/pkg/mount/mount.go b/pkg/mount/mount.go
index 12f968cbb8..3edd031acf 100644
--- a/pkg/mount/mount.go
+++ b/pkg/mount/mount.go
@@ -4,9 +4,7 @@ import (
 	"sort"
 	"strconv"
 	"strings"
-	"syscall"
 
-	"github.com/pkg/errors"
 	"github.com/sirupsen/logrus"
 )
 
@@ -125,12 +123,7 @@ func ForceMount(device, target, mType, options string) error {
 // Unmount lazily unmounts a filesystem on supported platforms, otherwise
 // does a normal unmount.
 func Unmount(target string) error {
-	err := unmount(target, mntDetach)
-	if err != nil && errors.Cause(err) == syscall.EINVAL {
-		// ignore "not mounted" error
-		err = nil
-	}
-	return err
+	return unmount(target, mntDetach)
 }
 
 // RecursiveUnmount unmounts the target and all mounts underneath, starting with
@@ -150,16 +143,6 @@ func RecursiveUnmount(target string) error {
 		logrus.Debugf("Trying to unmount %s", m.Mountpoint)
 		err = unmount(m.Mountpoint, mntDetach)
 		if err != nil {
-			// If the error is EINVAL either this whole package is wrong (invalid flags passed to unmount(2)) or this is
-			// not a mountpoint (which is ok in this case).
-			// Meanwhile calling `Mounted()` is very expensive.
-			//
-			// We've purposefully used `syscall.EINVAL` here instead of `unix.EINVAL` to avoid platform branching
-			// Since `EINVAL` is defined for both Windows and Linux in the `syscall` package (and other platforms),
-			//   this is nicer than defining a custom value that we can refer to in each platform file.
-			if errors.Cause(err) == syscall.EINVAL {
-				continue
-			}
 			if i == len(mounts)-1 {
 				if mounted, e := Mounted(m.Mountpoint); e != nil || mounted {
 					return err
diff --git a/pkg/mount/mounter_linux.go b/pkg/mount/mounter_linux.go
index bfa87a05ee..6f289e06d2 100644
--- a/pkg/mount/mounter_linux.go
+++ b/pkg/mount/mounter_linux.go
@@ -1,6 +1,9 @@
 package mount // import "github.com/docker/docker/pkg/mount"
 
 import (
+	"syscall"
+
+	"github.com/pkg/errors"
 	"golang.org/x/sys/unix"
 )
 
@@ -77,6 +80,16 @@ func unmount(target string, flag int) error {
 	if err == nil {
 		return nil
 	}
+	// If the error is EINVAL either this whole package is wrong (invalid flags passed to unmount(2)) or this is
+	// not a mountpoint (which is ok in this case).
+	// Meanwhile calling `Mounted()` is very expensive.
+	//
+	// We've purposefully used `syscall.EINVAL` here instead of `unix.EINVAL` to avoid platform branching
+	// Since `EINVAL` is defined for both Windows and Linux in the `syscall` package (and other platforms),
+	//   this is nicer than defining a custom value that we can refer to in each platform file.
+	if errors.Cause(err) == syscall.EINVAL {
+		return nil
+	}
 	return &Error{
 		op:     "umount",
 		target: target,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure; it'll be cleaner this way

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be "Failed to unmount" ? Ah, I see; the .Error() returns the "op" as part of the error, so this will include unmount.

Perhaps we should prefix that error with "failed to <op>" otherwise we'll have to add that everywhere 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am following the style of errors from os package (and the general unix error style) which is in the form <op>: <error>. In fact it makes sense here as well, something that would look like:

... level=warning msg="umount <dir>: device busy"

which gives no less info than

... level=warning msg="Failed to umount <dir>: device busy"

so I'll drop the "failed to" from this place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have the error start with "failed to " (e.g.)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, all the error messages (say from os package) are in the form op: msg, for example

open: permission denied

or

readlink: no such file or directory

I think we should follow the pattern. Brevity is the soul of wit 😸, besides, "failed to" is redundant here -- it's an error, so it is implied that something has failed.

@codecov
Copy link

codecov bot commented Oct 25, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@3e44f58). Click here to learn what that means.
The diff coverage is 27.69%.

@@            Coverage Diff            @@
##             master   #38068   +/-   ##
=========================================
  Coverage          ?    36.1%           
=========================================
  Files             ?      610           
  Lines             ?    45296           
  Branches          ?        0           
=========================================
  Hits              ?    16352           
  Misses            ?    26706           
  Partials          ?     2238

@thaJeztah
Copy link
Member

Seeing one more failure;

18:17:27 pkg/mount/mounter_freebsd.go:1::warning: file is not goimported (goimports)

@kolyshkin
Copy link
Contributor Author

Seeing one more failure

I forgot a hunk :( should be fixed now

@kolyshkin
Copy link
Contributor Author

@kolyshkin
Copy link
Contributor Author

PPC failure is #36903

syscall.Stat (and Lstat), unlike functions from os pkg,
return "raw" errors (like EPERM or EINVAL), and those are
propagated up the function call stack unchanged, and gets
logged and/or returned to the user as is.

Wrap those into os.PathError{} so the error message will
at least have function name and file name.

Note we use Capitalized function names to distinguish
between functions in os and ours.

Signed-off-by: Kir Kolyshkin <[email protected]>
As standard mount.Unmount does what we need, let's use it.

In addition, this adds ignoring "not mounted" condition, which
was previously implemented (see PR#33329, commit cfa2591)
via a very expensive call to mount.Mounted().

Signed-off-by: Kir Kolyshkin <[email protected]>
The function is not needed as it's just a shallow wrapper around
unix.Mount().

Signed-off-by: Kir Kolyshkin <[email protected]>
It has been pointed out that we're ignoring EINVAL from umount(2)
everywhere, so let's move it to a lower-level function. Also, its
implementation should be the same for any UNIX incarnation, so
let's consolidate it.

Signed-off-by: Kir Kolyshkin <[email protected]>
The errors returned from Mount and Unmount functions are raw
syscall.Errno errors (like EPERM or EINVAL), which provides
no context about what has happened and why.

Similar to os.PathError type, introduce mount.Error type
with some context. The error messages will now look like this:

> mount /tmp/mount-tests/source:/tmp/mount-tests/target, flags: 0x1001: operation not permitted

or

> mount tmpfs:/tmp/mount-test-source-516297835: operation not permitted

Before this patch, it was just

> operation not permitted

[v2: add Cause()]
[v3: rename MountError to Error, document Cause()]
[v4: fixes; audited all users]
[v5: make Error type private; changes after @cpuguy83 reviews]

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin
Copy link
Contributor Author

Rebased; conflicts resolved.

I believe this one is ready to be merged and is very helpful is diagnosing bugs (issue of the day this PR could greatly help with is #38252)

@kolyshkin
Copy link
Contributor Author

@cpuguy83 PTAL; I know you wanted volume mount errors to be wrapped as well but I think it could be done in a separate PR -- this one is more about system mounts (as in mount(2)).

Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants