Skip to content

Bump hcsshim to b3f49c06ffaeef24d09c6c08ec8ec8425a0303e2#40128

Closed
vikramhh wants to merge 1 commit intomoby:masterfrom
vikramhh:bump_hcsshim
Closed

Bump hcsshim to b3f49c06ffaeef24d09c6c08ec8ec8425a0303e2#40128
vikramhh wants to merge 1 commit intomoby:masterfrom
vikramhh:bump_hcsshim

Conversation

@vikramhh
Copy link

@vikramhh vikramhh commented Oct 24, 2019

rebased on top of #40154

Among other things, this is required to pull in
microsoft/hcsshim#718 which should
take care of multiple issues reported to us.

full diff: microsoft/hcsshim@2226e08...6c7177e

includes:

Note that this is a temporary workaround for a bug in the platform, and will be reverted once that is addressed:

- What I did

- How I did it

- How to verify it

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@thaJeztah
Copy link
Member


vendor/github.com/Microsoft/hcsshim/internal/schema2/properties.go:13:2: cannot find package "github.com/containerd/cgroups/stats/v1" in any of:
 	/go/src/github.com/docker/docker/vendor/github.com/containerd/cgroups/stats/v1 (vendor tree)
 	/usr/local/go/src/github.com/containerd/cgroups/stats/v1 (from $GOROOT)
 	/go/src/github.com/containerd/cgroups/stats/v1 (from $GOPATH)

Looks like this needs a more recent version of the containerd/cgroups package to get https://github.com/containerd/cgroups/blob/master/stats/v1/, which was added in containerd/cgroups#100

@thaJeztah

This comment has been minimized.

@vikramhh
Copy link
Author

@thaJeztah @tiborvass - #40128 is in now. How can I trigger all checks to run on this change so that we could see if there are any build breaks? I recall @andrewhsu telling me that I need a member of maintainers group to do that.

@thaJeztah
Copy link
Member

@vikramhh strange; CI should trigger as usual, because you didn't make changes to the Jenkinsfile. Could you try a rebase/force push to see if Jenkins starts?

Note that our current CI doesn't run tests with Hyper-V isolation, but I have a "WIP" test branch that adds Windows 1903 and hyper-v isolation to Jenkins; #39846

I'll rebase that PR on top of this one to see if we can confirm the issue is resolved

@thaJeztah
Copy link
Member

oh; your branch has a merge-conflict; you need to rebase and re-run vndr github.com/Microsoft/hcsshim

@thaJeztah
Copy link
Member

@vikramhh I did a quick rebase and re-vendor

@thaJeztah
Copy link
Member

oh, I need to fix-up vendoring;

[2019-10-30T11:18:11.665Z]  D vendor/github.com/containerd/cgroups/stats/v1/doc.go
[2019-10-30T11:18:11.665Z]  D vendor/github.com/containerd/cgroups/stats/v1/metrics.pb.go
[2019-10-30T11:18:11.665Z]  D vendor/github.com/containerd/cgroups/stats/v1/metrics.proto

@thaJeztah thaJeztah force-pushed the bump_hcsshim branch 2 times, most recently from 92eca4c to c804ec6 Compare October 30, 2019 12:12
@thaJeztah
Copy link
Member

opened #40154 to update containerd/cgroups, and based this on top of that one

@thaJeztah thaJeztah force-pushed the bump_hcsshim branch 3 times, most recently from 344d1a0 to ba2ff63 Compare October 31, 2019 00:10
@thaJeztah
Copy link
Member

Seeing a lot of these failures on both RS1 and RS5;


[2019-10-30T14:22:37.513Z]     --- FAIL: TestDockerSuite/TestEventsContainerEventsAttrSort (3.17s)
[2019-10-30T14:22:37.513Z]         cli.go:29: assertion failed: 
[2019-10-30T14:22:37.513Z]             Command:  d:\CI\PR-39846\32\binary\docker.exe run --rm --name container-events-test busybox true
[2019-10-30T14:22:37.513Z]             ExitCode: 125
[2019-10-30T14:22:37.513Z]             Error:    exit status 125
[2019-10-30T14:22:37.513Z]             Stdout:   
[2019-10-30T14:22:37.513Z]             Stderr:   d:\CI\PR-39846\32\binary\docker.exe: Error response from daemon: process 1176 in container 01ef1c1ff2fddc4f76fdeb147bf724ce80a00bf46f90b4e3edf51531cf818426 encountered an error during hcsshim::Process::StdioLegacy: failure in a Windows system call: Element not found. (0x490).
[2019-10-30T14:22:37.513Z]             
[2019-10-30T14:22:37.513Z]             
[2019-10-30T14:22:37.513Z]             Failures:
[2019-10-30T14:22:37.513Z]             ExitCode was 125 expected 0
[2019-10-30T14:22:37.513Z]             Expected no error


[2019-10-30T13:57:29.000Z]     --- FAIL: TestDockerSuite/TestRunWorkingDirectory (4.12s)
[2019-10-30T13:57:29.000Z]         cli.go:29: assertion failed: 
[2019-10-30T13:57:29.000Z]             Command:  d:\CI\PR-40128\11\binary\docker.exe run --workdir C:/Windows busybox pwd
[2019-10-30T13:57:29.000Z]             ExitCode: 125
[2019-10-30T13:57:29.000Z]             Error:    exit status 125
[2019-10-30T13:57:29.000Z]             Stdout:   
[2019-10-30T13:57:29.000Z]             Stderr:   d:\CI\PR-40128\11\binary\docker.exe: Error response from daemon: process 4760 in container 0860bbbb4a73487c39416f46ecd6600b62e88d6627d89dd0fa6f19f1b0c261bb encountered an error during hcsshim::Process::StdioLegacy: failure in a Windows system call: Element not found. (0x490).
[2019-10-30T13:57:29.000Z]             
[2019-10-30T13:57:29.000Z]             
[2019-10-30T13:57:29.000Z]             Failures:
[2019-10-30T13:57:29.000Z]             ExitCode was 125 expected 0
[2019-10-30T13:57:29.000Z]             Expected no error

@vikramhh
Copy link
Author

vikramhh commented Oct 31, 2019

0x490 is ERROR_NOT_FOUND - this is very likely coming from makeOpenFiles in internal/hcs/utils.go

@kevpar - could you please take a look.

@thaJeztah
Copy link
Member

0x490 is ERROR_NOT_FOUND [https://golang.org/pkg/syscall/?GOOS=windows] - this is very likely coming from makeOpenFiles in internal/hcs/utils.go

Ok, so

func (s *DockerSuite) TestRunWorkingDirectory(c *testing.T) {
dir := "/root"
image := "busybox"
if testEnv.OSType == "windows" {
dir = `C:/Windows`
}
// First with -w
out, _ := dockerCmd(c, "run", "-w", dir, image, "pwd")
out = strings.TrimSpace(out)
if out != dir {
c.Errorf("-w failed to set working directory")
}
// Then with --workdir
out, _ = dockerCmd(c, "run", "--workdir", dir, image, "pwd")
out = strings.TrimSpace(out)
if out != dir {
c.Errorf("--workdir failed to set working directory")
}
}
starts a container with --workdir=C:\Windows. On Linux, specifying a non-existing directory as workdir, will create the working directory if it's not there. So it could be that something in the code is first trying to change to the directory, and then falling back to creating?

(I assume C:\Windows is in the image, but adding just in case)

@ebalders
Copy link

ebalders commented Nov 8, 2019

Any ETA on when this will be merged in?

@thaJeztah
Copy link
Member

@ebalders there's a regression in this hcsshim version, which is currently blocking this from being merged

@kevpar
Copy link
Contributor

kevpar commented Nov 8, 2019

I am taking a look at this.

@kevpar
Copy link
Contributor

kevpar commented Nov 8, 2019

@thaJeztah I'm not familiar with the moby Jenkins CI. Is there a command I can run to repro the CI tests locally?

@vikramhh vikramhh changed the title Bump hcsshim to 6c7177eae8be632af2e15e44b62d69ab18389ddb Bump hcsshim to b3f49c06ffaeef24d09c6c08ec8ec8425a0303e2 Nov 15, 2019
@vikramhh
Copy link
Author

vikramhh commented Nov 15, 2019

Changed the title to reflect the fact that we had to bump hcsshim further to take in microsoft/hcsshim#737

@thaJeztah
Copy link
Member

Looks like you may have to run vndr again;

[2019-11-15T22:42:52.013Z] The result of vndr differs
[2019-11-15T22:42:52.013Z] 
[2019-11-15T22:42:52.013Z]  D vendor/github.com/containerd/cgroups/stats/v1/metrics.pb.txt
[2019-11-15T22:42:52.013Z]  M vendor/github.com/containerd/cgroups/stats/v1/metrics.proto
[2019-11-15T22:42:52.013Z] 
[2019-11-15T22:42:52.013Z] Please vendor your package with github.com/LK4D4/vndr.
[2019-11-15T22:42:52.013Z] 
[2019-11-15T22:42:52.013Z] diff --git a/vendor/github.com/containerd/cgroups/stats/v1/metrics.proto b/ve
[2019-11-15T22:42:52.013Z] ndor/github.com/containerd/cgroups/stats/v1/metrics.proto
[2019-11-15T22:42:52.013Z] index e821e8407..ba6be851d 100644
[2019-11-15T22:42:52.013Z] --- a/vendor/github.com/containerd/cgroups/stats/v1/metrics.proto
[2019-11-15T22:42:52.013Z] +++ b/vendor/github.com/containerd/cgroups/stats/v1/metrics.proto
[2019-11-15T22:42:52.013Z] @@ -135,6 +135,7 @@ message NetworkStat {
[2019-11-15T22:42:52.013Z]  	uint64 tx_errors = 8;
[2019-11-15T22:42:52.013Z]  	uint64 tx_dropped = 9;
[2019-11-15T22:42:52.013Z]  }
[2019-11-15T22:42:52.013Z] +
[2019-11-15T22:42:52.013Z]  // CgroupStats exports per-cgroup statistics.
[2019-11-15T22:42:52.013Z]  message CgroupStats {
[2019-11-15T22:42:52.013Z]  	// number of tasks sleeping

@vikramhh
Copy link
Author

@thaJeztah - would you recommend running RS1 tests as well for this?

@thaJeztah
Copy link
Member

restarted CI with RS1 enabled as well

@vikramhh
Copy link
Author

@thaJeztah - the RS1 failures seem to be known flaky tests mentioned in #39885 - does this look good to go otherwise?

@thaJeztah
Copy link
Member

Yes, if it's ready for review, feel free to move it out of draft

@vikramhh vikramhh marked this pull request as ready for review November 18, 2019 11:42
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vikramhh
Copy link
Author

@kolyshkin - Looks like I need two reviewers for getting a merge done. Could you please take a look.

@kolyshkin
Copy link
Contributor

I am seeing the following failure on Windows RS1

[2019-11-22T18:31:50.937Z] --- FAIL: TestDockerSuite/TestRunAttachFailedNoLeak (4.19s)

[2019-11-22T18:31:50.937Z] docker_cli_run_test.go:3948: assertion failed: expression is false: strings.Contains(out, "port is already allocated") || strings.Contains(out, "were not connected because a duplicate name exists") || strings.Contains(out, "The specified port already exists") || strings.Contains(out, "HNS failed with error : Failed to create endpoint") || strings.Contains(out, "HNS failed with error : The object already exists"): Output: d:\CI\PR-40128\20\binary\docker.exe: Error response from daemon: failed to create endpoint fail on network nat: hns failed with error : The object already exists.

and it looks like it's relevant.

@vikramhh I suggest you fix this test case to do case-insensitive comparison instead (just lowercase out before doing comparisons, and make sure all comprarison strings are also lowercased).

@kolyshkin
Copy link
Contributor

just lowercase out before doing comparisons, and make sure all comprarison strings are also lowercased

i.e. something like

diff --git a/integration-cli/docker_cli_run_test.go b/integration-cli/docker_cli_run_test.go
index af9044a92c..4271aba72e 100644
--- a/integration-cli/docker_cli_run_test.go
+++ b/integration-cli/docker_cli_run_test.go
@@ -3945,11 +3945,11 @@ func (s *DockerSuite) TestRunAttachFailedNoLeak(c *testing.T) {
        assert.Assert(c, err != nil, "Command should have failed but succeeded with: %s\nContainer 'test' [%+v]: %s\nContainer 'fail' [%+v]: %s", out, err1, out1, err2, out2)
        // check for windows error as well
        // TODO Windows Post TP5. Fix the error message string
-       assert.Assert(c, strings.Contains(out, "port is already allocated") ||
+       assert.Assert(c, strings.Contains(strings.ToLower(out), "port is already allocated") ||
                strings.Contains(out, "were not connected because a duplicate name exists") ||
-               strings.Contains(out, "The specified port already exists") ||
-               strings.Contains(out, "HNS failed with error : Failed to create endpoint") ||
-               strings.Contains(out, "HNS failed with error : The object already exists"), fmt.Sprintf("Output: %s", out))
+               strings.Contains(out, "the specified port already exists") ||
+               strings.Contains(out, "hns failed with error : failed to create endpoint") ||
+               strings.Contains(out, "hns failed with error : the object already exists"), fmt.Sprintf("Output: %s", out))
        dockerCmd(c, "rm", "-f", "test")
 
        // NGoroutines is not updated right away, so we need to wait before failing

@vikramhh vikramhh force-pushed the bump_hcsshim branch 2 times, most recently from 9b831b6 to f743ef4 Compare November 23, 2019 14:22
@vikramhh
Copy link
Author

@kolyshkin - Thanks for the review. I have fixed TestRunAttachFailedNoLeak per your suggestion and verified that it passes RS1 checks. Ready to merge unless you have any other comments.

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comments above

@thaJeztah
Copy link
Member

@vikramhh I see you updated the tests, but could you move that change to a separate commit, as requested by @kolyshkin ?

Among other things, this is required to pull in
microsoft/hcsshim#718 which should
take care of multiple issues reported to us.

Also fixes microsoft/hcsshim#737
which was caught by checks while attempting to bump
up hcsshim version.

Modifies TestRunAttachFailedNoLeak to do case-insensitive
comparison to fix a failure on RS1.

Signed-off-by: Vikram bir Singh <[email protected]>
@ebalders
Copy link

I see this has been merged, any idea when it will be integrated into a Docker release?

@kolyshkin
Copy link
Contributor

@ebalders this one was not merged; #40250 was. In there you will be able to see links to backports to 19.03 branch and monitor it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment