[wip] Update containerd runtime to v1.2.0 #37932

Closed
thaJeztah wants to merge 1 commit into moby:master from thaJeztah:bump_containerd_runtime_v1.2.0

@thaJeztah
Member

@thaJeztah thaJeztah commented Sep 28, 2018

Updated to v1.2.0. (c4446665cb9c30056f4998ed953e6d4ff22c7c39)

changes since the last bump: containerd/containerd@4055185...v1.2.0

updated to current master, which should be pretty close to v1.2.0 GA (not tagged yet), and includes the fix for the failures here;

diff since v1.2.0-rc.2: containerd/containerd@v1.2.0-rc.2...4055185

release notes: https://github.com/containerd/containerd/releases/tag/v1.2.0-rc.2

full diff since rc.1: containerd/containerd@v1.2.0-rc.1...v1.2.0-rc.2

full diff since rc.0: containerd/containerd@v1.2.0-rc.0...v1.2.0-rc.1

Possibly relevant changes:

  • New V2 Runtime with a stable gRPC interface for managing containers through
    external shims.
  • Updated CRI Plugin, validated against Kubernetes v1.11 and v1.12, but it is
    also compatible with Kubernetes v1.10.
  • Support for Kubernetes Runtime Class, introduced in Kubernetes 1.12
  • A new proxy plugin configuration has been added to allow external
    snapshotters to be connected to containerd using gRPC.
  • A new Install method on the containerd client allows users to publish host
    level binaries using standard container build tooling and container
    distribution tooling to download containerd related binaries on their systems.
  • Add support for cleaning up leases and content ingests to garbage collection.
  • Improved multi-arch image support using more precise matching and ranking
  • Some minor API additions

Member

@vdemeester vdemeester left a comment

LGTM 🐸

@thaJeztah
Member Author

Ok; looks like this is actually failing:

Janky timed out;

11:31:34 Build timed out (after 200 minutes). Marking the build as failed.
11:31:34 Build timed out (after 200 minutes). Marking the build as aborted.
11:31:34 Set build name.
11:31:34 New build name is '#37932'
11:31:34 Build was aborted

Experimental https://jenkins.dockerproject.org/job/Docker-PRs-experimental/42382/console

08:30:41 --- FAIL: TestContainerStartOnDaemonRestart (2.34s)
08:30:41     daemon.go:292: [d597195c12a96] waiting for daemon to start
08:30:41     daemon.go:324: [d597195c12a96] daemon started
08:30:41     daemon.go:292: [d597195c12a96] waiting for daemon to start
08:30:41     daemon.go:324: [d597195c12a96] daemon started
08:30:41     daemon_linux_test.go:65: assertion failed: error is not nil: Error response from daemon: monitor task: cgroup is already being collected: unknown: failed to start test container
08:30:41     daemon.go:282: [d597195c12a96] exiting daemon

PowerPC; same failure: https://jenkins.dockerproject.org/job/Docker-PRs-powerpc/11578/console

13:36:44 --- FAIL: TestContainerStartOnDaemonRestart (4.19s)
13:36:44     daemon.go:291: [dec29adbcc719] waiting for daemon to start
13:36:44     daemon.go:323: [dec29adbcc719] daemon started
13:36:44     daemon.go:291: [dec29adbcc719] waiting for daemon to start
13:36:44     daemon.go:323: [dec29adbcc719] daemon started
13:36:44     daemon_linux_test.go:65: assertion failed: error is not nil: Error response from daemon: monitor task: cgroup is already being collected: unknown: failed to start test container
13:36:44     daemon.go:281: [dec29adbcc719] exiting daemon

s390x (Z); same failure: https://jenkins.dockerproject.org/job/Docker-PRs-s390x/11444/console

09:45:49 --- FAIL: TestContainerStartOnDaemonRestart (2.94s)
09:45:49     daemon.go:291: [d4f1dc69ce69b] waiting for daemon to start
09:45:49     daemon.go:323: [d4f1dc69ce69b] daemon started
09:45:49     daemon.go:291: [d4f1dc69ce69b] waiting for daemon to start
09:45:49     daemon.go:323: [d4f1dc69ce69b] daemon started
09:45:49     daemon_linux_test.go:65: assertion failed: error is not nil: Error response from daemon: monitor task: cgroup is already being collected: unknown: failed to start test container
09:45:49     daemon.go:281: [d4f1dc69ce69b] exiting daemon

@thaJeztah
Member Author

I see the same failure was happening on my old PR; #37710 (comment)

@thaJeztah
Member Author

ping @tiborvass, perhaps you have an idea? (You ran into the same test failure a while back.)

@AkihiroSuda
Member

rc1 is available now

@thaJeztah thaJeztah force-pushed the bump_containerd_runtime_v1.2.0 branch from 381d64d to 72d92d4 Compare October 4, 2018 12:02
@codecov

codecov Bot commented Oct 4, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@a5e2dd2).
The diff coverage is n/a.

@@            Coverage Diff            @@
##             master   #37932   +/-   ##
=========================================
  Coverage          ?   36.11%           
=========================================
  Files             ?      610           
  Lines             ?    45216           
  Branches          ?        0           
=========================================
  Hits              ?    16328           
  Misses            ?    26649           
  Partials          ?     2239

@thaJeztah thaJeztah changed the title Update containerd runtime to v1.2.0-rc.0 Update containerd runtime to v1.2.0-rc.1 Oct 4, 2018
@thaJeztah thaJeztah force-pushed the bump_containerd_runtime_v1.2.0 branch from 72d92d4 to a5e8170 Compare October 5, 2018 10:40
@thaJeztah
Member Author

Rebased on #37710, to see if that fixes the problem

@thaJeztah
Member Author

Ok, that didn't help; still failing on the same test.

Here's the daemon log from an earlier run (without the containerd client bump):

time="2018-10-04T15:47:40.101249463Z" level=info msg="API listen on /tmp/docker-integration/d5e35f489c854.sock"
time="2018-10-04T15:47:40.104573736Z" level=debug msg="Calling GET /_ping"
time="2018-10-04T15:47:40.106633678Z" level=debug msg="Calling GET /info"
time="2018-10-04T15:47:40.134842186Z" level=debug msg="Calling POST /v1.39/containers/a264029c90f49ad4ce973f4780234f7ad780bde41d39b9255dc48af6060408f9/start"
time="2018-10-04T15:47:40.136141754Z" level=debug msg="container mounted via layerStore: &{/go/src/github.com/docker/docker/bundles/test-integration/d5e35f489c854/root/overlay2/0d5167e2a76dd9b0260451a03173c3bd5f0ae0eb8819ac532842621bda76ab08/merged 0x563118c264c0 0x563118c264c0}"
time="2018-10-04T15:47:40.136561020Z" level=debug msg="Assigning addresses for endpoint naughty_chatelet's interface on network bridge"
time="2018-10-04T15:47:40.136592480Z" level=debug msg="RequestAddress(LocalDefault/172.18.0.0/16, <nil>, map[])"
time="2018-10-04T15:47:40.136624967Z" level=debug msg="Request address PoolID:172.18.0.0/16 App: ipam/default/data, ID: LocalDefault/172.18.0.0/16, DBIndex: 0x0, Bits: 65536, Unselected: 65533, Sequence: (0xc0000000, 1)->(0x0, 2046)->(0x1, 1)->end Curr:0 Serial:false PrefAddress:<nil> "
time="2018-10-04T15:47:40.139394140Z" level=debug msg="Assigning addresses for endpoint naughty_chatelet's interface on network bridge"
time="2018-10-04T15:47:40.142740759Z" level=debug msg="Programming external connectivity on endpoint naughty_chatelet (0ed72fa5797217710c9f61181d783182620e48bdde18e5642d525144c9e9b008)"
time="2018-10-04T15:47:40.143331099Z" level=debug msg="EnableService a264029c90f49ad4ce973f4780234f7ad780bde41d39b9255dc48af6060408f9 START"
time="2018-10-04T15:47:40.143356155Z" level=debug msg="EnableService a264029c90f49ad4ce973f4780234f7ad780bde41d39b9255dc48af6060408f9 DONE"
time="2018-10-04T15:47:40.145770162Z" level=debug msg="bundle dir created" bundle=/tmp/dxr/d5e35f489c854/containerd/a264029c90f49ad4ce973f4780234f7ad780bde41d39b9255dc48af6060408f9 module=libcontainerd namespace=moby root=/go/src/github.com/docker/docker/bundles/test-integration/d5e35f489c854/root/overlay2/0d5167e2a76dd9b0260451a03173c3bd5f0ae0eb8819ac532842621bda76ab08/merged
time="2018-10-04T15:47:40.399479908Z" level=debug msg="sandbox set key processing took 110.297666ms for container a264029c90f49ad4ce973f4780234f7ad780bde41d39b9255dc48af6060408f9"
time="2018-10-04T15:47:40.506001325Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/create
time="2018-10-04T15:47:40.506504381Z" level=error msg="stream copy error: read /proc/self/fd/19: file already closed"
time="2018-10-04T15:47:40.506573964Z" level=error msg="stream copy error: read /proc/self/fd/20: file already closed"
time="2018-10-04T15:47:40.518577066Z" level=error msg="failed to delete failed start container" container=a264029c90f49ad4ce973f4780234f7ad780bde41d39b9255dc48af6060408f9 error="cannot delete running task a264029c90f49ad4ce973f4780234f7ad780bde41d39b9255dc48af6060408f9: failed precondition"
time="2018-10-04T15:47:40.523188576Z" level=debug msg="Revoking external connectivity on endpoint naughty_chatelet (0ed72fa5797217710c9f61181d783182620e48bdde18e5642d525144c9e9b008)"
time="2018-10-04T15:47:40.523741694Z" level=debug msg="DeleteConntrackEntries purged ipv4:0, ipv6:0"
time="2018-10-04T15:47:40.615004148Z" level=debug msg="Releasing addresses for endpoint naughty_chatelet's interface on network bridge"
time="2018-10-04T15:47:40.615059361Z" level=debug msg="ReleaseAddress(LocalDefault/172.18.0.0/16, 172.18.0.2)"
time="2018-10-04T15:47:40.615096593Z" level=debug msg="Released address PoolID:LocalDefault/172.18.0.0/16, Address:172.18.0.2 Sequence:App: ipam/default/data, ID: LocalDefault/172.18.0.0/16, DBIndex: 0x0, Bits: 65536, Unselected: 65532, Sequence: (0xe0000000, 1)->(0x0, 2046)->(0x1, 1)->end Curr:3"
time="2018-10-04T15:47:40.678451004Z" level=error msg="a264029c90f49ad4ce973f4780234f7ad780bde41d39b9255dc48af6060408f9 cleanup: failed to delete container from containerd: cannot delete running task a264029c90f49ad4ce973f4780234f7ad780bde41d39b9255dc48af6060408f9: failed precondition"
time="2018-10-04T15:47:40.678545724Z" level=error msg="Handler for POST /v1.39/containers/a264029c90f49ad4ce973f4780234f7ad780bde41d39b9255dc48af6060408f9/start returned error: monitor task: cgroup is already being collected: unknown"
time="2018-10-04T15:47:40.680743288Z" level=debug msg="Calling DELETE /v1.39/containers/a264029c90f49ad4ce973f4780234f7ad780bde41d39b9255dc48af6060408f9?force=1"
time="2018-10-04T15:47:40.689905866Z" level=info msg="Processing signal 'interrupt'"
time="2018-10-04T15:47:40.690034120Z" level=debug msg="daemon configured with a 15 seconds minimum shutdown timeout"
time="2018-10-04T15:47:40.690170179Z" level=debug msg="start clean shutdown of all containers with a 15 seconds timeout..."
time="2018-10-04T15:47:40.691324619Z" level=debug msg="Unix socket /tmp/dxr/d5e35f489c854/libnetwork/2d75bb866ce6b3a84b0c820e460c478f65245b97baadaf1e08c7383aa5fcbe97.sock doesn't exist. cannot accept client connections"
time="2018-10-04T15:47:40.691422182Z" level=debug msg="Cleaning up old mountid : start."
time="2018-10-04T15:47:40.692104928Z" level=debug msg="Cleaning up old mountid : done."
time="2018-10-04T15:47:40.692626051Z" level=debug msg="Clean shutdown succeeded"

Looks like this may be part of the problem ("cannot delete running task"). This is logged after the daemon and containerd are started again:

time="2018-10-04T15:47:40.518577066Z" level=error msg="failed to delete failed start container" container=a264029c90f49ad4ce973f4780234f7ad780bde41d39b9255dc48af6060408f9 error="cannot delete running task a264029c90f49ad4ce973f4780234f7ad780bde41d39b9255dc48af6060408f9: failed precondition"
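
To pull out only the error-level entries from a daemon log like the one above, a small filter can help. The following is a minimal sketch and not part of moby: `parseLogfmt` and `errorMessages` are hypothetical helper names, and the parser assumes logrus's default text format (`key=value` pairs, with double-quoted values) and simplifies escape handling.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseLogfmt splits one log line into key/value pairs.
// Assumption: the line uses logrus's text format (key=value, values
// optionally double-quoted). A backslash inside a quoted value simply
// keeps the next character literally, which is a simplification.
func parseLogfmt(line string) map[string]string {
	fields := make(map[string]string)
	i, n := 0, len(line)
	for i < n {
		if line[i] == ' ' {
			i++
			continue
		}
		// Read the key up to '=' or a space.
		start := i
		for i < n && line[i] != '=' && line[i] != ' ' {
			i++
		}
		key := line[start:i]
		if i >= n || line[i] != '=' {
			// Bare word without a value; record it with an empty value.
			if key != "" {
				fields[key] = ""
			}
			continue
		}
		i++ // consume '='
		var val strings.Builder
		if i < n && line[i] == '"' {
			i++ // consume the opening quote
			for i < n && line[i] != '"' {
				if line[i] == '\\' && i+1 < n {
					i++ // skip the backslash, keep the escaped character
				}
				val.WriteByte(line[i])
				i++
			}
			if i < n {
				i++ // consume the closing quote
			}
		} else {
			for i < n && line[i] != ' ' {
				val.WriteByte(line[i])
				i++
			}
		}
		fields[key] = val.String()
	}
	return fields
}

// errorMessages returns the msg field of every line whose level is "error".
func errorMessages(log string) []string {
	var msgs []string
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		f := parseLogfmt(sc.Text())
		if f["level"] == "error" {
			msgs = append(msgs, f["msg"])
		}
	}
	return msgs
}

func main() {
	// Two lines copied from the daemon log above.
	log := `time="2018-10-04T15:47:40.506001325Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/create
time="2018-10-04T15:47:40.506504381Z" level=error msg="stream copy error: read /proc/self/fd/19: file already closed"`
	for _, m := range errorMessages(log) {
		fmt.Println(m)
	}
}
```

Run against the full log, this would leave only the "stream copy error", "failed to delete failed start container", and "cleanup: failed to delete container from containerd" entries, plus the handler error for the start request.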

@thaJeztah thaJeztah force-pushed the bump_containerd_runtime_v1.2.0 branch from a5e8170 to cfc6e08 Compare October 5, 2018 12:00
@thaJeztah
Member Author

@crosbymichael do you have any ideas on that failure ^^? I won't have time to look into it this evening 🤗

@thaJeztah thaJeztah force-pushed the bump_containerd_runtime_v1.2.0 branch from 783d342 to 7e4bba1 Compare October 25, 2018 09:19
@thaJeztah thaJeztah changed the title [wip] Update containerd runtime to v1.2.0-rc.2 [wip] Update containerd runtime to v1.2.0 Oct 25, 2018
@thaJeztah
Member Author

Looks like CI is hanging after completion;

12:20:38 Build timed out (after 180 minutes). Marking the build as failed.
12:20:38 Build timed out (after 180 minutes). Marking the build as aborted.
14:20:15 Build timed out (after 300 minutes). Marking the build as failed.
14:20:15 Build timed out (after 300 minutes). Marking the build as aborted.

Which looks the same as #38062 (comment)

CI seems to be hanging because of leaked containerd-shim processes created while docker-py ran.

Logs and processes in: https://gist.github.com/tonistiigi/a5739d2c0f82442c0317bea044cc56e2

From the logs: the test ran a DELETE request that failed with

time="2018-10-23T19:03:57.531600098Z" level=error msg="fc42a122b164349bced9e08588f6782f65de38e823b5649467533cd07d359464 cleanup: failed to delete container from containerd: cannot delete running task fc42a122b164349bced9e08588f6782f65de38e823b5649467533cd07d359464: failed precondition"

and leaked the shim process.

@thaJeztah
Member Author

thaJeztah commented Oct 25, 2018

Linking, just in case there's a relation: #37072. Actually, that's about images, not containers.

release notes: https://github.com/containerd/containerd/releases/tag/v1.2.0

- New V2 Runtime with a stable gRPC interface for managing containers through
  external shims.
- Updated CRI Plugin, validated against Kubernetes v1.11 and v1.12, but it is
  also compatible with Kubernetes v1.10.
- Support for Kubernetes Runtime Class, introduced in Kubernetes 1.12
- A new proxy plugin configuration has been added to allow external
  snapshotters to be connected to containerd using gRPC.
- A new Install method on the containerd client allows users to publish host
  level binaries using standard container build tooling and container
  distribution tooling to download containerd related binaries on their systems.
- Add support for cleaning up leases and content ingests to garbage collection.
- Improved multi-arch image support using more precise matching and ranking
- Added a runtime `options` field for shim v2 runtime. Use the `options` field to
  config runtime specific options, e.g. `NoPivotRoot` and `SystemdCgroup` for
  runtime type `io.containerd.runc.v1`.
- Some Minor API additions
  - Add `ListStream` method to containers API. This allows listing a larger
    number of containers without hitting message size limits.
  - Add `Sync` flag to `Delete` in leases API. Setting this option will ensure
    a garbage collection completes before the removal call is returned. This can
    be used to guarantee unreferenced objects are removed from disk after a lease.

Signed-off-by: Sebastiaan van Stijn <[email protected]>
@thaJeztah thaJeztah force-pushed the bump_containerd_runtime_v1.2.0 branch from 480f9c2 to c8c5c15 Compare November 12, 2018 16:45
@thaJeztah
Member Author

Rebased, after #38128 was merged (which updates runc)

@thaJeztah
Member Author

Closing, because #38168 was merged, which has these changes

@thaJeztah thaJeztah closed this Nov 20, 2018
@thaJeztah thaJeztah deleted the bump_containerd_runtime_v1.2.0 branch November 20, 2018 17:02