Skip to content

daemon: release sandbox even when NetworkDisabled#46651

Merged
thaJeztah merged 1 commit intomoby:masterfrom
akerouanton:releaseNetwork-NetworkDisabled
Oct 16, 2023
Merged

daemon: release sandbox even when NetworkDisabled#46651
thaJeztah merged 1 commit intomoby:masterfrom
akerouanton:releaseNetwork-NetworkDisabled

Conversation

@akerouanton
Copy link
Copy Markdown
Member

@akerouanton akerouanton commented Oct 16, 2023

- What I did

When the default bridge is disabled by setting dockerd's --bridge=none option, the daemon still creates a sandbox for containers with no network attachment specified. In that case NetworkDisabled will be set to true.

However, currently the releaseNetwork call will early return if NetworkDisabled is true. Thus, these sandboxes won't be deleted until the daemon is restarted. If a high number of such containers are created, the daemon would then take few minutes to start.

As a side note, NetworkDisabled semantics is weird/broken and should be revised:

  • On one hand a sandbox is created even if NetworkDisbled is set. Thus it allows these containers to be manually connected to other networks;
  • OTOH, when manually connecting such container to a network nothing happens and no error is returned (ie. no interface and no route provisioned, no embedded DNS, etc...);

- Description for the changelog

  • Fix a bug that would prevent network sandboxes to be properly deleted when stopping containers with no network attachment are specified and dockerd's --bridge=none option is specified.

- A picture of a cute animal (not mandatory but encouraged)

When the default bridge is disabled by setting dockerd's `--bridge=none`
option, the daemon still creates a sandbox for containers with no
network attachment specified. In that case `NetworkDisabled` will be set
to true.

However, currently the `releaseNetwork` call will early return if
NetworkDisabled is true. Thus, these sandboxes won't be deleted until
the daemon is restarted. If a high number of such containers are
created, the daemon would then take few minutes to start.

See moby#42461.

Signed-off-by: payall4u <[email protected]>
Signed-off-by: Albin Kerouanton <[email protected]>
@thaJeztah
Copy link
Copy Markdown
Member

OTOH, when manually connecting such container to a network nothing happens and no error is returned (ie. no interface and no route provisioned, no embedded DNS, etc...);

Yeah, that's confusing.

I also recall having discussions about --network=none and if it would be technically possible to connect a container to a network "after the fact". I.e., we currently don't allow a container to be connected to a network once it's created with none network;

docker network create foo

# this creates a container with a sandbox, but no networking connection
docker run -d --network=none --name hello nginx:alpine

# and connecting to a network produces an error
docker network connect foo hello
Error response from daemon: container cannot be connected to multiple networks with one of the networks in private (none) mode

I could see use-cases where you'd want to create a container without networking (at first), and later connect it to a network, but not sure what'd be involved in doing so (if possible!)

Comment on lines +972 to 973
// TODO(aker): If we hit this case, the endpoint state won't be cleaned up (ie. no call to cleanOperationalData).
if daemon.netController == nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if there's an actual case where daemon.netController CAN be nil 🤔

I'm considering that that should (probably?) never happen, and if it would we should probably make that a fatal error, and exit (before we even reach this code)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh actually my comment isn't about an hypothetical situation:

  • (*Daemon).releaseNetwork is called by (*Daemon).Cleanup:
    daemon.releaseNetwork(container)
  • Which is called by (*Daemon).restore on L449:
    daemon.Cleanup(c)
  • But netController is initialized by calling (*Daemon).initNetworkingController on L530:
    if err = daemon.initNetworkController(&cfg.Config, activeSandboxes); err != nil {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, I recall the order of initialisation being... fun.. I once introduced a regression because of some of that; #43130 (comment)

Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah
Copy link
Copy Markdown
Member

should we cherry-pick this fix?

@thaJeztah
Copy link
Copy Markdown
Member

@neersighted LGTY?

@thaJeztah
Copy link
Copy Markdown
Member

Let me bring this one in; just wanted to have a second set of eyes before merging 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stale network sandboxes causing long docker startup time Bug: docker hang on restarting when bridge set to none.

4 participants