Skip to content

Conversation

@robmry
Copy link
Contributor

@robmry robmry commented Feb 24, 2025

- What I did

Commit 0546d90 introduced the use of ipset to reduce the number of rules that need to be processed per-packet, and make the code a bit simpler.

But, docker's used on embedded kernels compiled without support for ipset, so the change is too disruptive.

- How I did it

Replace the two ipset rules with a new chain that writes out the rule's actions long-hand. So ..

This rule:

      -A FORWARD -m set --match-set docker-ext-bridges-v4 dst \
        -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

Is transformed into a per-bridge rule in new chain DOCKER-CT:

      -A DOCKER-FORWARD -j DOCKER-CT
      -A DOCKER-CT -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
      -A DOCKER-CT -o bridge1 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

And:

      -A FORWARD -m set --match-set docker-ext-bridges-v4 dst -j DOCKER

Is transformed into a per-bridge rule in new chain DOCKER-BRIDGE:

      -A DOCKER-FORWARD -j DOCKER-BRIDGE
      -A DOCKER-BRIDGE -o docker0 -j DOCKER
      -A DOCKER-BRIDGE -o bridge1 -j DOCKER

- How to verify it

The usual tests.

Ran 28.0.0 with some networks, stopped it and updated to this PR, checked that the ipset rules were removed.

- Human readable description for the release notes

- Remove dependency on kernel modules `ip_set`, `ip_set_hash_net` and `netfilter_xt_set`.
  - The dependency was introduced in release 28.0.0 but proved too disruptive. The iptables rules using these modules have been replaced.

@robmry robmry changed the title Experimental - disable ip set Drop the use of ipset Feb 24, 2025
@robmry robmry self-assigned this Feb 24, 2025
@robmry robmry added this to the 28.0.1 milestone Feb 24, 2025
@robmry robmry force-pushed the disable_ip_set branch 4 times, most recently from 673e09a to 14c96a5 Compare February 24, 2025 21:08
@robmry robmry marked this pull request as ready for review February 25, 2025 11:27
Copy link
Member

@akerouanton akerouanton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

network that's made it through the DOCKER and isolation chains, whether the
destination is external or another network.

The DOCKER-CT chain is an early ACCEPT for any RELATED,ESTABLISHED traffic to a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminds me that we're lacking a proper --ctstate INVALID -j DROP but IIRC it could cause issues for some Docker users (although I don't recall the exact details). That's probably something we should add in the next major. I can raise a PR if you want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, probably best not to try to tackle that for 28.0.1 and risk introducing new issues.

@thaJeztah
Copy link
Member

I guess after this one, we can also revert the changes in check-config.sh ?

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM

(but I'm an iptables illiterate, so don't hold me to those parts)

defer func() {
if retErr != nil {
if err := iptable.RemoveExistingChain(DockerBridgeChain, iptables.Filter); err != nil {
log.G(context.TODO()).Warnf("failed on removing iptables FILTER chain %s on cleanup: %v", DockerBridgeChain, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a follow-up, we could use structured logs (at least WithError() for these. (and at some point see where we / if we can punch through contexts).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are lots of log lines following the same pattern in that function, we could sort them all out separately. (Or, not, because hopefully we'll delete all the iptables code soon ... first though, it's being moved to a separate package, to make it possible to swap in nftables / firewalld backends - that's in-progress, so I'd rather not make lots of changes here that might get lost when I rebase.

We haven't plumbed in context for any of the bridge driver's initialisation code (or most/all of libnetwork's initialisation code). We could also do that separately.


_, err = iptable.NewChain(DockerBridgeChain, iptables.Filter)
if err != nil {
return fmt.Errorf("failed to create FILTER chain %s: %v", DockerBridgeChain, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing the underlying error would never be relevant to us, beyond reporting "something failed"? (looking at %v vs %w)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I don't think we'd want to look at the underlying error ... most of the errors in that function use %v, so we could swap them for %w in a follow up.

Commit 0546d90 introduced the use of ipset to reduce the number
of rules that need to be processed per-packet, and make the code
a bit simpler.

But, docker's used on embedded kernels compiled without support
for ipset, so the change is too disruptive.

Replace the two ipset rules with a new chain that writes out the
rule's actions long-hand. So ..

This rule:
  -A FORWARD -m set --match-set docker-ext-bridges-v4 dst \
    -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
Is transformed into a per-bridge rule in new chain DOCKER-CT:
  -A DOCKER-FORWARD -j DOCKER-CT
  -A DOCKER-CT -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
  -A DOCKER-CT -o bridge1 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

And:
  -A FORWARD -m set --match-set docker-ext-bridges-v4 dst -j DOCKER
Is transformed into a per-bridge rule in new chain DOCKER-BRIDGE:
  -A DOCKER-FORWARD -j DOCKER-BRIDGE
  -A DOCKER-BRIDGE -o docker0 -j DOCKER
  -A DOCKER-BRIDGE -o bridge1 -j DOCKER

Signed-off-by: Rob Murray <[email protected]>
@robmry
Copy link
Contributor Author

robmry commented Feb 25, 2025

Rebased, and reworded some text in the iptablesdoc template (@vvoland) ...

-The FORWARD chain rules are numbered in the output above, they are:
+The FORWARD chain rules, explained in the order they appear in the output above, are:
-are appended. Referring to the numbers in the iptables output ...
+are appended. The DOCKER-FORWARD chain rules, explained in the order they appear in
+the output above, are:

@robmry
Copy link
Contributor Author

robmry commented Feb 25, 2025

I guess after this one, we can also revert the changes in check-config.sh ?

Oh, yes - will do.

Copy link
Contributor

@vvoland vvoland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

28.0.0 Causes the issue: failed to register "bridge" driver: invalid argument 28.0.0: Cannot start with iptables=true

4 participants