libnetwork, libn/d/overlay: handle coalesced updates from NetworkDB#50393
libnetwork, libn/d/overlay: handle coalesced updates from NetworkDB#50393corhere merged 6 commits intomoby:masterfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR replaces the legacy Create/Update/Delete event types in NetworkDB with a unified WatchEvent that carries both previous and current values, and updates all watchers and driver interfaces to handle coalesced updates correctly.
- Introduce
WatchEventwithPrev/Valuefields and helper methods. - Update all broadcaster and watcher code (NetworkDB, drivers, agent) to emit and consume
WatchEvent. - Remove old stub
EventNotify/DecodeTableEntryimplementations and adapt tests to useWatchEvent.
Reviewed Changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| libnetwork/networkdb/watch.go | Define WatchEvent, update Watch filtering |
| libnetwork/networkdb/*.go | Refactor broadcasters and handlers to use WatchEvent |
| libnetwork/driverapi/driverapi.go | Added TableWatcher interface for coalesced events |
| libnetwork/drivers/... | Remove old stubs, implement new EventNotify |
| libnetwork/networkdb/tableevent_test.go | Update tests to expect WatchEvent |
| libnetwork/networkdb/networkdb_test.go | Adapt testWatch helper to validate Prev/Value |
| libnetwork/agent.go | Integrate TableWatcher events in Services and handlers |
| internal/iterutil/iterutil.go | Add SameValues/Deref utilities |
| internal/iterutil/iterutil_test.go | New tests for SameValues/Deref |
Comments suppressed due to low confidence (1)
| case DeleteEvent: | ||
| evt = event(ev) | ||
| } | ||
| evt := ev.(WatchEvent) |
There was a problem hiding this comment.
The unguarded type assertion ev.(WatchEvent) can panic if a non-WatchEvent is received; consider using a safe assertion (e, ok := ev.(WatchEvent)) or a type switch to prevent runtime panics.
| evt := ev.(WatchEvent) | |
| evt, ok := ev.(WatchEvent) | |
| if !ok { | |
| return false | |
| } |
There was a problem hiding this comment.
This is intentional. I want it to fail fast and loudly if events of the wrong type are received, as that would be a bug
| } else { | ||
| ep.info = info | ||
| eps[epID] = ep | ||
| if d, ok := d.(driverapi.TableWatcher); ok { |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
| func TestDeref(t *testing.T) { | ||
| a := make([]*int, 3) | ||
| for i := range a { | ||
| a[i] = &i |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
|
Looks like there's a compile failure on Windows at least; |
e8c508f to
739e533
Compare
| switch ev.(type) { | ||
| case networkdb.CreateEvent, networkdb.UpdateEvent: | ||
| switch { | ||
| case event.Value != nil: |
There was a problem hiding this comment.
Using the event.IsX methods might make this clearer?
There was a problem hiding this comment.
Yes, in retrospect I agree. I'll save this sort of construct for the following commit where the implementation gets corrected.
739e533 to
f147ed8
Compare
austinvazquez
left a comment
There was a problem hiding this comment.
The networking stack is new to me, but 👍🏼 from a Go/delta POV. No blockers from me.
| _ map[MACAddr]bool | ||
| _ map[IPMAC]bool |
akerouanton
left a comment
There was a problem hiding this comment.
In your 2nd commit, introducing TableWatcher, you added an interface assertion to the Linux overlay driver, but not to the Windows overlay driver. Is this on purpose?
Otherwise LGTM
Nope, that's an oversight. Nice catch! |
When handling updates to existing entries, it is often necessary to know what the previous value was. NetworkDB knows the previous and new values when it broadcasts an update event for an entry. Include both values in the update event so the watchers do not have to do their own parallel bookkeeping. Unify the event types under WatchEvent as representing the operation kind in the type system has been inconvenient, not useful. The operation is now implied by the nilness of the Value and Prev event fields. Signed-off-by: Cory Snider <[email protected]>
Overlay is the only driver which makes use of the EventNotify facility, yet all other driver implementations are forced to provide a stub implementation. Move the EventNotify and DecodeTableEntry methods into a new optional TableWatcher interface and remove the stubs from all the other drivers. Signed-off-by: Cory Snider <[email protected]>
The macAddr and ipmac types are generally useful within libnetwork. Move them to a dedicated package and overhaul the API to be more like that of the net/netip package. Update the overlay driver to utilize these types, adapting to the new API. Signed-off-by: Cory Snider <[email protected]>
Windows and Linux overlay driver instances are interoperable, working from the same NetworkDB table for peer discovery. As both drivers produce and consume serialized data through the table, they both need to have a shared understanding of the shape and semantics of that data. The Windows overlay driver contains a duplicate copy of the protobuf definitions used for marshaling and unmarshaling the NetworkDB peer entries for dubious reasons. It gives us the flexibility to have the definitions diverge, which is only really useful for shooting ourselves in the foot. Make daemon/libnetwork/drivers/overlay the source of truth for the peer record definitions and the name of the NetworkDB table for distributing peer records. Signed-off-by: Cory Snider <[email protected]>
The eventually-consistent nature of NetworkDB means we cannot depend on events being received in the same order that they were sent. Nor can we depend on receiving events for all intermediate states. It is possible for a series of entry UPDATEs, or a DELETE followed by a CREATE with the same key, to get coalesced into a single UPDATE event on the receiving node. Watchers of NetworkDB tables therefore need to be prepared to gracefully handle arbitrary UPDATEs of a key, including those where the new value may have nothing in common with the previous value. The overlay driver naively handled events for overlay_peer_table assuming that an endpoint leave followed by a rejoin of the same endpoint would always be expressed as a DELETE event followed by a CREATE. It would handle a coalesced UPDATE as a CREATE, inserting a new entry into peerDB without removing the old one. This would have various side effects, such as having the "transient state" of multiple entries in peerDB with the same peer IP never settle. Update driverapi to pass both the previous and new value of a table entry into the driver. Modify the overlay driver to handle an UPDATE by removing the previous peer entry from peerDB then adding the new one. Modify the Windows overlay driver to match. Signed-off-by: Cory Snider <[email protected]>
The eventually-consistent nature of NetworkDB means we cannot depend on events being received in the same order that they were sent. Nor can we depend on receiving events for all intermediate states. It is possible for a series of entry UPDATEs, or a DELETE followed by a CREATE with the same key, to get coalesced into a single UPDATE event on the receiving node. Watchers of NetworkDB tables therefore need to be prepared to gracefully handle arbitrary UPDATEs of a key, including those where the new value may have nothing in common with the previous value. The libnetwork controller naively handled events for endpoint_table assuming that an endpoint leave followed by a rejoin of the same endpoint would always be expressed as a DELETE event followed by a CREATE. It would handle a coalesced UPDATE as a CREATE, adding a new service binding without removing the old one. This would have various side effects, such as having the "transient state" of having multiple conflicting service bindings where more than one endpoint is assigned an IP address never settling. Modify the libnetwork controller to handle an UPDATE by removing the previous service binding then adding the new one. Signed-off-by: Cory Snider <[email protected]>
f147ed8 to
4538a1d
Compare
- What I did
- How I did it
The eventually-consistent nature of NetworkDB means we cannot depend on events being received in the same order that they were sent. Nor can we depend on receiving events for all intermediate states. It is possible for a series of entry UPDATEs, or a DELETE followed by a CREATE with the same key, to get coalesced into a single UPDATE event on the receiving node. Watchers of NetworkDB tables therefore need to be prepared to gracefully handle arbitrary UPDATEs of a key, including those where the new value may have nothing in common with the previous value.
Both the overlay driver's watcher for
overlay_peer_tableand the libnetwork controller's watcher forendpoint_tablewould misbehave when processing a coalesced UPDATE event. They would handle it the same as a CREATE: insert new entries into control-plane databases without removing the old. (This is a simplified explanation.) The result would be "transient states" with conflicts that never resolve, leaving obsolete entries in kernel control-plane databases and service-discovery DNS. Fix the watchers to remove the old before adding the new when handling UPDATE events.- How to verify it
tc netemscheduler. E.g.:tc qdisc add dev eth0 root netem delay 100ms 200ms loss 1%node.role!=managerwhich is connected to the user-defined overlay network from step 2. Arrange it so the tasks exit after a few seconds, e.g.alpine sh -c 'sleep 5'tasks.<service from step 4>while the service tasks keep flapping in the background. Verify that the set of IP addresses in the answer does not grow unbounded.docker run -d --name hellobasic --net my-overlay nginxdemos/hello:plain-textdocker run --rm --net my-overlay alpine watch wget -qO- hellobasicfor run in {1..3}; do docker kill hellobasic && docker start hellobasic; doneso the endpoint flaps and gets reattached with different IP and MAC addresses.docker service create -d --name hello --network my-overlay --mode global --constraint 'node.role!=manager' -p 30888:80 nginxdemos/hello:plain-textfor run in {1..10}; do docker service update --force hello; sleep 15; done- Human readable description for the release notes
- Greatly improve the reliability of overlay networking and the Swarm routing mesh- A picture of a cute animal (not mandatory but encouraged)