Skip to content

Commit 270a4d4

Browse files
committed
libn/networkdb: stop table events from racing network leaves
When a node leaves a network or the cluster, or memberlist considers the node as failed, NetworkDB atomically deletes all table entries (for the left network) owned by the node. This maintains the invariant that table entries owned by a node are present in the local database indices iff that node is an active cluster member which is participating in the network the entries pertain to. (*NetworkDB).handleTableEvent() is written in a way which attempts to minimize the amount of time it is in a critical section with the mutex locked for writing. It first checks under a read-lock whether both the local node and the node where the event originated are participating in the network which the event pertains to. If the check passes, the mutex is unlocked for reading and locked for writing so the local database state is mutated in a critical section. That leaves a window of time between the participation check the write-lock being acquired for a network or node event to arrive and be processed. If a table event for a node+network races a node or network event which triggers the purge of all table entries for the same node+network, the invariant could be violated. The table entry described by the table event may be reinserted into the local database state after being purged by the node's leaving, resulting in an orphaned table entry which the local node will bulk-sync to other nodes indefinitely. It's not completely wrong to perform a pre-flight check outside of the critical section. It allows for an early return in the no-op case without having to bear the cost of synchronization. But such optimistic concurrency control is only sound if the condition is double-checked inside the critical section. It is tricky to get right, and this instance of optimistic concurrency control smells like a case of premature optimization. Move the pre-flight check into the critical section to ensure that the invariant is maintained. Signed-off-by: Cory Snider <[email protected]>
1 parent a3aea15 commit 270a4d4

1 file changed

Lines changed: 6 additions & 7 deletions

File tree

libnetwork/networkdb/delegate.go

Lines changed: 6 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -148,8 +148,13 @@ func (nDB *NetworkDB) handleTableEvent(tEvent *TableEvent, isBulkSync bool) bool
148148
// Update our local clock if the received messages has newer time.
149149
nDB.tableClock.Witness(tEvent.LTime)
150150

151+
nDB.Lock()
152+
// Hold the lock until after we broadcast the event to watchers so that
153+
// the new watch receives either the synthesized event or the event we
154+
// broadcast, never both.
155+
defer nDB.Unlock()
156+
151157
// Ignore the table events for networks that are in the process of going away
152-
nDB.RLock()
153158
networks := nDB.networks[nDB.config.NodeID]
154159
network, ok := networks[tEvent.NetworkID]
155160
// Check if the owner of the event is still part of the network
@@ -161,18 +166,12 @@ func (nDB *NetworkDB) handleTableEvent(tEvent *TableEvent, isBulkSync bool) bool
161166
break
162167
}
163168
}
164-
nDB.RUnlock()
165169

166170
if !ok || network.leaving || !nodePresent {
167171
// I'm out of the network OR the event owner is not anymore part of the network so do not propagate
168172
return false
169173
}
170174

171-
nDB.Lock()
172-
// Hold the lock until after we broadcast the event to watchers so that
173-
// the new watch receives either the synthesized event or the event we
174-
// broadcast, never both.
175-
defer nDB.Unlock()
176175
var entryPresent bool
177176
prev, err := nDB.getEntry(tEvent.TableName, tEvent.NetworkID, tEvent.Key)
178177
if err == nil {

0 commit comments

Comments
 (0)