libnetwork: NetworkDB: fix synchronisation and upgrade github.com/armon/go-radix #42646

@thaJeztah

relates to #42645

Libnetwork's NetworkDB currently depends on a very old version of github.com/armon/go-radix.

Upgrading the dependency to a more recent version (v1.0.0) results in a panic (nil pointer dereference, SIGSEGV) when performing concurrent read/write operations (see moby/libnetwork#2581 (comment)):

=== RUN   TestNetworkDBCRUDTableEntries
2020/09/11 15:11:30 Closing DB instances...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x917796]

goroutine 292 [running]:
github.com/docker/libnetwork/vendor/github.com/armon/go-radix.recursiveWalk(0x0, 0xc000050b50, 0xc000248800)
	/go/src/github.com/docker/libnetwork/vendor/github.com/armon/go-radix/radix.go:519 +0x26
github.com/docker/libnetwork/vendor/github.com/armon/go-radix.recursiveWalk(0xc0003a0bd0, 0xc000050b50, 0xc000050b40)
	/go/src/github.com/docker/libnetwork/vendor/github.com/armon/go-radix/radix.go:525 +0x7e
github.com/docker/libnetwork/vendor/github.com/armon/go-radix.recursiveWalk(0xc0003a0b10, 0xc000050b50, 0xc00034e950)
	/go/src/github.com/docker/libnetwork/vendor/github.com/armon/go-radix/radix.go:525 +0x7e
github.com/docker/libnetwork/vendor/github.com/armon/go-radix.recursiveWalk(0xc0003283f0, 0xc000050b50, 0x1)
	/go/src/github.com/docker/libnetwork/vendor/github.com/armon/go-radix/radix.go:525 +0x7e
github.com/docker/libnetwork/vendor/github.com/armon/go-radix.(*Tree).Walk(...)
	/go/src/github.com/docker/libnetwork/vendor/github.com/armon/go-radix/radix.go:447
github.com/docker/libnetwork/networkdb.(*NetworkDB).deleteNodeTableEntries(0xc0001e6ea0, 0xc00047c930, 0xc)
	/go/src/github.com/docker/libnetwork/networkdb/networkdb.go:546 +0xa3
github.com/docker/libnetwork/networkdb.(*NetworkDB).changeNodeState(0xc0001e6ea0, 0xc00047c930, 0xc, 0x1, 0xc000080570, 0xc000328301, 0xc000025ac6)
	/go/src/github.com/docker/libnetwork/networkdb/nodemgmt.go:88 +0x31e
github.com/docker/libnetwork/networkdb.(*NetworkDB).handleNodeEvent(0xc0001e6ea0, 0xc000234e80, 0x0)
	/go/src/github.com/docker/libnetwork/networkdb/delegate.go:60 +0x1ad
github.com/docker/libnetwork/networkdb.(*NetworkDB).handleNodeMessage(0xc0001e6ea0, 0xc000025ac0, 0x12, 0x20)
	/go/src/github.com/docker/libnetwork/networkdb/delegate.go:299 +0x121
github.com/docker/libnetwork/networkdb.(*NetworkDB).handleMessage(0xc0001e6ea0, 0xc00051a006, 0x16, 0xfffa, 0xc0003a6500)
	/go/src/github.com/docker/libnetwork/networkdb/delegate.go:383 +0x2dc
github.com/docker/libnetwork/networkdb.(*delegate).NotifyMsg(0xc000010540, 0xc00051a006, 0x16, 0xfffa)
	/go/src/github.com/docker/libnetwork/networkdb/delegate.go:402 +0x72
github.com/docker/libnetwork/vendor/github.com/hashicorp/memberlist.(*Memberlist).handleUser(...)
	/go/src/github.com/docker/libnetwork/vendor/github.com/hashicorp/memberlist/net.go:550
github.com/docker/libnetwork/vendor/github.com/hashicorp/memberlist.(*Memberlist).packetHandler(0xc0000c2500)
	/go/src/github.com/docker/libnetwork/vendor/github.com/hashicorp/memberlist/net.go:390 +0x18e
created by github.com/docker/libnetwork/vendor/github.com/hashicorp/memberlist.newMemberlist
	/go/src/github.com/docker/libnetwork/vendor/github.com/hashicorp/memberlist/memberlist.go:180 +0x406
exit status 2
FAIL	github.com/docker/libnetwork/networkdb	13.464s
make: *** [Makefile:129: unit-tests-local] Error 1
make: *** [Makefile:123: unit-tests] Error 2

We tried to contribute a fix for this issue (see armon/go-radix#12 and armon/go-radix#14), but it was rejected by the package's author because the package itself is not designed for concurrent operations (see armon/go-radix#14 (comment)):

It seems like the only way to hit this issue is to be doing concurrent read/write operations without synchronization. This means a delete took place concurrently with a walk. This fix might avoid a particular race condition, but the library itself is not designed for concurrent read/write and there are likely other edge cases that are not handled.

We should look at NetworkDB and either add synchronisation there, or look for an alternative dependency that supports concurrency.
