Vendoring libnetwork & libkv with fixes #17278
Conversation
This carries fixes for:

- Internal racy /etc/hosts updates within the container during SD
- Re-enable SD service record watch after cluster-store restarts
- Fix to allow a remote IPAM driver to return no IP if the user prefers
- Fix to allow --fixed-cidr and --bip to be in the same range
Signed-off-by: Madhu Venugopal <[email protected]>
Signed-off-by: Alessandro Boch <[email protected]>
@abronan can you PTAL & confirm the libkv changes?
@mavenugo the test fails for me locally
@tiborvass which ones? The CI seems to be happy so far...
@mavenugo seems unrelated, will investigate |
The logic of this Watch wrapper is so intricate and confusing that I had a hard time understanding what it was doing and how it was handling the watch failure. This can be simplified by splitting it into two functions: one managing the watch loop and the other handling the retry process (similar to the way it's done in kv discovery in docker, for example).
The use of labels is also totally unnecessary, and the fact that this goto at the end jumps back to the beginning of the goroutine screams that you should use a simple loop and/or split the logic into two functions 😿
@abronan we can certainly talk about splitting it into multiple functions (though I have to see how best to organize it).
But the watch logic is actually an interesting Go channel pattern for 1->many receivers. Once you understand the intention, you can appreciate @mrjana's idea of managing it this way. So, let me explain the logic a bit.

The only watch that libnetwork has today is the network watch (not a WatchTree) for SD. Since libkv doesn't (and cannot) provide retry logic for watches, it's the application's responsibility to retrigger the watch when the store restarts. We need a trigger to restart the watch for all the created and active networks when the store is back online. Instead of writing a registration mechanism for a broadcast channel pattern, we used a simple and elegant trick for this notification: by default, if there are multiple receivers on a channel and there is no data to be sent on that channel, they will all be blocked. When the channel is closed, every blocked receiver is notified. We use this pattern to trigger all the waiting goroutines to start the watch again when the store is up. This reduces the code complexity greatly. Since we are not looking for any actual data on the receivers but just a trigger to restart the watch, this fits perfectly well. Also, as you can see, we don't even do a periodic retry to connect to the data-store; that also helps to reduce the code complexity greatly.
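For illustration, here is a minimal, self-contained sketch of that closed-channel broadcast trick. The `controller` type and the `restartCh`, `watchLoop`, `storeIsBackOnline`, and `watchUntilStoreFails` names are hypothetical and not the actual libnetwork code; the point is only that every per-network goroutine parks on the same channel, and closing it wakes all of them so each can re-establish its watch:

```go
package main

import "sync"

type controller struct {
	mu        sync.Mutex
	restartCh chan struct{} // barrier: closed when the store is back online
}

// watchLoop runs once per network that needs a service-discovery watch.
func (c *controller) watchLoop(network string) {
	for {
		// Placeholder for the real watch; in libnetwork this would block
		// until the watch channel closes, e.g. because the store went down.
		watchUntilStoreFails(network)

		c.mu.Lock()
		barrier := c.restartCh
		c.mu.Unlock()

		// Block here along with every other watcher. Closing the channel
		// unblocks all receivers at once, which is the broadcast trick:
		// no explicit registration of listeners is needed.
		<-barrier
	}
}

// storeIsBackOnline swaps in a fresh barrier for the next outage and closes
// the old one, releasing every goroutine parked in watchLoop.
func (c *controller) storeIsBackOnline() {
	c.mu.Lock()
	old := c.restartCh
	c.restartCh = make(chan struct{})
	c.mu.Unlock()
	close(old)
}

func watchUntilStoreFails(network string) {
	// Illustrative stub: the real code would consume watch events here.
}

func main() {
	c := &controller{restartCh: make(chan struct{})}
	go c.watchLoop("net1")
	go c.watchLoop("net2")
	// Later, when connectivity to the cluster store returns:
	c.storeIsBackOnline()
}
```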
I understand the intent: this is a simple barrier to restart the watches when the store is back up (as this is not a WatchTree but a set of Watches).
I still think that the logic can be greatly improved by reorganizing and splitting this function in two (one for the retry logic and handling the barrier, and the other for the Watch loop). This would be much cleaner, and you would also get rid of the labels.
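As a rough, hypothetical sketch of that split (the `watchNetwork`/`consumeEvents` helpers and the retry handling below are illustrative only and do not mirror libnetwork's actual code; the only thing assumed from libkv is the `store.Store.Watch(key, stopCh)` signature returning an event channel):

```go
package watchexample

import (
	"log"
	"time"

	"github.com/docker/libkv/store"
)

// watchNetwork owns the retry handling: whenever the event channel closes
// (for example because the store restarted), it re-establishes the watch,
// until stopCh is closed.
func watchNetwork(kv store.Store, key string, stopCh <-chan struct{}) {
	for {
		select {
		case <-stopCh:
			return
		default:
		}

		events, err := kv.Watch(key, stopCh)
		if err != nil {
			log.Printf("watch on %s failed: %v; retrying", key, err)
			time.Sleep(time.Second)
			continue
		}
		consumeEvents(key, events)
	}
}

// consumeEvents owns the watch loop: it only reads updates until the channel
// is closed, then returns so the caller can decide whether to retry.
func consumeEvents(key string, events <-chan *store.KVPair) {
	for pair := range events {
		log.Printf("update on %s: %d bytes", key, len(pair.Value))
	}
}
```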
LGTM on the libkv side. Just a remark on the code logic of the Watch in
LGTM |
Vendoring libnetwork & libkv with fixes
Addresses a few issues