This repository was archived by the owner on Oct 12, 2022. It is now read-only.
Fix a livelock in bgp advertisement code.#32
Merged
pavel-shirshov merged 1 commit intosonic-net:debian/0.99.24.1from Apr 5, 2019
Merged
Fix a livelock in bgp advertisement code.#32pavel-shirshov merged 1 commit intosonic-net:debian/0.99.24.1from
pavel-shirshov merged 1 commit intosonic-net:debian/0.99.24.1from
Conversation
quagga prepares a list of bgp advertisements for every peer in peer->sync[afi][safi]->update structure. bgp_write_packet() function goes through the list of withdraws first, and if there is no withdraws it goes through the list of updatas. If the update list has updates quagga checks if those update were received from a peer, which supports "Graceful restart capabilty". If the peer doesn't support it, quagga sends the advertisement structure to the bgp_update_packet() right away. But if the peer supports "Graceful restart" capability quagga does some additional checks. Quagga checks that: 1. the originating peer announced "Graceful restart capability" 2. our side announced "Graceful restart capability" to the originating peer 3. both the originating side and our side don't set R-bit together 4. the advertisement contains a route which is not STALE 5. the advertisement not for SAFI_MPLS_VPN. Item 3 contained a bug. By RFC we must NOT wait for the End-of-Rib if the peer sent us R-bit. But quagga didn't wait for End-of-Rib only if both the peer and our side sent R-bit together. So when we found an advertisement ready for update and the advertisement was originated from the peer, which didn't send us End-Of-Rib quagga stops to go through the list of updates despite the fact the peer is not goibg to set us End-Of-Rib and announced that by R-bit. The issue is even worse. If the list contains updates which could be announce, we will not announce them, until we send all blocked updates. This bug doesn't fire at quagga start, because quagga sends the R-bit enabled 120 seconds (check BGP_DEFAULT_RESTART_TIME), but after that quagga stopped sending the R-bit, and after that quagga wait for End-Of-Rib from the peer, which announced that it wouldn't send it to us. The patch fix the issue by removing R-bit check from our side. We don't wait for End-of-Rib if the originating peer sent us enabled R-bit. Excerpt from RFC4724: The most significant bit is defined as the Restart State (R) bit, which can be used to avoid possible deadlock caused by waiting for the End-of-RIB marker when multiple BGP speakers peering with each other restart. When set (value 1), this bit indicates that the BGP speaker has restarted, and its peer MUST NOT wait for the End-of-RIB marker from the speaker before advertising routing information to the speaker. See https://tools.ietf.org/html/rfc4724#section-3
|
👍 |
lguohan
approved these changes
Mar 28, 2019
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
quagga prepares a list of bgp advertisements for every peer in
peer->sync[afi][safi]->update structure. bgp_write_packet()
function goes through the list of withdraws first, and if there
is no withdraws it goes through the list of updatas. If the update
list has updates quagga checks if those update were received from
a peer, which supports "Graceful restart capabilty". If the peer
doesn't support it, quagga sends the advertisement structure to
the bgp_update_packet() right away. But if the peer supports
"Graceful restart" capability quagga does some additional checks.
Quagga checks that:
peer
Item 3 contained a bug. By RFC we must NOT wait for the End-of-Rib
if the peer sent us R-bit. But quagga didn't wait for End-of-Rib
only if both the peer and our side sent R-bit together.
So when we found an advertisement ready for update and the advertisement
was originated from the peer, which didn't send us End-Of-Rib quagga
stops to go through the list of updates despite the fact the peer is not
goibg to set us End-Of-Rib and announced that by R-bit.
The issue is even worse. If the list contains updates which could be
announce, we will not announce them, until we send all blocked updates.
This bug doesn't fire at quagga start, because quagga sends the
R-bit enabled 120 seconds (check BGP_DEFAULT_RESTART_TIME),
but after that quagga stopped sending the R-bit, and after that
quagga wait for End-Of-Rib from the peer, which announced that
it wouldn't send it to us.
The patch fix the issue by removing R-bit check from our side. We don't
wait for End-of-Rib if the originating peer sent us enabled R-bit.
Excerpt from RFC4724:
The most significant bit is defined as the Restart State (R)
bit, which can be used to avoid possible deadlock caused by
waiting for the End-of-RIB marker when multiple BGP speakers
peering with each other restart. When set (value 1), this bit
indicates that the BGP speaker has restarted, and its peer MUST
NOT wait for the End-of-RIB marker from the speaker before
advertising routing information to the speaker.
See https://tools.ietf.org/html/rfc4724#section-3