Skip to content
This repository was archived by the owner on Oct 12, 2022. It is now read-only.

Fix a livelock in bgp advertisement code.#32

Merged
pavel-shirshov merged 1 commit intosonic-net:debian/0.99.24.1from
pavel-shirshov:pavelsh/fix_fifo
Apr 5, 2019
Merged

Fix a livelock in bgp advertisement code.#32
pavel-shirshov merged 1 commit intosonic-net:debian/0.99.24.1from
pavel-shirshov:pavelsh/fix_fifo

Conversation

@pavel-shirshov
Copy link
Copy Markdown
Contributor

quagga prepares a list of bgp advertisements for every peer in
peer->sync[afi][safi]->update structure. bgp_write_packet()
function goes through the list of withdraws first, and if there
is no withdraws it goes through the list of updatas. If the update
list has updates quagga checks if those update were received from
a peer, which supports "Graceful restart capabilty". If the peer
doesn't support it, quagga sends the advertisement structure to
the bgp_update_packet() right away. But if the peer supports
"Graceful restart" capability quagga does some additional checks.
Quagga checks that:

  1. the originating peer announced "Graceful restart capability"
  2. our side announced "Graceful restart capability" to the originating
    peer
  3. both the originating side and our side don't set R-bit together
  4. the advertisement contains a route which is not STALE
  5. the advertisement not for SAFI_MPLS_VPN.

Item 3 contained a bug. By RFC we must NOT wait for the End-of-Rib
if the peer sent us R-bit. But quagga didn't wait for End-of-Rib
only if both the peer and our side sent R-bit together.

So when we found an advertisement ready for update and the advertisement
was originated from the peer, which didn't send us End-Of-Rib quagga
stops to go through the list of updates despite the fact the peer is not
goibg to set us End-Of-Rib and announced that by R-bit.
The issue is even worse. If the list contains updates which could be
announce, we will not announce them, until we send all blocked updates.

This bug doesn't fire at quagga start, because quagga sends the
R-bit enabled 120 seconds (check BGP_DEFAULT_RESTART_TIME),
but after that quagga stopped sending the R-bit, and after that
quagga wait for End-Of-Rib from the peer, which announced that
it wouldn't send it to us.

The patch fix the issue by removing R-bit check from our side. We don't
wait for End-of-Rib if the originating peer sent us enabled R-bit.

Excerpt from RFC4724:
The most significant bit is defined as the Restart State (R)
bit, which can be used to avoid possible deadlock caused by
waiting for the End-of-RIB marker when multiple BGP speakers
peering with each other restart. When set (value 1), this bit
indicates that the BGP speaker has restarted, and its peer MUST
NOT wait for the End-of-RIB marker from the speaker before
advertising routing information to the speaker.

See https://tools.ietf.org/html/rfc4724#section-3

quagga prepares a list of bgp advertisements for every peer in
peer->sync[afi][safi]->update structure. bgp_write_packet()
function goes through the list of withdraws first, and if there
is no withdraws it goes through the list of updatas. If the update
list has updates quagga checks if those update were received from
a peer, which supports "Graceful restart capabilty". If the peer
doesn't support it, quagga sends the advertisement structure to
the bgp_update_packet() right away. But if the peer supports
"Graceful restart" capability quagga does some additional checks.
Quagga checks that:
1. the originating peer announced "Graceful restart capability"
2. our side announced "Graceful restart capability" to the originating
   peer
3. both the originating side and our side don't set R-bit together
4. the advertisement contains a route which is not STALE
5. the advertisement not for SAFI_MPLS_VPN.

Item 3 contained a bug. By RFC we must NOT wait for the End-of-Rib
if the peer sent us R-bit. But quagga didn't wait for End-of-Rib
only if both the peer and our side sent R-bit together.

So when we found an advertisement ready for update and the advertisement
was originated from the peer, which didn't send us End-Of-Rib quagga
stops to go through the list of updates despite the fact the peer is not
goibg to set us End-Of-Rib and announced that by R-bit.
The issue is even worse. If the list contains updates which could be
announce, we will not announce them, until we send all blocked updates.

This bug doesn't fire at quagga start, because quagga sends the
R-bit enabled 120 seconds (check BGP_DEFAULT_RESTART_TIME),
but after that quagga stopped sending the R-bit, and after that
quagga wait for End-Of-Rib from the peer, which announced that
it wouldn't send it to us.

The patch fix the issue by removing R-bit check from our side. We don't
wait for End-of-Rib if the originating peer sent us enabled R-bit.

Excerpt from RFC4724:
The most significant bit is defined as the Restart State (R)
bit, which can be used to avoid possible deadlock caused by
waiting for the End-of-RIB marker when multiple BGP speakers
peering with each other restart.  When set (value 1), this bit
indicates that the BGP speaker has restarted, and its peer MUST
NOT wait for the End-of-RIB marker from the speaker before
advertising routing information to the speaker.

See https://tools.ietf.org/html/rfc4724#section-3
@pavel-shirshov pavel-shirshov self-assigned this Mar 27, 2019
@pavel-shirshov pavel-shirshov requested a review from lguohan March 27, 2019 20:15
@wendani
Copy link
Copy Markdown

wendani commented Mar 27, 2019

👍

@pavel-shirshov pavel-shirshov merged commit e5ede97 into sonic-net:debian/0.99.24.1 Apr 5, 2019
@pavel-shirshov pavel-shirshov deleted the pavelsh/fix_fifo branch April 5, 2019 18:24
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants