
BGP graceful restart timeout error on peer when performing warm-restart on SONiC device #2958

@wangxin

Description

The warm-reboot test failed on the latest master image, SONiC.HEAD.978-6d62249. While a warm reboot was performed on the SONiC device, the peer device logged the BGP_GRACEFUL_RESTART_TIMEOUT error shown below.

Below is the log observed on the peer device (Arista VM):

May 30 17:58:45 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer 10.0.0.56 (AS 65100) old state Established event Closed new state Idle
May 30 17:58:45 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer fc00::71 (AS 65100) old state Established event Closed new state Idle
May 30 18:00:45 ARISTA01T1 Rib: %BGP-5-BGP_GRACEFUL_RESTART_TIMEOUT: Deleting stale routes from peer 10.0.0.56 (AS 65100)
May 30 18:00:45 ARISTA01T1 Rib: %BGP-5-BGP_GRACEFUL_RESTART_TIMEOUT: Deleting stale routes from peer fc00::71 (AS 65100)
May 30 18:01:13 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer fc00::71 (AS 65100) old state OpenConfirm event RecvKeepAlive new state Established
May 30 18:01:13 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer 10.0.0.56 (AS 65100) old state OpenConfirm event RecvKeepAlive new state Established
May 30 18:01:16 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer fc00::71 (AS 65100) old state Established event Closed new state Idle
May 30 18:01:17 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer 10.0.0.56 (AS 65100) old state Established event Closed new state Idle
May 30 18:01:26 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer fc00::71 (AS 65100) old state OpenConfirm event RecvKeepAlive new state Established
May 30 18:01:28 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer 10.0.0.56 (AS 65100) old state OpenConfirm event RecvKeepAlive new state Established
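A minimal Python sketch for reading the graceful-restart timeline out of a peer log like the one above. It assumes the Arista syslog line format shown here, and because syslog timestamps omit the year it assumes 2019 (taken from the build date in `show version`). `gr_timeline` is a hypothetical helper for illustration, not part of SONiC or EOS.

```python
from datetime import datetime

# IPv4 and IPv6 lines copied from the Arista peer log above.
LOG = """\
May 30 17:58:45 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer 10.0.0.56 (AS 65100) old state Established event Closed new state Idle
May 30 17:58:45 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer fc00::71 (AS 65100) old state Established event Closed new state Idle
May 30 18:00:45 ARISTA01T1 Rib: %BGP-5-BGP_GRACEFUL_RESTART_TIMEOUT: Deleting stale routes from peer 10.0.0.56 (AS 65100)
May 30 18:00:45 ARISTA01T1 Rib: %BGP-5-BGP_GRACEFUL_RESTART_TIMEOUT: Deleting stale routes from peer fc00::71 (AS 65100)
May 30 18:01:13 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer fc00::71 (AS 65100) old state OpenConfirm event RecvKeepAlive new state Established
May 30 18:01:13 ARISTA01T1 Rib: %BGP-5-ADJCHANGE: peer 10.0.0.56 (AS 65100) old state OpenConfirm event RecvKeepAlive new state Established
"""

# Assumption: syslog omits the year, so the image build year is prepended.
ASSUMED_YEAR = "2019"


def parse_ts(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp."""
    return datetime.strptime(f"{ASSUMED_YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")


def gr_timeline(log: str, peer: str) -> dict[str, int]:
    """Seconds from session loss to stale-route deletion and to re-establishment."""
    down = timeout = up = None
    for line in log.splitlines():
        if f"peer {peer} " not in line:
            continue
        ts = parse_ts(line)
        if "old state Established event Closed" in line and down is None:
            down = ts  # session dropped when the SONiC device went down
        elif "BGP_GRACEFUL_RESTART_TIMEOUT" in line and timeout is None:
            timeout = ts  # peer gave up waiting and flushed stale routes
        elif "new state Established" in line and up is None and down is not None:
            up = ts  # first re-establishment after the drop
    if down is None or timeout is None or up is None:
        raise ValueError(f"incomplete graceful-restart timeline for peer {peer}")
    return {
        "stale_routes_deleted_after_s": int((timeout - down).total_seconds()),
        "session_restored_after_s": int((up - down).total_seconds()),
    }


for peer in ("10.0.0.56", "fc00::71"):
    print(peer, gr_timeline(LOG, peer))
# 10.0.0.56 {'stale_routes_deleted_after_s': 120, 'session_restored_after_s': 148}
# fc00::71 {'stale_routes_deleted_after_s': 120, 'session_restored_after_s': 148}
```

When `session_restored_after_s` is larger than `stale_routes_deleted_after_s`, the peer's restart timer expired before the SONiC side came back, which is exactly the failure reported here.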

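Before Quagga was replaced with FRR, the graceful restart time was configured to 240 seconds in #2754. However, this setting was never added to templates/frr.conf.j2, so FRR falls back to its default graceful restart time of 120 seconds. A warm reboot with FRR takes longer than that: in the log above, the peer deleted the stale routes exactly 120 seconds after the sessions closed (17:58:45 to 18:00:45), but the sessions were not re-established until 18:01:13, about 148 seconds after going down. This is why the warm-reboot test failed.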
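One way to guard against this regression is to check the rendered FRR configuration for an explicit restart timer. The sketch below assumes the fix is a `bgp graceful-restart restart-time` line under `router bgp` (the FRR bgpd command for this timer) and treats FRR's 120-second default as the effective value when that line is absent. The helper functions and both sample configurations are hypothetical; the AS number 65100 is the SONiC AS seen in the peer log, and 148 seconds is the outage of the 10.0.0.56 session in that log (17:58:45 to 18:01:13).

```python
import re

# FRR's default graceful-restart restart-time, in seconds (used when none is configured).
FRR_DEFAULT_RESTART_TIME = 120

# Matches e.g. " bgp graceful-restart restart-time 240" on its own line of a rendered frr.conf.
_RESTART_TIME_RE = re.compile(
    r"^\s*bgp graceful-restart restart-time (\d+)\s*$", re.MULTILINE
)


def effective_restart_time(frr_conf: str) -> int:
    """Restart time the config would advertise, falling back to FRR's default."""
    match = _RESTART_TIME_RE.search(frr_conf)
    return int(match.group(1)) if match else FRR_DEFAULT_RESTART_TIME


def survives_warm_reboot(frr_conf: str, observed_restart_s: int) -> bool:
    """True if peers would hold stale routes longer than the restart actually took."""
    return effective_restart_time(frr_conf) > observed_restart_s


# Hypothetical rendered configs: without and with the 240 s setting from #2754.
WITHOUT_TIMER = "router bgp 65100\n bgp graceful-restart\n"
WITH_TIMER = "router bgp 65100\n bgp graceful-restart restart-time 240\n bgp graceful-restart\n"

OBSERVED_RESTART_S = 148  # outage measured from the peer log above

print(survives_warm_reboot(WITHOUT_TIMER, OBSERVED_RESTART_S))  # False (120 s < 148 s)
print(survives_warm_reboot(WITH_TIMER, OBSERVED_RESTART_S))     # True (240 s > 148 s)
```

The check only compares one timer against one observed outage; whether 240 seconds leaves enough headroom on slower platforms is not something this log can show.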

Steps to reproduce the issue:

  1. Configure BGP between the SONiC device and a peer device (an Arista VM, for example).
  2. Perform a warm reboot on the SONiC device.
  3. Monitor the IP routes on the peer device.

Describe the results you received:
When the BGP_GRACEFUL_RESTART_TIMEOUT error was logged on the Arista VM, the IP routes learned from SONiC were removed.

Describe the results you expected:
The routes should not be removed during the warm reboot.

Additional information you deem important (e.g. issue happens only occasionally):

Output of show version:

SONiC Software Version: SONiC.HEAD.978-6d62249
Distribution: Debian 9.9
Kernel: 4.9.0-8-2-amd64
Build commit: 6d62249
Build date: Sun May 26 13:48:25 UTC 2019
Built by: johnar@jenkins-worker-4

Attach debug file sudo generate_dump:

```
(paste your output here)
```
