fast-reboot fails when LAG has VLAN membership #4793

@nazariig

Description

fast-reboot fails while calling /usr/bin/fast-reboot-dump.py:

Jun 17 18:48:23.632648 sonic ERR fast-reboot-dump: Got an exception '24:8a:07:7e:41:80': Traceback: Traceback (most recent call last):
  File "/usr/bin/fast-reboot-dump.py", line 301, in <module>
    res = main()
  File "/usr/bin/fast-reboot-dump.py", line 294, in main
    garp_send(arp_entries, map_mac_ip_per_vlan)
  File "/usr/bin/fast-reboot-dump.py", line 229, in garp_send
    src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}
  File "/usr/bin/fast-reboot-dump.py", line 229, in <setcomp>
    src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}
KeyError: '24:8a:07:7e:41:80'

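The issue occurs when /usr/bin/fast-reboot-dump.py processes a VLAN whose member is a LAG. In get_fdb, an FDB entry whose bridge port is missing from bridge_id_2_iface is skipped, so its MAC never reaches map_mac_ip: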

def get_fdb(db, vlan_name, vlan_id, bridge_id_2_iface):
    fdb_types = {
      'SAI_FDB_ENTRY_TYPE_DYNAMIC': 'dynamic',
      'SAI_FDB_ENTRY_TYPE_STATIC' : 'static'
    }

    bvid = get_vlan_oid_by_vlan_id(db, vlan_id)
    available_macs = set()
    map_mac_ip = {}
    fdb_entries = []
    keys = db.keys(db.ASIC_DB, 'ASIC_STATE:SAI_OBJECT_TYPE_FDB_ENTRY:{*\"bvid\":\"%s\"*}' % bvid)
    keys = [] if keys is None else keys
    for key in keys:
        key_obj = json.loads(key.replace('ASIC_STATE:SAI_OBJECT_TYPE_FDB_ENTRY:', ''))
        mac = str(key_obj['mac'])
        if not is_mac_unicast(mac):
            continue
        available_macs.add((vlan_name, mac.lower()))
        fdb_mac = mac.replace(':', '-')
        # get attributes
        value = db.get_all(db.ASIC_DB, key)
        fdb_type = fdb_types[value['SAI_FDB_ENTRY_ATTR_TYPE']]
        if value['SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID'] not in bridge_id_2_iface:
            continue
        fdb_port = bridge_id_2_iface[value['SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID']]

        obj = {
          'FDB_TABLE:Vlan%d:%s' % (vlan_id, fdb_mac) : {
            'type': fdb_type,
            'port': fdb_port,
          },
          'OP': 'SET'
        }

        fdb_entries.append(obj)
        map_mac_ip[mac.lower()] = fdb_port

    return fdb_entries, available_macs, map_mac_ip

vlan_id:
10

key_obj:
{u'mac': u'24:8A:07:7E:41:80', u'bvid': u'oid:0x260000000005bb', u'switch_id': u'oid:0x21000000000000'}

value:
{'SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID': 'oid:0x3a0000000005bd', 'SAI_FDB_ENTRY_ATTR_PACKET_ACTION': 'SAI_PACKET_ACTION_FORWARD', 'SAI_FDB_ENTRY_ATTR_TYPE': 'SAI_FDB_ENTRY_TYPE_DYNAMIC'}

bridge_id_2_iface:
{'oid:0x3a00000000064c': 'Ethernet56'}

value['SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID']:
oid:0x3a0000000005bd

The exception is then raised in garp_send:

def garp_send(arp_entries, map_mac_ip_per_vlan):
    ETH_P_ALL = 0x03

    # generate source ip addresses for arp packets
    src_ip_addrs = {vlan_name:get_iface_ip_addr(vlan_name) for vlan_name,_,_ in arp_entries}

    # generate source mac addresses for arp packets
    src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}

arp_entries:
[('Vlan10', '24:8a:07:7e:41:80', '10.0.1.1'), ('Vlan10', '24:8a:07:7e:41:80', '2000')]

map_mac_ip_per_vlan:
{'Vlan23': {}, 'Vlan10': {}}
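
The failure can be reproduced outside the switch with the values printed above. The following is a minimal sketch, not the script's actual code: `strict_src_ifs` repeats the set comprehension from `garp_send`, while `tolerant_src_ifs` is a hypothetical variant that skips ARP entries with no FDB-derived interface. Both function names are assumptions introduced for illustration.

```python
# Values copied from the debug output in this issue.
arp_entries = [
    ('Vlan10', '24:8a:07:7e:41:80', '10.0.1.1'),
    ('Vlan10', '24:8a:07:7e:41:80', '2000'),
]
map_mac_ip_per_vlan = {'Vlan23': {}, 'Vlan10': {}}


def strict_src_ifs(arp_entries, map_mac_ip_per_vlan):
    # Same set comprehension as in garp_send(): a MAC that is missing
    # from the per-VLAN map raises KeyError.
    return {map_mac_ip_per_vlan[vlan_name][dst_mac]
            for vlan_name, dst_mac, _ in arp_entries}


def tolerant_src_ifs(arp_entries, map_mac_ip_per_vlan):
    # Hypothetical variant: ignore ARP entries whose MAC has no
    # FDB-derived interface instead of raising.
    src_ifs = set()
    for vlan_name, dst_mac, _ in arp_entries:
        iface = map_mac_ip_per_vlan.get(vlan_name, {}).get(dst_mac)
        if iface is not None:
            src_ifs.add(iface)
    return src_ifs


try:
    strict_src_ifs(arp_entries, map_mac_ip_per_vlan)
    raised = None
except KeyError as exc:
    raised = exc.args[0]

print('KeyError for MAC:', raised)
print('tolerant result:', tolerant_src_ifs(arp_entries, map_mac_ip_per_vlan))
```

The tolerant variant only hides the symptom: no gratuitous ARP would be sent towards neighbors reachable through the LAG, so the missing mapping itself still needs to be addressed.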

SONiC DB info:

root@sonic:/home/admin# redis-cli -n 1 KEYS '*' | grep '0x3a0000000005bd'
ASIC_STATE:SAI_OBJECT_TYPE_BRIDGE_PORT:oid:0x3a0000000005bd

root@sonic:/home/admin# redis-cli -n 1 HGETALL 'ASIC_STATE:SAI_OBJECT_TYPE_BRIDGE_PORT:oid:0x3a0000000005bd'
1) "SAI_BRIDGE_PORT_ATTR_TYPE"
2) "SAI_BRIDGE_PORT_TYPE_PORT"
3) "SAI_BRIDGE_PORT_ATTR_PORT_ID"
4) "oid:0x20000000005b5"
5) "SAI_BRIDGE_PORT_ATTR_ADMIN_STATE"
6) "true"
7) "SAI_BRIDGE_PORT_ATTR_FDB_LEARNING_MODE"
8) "SAI_BRIDGE_PORT_FDB_LEARNING_MODE_HW"

root@sonic:/home/admin# redis-cli -n 1 KEYS '*' | grep '0x20000000005b5'
ASIC_STATE:SAI_OBJECT_TYPE_LAG:oid:0x20000000005b5
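
The manual `redis-cli ... | grep` lookups above can be expressed as a small helper that takes a list of ASIC_DB key names and reports the SAI object type of an OID. This is a sketch operating on plain strings; `sai_object_type` is a hypothetical name and the function does not talk to Redis itself.

```python
def sai_object_type(asic_keys, oid):
    """Return the SAI object type (e.g. 'LAG') of `oid`, or None if not found."""
    prefix = 'ASIC_STATE:SAI_OBJECT_TYPE_'
    for key in asic_keys:
        if not key.startswith(prefix):
            continue
        # 'LAG:oid:0x...' -> ('LAG', ':', 'oid:0x...')
        obj_type, sep, key_oid = key[len(prefix):].partition(':')
        if sep and key_oid == oid:
            return obj_type
    return None


# Keys copied from the redis-cli output in this issue.
asic_keys = [
    'ASIC_STATE:SAI_OBJECT_TYPE_BRIDGE_PORT:oid:0x3a0000000005bd',
    'ASIC_STATE:SAI_OBJECT_TYPE_LAG:oid:0x20000000005b5',
]

print(sai_object_type(asic_keys, 'oid:0x3a0000000005bd'))
print(sai_object_type(asic_keys, 'oid:0x20000000005b5'))
```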

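The lookup fails because bridge_id_2_iface has no mapping for bridge ports that are backed by a LAG. The FDB entry for 24:8a:07:7e:41:80 is therefore skipped in get_fdb, map_mac_ip_per_vlan['Vlan10'] stays empty, and garp_send raises KeyError for that MAC.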
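
One possible direction, sketched here under stated assumptions, is to resolve the bridge port's `SAI_BRIDGE_PORT_ATTR_PORT_ID` against both physical ports and LAGs when building the bridge-port map. `build_bridge_port_map`, `port_names`, and `lag_names` are hypothetical names; the physical port OID used for Ethernet56 is a placeholder, and the association of the LAG OID with PortChannel0001 is inferred from the `show vlan brief` output rather than read from the database. How the OID-to-name maps are obtained on a real switch is outside this sketch.

```python
def build_bridge_port_map(bridge_ports, port_names, lag_names):
    """Map bridge port OID -> interface name for both ports and LAGs."""
    oid_to_name = dict(port_names)
    oid_to_name.update(lag_names)
    result = {}
    for bridge_port_oid, attrs in bridge_ports.items():
        if attrs.get('SAI_BRIDGE_PORT_ATTR_TYPE') != 'SAI_BRIDGE_PORT_TYPE_PORT':
            continue
        # PORT_ID may reference either a physical port or a LAG object.
        name = oid_to_name.get(attrs.get('SAI_BRIDGE_PORT_ATTR_PORT_ID'))
        if name is not None:
            result[bridge_port_oid] = name
    return result


bridge_ports = {
    # Attributes copied from the HGETALL output in this issue.
    'oid:0x3a0000000005bd': {
        'SAI_BRIDGE_PORT_ATTR_TYPE': 'SAI_BRIDGE_PORT_TYPE_PORT',
        'SAI_BRIDGE_PORT_ATTR_PORT_ID': 'oid:0x20000000005b5',
    },
    # Bridge port OID from the issue; its port OID is a placeholder.
    'oid:0x3a00000000064c': {
        'SAI_BRIDGE_PORT_ATTR_TYPE': 'SAI_BRIDGE_PORT_TYPE_PORT',
        'SAI_BRIDGE_PORT_ATTR_PORT_ID': 'oid:0x1000000000001',
    },
}
port_names = {'oid:0x1000000000001': 'Ethernet56'}       # placeholder OID
lag_names = {'oid:0x20000000005b5': 'PortChannel0001'}   # inferred name

bridge_id_2_iface = build_bridge_port_map(bridge_ports, port_names, lag_names)
print(bridge_id_2_iface)
```

With such a map, the FDB entry learned on the LAG would no longer be skipped, so the MAC would be present in the per-VLAN map when `garp_send` looks it up.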

Steps to reproduce the issue:

  1. Connect two DUTs with a LAG
  2. Add the LAG to the VLAN RIF as a tagged member
  3. Set up a BGP session over the VLAN RIF

Describe the results you received:
fast-reboot fails

Jun 17 18:48:23.632648 sonic ERR fast-reboot-dump: Got an exception '24:8a:07:7e:41:80': Traceback: Traceback (most recent call last):
  File "/usr/bin/fast-reboot-dump.py", line 301, in <module>
    res = main()
  File "/usr/bin/fast-reboot-dump.py", line 294, in main
    garp_send(arp_entries, map_mac_ip_per_vlan)
  File "/usr/bin/fast-reboot-dump.py", line 229, in garp_send
    src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}
  File "/usr/bin/fast-reboot-dump.py", line 229, in <setcomp>
    src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}
KeyError: '24:8a:07:7e:41:80'

Describe the results you expected:
fast-reboot shouldn't fail

Additional information you deem important (e.g. issue happens only occasionally):

Output of show version:

SONiC Software Version: SONiC.201911.113-093d7731
Distribution: Debian 9.12
Kernel: 4.9.0-11-2-amd64
Build commit: 093d7731
Build date: Sun Jun 14 03:45:40 UTC 2020
Built by: johnar@jenkins-worker-8

Platform: x86_64-mlnx_msn2100-r0
HwSKU: ACS-MSN2100
ASIC: mellanox
Uptime: 19:13:15 up 10:56,  3 users,  load average: 0.19, 0.32, 0.48

Docker images:
REPOSITORY                    TAG                   IMAGE ID            SIZE
docker-syncd-mlnx             201911.113-093d7731   d11fa0617162        386MB
docker-syncd-mlnx             latest                d11fa0617162        386MB
docker-router-advertiser      201911.113-093d7731   da9d108cabc9        285MB
docker-router-advertiser      latest                da9d108cabc9        285MB
docker-sonic-mgmt-framework   201911.113-093d7731   deba713cbabb        425MB
docker-sonic-mgmt-framework   latest                deba713cbabb        425MB
docker-platform-monitor       201911.113-093d7731   cedcaec571f9        647MB
docker-platform-monitor       latest                cedcaec571f9        647MB
docker-fpm-frr                201911.113-093d7731   578bdd07c4c0        330MB
docker-fpm-frr                latest                578bdd07c4c0        330MB
docker-sflow                  201911.113-093d7731   3c8863e5a96a        310MB
docker-sflow                  latest                3c8863e5a96a        310MB
docker-lldp-sv2               201911.113-093d7731   15d73e30c0e9        307MB
docker-lldp-sv2               latest                15d73e30c0e9        307MB
docker-dhcp-relay             201911.113-093d7731   65346705abce        295MB
docker-dhcp-relay             latest                65346705abce        295MB
docker-database               201911.113-093d7731   b98668b03299        285MB
docker-database               latest                b98668b03299        285MB
docker-teamd                  201911.113-093d7731   d983d6a99831        310MB
docker-teamd                  latest                d983d6a99831        310MB
docker-snmp-sv2               201911.113-093d7731   0c821c3e62ce        343MB
docker-snmp-sv2               latest                0c821c3e62ce        343MB
docker-orchagent              201911.113-093d7731   ea84da2dedc9        328MB
docker-orchagent              latest                ea84da2dedc9        328MB
docker-nat                    201911.113-093d7731   0ade55a3c7a3        311MB
docker-nat                    latest                0ade55a3c7a3        311MB
docker-sonic-telemetry        201911.113-093d7731   9f3fe08edde6        349MB
docker-sonic-telemetry        latest                9f3fe08edde6        349MB

Attach debug file sudo generate_dump:

root@sonic:/home/admin# show int status
      Interface        Lanes    Speed    MTU    Alias             Vlan    Oper    Admin             Type    Asym PFC
---------------  -----------  -------  -----  -------  ---------------  ------  -------  ---------------  ----------
      Ethernet0            0      25G   9100     etp1           routed    down     down  QSFP28 or later         N/A
      Ethernet4            4      25G   9100     etp2           routed    down     down   SFP/SFP+/SFP28         N/A
      Ethernet8            8      25G   9100     etp3           routed    down     down   SFP/SFP+/SFP28         N/A
     Ethernet12           12      25G   9100     etp4           routed    down     down   SFP/SFP+/SFP28         N/A
     Ethernet16           16      25G   9100     etp5           routed    down     down   SFP/SFP+/SFP28         N/A
     Ethernet20           20      25G   9100     etp6           routed    down     down   SFP/SFP+/SFP28         N/A
     Ethernet24           24      25G   9100     etp7           routed    down     down   SFP/SFP+/SFP28         N/A
     Ethernet28           28      25G   9100     etp8           routed    down     down   SFP/SFP+/SFP28         N/A
     Ethernet32           32      25G   9100     etp9  PortChannel0001      up       up   SFP/SFP+/SFP28         N/A
     Ethernet36           36      25G   9100    etp10  PortChannel0001      up       up   SFP/SFP+/SFP28         N/A
     Ethernet40           40      25G   9100   etp11a  PortChannel0002    down     down  QSFP28 or later         N/A
     Ethernet41           41      25G   9100   etp11b  PortChannel0002    down     down  QSFP28 or later         N/A
     Ethernet42           42      25G   9100   etp11c           routed    down     down  QSFP28 or later         N/A
     Ethernet43           43      25G   9100   etp11d           routed    down     down  QSFP28 or later         N/A
     Ethernet44           44      25G   9100   etp12a           routed    down     down  QSFP28 or later         N/A
     Ethernet45           45      25G   9100   etp12b           routed    down     down  QSFP28 or later         N/A
     Ethernet46           46      25G   9100   etp12c           routed    down     down  QSFP28 or later         N/A
     Ethernet47           47      25G   9100   etp12d           routed    down     down  QSFP28 or later         N/A
     Ethernet48  48,49,50,51     100G   9100    etp13           routed    down     down  QSFP28 or later         N/A
     Ethernet52  52,53,54,55     100G   9100    etp14           routed    down     down  QSFP28 or later         N/A
     Ethernet56  56,57,58,59     100G   9100    etp15            trunk    down     down  QSFP28 or later         N/A
     Ethernet60  60,61,62,63     100G   9100    etp16           routed    down     down  QSFP28 or later         N/A
PortChannel0001          N/A      50G   9100      N/A           routed      up       up              N/A         N/A
PortChannel0002          N/A      50G   9100      N/A           routed    down       up              N/A         N/A

root@sonic:/home/admin# show int po
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available,
       S - selected, D - deselected, * - not synced
  No.  Team Dev         Protocol     Ports
-----  ---------------  -----------  ---------------------------
 0001  PortChannel0001  LACP(A)(Up)  Ethernet32(S) Ethernet36(S)
 0002  PortChannel0002  LACP(A)(Dw)  Ethernet41(D) Ethernet40(D)

root@sonic:/home/admin# show vlan brief
+-----------+----------------+-----------------+----------------+-----------------------+
|   VLAN ID | IP Address     | Ports           | Port Tagging   | DHCP Helper Address   |
+===========+================+=================+================+=======================+
|        10 | 10.0.1.2/24    | PortChannel0001 | tagged         |                       |
|           | 2000:1::2/64   |                 |                |                       |
+-----------+----------------+-----------------+----------------+-----------------------+
|        23 | 100.2.3.1/24   | Ethernet56      | tagged         |                       |
|           | 2000:2:3::1/64 |                 |                |                       |
+-----------+----------------+-----------------+----------------+-----------------------+

root@sonic:/home/admin# show ip int
Interface        Master    IPv4 address/mask    Admin/Oper    BGP Neighbor    Neighbor IP
---------------  --------  -------------------  ------------  --------------  -------------
Ethernet60                 100.2.4.1/24         down/down     IXIA2.4         100.2.4.2
Loopback0                  1.1.1.2/32           up/up         N/A             N/A
PortChannel0002            10.0.2.2/24          up/down       Aux             10.0.2.1
Vlan10                     10.0.1.2/24          up/up         Aux             10.0.1.1
Vlan23                     100.2.3.1/24         up/up         IXIA2.3         100.2.3.2
docker0                    240.127.1.1/24       up/down       N/A             N/A
eth0                       10.210.25.44/22      up/up         N/A             N/A
lo                         127.0.0.1/8          up/up         N/A             N/A

root@sonic:/home/admin# show ip bgp su

IPv4 Unicast Summary:
BGP router identifier 1.1.1.2, local AS number 65200 vrf-id 0
BGP table version 9
RIB entries 9, using 1656 bytes of memory
Peers 4, using 82 KiB of memory
Peer groups 4, using 256 bytes of memory

Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   NeighborName
10.0.1.1        4      65100    9725    9726        0    0    0 02:42:00            3   Aux
10.0.2.1        4      65100    2513    2516        0    0    0 02:41:34       Active   Aux
100.2.3.2       4      65023       0       0        0    0    0    never       Active   IXIA2.3
100.2.4.2       4      65024       0       0        0    0    0    never       Active   IXIA2.4

Total number of neighbors 4
