Description
fast-reboot fails while calling /usr/bin/fast-reboot-dump.py:
Jun 17 18:48:23.632648 sonic ERR fast-reboot-dump: Got an exception '24:8a:07:7e:41:80': Traceback: Traceback (most recent call last):#012 File "/usr/bin/fast-reboot-dump.py", line 301, in <module>#012 res = main()#012 File "/usr/bin/fast-reboot-dump.py", line 294, in main#012 garp_send(arp_entries, map_mac_ip_per_vlan)#012 File "/usr/bin/fast-reboot-dump.py", line 229, in garp_send#012 src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}#012 File "/usr/bin/fast-reboot-dump.py", line 229, in <setcomp>#012 src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}#012KeyError: '24:8a:07:7e:41:80'
The issue happens when /usr/bin/fast-reboot-dump.py processes FDB entries learned on a LAG that is a VLAN member:
def get_fdb(db, vlan_name, vlan_id, bridge_id_2_iface):
    fdb_types = {
        'SAI_FDB_ENTRY_TYPE_DYNAMIC': 'dynamic',
        'SAI_FDB_ENTRY_TYPE_STATIC' : 'static'
    }
    bvid = get_vlan_oid_by_vlan_id(db, vlan_id)
    available_macs = set()
    map_mac_ip = {}
    fdb_entries = []
    keys = db.keys(db.ASIC_DB, 'ASIC_STATE:SAI_OBJECT_TYPE_FDB_ENTRY:{*\"bvid\":\"%s\"*}' % bvid)
    keys = [] if keys is None else keys
    for key in keys:
        key_obj = json.loads(key.replace('ASIC_STATE:SAI_OBJECT_TYPE_FDB_ENTRY:', ''))
        mac = str(key_obj['mac'])
        if not is_mac_unicast(mac):
            continue
        available_macs.add((vlan_name, mac.lower()))
        fdb_mac = mac.replace(':', '-')
        # get attributes
        value = db.get_all(db.ASIC_DB, key)
        fdb_type = fdb_types[value['SAI_FDB_ENTRY_ATTR_TYPE']]
        if value['SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID'] not in bridge_id_2_iface:
            continue
        fdb_port = bridge_id_2_iface[value['SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID']]
        obj = {
            'FDB_TABLE:Vlan%d:%s' % (vlan_id, fdb_mac) : {
                'type': fdb_type,
                'port': fdb_port,
            },
            'OP': 'SET'
        }
        fdb_entries.append(obj)
        map_mac_ip[mac.lower()] = fdb_port
    return fdb_entries, available_macs, map_mac_ip
vlan_id: 10
key_obj: {u'mac': u'24:8A:07:7E:41:80', u'bvid': u'oid:0x260000000005bb', u'switch_id': u'oid:0x21000000000000'}
value: {'SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID': 'oid:0x3a0000000005bd', 'SAI_FDB_ENTRY_ATTR_PACKET_ACTION': 'SAI_PACKET_ACTION_FORWARD', 'SAI_FDB_ENTRY_ATTR_TYPE': 'SAI_FDB_ENTRY_TYPE_DYNAMIC'}
bridge_id_2_iface: {'oid:0x3a00000000064c': 'Ethernet56'}
value['SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID']: 0x3a0000000005bd
def garp_send(arp_entries, map_mac_ip_per_vlan):
    ETH_P_ALL = 0x03
    # generate source ip addresses for arp packets
    src_ip_addrs = {vlan_name:get_iface_ip_addr(vlan_name) for vlan_name,_,_ in arp_entries}
    # generate source mac addresses for arp packets
    src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}
arp_entries: [('Vlan10', '24:8a:07:7e:41:80', '10.0.1.1'), ('Vlan10', '24:8a:07:7e:41:80', '2000')]
map_mac_ip_per_vlan: {'Vlan23': {}, 'Vlan10': {}}
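The crash can be reproduced off the switch with just the two values printed above. The following is a minimal sketch that copies only the failing set comprehension from garp_send. The function names are introduced here for illustration, and safe_src_ifs is a hypothetical defensive variant, not the upstream fix: it avoids the exception but does not address the missing mapping.

```python
# Values copied from the debug output above.
arp_entries = [
    ('Vlan10', '24:8a:07:7e:41:80', '10.0.1.1'),
    ('Vlan10', '24:8a:07:7e:41:80', '2000'),
]
map_mac_ip_per_vlan = {'Vlan23': {}, 'Vlan10': {}}


def src_ifs_as_in_script(arp_entries, map_mac_ip_per_vlan):
    # Same set comprehension as in garp_send: indexes the per-VLAN map directly.
    return {map_mac_ip_per_vlan[vlan_name][dst_mac]
            for vlan_name, dst_mac, _ in arp_entries}


def safe_src_ifs(arp_entries, map_mac_ip_per_vlan):
    # Hypothetical variant: skip MACs that have no known egress interface.
    return {map_mac_ip_per_vlan[vlan_name][dst_mac]
            for vlan_name, dst_mac, _ in arp_entries
            if dst_mac in map_mac_ip_per_vlan.get(vlan_name, {})}


try:
    src_ifs_as_in_script(arp_entries, map_mac_ip_per_vlan)
    error = None
except KeyError as exc:
    # exc.args[0] is the missing dictionary key (the neighbor MAC).
    error = exc.args[0]

print("KeyError for:", error)
print("defensive result:", safe_src_ifs(arp_entries, map_mac_ip_per_vlan))
```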
SONiC DB info:
root@sonic:/home/admin# redis-cli -n 1 KEYS '*' | grep '0x3a0000000005bd'
ASIC_STATE:SAI_OBJECT_TYPE_BRIDGE_PORT:oid:0x3a0000000005bd
root@sonic:/home/admin# redis-cli -n 1 HGETALL 'ASIC_STATE:SAI_OBJECT_TYPE_BRIDGE_PORT:oid:0x3a0000000005bd'
1) "SAI_BRIDGE_PORT_ATTR_TYPE"
2) "SAI_BRIDGE_PORT_TYPE_PORT"
3) "SAI_BRIDGE_PORT_ATTR_PORT_ID"
4) "oid:0x20000000005b5"
5) "SAI_BRIDGE_PORT_ATTR_ADMIN_STATE"
6) "true"
7) "SAI_BRIDGE_PORT_ATTR_FDB_LEARNING_MODE"
8) "SAI_BRIDGE_PORT_FDB_LEARNING_MODE_HW"
root@sonic:/home/admin# redis-cli -n 1 KEYS '*' | grep '0x20000000005b5'
ASIC_STATE:SAI_OBJECT_TYPE_LAG:oid:0x20000000005b5
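The failure occurs because bridge_id_2_iface has no mapping for bridge ports that sit on a LAG: get_fdb skips the FDB entry after its MAC is already in available_macs, map_mac_ip_per_vlan['Vlan10'] stays empty, and the lookup in garp_send raises KeyError.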
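One possible direction is to resolve the bridge port's SAI_BRIDGE_PORT_ATTR_PORT_ID against LAG objects as well as physical ports. The sketch below works on plain dicts shaped like the ASIC_DB output above rather than on the real database. The function name, port_oid_2_name, and lag_oid_2_name are hypothetical; where the script would obtain a LAG OID to PortChannel name lookup is not shown in this report, and the port OID used for Ethernet56 is made up for the example.

```python
def build_bridge_id_2_iface(bridge_ports, port_oid_2_name, lag_oid_2_name):
    """Map bridge port OIDs to interface names, including LAG-backed ones.

    bridge_ports: {bridge_port_oid: attrs}, attrs shaped like the HGETALL output.
    port_oid_2_name / lag_oid_2_name: hypothetical lookups from a SAI object OID
    to an interface name; their source is an assumption of this sketch.
    """
    bridge_id_2_iface = {}
    for bridge_port_oid, attrs in bridge_ports.items():
        if attrs.get('SAI_BRIDGE_PORT_ATTR_TYPE') != 'SAI_BRIDGE_PORT_TYPE_PORT':
            continue
        target_oid = attrs.get('SAI_BRIDGE_PORT_ATTR_PORT_ID')
        # The target may be a physical port or a LAG; try both lookups.
        name = port_oid_2_name.get(target_oid) or lag_oid_2_name.get(target_oid)
        if name is not None:
            bridge_id_2_iface[bridge_port_oid] = name
    return bridge_id_2_iface


bridge_ports = {
    # Taken from the redis-cli output above.
    'oid:0x3a0000000005bd': {
        'SAI_BRIDGE_PORT_ATTR_TYPE': 'SAI_BRIDGE_PORT_TYPE_PORT',
        'SAI_BRIDGE_PORT_ATTR_PORT_ID': 'oid:0x20000000005b5',
    },
    # Bridge port OID from bridge_id_2_iface above; its port OID is made up.
    'oid:0x3a00000000064c': {
        'SAI_BRIDGE_PORT_ATTR_TYPE': 'SAI_BRIDGE_PORT_TYPE_PORT',
        'SAI_BRIDGE_PORT_ATTR_PORT_ID': 'oid:0x1000000000001',
    },
}
port_oid_2_name = {'oid:0x1000000000001': 'Ethernet56'}      # made-up port OID
lag_oid_2_name = {'oid:0x20000000005b5': 'PortChannel0001'}  # assumed name for this LAG

mapping = build_bridge_id_2_iface(bridge_ports, port_oid_2_name, lag_oid_2_name)
print(mapping)
```

With a mapping like this, the FDB entry for 24:8A:07:7E:41:80 would no longer be skipped in get_fdb, so map_mac_ip_per_vlan['Vlan10'] would contain the MAC that garp_send looks up.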
Steps to reproduce the issue:
- Connect two DUTs with a LAG
- Add the LAG to a VLAN RIF as a tagged member
- Set up a BGP session over the VLAN RIF
- Run fast-reboot
Describe the results you received:
fast-reboot fails
Jun 17 18:48:23.632648 sonic ERR fast-reboot-dump: Got an exception '24:8a:07:7e:41:80': Traceback: Traceback (most recent call last):#012 File "/usr/bin/fast-reboot-dump.py", line 301, in <module>#012 res = main()#012 File "/usr/bin/fast-reboot-dump.py", line 294, in main#012 garp_send(arp_entries, map_mac_ip_per_vlan)#012 File "/usr/bin/fast-reboot-dump.py", line 229, in garp_send#012 src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}#012 File "/usr/bin/fast-reboot-dump.py", line 229, in <setcomp>#012 src_ifs = {map_mac_ip_per_vlan[vlan_name][dst_mac] for vlan_name, dst_mac, _ in arp_entries}#012KeyError: '24:8a:07:7e:41:80'
Describe the results you expected:
fast-reboot shouldn't fail
Additional information you deem important (e.g. issue happens only occasionally):
Output of show version:
SONiC Software Version: SONiC.201911.113-093d7731
Distribution: Debian 9.12
Kernel: 4.9.0-11-2-amd64
Build commit: 093d7731
Build date: Sun Jun 14 03:45:40 UTC 2020
Built by: johnar@jenkins-worker-8
Platform: x86_64-mlnx_msn2100-r0
HwSKU: ACS-MSN2100
ASIC: mellanox
Uptime: 19:13:15 up 10:56, 3 users, load average: 0.19, 0.32, 0.48
Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-syncd-mlnx 201911.113-093d7731 d11fa0617162 386MB
docker-syncd-mlnx latest d11fa0617162 386MB
docker-router-advertiser 201911.113-093d7731 da9d108cabc9 285MB
docker-router-advertiser latest da9d108cabc9 285MB
docker-sonic-mgmt-framework 201911.113-093d7731 deba713cbabb 425MB
docker-sonic-mgmt-framework latest deba713cbabb 425MB
docker-platform-monitor 201911.113-093d7731 cedcaec571f9 647MB
docker-platform-monitor latest cedcaec571f9 647MB
docker-fpm-frr 201911.113-093d7731 578bdd07c4c0 330MB
docker-fpm-frr latest 578bdd07c4c0 330MB
docker-sflow 201911.113-093d7731 3c8863e5a96a 310MB
docker-sflow latest 3c8863e5a96a 310MB
docker-lldp-sv2 201911.113-093d7731 15d73e30c0e9 307MB
docker-lldp-sv2 latest 15d73e30c0e9 307MB
docker-dhcp-relay 201911.113-093d7731 65346705abce 295MB
docker-dhcp-relay latest 65346705abce 295MB
docker-database 201911.113-093d7731 b98668b03299 285MB
docker-database latest b98668b03299 285MB
docker-teamd 201911.113-093d7731 d983d6a99831 310MB
docker-teamd latest d983d6a99831 310MB
docker-snmp-sv2 201911.113-093d7731 0c821c3e62ce 343MB
docker-snmp-sv2 latest 0c821c3e62ce 343MB
docker-orchagent 201911.113-093d7731 ea84da2dedc9 328MB
docker-orchagent latest ea84da2dedc9 328MB
docker-nat 201911.113-093d7731 0ade55a3c7a3 311MB
docker-nat latest 0ade55a3c7a3 311MB
docker-sonic-telemetry 201911.113-093d7731 9f3fe08edde6 349MB
docker-sonic-telemetry latest 9f3fe08edde6 349MB
Attach debug file sudo generate_dump:
root@sonic:/home/admin# show int status
Interface Lanes Speed MTU Alias Vlan Oper Admin Type Asym PFC
--------------- ----------- ------- ----- ------- --------------- ------ ------- --------------- ----------
Ethernet0 0 25G 9100 etp1 routed down down QSFP28 or later N/A
Ethernet4 4 25G 9100 etp2 routed down down SFP/SFP+/SFP28 N/A
Ethernet8 8 25G 9100 etp3 routed down down SFP/SFP+/SFP28 N/A
Ethernet12 12 25G 9100 etp4 routed down down SFP/SFP+/SFP28 N/A
Ethernet16 16 25G 9100 etp5 routed down down SFP/SFP+/SFP28 N/A
Ethernet20 20 25G 9100 etp6 routed down down SFP/SFP+/SFP28 N/A
Ethernet24 24 25G 9100 etp7 routed down down SFP/SFP+/SFP28 N/A
Ethernet28 28 25G 9100 etp8 routed down down SFP/SFP+/SFP28 N/A
Ethernet32 32 25G 9100 etp9 PortChannel0001 up up SFP/SFP+/SFP28 N/A
Ethernet36 36 25G 9100 etp10 PortChannel0001 up up SFP/SFP+/SFP28 N/A
Ethernet40 40 25G 9100 etp11a PortChannel0002 down down QSFP28 or later N/A
Ethernet41 41 25G 9100 etp11b PortChannel0002 down down QSFP28 or later N/A
Ethernet42 42 25G 9100 etp11c routed down down QSFP28 or later N/A
Ethernet43 43 25G 9100 etp11d routed down down QSFP28 or later N/A
Ethernet44 44 25G 9100 etp12a routed down down QSFP28 or later N/A
Ethernet45 45 25G 9100 etp12b routed down down QSFP28 or later N/A
Ethernet46 46 25G 9100 etp12c routed down down QSFP28 or later N/A
Ethernet47 47 25G 9100 etp12d routed down down QSFP28 or later N/A
Ethernet48 48,49,50,51 100G 9100 etp13 routed down down QSFP28 or later N/A
Ethernet52 52,53,54,55 100G 9100 etp14 routed down down QSFP28 or later N/A
Ethernet56 56,57,58,59 100G 9100 etp15 trunk down down QSFP28 or later N/A
Ethernet60 60,61,62,63 100G 9100 etp16 routed down down QSFP28 or later N/A
PortChannel0001 N/A 50G 9100 N/A routed up up N/A N/A
PortChannel0002 N/A 50G 9100 N/A routed down up N/A N/A
root@sonic:/home/admin# show int po
Flags: A - active, I - inactive, Up - up, Dw - Down, N/A - not available,
S - selected, D - deselected, * - not synced
No. Team Dev Protocol Ports
----- --------------- ----------- ---------------------------
0001 PortChannel0001 LACP(A)(Up) Ethernet32(S) Ethernet36(S)
0002 PortChannel0002 LACP(A)(Dw) Ethernet41(D) Ethernet40(D)
root@sonic:/home/admin# show vlan brief
+-----------+----------------+-----------------+----------------+-----------------------+
| VLAN ID | IP Address | Ports | Port Tagging | DHCP Helper Address |
+===========+================+=================+================+=======================+
| 10 | 10.0.1.2/24 | PortChannel0001 | tagged | |
| | 2000:1::2/64 | | | |
+-----------+----------------+-----------------+----------------+-----------------------+
| 23 | 100.2.3.1/24 | Ethernet56 | tagged | |
| | 2000:2:3::1/64 | | | |
+-----------+----------------+-----------------+----------------+-----------------------+
root@sonic:/home/admin# show ip int
Interface Master IPv4 address/mask Admin/Oper BGP Neighbor Neighbor IP
--------------- -------- ------------------- ------------ -------------- -------------
Ethernet60 100.2.4.1/24 down/down IXIA2.4 100.2.4.2
Loopback0 1.1.1.2/32 up/up N/A N/A
PortChannel0002 10.0.2.2/24 up/down Aux 10.0.2.1
Vlan10 10.0.1.2/24 up/up Aux 10.0.1.1
Vlan23 100.2.3.1/24 up/up IXIA2.3 100.2.3.2
docker0 240.127.1.1/24 up/down N/A N/A
eth0 10.210.25.44/22 up/up N/A N/A
lo 127.0.0.1/8 up/up N/A N/A
root@sonic:/home/admin# show ip bgp su
IPv4 Unicast Summary:
BGP router identifier 1.1.1.2, local AS number 65200 vrf-id 0
BGP table version 9
RIB entries 9, using 1656 bytes of memory
Peers 4, using 82 KiB of memory
Peer groups 4, using 256 bytes of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd NeighborName
10.0.1.1 4 65100 9725 9726 0 0 0 02:42:00 3 Aux
10.0.2.1 4 65100 2513 2516 0 0 0 02:41:34 Active Aux
100.2.3.2 4 65023 0 0 0 0 0 never Active IXIA2.3
100.2.4.2 4 65024 0 0 0 0 0 never Active IXIA2.4
Total number of neighbors 4