Description
When a MAC that's learnt remotely, is learnt local the move doesn't work. The issue is due to a race condition. On detecting mac move, fdbsyncd tries to delete remote mac first and then replace it with local mac. However frr will try to reinstall the remote mac as the local vtep cannot delete remote mac. This results in remote mac being reinstalled after the local mac is updated which results mac pointing to remote vtep even after the move.
Logs from fdbsyncd:
Oct 12 02:40:03.850516 sonic INFO swss#orchagent: message repeated 56 times: [ :- set: setting attribute 0x10000004 status: SAI_STATUS_SUCCESS]
Oct 12 02:40:03.850516 sonic INFO swss#orchagent: :- update: FDB event:2, MAC: 00:01:00:00:00:01 , BVID: 0x26000000000b0d , bridge port ID: 0x3a000000000b0e.
Oct 12 02:40:03.850516 sonic INFO swss#orchagent: :- update: Received MOVE event for bvid=0x26000000000b0d mac=00:01:00:00:00:01 port=0x3a000000000b0e
Oct 12 02:40:03.850516 sonic INFO swss#orchagent: :- storeFdbEntryState: FdbOrch notification: mac 00:01:00:00:00:01 was inserted in port Ethernet64 into bv_id 0x26000000000b0d
Oct 12 02:40:03.850516 sonic INFO swss#orchagent: :- storeFdbEntryState: m_entries size=1 mac=00:01:00:00:00:01 port=0x3a000000000b0e
Oct 12 02:40:03.850827 sonic INFO swss#fdbsyncd: :- processStateFdb: FDBSYNCD STATE FDB updates key=Vlan11:00:01:00:00:00:01, operation=SET
Oct 12 02:40:03.850827 sonic INFO swss#fdbsyncd: :- processStateFdb: FDBSYNCD STATE FDB updates : FvFiels port, FvValues: Ethernet64
Oct 12 02:40:03.850827 sonic INFO swss#fdbsyncd: :- processStateFdb: FDBSYNCD STATE FDB updates : FvFiels type, FvValues: dynamic
Oct 12 02:40:03.856634 sonic INFO swss#fdbsyncd: :- macDelVxlanEntry: Success cmd: bridge fdb del 00:01:00:00:00:01 dev VTEP1-11 dst 2.2.2.2 vlan 11, res=, ret=0
Oct 12 02:40:03.856634 sonic INFO swss#fdbsyncd: :- updateLocalMac: Local learn event deleting from VXLAN table DEL_KEY Vlan11:00:01:00:00:00:01
Oct 12 02:40:03.856634 sonic INFO swss#fdbsyncd: :- macDelVxlan: DEL_KEY Vlan11:00:01:00:00:00:01 vtep:2.2.2.2 type:dynamic
Oct 12 02:40:03.856634 sonic NOTICE swss#fdbsyncd: :- macDelVxlanDB: VXLAN_FDB_TABLE: DEL_KEY Vlan11:00:01:00:00:00:01 vtep:2.2.2.2 type:dynamic
Oct 12 02:40:03.857513 sonic INFO swss#orchagent: :- removeFdbEntry: FdbOrch RemoveFDBEntry: mac=00:01:00:00:00:01 bv_id=0x26000000000b0dorigin 4
Oct 12 02:40:03.857513 sonic INFO swss#orchagent: :- removeFdbEntry: FdbOrch RemoveFDBEntry: mac=00:01:00:00:00:01 fdb origin is different; found_origin:1 delete_origin:4
Oct 12 02:40:03.861031 sonic INFO swss#fdbsyncd: :- updateLocalMac: cmd: bridge fdb replace 00:01:00:00:00:01 dev Ethernet64 master dynamic vlan 11, res=, ret=0
Oct 12 02:40:03.861031 sonic INFO swss#fdbsyncd: :- onMsgNbr: Tunnel IP 2.2.2.2 Int33686018
Oct 12 02:40:03.861125 sonic INFO swss#fdbsyncd: :- onMsgNbr: Tunnel IP 2.2.2.2 Int33686018
Oct 12 02:40:03.861125 sonic INFO swss#fdbsyncd: :- macAddVxlan: VXLAN_FDB_TABLE: ADD_KEY Vlan11:00:01:00:00:00:01 vtep:2.2.2.2 type:dynamic
Oct 12 02:40:03.861865 sonic INFO swss#orchagent: :- addFdbEntry: mac=00:01:00:00:00:01 bv_id=0x26000000000b0d port_name=Port_SRC_VTEP_1.1.1.1 type=dynamic origin=4 remote_ip=2.2.2.2
Oct 12 02:40:03.861865 sonic INFO swss#orchagent: :- addFdbEntry: MAC-Update FDB 00:01:00:00:00:01 in Vlan11 on from-Ethernet64:to-Port_SRC_VTEP_1.1.1.1 from-dynamic:to-dynamic origin-1-to-4
Oct 12 02:40:03.862440 qa-eth-vt03-1-4600ca1 NOTICE syncd#SDK: [SAI_UTILS.NOTICE] mlnx_sai_utils.c[1833]- set_dispatch_attrib_handler: Set TYPE, key:fdb entry mac [00:01:00:00:00:01] bv_id b00000026 (VLAN), val:STATIC
Oct 12 02:40:03.864124 qa-eth-vt03-1-4600ca1 NOTICE syncd#SDK: [SAI_UTILS.NOTICE] mlnx_sai_utils.c[1833]- set_dispatch_attrib_handler: Set ALLOW_MAC_MOVE, key:fdb entry mac [00:01:00:00:00:01] bv_id b00000026 (VLAN), val:true
Oct 12 02:40:03.865344 qa-eth-vt03-1-4600ca1 NOTICE syncd#SDK: [SAI_UTILS.NOTICE] mlnx_sai_utils.c[1833]- set_dispatch_attrib_handler: Set BRIDGE_PORT_ID, key:fdb entry mac [00:01:00:00:00:01] bv_id b00000026 (VLAN), val:BRIDGE_PORT,(0:0),205,0000,0
Oct 12 02:40:03.866570 qa-eth-vt03-1-4600ca1 NOTICE syncd#SDK: [SAI_UTILS.NOTICE] mlnx_sai_utils.c[1833]- set_dispatch_attrib_handler: Set ENDPOINT_IP, key:fdb entry mac [00:01:00:00:00:01] bv_id b00000026 (VLAN), val:2.2.2.2
Oct 12 02:40:03.867593 sonic INFO swss#fdbsyncd: :- processStateFdb: FDBSYNCD STATE FDB updates key=Vlan11:00:01:00:00:00:01, operation=DEL
Oct 12 02:40:03.867619 sonic INFO swss#fdbsyncd: :- macCheckSrcDB: DEL_KEY Vlan11:00:01:00:00:00:01
Oct 12 02:40:03.870828 sonic INFO swss#fdbsyncd: :- updateLocalMac: cmd: bridge fdb del 00:01:00:00:00:01 dev Ethernet64 master dynamic vlan 11, res=, ret=0
Oct 12 02:40:03.870828 sonic INFO swss#fdbsyncd: :- macRefreshStateDB: Refreshing Vlan:11 MAC route MAC:00:01:00:00:00:01 Key Vlan11:00:01:00:00:00:01
Oct 12 02:40:05.055843 sonic INFO swss#orchagent: :- set: setting attribute 0x10000004 status: SAI_STATUS_SUCCESS
Steps to reproduce the issue:
- Using EVPN learn mac in VTEP2 and advertise it as remote mac to VTEP1
- Try to learn the mac locally again in VTEP1
- Check show vxlan remotemac tables
Describe the results you received:
The mac is not learnt locally. It still points remote
Describe the results you expected:
The mac should move to local. It shouldn't point to remote.
Output of show version:
SONiC Software Version: SONiC.202205.42-ea51d9514_Internal
Distribution: Debian 11.5
Kernel: 5.10.0-12-2-amd64
Build commit: ea51d9514
Build date: Fri Oct 7 05:45:56 UTC 2022
Built by: sw-r2d2-bot@r-build-sonic-ci03-243
Platform: x86_64-mlnx_msn4600c-r0
HwSKU: ACS-MSN4600C
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2140X00042
Model Number: MSN4600-CS2FO_QP
Hardware Revision: A1
Uptime: 04:16:10 up 6:39, 2 users, load average: 0.53, 0.55, 0.56
Date: Wed 12 Oct 2022 04:16:10
Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-syncd-mlnx 202205.42-ea51d9514_Internal 97dd12cebe1e 859MB
docker-syncd-mlnx latest 97dd12cebe1e 859MB
docker-orchagent 202205.42-ea51d9514_Internal 347f0cdc723f 478MB
docker-orchagent latest 347f0cdc723f 478MB
docker-fpm-frr 202205.42-ea51d9514_Internal ddadceae2d69 488MB
docker-fpm-frr latest ddadceae2d69 488MB
docker-teamd 202205.42-ea51d9514_Internal 28f79f968d3c 459MB
docker-teamd latest 28f79f968d3c 459MB
docker-platform-monitor 202205.42-ea51d9514_Internal 629c9ea03cf2 861MB
docker-platform-monitor latest 629c9ea03cf2 861MB
docker-macsec latest a7ea8b95281f 461MB
docker-snmp 202205.42-ea51d9514_Internal 0e96a62d07ee 488MB
docker-snmp latest 0e96a62d07ee 488MB
docker-dhcp-relay latest 8cef09a39edf 452MB
docker-lldp 202205.42-ea51d9514_Internal 337146c6b971 485MB
docker-lldp latest 337146c6b971 485MB
docker-mux 202205.42-ea51d9514_Internal 464339799d55 492MB
docker-mux latest 464339799d55 492MB
docker-sonic-telemetry 202205.42-ea51d9514_Internal 7fc604d28c7c 523MB
docker-sonic-telemetry latest 7fc604d28c7c 523MB
docker-database 202205.42-ea51d9514_Internal 98a7bdcfd7e8 443MB
docker-database latest 98a7bdcfd7e8 443MB
docker-router-advertiser 202205.42-ea51d9514_Internal f05c810acb38 443MB
docker-router-advertiser latest f05c810acb38 443MB
docker-nat 202205.42-ea51d9514_Internal 272fda2cdf1a 430MB
docker-nat latest 272fda2cdf1a 430MB
docker-sflow 202205.42-ea51d9514_Internal 5723c8d63918 428MB
docker-sflow latest 5723c8d63918 428MB
docker-sonic-mgmt-framework 202205.42-ea51d9514_Internal 0fd3a3d91b98 557MB
docker-sonic-mgmt-framework latest 0fd3a3d91b98 557MB
urm.nvidia.com/sw-nbu-sws-sonic-docker/sonic-wjh 1.3.1-202205 4e8b9199b984 643MB
Output of show techsupport:
VTEP 1: qa-eth-vt03-1-4600ca1
VTEP 2: qa-eth-vt03-2-3700v
MAC used: 00:01:00:00:00:01
First learnt locally in VTEP2. VTEP1 learns it as remote mac throug EVPN.
Then traffic with same mac is sent again on VTEP1 local port. However mac move doesn't happen.
Additional information you deem important (e.g. issue happens only occasionally):
sonic_dump_qa-eth-vt03-1-4600ca1_20221012_033236.tar.gz
sonic_dump_qa-eth-vt03-2-3700v_20221012_033232.tar.gz
Description
When a MAC that's learnt remotely, is learnt local the move doesn't work. The issue is due to a race condition. On detecting mac move, fdbsyncd tries to delete remote mac first and then replace it with local mac. However frr will try to reinstall the remote mac as the local vtep cannot delete remote mac. This results in remote mac being reinstalled after the local mac is updated which results mac pointing to remote vtep even after the move.
Logs from fdbsyncd:
Steps to reproduce the issue:
Describe the results you received:
The mac is not learnt locally. It still points remote
Describe the results you expected:
The mac should move to local. It shouldn't point to remote.
Output of
show version:Output of
show techsupport:Additional information you deem important (e.g. issue happens only occasionally):
sonic_dump_qa-eth-vt03-1-4600ca1_20221012_033236.tar.gz
sonic_dump_qa-eth-vt03-2-3700v_20221012_033232.tar.gz