Is it platform specific
generic
Importance or Severity
Critical
Description of the bug
When mgmt VRF is enabled on SmartSwitch, orchagent.sh may choose the eth0 management IP (which lives in the mgmt VRF netns) as the bind address for the northbound ZMQ server. The ZMQ bind then fails, orchagent aborts, and the swss container crash-loops.
Steps to Reproduce
- On a SmartSwitch NPU running the failing image, set DEVICE_METADATA|localhost subtype=SmartSwitch and configure a management IP on eth0 (MGMT_INTERFACE).
- Enable management VRF: set MGMT_VRF_CONFIG|vrf_global mgmtVrfEnabled=true.
- Apply the config: sudo config save -y then sudo config reload -y (or reboot).
- Observe swss crash-looping and orchagent exiting; docker logs swss shows ZMQ init/bind failure (binds to the eth0 mgmt IP).
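For anyone reproducing this by editing /etc/sonic/config_db.json directly instead of using the CLI, the sketch below prints the CONFIG_DB entries named in the steps above in config_db.json layout. The subtype and mgmtVrfEnabled values come from the steps. The /24 prefix length and the gateway address are hypothetical placeholders, not values from this report. Merge the output field by field into the existing tables rather than replacing them, because other DEVICE_METADATA|localhost fields must be kept.

```python
import json

# Hypothetical placeholders: substitute the DUT's real eth0 address/prefix and gateway.
MGMT_IP_PREFIX = "10.3.150.162/24"  # /24 is an assumption
MGMT_GATEWAY = "10.3.150.1"         # assumed gateway, for illustration only


def repro_config_fragment(mgmt_ip_prefix: str, gateway: str) -> dict:
    """Return the CONFIG_DB entries used in the repro steps, in config_db.json layout."""
    return {
        # Only the field the repro needs; merge into the existing localhost entry.
        "DEVICE_METADATA": {"localhost": {"subtype": "SmartSwitch"}},
        # MGMT_INTERFACE keys have the form "<ifname>|<ip>/<prefix>".
        "MGMT_INTERFACE": {f"eth0|{mgmt_ip_prefix}": {"gwaddr": gateway}},
        # CONFIG_DB stores booleans as strings.
        "MGMT_VRF_CONFIG": {"vrf_global": {"mgmtVrfEnabled": "true"}},
    }


if __name__ == "__main__":
    print(json.dumps(repro_config_fragment(MGMT_IP_PREFIX, MGMT_GATEWAY), indent=4, sort_keys=True))
```

After merging, sudo config reload -y picks up the edited file.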
Actual Behavior and Expected Behavior
Actual behavior: After enabling mgmtVrfEnabled=true and applying it via config reload or reboot, orchagent fails to start because ZMQ bind/init fails on the chosen eth0 mgmt IP. The swss container then crash-loops, and related containers/services may flap.
Expected behavior: Enabling management VRF should not impact orchagent startup; swss should remain stable and orchagent should bind ZMQ to a valid address (e.g., loopback or midplane interface in the correct namespace) and continue running normally.
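A minimal sketch of the expected address-selection rule, written in Python for readability rather than as a patch to orchagent.sh. The function and parameter names are hypothetical, and so is the preference order (midplane, then eth0 only when mgmt VRF is off, then loopback). The only property taken from this report is that the eth0 mgmt IP must never be chosen while mgmt VRF is enabled.

```python
from typing import Optional

LOOPBACK_IP = "127.0.0.1"


def choose_zmq_address(mgmt_vrf_enabled: bool,
                       eth0_ip: Optional[str],
                       midplane_ip: Optional[str]) -> str:
    """Hypothetical policy for picking the northbound ZMQ bind address.

    Assumption: with mgmt VRF enabled, eth0's address is not bindable from
    the namespace orchagent runs in, so it is skipped entirely.
    """
    if midplane_ip:
        return midplane_ip          # preferred on SmartSwitch (assumed order)
    if eth0_ip and not mgmt_vrf_enabled:
        return eth0_ip              # only safe when eth0 is in orchagent's namespace
    return LOOPBACK_IP              # always bindable fallback


def zmq_endpoint(ip: str) -> str:
    # Mirrors the "tcp://<ip>" form printed in the orchagent log below.
    return f"tcp://{ip}"
```

For example, choose_zmq_address(True, "10.3.150.162", None) falls back to 127.0.0.1 instead of the eth0 address seen in the failing log.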
Relevant log output
The failure appears in the following log sequence:
Create ZMQ server with address: tcp://10.3.150.162 (mgmt eth0 IP while mgmt VRF is enabled)
zmq_bind failed on endpoint … zmqerrno: 99 (Cannot assign requested address)
terminate called after throwing an instance of 'std::runtime_error'
orchagent terminated by SIGABRT (core dumped)
process orchagent exited unexpectedly → supervisor restarts/terminates swss (container crash-loop)
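zmqerrno 99 is EADDRNOTAVAIL on Linux ("Cannot assign requested address"). bind(2) returns it when the requested IP is not assigned to any interface visible to the calling process, for example because the address lives in another namespace or VRF. The sketch below reproduces that errno with a plain socket. The TEST-NET-1 address 192.0.2.1 is an assumed stand-in for an address that is not local, and the sketch is independent of ZMQ and SONiC.

```python
import errno
import socket


def try_bind(ip: str, port: int = 0) -> int:
    """Bind a TCP socket to ip:port; return 0 on success, otherwise the errno."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((ip, port))
            return 0
        except OSError as e:
            return e.errno


if __name__ == "__main__":
    # 192.0.2.1 (TEST-NET-1) stands in for an IP that is not assigned to any
    # interface in the current network namespace, like eth0's IP under mgmt VRF.
    rc = try_bind("192.0.2.1")
    print(rc, errno.errorcode.get(rc, "OK"))
```

On a typical Linux host, the loopback bind succeeds and the TEST-NET bind fails with EADDRNOTAVAIL. This matches the orchagent failure when it tries the eth0 mgmt IP from outside the mgmt VRF.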
Output of show version, show techsupport
Attach files (if any)
No response