EEPROM issue on BlueField-2

Hi,

I am having an issue with an NVIDIA BlueField-2 port not coming up. With the correct cable plugged into the port, the ethtool output shows:

Advertised FEC modes: Not reported
Speed: Unknown!
Duplex: Unknown! (255)
Auto-negotiation: on
Port: Other
PHYAD: 0
Transceiver: internal
Supports Wake-on: d
Wake-on: d
Link detected: no (EEPROM issue)

The port does not detect link, and it reports an EEPROM issue.
I have tried power-cycling the DPU as suggested in Known Issues - NVIDIA Docs with no luck.
The same issue was previously fixed with a cold reboot of the server hosting the DPU, but this time multiple attempts (including unplugging and re-plugging the card) have not worked.

Does anyone know what specifically causes "Link detected: no (EEPROM issue)" on BlueField-2? What additional diagnostics should I run?

Thanks.

Hello,

The mlx5 driver reports this when it fails to read the transceiver module’s EEPROM over the I²C/management interface. This prevents the firmware from identifying the cable type, negotiating speed/FEC, and bringing the link up.
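If you want to track this status across several ports or reboots while testing, a small script can pull the link state and the reason text out of the ethtool output. This is only a minimal sketch: the function name parse_link_detected is my own, and it assumes the "Link detected" line has the same form as in your paste.

```python
import re


def parse_link_detected(ethtool_output):
    """Return (link_up, reason) parsed from the text printed by `ethtool <iface>`.

    reason is the text in parentheses after yes/no, or None if there is none.
    Returns None when the output has no "Link detected" line at all.
    """
    match = re.search(
        r"^\s*Link detected:\s*(yes|no)(?:\s*\((.*?)\))?\s*$",
        ethtool_output,
        re.MULTILINE,
    )
    if match is None:
        return None
    return (match.group(1) == "yes", match.group(2))


# Sample text copied from the output posted above.
sample = """\
Speed: Unknown!
Duplex: Unknown! (255)
Link detected: no (EEPROM issue)
"""

print(parse_link_detected(sample))
```

On a live system the input text could come from subprocess.run(["ethtool", iface], capture_output=True, text=True).stdout, where iface is the name of the affected port.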

Please check which FW version you are running: flint -d <device> q full

If you are running a version below 24.39.4082 or 24.42.1000, I suggest upgrading the FW to the latest release, as there was a known bug in these versions.
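Dotted version strings are easy to misjudge by eye or by plain string comparison, so here is a minimal sketch of a numeric comparison against the two versions mentioned above. The helper names and the installed value are hypothetical. The sketch simply reports the result for each threshold separately and does not decide which threshold applies to which release branch.

```python
def parse_fw_version(text):
    """Turn a dotted version such as '24.39.4082' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_older(installed, threshold):
    """True if the installed version is numerically below the threshold."""
    return parse_fw_version(installed) < parse_fw_version(threshold)


# The two versions named in the reply above.
THRESHOLDS = ("24.39.4082", "24.42.1000")

# Hypothetical value: replace with the version reported for your device.
installed = "24.39.2048"

for threshold in THRESHOLDS:
    print(f"{installed} older than {threshold}: {is_older(installed, threshold)}")
```

Comparing integer tuples instead of raw strings avoids mistakes such as treating 24.9.x as newer than 24.39.x.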

I would generally recommend making sure you are on the latest FW and driver versions.

After upgrading, please make sure to power-cycle the host.

If the issue is not solved after that, please go ahead and open a support case directly with Enterprise Support and it will be handled based on entitlement.

Thanks,

Jonathan.