
[Mellanox] Add support for set/get system LED status #17

Closed
Junchao-Mellanox wants to merge 2 commits into master from system-led-api



@Junchao-Mellanox
Owner

- Why I did it

The system health feature needs to set and get the system LED status.

- How I did it

  1. Add an LED object to the chassis class and initialize it when the API is first called on the host side
  2. Read/write the system LED sysfs files to get/set the status
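A minimal sketch of that design, assuming a single hypothetical sysfs file named `status` that holds the LED color as text; the real Mellanox sysfs layout and supported colors are not shown in this PR. The `set_status_led`/`get_status_led` method names are assumed to follow the SONiC platform API convention, and the LED object is created lazily on the first call, as step 1 describes.

```python
import os


class SystemLed:
    """Hypothetical sysfs-backed system LED (file layout is an assumption)."""

    # Assumed color set; the real driver may expose different values.
    SUPPORTED_COLORS = ("green", "red", "amber", "off")

    def __init__(self, led_dir):
        # Assumption: one text file holds the current color.
        self._status_file = os.path.join(led_dir, "status")

    def set_status(self, color):
        """Write the color to sysfs; return True on success."""
        if color not in self.SUPPORTED_COLORS:
            return False
        try:
            with open(self._status_file, "w") as f:
                f.write(color)
        except OSError:
            return False
        return True

    def get_status(self):
        """Read the current color from sysfs, or None if unreadable."""
        try:
            with open(self._status_file) as f:
                return f.read().strip()
        except OSError:
            return None


class Chassis:
    """Minimal chassis stub that owns the system LED."""

    def __init__(self, led_dir):
        self._led_dir = led_dir
        self._led = None  # not created until the LED API is first used

    def _get_led(self):
        # Lazy initialization on the first API call.
        if self._led is None:
            self._led = SystemLed(self._led_dir)
        return self._led

    def set_status_led(self, color):
        return self._get_led().set_status(color)

    def get_status_led(self):
        return self._get_led().get_status()
```

Creating the LED object lazily keeps chassis construction cheap when nothing on the host ever touches the LED.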

- How to verify it

Manual test on SN2700

- Description for the changelog

- A picture of a cute animal (not mandatory but encouraged)

@Junchao-Mellanox
Owner Author

Community PR: sonic-net#4829

Junchao-Mellanox pushed a commit that referenced this pull request Jul 31, 2020
* src/sonic-telemetry fa8d498...3bd7ca3 (4):
  > Update gnmi deps (#40)
  > [testdata] Update SFP keys to align with new standard (#39)
  > Fixed the parameters for subscribe APIs (#38)
  > Azure ro mode (#34)

* src/sonic-mgmt-common 444aa9a...cc01ce4 (4):
  > Make gnmi dep version the same as in telemetry repo (#17)
  > Cleanup translib and cvl go test cases (#13)
  > Package update and enhancements/fixes in YGOT, and Request Binder (#12)
  > Translib phase I changes (#11)

Note: sonic-telemetry submodule update is dependent upon sonic-mgmt-common submodule update, thus updating both in this patch
Junchao-Mellanox pushed a commit that referenced this pull request Aug 10, 2020
* src/sonic-ztp c959371...dd025bc (2):
  > Update all references to new 'sonic-installer' file name (#18)
  > Filter out non-printable characters read from syseeprom (#17)
Junchao-Mellanox pushed a commit that referenced this pull request Jan 17, 2022
* [BFN] Updated platform APIs impl

Signed-off-by: Andriy Kokhan <[email protected]>

* Extended BFN platform SFP APIs implementation

* Update sfp.py

* [BFN] Extended SFP platform plugin implementation

Signed-off-by: Andriy Kokhan <[email protected]>

* [BFN] Extended Fans platform plugin implementation

* [BFN] Divided classes Fan and FanDrawer into 2 files

* Signed-off-by: Vadym Yashchenko <[email protected]>

What I did
	Add get_model() function
	Add get_low_critical_threshold() function
	Change __get(...) function.
How I did it
	The difference from the previous implementation of __get(...) is that it now returns the real value, or -9999.9 if the value is not provided by the thrift API.
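A sketch of the sentinel behaviour described above. The thrift response is modelled as a plain dict and the key names are hypothetical; only the -9999.9 fallback comes from the commit message.

```python
# Fallback value from the commit message, returned when thrift has no reading.
NOT_AVAILABLE = -9999.9


def get_value(readings, key):
    """Return readings[key] as a float, or NOT_AVAILABLE if it is missing.

    `readings` is a plain dict standing in for a thrift response;
    the key names used with it are hypothetical.
    """
    value = readings.get(key)
    if value is None:
        return NOT_AVAILABLE
    try:
        return float(value)
    except (TypeError, ValueError):
        # Our own choice: treat unparsable values the same as missing ones.
        return NOT_AVAILABLE


def get_low_critical_threshold(readings):
    # Hypothetical key name; the real thrift field is not shown here.
    return get_value(readings, "low_critical")
```

Callers can compare the result with `NOT_AVAILABLE` before displaying it, so a missing sensor is never shown as a plausible reading.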

* Add get_presence() function and revised __get() function

Signed-off-by: Vadym Yashchenko <[email protected]>

* [BFN] Updated PSU platform APIs impl

Signed-off-by: Dmytro Lytvynenko <[email protected]>

* Added BFN PSU cache (#9)

Signed-off-by: Andriy Kokhan <[email protected]>

* [BFN]  Fans and Fantray platform APIs update (#7)

* [BFN] Updated SFP platform APIs (#10)

Signed-off-by: Volodymyr Boyko <[email protected]>

* [BFN] Updated platform API for thermal (#8)

* Signed-off-by: Vadym Yashchenko <[email protected]>

* Revert "[BFN]  Fans and Fantray platform APIs update (#7)" (#11)

This reverts commit c62a733.

* Add support health monitor system (#15)

Signed-off-by: Petro Bratash <[email protected]>

* Update chassis.py

* [BFN] Updated FANs and FAN Tray platform API (#14)

* Fix fix_alignment (#17)

Signed-off-by: Petro Bratash <[email protected]>

* [BFN] Improvement show environment (#16)

* Added PSU temperature skip into platform.json (#18)

Signed-off-by: Andriy Kokhan <[email protected]>

* Do not skip psud on Newport

Signed-off-by: Andriy Kokhan <[email protected]>

* [BFN] fix fan status from Not OK to Ok (#19)

* [BFN] Updated SFP platform plugin (#13)

Signed-off-by: Volodymyr Boyko <[email protected]>

* [DPB] Fix typo for Ethernet0 2x200G[100G,40G] breakout mode (#21)

Signed-off-by: Mykola Gerasymenko <[email protected]>

* [barefoot] Tmp fix vendor_rev (#22)

Signed-off-by: Volodymyr Boyko <[email protected]>

* Fixed python issues in sonic_platform/fan_drawer.py

Signed-off-by: Andriy Kokhan <[email protected]>

* Updated fan_drawer.py

* Fixing trailing white spaces in fan_drawer.py

* [BFN] Fix thrift for SFPs API

Signed-off-by: Volodymyr Boyko <[email protected]>

* In platform.json, replaced 'false' with '0' to workaround ast.literal_eval() issue

Signed-off-by: Andriy Kokhan <[email protected]>
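For context, a short illustration of why a JSON `false` can break `ast.literal_eval()`: the Python parser reads `false` as a bare name rather than a literal, so `literal_eval` raises `ValueError`, while `0` (or `json.loads`) parses fine. Which code path evaluates `platform.json` this way is not stated here, and this workaround is reverted further down.

```python
import ast
import json


def parse_with_literal_eval(text):
    """Parse text as a Python literal; return None if it is not one."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None


json_text = '{"controllable": false}'
workaround_text = '{"controllable": 0}'

print(parse_with_literal_eval(json_text))        # None: `false` is not a Python literal
print(parse_with_literal_eval(workaround_text))  # {'controllable': 0}
print(json.loads(json_text))                     # {'controllable': False}
```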

* [Newport] Thermal manager  (#23)

* Signed-off-by: Vadym Yashchenko <[email protected]>

* Revert "In platform.json, replaced 'false' with '0' to workaround ast.literal_eval() issue"

This reverts commit 1e73127.

* Removed 'controllable' options from platform.json to fix factory default config generation

Signed-off-by: Andriy Kokhan <[email protected]>

* Update thermal_manager.py

* Migrated SFP plugin to sonic_xcvr API (#30)

Signed-off-by: Andriy Kokhan <[email protected]>

Co-authored-by: KostiantynYarovyiBf <[email protected]>
Co-authored-by: Vadym Yashchenko <[email protected]>
Co-authored-by: Dmytro Lytvynenko <[email protected]>
Co-authored-by: Volodymyr Boiko <[email protected]>
Co-authored-by: Petro Bratash <[email protected]>
Co-authored-by: Mykola Gerasymenko <[email protected]>
Junchao-Mellanox pushed a commit that referenced this pull request Feb 8, 2022
[sonic-linkmgrd][master] submodule update

ef1f5eb Jing Zhang Feb 3 09:37:25 2022 [linkmgrd] linkmgrd subscribes MUX_CABLE_INFO table to handle peer OIR events (#17)
bcd74b4 Jing Zhang Feb 1 09:52:00 2022 Collect ICMP packet loss information (#14)

sign-off: Jing Zhang [email protected]
Junchao-Mellanox pushed a commit that referenced this pull request Mar 14, 2022
[sonic-linkmgrd][202012] submodule update

ef1f5eb Jing Zhang Feb 3 09:37:25 2022 [linkmgrd] linkmgrd subscribes MUX_CABLE_INFO table to handle peer OIR events (#17)
bcd74b4 Jing Zhang Feb 1 09:52:00 2022 Collect ICMP packet loss information (#14)

sign-off: Jing Zhang [email protected]
Junchao-Mellanox pushed a commit that referenced this pull request Mar 13, 2024
…tically (sonic-net#18017)

#### Why I did it
src/sonic-dash-api
```
* ec15bc7 - (HEAD -> master, origin/master, origin/HEAD) Revert "rename VnetMapping.action_type" (#17) (2 hours ago) [Ze Gan]
* ad0f59e - Add unspecified default value to all enums (2 days ago) [Lawrence Lee]
*   dd844b1 - Merge branch 'add-enum-default' of github.com:theasianpianist/sonic-dash-api into add-enum-default (4 days ago) [Lawrence Lee]
|\  
| * 4b31135 - Merge branch 'master' into add-enum-default (4 days ago) [Lawrence Lee]
* | 4b41ea7 - rename VnetMapping.action_type (4 days ago) [Lawrence Lee]
|/  
* b1ab99f - Add unspecified default value to all enums (4 days ago) [Lawrence Lee]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request Dec 11, 2024
* Update to Linux 6.1.94
* Integrate HW-MGMT 7.0040.1008 Changes (#17)
* Update DNX kernel module build
* Update kernel and saibcm-modules-dnx to versions on branch

Signed-off-by: Saikrishna Arcot <[email protected]>
Co-authored-by: Vivek <[email protected]>
Junchao-Mellanox pushed a commit that referenced this pull request Dec 11, 2024
…7250E platform (sonic-net#20367)

Update sonic-platform submodule for Nokia-IXR7250E:
Fixes Nokia-ION/ndk#57

cdfbbe2 [H4-32D]Update platform modules after OC tests (Update README.md #17)
f28eff0 [H4-64D]Fix SFP+ port, eeprom, reboot-cause, thermal algorithm, add PSU input voltage check (Fix rules in Makefiles #15)
178e15a Minor watchdog change for better retention of last kick stamp
c479392 Remove rogue platform_reboot file
331abe0 Enhance watchdog script to detect fsde device hung signature
4c6b7c1 Fixed update temperature issue
5002fb7 Remove average and maximum
c620130 No PSU Master status led in IMM. No need to set it

Signed-off-by: mlok <[email protected]>
Junchao-Mellanox pushed a commit that referenced this pull request Mar 7, 2025
…tically (sonic-net#678)

#### Why I did it
src/sonic-sairedis
```
* fcf2cd0 - (HEAD -> 202412, origin/HEAD, origin/202412) [hash] update ECMP/LAG hash VS lib with SAI_NATIVE_HASH_FIELD_IPV6_FLOW_LABEL (#17) (6 hours ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request Mar 7, 2025
…omatically (sonic-net#696)

#### Why I did it
src/sonic-swss-common
```
* b750cc1 - (HEAD -> 202412, origin/HEAD, origin/202412) [code sync] Merge code from sonic-net/sonic-swss-common:202411 to 202412 (#17) (21 hours ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request Mar 7, 2025
…tomatically (sonic-net#695)

#### Why I did it
src/sonic-linux-kernel
```
* b2ed221 - (HEAD -> 202412, origin/HEAD, origin/202412) [optoe] Reset page select byte to 0 before upper memory access on page 0h (sonic-net#464) (#17) (21 hours ago) [mssonicbld]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Junchao-Mellanox pushed a commit that referenced this pull request Jun 6, 2025
…UT so that we can get back to back Paladin ports up with Arista-7060X6-16PE-384C-O128S2 (sonic-net#1144)

<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 ** Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "fixes #xxxx", or
 "closes #xxxx" or "resolves #xxxx"

 Please provide the following information:
-->

#### Why I did it

Currently, when we load HWSKU `Arista-7060X6-16PE-384C-O128S2` on two Moby devices and connect their Paladin ports back to back, the links do not come up. Bringing these links up would let us run the tests on this setup.

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it

Created a new `FANOUT` HWSKU containing special lanemap and polarity configs, so that we can load `Arista-7060X6-16PE-384C-O128S2` on one Moby and `Arista-7060X6-16PE-384C-O128S2-FANOUT` on the other, and bring the Paladin ports up when connecting them back to back with the following setup:
```
Moby1                                    Moby2
HWSKU: Arista-7060X6-16PE-384C-O128S2    HWSKU: Arista-7060X6-16PE-384C-O128S2-FANOUT
#17 <-> #18
#19 <-> #20
#21 <-> #22
#23 <-> #24

#18 <-> #17
#20 <-> #19
#22 <-> #21
#24 <-> #23
```
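The cabling above is a pairwise swap of adjacent ports, so the table is its own inverse. A small sketch, assuming only the port numbers 17 to 24 listed in the table, generates the mapping and lets you check that property:

```python
def peer_port(port):
    """Return the Moby2 port cabled to the given Moby1 port (17-24 only)."""
    if not 17 <= port <= 24:
        raise ValueError("port outside the documented 17-24 range")
    # Odd ports connect to the next even port, even ports to the previous odd one.
    return port + 1 if port % 2 == 1 else port - 1


cabling = {port: peer_port(port) for port in range(17, 25)}
print(cabling)  # {17: 18, 18: 17, 19: 20, 20: 19, 21: 22, 22: 21, 23: 24, 24: 23}
```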

#### How to verify it
Verified that all the Paladin ports can link up with the above setup.

<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
- [ ] 202211
- [ ] 202305
- [x] msft-202412

#### Tested branch (Please provide the tested image version)

<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->

- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->
- [x] msft-202412

#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->
Created `Arista-7060X6-16PE-384C-O128S2-FANOUT` based on `Arista-7060X6-16PE-384C-O128S2`, updating only the lanemap and polarity settings in the BCM config.

<!--
 Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
Junchao-Mellanox pushed a commit that referenced this pull request Mar 6, 2026
…net#25643)

* [build] Add build timing report and dependency analysis tools

Add three scripts for build performance instrumentation:

- scripts/build-timing-report.sh: Parse per-package timing from build
  logs (HEADER/FOOTER timestamps), generate sorted duration table,
  phase breakdown, parallelism timeline, and CSV export.

- scripts/build-dep-graph.py: Parse rules/*.mk dependency graph,
  compute critical path, fan-out/fan-in bottleneck analysis, and
  generate DOT/JSON output for visualization.

- scripts/build-resource-monitor.sh: Sample CPU, memory, disk I/O,
  and Docker container count during builds for resource utilization
  analysis.
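As a rough illustration of the critical-path computation (not the actual `scripts/build-dep-graph.py` code, which is not shown here), the longest dependency chain in a package graph can be found with a memoized depth-first search; the `visiting` set provides the kind of cycle detection that review item #16 mentions. The graph shape, a dict mapping each package to the packages it depends on, is an assumption.

```python
def critical_path(deps):
    """Return the longest dependency chain, deepest dependency first.

    deps maps each package to the packages it depends on.
    Raises ValueError if the graph contains a cycle.
    """
    memo = {}         # package -> longest chain ending at that package
    visiting = set()  # packages on the current DFS stack

    def longest(pkg):
        if pkg in memo:
            return memo[pkg]
        if pkg in visiting:
            raise ValueError("dependency cycle through {}".format(pkg))
        visiting.add(pkg)
        best = []
        for dep in deps.get(pkg, ()):
            chain = longest(dep)
            if len(chain) > len(best):
                best = chain
        visiting.discard(pkg)
        memo[pkg] = best + [pkg]
        return memo[pkg]

    result = []
    for pkg in deps:
        chain = longest(pkg)
        if len(chain) > len(result):
            result = chain
    return result
```

For a graph where `utilities` depends on `libswsscommon`, which depends on `libnl`, this returns `["libnl", "libswsscommon", "utilities"]`; the "depth" quoted in the example output below would then be the length of that list.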

Add "make build-report" target to slave.mk that runs the timing
report and dependency analysis after a build completes.

Example output from a VS build on 24-core/30GB machine:
- 210 packages built in 53m wall time (173m CPU)
- Max concurrency: 5 (with SONIC_CONFIG_BUILD_JOBS=4)
- Critical path: 14 packages deep (libnl -> libswsscommon -> utilities)
- Top bottleneck: LIBSWSSCOMMON with 48 downstream dependents

Signed-off-by: Rustiqly <[email protected]>

* Address Copilot review: fix 17 bugs in build analysis scripts

- Use free -m with division instead of free -g to avoid rounding (#1)
- Add = and ?= to Makefile dependency regex patterns (#2, #7)
- CPU calculation now uses /proc/stat delta (two reads) (#3, #14)
- Fix misleading 'critical path estimate' comment (#4)
- Fix parallelism timeline comment (60s not 10s) (#5)
- Include after-relationship packages in fan stats (#6)
- Guard disk I/O division by zero when INTERVAL<=1 (#8)
- Remove unused elapsed_line variable (#9)
- Remove redundant LIBSWSSCOMMON_DBG check (#10)
- Remove active_make_jobs from CSV header comment (#11)
- Wire up _RDEPENDS parsing to build reverse deps (#12)
- Remove unnecessary 'if v' filter on rdeps JSON (#13)
- Remove unused REPORT_FORMAT parameter (#15)
- Add cycle detection to critical path algorithm (#16)
- Add execute permission check for companion scripts (#17)
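A sketch of the two-read `/proc/stat` CPU calculation mentioned in #3/#14, written in Python rather than shell for clarity. The field handling is an assumption: `iowait` is counted as idle, and the `guest`/`guest_nice` columns are skipped because the kernel already includes them in `user`/`nice`.

```python
import time


def parse_cpu_line(line):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # user nice system idle iowait irq softirq steal; guest columns are skipped
    # because the kernel already counts them in user/nice.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + values[4]  # idle + iowait (assumption: iowait counts as idle)
    return idle, sum(values)


def cpu_busy_percent(first_line, second_line):
    """CPU busy percentage between two /proc/stat samples."""
    idle1, total1 = parse_cpu_line(first_line)
    idle2, total2 = parse_cpu_line(second_line)
    total_delta = total2 - total1
    if total_delta <= 0:
        return 0.0  # no ticks elapsed; avoid dividing by zero
    return 100.0 * (1.0 - (idle2 - idle1) / total_delta)


def sample_cpu(interval=1.0, path="/proc/stat"):
    """Take two readings `interval` seconds apart (Linux only)."""
    def read_first_line():
        with open(path) as f:
            return f.readline()

    first = read_first_line()
    time.sleep(interval)
    return cpu_busy_percent(first, read_first_line())
```

A single read of `/proc/stat` only gives cumulative counters since boot; the delta between two reads is what reflects utilization during the sampling interval.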

Signed-off-by: Rustiqly <[email protected]>

---------

Signed-off-by: Rustiqly <[email protected]>
Co-authored-by: Rustiqly <[email protected]>
Junchao-Mellanox pushed a commit that referenced this pull request Mar 31, 2026
…dating udevd rules (sonic-net#26343)

- Why I did it
On SONiC SmartSwitch platforms with DPUs, systemd-udevd crashes with SIGABRT on every reboot when DPU firmware initialization is slow. During the initramfs boot phase, a standalone systemd-udevd daemon is started to handle device discovery. If DPU firmware takes longer than the 60-second udevadm settle timeout (BlueField-3 DPUs can take 120 seconds each in the failure case when they are stuck), the initramfs cannot stop this udevd before switch_root. The stale process survives into the real system but is never chrooted into the overlayfs root, leaving it with a broken filesystem view. When dpu-udev-manager.sh writes udev rules, the stale udevd detects the change and crashes on an assertion in systemd's chase() path resolution (assert(path_is_absolute(p)) at chase.c:648), because dir_fd_is_root() returns false for a process whose root still points to the initramfs rootfs rather than the overlayfs.

This triggers systemd issue systemd/systemd#29559, which the maintainers do not consider a bug on the systemd side. We are raising this fix for our use case.

```
Core was generated by `/usr/lib/systemd/systemd-udevd --daemon --resolve-names=never'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f29fe7a1cc2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f29fe78a4ac in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f29fea50c11 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#4  0x00007f29feb94a8b in chase () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#5  0x00007f29feb956e2 in chase_and_opendir () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#6  0x00007f29feb9a609 in conf_files_list_strv () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#7  0x00007f29fea913e8 in config_get_stats_by_path () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#8  0x0000559f295519cf in ?? ()
#9  0x0000559f29553a77 in ?? ()
#10 0x00007f29fec36055 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#11 0x00007f29fec3668d in sd_event_dispatch () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#12 0x00007f29fec394a8 in sd_event_run () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#13 0x00007f29fec396c7 in sd_event_loop () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#14 0x0000559f29545820 in ?? ()
#15 0x00007f29fe78bca8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x00007f29fe78bd65 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000559f29545c51 in ?? ()
```

- How I did it
Added a kill_stale_udevd() function to dpu-udev-manager.sh that runs before writing the udev rules. It identifies the systemd-managed udevd PID via systemctl show, then kills any other systemd-udevd --daemon process whose PID doesn't match; these are leftover initramfs instances. If no stale process exists (e.g. DPUs are healthy and the initramfs udevd exited cleanly), the function is a no-op.
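The selection logic can be sketched as below, in Python rather than the script's shell, purely for illustration. It assumes `systemctl show --property=MainPID --value` reports the managed daemon and that `/proc/<pid>/cmdline` is readable. The early return when no MainPID is reported is our own safeguard, and the signal the real script sends is not stated, so killing is left to the caller.

```python
import os
import subprocess


def systemd_udevd_main_pid():
    """PID of the systemd-managed udevd, or 0 if systemd reports none."""
    out = subprocess.run(
        ["systemctl", "show", "--property=MainPID", "--value",
         "systemd-udevd.service"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out) if out.isdigit() else 0


def list_processes(proc_root="/proc"):
    """Yield (pid, argv) for every readable process under /proc."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited or is not readable
        argv = [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]
        yield int(entry), argv


def select_stale_udevd(main_pid, processes):
    """Return PIDs of `systemd-udevd --daemon` processes other than main_pid."""
    if main_pid <= 0:
        return []  # our own safeguard: without a reference PID, kill nothing
    stale = []
    for pid, argv in processes:
        if not argv or not argv[0].endswith("systemd-udevd"):
            continue
        if "--daemon" in argv[1:] and pid != main_pid:
            stale.append(pid)
    return stale
```

For example, `select_stale_udevd(systemd_udevd_main_pid(), list_processes())` returns the PIDs that would be killed; an empty list corresponds to the no-op case.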

- How to verify it
  - Deploy the image on a SmartSwitch with DPUs in a state where firmware initialization times out (>60s per DPU) by stopping image installation before the firmware install step
  - Reboot the switch
  - Verify no new systemd-udevd coredumps in /var/core/
  - Verify the stale process was killed: `journalctl -b 0 | grep dpu-udev-manager` should show `killing stale initramfs udevd PID <X> (systemd udevd is PID <Y>)`
  - Verify systemd-udevd.service is healthy: `systemctl status systemd-udevd` should show `active (running)`
  - Verify DPU udev rules were written: `cat /etc/udev/rules.d/92-midplane-intf.rules` should contain the DPU interface naming rules

Signed-off-by: Hemanth Kumar Tirupati <[email protected]>
Junchao-Mellanox pushed a commit that referenced this pull request Apr 7, 2026
…dating udevd rules (sonic-net#26573)

<!--
 Please make sure you've read and understood our contributing guidelines:
 https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

 ** Make sure all your commits include a signature generated with `git commit -s` **

 If this is a bug fix, make sure your description includes "fixes #xxxx", or
 "closes #xxxx" or "resolves #xxxx"

 Please provide the following information:
-->

#### Why I did it
On SONiC SmartSwitch platforms with DPUs, systemd-udevd crashes with SIGABRT on every reboot when DPU firmware initialization is slow. During the initramfs boot phase, a standalone systemd-udevd daemon is started to handle device discovery. If DPU firmware takes longer than the 60-second udevadm settle timeout (BlueField-3 DPUs can take 120 seconds each in the failure case when they are stuck), the initramfs cannot stop this udevd before switch_root. The stale process survives into the real system but is never chrooted into the overlayfs root, leaving it with a broken filesystem view. When dpu-udev-manager.sh writes udev rules, the stale udevd detects the change and crashes on an assertion in systemd's chase() path resolution (assert(path_is_absolute(p)) at chase.c:648), because dir_fd_is_root() returns false for a process whose root still points to the initramfs rootfs rather than the overlayfs.

This triggers systemd issue systemd/systemd#29559, which the maintainers do not consider a bug on the systemd side. We are raising this fix for our use case.

```
Core was generated by `/usr/lib/systemd/systemd-udevd --daemon --resolve-names=never'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007f29fe7a1cc2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00007f29fe78a4ac in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x00007f29fea50c11 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#4 0x00007f29feb94a8b in chase () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#5 0x00007f29feb956e2 in chase_and_opendir () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#6 0x00007f29feb9a609 in conf_files_list_strv () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#7 0x00007f29fea913e8 in config_get_stats_by_path () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#8 0x0000559f295519cf in ?? ()
#9 0x0000559f29553a77 in ?? ()
#10 0x00007f29fec36055 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#11 0x00007f29fec3668d in sd_event_dispatch () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#12 0x00007f29fec394a8 in sd_event_run () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#13 0x00007f29fec396c7 in sd_event_loop () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#14 0x0000559f29545820 in ?? ()
#15 0x00007f29fe78bca8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x00007f29fe78bd65 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000559f29545c51 in ?? ()

```

#### How I did it
Added a kill_stale_udevd() function to dpu-udev-manager.sh that runs before writing the udev rules. It identifies the systemd-managed udevd PID via systemctl show, then kills any other systemd-udevd --daemon process whose PID doesn't match; these are leftover initramfs instances. If no stale process exists (e.g. DPUs are healthy and the initramfs udevd exited cleanly), the function is a no-op.

#### How to verify it

<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->
- Deploy the image on a SmartSwitch with DPUs in a state where firmware initialization times out (>60s per DPU) by stopping image installation before the firmware install step
- Reboot the switch
- Verify no new systemd-udevd coredumps in /var/core/
- Verify the stale process was killed: `journalctl -b 0 | grep dpu-udev-manager` should show `killing stale initramfs udevd PID <X> (systemd udevd is PID <Y>)`
- Verify systemd-udevd.service is healthy: `systemctl status systemd-udevd` should show `active (running)`
- Verify DPU udev rules were written: `cat /etc/udev/rules.d/92-midplane-intf.rules` should contain the DPU interface naming rules

Signed-off-by: Sonic Build Admin <[email protected]>
