Added support for Pensando-elba platform for trixie build #25518

Merged
yxieca merged 6 commits into sonic-net:master from SahilChaudhari:202511_bringup
Feb 26, 2026

Conversation

@SahilChaudhari
Contributor

@SahilChaudhari SahilChaudhari commented Feb 15, 2026

Why I did it

The latest master branch has moved to the trixie environment, which uses the 6.12 kernel. Modified dsc-drivers, Pensando scripts, makefiles, and plugins to support the 6.12 kernel.

Work item tracking
  • Microsoft ADO (number only):

How I did it

git clone https://github.com/sonic-net/sonic-buildimage.git
<path_to_sonic-buildimage>: make init
<path_to_sonic-buildimage>: make configure PLATFORM=pensando PLATFORM_ARCH=arm64
cd <path_to_sonic-buildimage>/platform/pensando/pensando-sonic-artifacts
<path_to_sonic-buildimage>/platform/pensando/pensando-sonic-artifacts: gh release download 1.87.0-SS-18-release
<path_to_sonic-buildimage>: make target/sonic-pensando.tar

How to verify it

Load the image on a Pensando DPU on Mtfuji DSS. All dockers should be up and running, and all interfaces should be up.

Which release branch to backport (provide reason below if selected)

  • 202305
  • 202311
  • 202405
  • 202411
  • 202505
  • 202511

Tested branch (Please provide the tested image version)

  • <20251102.23>

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@SahilChaudhari SahilChaudhari changed the title from "202511 bringup" to "Added support for Pensando-elba platform for trixie build" Feb 15, 2026
@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@SahilChaudhari SahilChaudhari marked this pull request as ready for review February 23, 2026 13:42
Comment on lines +55 to +64
# Mount /var/log as tmpfs if not already mounted (required for monit filesystem check)
if ! mountpoint -q /var/log; then
    mount -t tmpfs -o size=100M,mode=0755 tmpfs /var/log
    # Recreate essential log directories
    mkdir -p /var/log/journal
    mkdir -p /var/log/swss
    mkdir -p /var/log/sonic
fi
systemd-tmpfiles --create --prefix /var/log/journal
systemctl restart systemd-journald
Contributor

Why is this being added here? Shouldn't this be handled from initramfs?

Contributor Author

Fixed it with varlog_size cmdline variable in platform.conf

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@prsunny
Contributor

prsunny commented Feb 23, 2026

Related PR - sonic-net/sonic-linux-kernel#538

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@vvolam vvolam left a comment

LGTM, Please address Saikrishna's comment

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@prabhataravind
Contributor

Hi @saiarcot895, please review the updates from Sahil and sign off when you get a chance. Hi @vmittal-msft, please help merge this to 202511 once all checks pass.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for the Pensando-elba platform on Debian trixie with the 6.12 kernel. The changes primarily involve platform driver updates to support the new kernel version, hardware revision detection for different board variants (Mtfuji V1 and V2), and improvements to DPU initialization and health monitoring.

Changes:

  • Updated kernel drivers (ionic, mdev, pciesvc) for 6.12 kernel API compatibility
  • Added board revision detection and hardware-specific sensor mappings for Mtfuji V1 and V2 variants
  • Refactored DPU health monitoring and initialization scripts
  • Modified systemd service dependencies for Pensando platform

Reviewed changes

Copilot reviewed 34 out of 34 changed files in this pull request and generated 19 comments.

Show a summary per file
| File | Description |
| --- | --- |
| platform/pensando/sonic-platform-modules-dpu/sonic_platform/watchdog.py | Changed watchdog device path from watchdog1 to watchdog0 for trixie |
| platform/pensando/sonic-platform-modules-dpu/sonic_platform/thermal.py | Added board revision detection and separate sensor mappings for Mtfuji V1 and V2 |
| platform/pensando/sonic-platform-modules-dpu/sonic_platform/sensor.py | Added voltage/current sensor mappings per board revision |
| platform/pensando/sonic-platform-modules-dpu/sonic_platform/pcie.py | New PCIe utility implementation for device discovery |
| platform/pensando/sonic-platform-modules-dpu/sonic_platform/helper.py | Added get_board_rev() and get_slot_id() helper methods |
| platform/pensando/sonic-platform-modules-dpu/sonic_platform/component.py | Fixed firmware version parsing to handle new image naming |
| platform/pensando/sonic-platform-modules-dpu/sonic_platform/chassis.py | Updated sensor counts per board revision and added slot ID masking |
| platform/pensando/sonic-platform-modules-dpu/setup.py | Pinned grpcio-tools version to <=1.66.2 |
| platform/pensando/sonic-platform-modules-dpu/dpu/utils/fetch_dpu_status | Refactored health monitoring with data plane state management |
| platform/pensando/sonic-platform-modules-dpu/dpu/utils/dpu_pensando_util.py | Added function to disable unused containers |
| platform/pensando/sonic-platform-modules-dpu/dpu/utils/dpu_db_util.py | Removed duplicate code and simplified helper functions |
| files/dsc/dpu.init | Added SSH key regeneration and robust DHCP handling with timeout logic |
| platform/pensando/platform.conf | Added varlog_size boot parameter and SSH host key cleanup |
| platform/pensando/dsc-drivers/src/drivers/linux/pciesvc/* | Implemented unity build approach for kexec compatibility on 6.12 kernel |
| platform/pensando/dsc-drivers/src/drivers/linux/mdev/mdev_drv.c | Updated mdev_remove signature and class_create call for 6.12 API |
| platform/pensando/dsc-drivers/src/drivers/linux/eth/ionic/* | Updated ethtool and tracing APIs for 6.12 kernel |
| platform/pensando/dsc-drivers/debian/* | Updated module installation paths for trixie build |
| platform/pensando/docker-syncd-pensando/* | Changed base image from bullseye to bookworm |
| files/build_templates/sonic_debian_extension.j2 | Added Pensando-specific systemd service masking and dependencies |
| device/pensando/arm64-elba-asic-flash128-r0/plugins/ssd_util.py | Added SSD utility using eMMC backend |
| build_debian.sh | Updated device tree paths for new kernel naming convention |

self._api_helper = APIHelper()
self.index = thermal_index + 1
self.board_id = g_board_id
self.board_rev = g_board_rev
Copilot AI Feb 25, 2026

Missing global declaration for g_board_rev. Line 72 accesses the global variable g_board_rev, but it's not declared as global in the init method (only g_board_id is declared on line 67 in the old code). Since g_board_rev is set as a module-level global in _thermals_available(), it needs to be declared as global here to be accessed correctly. Add 'global g_board_rev' after the existing global g_board_id declaration.
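
For context, Python requires a global declaration only in a function that assigns to a module-level name; a plain read resolves through the module scope without one. The following is a minimal sketch of that rule, not the PR's code: it reuses the names g_board_rev and _thermals_available from the comment above, while the revision value and the stripped-down Thermal class are hypothetical.

```python
# Module-level global, unset until the availability check runs.
g_board_rev = None


def _thermals_available():
    # Assignment rebinds the module-level name, so `global` is required here.
    global g_board_rev
    g_board_rev = 2  # hypothetical revision value, for illustration only
    return True


class Thermal:
    def __init__(self):
        # Read-only access: Python looks the name up in module scope,
        # so no `global` statement is needed in this method.
        self.board_rev = g_board_rev


_thermals_available()
thermal = Thermal()
print(thermal.board_rev)
```

Adding global g_board_rev to the init method is harmless and may still be preferred for consistency with the existing g_board_id declaration.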

Copilot uses AI. Check for mistakes.
Comment on lines +111 to +115
#try:
# cmd = f'sonic-cfggen -a "{{\\"INTERFACE\\": {{\\"Ethernet0\\": {{}},\\"Ethernet0|18.{slot_id}.202.1/31\\": {{}}}}}}" --write-to-db'
# run_cmd(cmd)
#except Exception as e:
# log_err("failed to set Ethernet0 ip due to {}".format(e))
Copilot AI Feb 25, 2026

Commented-out code should be removed rather than left in the codebase. If this code might be needed in the future, it should be documented in version control history or added as a TODO comment explaining why it's disabled. Leaving commented code reduces maintainability and creates confusion about whether this functionality is intentional or incomplete.

Suggested change
- #try:
- # cmd = f'sonic-cfggen -a "{{\\"INTERFACE\\": {{\\"Ethernet0\\": {{}},\\"Ethernet0|18.{slot_id}.202.1/31\\": {{}}}}}}" --write-to-db'
- # run_cmd(cmd)
- #except Exception as e:
- # log_err("failed to set Ethernet0 ip due to {}".format(e))
+ # TODO: Add Ethernet0 IP configuration here if required for this platform.

Comment on lines +89 to +90
if self.board_rev == self._api_helper.mtfuji_rev_v2:
    self.voltage_sensor_mapping = VOLTAGE_SENSOR_MAPPING_V2
Copilot AI Feb 25, 2026

Use elif instead of a separate if statement. Line 89 checks a condition on board_rev that is mutually exclusive with the check in the surrounding conditional structure, yet the change uses a plain if. For consistency with that pattern and to make the mutual exclusivity clear, consider using elif on line 89.

Comment on lines +185 to +186
except:
    return -1
Copilot AI Feb 25, 2026

Bare except clause catches all exceptions without specifying the exception type. This can mask programming errors and make debugging difficult. Consider catching specific exceptions (e.g., ValueError, IOError) or at minimum use 'except Exception:' to allow KeyboardInterrupt and SystemExit to propagate.
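
A minimal sketch of the narrower handler this comment asks for. The helper parse_reading and its inputs are hypothetical and stand in for whatever the flagged code actually parses; only the exception-handling pattern is the point.

```python
def parse_reading(raw):
    """Hypothetical helper: convert a raw sensor string to an int, or -1 on bad input."""
    try:
        return int(raw)
    except (ValueError, TypeError):
        # Only the failures expected from int() are handled here;
        # anything else surfaces to the caller instead of being hidden.
        return -1


print(parse_reading("42"))
print(parse_reading("n/a"))

# KeyboardInterrupt and SystemExit derive from BaseException, not Exception,
# so `except Exception:` would let them propagate as well.
print(issubclass(KeyboardInterrupt, Exception))
```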

Comment on lines +80 to +83
if self.board_rev == self._api_helper.mtfuji_rev_v1:
    self.sensor_mapping = self.SENSOR_MAPPING_MTFUJI_V1
if self.board_rev == self._api_helper.mtfuji_rev_v2:
    self.sensor_mapping = self.SENSOR_MAPPING_MTFUJI_V2
Copilot AI Feb 25, 2026

Use elif instead of two separate if statements. Lines 80 and 82 both check conditions on board_rev, but they are mutually exclusive. Using elif makes this intention clearer and slightly more efficient by avoiding the second condition check when the first is true.
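
The suggested if/elif shape could look roughly like the sketch below. The revision constants, the mapping contents, and the empty-dict fallback for an unknown revision are all assumptions made for illustration; the PR's real values live in the platform helper and thermal module.

```python
# Hypothetical revision identifiers and placeholder mappings.
MTFUJI_REV_V1 = 1
MTFUJI_REV_V2 = 2
SENSOR_MAPPING_MTFUJI_V1 = {"temp1": "Board temperature"}
SENSOR_MAPPING_MTFUJI_V2 = {"temp1": "Board temperature", "temp2": "Die temperature"}


def select_sensor_mapping(board_rev):
    # elif documents that the branches are mutually exclusive and skips
    # the second comparison once the first one matches.
    if board_rev == MTFUJI_REV_V1:
        return SENSOR_MAPPING_MTFUJI_V1
    elif board_rev == MTFUJI_REV_V2:
        return SENSOR_MAPPING_MTFUJI_V2
    # Assumed fallback for an unrecognized revision.
    return {}


print(select_sensor_mapping(MTFUJI_REV_V2))
```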

Comment on lines +160 to +161
except:
    return "N/A"
Copilot AI Feb 25, 2026

Bare except clause catches all exceptions without specifying the exception type. This can mask programming errors and make debugging difficult. Consider catching specific exceptions (e.g., ValueError, IOError) or at minimum use 'except Exception:' to allow KeyboardInterrupt and SystemExit to propagate.

Comment on lines 104 to +118
  def config_setup():
      try:
          from sonic_platform.chassis import Chassis
          slot_id = Chassis().get_my_slot()
      except Exception as e:
          log_err("failed to get slot id due to {}".format(e))

-     try:
-         cmd = f'sonic-cfggen -a "{{\\"INTERFACE\\": {{\\"Ethernet0\\": {{}},\\"Ethernet0|18.{slot_id}.202.1/31\\": {{}}}}}}" --write-to-db'
-         run_cmd(cmd)
-     except Exception as e:
-         log_err("failed to set Ethernet0 ip due to {}".format(e))
+     #try:
+     # cmd = f'sonic-cfggen -a "{{\\"INTERFACE\\": {{\\"Ethernet0\\": {{}},\\"Ethernet0|18.{slot_id}.202.1/31\\": {{}}}}}}" --write-to-db'
+     # run_cmd(cmd)
+     #except Exception as e:
+     # log_err("failed to set Ethernet0 ip due to {}".format(e))

      setup_platform_components_json(slot_id)
+     disable_unused_containers()
Copilot AI Feb 25, 2026

Variable slot_id may be undefined if an exception is raised on line 107. If Chassis().get_my_slot() raises an exception, slot_id is never assigned, but it is still used on line 117 (in the setup_platform_components_json call). Initialize slot_id before the try block (e.g., slot_id = -1) or handle the case where it is undefined.
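
One way to apply the first option is sketched below. The function resolve_slot_id and its read_slot callable are hypothetical stand-ins for Chassis().get_my_slot(), and print stands in for log_err; the sentinel -1 is the value the comment itself proposes.

```python
def resolve_slot_id(read_slot):
    # Bind the name before the try block so it exists even when the read fails.
    slot_id = -1
    try:
        slot_id = read_slot()
    except Exception as e:
        print("failed to get slot id due to {}".format(e))
    return slot_id


def failing_reader():
    # Simulates the chassis lookup raising during early boot.
    raise RuntimeError("chassis not ready")


print(resolve_slot_id(lambda: 3))
print(resolve_slot_id(failing_reader))
```

Callers would still need to decide whether -1 is acceptable input for later steps or whether setup should stop early.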

Comment on lines +187 to +188
if self.board_rev == self._api_helper.mtfuji_rev_v2:
    self.current_sensor_mapping = CURRENT_SENSOR_MAPPING_V2
Copilot AI Feb 25, 2026

Use elif instead of two separate if statements. Line 187 checks a condition on board_rev that is mutually exclusive with line 185. Using elif makes the mutual exclusivity clear and avoids the unnecessary second condition check.

mdev:
@echo "===> Building MDEV driver "
$(MAKE) -C $(KSRC) V=1 M=$(KMOD_OUT_DIR) src=$(KMOD_SRC_DIR)/mdev $(KOPT)
#$(MAKE) -C $(KSRC) V=1 M=$(KMOD_SRC_DIR)/mdev KBUILD_OUTPUT=$(KMOD_OUT_DIR) $(KOPT)
Copilot AI Feb 25, 2026

Commented-out code should be removed rather than left in the Makefile. The commented line on 102 appears to be an old build command with KBUILD_OUTPUT that has been replaced. If this might be needed in the future, document it in version control history or add a comment explaining why it's disabled.

Suggested change
- #$(MAKE) -C $(KSRC) V=1 M=$(KMOD_SRC_DIR)/mdev KBUILD_OUTPUT=$(KMOD_OUT_DIR) $(KOPT)

Comment on lines +34 to +51
function generate_ssh_host_keys()
{
    # Generate SSH host keys
    log_msg "Removing existing SSH host keys"
    rm -rfd /etc/ssh/ssh_host*
    sleep 1
    ssh-keygen -A
    systemctl restart ssh.service
    log_msg "SSH host keys generated successfully"
}

function start_polaris()
{
    # Run only if ssh.service is NOT active (inactive/failed/not-found)
    if ! systemctl is-active --quiet ssh.service; then
        log_msg "ssh.service is not active; regenerating host keys and restarting ssh"
        generate_ssh_host_keys
    fi
Copilot AI Feb 25, 2026

The combination of generate_ssh_host_keys() and its use in start_polaris() will delete /etc/ssh/ssh_host* and regenerate SSH host keys on every boot, since dpu.init starts in runlevel S before ssh.service is active. This makes the device’s SSH host identity effectively ephemeral, preventing clients from reliably detecting man-in-the-middle attacks because host keys appear to change on each reboot. Limit host key regeneration to first install or the case where no host keys exist (e.g., checking for missing key files or a one-time flag) so SSH host keys remain stable across normal reboots.
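
The "no host keys exist" check could be sketched as follows. The init script itself is shell; Python is used here only to illustrate the decision, and host_keys_missing is a hypothetical helper, not part of the PR.

```python
import glob
import os


def host_keys_missing(ssh_dir="/etc/ssh"):
    """Return True when no ssh_host_*_key private key files exist in ssh_dir."""
    pattern = os.path.join(ssh_dir, "ssh_host_*_key")
    return len(glob.glob(pattern)) == 0


# Regeneration would be gated on this result rather than on whether
# ssh.service happens to be active at this point in boot.
print(host_keys_missing())
```

Note that ssh-keygen -A on its own only creates host keys for key types that do not already have one, so it is the preceding removal of /etc/ssh/ssh_host* that makes the keys change.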

Contributor

@yxieca yxieca left a comment

Reviewed: Pensando trixie/kernel 6.12 support changes; looks good.

@yxieca yxieca merged commit b991da7 into sonic-net:master Feb 26, 2026
29 of 30 checks passed
@mssonicbld
Collaborator

Cherry-pick PR to 202511: #25685

FengPan-Frank pushed a commit to FengPan-Frank/sonic-buildimage that referenced this pull request Mar 6, 2026
…25518)

What is the motivation for this PR
Latest master moved to trixie (6.12 kernel). Update Pensando drivers/scripts/makefiles/plugins to support 6.12.

How did you do it
Updated dsc-drivers, Pensando scripts, makefiles and plugins for 6.12; built using Pensando artifacts (1.87.0-SS-18-release).

How did you verify/test it
Loaded image on Pensando DPU on Mtfuji DSS; all dockers up and interfaces up.

Signed-off-by: Sahil Chaudhari <[email protected]>
Signed-off-by: Feng Pan <[email protected]>
dprital pushed a commit that referenced this pull request Mar 19, 2026
What is the motivation for this PR
Latest master moved to trixie (6.12 kernel). Update Pensando drivers/scripts/makefiles/plugins to support 6.12.

How did you do it
Updated dsc-drivers, Pensando scripts, makefiles and plugins for 6.12; built using Pensando artifacts (1.87.0-SS-18-release).

How did you verify/test it
Loaded image on Pensando DPU on Mtfuji DSS; all dockers up and interfaces up.

Signed-off-by: Sahil Chaudhari <[email protected]>
Signed-off-by: dprital <[email protected]>
9 participants