[chassis][nokia][202405.17]: Chassisd crashes when updating moduledb #21131

@liamkearney-msft

Description

chassisd crashes on the supervisor of a Nokia 7250, which prevents syncd and swss from starting. The backtrace in syslog is as follows:

2024 Dec 10 01:57:04.137945 svcstr-7250-sup-1 INFO pmon#supervisord 2024-12-10 01:57:04,137 INFO success: chassisd entered RUNNING state, process has stayed up for > than 10 seconds (startsecs)
2024 Dec 10 01:57:09.544727 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd Traceback (most recent call last):
2024 Dec 10 01:57:09.544727 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd   File "/usr/local/bin/chassisd", line 710, in <module>
2024 Dec 10 01:57:09.545539 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd     main()
2024 Dec 10 01:57:09.545753 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd   File "/usr/local/bin/chassisd", line 705, in main
2024 Dec 10 01:57:09.545938 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd     chassisd.run()
2024 Dec 10 01:57:09.545949 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd   File "/usr/local/bin/chassisd", line 683, in run
2024 Dec 10 01:57:09.546348 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd     self.module_updater.module_db_update()
2024 Dec 10 01:57:09.546558 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd   File "/usr/local/bin/chassisd", line 279, in module_db_update
2024 Dec 10 01:57:09.546731 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd     if self.my_slot == int(module_info_dict['slot']):
2024 Dec 10 01:57:09.546891 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024 Dec 10 01:57:09.546902 svcstr-7250-sup-1 INFO pmon#supervisord: chassisd ValueError: invalid literal for int() with base 10: 'A'

This is caused by PR sonic-net/sonic-platform-daemons#560, combined with the fact that Nokia uses letters (e.g. A, B, C) rather than numbers for its slot names, so the int() conversion in module_db_update raises ValueError.
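The failure mode can be reproduced outside chassisd. The sketch below is only an illustration, not the actual fix: `slot_matches` is a hypothetical helper, and comparing both values as strings is an assumption about how the check could tolerate non-numeric slot names. It also assumes the local slot and the module's slot are reported in the same textual form.

```python
def slot_matches(my_slot, module_slot):
    """Hypothetical helper: compare slots without assuming they are numeric.

    Both values are normalized to stripped strings, so 1 matches "1"
    and "A" matches "A". This is an assumption, not chassisd's real logic.
    """
    return str(my_slot).strip() == str(module_slot).strip()


# Reproduce the error from the traceback: int() cannot parse a letter slot.
module_info_dict = {"slot": "A"}  # slot name as reported on the Nokia chassis
try:
    int(module_info_dict["slot"])
except ValueError as exc:
    print(f"ValueError: {exc}")

# The string-based comparison handles both numeric and letter slot names.
print(slot_matches(1, "1"))
print(slot_matches("A", module_info_dict["slot"]))
print(slot_matches(1, module_info_dict["slot"]))
```

One trade-off of a string comparison is that differently formatted numeric values (for example `1` and `"01"`) would no longer compare equal, so a real fix would need to confirm how each platform reports its slots.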

Steps to reproduce the issue:

  1. Boot a Nokia 7250 running 202405.17
  2. Observe that syncd and swss do not start

Describe the results you received:

No syncd containers running

Describe the results you expected:

syncd starts as expected

Output of show version:

20240510.17

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):
