[swss] Orchagent terminated by SIGHUP because logrotate sent SIGHUP on boot after 202405->202411 warm upgrade #21962

@volodymyrsamotiy

Description
After upgrading from the 202405 image to 202411, orchagent was terminated by SIGHUP while booting into the new image.
We are not sure whether this is related to the warm-reboot or upgrade flow. It is probably a generic issue, but we hit it during a warm upgrade.
Please note that this has happened only once so far.

Syslog shows a shell error in the logrotate script:

2025 Mar  2 03:40:02.291520 sonic INFO logrotate[8512]: logrotate_script: 10: [: -gt: unexpected operator

After that, logrotate sent SIGHUP to orchagent:

2025 Mar  2 03:40:02.294484 sonic INFO logrotate: Sending SIGHUP to OA log_file_name: /var/log/swss/swss.rec

As a result, orchagent exited:

2025 Mar  2 03:40:02.393109 sonic INFO swss#supervisord 2025-03-02 03:40:02,381 WARN exited: orchagent (terminated by SIGHUP; not expected)

It looks like the error occurred in the script defined for the postrotate action in the /etc/logrotate.d/rsyslog configuration file:
https://github.com/sonic-net/sonic-buildimage/blob/master/files/image_config/logrotate/rsyslog.j2#L118

    postrotate
        if [ $(echo $1 | grep -c "/var/log/swss/") -gt 0 ]; then
            # for multi asic platforms, there are multiple orchagents
            # send the SIGHUP only to the orchagent the which needs log file rotation
            PLATFORM=`sonic-cfggen -H -v DEVICE_METADATA.localhost.platform`
            ASIC_CONF=/usr/share/sonic/device/$PLATFORM/asic.conf
            if [ -f "$ASIC_CONF" ]; then
                . $ASIC_CONF
            fi
            if [ $NUM_ASIC -gt 1 ]; then
                log_file=$1
                log_file_name=${log_file#/var/log/swss/}
                logger -p syslog.info -t "logrotate" "Sending SIGHUP to OA log_file_name: $log_file_name"
                pgrep -xa orchagent | grep $log_file_name | awk '{ print $1; }' | xargs /bin/kill -HUP 2>/dev/null || true
            else
                logger -p syslog.info -t "logrotate" "Sending SIGHUP to OA log_file_name: $1"
                pgrep -x orchagent | xargs /bin/kill -HUP 2>/dev/null || true
            fi
        else
            if [ -f /var/run/rsyslogd.pid ]; then
                /bin/kill -HUP $(cat /var/run/rsyslogd.pid)
            fi
        fi
    endscript
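The `[: -gt: unexpected operator` message is consistent with `$NUM_ASIC` expanding to an empty string in the unquoted `[ $NUM_ASIC -gt 1 ]` test, for example if `asic.conf` was not found or not sourced (this is an assumption, the logs do not confirm it). A minimal sketch of that behavior follows. `pick_branch` is a hypothetical helper that isolates only the test, not part of the actual script. Note that the `else` branch, which signals every orchagent process, is taken both when `NUM_ASIC` is 1 and when the test itself fails.

```shell
#!/bin/sh
# Hypothetical helper: reproduces only the NUM_ASIC test from the
# postrotate script. It prints which branch would be taken.
pick_branch() {
    NUM_ASIC=$1
    # Unquoted on purpose, as in the original script. With an empty
    # NUM_ASIC this becomes `[ -gt 1 ]`, which the shell rejects with
    # an error on stderr and a non-zero status.
    if [ $NUM_ASIC -gt 1 ]; then
        echo "multi-asic"
    else
        echo "single-asic"
    fi
}

pick_branch 2     # multi-asic branch
pick_branch 1     # single-asic (else) branch
pick_branch ""    # test errors out, control still falls into the else branch
```

Because the failing test only changes which branch runs, the SIGHUP would be sent in the single-ASIC case even without the shell error.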

Steps to reproduce the issue:
There are no specific steps to reproduce; the issue has happened only once so far.
It looks like a generic, intermittent issue related to logrotate.
We observed it during a warm upgrade from 202405 to 202411.

Describe the results you received:
Orchagent was terminated by SIGHUP because logrotate sent SIGHUP:

2025 Mar  2 03:40:02.291520 sonic INFO logrotate[8512]: logrotate_script: 10: [: -gt: unexpected operator
2025 Mar  2 03:40:02.294484 sonic INFO logrotate: Sending SIGHUP to OA log_file_name: /var/log/swss/swss.rec
2025 Mar  2 03:40:02.389481 sonic INFO systemd[1]: logrotate.service: Deactivated successfully.
2025 Mar  2 03:40:02.389594 sonic INFO systemd[1]: Finished logrotate.service - Rotate log files.
2025 Mar  2 03:40:02.393109 sonic INFO swss#supervisord 2025-03-02 03:40:02,381 WARN exited: orchagent (terminated by SIGHUP; not expected)
2025 Mar  2 03:40:02.398651 sonic INFO swss#supervisor-proc-exit-listener: Process 'orchagent' exited unexpectedly. Terminating supervisor 'swss'

Describe the results you expected:
Logrotate should not send SIGHUP to orchagent
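
One possible hardening, sketched here under assumptions and not as the project's fix: validate `NUM_ASIC` before comparing it, so that a missing or non-numeric value is reported as unknown instead of silently falling through to the branch that signals every orchagent. `asic_mode` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical guard: classify NUM_ASIC without ever passing an empty
# or non-numeric value to the `[ ... -gt ... ]` test.
asic_mode() {
    case "$1" in
        # Empty string, or anything containing a non-digit character.
        ''|*[!0-9]*) echo "unknown" ;;
        *)
            if [ "$1" -gt 1 ]; then
                echo "multi"
            else
                echo "single"
            fi
            ;;
    esac
}

asic_mode ""     # unknown
asic_mode 1      # single
asic_mode 4      # multi
```

A caller could log a warning and skip signaling when the result is `unknown`. Whether skipping is acceptable, or whether orchagent should instead tolerate an early SIGHUP, is a decision for the maintainers.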
