[hostcfgd]: File size check failed: /etc/pam.d/login is empty, file corrupted #19748

@nazariig

Description

The bug is caused by a race condition in hostcfgd: the daemon has no graceful shutdown flow, and because its filesystem modification sequence is not atomic, an interrupted update can leave configuration files empty.
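One way to picture a graceful shutdown flow is to hold SIGTERM while a file update is in progress and deliver it once the update has finished. The following is a minimal sketch under that assumption, not hostcfgd code: `DeferredSigterm` is a hypothetical name, it must be used from the main thread, and it cannot help against SIGKILL or power loss.

```python
import signal

class DeferredSigterm:
    """Context manager (hypothetical): hold SIGTERM until the block finishes."""

    def __init__(self):
        self.pending = False
        self._previous = None

    def _remember(self, signum, frame):
        # Only record the signal; do not interrupt the protected block.
        self.pending = True

    def __enter__(self):
        self._previous = signal.signal(signal.SIGTERM, self._remember)
        return self

    def __exit__(self, exc_type, exc, tb):
        # signal.signal() returns None for handlers not installed from Python;
        # fall back to the default action in that case.
        previous = self._previous if self._previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGTERM, previous)
        if self.pending:
            # Re-deliver the signal to whatever handler was active before.
            signal.raise_signal(signal.SIGTERM)
        return False

# Usage (assumed call site):
#     with DeferredSigterm():
#         cfg.modify_single_file(...)
```

This only narrows the window for SIGTERM handled by the Python process itself; making the file update atomic is still needed for the other interruption cases.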

To simplify, the flow is:

  1. The config file is moved to a backup copy (.old)
  2. SIGTERM is received and execution is interrupted
  3. The original config file is now missing from the filesystem
  4. On the next startup, the original config file is re-created as an empty file
  5. The empty config file is moved to the backup copy
  6. Both files are empty
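The sequence above can be replayed deterministically, without sending any signal. This is a minimal sketch that assumes a pure-Python stand-in for the `sed` and `mv` calls: `steps`, `run` and `sizes` are hypothetical names, and the stand-in mirrors `sed` writing nothing to the `.new` file when its input is missing.

```python
import os
import tempfile

def steps(path):
    """Stand-in for modify_single_file, split into its three filesystem steps."""
    def write_new():
        # Like `sed ... path > path.new`: a missing input yields an empty .new.
        try:
            with open(path) as src:
                data = src.read()
        except FileNotFoundError:
            data = ""
        with open(path + ".new", "w") as dst:
            dst.write(data)

    def backup():
        # Like `mv -f path path.old`: does nothing useful if path is missing.
        if os.path.exists(path):
            os.replace(path, path + ".old")

    def install():
        # Like `mv -f path.new path`.
        os.replace(path + ".new", path)

    return [write_new, backup, install]

def run(path, stop_after=3):
    """Run the first stop_after steps; stop_after=2 models SIGTERM after the backup."""
    for step in steps(path)[:stop_after]:
        step()

def sizes(path):
    return os.path.getsize(path), os.path.getsize(path + ".old")

with tempfile.TemporaryDirectory() as tmp:
    login = os.path.join(tmp, "login")
    with open(login, "w") as f:
        f.write("@include common-auth\n")

    run(login, stop_after=2)   # steps 1-3: interrupted, original is missing
    assert not os.path.exists(login)
    run(login)                 # step 4: re-created as an empty file
    run(login)                 # step 5: the empty file becomes the backup
    result = sizes(login)
    print(result)              # step 6: prints (0, 0)
```

Note that the good backup survives the first restart and is only lost on the following call, when the empty file is moved over it.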

LOG:

Apr  4 05:15:42.112943 sonic INFO hostcfgd: file size check pass: /etc/pam.d/sshd size is (2139) bytes
Apr  4 05:15:42.123196 sonic ERR hostcfgd: file size check failed: /etc/pam.d/login is empty, file corrupted
Apr  4 05:15:42.150275 sonic INFO hostcfgd: file size check pass: /etc/nsswitch.conf size is (494) bytes
Apr  4 05:15:42.170420 sonic INFO hostcfgd: file size check pass: /etc/nsswitch.conf size is (494) bytes

DUMP:

sonic_dump_20240404_051657/etc/pam.d

root@sonic: pam.d$ ls -la | grep "login\|sshd"
-rw-r--r--  1 root root    0 Apr  4 05:18 login
-rw-r--r--  1 root root    0 Apr  4 05:18 login.old
-rw-r--r--  1 root root 2139 Apr  4 05:18 sshd
-rw-r--r--  1 root root 2139 Apr  4 05:18 sshd.old

root@sonic: pam.d$ cat login
root@sonic: pam.d$ cat login.old

https://github.com/sonic-net/sonic-host-services/blob/master/scripts/hostcfgd#L726

# Modify common-auth include file in /etc/pam.d/login, sshd.
# /etc/pam.d/sudo is not handled, because it would change the existing
# behavior. It can be modified once a config knob is added for sudo.
if os.path.isfile(PAM_AUTH_CONF):
    self.modify_single_file(ETC_PAMD_SSHD,  [ "/^@include/s/common-auth$/common-auth-sonic/" ])
    self.modify_single_file(ETC_PAMD_LOGIN, [ "/^@include/s/common-auth$/common-auth-sonic/" ])
else:
    self.modify_single_file(ETC_PAMD_SSHD,  [ "/^@include/s/common-auth-sonic$/common-auth/" ])
    self.modify_single_file(ETC_PAMD_LOGIN, [ "/^@include/s/common-auth-sonic$/common-auth/" ])

https://github.com/sonic-net/sonic-host-services/blob/master/scripts/hostcfgd#L609

def modify_single_file(self, filename, operations=None):
    if operations:
        e_list = ['-e'] * len(operations)
        e_operations = [item for sublist in zip(e_list, operations) for item in sublist]
        with open(filename+'.new', 'w') as f:
            subprocess.call(["sed"] + e_operations + [filename], stdout=f)
        subprocess.call(["mv", '-f', filename, filename+'.old'])
        subprocess.call(['mv', '-f', filename+'.new', filename])

    self.check_file_not_empty(filename)
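For comparison, here is a minimal sketch of an update that never removes the original file. It assumes the `sed` expressions are replaced by a Python callable, and `atomic_rewrite` is a hypothetical name, not part of hostcfgd. The backup is made by copying rather than moving, and the new content is installed with `os.replace`, which is atomic on POSIX when both paths are on the same filesystem.

```python
import os
import shutil
import tempfile

def atomic_rewrite(filename, transform):
    """Rewrite filename with transform(text) -> text; the file never goes missing."""
    with open(filename) as src:
        new_text = transform(src.read())
    if not new_text:
        raise ValueError("refusing to install empty content for " + filename)

    # Back up by copying, so the original stays in place the whole time.
    shutil.copy2(filename, filename + ".old")

    # Write the new content to a temporary file in the same directory.
    directory = os.path.dirname(os.path.abspath(filename))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_text)
            tmp.flush()
            os.fsync(tmp.fileno())
        # mkstemp creates the file with mode 0600; keep the original mode bits.
        shutil.copymode(filename, tmp_path)
        # Readers see either the old file or the new one, never an empty one.
        os.replace(tmp_path, filename)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

# Usage (assumed call site):
#     atomic_rewrite("/etc/pam.d/login",
#                    lambda text: text.replace("common-auth-sonic", "common-auth"))
```

An interruption at any point leaves the original file either untouched or fully replaced. The backup copy itself is not written atomically in this sketch, so an interrupted run can still leave a partial `.old` file.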

https://github.com/sonic-net/sonic-host-services/blob/master/scripts/hostcfgd#L596

def check_file_not_empty(self, filename):
    exists = os.path.exists(filename)
    if not exists:
        syslog.syslog(syslog.LOG_ERR, "file size check failed: {} is missing".format(filename))
        return

    size = os.path.getsize(filename)
    if size == 0:
        syslog.syslog(syslog.LOG_ERR, "file size check failed: {} is empty, file corrupted".format(filename))
        return

    syslog.syslog(syslog.LOG_INFO, "file size check pass: {} size is ({}) bytes".format(filename, size))
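The check above only logs the problem. A possible follow-up, sketched here under the assumption that the `.old` copy is still intact, is to restore the file from its backup before the next modification. `restore_if_empty` is a hypothetical helper, not part of hostcfgd, and it cannot help once both files are empty.

```python
import os
import shutil

def restore_if_empty(filename):
    """Return True if filename was missing or empty and was restored from .old."""
    backup = filename + ".old"
    if os.path.exists(filename) and os.path.getsize(filename) > 0:
        return False  # file looks healthy, nothing to do
    if not os.path.exists(backup) or os.path.getsize(backup) == 0:
        return False  # no usable backup to restore from
    # Copy to a temporary name first, then rename into place atomically.
    tmp = filename + ".restore"
    shutil.copy2(backup, tmp)
    os.replace(tmp, filename)
    return True
```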

The mitigation attempt: sonic-net/sonic-host-services#36

Steps to reproduce the issue:

  1. Copy the hostcfgd module under an importable name
root@sonic:/home/admin# cp -fv /usr/local/bin/hostcfgd ./hostcfg.py
  2. Run the script (test.py)
#!/usr/bin/env python

from hostcfg import AaaCfg
from hostcfg import ETC_PAMD_SSHD
from hostcfg import ETC_PAMD_LOGIN

cfg = AaaCfg()

cond = True

while True:
    if cond:
        cfg.modify_single_file(ETC_PAMD_SSHD,  [ "/^@include/s/common-auth-sonic$/common-auth/" ])
        cfg.modify_single_file(ETC_PAMD_LOGIN, [ "/^@include/s/common-auth-sonic$/common-auth/" ])
        cond = False
    else:
        cfg.modify_single_file(ETC_PAMD_SSHD,  [ "/^@include/s/common-auth$/common-auth-sonic/" ])
        cfg.modify_single_file(ETC_PAMD_LOGIN, [ "/^@include/s/common-auth$/common-auth-sonic/" ])
        cond = True
  3. Press Ctrl+C
2024 Jul 30 14:57:19.285135 sonic ERR test.py: file size check failed: /etc/pam.d/sshd is empty, file corrupted
2024 Jul 30 14:57:19.289114 sonic ERR test.py: file size check failed: /etc/pam.d/login is empty, file corrupted
2024 Jul 30 14:57:19.292988 sonic ERR test.py: file size check failed: /etc/pam.d/sshd is empty, file corrupted
2024 Jul 30 14:57:19.297005 sonic ERR test.py: file size check failed: /etc/pam.d/login is empty, file corrupted
2024 Jul 30 14:57:19.301031 sonic ERR test.py: file size check failed: /etc/pam.d/sshd is empty, file corrupted

Describe the results you received:

Apr  4 05:15:42.112943 sonic INFO hostcfgd: file size check pass: /etc/pam.d/sshd size is (2139) bytes
Apr  4 05:15:42.123196 sonic ERR hostcfgd: file size check failed: /etc/pam.d/login is empty, file corrupted
Apr  4 05:15:42.150275 sonic INFO hostcfgd: file size check pass: /etc/nsswitch.conf size is (494) bytes
Apr  4 05:15:42.170420 sonic INFO hostcfgd: file size check pass: /etc/nsswitch.conf size is (494) bytes

Describe the results you expected:

No errors are expected

Output of show version:

  • N/A

Output of show techsupport:

  • N/A

Additional information you deem important (e.g. issue happens only occasionally):

  • N/A
