Description
I noticed that a new service, "monit", was enabled recently. It writes error logs when a critical process is not running, which causes regression tests to fail with log analyzer errors. Is there a way to disable monit while running regression tests?
The monit service causes test cases to fail at random. The issue has been observed in a number of test cases, including fdb and sfp_presence.
Error log:
Jan 14 22:19:45.874238 arc-switch1029 ERR monit[580]: 'telemetry' process is not running
Jan 14 22:19:45.894969 arc-switch1029 ERR monit[580]: 'dialout_client' process is not running
Jan 14 22:19:45.915904 arc-switch1029 ERR monit[580]: 'syncd' process is not running
We also observe errors such as "Failed to start Telemetry":
LogAnalyzerError: {'match_messages': {'/tmp/pytest-run/syslog.2020-01-16-00:22:50': ['Jan 16 00:20:20.544788 r-anaconda-10 INFO telemetry#supervisord: start.sh dialout: ERROR (abnormal termination)\n', 'Jan 16 00:20:50.928955 r-anaconda-10 ERR systemd[1]: Failed to start Telemetry container.\n']}, 'total': {'expected_match': 0, 'expected_missing_match': 0, 'match': 2}, 'match_files': {'/tmp/pytest-run/syslog.2020-01-16-00:22:50': {'expected_match': 0, 'match': 2}}, 'expect_messages': {'/tmp/pytest-run/syslog.2020-01-16-00:22:50': []}, 'unused_expected_regexp': []}
SONiC Version:
Distribution: Debian 9.11
Kernel: 4.9.0-9-2-amd64
Build commit: 7ec2732
Build date: Fri Jan 10 23:38:05 UTC 2020
Built by: johnar@jenkins-worker-8
Platform: x86_64-mlnx_msn3700-r0
HwSKU: ACS-MSN3700
ASIC: mellanox
Serial Number: MT1851X02961
Uptime: 02:45:05 up 5:20, 1 user, load average: 3.23, 3.28, 3.32
Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-syncd-mlnx HEAD.10-7ec27323 f77730f1b0af 377MB
docker-syncd-mlnx latest f77730f1b0af 377MB
docker-sonic-mgmt-framework HEAD.10-7ec27323 f3378872a1bf 330MB
docker-sonic-mgmt-framework latest f3378872a1bf 330MB
docker-platform-monitor HEAD.10-7ec27323 a56aed93f1f5 569MB
docker-platform-monitor latest a56aed93f1f5 569MB
docker-fpm-frr HEAD.10-7ec27323 9d4e274b54de 325MB
docker-fpm-frr latest 9d4e274b54de 325MB
docker-sflow HEAD.10-7ec27323 aea21d5bbbf2 305MB
docker-sflow latest aea21d5bbbf2 305MB
docker-lldp-sv2 HEAD.10-7ec27323 c9fe6c39327f 303MB
docker-lldp-sv2 latest c9fe6c39327f 303MB
docker-dhcp-relay HEAD.10-7ec27323 c8fcc953c6d7 290MB
docker-dhcp-relay latest c8fcc953c6d7 290MB
docker-database HEAD.10-7ec27323 7a58a614a58d 282MB
docker-database latest 7a58a614a58d 282MB
docker-teamd HEAD.10-7ec27323 0fa032d07578 305MB
docker-teamd latest 0fa032d07578 305MB
docker-snmp-sv2 HEAD.10-7ec27323 b1989927a046 339MB
docker-snmp-sv2 latest b1989927a046 339MB
docker-orchagent HEAD.10-7ec27323 e537766a194b 323MB
docker-orchagent latest e537766a194b 323MB
docker-sonic-telemetry HEAD.10-7ec27323 58c2d998f8bb 343MB
docker-sonic-telemetry latest 58c2d998f8bb 343MB
docker-router-advertiser HEAD.10-7ec27323 e1d0d9fc6b59 282MB
docker-router-advertiser latest e1d0d9fc6b59 282MB
Steps to reproduce the issue:
N/A
Describe the results you received:
Many regression test cases failed because the log analyzer reported errors.
Describe the results you expected:
It should be possible to disable the monit service during certain test cases.
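If disabling monit is not practical, one possible workaround would be to filter its messages out before the log analysis step. Below is a minimal sketch in Python, assuming the monit lines always have the form quoted in the error log above. The names `MONIT_NOT_RUNNING` and `filter_monit_noise` are hypothetical and are not part of the existing log analyzer.

```python
import re

# Hypothetical pattern: matches monit lines of the form quoted in this report,
# e.g. "ERR monit[580]: 'syncd' process is not running".
MONIT_NOT_RUNNING = re.compile(r"ERR monit\[\d+\]: '[^']+' process is not running")


def filter_monit_noise(lines):
    """Return only the lines that are not monit 'process is not running' errors."""
    return [line for line in lines if not MONIT_NOT_RUNNING.search(line)]


# Sample lines copied from the logs in this report.
sample = [
    "Jan 14 22:19:45.874238 arc-switch1029 ERR monit[580]: 'telemetry' process is not running\n",
    "Jan 14 22:19:45.915904 arc-switch1029 ERR monit[580]: 'syncd' process is not running\n",
    "Jan 16 00:20:50.928955 r-anaconda-10 ERR systemd[1]: Failed to start Telemetry container.\n",
]
remaining = filter_monit_noise(sample)
```

This would only hide the monit messages. The systemd "Failed to start Telemetry container" error is kept on purpose, since it may point to a real problem.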
Additional information you deem important (e.g. issue happens only occasionally):
N/A