[pytest] Ignore specific syslog error messages when testing autorestart#2430
Conversation
…feature and do the postcheck to make sure all the critical processes are alive after testing. Signed-off-by: Yong Zhao <[email protected]>
Signed-off-by: Yong Zhao <[email protected]>
| # Sleep 20 seconds such that containers have a chance to start. | ||
| time.sleep(20) | ||
| processes_status = duthost.all_critical_process_status() | ||
| for container_name, processes in processes_status.items(): | ||
| if processes["status"] is False or len(processes["exited_critical_process"]) > 0: | ||
| pytest.fail("Critical process(es) was not running after container '{}' was restarted.".format(container_name)) |
There was a problem hiding this comment.
Please convert this block into pytest_assert(wait_until(...), failure message)
There was a problem hiding this comment.
Great suggestion! Fixed.
| ignoreRegex = [ | ||
| ".*ERR monit.*", | ||
| ".*ERR swss#orchagent.*removeLag.*", | ||
| ".*ERR pmon#xcvrd.*initializeGlobalConfig.*", | ||
| ".*ERR pmon#thermalctld.*Caught exception while initializing thermal manager.*", | ||
| ".*ERR syncd#syncd.*driverEgressMemoryUpdate.*", | ||
| ".*ERR syncd#syncd.*brcm_sai*", | ||
| ".*ERR teamd#teamsyncd.*readData.*netlink reports an error=-33 on reading a netlink socket.*", | ||
| ".*WARNING syncd#syncd.*saiDiscover: skipping since it cause crash.*", | ||
| ".*ERR systemd.*Failed to start .* container*", | ||
| ".*ERR kernel.*PortChannel.*", | ||
| ".*ERR route_check.*", | ||
| ] |
There was a problem hiding this comment.
Please split this regex array to one array for each service.
There was a problem hiding this comment.
Great Suggestion! Fixed.
There was a problem hiding this comment.
The idea of splitting the regex into groups was so that we can ignore only a subset of messages for each container being tested. You split them into groups, but you then you simply add all the groups in unconditionally, which results in the same behavior as before.
There was a problem hiding this comment.
@jleveque The idea is to split the ignore regex into subgroups and each subgroup represents a container. When a container was tested for example syncd, this function will be called such that the messages in subgroup of syncd will be ignored, right?
There was a problem hiding this comment.
Joe, Thanks for clarification!
pytest_assert(...) to wait containers are all restarted before doing the postcheck. Signed-off-by: Yong Zhao <[email protected]>
| loganalyzer.ignore_regex.extend(teamd_ignoreRegex) | ||
| loganalyzer.ignore_regex.extend(systemd_ignoreRegex) | ||
| loganalyzer.ignore_regex.extend(kernel_ignoreRegex) | ||
| loganalyzer.ignore_regex.extend(other_ignoreRegex) |
There was a problem hiding this comment.
@yozhao101 this is not exactly what I had in mind when asked for splitting. But I understand that you have to put all regex up because you are looping through all services and run test against them. you currently cannot really apply individual regex for individual service.
I'll approve this PR and let it go in. Please open a separate PR to parameterize the test so that the test will run for each service individually, that way you can apply ignore regex per service.
There was a problem hiding this comment.
Sorry, I misunderstood what you ware meaning in our meeting. I will create a separate PR to improve the ignore regex.
…rt (#2430) When pytest script was ran to test the autorestart feature, there will be error messages in the syslog which will cause the logAnalyzer to fail at the end of testing. In order to fix this issue, I examine all the error messages in the syslog during testing the autorestart feature and add them into whitelist. At the same time, the reason why we should ignore them are put in the comment of `ignore_expected_loganalyzer_exception(...)` function. After adding these error messages into whitelist, we need do the post check to see whether all the critical processes are alive after testing the autorestart feature. Signed-off-by: Yong Zhao <[email protected]>
Update sonic-swss submodule pointer to include the following: [BFD]Clean up state_db BFD entries on swss restart (sonic-net#2434) Fix the Fec Mode Setting of gbsyncd (sonic-net#2430) [neighsyncd] Enabling ipv4 link local entries for non-dualtor (sonic-net#2427) tlm_teamd: Filter portchannel subinterface events from STATE_DB LAG_TABLE (sonic-net#2408) PFCWD recovery changes using DLR_INIT (sonic-net#2316) Dynamic port configuration - add port buffer cfg to the port ref counter (sonic-net#2194) Signed-off-by: dprital <[email protected]>
Signed-off-by: Yong Zhao [email protected]
Description of PR
Summary:
When pytest script was ran to test the autorestart feature, there will be error messages in the syslog which will cause
the logAnalyzer to fail at the end of testing. In order to fix this issue, I examine all the error messages in the syslog
during testing the autorestart feature and add them into whitelist. At the same time, the reason why we should ignore
them are put in the comment of
ignore_expected_loganalyzer_exception(...)function.After adding these error messages into whitelist, we need do the post check to see whether all the critical processes
are alive after testing the autorestart feature.
Fixes # (issue)
Type of change
Approach
What is the motivation for this PR?
How did you do it?
I added two more functions in pytest script. One function is used to add the error messages which can be ignored into
whitelist. Another is used to do the post check to see whether all the critical processes are alive.
How did you verify/test it?
I tested this pytest script against a physical testbed:
str-dx010-acs-1.Since currentlyacmsprocess in ACMS containerwas not running by default until the certificate are deployed, pytest script will fail when testing the ACMS container.
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation