Skip to content

[pytest] Ignore specific syslog error messages when testing autorestart#2430

Merged
lguohan merged 3 commits intosonic-net:masterfrom
yozhao101:fix_autorestart_loganalyzer
Oct 31, 2020
Merged

[pytest] Ignore specific syslog error messages when testing autorestart#2430
lguohan merged 3 commits intosonic-net:masterfrom
yozhao101:fix_autorestart_loganalyzer

Conversation

@yozhao101
Copy link
Copy Markdown
Contributor

@yozhao101 yozhao101 commented Oct 29, 2020

Signed-off-by: Yong Zhao [email protected]

Description of PR

Summary:
When pytest script was ran to test the autorestart feature, there will be error messages in the syslog which will cause
the logAnalyzer to fail at the end of testing. In order to fix this issue, I examine all the error messages in the syslog
during testing the autorestart feature and add them into whitelist. At the same time, the reason why we should ignore
them are put in the comment of ignore_expected_loganalyzer_exception(...) function.

After adding these error messages into whitelist, we need do the post check to see whether all the critical processes
are alive after testing the autorestart feature.

Fixes # (issue)

Type of change

  • [x ] Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

What is the motivation for this PR?

How did you do it?

I added two more functions in pytest script. One function is used to add the error messages which can be ignored into
whitelist. Another is used to do the post check to see whether all the critical processes are alive.

How did you verify/test it?

I tested this pytest script against a physical testbed: str-dx010-acs-1. Since currently acms process in ACMS container
was not running by default until the certificate are deployed, pytest script will fail when testing the ACMS container.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

…feature and do the postcheck to make

sure all the critical processes are alive after testing.

Signed-off-by: Yong Zhao <[email protected]>
@yozhao101 yozhao101 changed the title [pytest] Ignore some log error messages when testing the autorestart … [pytest] Ignore specific syslog error messages when testing autorestart Oct 29, 2020
Comment on lines +273 to +278
# Sleep 20 seconds such that containers have a chance to start.
time.sleep(20)
processes_status = duthost.all_critical_process_status()
for container_name, processes in processes_status.items():
if processes["status"] is False or len(processes["exited_critical_process"]) > 0:
pytest.fail("Critical process(es) was not running after container '{}' was restarted.".format(container_name))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please convert this block into pytest_assert(wait_until(...), failure message)

Copy link
Copy Markdown
Contributor Author

@yozhao101 yozhao101 Oct 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion! Fixed.

Comment on lines +56 to +68
ignoreRegex = [
".*ERR monit.*",
".*ERR swss#orchagent.*removeLag.*",
".*ERR pmon#xcvrd.*initializeGlobalConfig.*",
".*ERR pmon#thermalctld.*Caught exception while initializing thermal manager.*",
".*ERR syncd#syncd.*driverEgressMemoryUpdate.*",
".*ERR syncd#syncd.*brcm_sai*",
".*ERR teamd#teamsyncd.*readData.*netlink reports an error=-33 on reading a netlink socket.*",
".*WARNING syncd#syncd.*saiDiscover: skipping since it cause crash.*",
".*ERR systemd.*Failed to start .* container*",
".*ERR kernel.*PortChannel.*",
".*ERR route_check.*",
]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please split this regex array to one array for each service.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great Suggestion! Fixed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of splitting the regex into groups was so that we can ignore only a subset of messages for each container being tested. You split them into groups, but you then you simply add all the groups in unconditionally, which results in the same behavior as before.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jleveque The idea is to split the ignore regex into subgroups and each subgroup represents a container. When a container was tested for example syncd, this function will be called such that the messages in subgroup of syncd will be ignored, right?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Joe, Thanks for clarification!

pytest_assert(...) to wait containers are all restarted before doing the postcheck.

Signed-off-by: Yong Zhao <[email protected]>
loganalyzer.ignore_regex.extend(teamd_ignoreRegex)
loganalyzer.ignore_regex.extend(systemd_ignoreRegex)
loganalyzer.ignore_regex.extend(kernel_ignoreRegex)
loganalyzer.ignore_regex.extend(other_ignoreRegex)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yozhao101 this is not exactly what I had in mind when asked for splitting. But I understand that you have to put all regex up because you are looping through all services and run test against them. you currently cannot really apply individual regex for individual service.

I'll approve this PR and let it go in. Please open a separate PR to parameterize the test so that the test will run for each service individually, that way you can apply ignore regex per service.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I misunderstood what you ware meaning in our meeting. I will create a separate PR to improve the ignore regex.

@lguohan lguohan merged this pull request into sonic-net:master Oct 31, 2020
yxieca pushed a commit that referenced this pull request Oct 31, 2020
…rt (#2430)

When pytest script was ran to test the autorestart feature, there will be error messages in the syslog which will cause
the logAnalyzer to fail at the end of testing. In order to fix this issue, I examine all the error messages in the syslog 
during testing the autorestart feature and add them into whitelist. At the same time, the reason why we should ignore 
them are put in the comment of `ignore_expected_loganalyzer_exception(...)` function. 

After adding these error messages into whitelist, we need do the post check to see whether all the critical processes
are alive after testing the autorestart feature.

Signed-off-by: Yong Zhao <[email protected]>
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
Update sonic-swss submodule pointer to include the following:

[BFD]Clean up state_db BFD entries on swss restart (sonic-net#2434)
Fix the Fec Mode Setting of gbsyncd (sonic-net#2430)
[neighsyncd] Enabling ipv4 link local entries for non-dualtor (sonic-net#2427)
tlm_teamd: Filter portchannel subinterface events from STATE_DB LAG_TABLE (sonic-net#2408)
PFCWD recovery changes using DLR_INIT (sonic-net#2316)
Dynamic port configuration - add port buffer cfg to the port ref counter (sonic-net#2194)

Signed-off-by: dprital <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants