[pfcwd] Fix pfcwd async fake storm #2337
Merged
yxieca merged 2 commits into sonic-net:master on Oct 19, 2020
Make indentation all aligned with four-spaces. Signed-off-by: Longxiang Lyu <[email protected]>
yxieca approved these changes on Oct 16, 2020
neethajohn reviewed on Oct 16, 2020
1. Pass `storm_start_defer` and `storm_stop_defer` directly to the thread `target` function instead of retrieving them from the test object's `pfc_wd` dictionary. Retrieving them from the dictionary can cause a race condition if `setup_port_params` for a later port cleans up `pfc_wd` while the thread tries to fetch `storm_stop_defer` from it.
2. Introduce the thread class `InterruptableThread`, which makes the main thread aware of exceptions when it joins the thread. Use it for thread creation in `test_pfcwd_warm_reboot.py` so the test is aware of any anomalies in storm threads.
3. Introduce `join_all` to join a list of threads with a timeout. Use it to wait for all storm threads to finish and alert if any thread doesn't exit as expected.
4. There is a chance that the async storm thread makes Ansible calls while the DUT SSH service is dead or Redis isn't started. Introduce a `threading.Event` object, `DUTACTIVE`, to make the storm thread wait until the DUT is up and active after warm reboot.

Signed-off-by: Longxiang Lyu <[email protected]>
neethajohn approved these changes on Oct 19, 2020
Description of PR
Summary:
Fixes # (issue)
Fix `pfcwd/test_pfcwd_warm_reboot.py`.
Type of change
Approach
What is the motivation for this PR?
On some platforms and topologies (the more ports, the easier it is to reproduce), `pfcwd/test_pfcwd_warm_reboot.py` might fail when the test selects many storming ports, for example on `Arista7260`. When the test starts multiple fake storm threads, a race condition can occur: one thread tries to fetch `storm_stop_defer` from the test object's `pfc_wd` dictionary after a later port has modified that dictionary in its call to `setup_port_params`. The thread then raises a `KeyError`, exits, and leaves the port in `STORM DETECTED` status. The test fails because the port is already stormed when the test tries to storm it again after warm reboot, so no new `syslog` entry is generated.
How did you do it?
1. Pass `storm_start_defer` and `storm_stop_defer` directly to the thread `target` function instead of retrieving them from the test object's `pfc_wd` dictionary. Retrieving them from the dictionary can cause a race condition if `setup_port_params` for a later port cleans up `pfc_wd` while the thread tries to fetch `storm_stop_defer` from it.
2. Introduce the thread class `InterruptableThread`, which makes the main thread aware of exceptions when it joins the thread. Use it for thread creation in `test_pfcwd_warm_reboot.py` so the test is aware of any anomalies in storm threads.
3. Introduce `join_all` to join a list of threads with a timeout. Use it to wait for all storm threads to finish and alert if any thread doesn't exit as expected.
4. There is a chance that the async storm thread makes Ansible calls while the DUT SSH service is dead or Redis isn't started. Introduce a `threading.Event` object, `DUTACTIVE`, to make the storm thread wait until the DUT is up and active after warm reboot.
5. Align all indentation to four spaces.
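For readers without the diff at hand, the changes listed above can be illustrated with a minimal Python 3 sketch. This is not the code from the PR: the names `InterruptableThread`, `join_all` and `DUTACTIVE` come from the description, but their bodies, the `storm_worker` function, the `log` list, the port names, and the choice of a single shared timeout budget in `join_all` are all assumptions made for illustration.

```python
import threading
import time


class InterruptableThread(threading.Thread):
    """Illustrative only: remember an exception raised by the target and
    re-raise it in whichever thread calls join()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.error = None

    def run(self):
        try:
            super().run()
        except Exception as exc:
            # Keep the failure so the joining thread can see it.
            self.error = exc

    def join(self, timeout=None):
        super().join(timeout)
        if self.error is not None:
            raise self.error


def join_all(threads, timeout):
    """Join all threads within one shared time budget (an assumption) and
    raise if any thread is still alive afterwards."""
    deadline = time.monotonic() + timeout
    for thread in threads:
        thread.join(max(0.0, deadline - time.monotonic()))
    stuck = [t.name for t in threads if t.is_alive()]
    if stuck:
        raise RuntimeError("threads did not exit in time: %s" % stuck)


# Set by the main test flow once the DUT is reachable again after warm reboot.
DUTACTIVE = threading.Event()


def storm_worker(port, storm_start_defer, storm_stop_defer, log):
    """Hypothetical storm thread body. The defer values arrive as arguments,
    so the thread never reads a shared dictionary that another port's setup
    may have reset in the meantime."""
    time.sleep(storm_start_defer)
    DUTACTIVE.wait()  # do not touch the DUT until it is up
    log.append(("start", port))
    time.sleep(storm_stop_defer)
    DUTACTIVE.wait()
    log.append(("stop", port))


if __name__ == "__main__":
    events = []
    workers = [
        InterruptableThread(target=storm_worker, args=(port, 0.1, 0.1, events))
        for port in ("Ethernet0", "Ethernet4")  # hypothetical port names
    ]
    for worker in workers:
        worker.start()
    DUTACTIVE.set()  # pretend the DUT came back up
    join_all(workers, timeout=5)
    print(sorted(events))
```

Because each worker receives its own copy of the defer values at creation time, a later port's setup cannot invalidate them, and any exception raised inside a worker surfaces in the test when `join_all` joins it instead of being lost with the thread.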
How did you verify/test it?
Tested on `Arista7260` with `t0-116`.
Any platform specific information?
Supported testbed topology if it's a new test case?
Documentation