Disable Cisco VOQ WD for dualtor QOS, improve PG test debuggability#20502

Merged
kevinskwang merged 2 commits into sonic-net:master from rbpittman:disable_voq_wd_tunnel_tests_master
Sep 8, 2025

@rbpittman
Contributor

@rbpittman rbpittman commented Sep 3, 2025

Description of PR

Summary:

  • Disable the VOQ watchdog (WD) during test_tunnel_qos_remap.py.
  • Factor out the common parts of the VOQ WD disabling from QOS SAI to avoid major code duplication.
  • Speed up the test on Cisco by reducing the unnecessary 8-second per-loop wait to 1 second. (The test also passed with 0.5 seconds; because this is a SAI bypass operation, the updated stat should be near-instant.)
  • Improve debuggability of the PG tunnel decap test by logging all failures and summarizing them in a report at the end.
  • Fix up test_voq_watchdog.py to use the new common code. Rename the parametrization to avoid shadowing.
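
The "log all failures, then summarize" idea from the list above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `run_pg_checks`, `summarize`, the `cases` pairs, and the `read_counter` callable are all hypothetical names, and the 1-second default simply mirrors the wait time mentioned in the summary.

```python
import logging
import time

logger = logging.getLogger(__name__)


def run_pg_checks(cases, read_counter, wait_seconds=1.0, sleep=time.sleep):
    """Check every case, log each failure, and return all failure messages.

    `cases` is an iterable of (name, expected_min) pairs and `read_counter`
    is a callable returning the current counter for a case name. Both are
    hypothetical stand-ins for the real test plumbing.
    """
    failures = []
    for name, expected_min in cases:
        sleep(wait_seconds)  # short settle time before reading the stat
        actual = read_counter(name)
        if actual < expected_min:
            msg = f"{name}: expected >= {expected_min}, got {actual}"
            logger.error(msg)     # log immediately, but keep going
            failures.append(msg)  # remember it for the final report
    return failures


def summarize(failures, total):
    """Build a report with a header line and one line per failure."""
    header = f"{len(failures)} of {total} checks failed"
    return "\n".join([header] + [f"  - {f}" for f in failures])
```

A test would then finish with a single `assert not failures, summarize(failures, len(cases))`, so one run reports every failing case rather than stopping at the first one.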

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • New Test case
    • Skipped for non-supported platforms
  • Test case improvement

Back port request

  • 202205
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505 (will require manual PR, conflicts with 202505 due to a "noqa" format change)

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Validated:

  • test_tunnel_qos_remap.py all tests pass on master branch for Cisco-8000 dualtor-AA.
  • test_qos_sai.py no regressions for Cisco-8000 dualtor-AA.
  • Fix and validate test_voq_watchdog.py

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@kevinskwang
Contributor

@rbpittman do I understand correctly that the VOQ watchdog impacts the watermark counter function? If that is the case, how does test_qos_sai pass?

@rbpittman
Contributor Author

No, it doesn't impact the watermark function.
test_qos_sai already disables the VOQ watchdog, so this PR adds the same disable to the tunnel tests.
The VOQ watchdog activates when queues are blocked for a long time, so it needs to be disabled in long-running TX_DISABLE tests. In this case, with the watchdog left enabled, the PG watermark would stop rising because the VOQ watchdog flushes the queues.
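
As a rough sketch of the disable-then-restore pattern described here (not the actual sonic-mgmt fixture), a context manager could look like this. `FakeDut`, `set_voq_watchdog`, and `voq_watchdog_disabled` are hypothetical names made up for the example; the real tests talk to the device through their own helpers.

```python
from contextlib import contextmanager


class FakeDut:
    """Hypothetical stand-in for a device handle; not a sonic-mgmt class."""

    def __init__(self):
        self.voq_watchdog_enabled = True
        self.history = []  # records every state change, for inspection

    def set_voq_watchdog(self, enabled):
        self.voq_watchdog_enabled = enabled
        self.history.append(enabled)


@contextmanager
def voq_watchdog_disabled(dut):
    """Turn the watchdog off for the duration of the block, then restore it."""
    previous = dut.voq_watchdog_enabled
    dut.set_voq_watchdog(False)
    try:
        yield dut
    finally:
        # Restore even if the test body raises, so later tests
        # see the original watchdog state.
        dut.set_voq_watchdog(previous)


if __name__ == "__main__":
    dut = FakeDut()
    with voq_watchdog_disabled(dut):
        # A long-running TX_DISABLE style test body would go here.
        print("watchdog enabled inside block:", dut.voq_watchdog_enabled)
    print("watchdog enabled after block:", dut.voq_watchdog_enabled)
```

Wrapping the same logic in a shared pytest fixture is one way the QOS SAI and tunnel tests could avoid duplicating it.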

@kevinskwang kevinskwang merged commit d61736c into sonic-net:master Sep 8, 2025
16 checks passed
@mssonicbld
Collaborator

@rbpittman PR conflicts with 202411 branch

@kevinskwang
Contributor

@rbpittman could you create a PR to fix the conflicts on 202411?

@rbpittman rbpittman deleted the disable_voq_wd_tunnel_tests_master branch September 8, 2025 20:20
@rbpittman
Contributor Author

Hi @kevinskwang
There are some significant conflicts. Most notably, #17937, which disables VOQ WD for QOS SAI, is missing from 202411. That PR should at least be merged first.

@kevinskwang
Contributor

> Hi @kevinskwang There's some significant conflicts, most notably missing #17937 from 202411, which disables VOQ WD for QOS SAI. That PR should at least be merged first.

@rbpittman #17937 was cherry-picked to 202411 today.

@rbpittman
Contributor Author

Hi @kevinskwang
cc @zhixzhu @rraghav-cisco

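There's quite a bit more code missing from 202411 that this diff depends on. The question is which features we actually want in the 202411 branch.

We can either:

  1. Try to merge more of the missing code, for example:
     * [T2:snappi_tests:Add a fixture to disable voq watchdog, use it in pfc scripts. #18564](https://github.com/sonic-net/sonic-mgmt/pull/18564)
     * [New test case for OQ watchdog #18937](https://github.com/sonic-net/sonic-mgmt/pull/18937)
     * [Consolidate voq watchdog testcases to single testcase #19076](https://github.com/sonic-net/sonic-mgmt/pull/19076)

     This is a lot of delta, even including new test cases. Plus this list is likely not complete.
  2. I can write a custom version of this PR for 202411; however, this will cause all the above PRs to also require custom patches. It depends on whether these PRs are needed in 202411.

Both options are difficult. What is our prioritization for 202411 maintenance? If we care more about just fixing it for pass-rates, I can write a custom patch so test_tunnel can pass, but this would assume we don't want the other PRs.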

@zhixzhu
Contributor

zhixzhu commented Sep 9, 2025

> Hi @kevinskwang cc @zhixzhu @rraghav-cisco
>
> There's quite a bit more missing code here to support this diff, the question is what features we actually want in the 202411 branch.
>
> We can either:
>
> 1. Try to merge more of the missing code, for example
>    * [T2:snappi_tests:Add a fixture to disable voq watchdog, use it in pfc scripts. #18564](https://github.com/sonic-net/sonic-mgmt/pull/18564)
>    * [New test case for OQ watchdog #18937](https://github.com/sonic-net/sonic-mgmt/pull/18937)
>    * [Consolidate voq watchdog testcases to single testcase #19076](https://github.com/sonic-net/sonic-mgmt/pull/19076)
>
>    This is a lot of delta, even including new test cases. Plus this list is likely not complete.
> 2. I can write a custom version of this PR for 202411, however this will cause all the above PRs to also require custom patches. Depends whether these PRs are needed in 202411.
>
> Both options are difficult, what is our prioritization for 202411 maintenance? If we care more about just fixing it for pass-rates I can write a custom patch so test_tunnel can pass, but this would assume we don't want the other PRs.

Yes, these PRs are required in the 202411 branch. T0/T1 also enable the VOQ watchdog and OQ watchdog.

@rbpittman
Contributor Author

Then this PR should be considered blocked until at minimum those 3 PRs are merged, in the correct order, and with any missing dependency gaps filled in as that's not necessarily a complete list.

@kevinskwang
Contributor

@zhixzhu those 3 PRs all have conflicts. Could you fix the conflicts for 202411 first?

@zhixzhu
Contributor

zhixzhu commented Sep 11, 2025

> @zhixzhu those 3 PRs all have conflicts. Could you fix the conflicts for 202411 first?

Double commit of New test case for OQ watchdog #18937 to 202411: #20610

Once that one is merged, I will look at Consolidate voq watchdog testcases to single testcase #19076, which depends on it.

@zhixzhu
Contributor

zhixzhu commented Sep 15, 2025

xixuej pushed a commit to xixuej/sonic-mgmt that referenced this pull request Sep 17, 2025
…onic-net#20502)

* Disable Cisco VOQ watchdog for dualtor tunnel tests. Improve debuggability and speed of the tunnel PG map test.

* Fix VOQ watchdog test.
@zhixzhu
Contributor

zhixzhu commented Sep 17, 2025

@kevinskwang #20660 is waiting for your review and merge.

@kevinskwang
Contributor

@zhixzhu Merged

@rbpittman
Contributor Author

Based on the latest conflicts, it looks like #19821 is the last double commit still missing.

rbpittman added a commit to rbpittman/sonic-mgmt that referenced this pull request Sep 30, 2025
…onic-net#20502)

* Disable Cisco VOQ watchdog for dualtor tunnel tests. Improve debuggability and speed of the tunnel PG map test.

* Fix VOQ watchdog test.
@rbpittman
Contributor Author

rbpittman commented Sep 30, 2025

@kevinskwang Filed #20870 for the double commit to 202411. Had some import header conflicts due to the recent "noqa" to "noqa:" conversion. No other feature conflicts.

rbpittman added a commit to rbpittman/sonic-mgmt that referenced this pull request Sep 30, 2025
…onic-net#20502)

* Disable Cisco VOQ watchdog for dualtor tunnel tests. Improve debuggability and speed of the tunnel PG map test.

* Fix VOQ watchdog test.
@rbpittman
Contributor Author

rbpittman commented Sep 30, 2025

Also filed #20871 for 202505 branch (which had the same set of noqa conflicts).

StormLiangMS pushed a commit that referenced this pull request Oct 23, 2025
…20502) (#20870)

Double commit #20502

Disable VOQ WD during test_tunnel_qos_remap.py
Commonize some aspects of the VOQ WD disabling from QOS SAI to avoid major code duplication.
Improve performance of test for cisco by reducing unnecessary 8-second per loop wait time to 1 second. (Test passed with 0.5 seconds as well, since this is a SAI bypass operation the updated stat should be near-instant).
Improve debuggability of PG tunnel decap test by logging all failures and summarizing a report at the end.
Fix up test_voq_watchdog.py test with new commonization. Rename parametrization to avoid shadowing.
vidyac86 pushed a commit to vidyac86/sonic-mgmt that referenced this pull request Oct 23, 2025
…onic-net#20502)

* Disable Cisco VOQ watchdog for dualtor tunnel tests. Improve debuggability and speed of the tunnel PG map test.

* Fix VOQ watchdog test.
StormLiangMS pushed a commit that referenced this pull request Nov 18, 2025
…20502) (#20871)

Summary:
Double commit #20502

Disable VOQ WD during test_tunnel_qos_remap.py
Commonize some aspects of the VOQ WD disabling from QOS SAI to avoid major code duplication.
Improve performance of test for cisco by reducing unnecessary 8-second per loop wait time to 1 second. (Test passed with 0.5 seconds as well, since this is a SAI bypass operation the updated stat should be near-instant).
Improve debuggability of PG tunnel decap test by logging all failures and summarizing a report at the end.
Fix up test_voq_watchdog.py test with new commonization. Rename parametrization to avoid shadowing.
opcoder0 pushed a commit to opcoder0/sonic-mgmt that referenced this pull request Dec 8, 2025
…onic-net#20502)

* Disable Cisco VOQ watchdog for dualtor tunnel tests. Improve debuggability and speed of the tunnel PG map test.

* Fix VOQ watchdog test.

Signed-off-by: opcoder0 <[email protected]>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 16, 2025
…onic-net#20502)

* Disable Cisco VOQ watchdog for dualtor tunnel tests. Improve debuggability and speed of the tunnel PG map test.

* Fix VOQ watchdog test.

Signed-off-by: Guy Shemesh <[email protected]>
AharonMalkin pushed a commit to AharonMalkin/sonic-mgmt that referenced this pull request Dec 16, 2025
…onic-net#20502)

* Disable Cisco VOQ watchdog for dualtor tunnel tests. Improve debuggability and speed of the tunnel PG map test.

* Fix VOQ watchdog test.

Signed-off-by: Aharon Malkin <[email protected]>
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Dec 21, 2025
…onic-net#20502)

* Disable Cisco VOQ watchdog for dualtor tunnel tests. Improve debuggability and speed of the tunnel PG map test.

* Fix VOQ watchdog test.

Signed-off-by: Guy Shemesh <[email protected]>
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Jan 13, 2026
…onic-net#20502)

* Disable Cisco VOQ watchdog for dualtor tunnel tests. Improve debuggability and speed of the tunnel PG map test.

* Fix VOQ watchdog test.
gshemesh2 pushed a commit to gshemesh2/sonic-mgmt that referenced this pull request Jan 26, 2026
…onic-net#20502)

* Disable Cisco VOQ watchdog for dualtor tunnel tests. Improve debuggability and speed of the tunnel PG map test.

* Fix VOQ watchdog test.

Signed-off-by: Guy Shemesh <[email protected]>
lakshmi-nexthop pushed a commit to lakshmi-nexthop/sonic-mgmt that referenced this pull request Jan 28, 2026
…onic-net#20502)

* Disable Cisco VOQ watchdog for dualtor tunnel tests. Improve debuggability and speed of the tunnel PG map test.

* Fix VOQ watchdog test.

Signed-off-by: Lakshmi Yarramaneni <[email protected]>
ytzur1 pushed a commit to ytzur1/sonic-mgmt that referenced this pull request Feb 2, 2026
…onic-net#20502)

* Disable Cisco VOQ watchdog for dualtor tunnel tests. Improve debuggability and speed of the tunnel PG map test.

* Fix VOQ watchdog test.

Signed-off-by: Yael Tzur <[email protected]>
kazinator-arista pushed a commit to kazinator-arista/sonic-mgmt that referenced this pull request Mar 4, 2026
…atically (sonic-net#20502)

#### Why I did it
src/sonic-snmpagent
```
* 7580ae2 - (HEAD -> 202311, origin/202311) Fix SNMP output having fewer unicast queues than expected (sonic-net#330) (19 hours ago) [Justin Wong]
```
#### How I did it
#### How to verify it
#### Description for the changelog
venu-nexthop pushed a commit to venu-nexthop/sonic-mgmt that referenced this pull request Mar 27, 2026
…onic-net#20502)

* Disable Cisco VOQ watchdog for dualtor tunnel tests. Improve debuggability and speed of the tunnel PG map test.

* Fix VOQ watchdog test.