Issue Description
Failure seen:
dst_port_id: 47, src_port_id: 34 src_port_vlan: None
actual dst_port_id: 47
Initial watermark:[112, 0, 0, 0, 256, 0, 0, 0]
Received packets: 0
Init pkts num sent: 0, min: 0, actual watermark value to start: 0
Filled PG min
+---------------+-----------+-----------+-----------+------------+------------+----------+---------+------------+-------------+---------+-------------+------------------+
| | Pfc3TxPkt | InDiscard | InDropPkt | OutDiscard | OutDropPkt | OutUcPkt | InUcPkt | InNonUcPkt | OutNonUcPkt | OutQlen | Ing Pg3 Pkt | Ing Pg3 Share Wm |
+---------------+-----------+-----------+-----------+------------+------------+----------+---------+------------+-------------+---------+-------------+------------------+
| base src port | 3159299 | 2821770 | 0 | 0 | 0 | 4 | 5839385 | 2442 | 1248 | 0 | 837638 | 0 |
| src port | 3159299 | 2821770 | 0 | 0 | 0 | 4 | 5839385 | 2443 | 1249 | 0 | 837638 | 0 |
| base dst port | 0 | 3 | 0 | 0 | 0 | 422261 | 5177 | 1224 | 88 | 0 | 0 | 0 |
| dst port | 0 | 3 | 0 | 0 | 0 | 422261 | 5177 | 1224 | 88 | 0 | 0 | 0 |
+---------------+-----------+-----------+-----------+------------+------------+----------+---------+------------+-------------+---------+-------------+------------------+
pkts num to send: 41, total pkts: 41, pg shared: 415271
Compensate 2176538 packets to port 34, and retry 1 times
Received packets: 418930
To fill PG share pool, send 41 pkt
+---------------+-----------+-----------+-----------+------------+------------+----------+---------+------------+-------------+---------+-------------+------------------+
| | Pfc3TxPkt | InDiscard | InDropPkt | OutDiscard | OutDropPkt | OutUcPkt | InUcPkt | InNonUcPkt | OutNonUcPkt | OutQlen | Ing Pg3 Pkt | Ing Pg3 Share Wm |
+---------------+-----------+-----------+-----------+------------+------------+----------+---------+------------+-------------+---------+-------------+------------------+
| base src port | 3159299 | 2821770 | 0 | 0 | 0 | 4 | 5839385 | 2442 | 1248 | 0 | 837638 | 0 |
| src port | 5555195 | 4576881 | 0 | 0 | 0 | 4 | 8013426 | 2493 | 1274 | 0 | 1256568 | 46510464 |
| base dst port | 0 | 3 | 0 | 0 | 0 | 422261 | 5177 | 1224 | 88 | 0 | 0 | 0 |
| dst port | 0 | 3 | 0 | 0 | 0 | 422408 | 5227 | 1250 | 88 | 0 | 0 | 0 |
+---------------+-----------+-----------+-----------+------------+------------+----------+---------+------------+-------------+---------+-------------+------------------+
lower bound: 167936, actual value: 46510464, upper bound (+40): 9072
> /root/saitests/py3/sai_qos_tests.py(4624)runTest()
======================================================================
FAIL: sai_qos_tests.PGSharedWatermarkTest
----------------------------------------------------------------------
Traceback (most recent call last):
File "saitests/py3/sai_qos_tests.py", line 4623, in runTest
* (packet_length + internal_hdr_size)))
AssertionError
----------------------------------------------------------------------
Ran 1 test in 934.434s
The issue appears to occur in dynamically_compensate_leakout:
Compensate 2176538 packets to port 34, and retry 1 times
2176538 is far too many packets to send as compensation.
This function compares the TX_OK value before and after sending the 41 packets.
Here is where it stores the counts before the packets are sent:
https://github.com/sonic-net/sonic-mgmt/blob/202405/tests/saitests/py3/sai_qos_tests.py#L4464
xmit_counters_history, _ = sai_thrift_read_port_counters(
    self.dst_client, asic_type, port_list['dst'][dst_port_id])
And within dynamically_compensate_leakout here is where they are read:
https://github.com/sonic-net/sonic-mgmt/blob/202405/tests/saitests/py3/sai_qos_tests.py#L454
curr, _ = counter_checker(thrift_client, asic_type, check_port)
leakout_num = curr[check_field] - prev[check_field]
The problem is that the call to dynamically_compensate_leakout passes self.src_client as the thrift_client argument, but the port being checked belongs to self.dst_client:
https://github.com/sonic-net/sonic-mgmt/blob/202405/tests/saitests/py3/sai_qos_tests.py#L4551
dynamically_compensate_leakout(self.src_client, asic_type, sai_thrift_read_port_counters,
                               port_list['dst'][dst_port_id], TRANSMITTED_PKTS,
                               xmit_counters_history, self, src_port_id, pkt, 40)
In this failure case I can see that the port ID for dst_port_id is also in use on the src asic, where it belongs to port 32:
(Pdb) port_list['src'][32]
4294967297
(Pdb) port_list['dst'][dst_port_id]
4294967297
If I dump the TX_OK of port 32 on the src asic I get:
(Pdb) sai_thrift_read_port_counters(self.src_client, asic_type, port_list['dst'][dst_port_id])[0][TRANSMITTED_PKTS]
2598800
On the dst asic I get:
(Pdb) sai_thrift_read_port_counters(self.dst_client, asic_type, port_list['dst'][dst_port_id])[0][TRANSMITTED_PKTS]
422409
This is where the massive compensate packet number comes from:
(Pdb) xmit_counters_history[TRANSMITTED_PKTS]
422261
2598800 - 422261 = 2176539
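The arithmetic above can be reproduced with a minimal sketch. The `FakeClient` class and `read_tx_ok` helper are hypothetical stand-ins for the thrift clients and `sai_thrift_read_port_counters`; only the counter values and the port ID come from the pdb session.

```python
# Hypothetical stand-ins: each "client" owns its own counter table, keyed by port ID.
PORT_OID = 4294967297  # the same ID resolves on both asics in this failure


class FakeClient:
    def __init__(self, tx_ok_by_oid):
        self.tx_ok_by_oid = tx_ok_by_oid


def read_tx_ok(client, port_oid):
    """Stand-in for reading TRANSMITTED_PKTS through a thrift client."""
    return client.tx_ok_by_oid[port_oid]


src_client = FakeClient({PORT_OID: 2598800})  # value dumped from the src asic
dst_client = FakeClient({PORT_OID: 422409})   # value dumped from the dst asic

# Baseline taken through dst_client before the packets were sent.
prev_tx_ok = 422261

# Mismatched client: baseline from the dst asic, current value from the src asic.
leakout_wrong = read_tx_ok(src_client, PORT_OID) - prev_tx_ok
# Matching client: both readings come from the dst asic.
leakout_right = read_tx_ok(dst_client, PORT_OID) - prev_tx_ok

print(leakout_wrong)  # 2176539
print(leakout_right)  # 148
```

With these values, reading through the matching client gives a delta of 148 instead of roughly 2.2 million.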
Results you see
dynamically_compensate_leakout polls the wrong asic/dut client, so the leakout count is computed from a different port's counters.
Results you expected to see
dynamically_compensate_leakout should poll the asic/dut client that owns the port being referenced.
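One way to picture the expected behaviour is to keep each port ID together with the client that owns it, so the two cannot be mismatched at the call site. This is only a sketch, not the fix used in sonic-mgmt: `PortRef`, `leakout`, and `fake_counter_checker` are hypothetical names, the clients are plain dicts, and `TRANSMITTED_PKTS = 0` is a placeholder for the real constant.

```python
from collections import namedtuple

# Hypothetical pairing of a port ID with the client of the asic that owns it.
PortRef = namedtuple("PortRef", ["client", "oid"])

TRANSMITTED_PKTS = 0  # placeholder index; the real constant lives in the test code


def fake_counter_checker(thrift_client, asic_type, port_oid):
    """Stand-in for sai_thrift_read_port_counters: returns (port_counters, queue_counters)."""
    return [thrift_client[port_oid]], []


def leakout(port_ref, asic_type, counter_checker, check_field, prev):
    """Same delta as the two quoted lines, but the client always comes from port_ref."""
    curr, _ = counter_checker(port_ref.client, asic_type, port_ref.oid)
    return curr[check_field] - prev[check_field]


# Counter values from the pdb session; dicts stand in for the thrift clients.
src_client = {4294967297: 2598800}
dst_client = {4294967297: 422409}

dst_port = PortRef(client=dst_client, oid=4294967297)
prev = [422261]  # baseline read through dst_client

result = leakout(dst_port, "any", fake_counter_checker, TRANSMITTED_PKTS, prev)
print(result)  # 148
```

In the existing test, the equivalent change would presumably be passing self.dst_client instead of self.src_client in the call shown earlier.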
Is it platform specific
generic
Relevant log output
No response
Output of show version
No response
Attach files (if any)
No response