
Warmboot reconciliation fails when zero buffer profiles are attached to some queues #899

@svsivm

Description:
Two extra zero buffer profiles are created in the 'temp' asic view if warmboot is executed while some queues have zero buffer profiles attached to them. The VIDs of these two extra zero buffer profiles in the temp asic view match those in the 'current' asic view. However, the attribute list in the temp asic view is empty for these matching VIDs, so the comparison logic in performObjectSetTransition fails during warmboot reconciliation with the errors shown below. We ran into this issue while running the PFC WD warmboot pytest, specifically the second sub-test (https://github.com/Azure/sonic-mgmt/blob/master/tests/pfcwd/test_pfcwd_warm_reboot.py#L25).

Steps to reproduce:
Execute the second scenario in the PFC watchdog warmboot test on a platform that uses the 'zero buffer profile' model to handle PFC storms.

To reproduce manually, perform the following steps:
(a) Enable PFC WD on all target ports/queues.
(b) Send a PFC storm to the target port/queue and verify that the storm is detected and the mitigation action is executed.
(c) While the PFC storm is still being sent, perform a warmboot.
(d) Reconciliation fails.

Describe the results you got:
The syslog below shows that the current asic view has 8 buffer profiles (6 from config_db and 2 zero buffer profiles), while the temp asic view has 10 buffer profiles.

Jul 26 22:37:00.333718 sonic-dut WARNING syncd#syncd: :- logViewObjectCount: object count for SAI_OBJECT_TYPE_HOSTIF_TRAP_GROUP on current view 5 is different than on temporary view: 1
Jul 26 22:37:00.333718 sonic-dut WARNING syncd#syncd: :- logViewObjectCount: object count for SAI_OBJECT_TYPE_POLICER on current view 3 is different than on temporary view: 0
Jul 26 22:37:00.333747 sonic-dut WARNING syncd#syncd: :- logViewObjectCount: object count for SAI_OBJECT_TYPE_BUFFER_PROFILE on current view 8 is different than on temporary view: 10
Jul 26 22:37:00.333812 sonic-dut WARNING syncd#syncd: :- logViewObjectCount: object count for SAI_OBJECT_TYPE_HOSTIF_TRAP on current view 12 is different than on temporary view: 1
Jul 26 22:37:00.337738 sonic-dut WARNING syncd#syncd: :- logViewObjectCount: object count is different on both view, there will be ASIC OPERATIONS!
Jul 26 22:37:00.407638 sonic-dut NOTICE syncd#syncd: :- processObjectForViewTransition: processing: SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007a8
Jul 26 22:37:00.407638 sonic-dut NOTICE syncd#syncd: :- findCurrentBestMatch: found best match for SAI_OBJECT_TYPE_BUFFER_PROFILE oid:0x190000000007a8 since object status is MATCHED
Jul 26 22:37:00.407656 sonic-dut NOTICE syncd#syncd: :- processObjectForViewTransition: found best match SAI_OBJECT_TYPE_BUFFER_PROFILE: current: oid:0x190000000007a8 temporary: oid:0x190000000007a8
Jul 26 22:37:00.407656 sonic-dut NOTICE syncd#syncd: :- performObjectSetTransition: first pass (curr): attr SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE
Jul 26 22:37:00.407672 sonic-dut NOTICE syncd#syncd: :- performObjectSetTransition: performing default on existing object VID oid:0x190000000007a8: SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE: SAI_BUFFER_PROFILE_THRESHOLD_MODE_DYNAMIC, we need default dependency TREE, FIXME
Jul 26 22:37:00.407672 sonic-dut NOTICE syncd#syncd: :- performObjectSetTransition: Skipping create only attr on matched object: SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE:SAI_BUFFER_PROFILE_THRESHOLD_MODE_DYNAMIC
Jul 26 22:37:00.407690 sonic-dut NOTICE syncd#syncd: :- performObjectSetTransition: first pass (curr): attr SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH
Jul 26 22:37:00.407690 sonic-dut NOTICE syncd#syncd: :- performObjectSetTransition: performing default on existing object VID oid:0x190000000007a8: SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH: -8, we need default dependency TREE, FIXME
Jul 26 22:37:00.407704 sonic-dut ERR syncd#syncd: :- performObjectSetTransition: current attribute is mandatory on create, crate and set, and object MATCHED, FIXME oid:0x190000000007a8 SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH:-8
Jul 26 22:37:00.407704 sonic-dut ERR syncd#syncd: :- processObjectForViewTransition: performObjectSetTransition on MATCHED object (oid:0x190000000007a8) FAILED! bug?
Jul 26 22:37:00.407718 sonic-dut NOTICE syncd#syncd: :- applyViewTransition: comparison logic took 0.068258 sec
Jul 26 22:37:00.481019 sonic-dut ERR syncd#syncd: :- applyView: Exception: :- processObjectForViewTransition: performObjectSetTransition on MATCHED object (oid:0x190000000007a8) FAILED! bug?
Jul 26 22:37:00.491945 sonic-dut NOTICE syncd#syncd: :- applyView: apply took 1.044873 sec
Jul 26 22:37:00.492476 sonic-dut ERR swss#orchagent: :- syncd_apply_view: Failed to notify syncd APPLY_VIEW -1

0x190000000007a8 and 0x190000000007aa are the VIDs of the zero buffer profiles that appear in both the 'current' and the 'temp' asic view. However, the temp versions have a NULL attribute list, which the object transition code does not handle.

Additional information you deem important
The following is redis-cli output.
Before warmboot:
127.0.0.1:6379[1]> keys *BUFFER_PROFILE*

  1. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000006f1"
  2. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007a8" <<<<< Zero buffer profile
  3. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000006ee"
  4. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000006ed"
  5. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000006ec"
  6. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007aa" <<<<< Zero buffer profile
  7. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000006f0"
  8. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000006ef"

After warmboot:
127.0.0.1:6379[1]> keys *BUFFER_PROFILE*

  1. "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007d3"
  2. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000006ef"
  3. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000006ed"
  4. "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000087c"
  5. "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007d0"
  6. "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007d5"
  7. "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007d2"
  8. "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007a8" <<<<< Matching zero buffer profile VID in temp view
  9. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000006ee"
  10. "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x1900000000087a"
  11. "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007aa" <<<<< Matching zero buffer profile VID in temp view
  12. "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007d1"
  13. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000006ec"
  14. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000006f0"
  15. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000006f1"
  16. "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007d4"
  17. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007aa"
  18. "ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007a8"
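
The overlap between the two views can be checked mechanically. The following is a minimal sketch, not part of syncd: findSharedVids is a hypothetical helper that takes the key names from a listing like the one above and returns the VIDs present under both the ASIC_STATE and TEMP_ASIC_STATE prefixes.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (not a sairedis API): given redis key names such as
// "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007a8",
// return the VIDs that exist in both the current and the temp view.
std::vector<std::string> findSharedVids(const std::vector<std::string>& keys)
{
    const std::string tempPrefix = "TEMP_ASIC_STATE:";
    const std::string currPrefix = "ASIC_STATE:";

    std::set<std::string> current;
    std::set<std::string> temporary;

    for (const auto& key : keys)
    {
        const auto pos = key.rfind("oid:");
        if (pos == std::string::npos)
            continue; // not an object key with a VID

        const std::string vid = key.substr(pos);

        // Check the temp prefix first; the two prefixes start differently,
        // so a temp key never matches the current prefix.
        if (key.compare(0, tempPrefix.size(), tempPrefix) == 0)
            temporary.insert(vid);
        else if (key.compare(0, currPrefix.size(), currPrefix) == 0)
            current.insert(vid);
    }

    // std::set iterates in sorted order, so the result is sorted too.
    std::vector<std::string> shared;
    for (const auto& vid : temporary)
    {
        if (current.count(vid) != 0)
            shared.push_back(vid);
    }
    return shared;
}
```

Applied to the 18 keys listed above, this yields only oid:0x190000000007a8 and oid:0x190000000007aa, the two zero buffer profile VIDs.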

However, the temp view objects with these VIDs have an empty attribute list, which is not handled in performObjectSetTransition:
127.0.0.1:6379[1]> hgetall TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007a8

  1. "NULL"
  2. "NULL"

127.0.0.1:6379[1]> hgetall TEMP_ASIC_STATE:SAI_OBJECT_TYPE_BUFFER_PROFILE:oid:0x190000000007aa

  1. "NULL"
  2. "NULL"
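
To make the "empty attribute list" reading of this output concrete, here is a small sketch. It assumes that the "NULL"/"NULL" field/value pair is a placeholder written for an object that has no attributes; parseAttributes is a hypothetical helper, not a function from sairedis.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: turn a flat HGETALL reply (field, value, field, value, ...)
// into an attribute map. Assumption: a "NULL"/"NULL" pair is only a placeholder
// for "no attributes" and carries no attribute data.
std::map<std::string, std::string> parseAttributes(const std::vector<std::string>& reply)
{
    std::map<std::string, std::string> attrs;

    for (std::size_t i = 0; i + 1 < reply.size(); i += 2)
    {
        if (reply[i] == "NULL" && reply[i + 1] == "NULL")
            continue; // assumed placeholder entry, skip it

        attrs[reply[i]] = reply[i + 1];
    }
    return attrs;
}
```

Under that assumption, both replies above parse to an empty map, which is the state the transition code sees for the two temp objects.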

Debug Analysis:
On further debugging we observed the following (from syslog):

a. Two ‘extra’ buffer profiles are created in the ‘temp’ view with an empty attribute list. We have not been able to determine why or how they were created. However, the VIDs of those ‘temp’ objects match the VIDs of the ‘current’ objects (the zero buffer profiles), so they are marked as ‘MATCHED’.

b. In performObjectSetTransition, the temp object whose VID matches the zero buffer profile is parsed first. Because its attribute list is empty, nothing is added to the ‘processed attr list’. Then, while parsing the attributes of the ‘current’ object with the same VID, the code hits the failure because one of the buffer profile attributes (SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH) is both ‘mandatory on create’ and ‘create and set’.

Relevant code snippet from performObjectSetTransition:

    for (auto &at: temporaryObj->getAllAttributes())
    {
        auto &temporaryAttr = at.second;

        SWSS_LOG_NOTICE("first pass (temp): attr %s", temporaryAttr->getStrAttrId().c_str());

        const auto meta = temporaryAttr->getAttrMetadata();

        const sai_attribute_t &attr = *temporaryAttr->getSaiAttr();

        // <<<<<< Since the temp object attr list is empty, this list is empty too.
        processedAttributes.insert(attr.id); // mark attr id as processed

        // ……
    }

    for (auto &ac: currentBestMatch->getAllAttributes())
    {
        auto &currentAttr = ac.second;

        const auto &meta = currentAttr->getAttrMetadata();

        const sai_attribute_t &attr = *currentAttr->getSaiAttr();

        // <<<<<< This is false for the current zero buffer profile object.
        if (processedAttributes.find(attr.id) != processedAttributes.end())
        {
            /*
             * This attribute was processed in previous temporary attributes processing so skip it here.
             */

            continue;
        }

        // ……..
        // ……..
                SWSS_LOG_ERROR("current attribute is mandatory on create, crate and set, and object MATCHED, FIXME %s %s:%s",
                        currentBestMatch->m_str_object_id.c_str(),
                        meta->attridname,
                        currentAttr->getStrAttrValue().c_str());
    }
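
The failing path described in (b) can be reproduced in isolation with a simplified model. The sketch below is not the sairedis implementation: AttrInfo, AttrMap and modelSetTransition are hypothetical names, and the metadata flags are reduced to two booleans. It only illustrates why an empty temp attribute list leads to the error branch for a matched object.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical, simplified stand-in for a SAI attribute plus its metadata.
struct AttrInfo
{
    std::string value;
    bool mandatoryOnCreate; // attribute must be supplied at create time
    bool createAndSet;      // attribute can also be changed after create
};

using AttrMap = std::map<std::string, AttrInfo>;

// Simplified model of the two passes: returns false when the second pass
// reaches the "mandatory on create, create and set" error branch.
bool modelSetTransition(const AttrMap& temporary, const AttrMap& current)
{
    std::set<std::string> processed;

    // First pass: every attribute of the temp object is marked as processed.
    // With an empty temp attribute list, nothing is marked.
    for (const auto& kv : temporary)
        processed.insert(kv.first);

    // Second pass: attributes that exist only on the current object.
    for (const auto& kv : current)
    {
        if (processed.count(kv.first) != 0)
            continue; // already handled in the first pass

        const AttrInfo& info = kv.second;

        if (info.mandatoryOnCreate && info.createAndSet)
            return false; // models the SWSS_LOG_ERROR branch

        // Other cases (for example create-only attributes) are skipped here.
    }
    return true;
}
```

With an empty temp map and a current map holding SAI_BUFFER_PROFILE_ATTR_THRESHOLD_MODE (create only in this model) and SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH (mandatory on create, create and set), the model returns false, which corresponds to the sequence in the syslog above: the first attribute is skipped and the second triggers the error.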

Sonic version
SONiC Software Version: SONiC.202012.Innovium.0-dirty-20210523.005101
Distribution: Debian 10.9
Kernel: 4.19.0-12-2-amd64
Build commit: d7b4c62f
Build date: Sun May 23 07:58:46 UTC 2021
Platform: x86_64-cel_midstone-r0
HwSKU: Midstone-200i
ASIC: innovium
ASIC Count: 1
