Description
202012 Warm upgrade failure on Dx010 TOR.
Steps to reproduce the issue:
- Warm-upgrade a 6100 device from any older image to the new 202012 image.
- If running the test, the failure is caught by the test. Otherwise, to catch it manually, check syslog for signs of LAG flaps.
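A minimal sketch of the manual syslog check. It assumes that LAG state changes are logged as lines containing the PortChannel name, "oper status", and up/down. That wording is an assumption and is not taken from this issue, so the regex will likely need adjusting to the messages the device actually logs. `find_lag_flaps` and `LAG_EVENT` are hypothetical names.

```python
import re
from collections import defaultdict

# Hypothetical pattern: the exact syslog wording for LAG state changes is an
# assumption; adjust it to the messages actually logged on the device.
LAG_EVENT = re.compile(
    r"(PortChannel\d+).*\boper(?:ational)? status\b.*\b(up|down)\b",
    re.IGNORECASE,
)

def find_lag_flaps(lines):
    """Return {lag_name: [states in log order]} for LAGs that went down."""
    events = defaultdict(list)
    for line in lines:
        match = LAG_EVENT.search(line)
        if match:
            events[match.group(1)].append(match.group(2).lower())
    # Treat a LAG as flapped if any "down" transition was logged for it.
    return {lag: states for lag, states in events.items() if "down" in states}
```

For example, `with open("/var/log/syslog") as f: print(find_lag_flaps(f))` lists every PortChannel that logged a down transition during the warm upgrade.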
Describe the results you received:
We are hitting issues when warm-upgrading Celestica devices running SONiC from any image to a 202012 branch image.
Short description of the issue:
- Warm upgrade fails on TOR due to LAG(s) flap.
- LAGs flap because the 90s LACP session timeout expires and the T1 neighbors initiate LACP teardown.
- LACP takes more than 90s to resume because the reboot process takes longer than before in the 202012 warm-boot path.
- When investigating this, I found that:
a. The degradation is seen specifically in the first boot steps in rc.local:
b. Installing and enabling platform-modules takes a long time in the 202012 branch.
c. For comparison, the time taken for rc.local processing:
i. Same-image warm reboot: ~3s.
ii. Cross-branch or in-branch warm “upgrades” to a 202012 image: ~30s.
d. This boot-path difference is the degradation in the 202012 upgrade scenario, and it causes the first two points above (LAG flaps due to the 90s LACP timeout).
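A rough check using only the figures above. The 90s timeout matches LACP's slow rate (the partner expires the session after three missed PDUs sent every 30s). The ~3s and ~30s rc.local times are the approximate measurements reported here, so the result is approximate as well.

```python
# LACP slow rate: PDUs every 30 s, session expires after 3 missed PDUs.
LACP_SLOW_PERIODIC_S = 30
LACP_TIMEOUT_MULTIPLIER = 3
LACP_TIMEOUT_S = LACP_SLOW_PERIODIC_S * LACP_TIMEOUT_MULTIPLIER  # 90 s

# Approximate rc.local processing times measured above.
RC_LOCAL_SAME_IMAGE_S = 3    # same-image warm reboot
RC_LOCAL_UPGRADE_S = 30      # warm upgrade to a 202012 image

extra_s = RC_LOCAL_UPGRADE_S - RC_LOCAL_SAME_IMAGE_S
share = extra_s / LACP_TIMEOUT_S
print(f"extra rc.local time: {extra_s}s = {share:.0%} of the {LACP_TIMEOUT_S}s LACP timeout")
```

This prints `extra rc.local time: 27s = 30% of the 90s LACP timeout`: rc.local alone consumes almost a third of the budget before any later warm-boot stage runs.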
Note that this is specific to the 202012 branch: I tried a 201811 in-branch upgrade and saw that rc.local processing time is much shorter.
This is a blocker for warm upgrades, so we need a quick resolution.
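To find which rc.local step accounts for the ~27s difference, each step can be timed on its own and compared across branches. This is a minimal sketch, assuming the steps can be invoked as separate commands. `time_steps`, `slowest`, and the demo commands are hypothetical placeholders and are not taken from the actual rc.local.

```python
import subprocess
import sys
import time

def time_steps(steps):
    """Run each (name, argv) step in order and record its wall-clock time.

    Returns a list of (name, seconds, returncode) tuples.
    """
    results = []
    for name, argv in steps:
        start = time.monotonic()
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((name, time.monotonic() - start, proc.returncode))
    return results

def slowest(results, n=3):
    """Return the n slowest steps, longest first."""
    return sorted(results, key=lambda r: r[1], reverse=True)[:n]

if __name__ == "__main__":
    # Hypothetical placeholder steps; replace with the real rc.local commands.
    demo = [
        ("quick step", [sys.executable, "-c", "pass"]),
        ("slow step", [sys.executable, "-c", "import time; time.sleep(0.5)"]),
    ]
    for name, secs, rc in slowest(time_steps(demo)):
        print(f"{name}: {secs:.2f}s (exit {rc})")
```

Running the same step list on a 201811 and a 202012 image gives a per-step comparison instead of a single ~3s vs ~30s total.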
Questions:
- Why does platform initialization (enabling platform-modules-dx010) take longer in 202012 than in 201811?
- Can we reduce this time? Is it possible to defer some of the operations in this step until warm boot completes?
- An error is seen when installing a Python 2 package: a) do we need this installation at all? b) why is the ERROR seen?
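For the second question, one possible shape of the deferral is sketched below: run only the steps the warm-boot path needs right away, and push the rest into a background thread that waits for a readiness signal. This is a sketch under assumptions. Which platform-module steps are safe to defer is not established in this issue, and `run_with_deferral` and `is_ready` are hypothetical names; `is_ready` stands in for whatever signal marks warm boot as finished on the device.

```python
import threading
import time

def run_with_deferral(critical, deferred, is_ready, timeout_s=300.0, poll_s=1.0):
    """Run critical steps now and deferred steps later, in a background thread.

    critical / deferred: lists of (name, callable) pairs.
    is_ready: hypothetical predicate that returns True once warm boot is done.
    Deferred steps run when is_ready() returns True or after timeout_s seconds,
    whichever comes first, so they are delayed but not skipped.
    """
    for _name, step in critical:
        step()  # blocking: these stay on the warm-boot critical path

    def worker():
        deadline = time.monotonic() + timeout_s
        while not is_ready() and time.monotonic() < deadline:
            time.sleep(poll_s)
        for _name, step in deferred:
            step()  # an exception here stops the remaining deferred steps

    thread = threading.Thread(target=worker, name="deferred-platform-init")
    thread.start()
    return thread
```

The timeout keeps a missing readiness signal from leaving the platform half-initialized. If this ran from rc.local, it would need to be started in the background so the script itself is not blocked.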
Describe the results you expected:
No LAG should flap after a warm reboot.
Warm upgrades are unblocked, with shorter rc.local processing.
Output of show version:
Output of show techsupport:
(paste your output here or download and attach the file here )
Additional information you deem important (e.g. issue happens only occasionally):
dx010-202012-54-54-warm.txt
dx010-202012-53-54-warm.txt