[hostcfgd] Fixed the brief blackout in hostcfgd using SubscriberStateTable #16
Signed-off-by: Vivek Reddy Karri <[email protected]>
```python
try:
    ret = ast.literal_eval(val)
except Exception as e:
    ret = False
```
The default value is recorded in two places: once here, and once in this line: `self.__safe_eval_bool(feature_cfg.get('has_timer', 'False'))`
The idea is that if the user configures something invalid (anything other than True or False), this shouldn't fail.
If the user configures anything other than "True", the value is set to False.
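A minimal sketch of a boolean parser that follows the intent described above: return True only for an explicit "True" and False for everything else, without raising. The names `safe_eval_bool` and `get_bool_field` are hypothetical and are not the PR's `__safe_eval_bool`. One deliberate difference from the diff: `ast.literal_eval("1")` returns `1`, which is truthy, so this variant checks the result with `is True` to make "anything other than True means False" hold literally. Keeping the False default inside a single helper also avoids recording the default in two places.

```python
import ast


def safe_eval_bool(val):
    """Interpret a CONFIG_DB string field as a boolean without raising.

    Only a value that parses to the literal True yields True. A missing
    field (None), the string 'False', a typo such as 'ture', or any other
    literal such as '1' all yield False.
    """
    if not isinstance(val, str):
        return False
    try:
        return ast.literal_eval(val) is True
    except (ValueError, SyntaxError, TypeError):
        # Not a valid Python literal (e.g. 'ture' or ''), so treat it as False
        return False


def get_bool_field(cfg, field):
    """Read an optional boolean field; the False default lives in one place."""
    return safe_eval_bool(cfg.get(field))
```

For example, `get_bool_field({'has_timer': 'True'}, 'has_timer')` returns True, while an entry without `has_timer` returns False.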
```python
from swsscommon.swsscommon import ConfigDBConnector
from sonic_py_common.device_info import is_multi_npu, get_num_npus
from swsscommon.swsscommon import SubscriberStateTable, DBConnector, Select
from swsscommon.swsscommon import ConfigDBConnector, TableConsumable
```
Do you still need ConfigDBConnector?
Yes, I'm still using it for the non-active-listener use cases. The active-listener responsibility has been transferred to SubscriberStateTable.
What is the difference between active/non-active listeners?
Sorry, I didn't frame that properly. I meant that I did not change some of the existing code that fetches tables from the DB, e.g.: https://github.com/vivekreddynv/sonic-buildimage/blob/1d626893843cd4514494f280fa7c1eb1159f002c/src/sonic-host-services/scripts/hostcfgd#L905
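To make the active-listener point concrete, here is a pure-Python model of the two startup orders. The classes and functions are hypothetical; no swsscommon or Redis is involved. In the old order, the tables were read and the slow initial configuration (update_all_features_config) ran before the listen phase started, so a change made in between was never seen. Subscribing first and then replaying the existing entries as SET events closes that window. As far as we understand, this is what SubscriberStateTable does when it is constructed, but the swss-common sources are the authority on its exact behavior.

```python
from collections import deque


class FakeTable:
    """Hypothetical in-memory stand-in for a CONFIG_DB table with change notifications."""

    def __init__(self, data):
        self.data = dict(data)
        self.listeners = []

    def set(self, key, fvs):
        # Write the entry and notify every current subscriber
        self.data[key] = fvs
        for queue in self.listeners:
            queue.append((key, "SET", fvs))

    def subscribe(self):
        # Only changes made after this call are delivered to the queue
        queue = deque()
        self.listeners.append(queue)
        return queue


def fetch_then_listen(table, slow_init_work):
    """Old order: one-shot read, slow initial work, then subscribe."""
    seen = set(table.data)
    slow_init_work()  # changes made here reach nobody
    queue = table.subscribe()
    seen.update(key for key, _op, _fvs in queue)
    return seen


def subscribe_then_replay(table, slow_init_work):
    """New order: subscribe, replay existing entries as SET events, then drain."""
    queue = table.subscribe()
    events = [(key, "SET", fvs) for key, fvs in table.data.items()]
    slow_init_work()  # changes made here are queued, not lost
    events.extend(queue)
    return {key for key, _op, _fvs in events}
```

With a `slow_init_work` that adds an `lldp` entry mid-startup, `fetch_then_listen` reports only the entries that existed beforehand, while `subscribe_then_replay` reports both.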
```python
class HostConfigDaemon:
    def __init__(self):
        # Just a sanity check to verify if the CONFIG_DB has been initialized
        # before moving forward
```
Why do we need this explicitly?
This was present before my changes; I've just retained it.
I wonder if we can remove it; the daemons that use swsscommon.SubscriberStateTable don't use ConfigDBConnector's wait_for_init.
This was probably added because this script used to start very early, and the implementers wanted to wait until CONFIG_DB was initialized and Redis was up. You could argue that it can be removed, but since it doesn't affect anything, I think it's better to keep it as a sanity check.
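For context on what the sanity check waits for: to our knowledge, ConfigDBConnector's `wait_for_init` blocks until a `CONFIG_DB_INITIALIZED` flag is present in CONFIG_DB, using keyspace notifications. The sketch below models the same idea with plain polling instead. `wait_for_flag` is a hypothetical helper, not part of swsscommon, and it takes the read function, sleep, and clock as parameters so it runs without Redis.

```python
import time


def wait_for_flag(read_flag, timeout=30.0, interval=0.5,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll read_flag() until it returns a truthy value or timeout expires.

    read_flag is a hypothetical callable standing in for reading the
    CONFIG_DB_INITIALIZED key; sleep and clock are injectable for tests.
    Returns True if the flag was seen, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if read_flag():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The real connector waits on notifications rather than polling; the timeout and interval here are illustrative choices of ours.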
```python
for callback in cbs:
    callback(table, key, op, dict(fvs))

if init_load_flag and table == "FEATURE" and key.lower() == "swss":
```
What is the purpose of this condition?
Previously, before the listen phase started, update_all_features_config was run, followed by load, which in turn invokes the load method of AaaCfg.
With the redesign, we can no longer enforce that manual ordering, so I introduced this condition to run the AaaCfg load only after swss has been started.
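A minimal model of the ordering described above: per-table callbacks are dispatched as in the diff, and during the initial load the AAA load is triggered once, right after the FEATURE|swss entry has been handled. The `Dispatcher` class and the `aaa_load` callable are hypothetical stand-ins for hostcfgd and AaaCfg.load, and clearing `init_load_flag` after the first trigger is our assumption rather than something the diff excerpt shows.

```python
class Dispatcher:
    """Hypothetical model of hostcfgd's per-table callback dispatch."""

    def __init__(self, aaa_load):
        self.callbacks = {}          # table name -> list of callbacks
        self.aaa_load = aaa_load     # stands in for AaaCfg.load
        self.init_load_flag = True   # True until the deferred AAA load has run

    def register(self, table, callback):
        self.callbacks.setdefault(table, []).append(callback)

    def dispatch(self, table, key, op, fvs):
        for callback in self.callbacks.get(table, []):
            callback(table, key, op, dict(fvs))
        # During the initial load, run the AAA load once, right after the
        # FEATURE|swss entry has been handled
        if self.init_load_flag and table == "FEATURE" and key.lower() == "swss":
            self.aaa_load()
            self.init_load_flag = False
```

As the follow-up discussion points out, this only orders AAA after the swss FEATURE entry is processed; it does not guarantee that an interface such as Loopback0 exists yet.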
Why does AAA config depend on FEATURE config?
The AAA/RADIUS/TACACS tables have a src_intf option, which can be configured as eth0, Loopback0, etc.
Before this change, the AAA config was run after the FEATURE config had completed.
So I thought of retaining this order by configuring SWSS first and then AAA. But if you think it's safe to configure AAA before the FEATURE config, I can change this.
You're saying that AAA/RADIUS/TACACS need an interface to work with, which could be the management interface or a loopback interface created by swss, and that this is why swss has to be started first and the AAA configuration processed afterwards?
If so, I would suggest removing this condition, since it does not guarantee that the Loopback interface will have been created at this time anyway.
```python
'PyGObject',
'sonic-py-common',
'systemd-python',
'systemd-python'
```
I haven't actually seen it in use.
@stepanblyschak, I've addressed the suggestions and replied to a few of them. Please review.
* [BFN] Updated platform APIs impl Signed-off-by: Andriy Kokhan <[email protected]> * Extended BFN platform SFP APIs implementation * Update sfp.py * [BFN] Extended SFP platform plugin implementation Signed-off-by: Andriy Kokhan <[email protected]> * [BFN] Extended Fans platform plugin implementation * [BFN] divided classes Fan and FanDrawer into 2 files * Signed-off-by: Vadym Yashchenko <[email protected]> What I did Add get_model() function Add get_low_critical_threshold() function Change __get(...) function. How I did it Differnece from previous implementation of __get(...) function is return real value or -9999.9 if value is not provided by thrift API * Add get_presence() function and revised __get() function Signed-off-by: Vadym Yashchenko <[email protected]> * [BFN] Updated PSU platform APIs impl Signed-off-by: Dmytro Lytvynenko <[email protected]> * Added BFN PSU cache (#9) Signed-off-by: Andriy Kokhan <[email protected]> * [BFN] Fans and Fantray platform APIs update (#7) * [BFN] Updated SFP platform APIs (#10) Signed-off-by: Volodymyr Boyko <[email protected]> * [BFN] Updated platform API for thermal (#8) * Signed-off-by: Vadym Yashchenko <[email protected]> * Revert "[BFN] Fans and Fantray platform APIs update (#7)" (#11) This reverts commit c62a733. 
* Add support health monitor system (#15) Signed-off-by: Petro Bratash <[email protected]> * Update chassis.py * [BFN] Updated FANs and FAN Tray platform API (#14) * Fix fix_alignment (#17) Signed-off-by: Petro Bratash <[email protected]> * [BFN] Improvement show environment (#16) * Added PSU temperature skip into platform.json (#18) Signed-off-by: Andriy Kokhan <[email protected]> * Do not skip psud on Newport Signed-off-by: Andriy Kokhan <[email protected]> * [BFN] fix fan status from Not OK to Ok (#19) * [BFN] Updated SFP platform plugin (#13) Signed-off-by: Volodymyr Boyko <[email protected]> * [DPB] Fix typo for Ethernet0 2x200G[100G,40G] breakout mode (#21) Signed-off-by: Mykola Gerasymenko <[email protected]> * [barefoot] Tmp fix vendor_rev (#22) Signed-off-by: Volodymyr Boyko <[email protected]> * Fixed python issues in sonic_platform/fan_drawer.py Signed-off-by: Andriy Kokhan <[email protected]> * Updated fan_drawer.py * Fixing trailing white spaces in fan_drawer.py * [BFN] Fix thrift for SFPs API Signed-off-by: Volodymyr Boyko <[email protected]> * In platform.json, replaced 'false' with '0' to workaround ast.literal_eval() issue Signed-off-by: Andriy Kokhan <[email protected]> * [Newport] Thermal manager (#23) * Signed-off-by: Vadym Yashchenko <[email protected]> * Revert "In platform.json, replaced 'false' with '0' to workaround ast.literal_eval() issue" This reverts commit 1e73127. 
* Removed 'controllable' options from platform.json to fix factory default config generation Signed-off-by: Andriy Kokhan <[email protected]> * Update thermal_manager.py * Migrated SFP plugin to sonic_xcvr API (#30) Signed-off-by: Andriy Kokhan <[email protected]> Co-authored-by: KostiantynYarovyiBf <[email protected]> Co-authored-by: Vadym Yashchenko <[email protected]> Co-authored-by: Dmytro Lytvynenko <[email protected]> Co-authored-by: Volodymyr Boiko <[email protected]> Co-authored-by: Petro Bratash <[email protected]> Co-authored-by: Mykola Gerasymenko <[email protected]>
[sonic-linkmgrd][master] submodule update Commits added: 0c23756 Jing Zhang 2022-01-19 Linkmgrd subscribing State DB route event (#13) 12b9951 Longxiang Lyu 2021-12-13 Add TLV support to ICMP payload (#11) 3eedda3 Longxiang Lyu 2022-01-06 Add missing intermediate states (#16) 8da4982 Ying Xie 2022-01-04 [linkmgrd] update README, set coding style guidance (#15) a897cf8 Longxiang Lyu 2021-12-13 Improve PR template (#16) 6fec701 Jing Zhang 2021-12-06 Add pull request template for linkmgrd repo (#9) signed-off-by: Jing Zhang [email protected]
#### Why I did it Update sonic-host-services submodule to include below commits: ``` bc8698d Merge pull request #21 from abdosi/feature 557a110 Fix the issue where if dest port is not specified in ACL rule than for multi-asic where we create NAT rule to forward traffic from Namespace to host fail with exception. 6e45acc (master) Merge pull request #14 from abdosi/feature 4d6cad7 Merge remote-tracking branch 'upstream/master' into feature bceb13e Install libyang to azure pipeline (#20) 82299f5 Merge pull request #13 from SuvarnaMeenakshi/cacl_fabricns 15d3bf4 Merge branch 'master' into cacl_fabricns de54082 Merge pull request #16 from ZhaohuiS/feature/caclmgrd_external_client_warning_log b4b368d Add warning log if destination port is not defined d4bb96d Merge branch 'master' into cacl_fabricns 35c76cb Add unit-test and fix typo. 17d44c2 Made Changes to be Python 3.7 compatible 978afb5 Aligning Code 1fbf8fb Merge remote-tracking branch 'upstream/master' into feature 7b8c7d1 Added UT for the changes 91c4c42 Merge pull request #9 from ZhaohuiS/feature/caclmgrd_external_client 7c0b56a Add 4 test cases for external_client_acl, including single port and port range for ipv4 and ipv6 b71e507 Merge remote-tracking branch 'origin/master' into HEAD d992dc0 Merge branch 'master' into feature/caclmgrd_external_client bd7b172 DST_PORT is configuralbe in json config file for EXTERNAL_CLIENT_ACL f9af7ae [CLI] Move hostname, mgmt interface/vrf config to hostcfgd (#2) 70ce6a3 Merge pull request #10 from sujinmkang/cold_reset 29be8d2 Added Support to render Feature Table using Device running metadata. Also added support to render 'has_asic_scope' field of Feature Table. 3437e35 [caclmgrd][chassis]: Add ip tables rules to accept internal docker traffic from fabric asic namespaces. 
8720561 Fix and add hardware reboot cause determination tests 0dcc7fe remove the empty bracket if no hardware reboot cause minor e47d831 fix the wrong expected result comparision ef86b53 Fix startswith Attribute error 8a630bb fix mock patch 8543ddf update the reboot cause logic and update the unit test 53ad7cd fix the mock patch function 7c8003d fix the reboot-cause regix for test 1ba611f fix typo 25379d3 Add unit test case a56133b Add hardware reboot cause as actual reboot cause for soft reboot failed c7d3833 Support Restapi/gnmi control plane acls f6ea036 caclmgrd: Don't block traffic to mgmt by default a712fc4 Update test cases adc058b caclmgrd: Don't block traffic to mgmt by default 06ff918 Merge pull request #7 from bluecmd/patch-1 e3e23bc ci: Rename sonic-buildimage repository e83a858 Merge pull request #4 from kamelnetworks/acl-ip2me-test f5a2e50 [caclmgrd]: Tests for IP2ME rules generation ```
* Enable iproute2 as the SDK is also built Signed-off-by: Vivek Reddy <[email protected]> * [Nvidia] Dont use mkbmdeb method of dkms to build the package Signed-off-by: Vivek Reddy <[email protected]> * Include mft into the bookworm build Signed-off-by: Vivek Reddy <[email protected]> * Added linux image to the Depends section of mft Signed-off-by: Vivek Reddy <[email protected]> --------- Signed-off-by: Vivek Reddy <[email protected]>
Signed-off-by: Vivek Reddy <[email protected]> [Nvidia] Enable iproute2 & fix mft build (#16) * Enable iproute2 as the SDK is also built Signed-off-by: Vivek Reddy <[email protected]> * [Nvidia] Dont use mkbmdeb method of dkms to build the package Signed-off-by: Vivek Reddy <[email protected]> * Added linux image to the Depends section of mft Signed-off-by: Vivek Reddy <[email protected]> [Nvidia] [Bookworm] Separate KERNEL_MFT into a new target (sonic-net#16782) * [Nvidia] Seperate KERNEL_MFT into a new target because of kernel header dependency Signed-off-by: Vivek Reddy <[email protected]> * Update linux-kernel submodule Signed-off-by: Vivek Reddy <[email protected]> * Fix paralell build problem Signed-off-by: Vivek Reddy <[email protected]> --------- Signed-off-by: Vivek Reddy <[email protected]>
* [build] Fix bfinstall and bootimages packages URLs. * [build] Fix issue with "bootimages" package dependencies.
…sonic-net#17750) #### Why I did it src/dhcpmon ``` * 2443073 - (HEAD -> 202311, origin/202311) [counter] Clear counter table when dhcpmon init (#14) (#16) (2 days ago) [Yaqiang Zhu] ``` #### How I did it #### How to verify it #### Description for the changelog
…ly (sonic-net#20955) #### Why I did it src/sonic-bmp ``` * 4dcef92 - (HEAD -> master, origin/master, origin/HEAD) Merge pull request #16 from FengPan-Frank/fix1 (25 hours ago) [Feng-msft] * 4735a94 - Bug fixing during integration test (35 hours ago) [Feng Pan] ``` #### How I did it #### How to verify it #### Description for the changelog
…tically (sonic-net#673) #### Why I did it src/sonic-sairedis ``` * f727bb5 - (HEAD -> 202412, origin/HEAD, origin/202412) [code sync] Merge code from sonic-net/sonic-sairedis:202411 to 202412 (#16) (55 minutes ago) [mssonicbld] ``` #### How I did it #### How to verify it #### Description for the changelog
…omatically (sonic-net#690) #### Why I did it src/sonic-swss-common ``` * e787abe - (HEAD -> 202412, origin/HEAD, origin/202412) [code sync] Merge code from sonic-net/sonic-swss-common:202411 to 202412 (#16) (21 hours ago) [mssonicbld] ``` #### How I did it #### How to verify it #### Description for the changelog
…tomatically (sonic-net#689) #### Why I did it src/sonic-linux-kernel ``` * 771ce48 - (HEAD -> 202412, origin/HEAD, origin/202412) [optoe] Reset page select byte to 0 before upper memory access on page 0h (sonic-net#464) (#16) (21 hours ago) [mssonicbld] ``` #### How I did it #### How to verify it #### Description for the changelog
… sensor errors (sonic-net#24783) - Why I did it Fix transient errors during bfb install on smartswitch platform. ERR pmon#sensord: Error getting sensor data: mp2975/#16: Kernel interface error - How I did it Use pre-shutdown procedures before doing a reboot - How to verify it Installation of bfb image on dpu from switch shouldn't cause errors Signed-off-by: Hemanth Kumar Tirupati <[email protected]>
…net#25643) * [build] Add build timing report and dependency analysis tools Add three scripts for build performance instrumentation: - scripts/build-timing-report.sh: Parse per-package timing from build logs (HEADER/FOOTER timestamps), generate sorted duration table, phase breakdown, parallelism timeline, and CSV export. - scripts/build-dep-graph.py: Parse rules/*.mk dependency graph, compute critical path, fan-out/fan-in bottleneck analysis, and generate DOT/JSON output for visualization. - scripts/build-resource-monitor.sh: Sample CPU, memory, disk I/O, and Docker container count during builds for resource utilization analysis. Add "make build-report" target to slave.mk that runs the timing report and dependency analysis after a build completes. Example output from a VS build on 24-core/30GB machine: - 210 packages built in 53m wall time (173m CPU) - Max concurrency: 5 (with SONIC_CONFIG_BUILD_JOBS=4) - Critical path: 14 packages deep (libnl -> libswsscommon -> utilities) - Top bottleneck: LIBSWSSCOMMON with 48 downstream dependents Signed-off-by: Rustiqly <[email protected]> * Address Copilot review: fix 17 bugs in build analysis scripts - Use free -m with division instead of free -g to avoid rounding (#1) - Add = and ?= to Makefile dependency regex patterns (#2, #7) - CPU calculation now uses /proc/stat delta (two reads) (#3, #14) - Fix misleading 'critical path estimate' comment (#4) - Fix parallelism timeline comment (60s not 10s) (#5) - Include after-relationship packages in fan stats (#6) - Guard disk I/O division by zero when INTERVAL<=1 (#8) - Remove unused elapsed_line variable (#9) - Remove redundant LIBSWSSCOMMON_DBG check (#10) - Remove active_make_jobs from CSV header comment (#11) - Wire up _RDEPENDS parsing to build reverse deps (#12) - Remove unnecessary 'if v' filter on rdeps JSON (#13) - Remove unused REPORT_FORMAT parameter (#15) - Add cycle detection to critical path algorithm (#16) - Add execute permission check for companion scripts (#17) 
Signed-off-by: Rustiqly <[email protected]> --------- Signed-off-by: Rustiqly <[email protected]> Co-authored-by: Rustiqly <[email protected]>
…dating udevd rules (sonic-net#26343) - Why I did it On SONiC SmartSwitch platforms with DPUs, systemd-udevd crashes with SIGABRT on every reboot when DPU firmware initialization is slow. During the initramfs boot phase, a standalone systemd-udevd daemon is started to handle device discovery. If DPU firmware takes longer than the 60-second udevadm settle timeout (BlueField-3 DPUs can take 120 seconds each in the failure case when they are stuck), the initramfs cannot stop this udevd before switch_root. The stale process survives into the real system but is never chrooted into the overlayfs root, leaving it with a broken filesystem view. When dpu-udev-manager.sh writes udev rules, the stale udevd detects the change and crashes on an assertion in systemd's chase() path resolution (assert(path_is_absolute(p)) at chase.c:648), because dir_fd_is_root() returns false for a process whose root still points to the initramfs rootfs rather than the overlayfs. This triggers a systemd issue : systemd/systemd#29559 which maintainers doesn't consider as a bug from systemd side. Raising this fix for our usecase. Core was generated by `/usr/lib/systemd/systemd-udevd --daemon --resolve-names=never'. Program terminated with signal SIGABRT, Aborted. #0 0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6 (gdb) bt #0 0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f29fe7a1cc2 in raise () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x00007f29fe78a4ac in abort () from /lib/x86_64-linux-gnu/libc.so.6 #3 0x00007f29fea50c11 in ?? 
() from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so #4 0x00007f29feb94a8b in chase () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so #5 0x00007f29feb956e2 in chase_and_opendir () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so #6 0x00007f29feb9a609 in conf_files_list_strv () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so #7 0x00007f29fea913e8 in config_get_stats_by_path () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so #8 0x0000559f295519cf in ?? () #9 0x0000559f29553a77 in ?? () #10 0x00007f29fec36055 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so #11 0x00007f29fec3668d in sd_event_dispatch () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so #12 0x00007f29fec394a8 in sd_event_run () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so #13 0x00007f29fec396c7 in sd_event_loop () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so #14 0x0000559f29545820 in ?? () #15 0x00007f29fe78bca8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #16 0x00007f29fe78bd65 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6 #17 0x0000559f29545c51 in ?? () - How I did it Added a kill_stale_udevd() function to dpu-udev-manager.sh that runs before writing the udev rules. It identifies the systemd-managed udevd PID via systemctl show, then kills any other systemd-udevd --daemon process that doesn't match -- these are leftover initramfs instances. If no stale process exists (e.g. DPUs are healthy and the initramfs udevd exited cleanly), the function is a no-op. 
- How to verify it Deploy the image on a SmartSwitch with DPUs in a state where firmware initialization times out (>60s per DPU) by stopping image installation before firmware install step Reboot the switch Verify no new systemd-udevd coredumps in /var/core/ Verify the stale process was killed: journalctl -b 0 | grep dpu-udev-manager should show killing stale initramfs udevd PID (systemd udevd is PID ) Verify systemd-udevd.service is healthy: systemctl status systemd-udevd should show active (running) Verify DPU udev rules were written: cat /etc/udev/rules.d/92-midplane-intf.rules should contain the DPU interface naming rules Signed-off-by: Hemanth Kumar Tirupati <[email protected]>
Why I did it
Fixes the issue sonic-buildimage/issues/8619.
How I did it
update_all_features_config, which was taking roughly 5-10 seconds to execute, was the reason for the blackout.
How to verify it
UTs:
Verified manually.
Which release branch to backport (provide reason below if selected)
Description for the changelog
A picture of a cute animal (not mandatory but encouraged)