[loganalyzer] Fix end marker lost during rsyslog reload (#23562)#23591
Merged
wangxin merged 1 commit intosonic-net:masterfrom Apr 8, 2026
Merged
[loganalyzer] Fix end marker lost during rsyslog reload (#23562)#23591wangxin merged 1 commit intosonic-net:masterfrom
wangxin merged 1 commit intosonic-net:masterfrom
Conversation
Collaborator
Author
|
Note This PR was generated with the assistance of an AI agent. |
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
wangxin
previously approved these changes
Apr 3, 2026
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Collaborator
Author
|
@wangxin I asked agent to replace the sleep with a wait until loop. |
Collaborator
Author
|
/azp run |
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
Collaborator
Author
|
/azp run |
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
) The place_marker_to_syslog method called flush_rsyslogd() (systemctl reload rsyslog) immediately after writing the marker to syslog. The reload briefly closes and reopens log files, which can race with the marker message still sitting in rsyslog's internal buffer, causing the marker to be silently dropped. Fix: flush rsyslog *before* writing the marker (to drain old buffers), then after writing the marker, use a simple sleep(2) instead of a reload to let rsyslog write the marker to disk naturally. Fixes sonic-net#23562 Signed-off-by: Ying Xie <[email protected]>
e7bbd48 to
67a9484
Compare
Collaborator
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Collaborator
Author
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Collaborator
Author
|
AI agent on behalf of Ying.\n\n- Unable to verify unresolved human review comments.\n\n |
wangxin
approved these changes
Apr 8, 2026
nanali-msft
pushed a commit
to nanali-msft/sonic-mgmt
that referenced
this pull request
Apr 10, 2026
) (sonic-net#23591) ### Description of PR Fixes sonic-net#23562 #### Summary The `place_marker_to_syslog()` method called `flush_rsyslogd()` (`systemctl reload rsyslog`) immediately after writing the marker to syslog. The reload briefly closes and reopens log files, which can race with the marker message still sitting in rsyslog's internal buffer, causing the marker to be silently dropped. **Root cause:** PR sonic-net#22447 replaced `kill -HUP` with `systemctl reload rsyslog` to avoid syslog message loss during the *previous* flush. However, calling the reload *right after* writing a new marker creates a new race — the just-written marker can be dropped during the reload's file close/reopen cycle. **Fix:** Flush rsyslog *before* writing the marker (to drain old buffers), then after writing the marker, use a simple `time.sleep(2)` instead of a reload to let rsyslog write the marker to disk naturally. ### Type of change - [x] Bug fix ### Back port request - [ ] N/A ### Approach #### What is the motivation for this PR? LogAnalyzer's `add_end_marker` action fails intermittently with `RuntimeError: cannot find marker end-LogAnalyzer-... in /var/log/syslog`, causing test failures across all platforms. #### How did you do it? Moved the `flush_rsyslogd()` call to *before* writing the syslog marker and replaced the post-write flush with a 2-second sleep. This ensures: 1. Pre-existing buffered messages are flushed before the marker is written 2. The marker itself is not disrupted by a concurrent rsyslog reload 3. rsyslog has time to naturally write the marker to disk #### How did you verify/test it? - Code review of the race condition timeline - The fix eliminates the problematic pattern (write then immediate reload) identified in the issue --- > [!NOTE] > This PR was generated with the assistance of an AI agent. Signed-off-by: Ying Xie <[email protected]>
opcoder0
pushed a commit
to opcoder0/sonic-mgmt
that referenced
this pull request
Apr 13, 2026
) (sonic-net#23591) ### Description of PR Fixes sonic-net#23562 #### Summary The `place_marker_to_syslog()` method called `flush_rsyslogd()` (`systemctl reload rsyslog`) immediately after writing the marker to syslog. The reload briefly closes and reopens log files, which can race with the marker message still sitting in rsyslog's internal buffer, causing the marker to be silently dropped. **Root cause:** PR sonic-net#22447 replaced `kill -HUP` with `systemctl reload rsyslog` to avoid syslog message loss during the *previous* flush. However, calling the reload *right after* writing a new marker creates a new race — the just-written marker can be dropped during the reload's file close/reopen cycle. **Fix:** Flush rsyslog *before* writing the marker (to drain old buffers), then after writing the marker, use a simple `time.sleep(2)` instead of a reload to let rsyslog write the marker to disk naturally. ### Type of change - [x] Bug fix ### Back port request - [ ] N/A ### Approach #### What is the motivation for this PR? LogAnalyzer's `add_end_marker` action fails intermittently with `RuntimeError: cannot find marker end-LogAnalyzer-... in /var/log/syslog`, causing test failures across all platforms. #### How did you do it? Moved the `flush_rsyslogd()` call to *before* writing the syslog marker and replaced the post-write flush with a 2-second sleep. This ensures: 1. Pre-existing buffered messages are flushed before the marker is written 2. The marker itself is not disrupted by a concurrent rsyslog reload 3. rsyslog has time to naturally write the marker to disk #### How did you verify/test it? - Code review of the race condition timeline - The fix eliminates the problematic pattern (write then immediate reload) identified in the issue --- > [!NOTE] > This PR was generated with the assistance of an AI agent. Signed-off-by: Ying Xie <[email protected]> Signed-off-by: opcoder0 <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description of PR
Fixes #23562
Summary
The
place_marker_to_syslog()method calledflush_rsyslogd()(systemctl reload rsyslog) immediately after writing the marker to syslog. The reload briefly closes and reopens log files, which can race with the marker message still sitting in rsyslog's internal buffer, causing the marker to be silently dropped.Root cause: PR #22447 replaced
kill -HUPwithsystemctl reload rsyslogto avoid syslog message loss during the previous flush. However, calling the reload right after writing a new marker creates a new race — the just-written marker can be dropped during the reload's file close/reopen cycle.Fix: Flush rsyslog before writing the marker (to drain old buffers), then after writing the marker, use a simple
time.sleep(2)instead of a reload to let rsyslog write the marker to disk naturally.Type of change
Back port request
Approach
What is the motivation for this PR?
LogAnalyzer's
add_end_markeraction fails intermittently withRuntimeError: cannot find marker end-LogAnalyzer-... in /var/log/syslog, causing test failures across all platforms.How did you do it?
Moved the
flush_rsyslogd()call to before writing the syslog marker and replaced the post-write flush with a 2-second sleep. This ensures:How did you verify/test it?
Note
This PR was generated with the assistance of an AI agent.