Skip to content

[loganalyzer] Fix end marker lost during rsyslog reload (#23562)#23591

Merged
wangxin merged 1 commit intosonic-net:masterfrom
yxieca:fix/loganalyzer-end-marker-flush
Apr 8, 2026
Merged

[loganalyzer] Fix end marker lost during rsyslog reload (#23562)#23591
wangxin merged 1 commit intosonic-net:masterfrom
yxieca:fix/loganalyzer-end-marker-flush

Conversation

@yxieca
Copy link
Copy Markdown
Collaborator

@yxieca yxieca commented Apr 3, 2026

Description of PR

Fixes #23562

Summary

The place_marker_to_syslog() method called flush_rsyslogd() (systemctl reload rsyslog) immediately after writing the marker to syslog. The reload briefly closes and reopens log files, which can race with the marker message still sitting in rsyslog's internal buffer, causing the marker to be silently dropped.

Root cause: PR #22447 replaced kill -HUP with systemctl reload rsyslog to avoid syslog message loss during the previous flush. However, calling the reload right after writing a new marker creates a new race — the just-written marker can be dropped during the reload's file close/reopen cycle.

Fix: Flush rsyslog before writing the marker (to drain old buffers), then after writing the marker, use a simple time.sleep(2) instead of a reload to let rsyslog write the marker to disk naturally.

Type of change

  • Bug fix

Back port request

  • N/A

Approach

What is the motivation for this PR?

LogAnalyzer's add_end_marker action fails intermittently with RuntimeError: cannot find marker end-LogAnalyzer-... in /var/log/syslog, causing test failures across all platforms.

How did you do it?

Moved the flush_rsyslogd() call to before writing the syslog marker and replaced the post-write flush with a 2-second sleep. This ensures:

  1. Pre-existing buffered messages are flushed before the marker is written
  2. The marker itself is not disrupted by a concurrent rsyslog reload
  3. rsyslog has time to naturally write the marker to disk

How did you verify/test it?

  • Code review of the race condition timeline
  • The fix eliminates the problematic pattern (write then immediate reload) identified in the issue

Note

This PR was generated with the assistance of an AI agent.

@yxieca
Copy link
Copy Markdown
Collaborator Author

yxieca commented Apr 3, 2026

Note

This PR was generated with the assistance of an AI agent.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

wangxin
wangxin previously approved these changes Apr 3, 2026
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@github-actions github-actions bot requested a review from wangxin April 3, 2026 15:28
@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@yxieca
Copy link
Copy Markdown
Collaborator Author

yxieca commented Apr 3, 2026

@wangxin I asked agent to replace the sleep with a wait until loop.

@yxieca
Copy link
Copy Markdown
Collaborator Author

yxieca commented Apr 3, 2026

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@yxieca
Copy link
Copy Markdown
Collaborator Author

yxieca commented Apr 4, 2026

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

)

The place_marker_to_syslog method called flush_rsyslogd() (systemctl
reload rsyslog) immediately after writing the marker to syslog. The
reload briefly closes and reopens log files, which can race with the
marker message still sitting in rsyslog's internal buffer, causing the
marker to be silently dropped.

Fix: flush rsyslog *before* writing the marker (to drain old buffers),
then after writing the marker, use a simple sleep(2) instead of a
reload to let rsyslog write the marker to disk naturally.

Fixes sonic-net#23562

Signed-off-by: Ying Xie <[email protected]>
@yxieca yxieca force-pushed the fix/loganalyzer-end-marker-flush branch from e7bbd48 to 67a9484 Compare April 4, 2026 04:01
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@yxieca
Copy link
Copy Markdown
Collaborator Author

yxieca commented Apr 4, 2026

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@yxieca
Copy link
Copy Markdown
Collaborator Author

yxieca commented Apr 7, 2026

AI agent on behalf of Ying.\n\n- Unable to verify unresolved human review comments.\n\n

@wangxin wangxin merged commit 521f48f into sonic-net:master Apr 8, 2026
19 checks passed
nanali-msft pushed a commit to nanali-msft/sonic-mgmt that referenced this pull request Apr 10, 2026
) (sonic-net#23591)

### Description of PR
Fixes sonic-net#23562

#### Summary
The `place_marker_to_syslog()` method called `flush_rsyslogd()`
(`systemctl reload rsyslog`) immediately after writing the marker to
syslog. The reload briefly closes and reopens log files, which can race
with the marker message still sitting in rsyslog's internal buffer,
causing the marker to be silently dropped.

**Root cause:** PR sonic-net#22447 replaced `kill -HUP` with `systemctl reload
rsyslog` to avoid syslog message loss during the *previous* flush.
However, calling the reload *right after* writing a new marker creates a
new race — the just-written marker can be dropped during the reload's
file close/reopen cycle.

**Fix:** Flush rsyslog *before* writing the marker (to drain old
buffers), then after writing the marker, use a simple `time.sleep(2)`
instead of a reload to let rsyslog write the marker to disk naturally.

### Type of change
- [x] Bug fix

### Back port request
- [ ] N/A

### Approach
#### What is the motivation for this PR?
LogAnalyzer's `add_end_marker` action fails intermittently with
`RuntimeError: cannot find marker end-LogAnalyzer-... in
/var/log/syslog`, causing test failures across all platforms.

#### How did you do it?
Moved the `flush_rsyslogd()` call to *before* writing the syslog marker
and replaced the post-write flush with a 2-second sleep. This ensures:
1. Pre-existing buffered messages are flushed before the marker is
written
2. The marker itself is not disrupted by a concurrent rsyslog reload
3. rsyslog has time to naturally write the marker to disk

#### How did you verify/test it?
- Code review of the race condition timeline
- The fix eliminates the problematic pattern (write then immediate
reload) identified in the issue

---
> [!NOTE]
> This PR was generated with the assistance of an AI agent.

Signed-off-by: Ying Xie <[email protected]>
opcoder0 pushed a commit to opcoder0/sonic-mgmt that referenced this pull request Apr 13, 2026
) (sonic-net#23591)

### Description of PR
Fixes sonic-net#23562

#### Summary
The `place_marker_to_syslog()` method called `flush_rsyslogd()`
(`systemctl reload rsyslog`) immediately after writing the marker to
syslog. The reload briefly closes and reopens log files, which can race
with the marker message still sitting in rsyslog's internal buffer,
causing the marker to be silently dropped.

**Root cause:** PR sonic-net#22447 replaced `kill -HUP` with `systemctl reload
rsyslog` to avoid syslog message loss during the *previous* flush.
However, calling the reload *right after* writing a new marker creates a
new race — the just-written marker can be dropped during the reload's
file close/reopen cycle.

**Fix:** Flush rsyslog *before* writing the marker (to drain old
buffers), then after writing the marker, use a simple `time.sleep(2)`
instead of a reload to let rsyslog write the marker to disk naturally.

### Type of change
- [x] Bug fix

### Back port request
- [ ] N/A

### Approach
#### What is the motivation for this PR?
LogAnalyzer's `add_end_marker` action fails intermittently with
`RuntimeError: cannot find marker end-LogAnalyzer-... in
/var/log/syslog`, causing test failures across all platforms.

#### How did you do it?
Moved the `flush_rsyslogd()` call to *before* writing the syslog marker
and replaced the post-write flush with a 2-second sleep. This ensures:
1. Pre-existing buffered messages are flushed before the marker is
written
2. The marker itself is not disrupted by a concurrent rsyslog reload
3. rsyslog has time to naturally write the marker to disk

#### How did you verify/test it?
- Code review of the race condition timeline
- The fix eliminates the problematic pattern (write then immediate
reload) identified in the issue

---
> [!NOTE]
> This PR was generated with the assistance of an AI agent.

Signed-off-by: Ying Xie <[email protected]>
Signed-off-by: opcoder0 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug:[LogAnalyzer]End marker not found in /var/log/syslog

3 participants