fix(a2a): drain stale loopback events (#2302), detect stale PID on startup (#2295) #2318
Merged
Mar 28, 2026
…ct stale PID on startup (#2295)

- Drain remaining events from `output_rx` with `try_recv()` after breaking out of the recv loop in `AgentTaskProcessor::process`; this prevents stale `Flush` events emitted by `flush_chunks()` from bleeding into the next request and producing empty artifacts.
- Add `is_process_alive(pid)` to `zeph-core::daemon`; read and liveness-check an existing PID file before writing a new one: remove it if stale, bail if the process is still alive.
- Add unit tests: `is_process_alive_{current,nonexistent}_pid`, `loopback_stale_flush_drained_after_full_message`, `stale_pid_detection_{dead,live}_process`.

Closes #2302, closes #2295
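The drain step described in this commit can be illustrated with a small standalone sketch. It is not the code from the PR: it assumes a synchronous `std::sync::mpsc` channel in place of whatever channel type `output_rx` really is, and the `LoopbackEvent` enum and `process_one` function below are simplified, hypothetical stand-ins.

```rust
use std::sync::mpsc::Receiver;

// Hypothetical, simplified stand-in for the crate's loopback event type.
#[derive(Debug, PartialEq)]
enum LoopbackEvent {
    FullMessage(String),
    Flush,
}

// Handles one request: reads events until the first terminal event,
// then drains whatever is still buffered so it cannot leak into the
// next request. (Assumption: both variants end the recv loop, mirroring
// the behaviour described in the PR summary.)
fn process_one(output_rx: &Receiver<LoopbackEvent>) -> String {
    let mut artifact = String::new();

    while let Ok(event) = output_rx.recv() {
        match event {
            LoopbackEvent::FullMessage(text) => {
                artifact = text;
                break;
            }
            // A Flush with nothing accumulated yields an empty artifact.
            LoopbackEvent::Flush => break,
        }
    }

    // The fix: discard events that are already buffered (for example a
    // trailing Flush) without blocking.
    while output_rx.try_recv().is_ok() {}

    artifact
}
```

Without the final `while` loop, a second call on the same receiver would pick up the leftover `Flush` first and return an empty string. Note that `try_recv()` only removes events that are already in the buffer at the time of the drain.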
Summary

- bug(a2a): message/send returns completed task with no artifacts when LLM response is empty #2302: `message/send` returned `completed` with no `artifacts` on second and subsequent requests. Root cause: after `AgentTaskProcessor` consumed `LoopbackEvent::FullMessage` and broke out of the recv loop, a trailing `LoopbackEvent::Flush` (emitted by `flush_chunks()`) remained buffered in `output_rx`. The next request consumed this stale event first, produced an empty `ArtifactChunk`, and broke, never seeing the real LLM response. Fix: drain `output_rx` with `try_recv()` after every recv-loop exit so no stale events survive into the next request.
- bug(a2a): daemon PID file not cleaned on abnormal exit - restart requires manual cleanup #2295: A PID file left by a crashed or SIGKILL'd daemon blocked restarts with `WARN: failed to write PID file: File exists`. Fix: before writing the PID file, read any existing file and check whether the stored PID refers to a live process (`kill -0`). If stale → remove and proceed. If alive → bail with an actionable error message.

Changes

- `crates/zeph-core/src/daemon.rs`: add `is_process_alive(pid: u32) -> bool` (Unix: `kill -0`, non-Unix: always `false`)
- `src/daemon.rs`: drain loop after recv loop (`while let Ok(_) = handle.output_rx.try_recv() {}`); stale-PID check in `run_daemon` before `write_pid_file`
- `src/tests.rs`: 5 new tests covering both fixes
- `CHANGELOG.md`: `[Unreleased]` section updated

Test plan

- `cargo +nightly fmt --check`: passes
- `cargo clippy --features full --workspace -- -D warnings`: passes
- `cargo nextest run --config-file .github/nextest.toml --workspace --features full --lib --bins`: 6915/6915 passed
- `is_process_alive_current_process`, `is_process_alive_nonexistent_pid`, `loopback_stale_flush_drained_after_full_message`, `stale_pid_detection_dead_process`, `stale_pid_detection_live_process`
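For readers who want to see the stale-PID check in isolation, here is a rough sketch of the idea. It is an illustration under assumptions, not the PR's implementation: it probes liveness by running the shell's built-in `kill -0` through `sh -c` (the real `is_process_alive` may call the system interface directly), and `clear_stale_pid_file` is a hypothetical helper name, as is the choice to treat an unparseable PID file as stale.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::{Command, Stdio};

// Liveness probe in the spirit of `kill -0`. Assumption: shelling out to
// `sh` is acceptable for a sketch. A process owned by another user may be
// reported as not alive, because `kill -0` fails without permission.
fn is_process_alive(pid: u32) -> bool {
    // Reject values that cannot be a real PID (0 would target the
    // caller's process group; oversized values do not fit a signed PID).
    if pid == 0 || pid > i32::MAX as u32 {
        return false;
    }
    // Non-Unix platforms: always false, as stated in the change list.
    if !cfg!(unix) {
        return false;
    }
    Command::new("sh")
        .arg("-c")
        .arg(format!("kill -0 {}", pid))
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

// Hypothetical helper: run before writing a new PID file.
// Ok(()) means the path is free; Err means a live daemon owns it.
fn clear_stale_pid_file(path: &Path) -> io::Result<()> {
    let contents = match fs::read_to_string(path) {
        Ok(contents) => contents,
        // No file at all: nothing to clean up.
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(()),
        Err(err) => return Err(err),
    };

    // Assumption: contents that do not parse as a PID count as stale.
    if let Ok(pid) = contents.trim().parse::<u32>() {
        if is_process_alive(pid) {
            return Err(io::Error::new(
                io::ErrorKind::AlreadyExists,
                format!("daemon already running with PID {}", pid),
            ));
        }
    }

    // Stale file: remove it so the caller can write a fresh one.
    fs::remove_file(path)
}
```

A caller would invoke `clear_stale_pid_file` first and only write the new PID file when it returns `Ok(())`; on `Err` it can surface the message and exit.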