feat(clarify): SSE long-connection for real-time clarify notifications#1355
feat(clarify): SSE long-connection for real-time clarify notifications#1355jasonjcwu wants to merge 1 commit intonesquena:masterfrom
Conversation
Replaces the 1.5s HTTP polling loop with a Server-Sent Events endpoint at /api/clarify/stream that pushes clarify events to the browser instantly. - Backend: SSE subscribe/unsubscribe/notify lifecycle in api/clarify.py - sse_subscribe(session_id) — register a bounded Queue - sse_unsubscribe(session_id, queue) — cleanup on disconnect - _clarify_sse_notify(session_id, head, total) — push to all subscribers - submit_pending() calls _clarify_sse_notify() inside _lock - resolve_clarify() calls _clarify_sse_notify() after pop (trailing approval fix) - New route: GET /api/clarify/stream?session_id= - Atomic subscribe + initial snapshot under clarify._lock - 30s keepalive comments, _CLIENT_DISCONNECT_ERRORS handling - Frontend: EventSource SSE as primary + 3s HTTP polling fallback - 29 new tests (static analysis + unit + concurrency)
|
Thanks @jasonjcwu, this is a clean port of the SSE pattern from #1350 to clarify, and it's exactly where I'd want this to land next. A few review notes: Design — looks right
Questions / nits
Tests Once the queue-full behaviour is documented in a comment or test, this is good to merge from my side. Will do a closer line-level pass this week. |
…ndler The new _handle_clarify_sse_stream handler in #1355 holds clarify._lock and then calls clarify.get_pending(sid) under the lock. get_pending also acquires _lock internally — and clarify._lock is a non-reentrant threading.Lock(), so the second acquisition deadlocks the SSE handler thread the moment any client connects to /api/clarify/stream. Existing tests pass because they only exercise sse_subscribe, sse_unsubscribe, _clarify_sse_notify, and submit_pending directly — none of them invoke the route handler. The deadlock would only manifest when a real EventSource opens the connection. Reproduced with a tiny harness that holds _lock and calls get_pending: the worker thread is still blocked after a 2s timeout. With the fix, both empty and populated queue cases complete in <1ms. Fix: read clarify._gateway_queues / clarify._pending inline under the same _lock acquisition, mirroring the approval SSE handler's pattern at api/routes.py:2785-2793. No recursive lock; head-of-queue snapshot is identical to what get_pending would have returned. Added tests/test_pr1355_sse_handler_no_deadlock.py with three tests: - behavioural: empty queue snapshot completes within 2s - behavioural: populated queue snapshot returns the head entry - source-level invariant: routes.py must not call get_clarify_pending() inside `with _clarify_lock:` block (locks the regression in) Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…al reconnect The original PR #1355 frontend health timer at 60s force-reconnected every 60s regardless of activity, with a comment claiming it was a 'no event in 60s' detector. This churned one TCP/SSE setup per minute per active session. Fix: track lastEventAt on initial+clarify event arrivals; only reconnect when the gap exceeds 60s. The server sends keepalive comments every 30s, but EventSource silently consumes those — we can only bump lastEventAt on real application events. Under healthy conditions the timer never fires. Sits on top of nesquena's 604b44a deadlock fix (parallel work — Nathan caught the same Opus MUST-FIX before my push landed). Opus SHOULD-FIX #2 from the v0.50.249 pre-release review.
#1355) Replaces the 1.5s HTTP polling loop for clarify with a Server-Sent Events endpoint at /api/clarify/stream that pushes clarify events to the browser instantly. Mirrors the approval SSE pattern from v0.50.248 (#1350) including all the correctness lessons: - Atomic subscribe + initial snapshot under clarify._lock - _clarify_sse_notify called inside _lock for ordering guarantees (no notify-out-of-order race) - Notify passes head=q[0].data (head-fidelity, not the just-appended entry) - resolve_clarify also calls notify after pop so trailing clarifies surface immediately (no stuck-clarify bug) - Empty-state notify with None,0 after pop-empty so frontend hides the card - 30s keepalive comments, _CLIENT_DISCONNECT_ERRORS handling - Bounded queue (maxsize=16) with silent drop on full - Frontend: EventSource with automatic 3s HTTP polling fallback on onerror Co-authored-by: fxd-jason <[email protected]>
Bundles 5 community PRs: - #1355 feat(clarify): SSE long-connection (mirrors #1350 pattern, includes all correctness lessons) - #1356 fix: context window indicator overflow (live SSE fallback) + uploading status clear - #1357 fix: preserve imported session source metadata - #1358 fix: collapse sidebar session lineage rows - #1359 fix: sync active session across tabs Tests: 3444 passing (3411 -> 3444, +33)
|
Shipped in v0.50.249 (merge Production verified live at port 8787, The Clarify SSE feature you implemented landed cleanly with one MUST-FIX caught in pre-release review: the handler called This was caught by Opus + Nathan in parallel work (Nathan pushed the inline-snapshot fix at The rest of your implementation absorbed all 3 lessons from #1350 / v0.50.248 perfectly:
Excellent contribution. Already verified live: Thanks @fxd-jason! 🙏 |
nesquena#1355) Replaces the 1.5s HTTP polling loop for clarify with a Server-Sent Events endpoint at /api/clarify/stream that pushes clarify events to the browser instantly. Mirrors the approval SSE pattern from v0.50.248 (nesquena#1350) including all the correctness lessons: - Atomic subscribe + initial snapshot under clarify._lock - _clarify_sse_notify called inside _lock for ordering guarantees (no notify-out-of-order race) - Notify passes head=q[0].data (head-fidelity, not the just-appended entry) - resolve_clarify also calls notify after pop so trailing clarifies surface immediately (no stuck-clarify bug) - Empty-state notify with None,0 after pop-empty so frontend hides the card - 30s keepalive comments, _CLIENT_DISCONNECT_ERRORS handling - Bounded queue (maxsize=16) with silent drop on full - Frontend: EventSource with automatic 3s HTTP polling fallback on onerror Co-authored-by: fxd-jason <[email protected]>
Bundles 5 community PRs: - nesquena#1355 feat(clarify): SSE long-connection (mirrors nesquena#1350 pattern, includes all correctness lessons) - nesquena#1356 fix: context window indicator overflow (live SSE fallback) + uploading status clear - nesquena#1357 fix: preserve imported session source metadata - nesquena#1358 fix: collapse sidebar session lineage rows - nesquena#1359 fix: sync active session across tabs Tests: 3444 passing (3411 -> 3444, +33)
…ndler The new _handle_clarify_sse_stream handler in nesquena#1355 holds clarify._lock and then calls clarify.get_pending(sid) under the lock. get_pending also acquires _lock internally — and clarify._lock is a non-reentrant threading.Lock(), so the second acquisition deadlocks the SSE handler thread the moment any client connects to /api/clarify/stream. Existing tests pass because they only exercise sse_subscribe, sse_unsubscribe, _clarify_sse_notify, and submit_pending directly — none of them invoke the route handler. The deadlock would only manifest when a real EventSource opens the connection. Reproduced with a tiny harness that holds _lock and calls get_pending: the worker thread is still blocked after a 2s timeout. With the fix, both empty and populated queue cases complete in <1ms. Fix: read clarify._gateway_queues / clarify._pending inline under the same _lock acquisition, mirroring the approval SSE handler's pattern at api/routes.py:2785-2793. No recursive lock; head-of-queue snapshot is identical to what get_pending would have returned. Added tests/test_pr1355_sse_handler_no_deadlock.py with three tests: - behavioural: empty queue snapshot completes within 2s - behavioural: populated queue snapshot returns the head entry - source-level invariant: routes.py must not call get_clarify_pending() inside `with _clarify_lock:` block (locks the regression in) Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Replaces the 1.5s HTTP polling loop for clarify with a Server-Sent Events endpoint at `/api/clarify/stream` that pushes clarify events to the browser instantly.
What changed
Follows the same SSE pattern as the approval stream (#1350), including the notify-ordering and head-fidelity fixes from `e68f74a`.
Testing
All 75 SSE-related tests pass (42 approval + 29 clarify + 4 regression):