
feat(mine): queue requests during repair-rebuild + drain after #4

Merged: jphein merged 1 commit into main from feat/mine-queue-on-rebuild on May 6, 2026

Conversation

jphein (Owner) commented May 6, 2026

Closes a gap noticed during today's incident: when the daemon is in a repair-mode rebuild, hook fires that POST /mine fail outright. The rebuild replaces the collection mid-flight, so running a concurrent mine subprocess would race the swap. /silent-save covered diary writes; /mine had no equivalent.

Mirrors the existing _pending_writes infrastructure exactly:

  • _pending_mines_path() — separate jsonl queue file
  • _enqueue_pending_mine(payload) — off-loop append
  • _drain_pending_mines() — replay through the same subprocess pattern the live /mine uses, gated by _mine_sem, dedup by (dir, wing, mode) before replay (one mine covers a storm of hook fires via convo_miner's mtime-dedup)
  • /mine endpoint checks _repair_state, queues with {queued:true} response
  • /repair/status surfaces pending_mines count

5 new unittest cases — path distinctness, enqueue-drain replay, dedup, .failed-* quarantine, empty queue returns 0.

Not a fix for today's HNSW corruption — that came from concurrent update_drawer calls hitting chromadb HNSW concurrency hazards (CLAUDE.md row 15). The corruption-side mitigation is PALACE_MAX_WRITE_CONCURRENCY=1 in env. This PR covers a separate failure mode: hook writes during repair windows.

🤖 Generated with Claude Code

Mirrors the existing /silent-save queue pattern for /mine. Closes a
gap JP noticed: when the daemon is in repair-mode rebuild, hook fires
that POST /mine fail outright (the rebuild replaces the collection
mid-flight; running a concurrent mine subprocess would race the
swap). The /silent-save queue covered diary writes; /mine had no
equivalent, so transcript-ingest requests during a rebuild window
were lost.

Adds three pieces, all parallel to the silent-save infrastructure:

* `_pending_mines_path()` — separate jsonl queue file (next to the
  silent-save pending file).
* `_enqueue_pending_mine(payload)` — appends a /mine request body to
  the queue, off-loop via asyncio.to_thread.
* `_drain_pending_mines()` — replays queued mines after rebuild via
  the same subprocess pattern the live /mine endpoint uses, gated by
  `_mine_sem`. Same rename-then-read pattern as
  `_drain_pending_writes` so concurrent /mine POSTs landing during
  the drain go to a fresh queue file. Dedup by (dir, wing, mode)
  before replay — a storm of hook fires queues the same target many
  times, but one mine catches up all of them via convo_miner's
  mtime-based dedup, so we don't need to run it N times.
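As a rough illustration, the enqueue/drain pair could be sketched like this. This is a minimal sketch, not the actual main.py code: the function names, the queue path, and the placeholder where the subprocess replay would go are all assumptions based on the description above.

```python
import asyncio
import json
import os
import time

QUEUE_PATH = "/tmp/pending_mines.jsonl"  # hypothetical location

async def enqueue_pending_mine(payload: dict, queue_path: str = QUEUE_PATH) -> None:
    """Append one /mine request body to the jsonl queue, off the event loop."""
    def _append() -> None:
        with open(queue_path, "a") as fh:
            fh.write(json.dumps(payload) + "\n")
    await asyncio.to_thread(_append)

async def drain_pending_mines(queue_path: str = QUEUE_PATH) -> int:
    """Replay queued mines after a rebuild; returns the number replayed."""
    if not os.path.isfile(queue_path):
        return 0
    # Rename first so /mine POSTs landing mid-drain go to a fresh queue file.
    snapshot = f"{queue_path}.draining-{int(time.time())}"
    os.rename(queue_path, snapshot)
    seen: set = set()
    replayed = 0
    with open(snapshot) as fh:
        for line in fh:
            payload = json.loads(line)
            key = (payload["dir"],
                   payload.get("wing", "general"),
                   payload.get("mode", "convos"))
            if key in seen:
                # A hook storm queues the same target many times; one
                # mine catches them all (convo_miner dedups by mtime).
                continue
            seen.add(key)
            replayed += 1  # the real code spawns the miner subprocess here
    os.remove(snapshot)
    return replayed
```

The rename-then-read step is what makes concurrent enqueues safe during a drain: appends race only against the rename, never against the read.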

The /mine endpoint checks `_repair_state` and queues if in rebuild
mode, returning `{"queued": true, "reason": "repair-in-progress"}`
to signal the caller. After the rebuild completes,
`_drain_pending_mines()` runs alongside `_drain_pending_writes()`.

Also extends `/repair/status` to surface `pending_mines` count
alongside `pending_writes`.

Tests: 5 new unittest cases — pending-mines path is distinct from
writes, enqueue→drain replays each target, drain dedups repeated
targets (hook-storm scenario), failed replays quarantine to
.failed-* files instead of getting lost, empty queue returns 0.

NOT a fix for the HNSW corruption that prompted the current incident
— that came from concurrent update_drawer calls hitting chromadb's
HNSW concurrency hazards (CLAUDE.md row 15). The corruption-side fix
is `PALACE_MAX_WRITE_CONCURRENCY=1` in the daemon env. This PR
covers a separate failure mode: hook writes during repair windows.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copilot AI review requested due to automatic review settings May 6, 2026 03:30
@jphein jphein merged commit 924d078 into main May 6, 2026
1 check passed
@jphein jphein deleted the feat/mine-queue-on-rebuild branch May 6, 2026 03:30

Copilot AI left a comment


Pull request overview

This PR adds a “pending mines” queue to prevent /mine requests from failing (or racing a collection swap) while /repair mode=rebuild is in progress, and replays the queued mines after rebuild completes—mirroring the existing pending silent-save write queue behavior.

Changes:

  • Add _pending_mines_path(), _enqueue_pending_mine(), and _drain_pending_mines() to queue and replay /mine requests across rebuild windows (including dedup + quarantine-on-failure behavior).
  • Update /mine to enqueue requests during rebuild instead of spawning the miner subprocess immediately, and update /repair to drain pending mines after rebuild.
  • Extend /repair/status to surface pending mines queue depth; add unit tests covering enqueue/drain, dedup, quarantine, and empty-queue behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File / Description:

  • main.py — Implements pending-mines queueing/draining, /mine queuing during rebuild, /repair draining, and /repair/status queue depth reporting.
  • tests/test_mine_queue.py — Adds unit tests for the new pending-mines queue behavior (enqueue/drain, dedup, quarantine, empty queue).


Comment thread — main.py:

    wing = payload.get("wing", "general")
    mode = payload.get("mode", "convos")
    mempalace_bin = os.path.join(os.path.dirname(sys.executable), "mempalace")
    cmd = [mempalace_bin, "mine", directory, "--mode", mode, "--wing", wing]
Comment thread — main.py (lines +391 to +394):

    directory = _translate_client_path(payload["dir"])
    if not Path(directory).is_dir():
        _log.warning("drain-mine: skipping %s — not a directory", directory)
        continue
Comment thread — tests/test_mine_queue.py (lines +68 to +86):

    async def test_enqueue_then_drain_replays_each_target(self):
        await main._enqueue_pending_mine({"dir": "/a", "wing": "wa", "mode": "convos"})
        await main._enqueue_pending_mine({"dir": "/b", "wing": "wb", "mode": "convos"})
        self.assertTrue(os.path.isfile(self._queue_path))

        # Stub the subprocess: each call returns rc=0
        async def _fake_subprocess(*args, **kwargs):
            proc = MagicMock()
            proc.communicate = AsyncMock(return_value=(b"", b""))
            proc.returncode = 0
            return proc

        with patch("asyncio.create_subprocess_exec", side_effect=_fake_subprocess) as spawn:
            count = await main._drain_pending_mines()
        self.assertEqual(count, 2)
        # Queue file is gone after a clean drain
        self.assertFalse(os.path.isfile(self._queue_path))
        # Subprocess was invoked twice (once per unique target)
        self.assertEqual(spawn.call_count, 2)
jphein added a commit that referenced this pull request May 6, 2026
Seven findings, batched into one cleanup PR.

PR #3 (file-watcher) findings:

* watcher.py:30 — drop unused ``import time``.
* watcher.py:336 — type ``_log_future_exception`` parameter as
  ``concurrent.futures.Future`` (NOT ``asyncio.Future``); that's
  what ``asyncio.run_coroutine_threadsafe`` returns. Catch its
  cancellation/state errors plus the asyncio variants so the
  callback can't itself crash on a concurrent cancellation.
* main.py:560 — surface stderr/stdout when watcher mine returns
  non-zero. The rc alone hides 'No mempalace.yaml found' / python
  tracebacks operators need to diagnose.
* main.py:1202 — ``GET /watch`` now double-checks
  ``watcher.is_running`` so a thread crash that flips is_running
  to False between startup and now is reflected in the endpoint
  response. Lifespan still gates publication; this adds defense
  in depth.

PR #4 (mine-queue) findings:

* main.py drain replay — re-apply optional ``extract`` and
  ``limit`` fields. The prior drain dropped them silently, so a
  queue entry with those fields got replayed without. Live /mine
  accepts them; replay must too.
* main.py drain replay — enforce the same safety guards as the
  live /mine endpoint: ``isinstance(dir, str)``, absolute-path,
  no ``..`` traversal, ``mode in {convos, projects}``, ``extract
  in {exchange, general}`` if set, ``limit`` int-coercible if
  set. Without these a queue entry could smuggle through values
  the live endpoint would 400.

Tests: +2 — replay-extract-and-limit (asserts the new fields
make it onto the subprocess argv) and skips-invalid-payload-fields
(5 invalid-payload variants, each correctly skipped pre-spawn).
35 daemon tests pass.

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>