fix(cli): paginate miner.status() to remove 10K drawer truncation #1

Merged
jphein merged 1 commit into jphein:feat/mcp-hooks-export from psaghelyi:fix/miner-status-pagination
Apr 10, 2026

@psaghelyi

Follow-up to the MemPalace#493 MCP fix — closes the CLI half of milla-jovovich/mempalace#478.

Context

@web3guru888 asked me to roll the CLI fix into this branch before feat/mcp-hooks-export lands, after I reported that MemPalace#493 left miner.status() still capped at 10,000 drawers. Detailed test report from a real 14,902-drawer palace: MemPalace#493 (comment)

Change

miner.py:850-876 — replace the single col.get(limit=10000, include=["metadatas"]) call with the same paginated offset loop the new _fetch_all_metadata() helper uses in mcp_server.py:

  • Get the authoritative count via col.count()
  • Page through metadatas in 1,000-item batches
  • Report the real total (was reporting len(metas), which topped out at 10,000)

Same fix pattern as palace_graph.py:49-51 and the server-side helper in this PR, so the whole codebase now has consistent pagination semantics on ChromaDB reads.
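
For readers who have not seen the helper, the loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in miner.py or mcp_server.py: `fetch_all_metadata_sketch` and `FakeCollection` are hypothetical names, and the fake collection only mimics the `count()` / `get(limit=..., offset=..., include=...)` shape of a ChromaDB collection so the example runs without a database.

```python
from typing import Any


def fetch_all_metadata_sketch(col: Any, batch_size: int = 1000) -> tuple[int, list[dict]]:
    """Return (authoritative total, all metadatas) by paging through `col`."""
    total = col.count()  # real total, independent of any per-call cap
    metas: list[dict] = []
    offset = 0
    while offset < total:
        batch = col.get(limit=batch_size, offset=offset, include=["metadatas"])
        page = batch.get("metadatas") or []
        if not page:
            break  # collection shrank mid-scan; stop instead of looping forever
        metas.extend(page)
        offset += len(page)
    return total, metas


class FakeCollection:
    """Hypothetical stand-in with the same count()/get() shape, for demonstration only."""

    def __init__(self, n: int) -> None:
        self._metas = [{"wing": f"wing-{i % 17}"} for i in range(n)]

    def count(self) -> int:
        return len(self._metas)

    def get(self, limit: int, offset: int, include: list[str]) -> dict:
        return {"metadatas": self._metas[offset:offset + limit]}


total, metas = fetch_all_metadata_sketch(FakeCollection(14902))
wings = {m["wing"] for m in metas}
```

With a single `get(limit=10000)` call the same fake collection would yield only the first 10,000 metadatas, which is the truncation this PR removes.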

Test plan

Verified against a real 14,902-drawer / 17-wing palace on mempalace 3.1.0 + chromadb 0.6.3:

Before (this PR's parent feat/mcp-hooks-export @ 548abd6):

=======================================================
  MemPalace Status — 10000 drawers
=======================================================
# ...only 11 of 17 wings listed; askalot-analysis, docs, e2e-tests,
# infrastructure, scripts, shared-frontend silently dropped

After (this PR):

=======================================================
  MemPalace Status — 14902 drawers
=======================================================
# all 17 wings listed:
# armiger, askalot-ai, askalot-analysis, askalot-common, askalot-qml,
# balansor, docs, e2e-tests, infrastructure, platform, portor,
# roundtable, scripts, seneschal, shared-frontend, sirway, targetor

No new tests added — the change is mechanical and matches the existing _fetch_all_metadata() pattern already covered by MemPalace#493's MCP tests.

The CLI `mempalace status` command was still capped at 10,000 drawers after
this PR's MCP server fix, because miner.status() retained the original
`col.get(limit=10000, include=["metadatas"])` call. On palaces larger
than 10K, it printed a wrong total and silently dropped every wing past
the cutoff (tested on a 14,902-drawer / 17-wing palace: CLI reported
"10000 drawers" and listed only 11 wings).

Replace the single bounded call with the same paginated offset loop the
new `_fetch_all_metadata()` helper uses in mcp_server.py (1,000-item
batches), and report the true total via `col.count()`.

Closes the CLI half of MemPalace#478.
jphein pushed a commit that referenced this pull request Apr 10, 2026
…silon, add schema bounds

- Paginate miner.status() past 10K drawer limit (fixes PR #1 from psaghelyi)
- Use math.isclose(abs_tol=0.001) instead of abs() < 0.01 for mtime dedup
- Add minimum/maximum bounds to min_similarity in MCP search tool schema
- Add debug logging to bare except in file_already_mined()
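
A rough sketch of the mtime comparison mentioned above, assuming the stored and current values are epoch-second floats. `mtime_unchanged` is a hypothetical helper name, not the function in the codebase, and passing `rel_tol=0.0` is a choice made for this sketch rather than something the commit states.

```python
import math


def mtime_unchanged(stored: float, current: float, tol: float = 0.001) -> bool:
    """Treat two mtimes as equal when they differ by at most `tol` seconds."""
    # rel_tol=0.0 makes abs_tol the only tolerance in play (assumption of this sketch).
    return math.isclose(stored, current, rel_tol=0.0, abs_tol=tol)


same = mtime_unchanged(1775800000.0, 1775800000.0005)   # differs by 0.5 ms
changed = mtime_unchanged(1775800000.0, 1775800000.5)   # differs by 0.5 s
```

One detail that may be worth checking: `math.isclose` accepts a difference up to `max(rel_tol * max(abs(a), abs(b)), abs_tol)`, and the default `rel_tol` of 1e-09 works out to well over a second for present-day epoch timestamps. A call that passes only `abs_tol=0.001` is therefore looser than the number suggests.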

https://claude.ai/code/session_01LAi5NQmr4KKyx6QNwZrRXq
@jphein jphein merged commit 40517bc into jphein:feat/mcp-hooks-export Apr 10, 2026
jphein added a commit that referenced this pull request Apr 11, 2026
Hybrid search (TODO #1): when vector results are poor (best distance
> 1.0), automatically falls back to keyword text-match via ChromaDB
where_document.$contains. Extracts most distinctive non-stopword token
from query, or accepts explicit keyword param. Results merged, deduped,
sorted by distance. MCP server exposes new keyword parameter.
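
The fallback flow described above could look something like the sketch below. All names here (`pick_keyword`, `needs_fallback`, `merge_hits`, `STOPWORDS`) are hypothetical, hits are modelled as plain dicts with `id` and `distance` keys, and "most distinctive token" is approximated as the longest non-stopword token, which is an assumption and not necessarily the heuristic the commit uses.

```python
import re
from typing import Optional

# Illustrative stopword list only; the real list is not shown in this commit.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "for",
             "on", "where", "what", "how"}


def pick_keyword(query: str) -> Optional[str]:
    """Pick a fallback keyword: here, the longest non-stopword token (assumption)."""
    tokens = [t for t in re.findall(r"[a-z0-9_]+", query.lower()) if t not in STOPWORDS]
    return max(tokens, key=len) if tokens else None


def needs_fallback(vector_hits: list[dict], threshold: float = 1.0) -> bool:
    """Vector results count as poor when the best distance exceeds the threshold."""
    return not vector_hits or min(h["distance"] for h in vector_hits) > threshold


def merge_hits(vector_hits: list[dict], keyword_hits: list[dict]) -> list[dict]:
    """Merge both result sets, keep the closest copy of each id, sort by distance."""
    best: dict[str, dict] = {}
    for hit in [*vector_hits, *keyword_hits]:
        prev = best.get(hit["id"])
        if prev is None or hit["distance"] < prev["distance"]:
            best[hit["id"]] = hit
    return sorted(best.values(), key=lambda h: h["distance"])

# The keyword pass itself would be a ChromaDB call filtered with
# where_document={"$contains": keyword}; it is omitted to keep the sketch standalone.
```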

Wing fix (TODO #0): _ingest_transcript() now derives project wing from
Claude Code transcript path (-Projects-<name>/) instead of hardcoding
"sessions". Per-project search now finds auto-mined content.
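
As an illustration of deriving the wing from the transcript path, a minimal sketch might be the following. `wing_from_transcript_path` is a hypothetical name, the example directory layout is assumed, and falling back to "sessions" when no `-Projects-<name>/` segment is present is a guess at the behaviour, not something the commit confirms.

```python
import re

# Matches the "-Projects-<name>/" segment and captures <name> up to the next separator.
_PROJECT_RE = re.compile(r"-Projects-([^/\\]+)[/\\]")


def wing_from_transcript_path(path: str, default: str = "sessions") -> str:
    """Return the project name embedded in the transcript path, else `default`."""
    match = _PROJECT_RE.search(path)
    return match.group(1) if match else default


# Assumed example layout; real transcript paths may differ.
wing = wing_from_transcript_path(
    "/home/u/.claude/projects/-home-u-Projects-mempalace/abc123.jsonl"
)
```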

692 tests pass (22 new).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
jphein added a commit that referenced this pull request Apr 11, 2026
Co-Authored-By: Claude Opus 4.6 <[email protected]>
jphein added a commit that referenced this pull request Apr 19, 2026
Two items in "Fork Changes (still ahead of upstream after v3.3.1 merge)"
were never — or are no longer — fork-only. Demote both:

1. Epsilon mtime comparison (palace.py)
   Upstream merged Arnold Wender's equivalent fix as PR MemPalace#610 on
   2026-04-12 (commit bb7ed80). Their threshold is 0.001 vs our fork's
   0.01, but abs(stored - current) < epsilon is semantically identical.
   Moved to "Merged into upstream (post-v3.3.1)".

2. ".jsonl exempt from JUNK_FILE_SIZE cap"
   The description was wrong. The actual change (commit 560fdbd) adds
   ".jsonl" to READABLE_EXTENSIONS in miner.py — a whitelist addition,
   not a size-cap exemption. And it was authored by MSL (upstream
   maintainer) at the same SHA on upstream/develop. Never was a fork
   contribution. Moved to "Pulled in from upstream/develop".
   Related: upstream also raised MAX_FILE_SIZE 10MB → 500MB in d137d12
   (the actual size-cap fix, separate concern).

Clarified that item now at #1 (bulk_check_mined) is fork-only and
independent of the mtime comparison fix. Renumbered remaining "still
ahead" items 1-18.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
jphein added a commit that referenced this pull request Apr 21, 2026
Resolutions:
- `.claude-plugin/.mcp.json`, `plugin.json` — adopt upstream's `mempalace-mcp`
  console-script command (added via upstream MemPalace#340 for pipx/uv). Run
  `pip install -e .` in plugin venv after merge to install the entry point.
- `.claude-plugin/hooks/*.sh`, `hooks/*.sh` — adopt upstream's console-command
  resolution order (`mempalace` script → `python3 -m mempalace` → `python`).
  `MEMPAL_PYTHON` override still works inside `hooks_cli.py`.
- `mempalace/hooks_cli.py`, `tests/test_hooks_cli.py` — keep fork's
  `_mempalace_python()` helper (fork-ahead #4); upstream only had
  `sys.executable`, which loses MEMPAL_PYTHON override.
- `mempalace/miner.py` — keep fork's concurrent mining path (fork-ahead #1),
  apply upstream's unicode-`✓` → ASCII-`+` fix (MemPalace#681) to both paths.
- `mempalace/backends/chroma.py` — take upstream's refined `quarantine_stale_hnsw`
  docstring (it's the version merged via our own MemPalace#1000).

Brought in: 33 upstream commits including Belarusian/Chinese/German/Spanish/French
entity detection, console-script entry points, hook plugin-root space quoting,
and v3.3.2 tag (which contains our MemPalace#681/MemPalace#1000/MemPalace#1023).

Tests: 1096 passed, 106 deselected (benchmarks). Ruff clean.
jphein added a commit that referenced this pull request Apr 24, 2026
…ostgres

Three things merged into one README pass:

1. Badge: link version-3.3.4 to jphein/mempalace/releases (the v3.3.4
   tag we just pushed) and add an upstream-3.3.3 secondary badge so
   readers can tell fork vs upstream version at a glance. Was sitting
   uncommitted from earlier today.

2. Multi-client coordination section: replaced the three-fix v3.3.4
   summary with a four-fix one. Added @felipetruman's MemPalace#976 num_threads
   pin (cherry-picked at 552a0d7) as fix #1 — the actual root-cause
   fix. Reframed our MemPalace#1171/MemPalace#1173/MemPalace#1177 as defense-in-depth around
   symptoms. Walked back palace-daemon from "primary concurrency story
   in progress" to "deferred pending observation" — with MemPalace#976's fix in
   place, the daemon's same-machine value drops; multi-machine and
   Windows remain its differentiators but neither is current pain.

3. Postgres + pgvector: walked back from "parallel track" framing to
   "long-term option, no immediate move" for the same reason. Migration
   cost stays real, current pain is mitigated, decision deferred until
   v3.3.4 stack is observed in production or TS rewrite ships.

Removed two stale paragraphs that were left over from the previous
"daemon as primary" framing.
jphein pushed a commit that referenced this pull request May 3, 2026
The MCP `mempalace_get_drawer` tool returned the entire raw drawer
metadata blob to any connected client, and the `source_file` field
in that blob is the absolute filesystem path written by the miners
(`miner.py`, `convo_miner.py` — `source_file = str(filepath)`). On
a single-user local deployment this is self-disclosure, but in
nested-agent or multi-server MCP topologies the client is a separate
trust domain and the host's directory layout has no documented
client-side use.

Mirror the mitigation that `searcher.search_memories()` already applies
on its own return path: reduce `source_file` to its basename via
`Path(source_file).name` before handing the metadata to the client.
Citations still work — the directory layout does not leak.
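
The mitigation amounts to something like the sketch below. `redact_source_file` is a hypothetical helper name used for illustration; the actual change lives inside `tool_get_drawer()` and may be structured differently.

```python
from pathlib import Path


def redact_source_file(metadata: dict) -> dict:
    """Return a copy of the drawer metadata with source_file reduced to its basename."""
    cleaned = dict(metadata)  # leave the stored metadata untouched
    source_file = cleaned.get("source_file")
    if source_file:
        cleaned["source_file"] = Path(source_file).name
    return cleaned


raw = {"wing": "docs", "room": "plans", "source_file": "/home/alice/notes/plan.md"}
safe = redact_source_file(raw)
```

Here `safe["source_file"]` is `"plan.md"`, so a citation can still name the file while the host's directory layout stays out of the response.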

Companion to #1 (omit palace_path from tool_status). Same threat class,
different surface:

- mempalace_status — palace dir path     → fixed in #1
- mempalace_get_drawer — per-drawer source_file path → this PR

Other read tools were audited and do not leak host paths:
- mempalace_search    — already basenames source_file
- mempalace_list_drawers — returns wing/room/preview only
- mempalace_diary_read   — date/timestamp/topic/content only
- mempalace_reconnect    — success/message/drawers only
- mempalace_kg_*         — entity/predicate strings, counts
- mempalace_check_duplicate — wing/room/preview only

Changes:
- mempalace/mcp_server.py: tool_get_drawer() now basenames metadata.source_file
- tests/test_mcp_server.py: regression test asserting the absolute path
  and its parent directory do not appear anywhere in the response
- website/reference/mcp-tools.md: clarify the documented return shape