Add Cursor and factory.ai session mining and improve local search resilience#702

Closed
mfhens wants to merge 17 commits intoMemPalace:developfrom
mfhens:main

@mfhens mfhens commented Apr 12, 2026

What does this PR do?

This PR expands MemPalace's ingestion and retrieval workflow for local AI usage. It adds first-class Cursor and factory.ai session mining, broadens transcript normalization for additional AI tools, and makes search more reliable in offline or restricted environments where the default embedding setup may not be available.

| Area | What changed | Why it matters |
| --- | --- | --- |
| Session ingestion | Added mempalace mine --mode cursor and a dedicated Cursor miner for ~/.cursor/chats store.db sessions | Makes real Cursor conversations directly ingestible without manual export |
| Transcript normalization | Added support for GitHub Copilot CLI and Factory.ai JSONL formats, while filtering injected/system content | Improves import coverage and keeps low-signal context out of the palace |
| Project mining | Added AST-aware Python chunking for functions, classes, and methods, plus symbol metadata and mtime-based re-mining | Improves symbol-level retrieval and keeps mined code fresher |
| Search / MCP | Shared collection setup, added a local hash-embedding fallback, reranked hits using lexical overlap, and added min_similarity filtering | Keeps search usable offline and reduces low-quality matches |
| CLI / examples | Added mempalace kg subcommands with optional --kg path override, plus Bash and PowerShell mine-all-sessions helpers | Makes knowledge-graph access and multi-source ingestion easier |
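
As a rough illustration of the Search / MCP row, a hash-embedding fallback with lexical-overlap reranking might look like the sketch below. This is not the PR's code: the names `hash_embed`, `lexical_overlap`, and `rerank`, the 64-dimension width, and the 0.3 blend weight are all assumptions chosen for the example.

```python
import hashlib
import math
import re

DIM = 64  # assumed vector width, for illustration only


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (deliberately simple)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def hash_embed(text: str, dim: int = DIM) -> list[float]:
    """Feature-hash tokens into a fixed-size, L2-normalised vector.

    Needs no model download, so it keeps working offline.
    """
    vec = [0.0] * dim
    for tok in tokenize(text):
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def lexical_overlap(query: str, doc: str) -> float:
    """Fraction of distinct query tokens that also occur in the document."""
    q = set(tokenize(query))
    if not q:
        return 0.0
    return len(q & set(tokenize(doc))) / len(q)


def rerank(query: str, hits: list[tuple[str, float]], weight: float = 0.3):
    """Blend vector similarity with lexical overlap and sort best-first.

    `hits` is a list of (document, similarity) pairs; `weight` is an
    assumed blend factor, not a value taken from the PR.
    """
    scored = [
        (doc, (1.0 - weight) * sim + weight * lexical_overlap(query, doc))
        for doc, sim in hits
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A hashed vector is far weaker than a learned embedding, which is presumably why the PR pairs it with a lexical rerank: exact token matches can lift a relevant hit above a slightly closer but unrelated one.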

How to test

Run mempalace mine ~/.cursor/chats --mode cursor, then query MemPalace and confirm that the mined Cursor sessions appear in the results.

Checklist

  • python -m pytest tests/ -v --ignore=tests/benchmarks (628 passed)
  • No hardcoded paths: The new examples and help text introduce home-relative defaults like ${HOME}/.cursor/chats, $env:USERPROFILE\.cursor\chats, and ~/.mempalace/knowledge_graph.sqlite3. These are configurable and not machine-specific absolute paths, but they are still baked-in default locations.
  • Linter passes (ruff check .)

mfhens and others added 12 commits April 9, 2026 10:27
Parse Factory.ai/Droid session files (JSONL, session_start fingerprint).
Strip <system-reminder> injection blocks from user content. Skip tool_use
and thinking blocks in assistant content. Register after Copilot CLI parser
to avoid false matches. Add 7 tests (16/16 passing).

Also fix UnicodeEncodeError on Windows cp1252 by replacing the -> arrow
in convo_miner.py dry-run print statements.

Co-authored-by: Copilot <[email protected]>
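
The filtering this commit describes (dropping <system-reminder> blocks from user text, skipping tool_use and thinking blocks in assistant content) could be sketched roughly as follows. The function names and the assumed block shape (a list of dicts with `type` and `text` keys) are illustrative, not taken from the PR.

```python
import re

# Matches an injected reminder block, including multi-line bodies.
SYSTEM_REMINDER = re.compile(
    r"<system-reminder>.*?</system-reminder>", re.DOTALL
)
SKIPPED_BLOCK_TYPES = {"tool_use", "thinking"}


def clean_user_text(text: str) -> str:
    """Remove injected <system-reminder> blocks from a user message."""
    return SYSTEM_REMINDER.sub("", text).strip()


def assistant_text(content) -> str:
    """Join the text of assistant blocks, skipping tool_use and thinking.

    `content` is assumed to be either a plain string or a list of
    dicts such as {"type": "text", "text": "..."}.
    """
    if isinstance(content, str):
        return content.strip()
    parts = []
    for block in content:
        if not isinstance(block, dict):
            continue
        if block.get("type") in SKIPPED_BLOCK_TYPES:
            continue
        text = block.get("text", "")
        if text:
            parts.append(text.strip())
    return "\n".join(parts)
```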
Add 'mempalace kg' with four sub-actions:
  - kg add <subject> <predicate> <obj> [--source] [--from] [--confidence]
  - kg query <entity> [--as-of] [--direction]
  - kg timeline [<entity>]
  - kg stats

Routes via two-level dispatch (same pattern as 'hook' and 'instructions').
Also restore accidentally dropped 'def cmd_repair(args):' function header.

Co-authored-by: Copilot <[email protected]>
Allows pointing kg subcommands at a specific sqlite3 file,
enabling the KG to live in a project repo and be tracked in git.

Usage: mempalace kg --kg ./mempalace/knowledge_graph.sqlite3 add ...

Co-authored-by: Copilot <[email protected]>
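
For readers unfamiliar with the two-level dispatch pattern mentioned above, here is a minimal argparse sketch of a `kg` command with the four sub-actions and the `--kg` override. The handler bodies, the `dest` names (`kg_path`, `valid_from`, `as_of`), and the returned strings are assumptions for illustration; the real handlers would call into the knowledge graph.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="mempalace")
    commands = parser.add_subparsers(dest="command", required=True)

    # Level 1: the "kg" command, with its optional database override.
    kg = commands.add_parser("kg")
    kg.add_argument("--kg", dest="kg_path", help="path to a sqlite3 file")

    # Level 2: the kg sub-actions.
    actions = kg.add_subparsers(dest="kg_action", required=True)

    add = actions.add_parser("add")
    add.add_argument("subject")
    add.add_argument("predicate")
    add.add_argument("obj")
    add.add_argument("--source")
    # "from" is a Python keyword, so store the value under another name.
    add.add_argument("--from", dest="valid_from")
    add.add_argument("--confidence", type=float)

    query = actions.add_parser("query")
    query.add_argument("entity")
    query.add_argument("--as-of", dest="as_of")
    query.add_argument("--direction")

    timeline = actions.add_parser("timeline")
    timeline.add_argument("entity", nargs="?")

    actions.add_parser("stats")
    return parser


# Placeholder handlers: each returns a string instead of touching a database.
def kg_add(args):
    return f"add {args.subject} {args.predicate} {args.obj}"


def kg_query(args):
    return f"query {args.entity}"


def kg_timeline(args):
    return f"timeline {args.entity or 'all'}"


def kg_stats(args):
    return "stats"


KG_HANDLERS = {
    "add": kg_add,
    "query": kg_query,
    "timeline": kg_timeline,
    "stats": kg_stats,
}


def cmd_kg(args):
    """Second-level dispatch: route to the handler for the chosen action."""
    return KG_HANDLERS[args.kg_action](args)
```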
The checkmark character (U+2713) in the progress line caused
UnicodeEncodeError on Windows terminals using cp1252 encoding.

Co-authored-by: Copilot <[email protected]>
Routes .py files through chunk_python_ast() which emits one chunk per
top-level function, class, and method. Each chunk carries symbol_type,
symbol_name, and parent_symbol metadata for filtered retrieval.
Falls back to chunk_text() on SyntaxError or empty AST.
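
A minimal sketch of this kind of AST-aware chunking, using only the standard `ast` module, is shown below. It is not the PR's `chunk_python_ast()`: the function name, the dict layout, and the choice to return an empty list on failure (so a caller can fall back to text chunking) are assumptions.

```python
import ast


def chunk_python_ast_sketch(source: str) -> list[dict]:
    """Emit one chunk per top-level function, class, and method.

    Returns an empty list on SyntaxError or when no symbols are found,
    so the caller can fall back to plain-text chunking.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return []

    chunks: list[dict] = []

    def emit(node, symbol_type: str, parent: str) -> None:
        chunks.append(
            {
                "text": ast.get_source_segment(source, node) or "",
                "symbol_type": symbol_type,
                "symbol_name": node.name,
                # Empty string rather than None keeps the metadata flat.
                "parent_symbol": parent,
            }
        )

    func_types = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in tree.body:
        if isinstance(node, func_types):
            emit(node, "function", "")
        elif isinstance(node, ast.ClassDef):
            emit(node, "class", "")
            for child in node.body:
                if isinstance(child, func_types):
                    emit(child, "method", node.name)
    return chunks
```

In this sketch a class chunk contains its methods' source as well, so method text is stored twice; whether the PR deduplicates that is not stated in the commit message.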

feat(search): min_similarity threshold parameter

search_memories() and the MCP tool_search() accept min_similarity (default
0.0, backward-compatible). Results below threshold are filtered post-query.

Co-authored-by: Copilot <[email protected]>
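
The post-query filtering described for min_similarity can be sketched as follows. The conversion assumes cosine distance (similarity = 1 - distance, clamped to [0, 1]); the PR's actual `distance_to_similarity` may differ, and the hit layout here is hypothetical.

```python
def distance_to_similarity_sketch(distance: float) -> float:
    """Map a distance to a similarity in [0, 1].

    Assumes cosine distance; this is an illustrative stand-in, not the
    PR's distance_to_similarity().
    """
    return max(0.0, min(1.0, 1.0 - distance))


def filter_hits(hits: list[dict], min_similarity: float = 0.0) -> list[dict]:
    """Drop hits whose similarity falls below the threshold.

    With the default of 0.0 every hit passes, so existing callers see
    no change in behaviour.
    """
    kept = []
    for hit in hits:
        similarity = distance_to_similarity_sketch(hit["distance"])
        if similarity >= min_similarity:
            kept.append({**hit, "similarity": similarity})
    return kept
```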
Adds cursor_miner.py to extract user/assistant exchange pairs from
Cursor's store.db SQLite files (~/.cursor/chats/**/<session>/store.db).

Key design:
- Extracts <user_query> tags from user messages; skips system-context blobs
- Caps assistant text at 1500 chars per exchange for focused retrieval
- Uses rowid order for message sequence (SQLite insertion order)
- Copies DB to temp file before reading to avoid Cursor file locks
- Stores session_name and workspace_hash as metadata for filtering
- Dedup by source_key = 'cursor:<workspace_hash>/<session_hash>'

CLI: mempalace mine ~/.cursor/chats --mode cursor [--wing my_wing]

Co-authored-by: Copilot <[email protected]>
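
The design points listed above can be illustrated with the sketch below. Note the loud assumption: the table `messages(role, content)` is HYPOTHETICAL and stands in for whatever schema Cursor's store.db actually uses, which this commit message does not describe. The helper names are likewise invented for the example.

```python
import re
import shutil
import sqlite3
import tempfile
from pathlib import Path

USER_QUERY = re.compile(r"<user_query>(.*?)</user_query>", re.DOTALL)
MAX_ASSISTANT_CHARS = 1500  # cap from the commit message


def extract_user_query(raw: str):
    """Return the text inside <user_query> tags, or None if absent."""
    match = USER_QUERY.search(raw)
    return match.group(1).strip() if match else None


def source_key(workspace_hash: str, session_hash: str) -> str:
    """Dedup key in the 'cursor:<workspace_hash>/<session_hash>' form."""
    return f"cursor:{workspace_hash}/{session_hash}"


def read_messages(db_path: Path) -> list[tuple[str, str]]:
    """Copy the DB to a temp file, then read rows in rowid order.

    Reading a copy avoids contending with locks held by a running editor.
    The `messages` table and its columns are HYPOTHETICAL.
    """
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / "store.db"
        shutil.copy2(db_path, copy)
        conn = sqlite3.connect(copy)
        try:
            return conn.execute(
                "SELECT role, content FROM messages ORDER BY rowid"
            ).fetchall()
        finally:
            conn.close()


def pair_exchanges(rows: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Pair each extracted user query with the next assistant reply."""
    exchanges = []
    pending = None
    for role, content in rows:
        if role == "user":
            query = extract_user_query(content)
            if query:  # user rows without the tag are treated as context blobs
                pending = query
        elif role == "assistant" and pending:
            exchanges.append((pending, content[:MAX_ASSISTANT_CHARS]))
            pending = None
    return exchanges
```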
Adds two equivalent scripts (PowerShell + bash) that mine all three AI
chat sources in a single command.

  cursor   → ~/.cursor/chats          --mode cursor  --wing cursor_chats
  copilot  → ~/.copilot/session-state --mode convos  --wing copilot_sessions
  factory  → ~/.factory/sessions      --mode convos  --wing factory_sessions

Both scripts support --dry-run and selective source args.

Co-authored-by: Copilot <[email protected]>
- knowledge_graph: replace shared SQLite connection with thread-local
  storage (threading.local) to fix data-corruption risk in concurrent
  MCP server contexts; remove check_same_thread=False
- mcp_server: hash full content (not content[:100]) for drawer_id to
  eliminate silent ID collisions between entries sharing a common prefix
- hooks/mempal_save_hook.sh: guard TRANSCRIPT_PATH with -n before -f
  to prevent confusing errors when sanitization yields an empty path

Co-authored-by: Copilot <[email protected]>
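
The first two fixes above follow well-known patterns, sketched here under stated assumptions: the class name `ThreadLocalConnections`, the `drawer_id_sketch` function, and the ID format are hypothetical and only illustrate the idea of per-thread SQLite connections and full-content hashing.

```python
import hashlib
import sqlite3
import threading


class ThreadLocalConnections:
    """Give each thread its own SQLite connection.

    sqlite3 connections are bound to the creating thread by default, so
    this avoids sharing one connection via check_same_thread=False.
    """

    def __init__(self, db_path: str):
        self._db_path = db_path
        self._local = threading.local()

    def get(self) -> sqlite3.Connection:
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = sqlite3.connect(self._db_path)
            self._local.conn = conn
        return conn


def drawer_id_sketch(wing: str, room: str, content: str) -> str:
    """Derive an ID from the full content, not just a prefix.

    Hashing only content[:100] would give identical IDs to entries that
    share their first 100 characters. The ID layout is an assumption.
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()[:16]
    return f"drawer_{wing}_{room}_{digest}"
```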
Includes new palace.py module, expanded mcp_server tools, knowledge
graph refactor, extensive new test suite, CI Windows job, and
integrations/openclaw SKILL.md.

Fixes applied before merge:
- Thread-safe SQLite connections in KnowledgeGraph (threading.local)
- Full-content hashing for drawer IDs in mcp_server
- TRANSCRIPT_PATH empty-string guard in mempal_save_hook.sh

Conflict resolution:
- tests/test_miner.py: merged chunk_python_ast tests (main) with
  file_already_mined tests (PR)
- tests/test_normalize.py: merged copilot/factory normalizer tests
  (main) with PR's extended normalize test suite; combined imports
- tests/test_searcher.py: merged min_similarity tests (main) with
  PR's query-error and filter tests

Co-authored-by: Copilot <[email protected]>
Expand transcript ingestion with Cursor chat mining and broader normalizers so more local AI sessions can be filed into the palace.
Harden Chroma collection access and search ranking so builds and tests pass reliably without network model downloads.

Made-with: Cursor
@igorls igorls changed the base branch from main to develop April 13, 2026 04:46
@igorls igorls added area/cli CLI commands area/hooks Claude Code hook scripts (Stop, PreCompact, SessionStart) area/kg Knowledge graph area/mcp MCP server and tools area/mining File and conversation mining area/search Search and retrieval labels Apr 14, 2026
bensig added a commit that referenced this pull request Apr 18, 2026
Draft plugin specification for source adapters, mirroring RFC 001's
role for storage backends. Formalizes the contract six community
ingester PRs (#274, #23, #169, #232, #567, #98, #702) plus #981's
metadata-only mode have been reinventing ad-hoc, so adapter authors
can build to a stable surface.

Key decisions:
- Single ingest() method; lazy adapters yield SourceItemMetadata
  ahead of drawers, eager adapters interleave
- Declared-transformation model (§1.4) replaces informal verbatim
  promise with a verifiable one; byte_preserving adapters declare
  the empty set, declared_lossy adapters enumerate. Existing
  miner.py and the convo_miner+normalize pipeline map cleanly
- Palace is the incremental cursor via is_current(item, metadata);
  no sidecar persistence
- Routing is adapter-owned; detect_room/detect_hall move into the
  filesystem adapter
- Flat metadata per ChromaDB (RFC 001 §1.4) — entity hints as
  json_string field, KG triples route to SQLite knowledge graph
- Closets stay core-built as a post-step; adapters may emit flat
  closet_hints. Closes existing gap where convo drawers get no
  closets
- No per-drawer field renames: source_file, filed_at, source_mtime,
  added_by, normalize_version, entities, ingest_mode all preserved.
  Spec adds adapter_name, adapter_version, privacy_class

§9 enumerates the cleanup PR prerequisites (mempalace/sources/
module, PalaceContext facade, KnowledgeGraph.add_triple gaining
backwards-compatible source_drawer_id + adapter_name params).

Tracking issue: #989
mfhens and others added 5 commits April 19, 2026 07:52
- palace.py: remove duplicate hashlib/re imports; restore get_client,
  get_embedding_function, distance_to_similarity, _SafeEmbeddingFunction,
  and _HashEmbeddingFunction removed by backend refactor
- miner.py: fix indentation — add_drawer() call and if added: block were
  dedented out of the for chunk in chunks: loop
- mcp_server.py: add missing import of get_embedding_function and
  distance_to_similarity from palace
- searcher.py: remove duplicate mid-file import of removed palace symbols
- knowledge_graph.py: add missing self._local = threading.local() init
- tests/conftest.py: remove unused get_client/get_collection imports;
  add import chromadb for fixture at line 103
- tests/test_miner.py: fix syntax error in multi-name import; replace
  get_client with import chromadb (used directly in tests)
- tests/test_convo_miner.py: remove duplicate shutil import; add
  get_collection as get_palace_collection import
- tests/test_mcp_server.py: remove unused get_collection import

Co-authored-by: Copilot <[email protected]>
# Conflicts:
#	hooks/mempal_save_hook.sh
#	mempalace/cli.py
#	mempalace/layers.py
#	mempalace/miner.py
#	tests/test_hooks_cli.py
#	uv.lock
#	website/.vitepress/config.mts
#	website/.vitepress/theme/landing/HeroSection.vue
#	website/.vitepress/theme/landing/landing.css
@mfhens mfhens closed this by deleting the head repository Apr 29, 2026
jphein pushed a commit to jphein/mempalace that referenced this pull request Apr 30, 2026
…Code, MemPalace#274/MemPalace#232 Cursor, MemPalace#169 Pi, MemPalace#702 Cursor+factory.ai)

Updates the multi-agent-support bullet to cite the actual upstream
work instead of just gesturing at it. RFC 002 itself is PR MemPalace#990
(tracking issue MemPalace#989). Existing third-party prototypes already
proposed against the spec:

* OpenCode SQLite — PR MemPalace#23
* Cursor SQLite — issue MemPalace#274
* Cursor JSONL (earlier variant) — PR MemPalace#232
* Pi agent JSONL — PR MemPalace#169
* Combined Cursor + factory.ai — PR MemPalace#702

Each becomes a mempalace-source-<agent> package once RFC 002 lands.
Names the path explicitly: fork unblocks the pattern by helping land
RFC 002; per-agent adapter PRs land from their respective authors.

Aider, Gemini CLI, Codex CLI, and Warp are roadmap targets without
existing adapter PRs and are listed as such (no fabricated PR refs).

https://claude.ai/code/session_01GvwducFnFtN8KYmfbWKMR6

Labels

area/cli CLI commands area/hooks Claude Code hook scripts (Stop, PreCompact, SessionStart) area/kg Knowledge graph area/mcp MCP server and tools area/mining File and conversation mining area/search Search and retrieval


2 participants