Problem
MemPalace currently has two modes for any given file: mine everything (chunk all content into drawers), or exclude entirely (via .gitignore / SKIP_DIRS). There's no middle ground.
Real projects have directories where the palace needs to know what files are and what they do, but embedding every line of content is wasteful or harmful:
- Experiment result dumps: hundreds of KB of JSON numbers. The palace should know 'this is centroid tracking data from Phase 2', not embed 2000 chunks of float arrays.
- Test directories: the palace should know 'pytest suite for the miner and MCP server', not embed every assert statement.
- Data/content folders: static assets, CSV datasets, model checkpoints. Descriptions matter, content doesn't.
- Generated output: build artifacts, compiled files, log directories.
Today the only workaround is: .gitignore the heavy files, then manually write a separate markdown doc describing them and hope it gets mined. This is fragile and doesn't scale.
Proposed solution
Add a `descriptions` section to `mempalace.yaml`:
```yaml
wing: myproject
rooms:
  - name: experiments
    keywords: [data, results]
descriptions:
  'data/results/*.json': 'Experiment result JSONs from task-geometry Phase 2: centroid tracking, behavioral validation, spectral features'
  'data/checkpoints/': 'Model checkpoints saved during DPO fine-tuning runs'
  'tests/': 'Pytest test suite covering mining, MCP server, room detection, and ignore logic'
  '**/*_responses.jsonl': 'Raw model response dumps from steering/calibration experiments'
```
Mining behavior
When `scan_project` encounters a file matching a description pattern:
- Skip normal content chunking (no 800-char chunks of raw data)
- Create one drawer with the description text as content
- Set `source_file` metadata to the matched path (so search can surface it)
- Respect mtime — only re-create the drawer if the description changed
This gives the palace semantic knowledge about what's in a path without the noise of embedding raw content.
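The behavior listed above could look roughly like the following sketch. It is not MemPalace's actual miner: `mine_file`, `describe_file`, `chunk_content`, and the `description_hash` metadata field are hypothetical names, and comparing a hash of the description text is only one assumed way to decide whether the description changed.

```python
import hashlib
from typing import Optional

CHUNK_SIZE = 800  # the 800-char chunk size mentioned in the list above


def description_hash(description: str) -> str:
    """Stable fingerprint used to detect an edited description."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()


def chunk_content(rel_path: str, content: str) -> list:
    """Normal mining path: split raw content into fixed-size drawers."""
    return [
        {"content": content[i:i + CHUNK_SIZE], "source_file": rel_path}
        for i in range(0, len(content), CHUNK_SIZE)
    ]


def describe_file(rel_path: str, description: str) -> dict:
    """Description path: one drawer whose content is the description text."""
    return {
        "content": description,
        "source_file": rel_path,
        # Hypothetical metadata field, used only for the staleness check.
        "description_hash": description_hash(description),
    }


def mine_file(
    rel_path: str,
    content: str,
    description: Optional[str] = None,
    existing: Optional[dict] = None,
) -> list:
    """Return the drawers to write for one file; an empty list means up to date."""
    if description is None:
        return chunk_content(rel_path, content)
    if existing and existing.get("description_hash") == description_hash(description):
        return []  # description unchanged, keep the existing drawer
    return [describe_file(rel_path, description)]
```

With this shape, a 7000-character result dump yields nine content drawers without a description and exactly one drawer with one, and a second run with the same description writes nothing.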
Pattern matching
Use the same glob syntax as .gitignore — users already know it. Patterns are matched against the project-relative path.
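As a rough illustration of matching gitignore-style patterns against project-relative paths, here is a minimal sketch using only the standard library. It covers a simplified subset (`*`, `?`, `**`, trailing `/` for directories, and root anchoring when a pattern contains a slash) and leaves out negation, character classes, and escapes. `compile_pattern` and `find_description` are hypothetical names, and "first matching pattern wins" is an assumption, not something the proposal specifies. A real implementation would likely reuse whatever matcher already handles .gitignore, or a dedicated library such as `pathspec`.

```python
import re
from typing import Optional


def compile_pattern(pattern: str) -> "re.Pattern":
    """Translate a simplified gitignore-style pattern into a regex."""
    # A slash anywhere before the end anchors the pattern to the project root.
    anchored = "/" in pattern.rstrip("/")
    # A trailing slash means "this directory and everything beneath it".
    dir_only = pattern.endswith("/")
    body = pattern.strip("/")

    parts = []
    i = 0
    while i < len(body):
        if body.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif body.startswith("**", i):
            parts.append(".*")  # anything, including slashes
            i += 2
        elif body[i] == "*":
            parts.append("[^/]*")  # anything within one path segment
            i += 1
        elif body[i] == "?":
            parts.append("[^/]")  # one character within a segment
            i += 1
        else:
            parts.append(re.escape(body[i]))
            i += 1

    prefix = "" if anchored else "(?:.*/)?"
    suffix = "/.*" if dir_only else "(?:/.*)?"
    return re.compile(prefix + "".join(parts) + suffix)


def find_description(rel_path: str, descriptions: dict) -> Optional[str]:
    """Return the description of the first pattern matching rel_path, if any."""
    for pattern, description in descriptions.items():
        if compile_pattern(pattern).fullmatch(rel_path):
            return description
    return None


# Example patterns in the style of the config above (descriptions shortened).
DESCRIPTIONS = {
    "data/results/*.json": "Experiment result JSONs",
    "data/checkpoints/": "Model checkpoints",
    "tests/": "Pytest test suite",
    "**/*_responses.jsonl": "Raw model response dumps",
}
```

Here `find_description("tests/test_miner.py", DESCRIPTIONS)` returns the test-suite description, while a path such as `src/miner.py` matches nothing and would fall through to normal mining.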
Use cases
- Any ML project with large result files, activation dumps, or model outputs
- Web projects with asset directories, uploaded content, or build output
- Monorepos where some subdirectories should be described, not indexed
- Projects with test suites that are useful to reference but not embed line-by-line