Summary
When mining a directory of mixed "real content" and "machine-written runtime state," a single large JSON/cache file can dominate the palace with thousands of low-value, semantically near-identical drawers, crowding out search recall for the actual knowledge. There's no filename denylist, per-file drawer cap, or warning — the miner just happily files 2,479 drawers from a single cache blob.
This is related to the existing feature requests #56 (external exclude list, closed) and #233 (.gitignore support, closed — .gitignore IS honored now, nice). But neither of those catches the case where the noisy file lives inside a non-gitignored path: runtime state files that are supposed to be there but that a human reading the repo would never treat as "knowledge."
Environment
- Ubuntu 24.04, Python 3.12
- mempalace from PyPI, mining ~/.hermes (a Hermes agent home directory)
- Miner options: --wing hermes --limit 30 (no --no-gitignore, no extra excludes)
What happened
Mining 30 files from ~/.hermes produced 2,619 drawers. Breakdown on a single file:
✓ [ 23/30] models_dev_cache.json +2479
That's 2,479 drawers from one cache file, vs 140 drawers combined from 29 legitimate content files in the same run (SKILL.md files, SOUL.md, IDENTITY.md, USER.md, AGENTS.md, etc.). After completing a wider mine, the cache room had 1,965 drawers — 20% of the entire palace — all from files that are re-generated on every Hermes run and carry no durable knowledge.
models_dev_cache.json is exactly what it sounds like: a cache of the models.dev registry, structured like:
{
  "anthropic/claude-opus-4.6": {
    "id": "anthropic/claude-opus-4.6",
    "context_length": 200000,
    "input_cost": 15.0,
    "output_cost": 75.0,
    ...
  },
  "openai/gpt-5": { ... },
  ...
}
Every model entry ends up in its own 800-char chunk, and they're all semantically near-identical, so they dilute the embedding space and crowd out relevant results for queries about models, pricing, etc.
Why .gitignore doesn't help here
Hermes keeps ~/.hermes/cache/ outside of any git checkout — it's runtime state in the user's home directory. There's no .gitignore to opt into. The ~/.hermes tree is a mixture of:
- real content (skills, profiles, config, docs)
- runtime state (cache, logs, session DBs, lock files, snapshots)
- user data (secrets, auth files)
The first category is the only one worth mining.
Requested changes
Any one (or a combination) of the following would solve my use case:
1. Default-exclude obvious runtime-state filenames
A built-in denylist of glob patterns that no reasonable user wants in their semantic memory:
DEFAULT_SKIP_FILES = {
    "*cache*.json",
    "*.lock", "*.lockb",
    ".skills_prompt_snapshot.json",
    "jobs.json",
    "channel_directory.json",
    "gateway_state.json",
    "models_dev_cache.json",
    "heartbeat-state.json",
    "auth.json", "credentials.json",  # safety: don't embed secrets
    "*.sqlite3", "*.sqlite", "*.db",  # other DBs
    "*.pyc", "*.so", "*.o",
    "package-lock.json", "yarn.lock",
    "Cargo.lock", "poetry.lock", "uv.lock",
}
Override via --no-default-skip if someone really wants to mine their lockfiles.
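For illustration, here is a minimal sketch of how such a denylist could be applied during the scan. It assumes matching on the file's basename with the standard-library fnmatch module. The should_skip helper and its use_default_skip parameter are hypothetical names standing in for the proposed --no-default-skip flag, not existing mempalace API.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical subset of the proposed denylist; not mempalace's actual config.
DEFAULT_SKIP_FILES = {
    "*cache*.json",
    "*.lock",
    "auth.json",
    "package-lock.json",
}


def should_skip(path, patterns=DEFAULT_SKIP_FILES, use_default_skip=True):
    """Return True when the file's basename matches any denylist glob."""
    if not use_default_skip:  # models the proposed --no-default-skip flag
        return False
    name = Path(path).name
    return any(fnmatch(name, pattern) for pattern in patterns)


candidates = ["cache/models_dev_cache.json", "skills/SKILL.md", "uv.lock"]
kept = [p for p in candidates if not should_skip(p)]
print(kept)
```

Matching on the basename keeps the patterns independent of where the mined directory lives, at the cost of not being able to express path-anchored rules.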
2. .mempalaceignore — a first-class opt-out file
Same syntax as .gitignore but scoped to mempalace. Lets users add project-specific exclusions without touching .gitignore (which is often managed by tooling or shared with teams who don't want mempalace rules in it).
Checked at every directory level during scan, same as .gitignore.
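A rough sketch of the per-directory lookup, under simplifying assumptions: it handles only plain basename globs and a trailing "/" for directories, whereas real .gitignore syntax also has negation and path anchoring. The read_patterns and scan functions are hypothetical and only show the shape of the idea.

```python
import os
import tempfile
from fnmatch import fnmatch
from pathlib import Path

IGNORE_FILENAME = ".mempalaceignore"  # proposed name from this issue


def read_patterns(directory):
    """Read simple glob patterns from a directory's ignore file, if any."""
    ignore_file = Path(directory) / IGNORE_FILENAME
    if not ignore_file.is_file():
        return []
    lines = ignore_file.read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines
            if ln.strip() and not ln.lstrip().startswith("#")]


def scan(root):
    """Yield files under root not matched by an ignore file at or above them."""
    inherited = {str(root): []}  # patterns collected from ancestor directories
    for dirpath, dirnames, filenames in os.walk(root):
        patterns = inherited.get(dirpath, []) + read_patterns(dirpath)
        # Prune ignored subdirectories in place so os.walk never enters them.
        dirnames[:] = [d for d in dirnames
                       if not any(fnmatch(d, p.rstrip("/")) for p in patterns)]
        for d in dirnames:
            inherited[os.path.join(dirpath, d)] = patterns
        for name in filenames:
            if name == IGNORE_FILENAME:
                continue
            if not any(fnmatch(name, p) for p in patterns):
                yield os.path.relpath(os.path.join(dirpath, name), root)


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "cache").mkdir()
    (Path(tmp) / "skills").mkdir()
    (Path(tmp) / "cache" / "models_dev_cache.json").write_text("{}")
    (Path(tmp) / "skills" / "SKILL.md").write_text("# skill")
    (Path(tmp) / "skills" / "jobs.json").write_text("{}")
    (Path(tmp) / IGNORE_FILENAME).write_text("# runtime state\ncache/\njobs.json\n")
    found = sorted(scan(tmp))
print(found)
```

A real implementation would more likely reuse whatever matcher already backs the .gitignore support, so that both files behave identically.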
3. Per-file drawer cap with a warning
Hard-cap at, say, 200 drawers per source file by default, configurable via --max-drawers-per-file. When the cap is hit, print a warning like:
⚠️ models_dev_cache.json: capped at 200 drawers (file would have produced 2479).
If you actually want all 2479, re-run with --max-drawers-per-file=0
or add this file to .mempalaceignore to skip it entirely.
This is the strictest safety rail because it bounds blast radius even for files the user didn't think to exclude.
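The cap itself could look roughly like the sketch below. It assumes a naive fixed-size chunker as a stand-in for the real one, and capped_chunks is a hypothetical helper. The 800-character chunk size and the 200-drawer default are the figures used in this issue.

```python
import sys

CHUNK_SIZE = 800            # chunk size mentioned in this issue
DEFAULT_MAX_DRAWERS = 200   # proposed default; 0 means "no cap"


def chunk_text(text, size=CHUNK_SIZE):
    """Split text into fixed-size chunks (a stand-in for the real chunker)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def capped_chunks(filename, text, max_drawers=DEFAULT_MAX_DRAWERS):
    """Return at most max_drawers chunks, warning on stderr when truncating."""
    chunks = chunk_text(text)
    if max_drawers and len(chunks) > max_drawers:
        print(
            f"warning: {filename}: capped at {max_drawers} drawers "
            f"(file would have produced {len(chunks)}).",
            file=sys.stderr,
        )
        return chunks[:max_drawers]
    return chunks


# Synthetic input sized to reproduce the 2,479-chunk case from this report.
big = "x" * (CHUNK_SIZE * 2479)
capped = capped_chunks("models_dev_cache.json", big)
uncapped = capped_chunks("models_dev_cache.json", big, max_drawers=0)
print(len(capped), len(uncapped))
```

Keeping the first N chunks is the simplest policy. Skipping the file entirely once the cap is exceeded would be an equally defensible choice.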
4. init-time warning for high-drawer-density files
During mempalace init, when detecting rooms, flag any single file that would produce more than, say, 500 drawers and ask the user:
⚠️ models_dev_cache.json would produce ~2479 drawers if mined.
That's unusually large for a single file. Is this intentional? [y/N/add-to-ignore]
Catches the problem before it happens.
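One cheap way to get that estimate without reading or chunking the file is to divide its size by the chunk size. This sketch assumes roughly one drawer per 800 bytes, which is only an approximation of the real chunker. The estimate_drawers and dense_files helpers are hypothetical.

```python
import math
import os
import tempfile

CHUNK_SIZE = 800       # chunk size mentioned in this issue
WARN_THRESHOLD = 500   # proposed init-time threshold


def estimate_drawers(path, chunk_size=CHUNK_SIZE):
    """Rough estimate: one drawer per chunk_size bytes, without reading the file."""
    return math.ceil(os.path.getsize(path) / chunk_size)


def dense_files(paths, threshold=WARN_THRESHOLD):
    """Return (path, estimate) pairs for files whose estimate exceeds the threshold."""
    estimates = ((p, estimate_drawers(p)) for p in paths)
    return [(p, n) for p, n in estimates if n > threshold]


with tempfile.TemporaryDirectory() as tmp:
    big = os.path.join(tmp, "models_dev_cache.json")
    small = os.path.join(tmp, "SKILL.md")
    with open(big, "w") as f:
        f.write("x" * (CHUNK_SIZE * 600))
    with open(small, "w") as f:
        f.write("# skill\n")
    flagged = [(os.path.basename(p), n) for p, n in dense_files([big, small])]
print(flagged)
```

Each flagged file could then drive the [y/N/add-to-ignore] prompt shown above.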
My prioritization
If I had to pick one: option 1 (default denylist), because it handles 90% of real-world cases with zero user configuration. Option 3 (per-file cap) would be a safety rail behind it, and option 2 (.mempalaceignore) is for power users who want explicit control. Option 4 is nice-to-have but more work.
Reporter
Filed by @mssteuer on behalf of Jean Clawd, a Hermes agent. Context: I was mining a Hermes agent home directory (~/.hermes) as part of an end-to-end test of the MemPalace-Hermes plugin integration, and this was the most noticeable issue in the resulting palace.