Summary
The EMOTION_MARKERS list in mempalace/general_extractor.py contains a wildcard regex that matches any text wrapped in single asterisks:
EMOTION_MARKERS = [
r"\blove\b", r"\bscared\b", r"\bafraid\b", ...
r"\*[^*]+\*", # <-- this one
]
This pattern matches every Markdown *italic* and every inner pair of **bold**. Because assistant responses in technical conversations use bold for emphasis, command names, variable names, section headers, etc., practically every non-trivial paragraph scores non-zero against emotional. Since classification picks max(scores) and most technical paragraphs trigger only this one marker (no decision/problem/milestone keywords), they all land in the emotional room.
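The effect is easy to reproduce in isolation. The sketch below uses only the asterisk pattern quoted above. The sample strings are made up for illustration, and the real scorer in general_extractor.py may count matches differently.

```python
import re

# The asterisk pattern quoted from EMOTION_MARKERS above.
ASTERISK_MARKER = re.compile(r"\*[^*]+\*")

# Illustrative inputs (not taken from the mined sessions).
samples = {
    "emote": "*sighs* fine, let's redeploy",
    "italic": "Set the *namespace* first",
    "bold": "Run **helm upgrade** to apply the change",
    "plain": "Run helm upgrade to apply the change",
}

# Collect every span the marker matches in each sample.
matches = {name: ASTERISK_MARKER.findall(text) for name, text in samples.items()}

for name, found in matches.items():
    print(f"{name:7s} -> {found}")
```

The bold sample matches on its inner pair of asterisks, so a purely technical sentence gets the same hit as a genuine emote, while the unformatted version of the same sentence gets none.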
Reproduction
Mined 273 Claude Code session JSONLs (entirely technical DevOps content — Kubernetes, Helm, Elasticsearch, GitLab, Jira) with:
mempalace mine C:/Users/.../.claude/projects/<project>/ \
--mode convos --extract general --wing vibeOps
Result:
=======================================================
Done.
Files processed: 273
Drawers filed: 2443
By room:
emotional 1615 files <-- 66% of all drawers
milestone 387 files
decision 322 files
problem 114 files
preference 5 files
=======================================================
Spot-check of mempalace search <any> --wing vibeOps --room emotional returns results like:
- "Sampling: Type: parentbased_traceidratio - if the trace has a parent..."
- "Namespace: opentelemetry · Name: instrumentation"
- Tables of Jira statuses
- OpenTelemetry collector endpoint lists
Zero emotional content. Every one of these paragraphs only matches \*[^*]+\* via Markdown bold in the source text.
Root cause
general_extractor.py line 160 (mempalace 3.0.14):
EMOTION_MARKERS = [
...
r"\*[^*]+\*",
]
This regex is too broad for any text that contains Markdown formatting. It was presumably intended to catch emotes such as *whispers* or *sighs* in personal chat logs, but in developer contexts it also matches every *foo* and the inner span of every **bar**.
Proposed fixes (in order of preference)
1. Remove the regex entirely. Users who want emote detection can opt in via a separate marker list. \blove\b, \bscared\b, etc. are already precise enough.
2. Require a surrounding word boundary that excludes code-like content, e.g. r"(?<!\w)\*[a-z][^*]{2,}\*(?!\w)" plus a blacklist of common programmer terms, but this is brittle.
3. Strip Markdown before scoring (_extract_prose already exists in the file; extend it to strip **bold** / *italic* / backtick code spans before running the regex scorer).
4. Prefer non-emotional max when scores tie (low-effort bias fix): if a paragraph scores equally on emotional and another type, pick the other type. This doesn't fix the root cause but reduces false positives.
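The tie-break in the last item could look roughly like this. It is a minimal sketch that assumes the scores are available as a mapping from room name to a numeric score. The function name pick_room and that data shape are hypothetical and not taken from general_extractor.py.

```python
from typing import Dict, Optional


def pick_room(scores: Dict[str, int], deprioritized: str = "emotional") -> Optional[str]:
    """Return the top-scoring room, preferring any other room over
    `deprioritized` when several rooms share the top score.

    Hypothetical helper: the real classifier may store scores differently.
    """
    # Ignore rooms that did not match at all.
    positive = {room: score for room, score in scores.items() if score > 0}
    if not positive:
        return None

    best = max(positive.values())
    leaders = [room for room, score in positive.items() if score == best]

    # On a tie, take the first leader that is not the deprioritized room.
    others = [room for room in leaders if room != deprioritized]
    return others[0] if others else leaders[0]


print(pick_room({"emotional": 1, "decision": 1}))
print(pick_room({"emotional": 1}))
```

The second call still returns the emotional room: a paragraph whose only hit is the asterisk marker has nothing to tie with, which is why this option reduces false positives without removing them.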
Option (3) is probably the right long-term answer — the same issue affects the \* marker and will affect future markers that collide with Markdown syntax.
Environment
- mempalace 3.0.14
- Source: 273 Claude Code conversation JSONL files (technical DevOps)
- Extractor mode: --extract general