sanitize_name() rejects Unicode names with diacritics, blocking valid knowledge graph writes via MCP
Problem
MemPalace 3.1.0 rejects valid Unicode names with diacritics in the MCP / knowledge graph write path, even though the underlying knowledge graph and SQLite storage handle Unicode correctly.
Examples that fail before the patch:
- Pēteris
- Matīss
- Ģirts
This makes it impossible to store many real personal names correctly in languages such as Latvian.
Root cause
The blocker appears to be sanitize_name() in mempalace/config.py.
Current regex:
```python
_SAFE_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_ .'-]{0,126}[a-zA-Z0-9]?$")
```
This is ASCII-only, so names containing Unicode letters are rejected before they ever reach knowledge_graph.py.
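The rejection can be reproduced without MemPalace at all. The snippet below copies the quoted pattern verbatim and runs it against the example names. The `accepts()` helper is illustrative only and is not part of MemPalace.

```python
import re

# The ASCII-only pattern as quoted from mempalace/config.py in this issue.
ASCII_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_ .'-]{0,126}[a-zA-Z0-9]?$")


def accepts(name: str) -> bool:
    """Illustrative helper: True if the ASCII-only pattern accepts the name."""
    return ASCII_NAME_RE.match(name) is not None


# "Guntis Endzelis" and "Peteris" are accepted; the names with diacritics are not.
for name in ["Guntis Endzelis", "Peteris", "Pēteris", "Matīss", "Ģirts"]:
    print(f"{name!r}: {'accepted' if accepts(name) else 'rejected'}")
```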
Important detail
The issue does not appear to be in the knowledge graph itself.
For example, KnowledgeGraph._entity_id("Pēteris") works and produces a Unicode-safe lowercase ID (pēteris) on my system. SQLite storage also handles Unicode fine.
So the failure is specifically in the validation layer before KG insertion.
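To support the point that storage is not the problem, here is a round trip through Python's built-in `sqlite3` module. The table and column names are hypothetical (not MemPalace's schema), and `str.lower()` only stands in for whatever `_entity_id()` does internally.

```python
import sqlite3

names = ["Pēteris", "Matīss", "Ģirts"]

# In-memory database with a hypothetical table, not MemPalace's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
for name in names:
    # Python's str.lower() is Unicode-aware, e.g. "Ģirts" -> "ģirts".
    conn.execute("INSERT INTO entities (id, name) VALUES (?, ?)", (name.lower(), name))
conn.commit()

# Read back in insertion order; the original spelling is preserved.
stored = [row[0] for row in conn.execute("SELECT name FROM entities ORDER BY rowid")]
ids = [row[0] for row in conn.execute("SELECT id FROM entities ORDER BY rowid")]
print(stored)
print(ids)
```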
Repro
Using the MCP tool / KG write path:
```python
mempalace_kg_add(
    subject="Guntis Endzelis",
    predicate="parent_of",
    object="Pēteris",
    valid_from="2009-10-24"
)
```
Before the patch, this fails with an error like:
```
object contains invalid characters
```
Local patch that fixed it
I locally changed the regex in mempalace/config.py to a Unicode-aware one:
```python
_SAFE_NAME_RE = re.compile(r"^(?=.{1,128}$)[\w][\w .'-]{0,126}[\w]?$", re.UNICODE)
```
After restarting the MCP host, writes with proper Latvian diacritics worked:
- Pēteris
- Matīss

KG queries also returned them correctly, with diacritics preserved.
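The patched pattern can be checked the same way. This sketch writes `\w` with a single backslash inside the raw string (with `r"[\\w]"` the class would match a literal backslash or `w`, not word characters) and drops `re.UNICODE`, which is already the default for `str` patterns in Python 3. The helper name is illustrative.

```python
import re

# Unicode-aware pattern from the local patch; \w matches Unicode letters,
# digits and underscore for str patterns in Python 3.
UNICODE_NAME_RE = re.compile(r"^(?=.{1,128}$)[\w][\w .'-]{0,126}[\w]?$")


def accepts(name: str) -> bool:
    """Illustrative helper: True if the patched pattern accepts the name."""
    return UNICODE_NAME_RE.match(name) is not None


# Latvian names now pass; path-like strings with "/" or a leading "." still fail.
for name in ["Pēteris", "Matīss", "Ģirts", "Guntis Endzelis", "a/b", "../etc"]:
    print(f"{name!r}: {'accepted' if accepts(name) else 'rejected'}")
```

One side effect worth knowing: `\w` also matches `_` and digits, so a leading underscore (rejected by the ASCII pattern) is now accepted. If that matters, the first class could be narrowed to `[^\W_]`.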
Suggested fix
Replace the ASCII-only validator with Unicode-aware validation for entity/name fields.
At minimum, MemPalace should allow Unicode letters in:
- knowledge graph subjects
- predicates, where appropriate
- knowledge graph objects
- possibly other user-facing name fields validated by sanitize_name()
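A sketch of what a Unicode-aware validator for entity fields could look like, assuming it only needs to accept or reject a string and return the value to store. `sanitize_entity_name` and its error message are hypothetical (the message just mirrors the one in the repro). It covers two details that swapping the regex alone does not: decomposed (NFD) input such as `e` followed by U+0304 COMBINING MACRON is not matched by `\w`, so it is normalized to NFC first; and `re.match()` with a trailing `$` accepts a string ending in `\n`, so `fullmatch()` is used instead.

```python
import re
import unicodedata

# Hypothetical entity-name rule: same shape as the locally patched pattern,
# without ^/$ because fullmatch() anchors both ends.
_ENTITY_NAME_RE = re.compile(r"\w[\w .'-]{0,126}\w?")


def sanitize_entity_name(value: str, field: str = "name") -> str:
    """Hypothetical validator for KG subjects/objects.

    Returns the NFC-normalized name, or raises ValueError if it is invalid.
    """
    if not isinstance(value, str):
        raise ValueError(f"{field} must be a string")
    # Compose "e" + U+0304 into U+0113 ("ē"); combining marks are not \w.
    name = unicodedata.normalize("NFC", value)
    # fullmatch() rejects "Pēteris\n", which match() with a trailing "$" accepts.
    if _ENTITY_NAME_RE.fullmatch(name) is None:
        raise ValueError(f"{field} contains invalid characters")
    return name


print(sanitize_entity_name("Pe\u0304teris", "object"))  # Pēteris
```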
Notes / caution
If sanitize_name() is shared with wing/room/path-like identifiers, it may be worth separating:
- a stricter validator for file/path-ish identifiers
- a Unicode-friendly validator for human names and KG entities
That would avoid unintentionally loosening validation in places that really do need tighter ASCII-safe rules.
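If the function is shared, the split could be as simple as choosing a rule per field. A hypothetical sketch follows: the field names, the `_RULES` table and `validate_field()` are illustrative and not MemPalace's API. The ASCII pattern is the one quoted from mempalace/config.py and the Unicode one mirrors the local patch.

```python
import re

# ASCII rule as quoted from mempalace/config.py; Unicode rule mirrors the patch.
# Anchors are omitted because fullmatch() anchors both ends.
_ASCII_SAFE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_ .'-]{0,126}[a-zA-Z0-9]?")
_UNICODE_NAME = re.compile(r"\w[\w .'-]{0,126}\w?")

# Hypothetical field table: path-ish identifiers keep the strict ASCII rule,
# human-facing KG names get the Unicode-aware one.
_RULES = {
    "wing": _ASCII_SAFE,
    "room": _ASCII_SAFE,
    "subject": _UNICODE_NAME,
    "object": _UNICODE_NAME,
}


def validate_field(field: str, value: str) -> str:
    """Validate value with the rule registered for field; raise ValueError if invalid."""
    pattern = _RULES[field]
    if pattern.fullmatch(value) is None:
        raise ValueError(f"{field} contains invalid characters")
    return value
```

Keeping the table explicit makes it visible which fields accept Unicode, so widening one field does not silently widen the others.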
Expected behavior
Valid human names with Unicode letters should be accepted and stored without transliteration to ASCII.
Actual behavior
Names with diacritics are rejected as invalid characters before the KG write happens.
Environment
- MemPalace 3.1.0
- Python 3.10
- OpenClaw MCP integration
- Linux / WSL2 in this case, but the bug appears to be regex-level and likely cross-platform