Unicode / diacritics rejected in sanitize_name() for KG + MCP writes #637

@xguntis

sanitize_name() rejects Unicode names with diacritics, blocking valid knowledge graph writes via MCP

Problem

MemPalace 3.1.0 rejects valid Unicode names with diacritics in the MCP / knowledge graph write path, even though the underlying knowledge graph and SQLite storage handle Unicode correctly.

Examples that fail before the patch:

  • Pēteris
  • Matīss
  • Ģirts

This makes it impossible to store many real personal names correctly in languages such as Latvian.

Root cause

The blocker appears to be sanitize_name() in mempalace/config.py.

Current regex:

_SAFE_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_ .'-]{0,126}[a-zA-Z0-9]?$")

This is ASCII-only, so names containing Unicode letters are rejected before they ever reach knowledge_graph.py.
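The rejection can be reproduced in isolation, without MemPalace installed. The snippet below copies the regex quoted above; the `is_safe_name` helper is a hypothetical stand-in and not the real `sanitize_name()` implementation, which may do more than a regex match.

```python
import re

# Regex copied verbatim from the issue (mempalace/config.py, MemPalace 3.1.0).
_SAFE_NAME_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_ .'-]{0,126}[a-zA-Z0-9]?$")


def is_safe_name(name: str) -> bool:
    """Hypothetical helper: True if the name matches the ASCII-only pattern."""
    return _SAFE_NAME_RE.match(name) is not None


if __name__ == "__main__":
    for name in ["Guntis Endzelis", "Pēteris", "Matīss", "Ģirts"]:
        print(f"{name!r}: {is_safe_name(name)}")
```

Only the plain-ASCII name passes; every name containing a letter outside `a-zA-Z` is rejected by the character classes.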

Important detail

The issue does not appear to be in the knowledge graph itself.

For example, KnowledgeGraph._entity_id("Pēteris") works and produces a Unicode-safe lowercase ID (pēteris) on my system. SQLite storage also handles Unicode fine.

So the failure is specifically in the validation layer before KG insertion.
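As a rough illustration of why the storage layer is not the problem, here is a minimal sketch using only the standard library. The table schema is hypothetical, and `str.lower()` is an assumed stand-in for whatever `KnowledgeGraph._entity_id` actually does; the point is only that Python and SQLite round-trip the diacritics unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, not MemPalace's real one.
conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, name TEXT)")

name = "Pēteris"
entity_id = name.lower()  # assumption: simplified stand-in for _entity_id()

# Parameterized insert and lookup; SQLite stores TEXT as Unicode.
conn.execute("INSERT INTO entities (id, name) VALUES (?, ?)", (entity_id, name))
row = conn.execute(
    "SELECT id, name FROM entities WHERE id = ?", ("pēteris",)
).fetchone()
conn.close()

print(row)
```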

Repro

Using the MCP tool / KG write path:

mempalace_kg_add(
    subject="Guntis Endzelis",
    predicate="parent_of",
    object="Pēteris",
    valid_from="2009-10-24"
)

Before the patch, this fails with an error like:

object contains invalid characters

Local patch that fixed it

I locally changed the regex in mempalace/config.py to a Unicode-aware one:

_SAFE_NAME_RE = re.compile(r"^(?=.{1,128}$)\w[\w .'-]{0,126}\w?$", re.UNICODE)

After restarting the MCP host, writes with proper Latvian diacritics worked:

  • Pēteris
  • Matīss

and KG queries returned them correctly with diacritics preserved.
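A quick standalone check of the patched pattern, written with a single-backslash `\w` (inside a raw string, `\\w` in a character class would match only a literal backslash or the letter `w`). The `is_safe_name` helper is hypothetical, and `re.UNICODE` is omitted because it is already the default for `str` patterns in Python 3.

```python
import re

# Unicode-aware variant of the validator regex from the local patch.
_SAFE_NAME_RE = re.compile(r"^(?=.{1,128}$)\w[\w .'-]{0,126}\w?$")


def is_safe_name(name: str) -> bool:
    """Hypothetical helper: True if the name matches the Unicode-aware pattern."""
    return _SAFE_NAME_RE.match(name) is not None


if __name__ == "__main__":
    samples = ["Pēteris", "Matīss", "Ģirts", "O'Brien", "../etc/passwd", "a" * 129]
    for name in samples:
        print(f"{name[:20]!r}: {is_safe_name(name)}")
```

Note that `\w` is broader than the original classes: it also admits an underscore as the first character and digits from other scripts, which the ASCII pattern did not.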

Suggested fix

Replace the ASCII-only validator with Unicode-aware validation for entity/name fields.

At minimum, MemPalace should allow Unicode letters in:

  • knowledge graph subjects
  • predicates, where appropriate
  • knowledge graph objects
  • possibly other user-facing name fields validated by sanitize_name()

Notes / caution

If sanitize_name() is shared with wing/room/path-like identifiers, it may be worth separating:

  • a stricter validator for file/path-ish identifiers
  • a Unicode-friendly validator for human names and KG entities

That would avoid unintentionally loosening validation in places that really do need tighter, ASCII-safe rules.
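One possible shape for that split, as a sketch only. The function names `sanitize_identifier` and `sanitize_entity_name` are hypothetical and do not exist in MemPalace. The NFC normalization step is an extra assumption on my part: a decomposed "e" plus combining macron is not matched by `\w`, so normalizing first keeps both spellings of the same name valid.

```python
import re
import unicodedata

# ASCII-only pattern from the issue, kept for file/path-ish identifiers.
_SAFE_IDENTIFIER_RE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9_ .'-]{0,126}[a-zA-Z0-9]?")
# Unicode-aware pattern for human names and KG entities.
_SAFE_ENTITY_RE = re.compile(r"\w[\w .'-]{0,126}\w?")


def sanitize_identifier(value: str) -> str:
    """Strict validator (hypothetical name) for wing/room/path-like values."""
    # fullmatch() requires the whole string to match, including any trailing newline.
    if not _SAFE_IDENTIFIER_RE.fullmatch(value):
        raise ValueError("identifier contains invalid characters")
    return value


def sanitize_entity_name(value: str) -> str:
    """Unicode-friendly validator (hypothetical name) for KG subjects/objects."""
    # Compose base letter + combining mark into a single code point first.
    value = unicodedata.normalize("NFC", value)
    if not _SAFE_ENTITY_RE.fullmatch(value):
        raise ValueError("name contains invalid characters")
    return value


if __name__ == "__main__":
    print(sanitize_entity_name("Pe\u0304teris"))  # decomposed input, composed output
    print(sanitize_identifier("wing_01"))
```

With this arrangement the KG write path would call the Unicode-friendly validator, while anything that ends up in a file name or path keeps the existing ASCII rules.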

Expected behavior

Valid human names with Unicode letters should be accepted and stored without transliteration to ASCII.

Actual behavior

Names with diacritics are rejected as invalid characters before the KG write happens.

Environment

  • MemPalace 3.1.0
  • Python 3.10
  • OpenClaw MCP integration
  • Linux / WSL2 in this case, but the bug appears to be regex-level and likely cross-platform

Labels

  • area/i18n (Multilingual, Unicode, non-English embeddings)
  • area/kg (Knowledge graph)
  • area/mcp (MCP server and tools)
  • enhancement (New feature or request)