feat(i18n): add entity detection to German, Spanish, and French locales #1001

Merged
igorls merged 4 commits into MemPalace:develop from mvalentsev:feat/i18n-de-es-fr-entity
Apr 21, 2026

@mvalentsev mvalentsev commented Apr 18, 2026

Summary

Adds entity detection to the de, es, and fr locale JSONs. These three languages shipped in the original 8-language release (baf3c0a, PR #718) with terms / cli / aaak / regex sections but no entity section. PR #911 moved entity detection into per-locale JSON, but that refactor only retrofitted en.json. Later locales (ru, it, pt-br, hi, id) arrived with dedicated entity sections, and uk is in flight in #994; the seven non-English locales from the original release stayed without one.

For a palace configured with MEMPALACE_LANG=de / es / fr, get_entity_patterns((lang,)) at mempalace/i18n/__init__.py:284 takes the not found_any branch and silently falls back to the English regex patterns. After this PR, those locales get locale-specific candidate patterns, verb patterns, pronouns, dialogue markers, direct-address alternations, project-verb patterns, and stopwords.
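The silent fallback described above can be illustrated with a minimal sketch. This is not the code in mempalace/i18n/__init__.py: the LOCALES dict, the entity_stopwords helper, and the sample stopwords are hypothetical stand-ins, and only the found_any flag name is taken from the description.

```python
# Hypothetical stand-in for the locale files; the real data lives in
# mempalace/i18n/*.json and has more sections than shown here.
LOCALES = {
    "en": {"entity": {"stopwords": ["the", "and"]}},
    "de": {"terms": {}},  # no "entity" section, as before this PR
}


def entity_stopwords(langs, locales=LOCALES):
    """Collect stopwords for langs; fall back to English if no locale has an entity section."""
    merged = []
    found_any = False
    for lang in langs:
        entity = locales.get(lang, {}).get("entity")
        if entity is None:
            continue  # locale exists but declares nothing for entity detection
        found_any = True
        merged.extend(entity.get("stopwords", []))
    if not found_any:
        # Silent fallback: no warning, so the caller cannot tell that the
        # requested locale contributed nothing.
        merged.extend(locales["en"]["entity"]["stopwords"])
    return merged


print(entity_stopwords(("de",)))  # English stopwords come back for a German palace
```

Once a locale gains its own entity section, the same call returns that locale's data and the English fallback is never reached.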

Scope

In scope: de, es, fr.

Out of scope:

Changes

| File | Lines added |
| --- | --- |
| mempalace/i18n/de.json | +82 (entity section appended) |
| mempalace/i18n/es.json | +84 |
| mempalace/i18n/fr.json | +82 |
| tests/test_i18n.py | +73 (3 smoke tests + 1 schema invariant) |

Schema follows the loader contract in mempalace/i18n/__init__.py:191-223 and keeps parity with the existing en.json / ru.json / it.json / pt-br.json / hi.json / id.json:

  • candidate_pattern (str), multi_word_pattern (str)
  • person_verb_patterns, pronoun_patterns, dialogue_patterns, project_verb_patterns (lists)
  • direct_address_pattern: singular string with |-alternation per loader at :209-210 (the trap that caught #994, feat(i18n): add Ukrainian language support)
  • stopwords (list)
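As a rough illustration of the shape listed above, the sketch below checks an entity section against those key names and types. The key names and types come from the list; the check_entity_section helper is hypothetical, and treating every key as optional and any other key as unknown are assumptions rather than the loader's actual behaviour.

```python
# Key names and types as listed above; the checker itself is hypothetical.
EXPECTED_TYPES = {
    "candidate_pattern": str,
    "multi_word_pattern": str,
    "person_verb_patterns": list,
    "pronoun_patterns": list,
    "dialogue_patterns": list,
    "project_verb_patterns": list,
    "direct_address_pattern": str,
    "stopwords": list,
}


def check_entity_section(entity):
    """Return a list of human-readable problems; an empty list means the shape is fine."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        # Assumption: keys are optional, so only present keys are type-checked.
        if key in entity and not isinstance(entity[key], expected):
            actual = type(entity[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {actual}")
    for key in entity:
        # Assumption: anything outside the list above is a mistake.
        if key not in EXPECTED_TYPES:
            problems.append(f"{key}: unknown key")
    return problems


sample = {
    "candidate_pattern": "[A-ZÄÖÜ][a-zäöüß]{1,19}",
    "direct_address_pattern": ["should be one string"],
    "stopwords": ["der", "die", "das"],
}
print(check_entity_section(sample))
```

For the sample, the only reported problem is the direct_address_pattern entry, because it is a list where a single string is expected.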

Language-specific notes

  • German: every German noun is capitalized, so candidate_pattern [A-ZÄÖÜ][a-zäöüß]{1,19} matches every noun in running text. Stopword coverage is more aggressive than English (days of week, months, common nouns) to keep downstream filtering tractable.
  • French: past tense is typically passé composé (a dit, a demandé), so most person_verb_patterns cover the auxiliary + participle form. Apostrophe-elided articles (l'architecture) are handled literally in a few patterns; native speakers may find contractions that slip through.
  • Spanish: preterite single-word forms (dijo, preguntó, decidió) plus present (piensa, quiere). Reflexive verb endings are covered at a basic level; native review would catch missed inflection forms.
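The German note can be seen in a small example. The candidate pattern is the one quoted above; the sentence, the stopword subset, and the lowercase comparison are assumptions made for illustration and are not taken from de.json.

```python
import re

# Pattern quoted in the German note above.
CANDIDATE = re.compile(r"[A-ZÄÖÜ][a-zäöüß]{1,19}")

# Hypothetical stopword subset; comparing in lowercase is also an assumption.
STOPWORDS = {"am", "montag", "besprechung", "plan"}

text = "Am Montag hat Peter in der Besprechung den Plan erklärt."

# Every capitalized word is a candidate, which in German includes all nouns.
candidates = CANDIDATE.findall(text)

# Stopword filtering is what separates the name from the ordinary nouns.
kept = [word for word in candidates if word.lower() not in STOPWORDS]

print(candidates)
print(kept)
```

In this sentence five words are candidates and only Peter survives the filter, which is why stopword coverage carries more weight in de than in the other locales.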

New schema-invariant test

test_direct_address_key_is_singular_string_for_all_locales asserts that any locale declaring direct-address uses the singular direct_address_pattern (str) key, never the plural direct_address_patterns (list). The plural name is the output shape of the merged dict, not the input shape of locale files; declaring plural silently drops every direct-address pattern in that locale. This test would have caught #994 pre-review and guards every future locale.
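A minimal sketch of why the plural key is dropped and what the invariant looks for. Both helpers are hypothetical and simplified, the locale codes xx and yy are placeholders, and the {name} placeholder inside the patterns is an assumed convention rather than something confirmed here.

```python
def merged_direct_address(entity_sections):
    """Build the output list from per-locale input strings (assumed loader behaviour)."""
    return [
        entity["direct_address_pattern"]
        for entity in entity_sections
        # Only the singular key is read; a plural key is never looked at.
        if isinstance(entity.get("direct_address_pattern"), str)
    ]


def locales_with_plural_key(locales):
    """Return the locales that declare the output-shaped plural key by mistake."""
    return sorted(
        name
        for name, data in locales.items()
        if "direct_address_patterns" in data.get("entity", {})
    )


# Placeholder locales: "xx" follows the contract, "yy" uses the plural key.
locales = {
    "xx": {"entity": {"direct_address_pattern": r"\bhallo\s+{name}\b"}},
    "yy": {"entity": {"direct_address_patterns": [r"\bhola\s+{name}\b"]}},
}

merged = merged_direct_address([data["entity"] for data in locales.values()])
print(merged)                            # only the "xx" pattern survives
print(locales_with_plural_key(locales))  # the invariant flags "yy"
```

The merge raises no error for yy, so without an explicit check the missing patterns only show up as weaker detection at runtime.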

Verification

  • python -m pytest tests/test_i18n.py tests/test_i18n_lang_case.py -v: 18/18 pass (10 existing i18n smoke + 4 new entity + 4 case-insensitivity)
  • python -m pytest tests/ --ignore=tests/benchmarks: 565 pass, 1 pre-existing env-flake unrelated to i18n (tests/test_mcp_stdio_protection.py::test_module_import_redirects_stdout_to_stderr fails on clean upstream/develop too, due to a missing transitive httpx dependency in my local env)
  • ruff check . clean, ruff format --check . clean
  • E2E loader check: get_entity_patterns(("de","es","fr")) returns non-empty pattern lists per locale (de: 15 verbs / 9 pronouns / 197 stopwords; es: 15/9/190; fr: 15/9/180)
  • Per-locale direct-address sample matching confirms all greeting phrases (hallo Peter, hola María, bonjour Pierre, etc.) match; non-address text (Pierre est arrivé) correctly rejects
  • Per-locale person-verb sample matching: all 15 verbs per locale match
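The direct-address sample check can be approximated as follows. The pattern below is hypothetical: the greeting words, the {name} placeholder convention, and the case-insensitive matching are assumptions and are not copied from fr.json or from the detector.

```python
import re

# Hypothetical |-alternation in a single string, mirroring the singular
# direct_address_pattern shape described in the schema section.
DIRECT_ADDRESS = r"\bbonjour\s+{name}\b|\bsalut\s+{name}\b|\bmerci\s+{name}\b"


def is_direct_address(text, name, pattern=DIRECT_ADDRESS):
    """Return True if text addresses name directly according to pattern."""
    # re.escape keeps names with regex metacharacters from altering the pattern.
    regex = re.compile(pattern.format(name=re.escape(name)), re.IGNORECASE)
    return regex.search(text) is not None


print(is_direct_address("Bonjour Pierre, ça va ?", "Pierre"))  # greeting plus name
print(is_direct_address("Pierre est arrivé", "Pierre"))        # plain statement
```

The first call matches because a greeting word directly precedes the name; the second does not, since the name appears without any greeting.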

Disclosure

The German, Spanish, and French regex patterns and stopword lists in this PR were drafted by Claude with editorial review on my end for structural consistency against existing en.json / it.json / pt-br.json / ru.json locales. I don't claim native speaker expertise in any of these languages, so translations should be scrutinized before merge, particularly:

  • German: noun capitalization means candidate_pattern matches every noun, producing high candidate noise. Stopword coverage matters more here than in other locales.
  • French: apostrophe/elision handling (j'ai, l', d') in verb patterns is brittle; contractions may break person_verb_patterns.
  • Spanish: reflexive verb endings are covered at a basic level; native review would catch missed inflection forms.

Maintainers or native speakers, if you spot errors, please push corrections directly or flag lines in a review.

Test plan

  • Targeted i18n tests pass
  • Case-insensitivity regression tests pass
  • Full suite passes (excluding benchmarks and the pre-existing env-flake described above)
  • ruff check + ruff format --check clean
  • E2E loader check for de/es/fr returns non-empty pattern lists
  • Native speaker review requested (see Disclosure)

@mvalentsev mvalentsev marked this pull request as ready for review April 18, 2026 17:25
@igorls igorls merged commit 6d42f61 into MemPalace:develop Apr 21, 2026
6 checks passed
jphein pushed a commit to jphein/mempalace that referenced this pull request Apr 24, 2026
Restore-integrity release. Unbreaks fresh `pip install mempalace` from
v3.3.2 by re-tagging current develop, which carries both the plugin.json
consumer (shipped in 3.3.2) and the matching mempalace-mcp entry point
in pyproject.toml (added on develop ~10h after the 3.3.2 tag via MemPalace#340
by @messelink). MemPalace#1093 diagnosed by @jphein.

Bumps (all 5 sources agree per Version Guard / CLAUDE.md):
- mempalace/version.py              3.3.2 → 3.3.3
- pyproject.toml                     3.3.2 → 3.3.3
- .claude-plugin/plugin.json         3.3.2 → 3.3.3
- .claude-plugin/marketplace.json    3.3.2 → 3.3.3
- .codex-plugin/plugin.json          3.3.2 → 3.3.3
- CHANGELOG.md                        new [3.3.3] entry

No code changes. The fix for MemPalace#1093 is already on develop via merged PRs
MemPalace#340, MemPalace#1021, MemPalace#851, MemPalace#942, MemPalace#833, MemPalace#673, MemPalace#661, MemPalace#659, MemPalace#1097, MemPalace#1051, MemPalace#1001,
MemPalace#945.

Branch name intentionally outside the `release/*` ruleset so follow-up
CI-fix commits aren't gated behind a nested PR. (Supersedes MemPalace#1143 —
closed for exactly that reason after it missed 3 of 5 version files.)

Smoke-tested locally from a fresh develop clone:
  grep mempalace-mcp pyproject.toml .claude-plugin/plugin.json   # both ✓
  python -m build --wheel                                        # ✓
  pip install …-py3-none-any.whl                                 # ✓
  which mempalace-mcp                                            # ✓
  mempalace-mcp --help                                           # ✓