feat(i18n): add Ukrainian language support by alpiua · Pull Request #994 · MemPalace/mempalace

alpiua · 2026-04-18T15:25:06Z

feat(i18n): Add Ukrainian language support

Description

This PR adds comprehensive native Ukrainian language support to the MemPalace entity detection pipeline by introducing the uk.json locale.

Motivation & Context

With the multi-language JSON-based entity detection architecture introduced in v3.3.1, adding a locale is now entirely configuration-driven. This PR adds the BCP 47 uk mappings to handle Cyrillic Ukrainian input natively, so teams working with Ukrainian documentation or notes can use agentic memory and knowledge graph extraction.

Key Additions

Complete Lexical Patterns: Full coverage of Ukrainian pronouns, dialogue markers (e.g., сказав, прокоментував), action verbs (e.g., відчуває, вирішив), and direct addresses (e.g., добрий день {name}).
Hybrid Character Classes: The candidate_pattern combines the character classes [А-ЯІЇЄҐA-Z][а-яіїєґ'a-z]. Ukrainian workspaces often mix English tech terminology with Ukrainian prose, so this keeps Latin project names embedded in Cyrillic sentences from being dropped by the candidate extractor.
Project Context Synonyms: Native action patterns for IT projects (репозиторій, пайплайн, задеплоїв, деплою, розгорнув, etc.), each mapped to the {name} variable.
AAAK Prompt Localization: Translated the AAAK indexing instruction into Ukrainian, so the LLM is directed in the documents' own language for better context retention when compressing them.
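
To illustrate the hybrid character classes described above, here is a minimal sketch comparing a mixed Cyrillic+Latin pattern with a Cyrillic-only one. The `+` quantifier, the constant names, and the sample sentence are assumptions made for illustration; the actual candidate_pattern in uk.json may differ.

```python
import re

# Assumption: a "+" quantifier follows the second class; the PR text
# only shows the two character classes themselves.
HYBRID = re.compile(r"[А-ЯІЇЄҐA-Z][а-яіїєґ'a-z]+")
CYRILLIC_ONLY = re.compile(r"[А-ЯІЇЄҐ][а-яіїєґ']+")

# Hypothetical sentence mixing a Ukrainian name with a Latin project name.
text = "Андрій задеплоїв Mempalace на сервер"

hybrid_hits = HYBRID.findall(text)           # Cyrillic and Latin candidates
cyrillic_hits = CYRILLIC_ONLY.findall(text)  # Latin project name is lost

print(hybrid_hits)
print(cyrillic_hits)
```

With the Cyrillic-only variant the Latin project name never becomes a candidate, which is the failure mode the hybrid classes are meant to avoid.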

How to Test

Verified locally by running detect_entities on Markdown files containing conversational Ukrainian mixed with English tool names. Cyrillic names and Latin project identifiers were both categorized correctly.

mvalentsev · 2026-04-18T16:07:09Z

uk.json declares direct_address_patterns (plural, list of 10 regexes) under entity. The loader at mempalace/i18n/__init__.py:209-210 only reads the singular key direct_address_pattern as a single |-alternation string; the plural key is never consulted. Every working locale (en, ru, it, pt-br, hi, id) uses the singular form. Effect after merge: all 10 direct-address patterns in this PR are silently dropped, and Ukrainian users get no direct-address entity detection until the schema matches.

The fix is to collapse the 10 alternatives into one regex and rename the key:

"direct_address_pattern": "\\bпривіт\\s+{name}\\b|\\bдякую\\s+{name}\\b|\\bдобридень\\s+{name}\\b|\\bдобрий\\s+день\\s+{name}\\b|\\bшановний\\s+{name}\\b|\\bшановна\\s+{name}\\b|\\bдорогий\\s+{name}\\b|\\bдорога\\s+{name}\\b|\\bhey\\s+{name}\\b|\\bhi\\s+{name}\\b"
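
As a rough sketch of the collapse, the plural list can be joined with `|` into the single string the loader expects. The helper name `compile_direct_address`, the `{name}` substitution step, and the use of `re.IGNORECASE` are assumptions for illustration, not the loader's confirmed behavior; only three of the ten patterns are shown.

```python
import re

# Three of the ten patterns from the PR, in the plural (list) form.
plural_patterns = [
    r"\bпривіт\s+{name}\b",
    r"\bдякую\s+{name}\b",
    r"\bдобрий\s+день\s+{name}\b",
]

# Collapse into the single "|"-alternation string (singular key form).
direct_address_pattern = "|".join(plural_patterns)


def compile_direct_address(name):
    """Hypothetical helper: fill in {name} and compile the alternation."""
    filled = direct_address_pattern.replace("{name}", re.escape(name))
    # Assumption: matching is case-insensitive.
    return re.compile(filled, re.IGNORECASE)


matcher = compile_direct_address("Олена")
print(bool(matcher.search("Привіт Олена, як справи?")))
print(bool(matcher.search("Олена пішла додому")))
```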

Minor, separately: two person_verb_patterns have a literal hyphen in their alternation group:

"\\b{name}\\s+засмія(-|в|л)(ся|ась|ись)?\\b"
"\\b{name}\\s+посміхну(-|в|л)(ся|ась|ись)?\\b"

(-|в|л) matches a literal - or в or л. Natural past-tense forms are засміявся / посміхнувся without a hyphen, so the intended group is (в|л).
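
The difference can be checked with a small sketch. The `compile_verb` helper and the sample strings are hypothetical; the two templates are the original pattern and the proposed (в|л) variant.

```python
import re


def compile_verb(template, name):
    """Hypothetical helper: substitute {name} and compile the pattern."""
    return re.compile(template.replace("{name}", re.escape(name)))


original = compile_verb(r"\b{name}\s+засмія(-|в|л)(ся|ась|ись)?\b", "Олена")
proposed = compile_verb(r"\b{name}\s+засмія(в|л)(ся|ась|ись)?\b", "Олена")

# A natural past-tense form, and a hyphenated string that is not a verb form.
samples = ["Олена засміялась", "Олена засмія-ха"]
for sample in samples:
    print(sample, bool(original.search(sample)), bool(proposed.search(sample)))
```

Both patterns accept the natural form; only the original also accepts the hyphenated string.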

Otherwise topic_pattern, quote_pattern («»), candidate_pattern, and the stop_words / stopwords split align with ru.json. Once the direct-address key is corrected the file should load cleanly.

alpiua · 2026-04-18T18:03:12Z

@mvalentsev sorry, missed that.
thank you for checking.

igorls · 2026-04-21T03:57:30Z

Thanks for the Ukrainian locale. The content is structurally solid, and the hybrid Cyrillic+Latin candidate_pattern is a nice touch for mixed Ukrainian/English tech prose. Two small things before I can merge:

Drop multi-word stopwords. "будь ласка" in the stopwords list can never fire: the BM25 tokenizer produces single-word tokens (\w{2,}), so an entry containing a space never equals any token. Either split it into separate single-word entries or remove it. The same applies to any other multi-word phrases in the list.
Add a uk sample to tests/test_i18n.py::test_dialect_compress_samples. Every other shipped locale has one, and it's a useful smoke test that Dialect round-trips your locale correctly. Something like:
```
"uk": "Ми вирішили перейти з SQLite на PostgreSQL для кращого паралельного запису. Бен учора схвалив PR.",
```
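
The stopword point can be sketched as follows, assuming the tokenizer amounts to `re.findall(r"\w{2,}", text.lower())`. The function names and the stopword list are made up for illustration and are not taken from uk.json or the MemPalace code.

```python
import re

# Assumption: tokens are runs of two or more word characters.
TOKEN = re.compile(r"\w{2,}")


def tokenize(text):
    """Lowercase the text and return its \\w{2,} tokens."""
    return TOKEN.findall(text.lower())


def dead_stopwords(stopwords):
    """Return entries that can never equal a single token."""
    return [word for word in stopwords if not TOKEN.fullmatch(word)]


# Illustrative list: one multi-word entry, one single-character entry.
stopwords = ["будь ласка", "дякую", "та", "і"]

tokens = tokenize("Будь ласка, перевір пайплайн")
print(tokens)
print(dead_stopwords(stopwords))
```

A check like `dead_stopwords` could run in a locale test to flag entries that will never match before they ship.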

I'll kick off CI once you push the fixes.

- Drop 5 dead entries from entity.stopwords:
  - 4 single-char ("a", "i", "o", "u") never match candidates that require >=2 chars per candidate_pattern
  - 1 multi-word ("si us plau") never matches single-token splits
- Add "ca" sample to test_dialect_compress_samples so the locale is exercised on every CI run

Same shape as the fixes applied to MemPalace#994 (uk locale).

alpiua requested review from bensig and milla-jovovich as code owners April 18, 2026 15:25

mvalentsev mentioned this pull request Apr 18, 2026: feat(i18n): add entity detection to German, Spanish, and French locales #1001 (Merged, 6 tasks)

alpiua force-pushed the feat/i18n-uk branch from 88532c2 to d97fc1f Compare April 18, 2026 17:55

igorls added the enhancement (New feature or request) and area/i18n (Multilingual, Unicode, non-English embeddings) labels Apr 24, 2026

mvalentsev mentioned this pull request May 4, 2026: feat(i18n): add Catalan translation #1336 (Open, 6 tasks)

feat(i18n): add Ukrainian language support

b7c0565

alpiua force-pushed the feat/i18n-uk branch from d97fc1f to b7c0565 Compare May 6, 2026 13:55

alpiua requested a review from igorls as a code owner May 6, 2026 13:55

alpiua wants to merge 1 commit into MemPalace:develop from alpiua:feat/i18n-uk
