feat(i18n): normalize translation files structure and patch zh-TW translations #247

Merged
jundot merged 6 commits into jundot:main from xiaoran007:feat(i18n)/normalization
Mar 21, 2026

Conversation

@xiaoran007
Contributor

@xiaoran007 xiaoran007 commented Mar 15, 2026

This PR standardizes the structure of all i18n translation files under omlx/admin/i18n/ by treating en.json as the single source of truth.

Over time, keys were added or updated in en.json without being consistently synchronized to other locale files, which led to key drift and structural inconsistencies across translations. This PR addresses that maintenance issue by introducing a normalization utility and applying it to the current locale files. It also fills in missing entries for Traditional Chinese.

Changes made

1. Added scripts/normalize_i18n.py

This developer utility normalizes locale files against en.json.

It:

  • uses en.json as the baseline schema
  • aligns key structure and ordering across locale files
  • fills missing keys with English fallback values
  • removes deprecated extra keys
  • Update based on review: The script now strictly relies on the standard Python json library instead of regex for better robustness and maintainability. All JSON files are now formatted with a standard 2-space indentation.
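The behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/normalize_i18n.py: the function names `normalize` and `normalize_file` are hypothetical, and the handling of nested objects is an assumption, since the current locale files are described as flat.

```python
import json
from pathlib import Path


def normalize(baseline: dict, locale: dict) -> dict:
    """Return a new dict with the keys and key order of `baseline`.

    Values come from `locale` when present, otherwise from `baseline`
    (the English fallback). Keys found only in `locale` are dropped.
    """
    result = {}
    for key, base_value in baseline.items():
        value = locale.get(key, base_value)
        if isinstance(base_value, dict):
            # Assumption: nested sections are normalized recursively.
            value = normalize(base_value, value if isinstance(value, dict) else {})
        result[key] = value
    return result


def normalize_file(baseline_path: Path, locale_path: Path) -> None:
    """Rewrite one locale file in place using the baseline's structure."""
    baseline = json.loads(baseline_path.read_text(encoding="utf-8"))
    locale = json.loads(locale_path.read_text(encoding="utf-8"))
    normalized = normalize(baseline, locale)
    # ensure_ascii=False keeps CJK text readable instead of \uXXXX escapes.
    locale_path.write_text(
        json.dumps(normalized, ensure_ascii=False, indent=2) + "\n",
        encoding="utf-8",
    )
```

Because Python dicts preserve insertion order, iterating over the baseline is enough to reproduce the key order of en.json in every locale file.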

2. Normalized existing locale files

Applied the normalization script to:

  • zh.json
  • zh-TW.json
  • ja.json
  • ko.json

This produces a one-time large diff because the current files are being brought into a consistent canonical structure.

3. Completed missing Traditional Chinese entries

I also identified 10 keys missing from zh-TW.json that already existed in zh.json, and added Traditional Chinese translations for them.
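A comparison like this can be reproduced with a few lines of Python. The sketch below is only illustrative: `missing_keys` and `load_locale` are hypothetical helpers, and it assumes flat key-value files.

```python
import json
from pathlib import Path


def load_locale(path: Path) -> dict:
    """Read one locale file as a dict."""
    return json.loads(path.read_text(encoding="utf-8"))


def missing_keys(reference: dict, target: dict) -> list:
    """Keys present in `reference` but absent from `target`, in reference order."""
    return [key for key in reference if key not in target]


# Example usage against the files named in this PR:
# zh = load_locale(Path("omlx/admin/i18n/zh.json"))
# zh_tw = load_locale(Path("omlx/admin/i18n/zh-TW.json"))
# print(missing_keys(zh, zh_tw))
```

The check is only meaningful on the files as they were before normalization, because normalization fills every missing key with its English value.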

Notes for reviewers

The large diff is primarily caused by the initial normalization pass (reordering / restructuring to match en.json), not by broad semantic translation changes. This should be mostly a one-time cleanup and should reduce churn in future i18n updates.

Related Issues

None.

Type of Change

New feature / Enhancement (non-breaking change which adds functionality or improves DX)

Checklist:

No tests required for i18n JSON files/developer scripts, and I have performed a self-review

Owner

@jundot jundot left a comment

Thanks for the normalization work and the zh-TW translations. A couple of things I noticed:

1. the normalize script parses JSON with regex instead of a JSON parser

scripts/normalize_i18n.py processes locale files line by line using a regex pattern (r'^(\s*)"([^"]+)"(\s*:\s*)(.*)$'). This works for the current flat key-value structure but it's fragile. If the i18n files ever get nested objects or arrays, this breaks silently. Using json.load() to read and json.dump() with sort_keys (or a custom key order from en.json) would be more robust and still preserve the key ordering goal.
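To illustrate the failure mode, here is a small sketch that applies the same regex line by line to a hypothetical nested sample (the sample text and the helper `keys_by_regex` are made up for this example) and compares the result with what a JSON parser sees.

```python
import json
import re

# The pattern quoted above, applied one line at a time.
LINE_RE = re.compile(r'^(\s*)"([^"]+)"(\s*:\s*)(.*)$')

# Hypothetical locale content with one nested section.
SAMPLE = """{
  "title": "Home",
  "nav": {
    "title": "Menu"
  }
}"""


def keys_by_regex(text: str) -> list:
    """Collect every key the line-based regex matches, ignoring nesting."""
    return [m.group(2) for line in text.splitlines() if (m := LINE_RE.match(line))]


regex_keys = keys_by_regex(SAMPLE)  # flat view: nesting is lost
parsed = json.loads(SAMPLE)         # structured view: nesting is kept

print(regex_keys)
print(list(parsed.keys()), parsed["nav"]["title"])
```

The line-based view reports `title` twice and treats `nav` as an ordinary key whose value is `{`, so the two `title` entries cannot be told apart. The parsed view keeps them separate as `title` and `nav.title`.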

2. 2-space to 4-space indent change

The normalization changes all locale files from 2-space to 4-space indentation because it mirrors en.json's formatting. But 2-space is the more common convention for JSON files. If the goal is consistency across all locale files, would it make more sense to update en.json to use 2-space instead? That way the diff for ja/ko/zh files becomes purely key reordering without the indent noise.

@xiaoran007
Contributor Author

Thanks for the great feedback! I agree with both points.

  1. For the regex parsing: I've removed the regex logic entirely. The normalize_i18n.py script now relies solely on Python's standard json library.
  2. For the Indentation: I've updated en.json to use 2-space indentation, and configured the script to output all files with indent=2. This significantly reduced the diff noise and properly aligns everything with standard JSON conventions.

(By the way, I originally used regex to preserve the empty lines used for visual grouping in en.json, but I agree that standard, safe JSON serialization matters much more for maintainability.)

I've pushed the updated commits.

@jundot jundot force-pushed the main branch 7 times, most recently from f6faf2f to c2beead Compare March 21, 2026 05:58
@jundot
Owner

jundot commented Mar 21, 2026

Looks good, thanks for the updates. Merging this now.

I noticed a couple of minor things, but I'll handle them in a follow-up commit:

  • en.json still has blank lines between sections but the other locale files don't (since json.dump strips them). Will align these.
  • The script silently drops keys that exist in locale files but not in en.json. Will add a warning for that.
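One possible shape for such a warning is sketched below. It is not the follow-up commit itself: `extra_keys` and `warn_on_extra_keys` are hypothetical names, and reporting nested keys as dotted paths is an assumption.

```python
import warnings


def extra_keys(baseline: dict, locale: dict, prefix: str = "") -> list:
    """List keys that exist in `locale` but not in `baseline`.

    Nested keys are reported as dotted paths such as "section.key".
    """
    extras = []
    for key, value in locale.items():
        path = f"{prefix}{key}"
        if key not in baseline:
            extras.append(path)
        elif isinstance(value, dict) and isinstance(baseline[key], dict):
            extras.extend(extra_keys(baseline[key], value, prefix=path + "."))
    return extras


def warn_on_extra_keys(baseline: dict, locale: dict, locale_name: str) -> list:
    """Emit one warning per key that normalization would drop."""
    extras = extra_keys(baseline, locale)
    for path in extras:
        warnings.warn(
            f"{locale_name}: key '{path}' is not in en.json and will be dropped"
        )
    return extras
```

Calling `warn_on_extra_keys` before writing the normalized file would make removals visible without changing what gets written.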

@jundot jundot merged commit bf3a3f2 into jundot:main Mar 21, 2026