
Make the glossary regex more deterministic #1801

Merged
amieiro merged 2 commits into develop from glossary-deterministic on Feb 28, 2024
@akirk (Member) commented Feb 28, 2024

Problem

@amieiro noticed that with the Spanish glossary, a term that exists in the glossary did not match. Portuguese, on the other hand, has the same competing terms and got it right:
Spanish translations with the wrong glossary match for Add-On

Portuguese translations with the right glossary match for Add-On

The problem arises because we now group the regex by suffixes: for each English term, the suffix is determined, the term is sorted into the bin for that suffix, and the regex is built from those bins.

Add-on and Add fall into different bins (the first with just an "s" suffix, the second with the possible suffixes "s", "ed", or "ing"). In Spanish, the second bin is created first because, in the sequence of glossary terms, the word "troubleshooting", which starts that bin, is processed before "customization", which creates the "s" bin. In Portuguese it's reversed, more or less by chance, because "customization" is processed before "downgrading":

Screenshot: addon-spanish
Screenshot: addon-portuguese
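
The ordering effect described above can be reproduced outside GlotPress. The following is a minimal Python sketch, not the actual GlotPress code: the function name `build_glossary_regex`, the suffix strings, and the two term lists are assumptions made for illustration. It groups terms into bins by suffix pattern and joins the bins in the order they were created, so the first glossary term decides which bin leads the alternation.

```python
import re

# Hypothetical suffix patterns standing in for the two bins from the description.
SUFFIX_S = "s"
SUFFIX_VERB = "s|ed|ing"


def build_glossary_regex(terms):
    """Group terms by suffix pattern and join the bins in creation order."""
    bins = {}  # dicts keep insertion order: the first term seen creates the first bin
    for term, suffixes in terms:
        bins.setdefault(suffixes, []).append(re.escape(term))
    groups = [
        "(?:{})(?:{})?".format("|".join(stems), suffixes)
        for suffixes, stems in bins.items()
    ]
    return re.compile(r"\b(?:{})\b".format("|".join(groups)), re.IGNORECASE)


# "troubleshooting" comes first, so the "s|ed|ing" bin (containing "add") leads.
spanish_like = [
    ("troubleshooting", SUFFIX_VERB),
    ("customization", SUFFIX_S),
    ("add", SUFFIX_VERB),
    ("add-on", SUFFIX_S),
]

# "customization" comes first, so the "s" bin (containing "add-on") leads.
portuguese_like = [
    ("customization", SUFFIX_S),
    ("downgrading", SUFFIX_VERB),
    ("add", SUFFIX_VERB),
    ("add-on", SUFFIX_S),
]

original = "Install the Add-on"
spanish_match = build_glossary_regex(spanish_like).search(original).group(0)
portuguese_match = build_glossary_regex(portuguese_like).search(original).group(0)
print(spanish_match)     # Add
print(portuguese_match)  # Add-on
```

Regex alternation takes the first branch that matches at a given position, and the word boundary after "Add" is satisfied by the hyphen, so the shorter term wins whenever its bin comes first.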

Solution

The proposed solution is to make the regex more deterministic so that there are no differences between languages. In this case it fixes the problem, but there could be other occurrences where a krsort would make it work. We need to
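
As a rough illustration of what a deterministic bin order means, here is a minimal Python sketch. It is not the change made in this PR: the function name `build_sorted_glossary_regex`, the suffix strings, and the ascending sort are assumptions. Sorting the bin keys before joining them makes the order of the alternation independent of the order in which glossary terms were processed.

```python
import re

# Hypothetical suffix patterns used as bin keys.
SUFFIX_S = "s"
SUFFIX_VERB = "s|ed|ing"


def build_sorted_glossary_regex(terms):
    """Group terms by suffix pattern and join the bins in sorted key order."""
    bins = {}
    for term, suffixes in terms:
        bins.setdefault(suffixes, []).append(re.escape(term))
    groups = []
    for suffixes in sorted(bins):  # assumption: ascending key order, so "s" precedes "s|ed|ing"
        groups.append("(?:{})(?:{})?".format("|".join(bins[suffixes]), suffixes))
    return re.compile(r"\b(?:{})\b".format("|".join(groups)), re.IGNORECASE)


# The same four terms in two different processing orders.
order_a = [
    ("troubleshooting", SUFFIX_VERB),
    ("customization", SUFFIX_S),
    ("add", SUFFIX_VERB),
    ("add-on", SUFFIX_S),
]
order_b = [
    ("customization", SUFFIX_S),
    ("troubleshooting", SUFFIX_VERB),
    ("add", SUFFIX_VERB),
    ("add-on", SUFFIX_S),
]

original = "Install the Add-on"
match_a = build_sorted_glossary_regex(order_a).search(original).group(0)
match_b = build_sorted_glossary_regex(order_b).search(original).group(0)
print(match_a)  # Add-on
print(match_b)  # Add-on
```

In this sketch the ascending order happens to put the bin with "add-on" first. As noted above, other term combinations might need the reverse order, so a fixed sort direction makes the result reproducible without guaranteeing that the longest term always wins.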

Testing Instructions

Create a language with a glossary that contains the words "troubleshooting", "customization", "add", and "add-on", and an original that contains the word "Add-on". Before this PR, only "add" is matched.

@amieiro amieiro merged commit 7d75933 into develop Feb 28, 2024
@amieiro amieiro deleted the glossary-deterministic branch February 28, 2024 12:05
@pedro-mendonca (Member) commented:

It works here, thanks :)

@psmits1567 commented:

Currently this still does not work for locale nl_NL: "select" is in our glossary, but "selection" is not. GlotPress still flags "selection" as a glossary word, which is incorrect.
