Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve a bug with the glossary variations #1706

Merged
merged 5 commits into from
Oct 5, 2023

Conversation

amieiro
Copy link
Member

@amieiro amieiro commented Sep 28, 2023

Problem

In #1696 I introduced a bug with the glossary variations, not detecting them. You can see it in the next screenshot.

image

Solution

This PR updates the regular expression that splits the original string with the glossary terms and with the placeholders.

image

Testing Instructions

I have added two new tests to avoid this error another time:

  • test_map_glossary_entries_with_variations.
  • test_map_glossary_entries_with_variations_and_placeholders.

The load time using this method is very similar:

  • Before PR. Mean load time: 733.000 milliseconds
  • After PR. Mean load time: 723.000 millisecond

@amieiro amieiro requested review from akirk and trymebytes September 28, 2023 12:25
Copy link
Member

@akirk akirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unit test doesn't test the bug, it passes both with and without the change to gp-templates/helper-functions.php.

@pedro-mendonca
Copy link
Member

pedro-mendonca commented Sep 28, 2023

The glossary variations are matched in an enterely different way on #1373
Can you please check if this problem exists with that code?

@amieiro
Copy link
Member Author

amieiro commented Sep 28, 2023

@pedro-mendonca I detected this problem when I was setting up my environment with your .pot file and your glossary CSV, but with the develop branch, so I think it will work in your PR.

Develop branch

This is the regex using Pedro's glossary

(%(?:[1-9]\$)?[bcdefgosuxEFGX%l@])|(\b(?:abbreviate|contravene|contribute|authorize|subscribe|supervise|compress|disperse|perceive|abstain|compose|confuse|convert|correct|encrypt|exclaim|exclude|explode|precede|prevent|resolve|decide|delete|emerge|extend|format|ignite|insert|invade|resume|submit|tiptoe|tomato|agree|coach|expel|recur|waltz|watch|care|dish|edit|hero|kiss|lens|make|pass|pull|push|quiz|box|bus|dye|fix|see|do|go)(?:s|es|ed|ing)?|(?:liquef|specif|dela|lad|pla|gu|ke|to|tr)(?:y|ies|ys)?|(?:che|lea|wol)(?:f|ves)?|(?:wom)(?:an|en)?|(?:wi)(?:fe|ves)?\b)

and this is the matched string, the same that I showed in the first screenshot in this PR:

image

The regex from the unit test test_map_glossary_entries_with_variations_and_placeholders():

(%(?:[1-9]\$)?[bcdefgosuxEFGX%l@])|(\b(?:convert|see)(?:s|es|ed|ing)?|(?:dela|gu|ke|to)(?:y|ies|ys)?\b)

and this is the matched string (different from the previous screenshot):

image

This PR

With this PR, this is the regex using Pedro's glossary

(%(?:[1-9]\$)?[bcdefgosuxEFGX%l@])|\b((?:abbreviate|contravene|contribute|authorize|subscribe|supervise|compress|disperse|perceive|abstain|compose|confuse|convert|correct|encrypt|exclaim|exclude|explode|precede|prevent|resolve|decide|delete|emerge|extend|format|ignite|insert|invade|resume|submit|tiptoe|tomato|agree|coach|expel|recur|waltz|watch|care|dish|edit|hero|kiss|lens|make|pass|pull|push|quiz|box|bus|dye|fix|see|do|go)(?:s|es|ed|ing)?|(?:liquef|specif|dela|lad|pla|gu|ke|to|tr)(?:y|ies|ys)?|(?:che|lea|wol)(?:f|ves)?|(?:wom)(?:an|en)?|(?:wi)(?:fe|ves)?)\b

and this is the matched string, the same that I showed in the second screenshot in this PR:

image

The regex from the unit test test_map_glossary_entries_with_variations_and_placeholders():

(%(?:[1-9]\$)?[bcdefgosuxEFGX%l@])|\b((?:convert|see)(?:s|es|ed|ing)?|(?:dela|gu|ke|to)(?:y|ies|ys)?)\b

and this is the matched string:

image

This PR is need, because currently the variations are not well managed, but I didn't add enough different items in the glossary to make fail the previous code with the new tests. I have added a few items to the glossaries in one test, so the previous code will fail with the test_map_glossary_entries_with_variations_and_placeholders() test. Good catch @akirk

@amieiro amieiro requested a review from akirk September 28, 2023 17:08
@amieiro amieiro added the [Type] Bug An existing feature is broken. label Sep 30, 2023
@amieiro amieiro self-assigned this Oct 5, 2023
Copy link
Member

@akirk akirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested again, now the unit test covers the change. Thank you!

@amieiro amieiro merged commit 2d361ec into GlotPress:develop Oct 5, 2023
pedro-mendonca added a commit to pedro-mendonca/GlotPress that referenced this pull request Oct 6, 2023
amieiro added a commit that referenced this pull request Jan 10, 2024
* Add 'ed', 'ion', and 'ing' to suffix list for glossary term ending with 'e'

* Add known suffixes for nouns and verbs

* Revert previous commits to further rebase

* Add known suffixes for nouns and verbs

* Replace the whole list of known plurals by part_of_speech.

* Only trims ending pipe

* Check if key term is set and is an array

* Make suffix doc more clear

* Add unit tests for suffixes

* Fix tests descriptions

* Rename variables for a generic approach for nouns and verbs

* Improve list of Nouns formed by verbs with the suffix -ion

* Move 'changes' to the pairs ending => change

* Improve suffixes changes documentation

* Test verbs that form nouns ending with -tion

* Test verbs that form nouns ending with -sion

* Remove unnecessary code

* Fix endings documentation and some formatting

* Apply coding standards

* Add link to the plurals formation documentation

* Improve suffix comment

* Don't duplicate endings to glossary entry suffixes.

* Combine the suffixes for shorter regular expression. (#1651)

* Combine the suffixes for shorter regular expression.

* Whitespace

* Add unit tests

---------

Co-authored-by: Alex Kirk <[email protected]>
# Conflicts:
#	tests/phpunit/testcases/test_template_helper_functions.php

* We don't need it anymore because the match checks what precedes the ending

* Fix part of speech for its own set of endings

* Make the Suffix list filterable.

* Rename filter and improve description

* Fix typo in tests part_of_speech introduced in PR #1706

* Remove matching for irregular Nouns ending with '-an'

* Improve legibility and consistency of dockblocks comments

* Add support for verbs ending with single sibilant '-s' (bias)

* Fix error in ending size

* Improve glossary matches doc comments

* Change matches array keys order, preceded first

* Use actual preceded for any letters instead of null value.

* Add support for placeholders in the endings, to allow reuse of the current ending

* Add matches that double the ending consonant

* Add check_map_glossary() to test different glossary matches

* Fix comment.

* Add all the glossary matches tests using check_map_glossary()

* Remove irregular Noun plural match

* Add test for verb focus

* Fixed comments

* Add data files for manual testing

* Remove ending newlines in example data file

* Add glossary matching case example in docblock

* Remove glossary suffix .pot and .csv example files

* Add check_map_glossary() docblock types

---------

Co-authored-by: Jesús Amieiro Becerra <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Type] Bug An existing feature is broken.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants