dd32 commented Jul 11, 2023

Problem

Per #1647, the glossary-matching regular expression fails to execute when a glossary contains a large number of items.
This is because regular expressions have a limit on their overall length.

Solution

This solution is not ideal; ideally, this should not be using a regular expression at all.

It simply regroups the regex so that each suffix pattern is included only once, after the group of terms that share it, rather than individually on every single term.
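The grouping idea can be sketched as follows. This is a minimal illustration in Python, not the GlotPress PHP code: the function names `build_per_term` and `build_grouped` and the `terms` mapping are hypothetical, and the terms are assumed to be regex-safe already.

```python
from collections import defaultdict

def build_per_term(terms):
    """Old shape: every term carries its own copy of the suffix pattern."""
    return r"\b(" + "|".join(term + suffix for term, suffix in terms.items()) + r")\b"

def build_grouped(terms):
    """New shape: each distinct suffix pattern appears once, after its group of terms."""
    groups = defaultdict(list)
    for term, suffix in terms.items():
        groups[suffix].append(term)
    parts = ["(?:" + "|".join(group) + ")" + suffix for suffix, group in groups.items()]
    return r"\b(" + "|".join(parts) + r")\b"

# Hypothetical input: term -> suffix pattern ("" means no suffixes apply).
SUFFIX = "(?:s|es|ed|ing)?"
terms = {"favorited": SUFFIX, "favorites": SUFFIX, "zip code": "", "url": SUFFIX}

per_term = build_per_term(terms)
grouped = build_grouped(terms)
print(len(per_term), per_term)
print(len(grouped), grouped)
```

The saving grows with the number of terms that share a suffix pattern, since the pattern is paid for once per group instead of once per term.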

For Translate.WordPress.org glossaries, it reduces the regex length as follows:

  • en_AU: 6,709 down to 2,765 chars.
  • zh_TW: 33,362 down to 16,083 chars.
  • de_DE: 8,846 down to 4,225 chars.

Roughly a 50-60% decrease in length.
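As a quick arithmetic check of that range, the figures listed above can be plugged into a few lines of Python. The helper name `percent_decrease` is just for illustration.

```python
# (before, after) regex lengths in characters, in the order listed above.
lengths = [(6709, 2765), (33362, 16083), (8846, 4225)]

def percent_decrease(before, after):
    """Relative reduction in length, as a percentage rounded to one decimal."""
    return round(100 * (before - after) / before, 1)

reductions = [percent_decrease(before, after) for before, after in lengths]
for (before, after), pct in zip(lengths, reductions):
    print(f"{before} -> {after}: {pct}% shorter")
```

All three reductions land between 50% and 60%.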

Initially there were no logical changes I could see, but now that I'm PR'ing it, I can see that there's a "small" change - the terms are no longer sorted by length, as they're only sorted within their "suffix group".
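Why the ordering can matter: a regex alternation takes the first alternative that matches, not the longest one. The sketch below uses the hypothetical terms "zip" and "zip code" (they are not taken from a real glossary) to show how a shorter term in an earlier suffix group can shadow a longer term in a later group.

```python
import re

text = "Enter your zip code"

# Sorted by length across all terms: the longer term is tried first and wins.
longest_first = re.compile(r"\b(zip code|zip)\b")

# Grouped by suffix: "zip" sits in the suffix group that happens to come
# first, so it is tried before "zip code" and shadows it.
grouped = re.compile(r"\b((?:zip)(?:s|es|ed|ing)?|(?:zip code))\b")

print(longest_first.findall(text))  # ['zip code']
print(grouped.findall(text))        # ['zip']
```

Whether this shows up in practice depends on the glossary containing terms that are prefixes of other terms with a different suffix pattern.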

Test Case Regex
Before \b(favorited(?:s|es|ed|ing)?|favorites(?:s|es|ed|ing)?|zip code|url(?:s|es|ed|ing)?)\b
After \b((?:favorited|favorites|url)(?:s|es|ed|ing)?|(?:zip code))\b

(Yes, those suffixes are wildly inaccurate, but that's not the purpose of this issue/pr)
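A small check that the two shapes match the same terms, sketched in Python's `re` module rather than PHP's PCRE. The "After" pattern is written here with balanced parentheses, and the sample sentence and the case-insensitive flag are assumptions made for the illustration.

```python
import re

before = re.compile(
    r"\b(favorited(?:s|es|ed|ing)?|favorites(?:s|es|ed|ing)?|zip code|url(?:s|es|ed|ing)?)\b",
    re.IGNORECASE,
)
after = re.compile(
    r"\b((?:favorited|favorites|url)(?:s|es|ed|ing)?|(?:zip code))\b",
    re.IGNORECASE,
)

sample = "Enter your zip code, then check the URLs you favorited."

matches_before = before.findall(sample)
matches_after = after.findall(sample)
print(matches_before)
print(matches_after)
```

Both patterns find the same three terms in this sentence; it is a spot check on one input, not a proof of equivalence.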

To-do

  • See if there are unit tests, and/or if anyone else wants to add them.
  • See if anyone wants to write a proper solution not using regex.

Testing Instructions

  1. Apply patch
  2. View Translations
  3. Ensure glossary items are matched

akirk left a comment

Works well, I've added unit tests.

@akirk akirk merged commit d93b07d into GlotPress:develop Jul 11, 2023
pedro-mendonca added a commit to pedro-mendonca/GlotPress that referenced this pull request Aug 26, 2023
* Combine the suffixes for shorter regular expression.

* Whitespace

* Add unit tests

---------

Co-authored-by: Alex Kirk <[email protected]>
# Conflicts:
#	tests/phpunit/testcases/test_template_helper_functions.php
amieiro added a commit that referenced this pull request Jan 10, 2024
* Add 'ed', 'ion', and 'ing' to suffix list for glossary term ending with 'e'

* Add known suffixes for nouns and verbs

* Revert previous commits to further rebase

* Add known suffixes for nouns and verbs

* Replace the whole list of known plurals by part_of_speech.

* Only trims ending pipe

* Check if key term is set and is an array

* Make suffix doc more clear

* Add unit tests for suffixes

* Fix tests descriptions

* Rename variables for a generic approach for nouns and verbs

* Improve list of Nouns formed by verbs with the suffix -ion

* Move 'changes' to the pairs ending => change

* Improve suffixes changes documentation

* Test verbs that form nouns ending with -tion

* Test verbs that form nouns ending with -sion

* Remove unnecessary code

* Fix endings documentation and some formatting

* Apply coding standards

* Add link to the plurals formation documentation

* Improve suffix comment

* Don't duplicate endings to glossary entry suffixes.

* Combine the suffixes for shorter regular expression. (#1651)

* Combine the suffixes for shorter regular expression.

* Whitespace

* Add unit tests

---------

Co-authored-by: Alex Kirk <[email protected]>
# Conflicts:
#	tests/phpunit/testcases/test_template_helper_functions.php

* We don't need it anymore because the match checks what precedes the ending

* Fix part of speech for its own set of endings

* Make the Suffix list filterable.

* Rename filter and improve description

* Fix typo in tests part_of_speech introduced in PR #1706

* Remove matching for irregular Nouns ending with '-an'

* Improve legibility and consistency of docblock comments

* Add support for verbs ending with single sibilant '-s' (bias)

* Fix error in ending size

* Improve glossary matches doc comments

* Change matches array keys order, preceded first

* Use actual preceded for any letters instead of null value.

* Add support for placeholders in the endings, to allow reuse of the current ending

* Add matches that double the ending consonant

* Add check_map_glossary() to test different glossary matches

* Fix comment.

* Add all the glossary matches tests using check_map_glossary()

* Remove irregular Noun plural match

* Add test for verb focus

* Fixed comments

* Add data files for manual testing

* Remove ending newlines in example data file

* Add glossary matching case example in docblock

* Remove glossary suffix .pot and .csv example files

* Add check_map_glossary() docblock types

---------

Co-authored-by: Jesús Amieiro Becerra <[email protected]>