Improved I18n::translateCount() method #2432

afbora · 2020-02-08T14:35:18Z

Describe the PR

Support for translation callbacks in I18n::translateCount()
Support for an arbitrary number of count strings per translation in I18n::translateCount()
I18n::translateCount() now replaces the {{ count }} even if just a single translation string is defined

Related issues

Fixes I18n::translateCount() does not work with languages with more than 3 cardinal forms #2430
Fixes I18n::translateCount() doesn't replace {{ count }} for single-string values #2431

Ready?

Added unit tests for fixed bug/feature
Passing all unit tests
Fixed code style issues with CS fixer and composer fix
Added in-code documentation (if needed)

neildaniels · 2020-02-09T04:42:40Z

This is not how I would suggest addressing #2430.

In the case of a Irish, for example, you would need to define literally 11 different forms (one for 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11) to cover all cases, but only five should be necessary.

In the case of Polish, there are 4 rules:

1
ends in 2, 3, 4; excluding 12-14
every other integer
any decimal
(Though, Kirby’s current number format only accepts integers, so the last rule is somewhat irrelevant.)

In this case, the current mechanics would make it infeasible to cover every number, such as `544B, which should use the second rule.

For maximum compatibility, I would suggest that translation strings should take an associative array that can have the following keys:

zero
one
two
few
many
other

These are taken from Unicode’s Language Plural Rules.

Polish example:

months:
  one: {{ count }} miesiąc
  few: {{ count }} miesiące
  many: {{ count }} miesięcy
  other: {{ count }} miesiąca # this deals with decimals, and is thus kind of irrelevant today because Kirby’s only accepts integers

My background is from an iOS programming, and the localization toolkit on Apple platforms already has the pluralization rules built into the locales so that feeding it knows which key to select. This is what ngettext() is built to do.

I suspect the rules are already available on most PHP system, but if they’re not, there will need to be some kind of function (per Kirby language) that would accept the number to be formatted and that function would return a key such as one, few, many, other, etc.

function polishPlural($n) {
    switch ($n) {
          case is_float($n):
               return 'other';
          case 1:
               return 'one';
          case (in_array($n % 10, [2, 3, 4]) && !in_array($n, [12, 13, 14])):
               return 'few';
          default:
               return 'many';
    }
}

afbora · 2020-02-09T07:05:57Z

Thank you for all details. I understand, for now I wrote the simpler and more comprehensive first version already as a draft. If really needed, more advanced method will be written according to your detailed explanations.

afbora · 2020-02-16T15:14:27Z

@lukasbestle What do you think about that?

lukasbestle

I'd say this is already a huge improvement over the old implementation and still fully backwards-compatible. I like that it is still pretty simple.

ngettext() would require us to rewrite basically the whole translation management and a custom mapping per language would be quite complex and would require the users to actually write those mapping rules even for languages that don't have as complex counting rules.

We could still implement optional custom mapping rules on top like @neildaniels suggested (so that the rules would be used if they are defined and otherwise the key would be determined by the number itself). Basically the pseudo-code would look like this:

if (translation has 'countMapping' key and 'countMapping' is PHP Closure) {
  $count = $countMapping($count);
}

Still pretty simple.

neildaniels · 2020-02-19T21:37:59Z

If using the Mozilla documentation as a guide (which is based off the GNU gettext documentation), you'll note that there are only 20 distinct "rules" that sould ever need to be implemented (Rule 0 through Rule 19).

The GNU documentation and Unicode table provide pseudocode (or, actual C code) for these rules (they don't seem to perfectly match up with the Mozilla docs though).

Would it be reasonable to have Kirby include support for all 20 rules?

lukasbestle · 2020-02-20T06:30:09Z

The idea is to support any number of rules the dev needs. You can return any string from the countMapping callback and that string will determine the plural form to use.

What we can‘t do is to ship a list of all possible languages and their plural rules with Kirby – that would be just too much data for this little feature. That‘s why I think that the custom callback approach is a good and very flexible compromise.

neildaniels · 2020-02-20T06:45:01Z

Yes, but what I’m saying is there’s only 20 possible plural rules across all the languages.

In other words: there will only ever be 20 possible correct implementations of the callback method. And each of those implementations have fairly trivial rules.

What I’m not suggesting is mapping each language to one of those 20 rules.

afbora · 2020-02-20T07:40:37Z

@neildaniels Yes that would be great. When i think again, this seems to me to be a very comprehensive development for a simple feature.

I looked at what the huge framework like Laravel uses. The structure it uses is quite simple:
https://laravel.com/docs/6.x/localization#pluralization

'minutes_ago' => '{1} :value minute ago|[2,*] :value minutes ago',

echo trans_choice('time.minutes_ago', 5, ['value' => 5]);

It is already quite comprehensive now.
With @lukasbestle proposed callback, developers will be able to extend as they wish.
I think this is a great idea.

lukasbestle · 2020-02-20T08:06:33Z

The issue with implementing the 20 rules is that we can‘t assume that the users know what those rules are for, how they work and which one to choose for their languages. A callback that allows a custom mapping is a bit more work for experienced devs to implement, but easier to understand in general.

afbora · 2020-02-20T09:04:14Z

@lukasbestle I preferred to use direct callback instead of countMapping translation key. How does it look?

nilshoerrmann · 2020-02-20T09:08:05Z

The issue with implementing the 20 rules is that we can‘t assume that the users know what those rules are for, how they work and which one to choose for their languages.

@lukasbestle: But this is something the user has to learn nonetheless, even if there are only 3 rules (which is a very western assumption, by the way). Having a callback with custom mapping is a nightmare for anyone who is not a programmer.

The full Laravel syntax in the linked docs is very nice by the way, @afbora.

#2432

afbora · 2020-02-20T09:16:41Z

@nilshoerrmann Yes, I agree that the callback feature is not easy for non-developers.
But for non-developers, we have an easy syntax, if not as much as Laravel.

'apply' => [
    'No Apple',
    '{{ count }} Apple',
    '{{ count }} Apples',
    'Many Apples'
]

nilshoerrmann · 2020-02-20T09:31:25Z

@afbora I know, but I was referring to the extended plural options discussed in this thread. The above example can only deal with the variants required by most Western European languages. I just wanted to support @neildaniels's request to support all those 20 rules.

lukasbestle · 2020-02-20T11:28:17Z

The issue I see with the 20 rules is that they don‘t have names and it is therefore not obvious if the language you are working with needs rule 4 or 18 or whatever. That‘s an unnecessary barrier unlike a flexible implementation that is potentially more effort, but can be configured exactly like the user needs it.

lukasbestle

I like the simplicity of this. However this means that every single translation key would need to be implemented as a Closure. This is an issue if the translations are imported from a translation tool. The code would have to be written by hand for each translation.

I think we could still keep the direct callback as another option for those who want to use it. But we should still have support for a global mapping callback.

Basically it would work like this:

If the translation itself is a Closure, use its return value directly and ignore the mapping callback (like in your current implementation).
Otherwise check if a mapping callback is defined and use its return value as the $count.
Otherwise use the $count directly.

lukasbestle · 2020-02-23T11:25:51Z

Another idea: We could have a public static $countMappings = [] array that could be extended by plugins (associative array with named keys and Closure values). The countMapping key of each language could then also be set to a string with the mapping rule key from this array.

There could then be a third-party plugin that implements the 20 language rules. What do you think @nilshoerrmann @neildaniels?

afbora · 2020-02-23T13:36:51Z

@lukasbestle I tried to implement your global mapping idea 👍

neildaniels · 2020-03-03T18:44:13Z

Another suggestion:

There should be automatic handling that maps 0 to being zero. For a vast majority of languages, there is no strict need for a special 0 case (such as in English), but the 0 case is functionally useful to call out because it's inherently nothing (e.g. "No cars", as opposed to "0 cars").

While a custom countMappings closure could provide a zero case, it'll end up yielding incorrect results for certain languages if you don't want a special zero case.

Take French for example, there are two cases:

'one': which covers both 0 and 1
- e.g.: "0 jour", "1 jour"
'other': which covers everything else
- e.g.: "2 jours", "55 jours"

For example, given this translation setup for French:

'car' => [
     'zero' => 'pas de voitures',
     'one' => '{{ count }} voiture',
     'other' => '{{ count }} voitures'
 ],
'day' => [
     'one' => '{{ count }} jour',
     'other' => '{{ count }} jours'
 ],
 'countMapping' => function ($count) {
     switch ($count) {
         case 0:
             return 'zero';
         case 1:
             return 'one';
         default:
             return 'other';
     }
 }

It'll end up yielding the wrong translation for the "0 days" case:

echo tc('car', 0); // prints "pas de voitures"
echo tc('day', 0); // prints "0 jours", but should be "0 jour"

I would suggest confining the countMapping to what is strictly necessary for the language, and letting zero be a special case.

This is how Apple's platforms handle translations, and I that's worked very well in my experience.

Thus, I would suggest this:

'car' => [
     'zero' => 'pas de voitures',
     'one' => '{{ count }} voiture',
     'other' => '{{ count }} voitures'
 ],
'day' => [
     'one' => '{{ count }} jour',
     'other' => '{{ count }} jours'
 ],
 'countMapping' => function ($count) {
     switch ($count) {
         case 0:
         case 1:
             return 'one';
         default:
             return 'other';
     }
 }

which would print correctly (assuming zero is specially handled):

echo tc('car', 0); // prints "pas de voitures"
echo tc('day', 0); // prints "0 jour"

lukasbestle · 2020-03-04T20:35:12Z

@neildaniels To be honest, I'm still not convinced that we need to constrain the user and reduce the flexibility by implementing a fixed set of options and even special cases.

If you follow our way of doing things, you will know that we strive to make Kirby features as simple as possible. Kirby is not a huge framework made up of specialized components with dozens of options for the tiniest edge-cases – instead it's a tool made up of simple building blocks you can extend yourself if needed. The reason for this goal is that we want Kirby to be simple for beginners and powerful for advanced users at the same time.

What you are describing in your last two comments and also during our discussion is behavior that actually needs documentation. Our approach is different: We try to build features in a way that can be understood as quickly as possible with as minimal external information as needed. If you had to actually study how each feature works in detail before you can get started at all, it wouldn't be Kirby anymore.

Your suggestions have presuppositions built in (the naming, the zero fallback etc.) – we want to avoid that as much as possible.

neildaniels · 2020-03-04T22:02:17Z

I'm merely giving feedback as someone who has had experience working on internationalization projects on other platforms, including working with professional translators.

The suggestions I'm giving are genuinely rooted in how I've seen many people approach localization/translations. Lots of software and translators have adopted Unicode's keywords of zero, one, two, few, many, and other. It works.

I appreciate Kirby's general attitude of "constrain as little as possible," but I don't necessarily agree that internationalization is an area where that is actually beneficial. There is not an infinite number of languages, nor are there an infinite number of language rules. I cannot imagine a scenario where someone would ever take advantage of arbitrary plural rules—other than to avoid reading documentation and hope they get it right. Practically speaking, this isn't an area of "creative freedom" that should be left up developers.

The nuances of languages inherently make this kind of work tricky. Yes, there are edge cases. But because of that—in my mind—that's exactly why a framework should be able to accommodate and document those edge cases. If plurals were actually straightforward, Unicode wouldn't have an entire document talking about plurals.

My original issue #2430 was rooted in the fact that Kirby's current system legitimately did not work for certain languages. I think that's clear, and thankfully that strictly has been fixed with this PR. Thank you @afbora for this—I appreciate it.

I think we need to accept that there will be necessary documentation to use this feature. If my suggestion about other being the default is not accepted, then the fact that other should be the last key is—to me—something that must be documented.

@neildaniels In fact, translations and countMapping() is a user-controlled area. If user sets the last index to other it will return as correct.

I don't think many developers actually want to think much about localization more than they have to (especially if they're unfamiliar with the languages). I also don't think many professional translators want to have to worry about the format they give their translations in.

If translations were something that were entirely and always up to a developer to do, I think I could get behind letting Kirby do as little as possible. But I suspect the reality is: most developers won't be the ones do the translations and when they get an email about Language X has some "weird" rules about plurals, they're going to either:

Not notice that Kirby even has a translateCount() method at all, and come up with some workaround.
See that it exists, but be discouraged because they need to figure out what the rules are and have to write them.

I think having documentation (that gives a primer about why this method actually exists and points to Unicode's Plural Rules) isn't a concession or sacrifice. It's a strong hint that thought went into providing a useful implementation that doesn't require you to worry about the edge cases.

I will agree that the zero case is not strictly necessary for correctness, but you do sacrifice the ability to have this practical feature for some languages. It's a feature that makes a great translation system shine.

My takeaway from some of the comments is that I should just go build my own utopian function that does everything I want. That utopian version would have like 3 more lines of code and have the language rules baked into it. But, if that's the outcome, so be it.

lukasbestle · 2020-03-05T06:38:12Z

Don‘t get me wrong: There should definitely be documentation for this feature (and there will be). 🙂

@bastianallgeier @distantnative What do you think about the different ways of implementing this?

neildaniels · 2020-03-05T08:49:37Z

As an aside: I went through Unicode's list and made PHP implementations for nearly every language.

It's missing 3 languages: lv, prg, si.
- I can't confidently interpret the rules for lv or prg. I wouldn't want a developer trying to simply use a set of translations to have to worry about having to do this themselves.
A number of decimal checks misuse fmod() and therefor don't work correctly. I need to correct this.
- I misunderstood how fmod() worked: I incorrectly assumed something like fmod(12.34, 10) would return 4 (treating the fractional amount as if it was an integer, and ignoring the actual integer part); it actually returns 2.34.
There's probably some other errors.

https://gist.github.com/neildaniels/abbbe67519d33509d3ef85744eefc0ee

I think Kirby could integrate in these rules and free developers from ever having to write this boilerplate code. But, even if they did, you'd end up with basically the same logic.

Also: consider if a plugin starts to offer translations with plurals variations. Even with this current PR, you have to hope that the plugin offering the translations will be compatible with a site's plural rules closure. In that sense, there necessarily will become a de facto way to handle plurals.

distantnative · 2020-03-05T09:36:24Z

I'm trying to take a step back here:
I think that the whole i18n classes part of Kirby isn't great at the moment. And we could offer better support there for pluralization etc.

But should we really reinvent the wheel for all of this? I would think that this could be a very useful case for the few times where we take a good library on board. So I would be much more interested in figuring out if there is a good library that fulfils our/all these requirements.

afbora · 2020-03-09T08:35:41Z

If i18n will be refactored and this will not happen anytime soon, we can take the first commit of PR as step one 🤔

nilshoerrmann · 2020-03-11T09:26:10Z

Is everyone aware of this plugin by Oblik Studio:
https://github.com/OblikStudio/kirby-plurals

neildaniels · 2020-03-11T21:35:57Z

I was not, but that looks perfect.

lukasbestle · 2020-03-19T20:55:08Z

@nilshoerrmann Oh, that's awesome! I didn't know there was already a plugin for this. A plugin makes a lot more sense in my opinion.

So maybe we should simplify this PR by reducing it to the useful smaller improvements. Which would mean: Remove the count mapping stuff, but keep the rest (translation callbacks, support for the count placeholder for simple strings and flexible number of keys). What do you think @neildaniels @afbora?

afbora · 2020-03-19T21:06:51Z

@lukasbestle #2432 (comment) As I said before, I think we should continue by simplifying it like your said 👍. We can take the first two commits.

lukasbestle · 2020-03-19T21:18:26Z

Some of the improvements in the last commit still apply. I will update and finalize the PR this weekend. :)

neildaniels · 2020-03-20T03:11:17Z

Sounds good to me.

bastianallgeier · 2020-03-27T16:12:38Z

Sorry for skipping the long discussion. What's the state of this? @lukasbestle?

distantnative · 2020-03-28T10:35:21Z

@bastianallgeier A general disagreement about how much of singular-plural rules for all languages should be integrated into the core, whether we should just offer a general callback method, or whether this should rather be handled by a good library (maybe as part of a general i18n refactoring).

lukasbestle · 2020-03-29T20:13:22Z

No, I think we have agreement now. I just haven't found the time so far to finalize this. But it's on my list for the next days.

lukasbestle

This PR now only includes the minor improvements we have discussed. All more complex features are covered by the kirby-plurals plugin.

nilshoerrmann · 2020-04-05T17:05:10Z

Maybe it would be interesting to ask @OblikStudio for a cookbook entry so the plugin gets a bit more coverage?

afbora added type: enhancement ✨ labels Feb 8, 2020

afbora requested review from bastianallgeier, distantnative and lukasbestle February 8, 2020 14:35

afbora self-assigned this Feb 8, 2020

afbora added the needs: discussion 🗣 Requires further discussion to proceed label Feb 16, 2020

lukasbestle reviewed Feb 19, 2020

View reviewed changes

afbora removed the needs: discussion 🗣 Requires further discussion to proceed label Feb 20, 2020

afbora added a commit that referenced this pull request Feb 20, 2020

Apply CSFixer

bfba737

#2432

afbora removed the missing: test 🔍 label Feb 20, 2020

afbora marked this pull request as ready for review February 20, 2020 13:53

lukasbestle reviewed Feb 22, 2020

View reviewed changes

lukasbestle force-pushed the fix/2430-improve-translateCount-method branch from f7e21db to e1f9089 Compare February 23, 2020 14:26

distantnative removed the status: to review 👀 label Mar 5, 2020

distantnative removed this from the 3.3.5 milestone Mar 5, 2020

distantnative added the needs: discussion 🗣 Requires further discussion to proceed label Mar 5, 2020

afbora removed the needs: discussion 🗣 Requires further discussion to proceed label Mar 31, 2020

lukasbestle added this to the 3.4.0 milestone Apr 5, 2020

Improved I18n::translateCount() method #2430 #2431

b8634b5

lukasbestle dismissed their stale review via b8634b5 April 5, 2020 14:31

lukasbestle force-pushed the fix/2430-improve-translateCount-method branch from e1f9089 to b8634b5 Compare April 5, 2020 14:31

lukasbestle changed the title ~~Improved translateCount() method~~ Improved I18n::translateCount() method Apr 5, 2020

lukasbestle approved these changes Apr 5, 2020

View reviewed changes

Merge branch 'develop' into fix/2430-improve-translateCount-method

4ee2a8d

bastianallgeier merged commit 1b5c0c4 into develop May 4, 2020

bastianallgeier deleted the fix/2430-improve-translateCount-method branch May 4, 2020 12:57

Uh oh!

Improved I18n::translateCount() method #2432

Improved I18n::translateCount() method #2432

Uh oh!

Conversation

afbora commented Feb 8, 2020 • edited by lukasbestle Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe the PR

Related issues

Ready?

Uh oh!

neildaniels commented Feb 9, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

afbora commented Feb 9, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

afbora commented Feb 16, 2020

Uh oh!

lukasbestle left a comment

Choose a reason for hiding this comment

Uh oh!

neildaniels commented Feb 19, 2020

Uh oh!

lukasbestle commented Feb 20, 2020

Uh oh!

neildaniels commented Feb 20, 2020

Uh oh!

afbora commented Feb 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lukasbestle commented Feb 20, 2020

Uh oh!

afbora commented Feb 20, 2020

Uh oh!

nilshoerrmann commented Feb 20, 2020

Uh oh!

afbora commented Feb 20, 2020

Uh oh!

nilshoerrmann commented Feb 20, 2020

Uh oh!

lukasbestle commented Feb 20, 2020

Uh oh!

lukasbestle left a comment

Choose a reason for hiding this comment

Uh oh!

lukasbestle commented Feb 23, 2020

Uh oh!

afbora commented Feb 23, 2020

Uh oh!

neildaniels commented Mar 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lukasbestle commented Mar 4, 2020

Uh oh!

neildaniels commented Mar 4, 2020

Uh oh!

lukasbestle commented Mar 5, 2020

Uh oh!

neildaniels commented Mar 5, 2020

Uh oh!

distantnative commented Mar 5, 2020

Uh oh!

afbora commented Mar 9, 2020

Uh oh!

nilshoerrmann commented Mar 11, 2020

Uh oh!

neildaniels commented Mar 11, 2020

Uh oh!

lukasbestle commented Mar 19, 2020

Uh oh!

afbora commented Mar 19, 2020

Uh oh!

lukasbestle commented Mar 19, 2020

Uh oh!

neildaniels commented Mar 20, 2020

Uh oh!

bastianallgeier commented Mar 27, 2020

Uh oh!

distantnative commented Mar 28, 2020

afbora commented Feb 8, 2020 •

edited by lukasbestle

Loading

neildaniels commented Feb 9, 2020 •

edited

Loading

afbora commented Feb 9, 2020 •

edited

Loading

afbora commented Feb 20, 2020 •

edited

Loading

neildaniels commented Mar 3, 2020 •

edited

Loading