Skip to content

Conversation

@afbora
Copy link
Member

@afbora afbora commented Feb 8, 2020

Describe the PR

  • Support for translation callbacks in I18n::translateCount()
  • Support for an arbitrary number of count strings per translation in I18n::translateCount()
  • I18n::translateCount() now replaces the {{ count }} even if just a single translation string is defined

Related issues

Ready?

  • Added unit tests for fixed bug/feature
  • Passing all unit tests
  • Fixed code style issues with CS fixer and composer fix
  • Added in-code documentation (if needed)

@neildaniels
Copy link
Contributor

neildaniels commented Feb 9, 2020

This is not how I would suggest addressing #2430.

In the case of a Irish, for example, you would need to define literally 11 different forms (one for 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11) to cover all cases, but only five should be necessary.

In the case of Polish, there are 4 rules:

  • 1
  • ends in 2, 3, 4; excluding 12-14
  • every other integer
  • any decimal
    (Though, Kirby’s current number format only accepts integers, so the last rule is somewhat irrelevant.)

In this case, the current mechanics would make it infeasible to cover every number, such as `544B, which should use the second rule.

For maximum compatibility, I would suggest that translation strings should take an associative array that can have the following keys:

  • zero
  • one
  • two
  • few
  • many
  • other

These are taken from Unicode’s Language Plural Rules.

Polish example:

months:
  one: {{ count }} miesiąc
  few: {{ count }} miesiące
  many: {{ count }} miesięcy
  other: {{ count }} miesiąca # this deals with decimals, and is thus kind of irrelevant today because Kirby’s only accepts integers

My background is from an iOS programming, and the localization toolkit on Apple platforms already has the pluralization rules built into the locales so that feeding it knows which key to select. This is what ngettext() is built to do.

I suspect the rules are already available on most PHP system, but if they’re not, there will need to be some kind of function (per Kirby language) that would accept the number to be formatted and that function would return a key such as one, few, many, other, etc.

function polishPlural($n) {
    switch ($n) {
          case is_float($n):
               return 'other';
          case 1:
               return 'one';
          case (in_array($n % 10, [2, 3, 4]) && !in_array($n, [12, 13, 14])):
               return 'few';
          default:
               return 'many';
    }
}

@afbora
Copy link
Member Author

afbora commented Feb 9, 2020

Thank you for all details. I understand, for now I wrote the simpler and more comprehensive first version already as a draft. If really needed, more advanced method will be written according to your detailed explanations.

@afbora
Copy link
Member Author

afbora commented Feb 16, 2020

@lukasbestle What do you think about that?

@afbora afbora added the needs: discussion 🗣 Requires further discussion to proceed label Feb 16, 2020
Copy link
Member

@lukasbestle lukasbestle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say this is already a huge improvement over the old implementation and still fully backwards-compatible. I like that it is still pretty simple.

ngettext() would require us to rewrite basically the whole translation management and a custom mapping per language would be quite complex and would require the users to actually write those mapping rules even for languages that don't have as complex counting rules.

We could still implement optional custom mapping rules on top like @neildaniels suggested (so that the rules would be used if they are defined and otherwise the key would be determined by the number itself). Basically the pseudo-code would look like this:

if (translation has 'countMapping' key and 'countMapping' is PHP Closure) {
  $count = $countMapping($count);
}

Still pretty simple.

@neildaniels
Copy link
Contributor

If using the Mozilla documentation as a guide (which is based off the GNU gettext documentation), you'll note that there are only 20 distinct "rules" that sould ever need to be implemented (Rule 0 through Rule 19).

The GNU documentation and Unicode table provide pseudocode (or, actual C code) for these rules (they don't seem to perfectly match up with the Mozilla docs though).

Would it be reasonable to have Kirby include support for all 20 rules?

@lukasbestle
Copy link
Member

The idea is to support any number of rules the dev needs. You can return any string from the countMapping callback and that string will determine the plural form to use.

What we can‘t do is to ship a list of all possible languages and their plural rules with Kirby – that would be just too much data for this little feature. That‘s why I think that the custom callback approach is a good and very flexible compromise.

@neildaniels
Copy link
Contributor

Yes, but what I’m saying is there’s only 20 possible plural rules across all the languages.

In other words: there will only ever be 20 possible correct implementations of the callback method. And each of those implementations have fairly trivial rules.

What I’m not suggesting is mapping each language to one of those 20 rules.

@afbora
Copy link
Member Author

afbora commented Feb 20, 2020

@neildaniels Yes that would be great. When i think again, this seems to me to be a very comprehensive development for a simple feature.

I looked at what the huge framework like Laravel uses. The structure it uses is quite simple:
https://laravel.com/docs/6.x/localization#pluralization

'minutes_ago' => '{1} :value minute ago|[2,*] :value minutes ago',

echo trans_choice('time.minutes_ago', 5, ['value' => 5]);

It is already quite comprehensive now.
With @lukasbestle proposed callback, developers will be able to extend as they wish.
I think this is a great idea.

@lukasbestle
Copy link
Member

The issue with implementing the 20 rules is that we can‘t assume that the users know what those rules are for, how they work and which one to choose for their languages. A callback that allows a custom mapping is a bit more work for experienced devs to implement, but easier to understand in general.

@afbora
Copy link
Member Author

afbora commented Feb 20, 2020

@lukasbestle I preferred to use direct callback instead of countMapping translation key. How does it look?

@afbora afbora removed the needs: discussion 🗣 Requires further discussion to proceed label Feb 20, 2020
@nilshoerrmann
Copy link
Contributor

The issue with implementing the 20 rules is that we can‘t assume that the users know what those rules are for, how they work and which one to choose for their languages.

@lukasbestle: But this is something the user has to learn nonetheless, even if there are only 3 rules (which is a very western assumption, by the way). Having a callback with custom mapping is a nightmare for anyone who is not a programmer.

The full Laravel syntax in the linked docs is very nice by the way, @afbora.

afbora added a commit that referenced this pull request Feb 20, 2020
@afbora
Copy link
Member Author

afbora commented Feb 20, 2020

@nilshoerrmann Yes, I agree that the callback feature is not easy for non-developers.
But for non-developers, we have an easy syntax, if not as much as Laravel.

'apply' => [
    'No Apple',
    '{{ count }} Apple',
    '{{ count }} Apples',
    'Many Apples'
]

@nilshoerrmann
Copy link
Contributor

@afbora I know, but I was referring to the extended plural options discussed in this thread. The above example can only deal with the variants required by most Western European languages. I just wanted to support @neildaniels's request to support all those 20 rules.

@lukasbestle
Copy link
Member

The issue I see with the 20 rules is that they don‘t have names and it is therefore not obvious if the language you are working with needs rule 4 or 18 or whatever. That‘s an unnecessary barrier unlike a flexible implementation that is potentially more effort, but can be configured exactly like the user needs it.

@afbora afbora marked this pull request as ready for review February 20, 2020 13:53
Copy link
Member

@lukasbestle lukasbestle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the simplicity of this. However this means that every single translation key would need to be implemented as a Closure. This is an issue if the translations are imported from a translation tool. The code would have to be written by hand for each translation.

I think we could still keep the direct callback as another option for those who want to use it. But we should still have support for a global mapping callback.

Basically it would work like this:

  • If the translation itself is a Closure, use its return value directly and ignore the mapping callback (like in your current implementation).
  • Otherwise check if a mapping callback is defined and use its return value as the $count.
  • Otherwise use the $count directly.

@lukasbestle
Copy link
Member

Another idea: We could have a public static $countMappings = [] array that could be extended by plugins (associative array with named keys and Closure values). The countMapping key of each language could then also be set to a string with the mapping rule key from this array.

There could then be a third-party plugin that implements the 20 language rules. What do you think @nilshoerrmann @neildaniels?

@afbora
Copy link
Member Author

afbora commented Feb 23, 2020

@lukasbestle I tried to implement your global mapping idea 👍

@lukasbestle lukasbestle force-pushed the fix/2430-improve-translateCount-method branch from f7e21db to e1f9089 Compare February 23, 2020 14:26
@neildaniels
Copy link
Contributor

neildaniels commented Mar 3, 2020

Another suggestion:

There should be automatic handling that maps 0 to being zero. For a vast majority of languages, there is no strict need for a special 0 case (such as in English), but the 0 case is functionally useful to call out because it's inherently nothing (e.g. "No cars", as opposed to "0 cars").

While a custom countMappings closure could provide a zero case, it'll end up yielding incorrect results for certain languages if you don't want a special zero case.

Take French for example, there are two cases:

  • 'one': which covers both 0 and 1
    • e.g.: "0 jour", "1 jour"
  • 'other': which covers everything else
    • e.g.: "2 jours", "55 jours"

For example, given this translation setup for French:

'car' => [
     'zero' => 'pas de voitures',
     'one' => '{{ count }} voiture',
     'other' => '{{ count }} voitures'
 ],
'day' => [
     'one' => '{{ count }} jour',
     'other' => '{{ count }} jours'
 ],
 'countMapping' => function ($count) {
     switch ($count) {
         case 0:
             return 'zero';
         case 1:
             return 'one';
         default:
             return 'other';
     }
 }

It'll end up yielding the wrong translation for the "0 days" case:

echo tc('car', 0); // prints "pas de voitures"
echo tc('day', 0); // prints "0 jours", but should be "0 jour"

I would suggest confining the countMapping to what is strictly necessary for the language, and letting zero be a special case.

This is how Apple's platforms handle translations, and I that's worked very well in my experience.


Thus, I would suggest this:

'car' => [
     'zero' => 'pas de voitures',
     'one' => '{{ count }} voiture',
     'other' => '{{ count }} voitures'
 ],
'day' => [
     'one' => '{{ count }} jour',
     'other' => '{{ count }} jours'
 ],
 'countMapping' => function ($count) {
     switch ($count) {
         case 0:
         case 1:
             return 'one';
         default:
             return 'other';
     }
 }

which would print correctly (assuming zero is specially handled):

echo tc('car', 0); // prints "pas de voitures"
echo tc('day', 0); // prints "0 jour"

@lukasbestle
Copy link
Member

@neildaniels To be honest, I'm still not convinced that we need to constrain the user and reduce the flexibility by implementing a fixed set of options and even special cases.

If you follow our way of doing things, you will know that we strive to make Kirby features as simple as possible. Kirby is not a huge framework made up of specialized components with dozens of options for the tiniest edge-cases – instead it's a tool made up of simple building blocks you can extend yourself if needed. The reason for this goal is that we want Kirby to be simple for beginners and powerful for advanced users at the same time.

What you are describing in your last two comments and also during our discussion is behavior that actually needs documentation. Our approach is different: We try to build features in a way that can be understood as quickly as possible with as minimal external information as needed. If you had to actually study how each feature works in detail before you can get started at all, it wouldn't be Kirby anymore.

Your suggestions have presuppositions built in (the naming, the zero fallback etc.) – we want to avoid that as much as possible.

@neildaniels
Copy link
Contributor

I'm merely giving feedback as someone who has had experience working on internationalization projects on other platforms, including working with professional translators.

The suggestions I'm giving are genuinely rooted in how I've seen many people approach localization/translations. Lots of software and translators have adopted Unicode's keywords of zero, one, two, few, many, and other. It works.

I appreciate Kirby's general attitude of "constrain as little as possible," but I don't necessarily agree that internationalization is an area where that is actually beneficial. There is not an infinite number of languages, nor are there an infinite number of language rules. I cannot imagine a scenario where someone would ever take advantage of arbitrary plural rules—other than to avoid reading documentation and hope they get it right. Practically speaking, this isn't an area of "creative freedom" that should be left up developers.

The nuances of languages inherently make this kind of work tricky. Yes, there are edge cases. But because of that—in my mind—that's exactly why a framework should be able to accommodate and document those edge cases. If plurals were actually straightforward, Unicode wouldn't have an entire document talking about plurals.

My original issue #2430 was rooted in the fact that Kirby's current system legitimately did not work for certain languages. I think that's clear, and thankfully that strictly has been fixed with this PR. Thank you @afbora for this—I appreciate it.

I think we need to accept that there will be necessary documentation to use this feature. If my suggestion about other being the default is not accepted, then the fact that other should be the last key is—to me—something that must be documented.

@neildaniels In fact, translations and countMapping() is a user-controlled area. If user sets the last index to other it will return as correct.

I don't think many developers actually want to think much about localization more than they have to (especially if they're unfamiliar with the languages). I also don't think many professional translators want to have to worry about the format they give their translations in.

If translations were something that were entirely and always up to a developer to do, I think I could get behind letting Kirby do as little as possible. But I suspect the reality is: most developers won't be the ones do the translations and when they get an email about Language X has some "weird" rules about plurals, they're going to either:

  • Not notice that Kirby even has a translateCount() method at all, and come up with some workaround.
  • See that it exists, but be discouraged because they need to figure out what the rules are and have to write them.

I think having documentation (that gives a primer about why this method actually exists and points to Unicode's Plural Rules) isn't a concession or sacrifice. It's a strong hint that thought went into providing a useful implementation that doesn't require you to worry about the edge cases.

I will agree that the zero case is not strictly necessary for correctness, but you do sacrifice the ability to have this practical feature for some languages. It's a feature that makes a great translation system shine.


My takeaway from some of the comments is that I should just go build my own utopian function that does everything I want. That utopian version would have like 3 more lines of code and have the language rules baked into it. But, if that's the outcome, so be it.

@lukasbestle
Copy link
Member

Don‘t get me wrong: There should definitely be documentation for this feature (and there will be). 🙂

@bastianallgeier @distantnative What do you think about the different ways of implementing this?

@neildaniels
Copy link
Contributor

As an aside: I went through Unicode's list and made PHP implementations for nearly every language.

  • It's missing 3 languages: lv, prg, si.
    • I can't confidently interpret the rules for lv or prg. I wouldn't want a developer trying to simply use a set of translations to have to worry about having to do this themselves.
  • A number of decimal checks misuse fmod() and therefor don't work correctly. I need to correct this.
    • I misunderstood how fmod() worked: I incorrectly assumed something like fmod(12.34, 10) would return 4 (treating the fractional amount as if it was an integer, and ignoring the actual integer part); it actually returns 2.34.
  • There's probably some other errors.

https://gist.github.com/neildaniels/abbbe67519d33509d3ef85744eefc0ee

I think Kirby could integrate in these rules and free developers from ever having to write this boilerplate code. But, even if they did, you'd end up with basically the same logic.

Also: consider if a plugin starts to offer translations with plurals variations. Even with this current PR, you have to hope that the plugin offering the translations will be compatible with a site's plural rules closure. In that sense, there necessarily will become a de facto way to handle plurals.

@distantnative
Copy link
Member

I'm trying to take a step back here:
I think that the whole i18n classes part of Kirby isn't great at the moment. And we could offer better support there for pluralization etc.

But should we really reinvent the wheel for all of this? I would think that this could be a very useful case for the few times where we take a good library on board. So I would be much more interested in figuring out if there is a good library that fulfils our/all these requirements.

@distantnative distantnative removed this from the 3.3.5 milestone Mar 5, 2020
@distantnative distantnative added the needs: discussion 🗣 Requires further discussion to proceed label Mar 5, 2020
@afbora
Copy link
Member Author

afbora commented Mar 9, 2020

If i18n will be refactored and this will not happen anytime soon, we can take the first commit of PR as step one 🤔

@nilshoerrmann
Copy link
Contributor

Is everyone aware of this plugin by Oblik Studio:
https://github.com/OblikStudio/kirby-plurals

@neildaniels
Copy link
Contributor

I was not, but that looks perfect.

@lukasbestle
Copy link
Member

@nilshoerrmann Oh, that's awesome! I didn't know there was already a plugin for this. A plugin makes a lot more sense in my opinion.

So maybe we should simplify this PR by reducing it to the useful smaller improvements. Which would mean: Remove the count mapping stuff, but keep the rest (translation callbacks, support for the count placeholder for simple strings and flexible number of keys). What do you think @neildaniels @afbora?

@afbora
Copy link
Member Author

afbora commented Mar 19, 2020

@lukasbestle #2432 (comment) As I said before, I think we should continue by simplifying it like your said 👍. We can take the first two commits.

@lukasbestle
Copy link
Member

Some of the improvements in the last commit still apply. I will update and finalize the PR this weekend. :)

@neildaniels
Copy link
Contributor

Sounds good to me.

@bastianallgeier
Copy link
Member

Sorry for skipping the long discussion. What's the state of this? @lukasbestle?

@distantnative
Copy link
Member

@bastianallgeier A general disagreement about how much of singular-plural rules for all languages should be integrated into the core, whether we should just offer a general callback method, or whether this should rather be handled by a good library (maybe as part of a general i18n refactoring).

@lukasbestle
Copy link
Member

No, I think we have agreement now. I just haven't found the time so far to finalize this. But it's on my list for the next days.

@afbora afbora removed the needs: discussion 🗣 Requires further discussion to proceed label Mar 31, 2020
@lukasbestle lukasbestle added this to the 3.4.0 milestone Apr 5, 2020
@lukasbestle lukasbestle force-pushed the fix/2430-improve-translateCount-method branch from e1f9089 to b8634b5 Compare April 5, 2020 14:31
@lukasbestle lukasbestle changed the title Improved translateCount() method Improved I18n::translateCount() method Apr 5, 2020
Copy link
Member

@lukasbestle lukasbestle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR now only includes the minor improvements we have discussed. All more complex features are covered by the kirby-plurals plugin.

@nilshoerrmann
Copy link
Contributor

Maybe it would be interesting to ask @OblikStudio for a cookbook entry so the plugin gets a bit more coverage?

@bastianallgeier bastianallgeier merged commit 1b5c0c4 into develop May 4, 2020
@bastianallgeier bastianallgeier deleted the fix/2430-improve-translateCount-method branch May 4, 2020 12:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

7 participants