zwj codepoints, skin tones, families, and kisses #2

@isaacs

Consider these various glyphs:

  1. 👶
  2. 👶🏽
  3. 👩‍👩‍👦‍👦
  4. 👨‍❤️‍💋‍👨

The first is a generic "simpsons-colored" baby. This module correctly interprets it as a single column. (One might argue it really ought to be considered full-width, or 2 columns, since most terminals render emoji as extra wide, but one would be wrong to make that argument, because most terminals also "incorrectly" overlap the next character on top of the emoji, so it actually only "takes up" one column.)

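The second is a baby with a specific skin tone: the baby followed by a skin tone modifier (U+1F3FD). This module doesn't combine the modifier with the base emoji, so it reads as 2 columns. (Strictly speaking, no zero-width-joiner, or "zwj", pronounced "zwidge", is involved in this one; the next two glyphs are the ones that use it, and the module doesn't handle those properly either.)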

The third is a "woman [zwj] woman [zwj] boy [zwj] boy". It's a full 25 bytes of familial goodness, and this module treats it as 7 columns.

The fourth is "man [zwj] heart [zwj] kiss [zwj] man" (the heart also carries a variation selector, U+FE0F), and comes in at 8 columns.
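For reference, here is a rough sketch (in Python, purely for illustration; none of this is the module's code) that counts the code points and UTF-8 bytes in each of the four sequences. The names `GLYPHS` and `measure` are made up for this example. The column counts reported above appear to match the code point counts, which suggests each code point is being counted as one column.

```python
ZWJ = "\u200d"   # zero width joiner
VS16 = "\ufe0f"  # variation selector-16 (requests emoji presentation)

# The four sequences from the list above, spelled out with escapes.
GLYPHS = {
    "baby": "\U0001F476",
    "baby + skin tone": "\U0001F476\U0001F3FD",
    "family": ZWJ.join(["\U0001F469", "\U0001F469", "\U0001F466", "\U0001F466"]),
    "kiss": ZWJ.join(["\U0001F468", "\u2764" + VS16, "\U0001F48B", "\U0001F468"]),
}


def measure(text):
    """Return (number of code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


for name, text in GLYPHS.items():
    code_points, utf8_bytes = measure(text)
    print(f"{name}: {code_points} code points, {utf8_bytes} bytes")
```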

Is this problem even solvable? Conceivably, something like "fireman [zwj] cat" could be turned into "fire cat" by Apple or Google or Microsoft tomorrow, and a current 2 column set of code points could become 1.

If not, it seems like maybe it should be called out in the readme as just an impossible thing we can never hope to account for? Another way would be to optimistically treat anything with zero width joiners as single chars, but that might be too optimistic?
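To make the "optimistic" option concrete, here is a minimal sketch in Python. It is not this module's implementation; `optimistic_width` is a hypothetical helper, and it assumes every visible code point takes one column, as argued above for the plain baby. Anything attached by a zwj is folded into the preceding character, and skin tone modifiers and the emoji variation selector are treated as zero-width.

```python
ZWJ = "\u200d"   # zero width joiner
VS16 = "\ufe0f"  # variation selector-16
SKIN_TONES = range(0x1F3FB, 0x1F400)  # U+1F3FB through U+1F3FF


def optimistic_width(text):
    """Count columns, assuming every zwj sequence renders as one glyph."""
    width = 0
    joined = False  # True when the previous code point was a zwj
    for ch in text:
        if ch == ZWJ:
            # Only join if there is something before the zwj to join to.
            joined = width > 0
            continue
        if ch == VS16 or ord(ch) in SKIN_TONES:
            continue  # assumed to modify the previous glyph, not add a column
        if not joined:
            width += 1
        joined = False
    return width


family = ZWJ.join(["\U0001F469", "\U0001F469", "\U0001F466", "\U0001F466"])
kiss = ZWJ.join(["\U0001F468", "\u2764" + VS16, "\U0001F48B", "\U0001F468"])
print(optimistic_width(family), optimistic_width(kiss))
```

Under these assumptions all four glyphs come out as 1 column. The downside is the one raised above: a sequence the platform has no combined glyph for (a hypothetical "fireman [zwj] cat") would render as two glyphs but still be counted as one.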
