SP5: improve international font detection

Maybe the solution is to replace [MatchCharacter(String, Char)](https://learn.microsoft.com/en-us/dotnet/api/skiasharp.skfontmanager.matchcharacter?view=skiasharp-2.88#skiasharp-skfontmanager-matchcharacter(system-string-system-char)) with [MatchCharacter(String, Int32)](https://learn.microsoft.com/en-us/dotnet/api/skiasharp.skfontmanager.matchcharacter?view=skiasharp-2.88#skiasharp-skfontmanager-matchcharacter(system-string-system-int32)) because some "characters" don't fit in the 16-bit `char` type

_EDIT: Tests below indicate this is not the case_
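
For reference, here is a minimal sketch (standard library only) of how a string could be expanded into the `Int32` code points that the second overload accepts. `CodePointDemo`, `CodePoints`, and `FitsInChars` are hypothetical names made up for this illustration, not part of any library mentioned here. It may help explain the result above: for a string like "测试" every code point already fits in a single `char`, so both overloads would receive the same values.

```cs
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class CodePointDemo
{
    // Yields each Unicode code point of a UTF-16 string as an int.
    // Note: char.ConvertToUtf32 throws ArgumentException on an unpaired surrogate.
    public static IEnumerable<int> CodePoints(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            // Combines a surrogate pair starting at index i into one code point,
            // or returns the char value itself for BMP characters.
            int codePoint = char.ConvertToUtf32(text, i);

            // A high surrogate means the next char was consumed as well.
            if (char.IsHighSurrogate(text[i]))
                i++;

            yield return codePoint;
        }
    }

    // True when no char in the string is part of a surrogate pair,
    // i.e. every code point fits in a single C# char.
    public static bool FitsInChars(string text)
    {
        foreach (char c in text)
        {
            if (char.IsSurrogate(c))
                return false;
        }
        return true;
    }
}
```

With this sketch, `CodePointDemo.CodePoints("测试")` yields U+6D4B and U+8BD5, and `CodePointDemo.FitsInChars("测试")` is true, so the `char` overload is not losing information for this input.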

Additional context from Discord (Thanks @prime167 and ChrisL)

```cs
WpfPlot1.Plot.Axes.Left.Label.FontName = Fonts.Detect("测试"); // does not work
```

> "that string is a mix of UTF16 and UTF32 encodings": If we're talking about the Chinese characters "测 试 时 间", they all fit in 16 bits each. Not sure how much you already know about Unicode, so sorry if I'm overexplaining, but maybe it will be interesting to other people in that case. No guarantees that it is perfectly accurate, let me know if I got any details wrong.
> 
> "Character" is confusing. To users it probably means a single "character" on the screen. To a programmer it might mean uint8, uint16, C# 'char', UTF-8 / UTF-16 / UTF-32.
> 
> What you see on the screen can be called "grapheme cluster" (C# "text element") instead to be precise.
> 
> A "grapheme cluster" consists of one or more Unicode code points. Code points can be combined in various ways to create complex grapheme clusters. In theory, a grapheme cluster can require an unlimited number of code points to represent.
> 
> A Unicode code point is a logical value in the range U+0000 to U+10FFFF (21 bits, usually stored as a 32-bit integer). It can be physically encoded using UTF-8 / UTF-16 / UTF-32.
> 
> UTF-8 and UTF-16 are not uint8 / uint16. Instead, they are variable-length encodings of a Unicode code point. They require only one uint8 or uint16 for the more common code points, but UTF-8 can need up to four uint8 and UTF-16 can need two uint16.
> 
> The individual uint8 / uint16 values are called code units (not code points). One or two UTF-16 code units (=uint16) are required to represent a Unicode code point in UTF-16.
> 
> A C# 'char' is always 16 bits. It represents a UTF-16 code unit. If a code point requires two UTF-16 code units, it cannot be placed in a single C# 'char'.
> 
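> Example: The string "💀" (U+1F480 SKULL) is a single Unicode code point. Since it is not in the Basic Multilingual Plane (BMP, the first 64k code points), it requires two UTF-16 code units. (The similar-looking "☠️" is U+2620 SKULL AND CROSSBONES followed by the variation selector U+FE0F; both are in the BMP, so each fits in one 'char'.)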
> 
> Example: The string "👩🏽‍🚒" is represented by four Unicode code points and contains seven C# 'char' instances:
> 
> U+1F469 WOMAN
> U+1F3FD EMOJI MODIFIER FITZPATRICK TYPE-4
> U+200D ZERO WIDTH JOINER
> U+1F692 FIRE ENGINE 
> 
> --ChrisL
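
The three levels described in the quote (code units, code points, grapheme clusters) can be counted separately in C#. The following is a rough sketch assuming .NET Core 3.0 or later (for `EnumerateRunes`); `TextUnitDemo` and its method names are hypothetical and exist only for this illustration.

```cs
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration only.
public static class TextUnitDemo
{
    // Number of UTF-16 code units, i.e. C# 'char' instances.
    public static int CodeUnitCount(string text) => text.Length;

    // Number of Unicode code points (each Rune is one Unicode scalar value).
    // Assumes well-formed UTF-16; lone surrogates are reported as U+FFFD.
    public static int CodePointCount(string text) => text.EnumerateRunes().Count();

    // Number of "text elements" as seen by the runtime.
    // The result depends on the .NET version: newer runtimes follow the
    // Unicode extended grapheme cluster rules, older ones split more often.
    public static int TextElementCount(string text) =>
        new StringInfo(text).LengthInTextElements;
}
```

For the firefighter example, written with escapes as `"\U0001F469\U0001F3FD\u200D\U0001F692"`, `CodeUnitCount` returns 7 and `CodePointCount` returns 4, matching the breakdown in the quote. `TextElementCount` should report a single element on runtimes that implement extended grapheme clusters (.NET 5 and later, as far as I know), but may report more on older frameworks.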

SP5: improve international font detection #3220
