The word tokenizer for `WordDiff` and `WordWithSpaceDiff` uses `\b` in its regular expression. In JavaScript, `\b` treats only `[a-zA-Z0-9_]` as word characters, so it splits in the wrong place on any non-ASCII letter.
For example, the German phrase "wir üben" splits to:
'wir üben'.split(/\b/);
-> ["wir", " ü", "ben"]
Replacing the tokenizer with `value.split(/(\s+)/)` is sufficient for my use case, but my text contains no newlines, so I think some further testing is needed.
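A minimal sketch of that whitespace-based replacement, to make the difference concrete. `tokenizeOnWhitespace` is a hypothetical name used only for illustration, not a function in the library, and the empty-string filter is my own addition:

```javascript
// Hypothetical helper (not part of the library): split on runs of whitespace
// and keep each run as its own token thanks to the capturing group.
function tokenizeOnWhitespace(value) {
  // split() yields empty strings when the input starts or ends with
  // whitespace; drop them so only real tokens remain.
  return value.split(/(\s+)/).filter((token) => token !== "");
}

// Non-ASCII letters stay inside their word, and \s also matches newlines.
const tokens = tokenizeOnWhitespace("wir üben\nheute");
console.log(tokens); // ["wir", " ", "üben", "\n", "heute"]
```

One trade-off to keep in mind: unlike a `\b` split, this leaves punctuation attached to the neighbouring word, so "üben," would be a single token.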
Further reading:
http://stackoverflow.com/questions/10590098/javascript-regexp-word-boundaries-unicode-characters/10590620#10590620