Description
Describe the bug
If we inspect the string "/foo/bar/ゆびわ", we see:
Dimensions: (12)
Element type: CHARACTER
Total size: 12
Adjustable: NIL
Fill pointer: NIL
Contents:
0: #\/
1: #\f
2: #\o
3: #\o
4: #\/
5: #\b
6: #\a
7: #\r
8: #\/
9: #\HIRAGANA_LETTER_YU
10: #\HIRAGANA_LETTER_BI
11: #\HIRAGANA_LETTER_WA
Wonderful. However, if we attempt to include a non-ASCII Unicode character in a pathname, as in (parse-namestring "/foo/bar/ゆびわ"), the debugger opens and we're told:
Cannot coerce string "/foo/bar/�s�" to a base-string
Somewhat cryptic. However, base-string is a hint, and the Clasp docs also mention:
Clasp supports Unicode by default. code-char and char-code work with Unicode codepoints.
...
Type base-char includes only single byte characters, i.e. Basic Latin and Latin-1 Supplement.
So perhaps somewhere in the depths of parsing the path, characters are assumed to be non-Unicode and a conversion to base-string (probably an array of base-char?) is attempted.
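One way to test this guess is to check which characters of the string fall outside base-char. This is only a diagnostic sketch using standard Common Lisp functions; it does not show what Clasp's pathname code actually does internally.

```lisp
;; Diagnostic sketch: is every character of the string a BASE-CHAR,
;; and if not, where is the first one that is not?
(let ((s "/foo/bar/ゆびわ"))
  (values
   ;; T only if all characters are of type BASE-CHAR
   (every (lambda (c) (typep c 'base-char)) s)
   ;; index of the first character that is not a BASE-CHAR, or NIL
   (position-if-not (lambda (c) (typep c 'base-char)) s)))
```

If base-char is limited to single-byte characters as the docs describe, the first value should be NIL and the second should be 9, the index of #\HIRAGANA_LETTER_YU in the inspector output above.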
Expected behavior
Pathnames should be able to contain Unicode characters, since people in non-English-speaking countries often have such characters in file paths on their computers.
Actual behavior
(shown above)
Note also that this occurs for #p literals as well (probably powered by parse-namestring underneath).
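For anyone wanting to reproduce both cases without landing in the debugger, something like the following may help. It is a sketch, not a tested script: the #p case goes through read-from-string because the literal is processed at read time, so a handler-case wrapped around a literal #p form would not catch the error.

```lisp
;; Reproduction sketch: print the error for each case instead of
;; entering the debugger.

;; Case 1: PARSE-NAMESTRING called directly.
(handler-case (parse-namestring "/foo/bar/ゆびわ")
  (error (e) (format t "parse-namestring: ~a~%" e)))

;; Case 2: the #p reader macro, triggered explicitly at run time
;; so that the error is signaled inside HANDLER-CASE.
(handler-case (read-from-string "#p\"/foo/bar/ゆびわ\"")
  (error (e) (format t "#p literal: ~a~%" e)))
```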