Replace wchar_t string decoding implementation with a uint32_t-based one #555
Codecov Report
@@ Coverage Diff @@
## main #555 +/- ##
==========================================
+ Coverage 91.81% 91.84% +0.03%
==========================================
Files 6 6
Lines 1856 1852 -4
==========================================
- Hits 1704 1701 -3
+ Misses 152 151 -1
This fixes character handling on platforms with 16-bit wchar_t (notably, Windows), which was broken (in different ways) on both CPython and PyPy. Fixes ultrajson#552
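To illustrate why a 16-bit code unit is not enough here, the following is a minimal sketch, not code from this PR. The helper names `combine_surrogates` and `fits_in_16_bits` are hypothetical. It shows that a code point outside the Basic Multilingual Plane needs two UTF-16 units but fits in a single `uint32_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not taken from ujson): combine a UTF-16 surrogate
 * pair into one Unicode code point stored in a 32-bit integer. */
static uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    /* Sanity-check that the inputs really are a high/low surrogate pair. */
    assert(hi >= 0xD800u && hi <= 0xDBFFu);
    assert(lo >= 0xDC00u && lo <= 0xDFFFu);

    return 0x10000u
         + (((uint32_t)hi - 0xD800u) << 10)
         + ((uint32_t)lo - 0xDC00u);
}

/* Returns 1 if the code point can be stored in a single 16-bit unit,
 * which is all a 16-bit wchar_t can hold per element. */
static int fits_in_16_bits(uint32_t cp)
{
    return cp <= 0xFFFFu;
}
```

For example, the pair `0xD83D`, `0xDE00` combines to U+1F600, which does not fit in 16 bits, so a decoder that stores one character per 16-bit `wchar_t` has to carry surrogates around, while a `uint32_t` buffer can hold the code point directly.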
bwoodsend
left a comment
Nice. I was expecting a replacement of all strings to be a much bigger, scarier looking change set.
Yeah, much of the code essentially assumed 32-bit ints already for proper operation, so not many changes were needed at all. Also, just realised I forgot about the benchmarks. Some quick tests right now indicate that it's very marginally faster than the previous code, by a couple of per cent or so.
…nt32_t`-based one" Backport ultrajson/ultrajson#555
…nt32_t`-based one" (#67) Backport ultrajson/ultrajson#555
…nt32_t`-based one" (explosion#67) Backport ultrajson/ultrajson#555
This fixes character handling on platforms with 16-bit `wchar_t` (notably, Windows), which was broken (in different ways) on both CPython and PyPy.
Fixes #552
Remarks:
- For the `Py_UCS4 == JSUINT32` check magic, see the comments on "Surrogates fix fails tests with PyPy on Windows" (#552).
- `PyUnicode_FromWideChar` does some extra work compared to `PyUnicode_FromKindAndData` (mostly due to surrogate handling). On 16-bit `wchar_t` platforms, the larger buffer size might have some impact, though; I don't think I'll be able to run comparisons for that.
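The buffer-size remark can be made concrete with a small sketch. This is not ujson's decoder: `utf8_to_ucs4` is a hypothetical name, and the function assumes well-formed UTF-8 with no validation. It writes one `uint32_t` per code point, so every character costs four bytes, but no surrogate pairs are ever needed. A buffer of this shape is the kind of input that CPython's `PyUnicode_FromKindAndData` accepts with `PyUnicode_4BYTE_KIND`; how the PR wires that up exactly is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: decode well-formed UTF-8 into 32-bit code points.
 * `out` must have room for at least `len` elements.
 * Returns the number of code points written. Malformed input is NOT
 * detected; a real decoder would have to reject it. */
static size_t utf8_to_ucs4(const unsigned char *in, size_t len, uint32_t *out)
{
    size_t i = 0, n = 0;

    assert(in != NULL && out != NULL);

    while (i < len) {
        unsigned char b = in[i++];
        uint32_t cp;
        size_t extra; /* number of continuation bytes that follow */

        if (b < 0x80)      { cp = b;         extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1Fu; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0Fu; extra = 2; }
        else               { cp = b & 0x07u; extra = 3; }

        /* Each continuation byte contributes its low 6 bits. */
        while (extra > 0 && i < len) {
            cp = (cp << 6) | (in[i++] & 0x3Fu);
            extra--;
        }

        out[n++] = cp; /* one element per code point, even above U+FFFF */
    }
    return n;
}
```

With a 16-bit element type, the last character in an input such as `a`, `é`, U+1F600 would need two elements; here each of the three characters takes exactly one.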