Skip to content

Replace wchar_t string decoding implementation with a uint32_t-based one#555

Merged
hugovk merged 1 commit intoultrajson:mainfrom
JustAnotherArchivist:fix-decode-surrogates-2
Jun 20, 2022
Merged

Replace wchar_t string decoding implementation with a uint32_t-based one#555
hugovk merged 1 commit intoultrajson:mainfrom
JustAnotherArchivist:fix-decode-surrogates-2

Conversation

@JustAnotherArchivist
Copy link
Copy Markdown
Collaborator

This fixes character handling on platforms with 16-bit wchar_t (notably, Windows), which was broken (in different ways) on both CPython and PyPy.

Fixes #552

Remarks:

  • For the disgusting Py_UCS4 == JSUINT32 check magic, see the comments on Surrogates fix fails tests with PyPy on Windows #552.
  • The changelog might need a little touch-up here as this is essentially a continuation of Fix handling of surrogates on decoding #550.
  • I have not run any performance comparisons yet. In general, I would expect it to perform at least as well as the previous implementation since PyUnicode_FromWideChar does some extra work compared to PyUnicode_FromKindAndData (mostly due to surrogate handling). On 16-bit wchar_t platforms, the larger buffer size might have some impact though; I won't be able to run comparisons for that though, I think.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jun 18, 2022

Codecov Report

Merging #555 (bc7bdff) into main (cc70119) will increase coverage by 0.03%.
The diff coverage is 90.90%.

@@            Coverage Diff             @@
##             main     #555      +/-   ##
==========================================
+ Coverage   91.81%   91.84%   +0.03%     
==========================================
  Files           6        6              
  Lines        1856     1852       -4     
==========================================
- Hits         1704     1701       -3     
+ Misses        152      151       -1     
Impacted Files Coverage Δ
tests/test_ujson.py 99.45% <ø> (+0.17%) ⬆️
lib/ultrajsondec.c 92.56% <90.00%> (ø)
python/JSONtoObj.c 88.04% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cc70119...bc7bdff. Read the comment docs.

Comment thread lib/ultrajsondec.c Outdated
This fixes character handling on platforms with 16-bit wchar_t (notably, Windows), which was broken (in different ways) on both CPython and PyPy.

Fixes ultrajson#552
@hugovk hugovk added the changelog: Fixed For any bug fixes label Jun 20, 2022
Copy link
Copy Markdown
Collaborator

@bwoodsend bwoodsend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I was expecting a replacement of all strings to be a much bigger, scarier looking change set.

@hugovk hugovk merged commit 67ec071 into ultrajson:main Jun 20, 2022
@JustAnotherArchivist
Copy link
Copy Markdown
Collaborator Author

Yeah, much of the code essentially assumed 32-bit ints already for proper operation, so not many changes were needed at all.

Also, just realised I forgot about the benchmarks. Some quick tests right now indicate that it's very marginally faster than the previous code by a couple per cent or so.

adrianeboyd added a commit to adrianeboyd/srsly that referenced this pull request Jul 20, 2022
adrianeboyd added a commit to explosion/srsly that referenced this pull request Jul 20, 2022
adrianeboyd added a commit to adrianeboyd/srsly that referenced this pull request Jul 18, 2023
adrianeboyd added a commit to adrianeboyd/srsly that referenced this pull request Jul 18, 2023
adrianeboyd added a commit to adrianeboyd/srsly that referenced this pull request Jul 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog: Fixed For any bug fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Surrogates fix fails tests with PyPy on Windows

4 participants