Skip to content

Support ujson.loads(bytearray(...)) and other bytes-like objects.#573

Merged
bwoodsend merged 2 commits intoultrajson:mainfrom
bwoodsend:bytes-like
Dec 20, 2022
Merged

Support ujson.loads(bytearray(...)) and other bytes-like objects.#573
bwoodsend merged 2 commits intoultrajson:mainfrom
bwoodsend:bytes-like

Conversation

@bwoodsend
Copy link
Copy Markdown
Collaborator

Fixes #572

Changes proposed in this pull request:

  • Generalise the support for using ujson.loads() on bytes to any C contiguous bytes like object such as bytearray(), memoryview() and array.array().

@bwoodsend
Copy link
Copy Markdown
Collaborator Author

Ughh, I hate supporting PyPy

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Dec 17, 2022

Codecov Report

Merging #573 (15bfae0) into main (1860724) will increase coverage by 0.14%.
The diff coverage is 92.85%.

❗ Current head 15bfae0 differs from pull request most recent head efb514b. Consider uploading reports for the commit efb514b to get more accurate results

@@            Coverage Diff             @@
##             main     #573      +/-   ##
==========================================
+ Coverage   91.49%   91.64%   +0.14%     
==========================================
  Files           6        6              
  Lines        1905     1938      +33     
==========================================
+ Hits         1743     1776      +33     
  Misses        162      162              
Impacted Files Coverage Δ
tests/test_ujson.py 98.80% <92.59%> (-0.31%) ⬇️
python/JSONtoObj.c 90.81% <93.33%> (+2.77%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@JustAnotherArchivist
Copy link
Copy Markdown
Collaborator

@bwoodsend
Copy link
Copy Markdown
Collaborator Author

bwoodsend commented Dec 17, 2022

Hmm, I'd have expected this line to fail too if it was the same cause since bytes() would have returned the original unsliced bytes string:

assert ujson.loads(bytes(buffer)) == [1, 2, 3]

I'm just going to create a new ticket.

@bwoodsend
Copy link
Copy Markdown
Collaborator Author

Filed upstream: https://foss.heptapod.net/pypy/pypy/-/issues/3872

How do we want to handle this?

  1. Merging this PR without adding some kind of additional safeguard for PyPy sounds to me like another CVE in the making.
  2. Cancelling this PR just because of PyPy feels silly.
  3. Adding if PyPy: use old logic / else use new logic is both ugly from the developer's perspective (all the fiddly remember to dec-ref/free this object in all successful and erroneous cases would double) and potentially confusing to users who may wonder why ujson happily reads bytearray() under CPython but not PyPy.
  4. Any better ideas?

@JustAnotherArchivist
Copy link
Copy Markdown
Collaborator

How about an extra test inside the bytes-like case for PyPy, testing specifically for bytes and bytearray? That keeps the error condition code minimal, and it'd still add support for bytearray on PyPy. We basically have to exclude anything that isn't guaranteed to be C-contiguous (which bytes and bytearray are, I think, although I doubt it's documented anywhere) until that bug is fixed.

  bool is_bytes_like = !PyObject_GetBuffer(arg, &buffer, PyBUF_C_CONTIGUOUS);
  if (is_bytes_like)
  {
    #ifdef PYPY_VERSION
      // PyPy's buffer protocol implementation is buggy: https://foss.heptapod.net/pypy/pypy/-/issues/3872
      if (!PyBytes_Check(arg) && !PyByteArray_Check(arg)) {
        PyBuffer_Release(&buffer);
        PyErr_Format(PyExc_TypeError, "Expected string, bytes, or bytearray object");
        return NULL;
      }
    #endif
    raw = buffer.buf;
    sarg_length = buffer.len;
  }

Not sure what we'd want to do once the bug is fixed in PyPy though. Maybe we could change that to a version check, either in the code or somewhere in the build process, but not sure that's the best approach.

Copy link
Copy Markdown
Collaborator

@JustAnotherArchivist JustAnotherArchivist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor things, else LGTM.

Comment thread python/JSONtoObj.c Outdated
Comment thread python/JSONtoObj.c Outdated
Comment thread python/JSONtoObj.c Outdated
Due to a bug in PyPy [1], PyObject_GetBuffer() fails to detect non C
contiguous inputs. Allowing ujson.loads() to run on non-contiguous
buffers such as a strided NumPy array would lead to mayhem. Approximate
the intention of PyObject_GetBuffer() by explicitly checking that the
input is within the set of common bytes-like objects known to be
unconditionally C contiguous.

- [1]: https://foss.heptapod.net/pypy/pypy/-/issues/3872
@bwoodsend bwoodsend merged commit 1876c02 into ultrajson:main Dec 20, 2022
@bwoodsend bwoodsend deleted the bytes-like branch December 20, 2022 23:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog: Added For new features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Load from bytearray

3 participants