bpo-40120: Fix unbounded struct char[] undefined behavior.#19232
bpo-40120: Fix unbounded struct char[] undefined behavior.#19232gpshead wants to merge 5 commits intopython:mainfrom
Conversation
It is undefined behaviour if index is beyond array size. The workaround for this is standard C99 feature known as "Flexible array member". https://en.wikipedia.org/wiki/Flexible_array_member
|
(yes a BPO issue is needed, i'm starting with the PR to see how the CI bots fare) |
|
It looks like we thankfully got rid of any use of sizeof(PyBytesObject) a long time ago (which never really made much sense anyways) as part of https://bugs.python.org/issue4445 so I believe this should have no negative consequences. |
|
There are other instances of this pattern in our codebase. Objects/unicodeobject.c has encoding_map who's level32 member appears to use this? Grepping I don't immediately see others. This one was found by a coworker working to enable clang analyzer. |
Well, there was no other choice in ISO C89 than using Is You can use the new buildbot label to test you change on more platforms. |
|
Moving discussion to the bug. (TL;DR I will do the buildbot tests and add this feature to PEP 7 unless we find an important platform that balks about it). Not being explicit in PEP7 yet suggests this should not be backported. |
This is really nothing more than a change test, it needs to match the C struct when no level23 mappings will be allocated. An empty char[] does not consume any C sizeof() space whereas the previous `char[1]` consumed one alignment size worth, even when unused.
| The + 1 accounts for the trailing \0 byte that we include as a safety | ||
| measure for code that treats the underlying char * as a C string. | ||
| */ | ||
| #define PyBytesObject_SIZE (offsetof(PyBytesObject, ob_sval) + 1) |
There was a problem hiding this comment.
Wait, you made PyBytesObject 1 byte smaller, and you didn't have to substract 1 somewhere? Does it mean that Python overallocates 1 byte since forever?
There was a problem hiding this comment.
Thankfully not. Because this code used offsetof the macro did not need to change. The offset never changes. The existing + 1 in the macro is easy to misinterpret as having been there to account for the [1] but that isn't how it was used - PyBytesObject is always allocated with that one extra byte that is filled with a NUL for C safety for PyBytes_AsString users.
vstinner
left a comment
There was a problem hiding this comment.
The [1] thing is used in more places:
PyObject *f_localsplus[1];in PyFrameObjectPyObject *ob_item[1];in PyTupleObjectvoid *elements[1];in asdl_seq and int elements[1];` in asdl_int_seqdigit ob_digit[1];in PyLongObjectPy_ssize_t ob_array[1];in PyMemoryViewObject- _tracemalloc.c:
frame_t frames[1];in traceback_t
Do we need to update all of them? Do you want to update them all?
|
Refleaks buildbots are likely caused by https://bugs.python.org/issue40115 (known leak) |
Yep, this is supported by xlc. |
| unsigned char level1[32]; | ||
| int count2, count3; | ||
| unsigned char level23[1]; | ||
| unsigned char level23[]; |
There was a problem hiding this comment.
@gpshead: Can you please write a PR which only contains this change? Since it doesn't touch the public C API, C++ compatibility is not an issue (I don't think that Python can be built with a C++ compiler). We can start with this change and see how it goes? See also the discussion at python/peps#1349
| PyObject_VAR_HEAD | ||
| Py_hash_t ob_shash; | ||
| char ob_sval[1]; | ||
| char ob_sval[]; |
There was a problem hiding this comment.
I suggest to:
- Leave the structure as it is
- Use a code search to check which C extension access directly PyBytesObject structure members (well, first check Cython and pybind11)
- Prepare affected C extensions
- Once the number of affected C extensions is low enough: make the structure opaque, enforce the usage of public C functions.
|
see the bug, a new PR can be made. |
It is undefined behaviour if index is beyond array size. The workaround
for this is the standard C99 feature known as "Flexible array member".
https://en.wikipedia.org/wiki/Flexible_array_member
https://bugs.python.org/issue40120