Const-qulify PyUFuncGenericFunction's arguments#40728
Const-qulify PyUFuncGenericFunction's arguments#40728edudev wants to merge 1 commit intotensorflow:masterfrom
Conversation
|
I just noticed this change [1]. I believe my PR resolves all the breaking changes in the numpy ABI and the commit can be reversed. [1] 79518fa |
|
@jaingaurav Can you please review this PR ? Thanks! |
|
I'd suggest to include a revert of 79518fa in this PR as this resolves the TODO. I hope the TF guys agree. |
I believe it should be reverted, too. |
What do you mean by that exactly? -->
|
Ah, I totally forgot that it's a non-breaking change. My mistake. |
Remove the limit of numpy's version
8bed450 to
84cc21a
Compare
|
Well, I'm just waiting for a review. Something that I did notice though is that the relevant function type ( And something that I sadly just noticed: this change is in fact breaking. Once merged, it will not be possible to compile tensorflow with numpy below 1.19. Moreover, I believe that using tensorflow with lower versions of numpy will also be impossible, due to differences in the mangled name of the relevant functions. |
Can you elaborate why? I mean: those functions call numpy functions and those numpy function either take non-const pointers (<1.19) or const-pointers (1.19+), so both work now. Only thing which will be impossible is to compile TF with <1.19 and use with 1.19+. Or is that why you meant? |
Yes, it will not be possible to compile tensorflow with numpy versions below 1.19 due to the change in the function pointer type (from non-const pointer argument to const pointer argument). About the runtime issues: had the relevant code (bfloat16) been compiled as a C source, this would not have been an issue. However, in C++ the argument types actually influence the function names. Notice how changing the argument from a normal pointer to a constant pointer changes the name of the function, as well (there's an extra Now, coming to the numpy case, it's... complicated... way more complicated, because the offending argument is actually a pointer to a function. |
|
Hmm, okay, I took a closer look at this once again. As I said, it's quite complicated. The linking issue is at the boundary of tensorflow and numpy. Including headers and having const/non-const is only a compile-time issue, as already mentioned. Normally, when the tensorflow call to Here are how the two names of this function would look like (without and with const qualifiers for the parameters of the (they are different) Okay, up until now, this pretty much says that we would not be able to use tensorflow with versions of numpy before 1.19. (these declarations differ from the one I provided above, as it was intentionally made simpler, with less arguments) I have not gone through the numpy source code, so I am not exactly sure when each of these two versions is used. My guess would be that the former is used for internal numpy complication, while the latter is used when compiling against external libraries (such as tensorflow). And this is just a simple cast of Therefore, in conclusion, I believe it may not be an issue to use older versions of numpy with tensorflow. |
|
I didn't follow the discussion in the most recent comment, but it looks to me like it is overcomplicating this. I think all you need to do to fix any source compatibility issues with older versions of NumPy is to call |
|
You can ignore my comments on this failing to link due to the C++ name mangling. I just noticed that there's an About the reinterpret cast: as I was wrong about the C++ function name mangling occurring, it will not be needed. So... my question is: how bad would it be to require numpy 1.19 for tensorflow compilation (but still be able to work alongside old versions during runtime)? |
|
It seems to me that you are confusing things. I don't see how this change is not compatible with numpy before and after 1.19. Can you point out why you think this is not the case? Did you try it? Let me address what we have up until now:
Which overloaded functions? There are no overloaded functions here are they? You either have ones with const-pointers (this change) or without (prior to this changed)
After reading the above, do you still see an issue? Which one? Have you tried it? |
|
Once again, you can ignore my comments on linking errors. The only questions that remains is whether tensorflow should be compiled with numpy < 1.19. In either case (with the reinterpret cast, or without it), tensorflow will continue to work with the older versions of numpy at runtime. |
IMO that is worth it. Maybe with a slightly safer version that static_asserts that the signature is what is expected and does the cast only then. So either:
Reason: If they change it again or the function here is changed then it will result in a build error instead of a silent failure
One I can think of: User having numpy already installed. That won't be upgraded unless they do the explicitly (or implicitly if TF requires the new version) but maybe users have other software they use which doesn't work with the older version due to similar problems to this |
|
Glad we're finally on the same page about the issue of the pull request. About the cast: I honestly am not sure which is better, which is why I asked for arguments for the cast. Another reason that I don't like having a cast is that it is to support an old version of numpy. I looked for official information on their release guidelines and end-of-life timeline, but I could find nothing. If a decision is made to support numpy 1.18, I would also add a comment on this (probably also with a link to this issue). Nonetheless, I feel like a cast would be a 'temporary' solution (until numpy 1.18 gets forgotten), which would remain there permanently (I try to avoid such code, as much as possible). And lastly, the cast is in fact not future-proof, as you've noted. Although extremely unlikely, the ABI may change again to something which is incompatible (unlike the current const qualifiers). I tried to look for some macros to detect for numpy 1.19, but could not find anything (most likely I missed it, though). |
|
You are missing an important group: people who install packages like system package maintainers and system admins. For them it is not possible to have conflicting versions or "just upgrade" a package. So having wider compatibility is a big bonus. Numpy 1.18 is not old enough to ignore it. And just adding the cast is less work than all this discussion. For your last point: imo it is irrelevant as long as we detect that at compile time. See my solutions above. They won't completly change it without at least raising the major version and then a decision needs to be made. But that is all fruitless speculation. So: best to get something to work for current versions and don't limit them just to save 4 lines of code or so. Rely on reported versions (is there a macro of the Numpy version?) or add a comment. Then nothing is forgotten |
|
@alextp does this PR break the C ABI guarantees we have in place? |
|
@ebrevdo I don't see any change to our ABI-stable headers in tensorflow/c/{eager,} so I don't think it's ABI-breaking. I also believe that the numpy abi breaking change is actually not an abi breaking change (i.e. the code links fine) but an api-breaking change (i.e. the const makes C++ compilers not willing to compile old code against the new header file but old code compiled agains the old header file links just fine against existing numpy). That said I didn't review this change at all so don't trust me to approve. |
|
FWIW: All the functions changed are internal linkage so not exposed. Hence no ABI break is possible. And yes I agree that this is an API break only from numpy However before this is accepted I'd like to see support for pre and post numpy 1.19 handling. I.e. and need to be adapted so the new and the old signature (PyUFuncGenericFunction) are accepted (and only those 2)To make the evaluation easier see this godbolt link which demonstrates the effect of compiling against numpy pre 1.19 and post 1.19 via a define: https://godbolt.org/z/5MKMWK |
|
Thanks for clarifying, @Flamefire ; looks like we may need an #if/#else that checks the version of numpy. |
|
I think always passing a const should work without the need for an #if. And
it's also fine if we require a particular version of numpy's header files
for TF to build against.
…On Wed, Sep 23, 2020 at 8:58 AM ebrevdo ***@***.***> wrote:
Thanks for clarifying, @Flamefire <https://github.com/Flamefire> ; looks
like we may need an #if/#else that checks the version of numpy.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#40728 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAABHRPIDPQ4Z2JT6L6G5YTSHILLPANCNFSM4OGBKTEA>
.
--
- Alex
|
|
@alextp Not really. The problem is that the numpy API either expects a function like My proposed alternative: https://godbolt.org/z/5s9xMj The idea is to use a function which accepts the new signature and casts it to what numpy expects. A |
|
I thought this issue was already resolved by 75ea0b3 ? |
|
Question: what do we need to do from this PR, given that it has been partly solved/merged in previous commits (if I understand the previous comments)? I am currently working on fixing the dependencies to work with the new pip resolver and this issue popped up. |
|
It seems I was able to build on Python3.8 with latest numpy. |
|
With aafe25d numpy is now past the API breakage |
Due to recent changes in Numpy [1], PyUFuncGenericFunction's argument should be const-qualified.
This change is non-breaking (i.e. it will work for both numpy 1.18 and 1.19).
[1] numpy/numpy#15355