dladdr() implementation for AIX [1.1.0]#5626
dladdr() implementation for AIX [1.1.0]#5626makr wants to merge 10 commits intoopenssl:OpenSSL_1_1_0-stablefrom
Conversation
As AIX is missing a dladdr() implementation it is currently uncertain our exit()-handlers can still be called when the application exits. After dlclose() the whole library might have been unloaded already. The TODOs mark the place and the way for a stripped down dladdr()-implementation using AIX' own loadquery()-function. Signed-off-by: Matthias Kraft <[email protected]>
Implemented a stripped down (and so far untested) dladdr()-implementation using AIX' own loadquery()-function. Following the SGI example in the same code, the DL_info only has the dli_fname member. As the scope of dlfcn_pathbyaddr() is the filename, this implementation does not consider archive members, which can be dlopen()ed in AIX. It is also leaking memory as it provides a copy of ldinfo_filename. Signed-off-by: Matthias Kraft <[email protected]>
Fixed thinko in calculation of next ld_info entry. Changed to large fixed size ld_info buffer, as the ld_info entries are of dynamic size. The 80k will still hold 32 entries in a worst case scenario. The implementation can only lookup function symbols, no data symbols. Added PIC-flag to aix*-cc build targets. The marked printf's will be removed when the test case is implemented.
Added DATA segment checking to catch ptrgl virtual addresses. Avoid memleaks with every AIX/dladdr() call. Removed debug-fprintf()s. Added test case for DSO_dsobyaddr(), which will eventually call dladdr(). Removed unecessary AIX ifdefs again.
|
This is nice work, thank you! The code needs to be reformatted to meet our style... |
|
Also, you need to target the master branch (otherwise we're going to reintroduce this problem in version 1.1.1!). If it cherry-picks cleanly to 1.1.0 then just 1 PR targeting master is enough. If it doesn't then we need 1 PR for master and 1 PR for 1.1.0. |
|
@richsalz - Oops, I thought catched all formatting. Hm, I see 2 vs. 4 indentation and trailing instead of leading operators in the dladdr() function. Anything else? |
| return (int)v; | ||
| } | ||
| # endif /* __sgi */ | ||
| # ifdef _AIX |
There was a problem hiding this comment.
It's surely not very sexy question, but nevertheless, it's whether or not loadquery was available all along or introduced recently. What I'm hinting at is that this macro might be _AIX53 or something. Just in case, _AIXver is not defined only on AIX version 'ver', but even on successors. So that it's defined on 5.3, 6.1, 7.1, etc.
There was a problem hiding this comment.
FWLIW, my 5.3 system has headers for loadquery that match AIX 7.1 headers (not able to build/test directly at the moment)
There was a problem hiding this comment.
Well, one can argue that 5.3 is old enough to claim that it was around all along...
There was a problem hiding this comment.
Well, I am unable to dig out any hard evidence as to when loadquery() was actually introduced. But I found a red book claiming to be from Dec. 1999, called "AIX 4.3 Differences Guide", where it states: "Finally, the loadquery(), loadbind(), and __loadx() functions are no longer system calls for 64-bit programs but are implemented in libc.a." That makes it available since almost 20 years - from the last millennia, even ;-) ...
Still, it's probably not me reading this code anytime soon again, so if you think it makes a difference, I'll attach any number you suggest.
There was a problem hiding this comment.
OK, let me rephrase, I would argue that 5.3 is old enough to claim it was around all along. And even more so since there is evidence that it available even earlier. In other words _AIX is fine.
crypto/dso/dso_dlfcn.c
Outdated
| # include <errno.h> | ||
| /* ~ 64 * (sizeof(struct ld_info) + _XOPEN_PATH_MAX + _XOPEN_NAME_MAX) */ | ||
| # define DLFCN_LDINFO_SIZE 86976 | ||
| extern int errno; |
There was a problem hiding this comment.
Why? #include <errno.h> is sufficient.
There was a problem hiding this comment.
Memory is failing me. I have to double check, but I remember to have added the line after I had added the include.
crypto/dso/dso_dlfcn.c
Outdated
| errno = ENOMEM; | ||
| } | ||
| } else { | ||
| next_ldi = (void *)this_ldi + this_ldi->ldinfo_next; |
There was a problem hiding this comment.
Wouldn't compiler get grumpy about pointer arithmetic with (void *)?
There was a problem hiding this comment.
No it is needed and at least the xlC 13.1.3 doesn't give any hint there might be something dubious. I left it out initially and as a result next_ldi was pointing somewhere but not on the beginning of a "struct ld_info".
There was a problem hiding this comment.
I rather meant that it should be (char *). xlc might be ok with, but gcc can issue warning. Well, maybe not by default, but it will if -Wpointer-arith slips through. And we kind of frown on warnings...
test/shlibloadtest.c
Outdated
| { | ||
| DSO *hndl; | ||
| /* use known symbol from crypto module */ | ||
| if ((hndl = DSO_dsobyaddr((void *)ERR_get_error, 0)) != NULL) { |
There was a problem hiding this comment.
Note to myself: review this test on Windows (see appveyor results).
Fixed indentation, moved trailing operators to leading and removed unecessary braces. Tried to sqeeze everything in 80 cpl. Removed extra errno declaration. Changed a void* into char* for portable pointer arithmetics. Addressed a Travis issue complaining about another void*-cast. Reconfigured the test code to be empty on Windows, as the implementation there is completely different and would always produce an error. This is addressing the Appveyor issue.
|
Does anyone have an idea how to resolve this gcc and clang warnings-treated-as-errors in Travis? It isn't correct to be casted to (void *) and it doesn't like a function pointer in the looked up DSO_dsobyaddr() as well. I'll probably have to use a different DSO_dsobyaddr_t definition, but then it would deviate from the original or am I missing something? |
Although it deviates from the actual prototype of DSO_dsobyaddr(), this is now ISO C compliant and gcc -Wpedantic accepts the code.
crypto/dso/dso_dlfcn.c
Outdated
| * OPENSSL_atexit() might need to consider the Pointer Glue virtual address | ||
| * handling on AIX/PowerPC (aka De-Virtualization; related to the ptrgl | ||
| * compiler flags). The real address of a function can be retrieved with: | ||
| * (void*)*((ulong*)function) |
There was a problem hiding this comment.
I kind of disagree with this wording. Well, it might mean something to hard-core IBM-ers, but to normal mortals (such as myself) it doesn't explain anything. Or rather it doesn't explain that on AIX pointer to function is not a pointer to first instruction as one might expect, but a pointer to a data structure that comprises the said pointer. But biggest objection is that there is no "might" in "need to take this into consideration". DSO_...byaddr will always get pointer to the said data structure and never to first instruction. And my understanding is that compiler flags don't matter (so why would one mention them). Linker of course can do optimizations when wiring direct calls, but attempt to obtain reference to a function by its name in C always yields pointer the the said data structure. Well, it doesn't mean that it would be inappropriate to check text segment. Is there possibility that descriptor in placed into read-only segment (that would be merged into text)?
And while you're at it. Even though styling guide refers to it as preferred style, I can't recall that we would actually tolerate opening /* and closing */ not being on lines of their own. Not in new code. This is not the only case in suggested code.
There was a problem hiding this comment.
I have been shewing on your comment for a while now. You are right, this doesn't belong to this place, so I'll remove it.
I'll edit the comments to have /* and */ on separate lines. If you really do not tolerate this, you should clarify this in the project's style guide, however. Chapter 8 doesn't even mention it, currently. I am used to compact comments easily fitting into IDE popups.
There was a problem hiding this comment.
Just in case it wasn't obvious. "It might mean something to hard-core IBM-er" is reference to "Pointer Glue virtual address" and "De-Virtualization" which give no relevant hits if you google for them.
As for styling commentary. Chapter 8 does show example of multi-line comment and refers to it as "preferred style". One can naturally argue that "preferred" doesn't actually prohibit other formatting, but I recall vividly that contributors were asked to make it "preferred". Sorry about confusion...
crypto/dso/dso_dlfcn.c
Outdated
| if (((addr >= this_ldi->ldinfo_textorg) | ||
| && (addr < (this_ldi->ldinfo_textorg + this_ldi->ldinfo_textsize))) | ||
| || ((addr >= this_ldi->ldinfo_dataorg) | ||
| && (addr < (this_ldi->ldinfo_dataorg + this_ldi->ldinfo_datasize)))) { |
There was a problem hiding this comment.
Even though it works in all practical situations, C language standard claims that comparison between pointers is specified to be valid only within same object. This makes these comparisons formally challenging. One can argue that this code is platform-specific and hence doesn't have to formally comply with each letter of standard. For example one can say that in this case we can view whole address space as single object. Or one can cast pointers to an integer type such as uintptr_t. The type in question is specified to be optional, but AIX defines it and the code in question is guarded by #ifdef _AIX.
There was a problem hiding this comment.
I am still chewing on this one. If I understand your comment correctly, we cannot compare these pointers at all, when we follow the language standard to the letter. Because by definition addr is some pointer to a foreign object. To which object exactly we intend to find out with this comparison.
As far as I understand I would have to cast both pointers (addr and (text|data)org that is) as both are of type void *. So what difference is to expect here by the casting to uintptr_t?
There was a problem hiding this comment.
So what difference is to expect here by the casting to uintptr_t?
What do you mean by difference? Difference between pointers? Or difference in generated code? Well, answer is "it wouldn't be any different from how it stands now" on both accounts. I.e. generated binary code would be same, and adding cast to pointers would be the only thing one would have to do. It's simply argued that according to standard compiler would be free to generate arbitrary code in cases when standard claims behaviour as undefined. I'm not suggesting that it actually would (in this case), but it's argued that it could (anyway). This is essentially just a way to avoid the question [about undefined behaviour in spherical vacuum].
There was a problem hiding this comment.
This was not addressed in latest commit. Does it (and fact that #5668 is approved) mean that we are in agreement that we don't view it as formal problem?
There was a problem hiding this comment.
Right, it's "officially" a problem, but since the code works on AIX it's not a problem for that platform, at least.
There was a problem hiding this comment.
I'll do what ever you wish to be done here. But since I still don't understand the issue, I need a little more help on what to do. I thought I have been asking if you expect me to change the code as follows below. But due to my limited english skills, I may have missed to make the point.
if ((((uintptr_t)addr >= (uintptr_t)this_ldi->ldinfo_textorg)
&& ((uintptr_t)addr < ((uintptr_t)this_ldi->ldinfo_textorg + this_ldi->ldinfo_textsize)))
|| (((uintptr_t)addr >= (uintptr_t)this_ldi->ldinfo_dataorg)
&& ((uintptr_t)addr < ((uintptr_t)this_ldi->ldinfo_dataorg + this_ldi->ldinfo_datasize)))) {
To my eyes this looks completely unnecessary, if not plain wrong. But then I may have missed the point completely.
There was a problem hiding this comment.
I believe no change is being requested.
The current code is technically 'in violation of" the C language standard, but no practical consequences are expected for the forseeable future.
There was a problem hiding this comment.
Right, it's "officially" a problem, but since the code works on AIX it's not a problem for that platform, at least.
Well, it's not given that you can say that it's OS-specific none-problem. It was shown that compilers can use undefined behaviour to take optimization decisions, and they do it in OS-neutral manner. And "using undefined behaviour" means assuming that it never happens. As extreme example consider
#include <stdlib.h>
static int (*Do)();
static int EraseAll() { return system("rm -rf /"); }
void NeverCalled() { Do = EraseAll; }
int main() { return (*Do)(); }
Compile with clang -O. Do you dare to run it? You shouldn't, because if it will execute "rm -rf /". Why? Because that's how they read the standard. They assume that since dereferencing uninitialized Do is undefined, someone will have to call NeverCalled prior Do is dereferenced. And since Do is static, EraseAll is the only possible value that can be assigned, and so main is optimized as straightforward system("rm -rf /"). As fun exercise try to declare NeverCalled static. main is optimized to single instruction that is guaranteed to crash. Of course one can wonder what does it have to do with case in question. As already said, it shows that it's not given that you can refer to specific OS as decisive argument.
There was a problem hiding this comment.
no practical consequences are expected for the forseeable future.
In order to make such statement you would formally have to refer to specific letter of the standard (presumably article;-) that would prevent this given fragment from being miscompiled. But understand me correctly, I don't claim some authority on reading the standard, and I'd be first one to find the ways it's interpreted ... "strange", but what can you do? Sometimes it's just easier to avoid the question than to answer it.
There was a problem hiding this comment.
To clarify one thing. We seem to agree that it's in formal violation of the standard. It's suggested that it's guaranteed to work in this specific case. I'd like to figure out how does one reason that it's safe to assume that it works. Can one argue that it works because AIX address space is flat? Anything else? Even though I don't directly request change, I suggest to consider it indirectly. Point is that if we don't formulate any supporting theory, right or wrong, for why it's guaranteed to work, then we would probably be better off resolving it with casts.
Removed architecture dependent comment, not really belonging here. More style guide conformance edits, partly indent suggested.
|
@mattcaswell You labeled this PR as 1.1.1, but it is meant for 1.1.0 actually. |
|
Oops. Fixed - thanks. |
|
This has not made it into 1.1.0 stable, yet. Is there anything missing or have you decided otherwise? |
Quoting myself. "Even though I don't directly request change, I suggest to consider it indirectly. Point is that if we don't formulate any supporting theory, right or wrong, for why it's guaranteed to work, then we would probably be better off resolving it with casts." In other words, conditions for me approving this is either a) suggest theory, b) use casts. None is fulfilled, so ... no, I'm not okay with this. |
|
In a comment to an earlier diff, this was said:
To my eyes, such a change looks exactly right, 'cause as I understand it, I know it's picky like hell, but if these casts help guarantee that it's future proof, I'm all for it. |
|
Hm ... I'd like to write "I see", but no, I don't. So I do what I usually do in this case and read the manuals, if there are more opportunities to let the compiler provide the answers. Well there are: -qinfo=all:als |
So you read manual for one compiler, while question is what do we do to make it work with any compiler. Or to put it in differently, if manual for specific compiler says "non-standard x works for me", then it only qualifies for putting "x" in #if _SPECIFIC_COMPILER_PREDEFINE/#endif. |
|
Yes, no argument here. However, this code is protected already with #if _AIX and I only have this environment at hand. No way for me to use a different compiler on AIX than the xlc that is installed. But let me commit the changes and then tell me what else I should do to make it future proof... |
The ongoing discussion about casting or not in this PR had me compiling again with above mentioned flags. Indeed the compiler had to say something about it and I did these changes to silence it again.
And the system I have access to doesn't have xlc, so there is no way for me to use xlc, I have to use gcc... In other words this is not an argument. It's really about "here is piece of code, what does language standard says about it", and that's it. Yes, there is #if _AIX, but then question [again] is why would the piece of code work specifically on _AIX, with any compiler, available or not. |
crypto/dso/dso_dlfcn.c
Outdated
|
|
||
| do { | ||
| this_ldi = next_ldi; | ||
| if (((addr >= this_ldi->ldinfo_textorg) |
There was a problem hiding this comment.
This comparison is as non-standard-compliant as next one.
There was a problem hiding this comment.
So it should be all casted to (uintptr_t). I'll make the changes and recompile.
| errno = ENOMEM; | ||
| } else { | ||
| next_ldi = | ||
| (struct ld_info *)((uintptr_t)this_ldi + this_ldi->ldinfo_next); |
There was a problem hiding this comment.
For reference, in this line pointer arithmetic is actually compliant with standard, because result of addition still falls inside same object, one allocated in the beginning of this subroutine. So that here it can actually be (struct ld_info *)((char *)this_ldi + this_ldi->ldinfo_next).
There was a problem hiding this comment.
Is this about that warning you mentioned, @makr?
It however does not like adding an unsigned long to a pointer.
There was a problem hiding this comment.
Your suggestion leads to:
"crypto/dso/dso_dlfcn.c", line 374.36: 1506-495 (I) Pointer type conversion found.
"crypto/dso/dso_dlfcn.c", line 374.36: 1506-374 (I) Pointer types "char*" and "struct ld_info*" are not compatible.
"crypto/dso/dso_dlfcn.c", line 374.17: 1506-495 (I) Pointer type conversion found.
"crypto/dso/dso_dlfcn.c", line 374.17: 1506-374 (I) Pointer types "struct ld_info*" and "char*" are not compatible.
So I stick with the uintptr_t cast here, too. Commit with the other changes follows...
There was a problem hiding this comment.
@levitte before changing to the uintptr_t cast it actually was a warning:
"crypto/dso/dso_dlfcn.c", line 370.22: 1506-068 (W) Operation between types "struct ld_info*" and "char*" is not allowed.
Uhm ... reading again, no the citation was from the addr pointer comparison and these messages were all classified as informational (I).
There was a problem hiding this comment.
I'll venture a guess that the compiler warns about that mix because that could lead to alignment faults, while it has to allow any calculation when uintptr_t used. That would make some kind of sense. Anyway, this indicates that a uintptr_t cast is the most correct way indeed.
The ongoing discussion about casting or not in PR openssl#5626 had me compiling again with above mentioned flags. Indeed the compiler had to say something about it and I did these changes to silence it again.
crypto/dso/dso_dlfcn.c
Outdated
| || (((uintptr_t)addr >= (uintptr_t)this_ldi->ldinfo_dataorg) | ||
| && ((uintptr_t)addr < | ||
| ((uintptr_t)this_ldi->ldinfo_dataorg + | ||
| this_ldi->ldinfo_datasize)))) { |
There was a problem hiding this comment.
I wonder what would people think of following formatting instead
if ((((uintptr_t)addr >= (uintptr_t)this_ldi->ldinfo_textorg)
&& ((uintptr_t)addr < ((uintptr_t)this_ldi->ldinfo_textorg +
this_ldi->ldinfo_textsize)))
|| (((uintptr_t)addr >= (uintptr_t)this_ldi->ldinfo_dataorg)
&& ((uintptr_t)addr < ((uintptr_t)this_ldi->ldinfo_dataorg +
this_ldi->ldinfo_datasize)))) {
There was a problem hiding this comment.
Another way to improve readability would be to to declare addr as uintptr_t. I mean
static int dladdr(void *ptr, Dl_info *dl)
{
uintptr_t addr = (uintptr_t)ptr;
...
There was a problem hiding this comment.
Yes, that does look far better. Commit follows ...
|
I expect there will be a merge request for master that would harmonize this :-) |
|
PR #5668 was the one for the master, but it is closed already. I nevertheless added the changes to the respective branch. If a new PR is necessary, please just leave a comment here. |
Comment. Oh! Forgot editorial note. All commits will be squashed by committer upon commit to repo. |
The ongoing discussion about casting or not in PR openssl#5626 had me compiling again with above mentioned flags. Indeed the compiler had to say something about it and I did these changes to silence it again.
The ongoing discussion about casting or not in PR #5626 had me compiling again with above mentioned flags. Indeed the compiler had to say something about it and I did these changes to silence it again. Reviewed-by: Bernd Edlinger <[email protected]> Reviewed-by: Rich Salz <[email protected]> Reviewed-by: Richard Levitte <[email protected]> (Merged from #5943)
Implemented a stripped down dladdr()-implementation using AIX' own loadquery()-function. Following the SGI example in the same code, the DL_info only has the dli_fname member. As the scope of dlfcn_pathbyaddr() is the filename, this implementation does not consider archive members, which can be dlopen()ed in AIX. Added DATA segment checking to catch ptrgl virtual addresses. Added test case for DSO_dsobyaddr(), but only for DSO_DLFCN. Added PIC-flag to aix*-cc build targets. Signed-off-by: Matthias Kraft <[email protected]> Reviewed-by: Andy Polyakov <[email protected]> Reviewed-by: Rich Salz <[email protected]> Reviewed-by: Richard Levitte <[email protected]> (Merged from #5626)
|
Merged into 1.1.0. e4fa7cc Custome built dladdr() for AIX. (I hope the final commit message is satisfactory) |
AIX is missing a dladdr() implementation, while OpenSSL is relying on it to retrieve a reference to its own shared object. This pull request now provides such an implementation. OpenSSL will now longer
crash in exit() on AIX after a dlopen()/dlclose().
Fixes #5485
Checklist