Allow linking empty .cmxa files on MSVC#9011
Conversation
MSVC .lib format doesn't support having no .obj files in the library, so ocamlopt -o foo.cmxa -a generates foo.cmxa but not foo.lib (there's no error from the Microsoft Linker). The resulting foo.cmxa is unlinkable, since OCaml passes foo.lib on to the linker. This patch relaxes the requirement for foo.lib if the .cmxa contains no units.
|
I intentionally haven't done anything to stop the linker being called when the .cmxa is created - at present, |
xavierleroy
left a comment
There was a problem hiding this comment.
There is something I don't understand with list reversals. Plus some suggestions below.
asmcomp/asmlink.ml
Outdated
| end else | ||
| reqd) | ||
| infos.lib_units tolink | ||
| in |
There was a problem hiding this comment.
For symmetry I suggest
let tolink = ...
and objfiles_rev = ...
in (tolink, objfiles_rev)
The two computations are independent, as far as I can see.
asmcomp/asmlink.ml
Outdated
| if Config.ccomp_type = "msvc" | ||
| && infos.lib_units = [] | ||
| && not (Sys.file_exists (object_file_name obj_name)) then |
There was a problem hiding this comment.
It would make sense to just test infos.lib_units = []. Even with non-MSVC toolchains, it makes sense to not pass an empty library to the linker, resulting in a shorter command line and less work for the linker.
As to testing that the file doesn't exist, this is to be commended as very prudent, but 1- it causes one extra system call per file to link, and 2- I can't imagine a scenario where object files would be added by hand to the library file that goes with an OCaml .cmxa file.
There was a problem hiding this comment.
I have no problem with doing that - the test I've put is just belt-and-braces to fix exactly the bug. The the .a file may have been subsequently altered (e.g. to add stub objects directly) - I've no knowledge of somewhere doing this, but do we definitely want to break some undefined behaviour while fixing this particular bug?
I did also consider whether it would be better to add a field to the .cmxa format which explicitly specifies the object to link - so on MSVC for an empty library, this field would explicitly be blank. In principle, .cmx format could do the same thing, although a .o and .obj file is always associated, even for an empty .ml file.
There was a problem hiding this comment.
I (potentially temporarily) left this as-is. This change alters this PR from a bugfix to a potentially breaking change (albeit by using largely unspecified behaviour of the linker).
There was a problem hiding this comment.
Am I reading right that the third part of the test is here to protect users that were creating a .lib file by hand behind the back on the compiler on msvc? Do we really intend to support this use case?
There was a problem hiding this comment.
@Octachron - it's more that I don't wish to intentionally break it in this PR. I believe there are some Mirage libraries which do this (although none which would hit this explicit case - i.e. an empty library with objects being added)
|
Thanks, @xavierleroy! I'll push fixes later - what do you think of shifting the fix into .cmxa format? |
I don't feel it would be a significant improvement. But it would be a bigger change. I like small changes :-) |
|
Code corrected (I hope). |
b23aada to
cef6c3d
Compare
|
No - the problem is not just that an empty |
cef6c3d to
21d4036
Compare
21d4036 to
cfa0f06
Compare
There is a way to simplify both fixes at the same time:
That would be nicer than the current code + the current PR. |
|
@xavierleroy - I'm puzzled, this PR does exactly your second point, with the only difference that the test is made specific to MSVC, doesn't it? (see #9011 (comment) for why it's so specific). I don't at the moment see what I'm to change in this PR. For #1094, we'd just be simplifying the three calls in However, a null Regardless, this PR should have at least one test, which I'm just doing... |
Yes. But in order to simplify the macOS situation as well, linking of empty libraries should also be skipped on macOS. At this point, let's just skip it on all platforms. What I'm aiming at is a platform-independent solution to both the MSVC and the macOS issues. Wouldn't this be nice? |
|
So I can simply remove the It turns out an empty .a is corrupt too: |
|
OK, I think I've got hung on up on "simpler", but I think I follow where you're coming from. I certainly don't think we should be creating invalid files (so no zero-length files and no using other ways to create .lib files with no members). MSVC forces us (and so build systems) to consider that |
|
I have verified that the test fails on trunk built with msvc64 |
|
Not creating .a/.lib for empty libs seems natural enough, doesn't this complicate the life for build systems, who typically consider those .a/.lib files as dependencies for executables that link the libraries (and rightly so)? |
|
@alainfrisch - it does complicate it, yes, but if that build system wishes to support MSVC properly then it must complicate it. That said, a simple build system need only worry about the .cmxa, since the only time a simple build system has to interact with the .a/.lib file is to install it. |
e9cd7ef to
6b0f113
Compare
Well, another option would be to make sure the .a/.lib is not really empty, by adding a dummy (but valid) .o/.obj file to it.
I don't understand. If you define a rule to build an executable from a set of modules and libraries, the build system has to be aware of which files are used to build the resulting executable, which include the .a/.lib file for each library, no? |
|
There's no need for a time-stamp based build system compiling an OCaml program ever to worry about My strong opinions on this are that we should allow empty .cmxa files and we should not generate invalid files in other formats (so no zero-byte .a or .lib files). I don't have a strong opinion as to which of the two possible ways to fix it should be done. The primary user of this feature is stdlib-shims, and Dune is already supports this (it was part of Dune 2.0). I lean more towards fixing a very small number of build systems once for something which should then never change vs a slightly dirtier hack (what you propose is basically what the macOS workaround does) which if it does affect a package will affect it in a weird way (conflict with the dummy name, or anything like that) |
|
The problem is not with the build system but with the "install" Makefile entries, |
|
Sure, but that's either one of two easy fixes in a |
I was thinking about checksum based build systems indeed. The build system might not know whether a library is empty or not (in particular is not under the control of the same build system), and anyway, it might be non-trivial logic to add in order to support this case. |
6b0f113 to
eab4c4f
Compare
|
Rebased onto latest trunk. I'm indescribably keen for the actual bugfix in this PR to be in 4.11 (that's the ability to run The final commit goes further and alters the behaviour of |
xavierleroy
left a comment
There was a problem hiding this comment.
I'm pretty sure the files are now in the correct order, and I agree that not giving the linker archive files that were previously empty is a good thing to do.
I'm still a little worried about installation rules (in Makefiles or elsewhere) that systematically install a .a/.obj file for every library. These will break, but perhaps the "empty library" case occurs rarely enough that we don't care. I'm curious to know what Dune would do in this case, for example.
eab4c4f to
e6ab329
Compare
|
Dune's been fixed since 2.0.0 - see ocaml/dune#2829 |
|
Thanks @xavierleroy, I'll merge and cherry-pick once CI replies |
|
After a discussion with @Octachron, the plan is to merge this PR but only cherry-pick the first (bug-fix) commit to 4.11. The aforementioned Dune PR was written with the assumption that it was MSVC only, so it doesn't like Unix compilers not producing So: 4.11.0 will be able to build stdlib-shims correctly on msvc and msvc64 and 4.12.0 gains the cleaned-up and simplified linker code. |
Allow linking empty .cmxa files on MSVC (cherry picked from commit 7b6098a)
An analysis of the code should convince yourself that this function is not needed and that its erroring paths are dead code. The reasoning is the following: all the uses of [object_file_name] in asmlink.ml are performed on [obj_infos] which went through [read_file] before. The latter does exactly the same file name lookup and erroring treatement, except not in a stringly manner like [object_file_name] does. This first commit removes use of the function in [scan_file] which was introduced by PR ocaml#9011. The result of [object_file_name] here is equal to [file_name] (follow [read_file]).
This function derives the same information as [object_file_name] except it does it on the [file] datatype returned by [read_file]. Note that all the erroring code paths of [object_file_name] have been handled by the [read_file] which derived the [file] value. We integrate the logic added by PR ocaml#9011 for empty cmxa here.
This commit does the following four things (it's difficult to them in separate commits that compile). 1) It removes the [read_file] from [scan_file]. Reading the files is done seperately before returning an [obj_infos] list of [file] values. This turns [scan_file] into a function that operates on values of type [file]. 2) In [scan_file] it removes the separate list of [obj_files] introduced by ocaml#9011. We can derive the same list using the function [object_file_name_of_file] introduced in the previous commit on the list of [obj_infos]. Effectively we bring back [scan_file] to the state before ocaml#9011 modulo the [read_file] removal. 3) We derive the list of [obj_files] directly from the [obj_infos] list via [object_file_name_of_file]. Note that the logic introduced by ocaml#9011 is preserved by virtue of [object_file_name_of_file]'s optional result. 4) Deletes [object_file_name] which is no longer used.
An analysis of the code should convince yourself that the [object_file_name] function is not needed and that its erroring paths are dead code. The reasoning is the following: all the uses of [object_file_name] in asmlink.ml are performed on [obj_name]s which went through [read_file] before. The latter does exactly the same file name lookup and erroring treatement, except not in a stringly manner like [object_file_name] does. This new function derives the same information as [object_file_name] except it does it on the [file] datatype returned by [read_file]. Note that all the erroring code paths of [object_file_name] have been handled by the [read_file] which derived the [file] value. We integrate the logic added by PR ocaml#9011 for empty cmxa here.
This commit does the following four things (it's difficult to them in separate commits that compile). 1) It removes the [read_file] from [scan_file]. Reading the files is done seperately before returning an [obj_infos] list of [file] values. This turns [scan_file] into a function that operates on values of type [file]. 2) In [scan_file] it removes the separate list of [obj_files] introduced by ocaml#9011. We can derive the same list using the function [object_file_name_of_file] introduced in the previous commit on the list of [obj_infos]. Effectively we bring back [scan_file] to the state before ocaml#9011 modulo the [read_file] removal. 3) We derive the list of [obj_files] directly from the [obj_infos] list via [object_file_name_of_file]. Note that the logic introduced by ocaml#9011 is preserved by virtue of [object_file_name_of_file]'s optional result. 4) Deletes [object_file_name] which is no longer used.
An analysis of the code should convince yourself that the [object_file_name] function is not needed and that its erroring paths are dead code. The reasoning is the following: all the uses of [object_file_name] in asmlink.ml are performed on [obj_name]s which went through [read_file] before. The latter does exactly the same file name lookup and erroring treatement, except not in a stringly manner like [object_file_name] does. This new function derives the same information as [object_file_name] except it does it on the [file] datatype returned by [read_file]. Note that all the erroring code paths of [object_file_name] have been handled by the [read_file] which derived the [file] value. We integrate the logic added by PR ocaml#9011 for empty cmxa here.
This commit does the following four things (it's difficult to them in separate commits that compile). 1) It removes the [read_file] from [scan_file]. Reading the files is done seperately before returning an [obj_infos] list of [file] values. This turns [scan_file] into a function that operates on values of type [file]. 2) In [scan_file] it removes the separate list of [obj_files] introduced by ocaml#9011. We can derive the same list using the function [object_file_name_of_file] introduced in the previous commit on the list of [obj_infos]. Effectively we bring back [scan_file] to the state before ocaml#9011 modulo the [read_file] removal. 3) We derive the list of [obj_files] directly from the [obj_infos] list via [object_file_name_of_file]. Note that the logic introduced by ocaml#9011 is preserved by virtue of [object_file_name_of_file]'s optional result. 4) Deletes [object_file_name] which is no longer used.
…rt upstream PR#9943) (ocaml#557) * Asmlink.object_file_name removal: add object_file_name_of_file An analysis of the code should convince yourself that the [object_file_name] function is not needed and that its erroring paths are dead code. The reasoning is the following: all the uses of [object_file_name] in asmlink.ml are performed on [obj_name]s which went through [read_file] before. The latter does exactly the same file name lookup and erroring treatement, except not in a stringly manner like [object_file_name] does. This new function derives the same information as [object_file_name] except it does it on the [file] datatype returned by [read_file]. Note that all the erroring code paths of [object_file_name] have been handled by the [read_file] which derived the [file] value. We integrate the logic added by PR ocaml#9011 for empty cmxa here. * Asmlink.object_file_name removal: remove. This commit does the following four things (it's difficult to them in separate commits that compile). 1) It removes the [read_file] from [scan_file]. Reading the files is done seperately before returning an [obj_infos] list of [file] values. This turns [scan_file] into a function that operates on values of type [file]. 2) In [scan_file] it removes the separate list of [obj_files] introduced by ocaml#9011. We can derive the same list using the function [object_file_name_of_file] introduced in the previous commit on the list of [obj_infos]. Effectively we bring back [scan_file] to the state before ocaml#9011 modulo the [read_file] removal. 3) We derive the list of [obj_files] directly from the [obj_infos] list via [object_file_name_of_file]. Note that the logic introduced by ocaml#9011 is preserved by virtue of [object_file_name_of_file]'s optional result. 4) Deletes [object_file_name] which is no longer used. Co-authored-by: Daniel Bünzli <[email protected]>
…s) in asmlink (port upstream PR#9943) (ocaml#557) * Asmlink.object_file_name removal: add object_file_name_of_file An analysis of the code should convince yourself that the [object_file_name] function is not needed and that its erroring paths are dead code. The reasoning is the following: all the uses of [object_file_name] in asmlink.ml are performed on [obj_name]s which went through [read_file] before. The latter does exactly the same file name lookup and erroring treatement, except not in a stringly manner like [object_file_name] does. This new function derives the same information as [object_file_name] except it does it on the [file] datatype returned by [read_file]. Note that all the erroring code paths of [object_file_name] have been handled by the [read_file] which derived the [file] value. We integrate the logic added by PR ocaml#9011 for empty cmxa here. * Asmlink.object_file_name removal: remove. This commit does the following four things (it's difficult to them in separate commits that compile). 1) It removes the [read_file] from [scan_file]. Reading the files is done seperately before returning an [obj_infos] list of [file] values. This turns [scan_file] into a function that operates on values of type [file]. 2) In [scan_file] it removes the separate list of [obj_files] introduced by ocaml#9011. We can derive the same list using the function [object_file_name_of_file] introduced in the previous commit on the list of [obj_infos]. Effectively we bring back [scan_file] to the state before ocaml#9011 modulo the [read_file] removal. 3) We derive the list of [obj_files] directly from the [obj_infos] list via [object_file_name_of_file]. Note that the logic introduced by ocaml#9011 is preserved by virtue of [object_file_name_of_file]'s optional result. 4) Deletes [object_file_name] which is no longer used. Co-authored-by: Daniel Bünzli <[email protected]>
MSVC .lib format doesn't support having no
.objfiles in the library, soocamlopt -o foo.cmxa -ageneratesfoo.cmxabut notfoo.lib(there's no error from the Microsoft Linker). The resultingfoo.cmxais un-linkable, since OCaml passesfoo.libon to the linker. This doesn't affect cc-based toolchains sincearrather more logically allows empty files!This patch relaxes the requirement for
foo.libiffoo.cmxacontains no units.This is part of a fix for ocaml/stdlib-shims#11
cc @db4