Allow linking empty .cmxa files on MSVC by dra27 · Pull Request #9011 · ocaml/ocaml

dra27 · 2019-10-03T09:56:55Z

MSVC .lib format doesn't support having no .obj files in the library, so ocamlopt -o foo.cmxa -a generates foo.cmxa but not foo.lib (there's no error from the Microsoft Linker). The resulting foo.cmxa is un-linkable, since OCaml passes foo.lib on to the linker. This doesn't affect cc-based toolchains since ar rather more logically allows empty files!

This patch relaxes the requirement for foo.lib if foo.cmxa contains no units.

This is part of a fix for ocaml/stdlib-shims#11

cc @db4

MSVC .lib format doesn't support having no .obj files in the library, so ocamlopt -o foo.cmxa -a generates foo.cmxa but not foo.lib (there's no error from the Microsoft Linker). The resulting foo.cmxa is unlinkable, since OCaml passes foo.lib on to the linker. This patch relaxes the requirement for foo.lib if the .cmxa contains no units.

dra27 · 2019-10-03T10:02:05Z

I intentionally haven't done anything to stop the linker being called when the .cmxa is created - at present, lib silently does nothing. If in future Microsoft were to make it an error (unlikely) then that'd need patching, but if the less unlikely change of supporting empty .lib files were added, then it would magically work (and the existing code would be there for compatibility with older releases)

xavierleroy

There is something I don't understand with list reversals. Plus some suggestions below.

asmcomp/asmlink.ml

xavierleroy · 2019-10-06T09:34:04Z

asmcomp/asmlink.ml

+             end else
+               reqd)
+          infos.lib_units tolink
+      in


For symmetry I suggest

let tolink = ... and objfiles_rev = ... in (tolink, objfiles_rev)

The two computations are independent, as far as I can see.

xavierleroy · 2019-10-06T09:38:19Z

asmcomp/asmlink.ml

+        if Config.ccomp_type = "msvc"
+        && infos.lib_units = []
+        && not (Sys.file_exists (object_file_name obj_name)) then


It would make sense to just test infos.lib_units = []. Even with non-MSVC toolchains, it makes sense to not pass an empty library to the linker, resulting in a shorter command line and less work for the linker.

As to testing that the file doesn't exist, this is to be commended as very prudent, but 1- it causes one extra system call per file to link, and 2- I can't imagine a scenario where object files would be added by hand to the library file that goes with an OCaml .cmxa file.

I have no problem with doing that - the test I've put is just belt-and-braces to fix exactly the bug. The the .a file may have been subsequently altered (e.g. to add stub objects directly) - I've no knowledge of somewhere doing this, but do we definitely want to break some undefined behaviour while fixing this particular bug?

I did also consider whether it would be better to add a field to the .cmxa format which explicitly specifies the object to link - so on MSVC for an empty library, this field would explicitly be blank. In principle, .cmx format could do the same thing, although a .o and .obj file is always associated, even for an empty .ml file.

I (potentially temporarily) left this as-is. This change alters this PR from a bugfix to a potentially breaking change (albeit by using largely unspecified behaviour of the linker).

Am I reading right that the third part of the test is here to protect users that were creating a .lib file by hand behind the back on the compiler on msvc? Do we really intend to support this use case?

@Octachron - it's more that I don't wish to intentionally break it in this PR. I believe there are some Mirage libraries which do this (although none which would hit this explicit case - i.e. an empty library with objects being added)

asmcomp/asmlink.ml

dra27 · 2019-10-06T10:21:23Z

Thanks, @xavierleroy! I'll push fixes later - what do you think of shifting the fix into .cmxa format?

xavierleroy · 2019-10-06T12:56:40Z

what do you think of shifting the fix into .cmxa format?

I don't feel it would be a significant improvement. But it would be a bigger change. I like small changes :-)

dra27 · 2019-10-10T09:21:26Z

Code corrected (I hope).

damiendoligez · 2019-10-15T12:02:46Z

Note that there is a similar problem on Macos (#6550, #1094) which was fixed in a completely different way. Is there a way to merge the two fixes?

dra27 · 2019-10-15T12:05:12Z

No - the problem is not just that an empty .lib file isn't generated, it's also that the Microsoft Linker returns an error if you give it a .lib file containing no units (it's an error to try to remove the last one using the LIB tool as well)

xavierleroy · 2020-04-22T08:06:58Z

Note that there is a similar problem on Macos (#6550, #1094) which was fixed in a completely different way. Is there a way to merge the two fixes?

There is a way to simplify both fixes at the same time:

At creation time, if the library has zero members, just create an empty .a/.lib file (close_out (open_out f)), don't call ar/lib at all.
At link time, if the library has zero members, do not pass its .a/.lib file to the C linker.

That would be nicer than the current code + the current PR.

dra27 · 2020-04-22T08:37:24Z

@xavierleroy - I'm puzzled, this PR does exactly your second point, with the only difference that the test is made specific to MSVC, doesn't it? (see #9011 (comment) for why it's so specific). I don't at the moment see what I'm to change in this PR.

For #1094, we'd just be simplifying the three calls in Ccomp.macos_create_empty_archive.

However, a null foo.lib is a corrupt file - I'm not comfortable with the idea that we intentionally start generating files which the toolchain regards as corrupt, however weird the requirement of the toolchain may be. #1094 works around a bug where a toolchain doesn't generate a file it should be, which seems different.

Regardless, this PR should have at least one test, which I'm just doing...

xavierleroy · 2020-04-22T08:46:52Z

I'm puzzled, this PR does exactly your second point, with the only difference that the test is made specific to MSVC, doesn't it?

Yes. But in order to simplify the macOS situation as well, linking of empty libraries should also be skipped on macOS. At this point, let's just skip it on all platforms.

What I'm aiming at is a platform-independent solution to both the MSVC and the macOS issues. Wouldn't this be nice?

dra27 · 2020-04-22T11:17:55Z

So I can simply remove the Config.ccomp_type = "msvc" in the test in asmlink L207 above?

It turns out an empty .a is corrupt too:

dra27@summer:~$ touch foo.a ; ar t foo.a
ar: foo.a: File format not recognised
dra27@summer:~$ ar rc bar.a ; ar t bar.a
dra27@summer:~$ cat bar.a
!<arch>

dra27 · 2020-04-22T12:46:25Z

OK, I think I've got hung on up on "simpler", but I think I follow where you're coming from. I certainly don't think we should be creating invalid files (so no zero-length files and no using other ways to create .lib files with no members).

MSVC forces us (and so build systems) to consider that foo.cmxa might not have foo.a/foo.lib so if we have MSVC not permitting empty archives and macOS unable to create them without trickery, then we could be entire platform independent - as you suggest - by not calling the archiver when there are no modules including on Linux.

dra27 · 2020-04-22T13:18:51Z

I have verified that the test fails on trunk built with msvc64

alainfrisch · 2020-04-22T13:20:45Z

Not creating .a/.lib for empty libs seems natural enough, doesn't this complicate the life for build systems, who typically consider those .a/.lib files as dependencies for executables that link the libraries (and rightly so)?

dra27 · 2020-04-22T13:48:27Z

@alainfrisch - it does complicate it, yes, but if that build system wishes to support MSVC properly then it must complicate it. That said, a simple build system need only worry about the .cmxa, since the only time a simple build system has to interact with the .a/.lib file is to install it.

alainfrisch · 2020-04-22T14:01:41Z

it does complicate it, yes, but if that build system wishes to support MSVC properly then it must complicate it.

Well, another option would be to make sure the .a/.lib is not really empty, by adding a dummy (but valid) .o/.obj file to it.

That said, a simple build system need only worry about the .cmxa, since the only time a simple build system has to interact with the .a/.lib file is to install it.

I don't understand. If you define a rule to build an executable from a set of modules and libraries, the build system has to be aware of which files are used to build the resulting executable, which include the .a/.lib file for each library, no?

dra27 · 2020-04-22T14:38:44Z

There's no need for a time-stamp based build system compiling an OCaml program ever to worry about .a and .o dependencies. Depending on .cmx and .cmxa is sufficient and ocamlopt hides all the details (cf. the venerable OCamlMakefile - executables depend on the .cmxa, it's only the install rules which pull in .$(EXT_LIB) files). Checksum based build systems are not simple 😉

My strong opinions on this are that we should allow empty .cmxa files and we should not generate invalid files in other formats (so no zero-byte .a or .lib files). I don't have a strong opinion as to which of the two possible ways to fix it should be done. The primary user of this feature is stdlib-shims, and Dune is already supports this (it was part of Dune 2.0). I lean more towards fixing a very small number of build systems once for something which should then never change vs a slightly dirtier hack (what you propose is basically what the macOS workaround does) which if it does affect a package will affect it in a weird way (conflict with the dummy name, or anything like that)

xavierleroy · 2020-04-22T14:41:00Z

The problem is not with the build system but with the "install" Makefile entries, ocamlfind install, and the like. My suggestion to produce empty files with .a/.lib suffixes was so that these installers have something to copy. Of course no linker will ever be given those fake .a/.lib files as argument.

dra27 · 2020-04-22T14:54:15Z

Sure, but that's either one of two easy fixes in a Makefile, or an easy enough fix in ocamlfind for to understand that if foo.a can't be found and foo.cmxa is being installed then it's OK (and ocamlfind usually requires updating with every release of OCaml anyway). Is it just me who finds the idea of intentionally generating and installing a corrupt file distasteful? The Debian guys don't even like when we put valid extra bits in other people's file formats!

alainfrisch · 2020-04-22T15:31:23Z

Checksum based build systems are not simple 😉

I was thinking about checksum based build systems indeed. The build system might not know whether a library is empty or not (in particular is not under the control of the same build system), and anyway, it might be non-trivial logic to add in order to support this case.

dra27 · 2020-06-23T16:02:00Z

Rebased onto latest trunk. I'm indescribably keen for the actual bugfix in this PR to be in 4.11 (that's the ability to run echo 'print_endline "Hello, empty world!"' > bar.ml ; ocamlopt -a -o foo.cmxa ; ocamlopt -o bar foo.cmxa bar.ml on msvc64 without error which is what the first commit fixes (and the second commit adds a test for).

The final commit goes further and alters the behaviour of ocamlopt -a -o foo.cmxa so that foo.a is not generated either. This latter part seems to have stalled the PR. Please can I either have approval to merge this PR as-is, or split it into two, with the first part being approved soon?

xavierleroy

I'm pretty sure the files are now in the correct order, and I agree that not giving the linker archive files that were previously empty is a good thing to do.

I'm still a little worried about installation rules (in Makefiles or elsewhere) that systematically install a .a/.obj file for every library. These will break, but perhaps the "empty library" case occurs rarely enough that we don't care. I'm curious to know what Dune would do in this case, for example.

asmcomp/asmlink.ml

dra27 · 2020-06-23T19:09:16Z

Dune's been fixed since 2.0.0 - see ocaml/dune#2829

dra27 · 2020-06-23T19:09:33Z

Thanks @xavierleroy, I'll merge and cherry-pick once CI replies

dra27 · 2020-06-24T10:06:34Z

After a discussion with @Octachron, the plan is to merge this PR but only cherry-pick the first (bug-fix) commit to 4.11. The aforementioned Dune PR was written with the assumption that it was MSVC only, so it doesn't like Unix compilers not producing .a files. The fix for Dune is trivial (and I'm about to open the PR for it), but we think it's a little late in the release cycle for that change in 4.11.

So: 4.11.0 will be able to build stdlib-shims correctly on msvc and msvc64 and 4.12.0 gains the cleaned-up and simplified linker code.

Allow linking empty .cmxa files on MSVC (cherry picked from commit 7b6098a)

An analysis of the code should convince yourself that this function is not needed and that its erroring paths are dead code. The reasoning is the following: all the uses of [object_file_name] in asmlink.ml are performed on [obj_infos] which went through [read_file] before. The latter does exactly the same file name lookup and erroring treatement, except not in a stringly manner like [object_file_name] does. This first commit removes use of the function in [scan_file] which was introduced by PR ocaml#9011. The result of [object_file_name] here is equal to [file_name] (follow [read_file]).

This function derives the same information as [object_file_name] except it does it on the [file] datatype returned by [read_file]. Note that all the erroring code paths of [object_file_name] have been handled by the [read_file] which derived the [file] value. We integrate the logic added by PR ocaml#9011 for empty cmxa here.

This commit does the following four things (it's difficult to them in separate commits that compile). 1) It removes the [read_file] from [scan_file]. Reading the files is done seperately before returning an [obj_infos] list of [file] values. This turns [scan_file] into a function that operates on values of type [file]. 2) In [scan_file] it removes the separate list of [obj_files] introduced by ocaml#9011. We can derive the same list using the function [object_file_name_of_file] introduced in the previous commit on the list of [obj_infos]. Effectively we bring back [scan_file] to the state before ocaml#9011 modulo the [read_file] removal. 3) We derive the list of [obj_files] directly from the [obj_infos] list via [object_file_name_of_file]. Note that the logic introduced by ocaml#9011 is preserved by virtue of [object_file_name_of_file]'s optional result. 4) Deletes [object_file_name] which is no longer used.

An analysis of the code should convince yourself that the [object_file_name] function is not needed and that its erroring paths are dead code. The reasoning is the following: all the uses of [object_file_name] in asmlink.ml are performed on [obj_name]s which went through [read_file] before. The latter does exactly the same file name lookup and erroring treatement, except not in a stringly manner like [object_file_name] does. This new function derives the same information as [object_file_name] except it does it on the [file] datatype returned by [read_file]. Note that all the erroring code paths of [object_file_name] have been handled by the [read_file] which derived the [file] value. We integrate the logic added by PR ocaml#9011 for empty cmxa here.

This commit does the following four things (it's difficult to them in separate commits that compile). 1) It removes the [read_file] from [scan_file]. Reading the files is done seperately before returning an [obj_infos] list of [file] values. This turns [scan_file] into a function that operates on values of type [file]. 2) In [scan_file] it removes the separate list of [obj_files] introduced by ocaml#9011. We can derive the same list using the function [object_file_name_of_file] introduced in the previous commit on the list of [obj_infos]. Effectively we bring back [scan_file] to the state before ocaml#9011 modulo the [read_file] removal. 3) We derive the list of [obj_files] directly from the [obj_infos] list via [object_file_name_of_file]. Note that the logic introduced by ocaml#9011 is preserved by virtue of [object_file_name_of_file]'s optional result. 4) Deletes [object_file_name] which is no longer used.

An analysis of the code should convince yourself that the [object_file_name] function is not needed and that its erroring paths are dead code. The reasoning is the following: all the uses of [object_file_name] in asmlink.ml are performed on [obj_name]s which went through [read_file] before. The latter does exactly the same file name lookup and erroring treatement, except not in a stringly manner like [object_file_name] does. This new function derives the same information as [object_file_name] except it does it on the [file] datatype returned by [read_file]. Note that all the erroring code paths of [object_file_name] have been handled by the [read_file] which derived the [file] value. We integrate the logic added by PR ocaml#9011 for empty cmxa here.

This commit does the following four things (it's difficult to them in separate commits that compile). 1) It removes the [read_file] from [scan_file]. Reading the files is done seperately before returning an [obj_infos] list of [file] values. This turns [scan_file] into a function that operates on values of type [file]. 2) In [scan_file] it removes the separate list of [obj_files] introduced by ocaml#9011. We can derive the same list using the function [object_file_name_of_file] introduced in the previous commit on the list of [obj_infos]. Effectively we bring back [scan_file] to the state before ocaml#9011 modulo the [read_file] removal. 3) We derive the list of [obj_files] directly from the [obj_infos] list via [object_file_name_of_file]. Note that the logic introduced by ocaml#9011 is preserved by virtue of [object_file_name_of_file]'s optional result. 4) Deletes [object_file_name] which is no longer used.

…rt upstream PR#9943) (ocaml#557) * Asmlink.object_file_name removal: add object_file_name_of_file An analysis of the code should convince yourself that the [object_file_name] function is not needed and that its erroring paths are dead code. The reasoning is the following: all the uses of [object_file_name] in asmlink.ml are performed on [obj_name]s which went through [read_file] before. The latter does exactly the same file name lookup and erroring treatement, except not in a stringly manner like [object_file_name] does. This new function derives the same information as [object_file_name] except it does it on the [file] datatype returned by [read_file]. Note that all the erroring code paths of [object_file_name] have been handled by the [read_file] which derived the [file] value. We integrate the logic added by PR ocaml#9011 for empty cmxa here. * Asmlink.object_file_name removal: remove. This commit does the following four things (it's difficult to them in separate commits that compile). 1) It removes the [read_file] from [scan_file]. Reading the files is done seperately before returning an [obj_infos] list of [file] values. This turns [scan_file] into a function that operates on values of type [file]. 2) In [scan_file] it removes the separate list of [obj_files] introduced by ocaml#9011. We can derive the same list using the function [object_file_name_of_file] introduced in the previous commit on the list of [obj_infos]. Effectively we bring back [scan_file] to the state before ocaml#9011 modulo the [read_file] removal. 3) We derive the list of [obj_files] directly from the [obj_infos] list via [object_file_name_of_file]. Note that the logic introduced by ocaml#9011 is preserved by virtue of [object_file_name_of_file]'s optional result. 4) Deletes [object_file_name] which is no longer used. Co-authored-by: Daniel Bünzli <[email protected]>

…s) in asmlink (port upstream PR#9943) (ocaml#557) * Asmlink.object_file_name removal: add object_file_name_of_file An analysis of the code should convince yourself that the [object_file_name] function is not needed and that its erroring paths are dead code. The reasoning is the following: all the uses of [object_file_name] in asmlink.ml are performed on [obj_name]s which went through [read_file] before. The latter does exactly the same file name lookup and erroring treatement, except not in a stringly manner like [object_file_name] does. This new function derives the same information as [object_file_name] except it does it on the [file] datatype returned by [read_file]. Note that all the erroring code paths of [object_file_name] have been handled by the [read_file] which derived the [file] value. We integrate the logic added by PR ocaml#9011 for empty cmxa here. * Asmlink.object_file_name removal: remove. This commit does the following four things (it's difficult to them in separate commits that compile). 1) It removes the [read_file] from [scan_file]. Reading the files is done seperately before returning an [obj_infos] list of [file] values. This turns [scan_file] into a function that operates on values of type [file]. 2) In [scan_file] it removes the separate list of [obj_files] introduced by ocaml#9011. We can derive the same list using the function [object_file_name_of_file] introduced in the previous commit on the list of [obj_infos]. Effectively we bring back [scan_file] to the state before ocaml#9011 modulo the [read_file] removal. 3) We derive the list of [obj_files] directly from the [obj_infos] list via [object_file_name_of_file]. Note that the logic introduced by ocaml#9011 is preserved by virtue of [object_file_name_of_file]'s optional result. 4) Deletes [object_file_name] which is no longer used. Co-authored-by: Daniel Bünzli <[email protected]>

dra27 requested a review from a user October 3, 2019 09:56

This was referenced Oct 3, 2019

Fix native code generation of empty libraries on MSVC ocaml/dune#2693

Closed

Fix MSVC build for 4.07-4.09 ocaml/stdlib-shims#12

Merged

xavierleroy requested changes Oct 6, 2019

View reviewed changes

dra27 force-pushed the fix-msvc-empty-cmxa branch from b23aada to cef6c3d Compare October 15, 2019 09:05

dra27 added this to the 4.10 milestone Oct 15, 2019

damiendoligez added the bug label Oct 15, 2019

dra27 force-pushed the fix-msvc-empty-cmxa branch from cef6c3d to 21d4036 Compare October 19, 2019 15:26

damiendoligez modified the milestones: 4.10, 4.11 Feb 20, 2020

dra27 force-pushed the fix-msvc-empty-cmxa branch from 21d4036 to cfa0f06 Compare April 16, 2020 16:50

Test for linking archives with no modules

d732593

Don't call the archiver/librarian for empty .cmxa

e6ab329

dra27 force-pushed the fix-msvc-empty-cmxa branch from e9cd7ef to 6b0f113 Compare April 22, 2020 13:56

dra27 force-pushed the fix-msvc-empty-cmxa branch from 6b0f113 to eab4c4f Compare June 23, 2020 15:55

xavierleroy approved these changes Jun 23, 2020

View reviewed changes

asmcomp/asmlink.ml Outdated Show resolved Hide resolved

dra27 force-pushed the fix-msvc-empty-cmxa branch from eab4c4f to e6ab329 Compare June 23, 2020 19:06

dra27 merged commit 7b6098a into ocaml:trunk Jun 24, 2020

dra27 added a commit that referenced this pull request Jun 24, 2020

Merge pull request #9011 from dra27/fix-msvc-empty-cmxa

aae783b

Allow linking empty .cmxa files on MSVC (cherry picked from commit 7b6098a)

dra27 added a commit that referenced this pull request Jun 24, 2020

Adjust Changes for #9011

9be2c01

dra27 deleted the fix-msvc-empty-cmxa branch June 24, 2020 10:16

dra27 mentioned this pull request Jun 24, 2020

In OCaml 4.12.0, empty archives no longer generate .a files ocaml/dune#3576

Merged

dbuenzli mentioned this pull request Sep 26, 2020

Simplify processing and remove dead code (error paths) in asmlink #9943

Merged

dra27 mentioned this pull request Jan 15, 2022

opam switches 4.12.0 and up fail to produce stdlib_shims.a ocaml/stdlib-shims#20

Closed

gasche mentioned this pull request Jul 25, 2022

Installation fails with ar: no archive members specifiedon MacOS with OCaml 4.04.2 ocaml/camlp-streams#7

Open

dra27 mentioned this pull request Mar 29, 2025

Tweak the reproducibility of cmt{,i} format #13828

Merged

Conversation

dra27 commented Oct 3, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dra27 commented Oct 3, 2019

Uh oh!

xavierleroy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

xavierleroy Oct 6, 2019

Choose a reason for hiding this comment

Uh oh!

xavierleroy Oct 6, 2019

Choose a reason for hiding this comment

Uh oh!

dra27 Oct 6, 2019

Choose a reason for hiding this comment

Uh oh!

dra27 Oct 10, 2019

Choose a reason for hiding this comment

Uh oh!

Octachron Jan 23, 2020

Choose a reason for hiding this comment

Uh oh!

dra27 Apr 21, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

dra27 commented Oct 6, 2019

Uh oh!

xavierleroy commented Oct 6, 2019

Uh oh!

dra27 commented Oct 10, 2019

Uh oh!

damiendoligez commented Oct 15, 2019

Uh oh!

dra27 commented Oct 15, 2019

Uh oh!

xavierleroy commented Apr 22, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dra27 commented Apr 22, 2020

Uh oh!

xavierleroy commented Apr 22, 2020

Uh oh!

dra27 commented Apr 22, 2020

Uh oh!

dra27 commented Apr 22, 2020

Uh oh!

dra27 commented Apr 22, 2020

Uh oh!

alainfrisch commented Apr 22, 2020

Uh oh!

dra27 commented Apr 22, 2020

Uh oh!

alainfrisch commented Apr 22, 2020

Uh oh!

dra27 commented Apr 22, 2020

Uh oh!

xavierleroy commented Apr 22, 2020

Uh oh!

dra27 commented Apr 22, 2020

Uh oh!

alainfrisch commented Apr 22, 2020

Uh oh!

dra27 commented Jun 23, 2020

Uh oh!

xavierleroy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

dra27 commented Jun 23, 2020

Uh oh!

dra27 commented Jun 23, 2020

Uh oh!

dra27 commented Jun 24, 2020

Uh oh!

Reviewers

dra27 commented Oct 3, 2019 •

edited

Loading

xavierleroy commented Apr 22, 2020 •

edited

Loading