Skip to content

Asm_directives: new abstraction layer for assembly directive emission#756

Closed
mshinwell wants to merge 173 commits intoocaml:trunkfrom
mshinwell:asm_directives
Closed

Asm_directives: new abstraction layer for assembly directive emission#756
mshinwell wants to merge 173 commits intoocaml:trunkfrom
mshinwell:asm_directives

Conversation

@mshinwell
Copy link
Contributor

Emission of DWARF debugging information necessitates support for various kinds of assembly directives, some of which are quite tricky with regard to their types (mainly widths of various integers, some of which may depend on the target machine, and some of which may not). This new abstraction layer provides a uniform interface for the emission of assembly directives.

This is currently sufficient for testing the debugging support on x86, but is not sufficient for merging. I think pieces missing at present are:

  • testing on 32-bit x86 (64-bit x86 has been tested fairly well);
  • an Asm_directives implementation for backends where there is no "assembly DSL" (i.e. everything except x86) -- there is some commented-out code from an old version of this, but it needs mostly rewriting and checking thoroughly against the current asm_directives.mli interface which has extensively changed since that old code was written;
  • the hooks in the other backends, following the change in amd64/emit.mlp;
  • build system support for building the correct Asm_directives_... implementation.

This pull request has #755 as a prerequisite.

If anyone would like to volunteer to help bring this pull request up to scratch, it would be much appreciated. If not then I will get to it in due course. There are unfortunately some tricky, assembler-dependent details.

@mshinwell mshinwell added this to the 4.05-or-later milestone Oct 17, 2016
@mshinwell
Copy link
Contributor Author

This patch has been significantly reworked. By removing the non-x86-specific directives from X86_dsl / X86_ast I've been able to make a uniform assembly directives emission layer that should be usable by all of the backends and the DWARF emitter.

@alainfrisch I believe you have some code that depends on the "directive" type (now Asm_directives.Directive.t with the x86-specific ones still in X86_dsl.D) -- could you check that these modifications won't cause a problem? I don't think they should.

I need to do some more tidying up in the x86-64 backend and then make all of the other backends use Asm_directives. Review comments are welcome now though.

@mshinwell
Copy link
Contributor Author

(This patch also includes #753 for the moment.)

@mshinwell
Copy link
Contributor Author

I've also now tried to tidy up the terminology surrounding "labels" and "symbols". "Labels" are now always of type Linearize.label in the context of assembly code generation; "symbols" are textual entities. (At some point we should consider using the middle-end type Symbol.t, or an equivalent, for the latter.) All mangling of label numbers and symbol names is done in Asm_directives now.

@alainfrisch Could you possibly check that the typ arguments to define_symbol and define_label are correct in amd64/emit.mlp? Only one of these was specified before. I've made the argument non-optional so we have to think about it. Or is this just not important?

@alainfrisch
Copy link
Contributor

I've only looked very quickly to this PR. It's not completely clear to which extend the notion of "directive" can be made general. Even if there are many similarities in different backends, isn't it inherently tied to the target system and the specific assembly language syntax? Perhaps it is a bit theoretical, but an ocamlopt backend could use a very different directive syntax than gas or masm, so putting concrete syntax emitters in Asm_directives (which is linked for all backends) does not seem right. Similarly, Target_system, in its current form, seems rather specific to Intel backends.

The rationale for this big refactoring would be much clearer (to me) if non-Intel backends were ported to the new abstraction layer as well. Otherwise it is hard to know how general this layer will be. As long as it remains used only by Intel backends, I don't see the benefits.

could you check that these modifications won't cause a problem?

Thanks for thinking about it, and no, indeed, I don't foresee any problem to adapt our code.

let lbl = emit_symbol lbl in
D.global lbl;
_label lbl
D.define_label lbl ~typ:NONE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this matters but all instances of _label should be translated to ~typ:QWORD, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping you were going to tell me the answer to that :)
I don't know exactly what masm uses this parameter for, but could it always be the natural machine width in our case? Maybe we could delete the parameter in our code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but could it always be the natural machine width in our case

I'll do some experiments... Stay tuned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, simply generating "foo LABEL DWORD/QWORD" does not work (the assembler complains); at least for labels in code (jump targets), one needs to use "foo:". I've tried using "foo:" in code section and "foo LABEL ?DWORD" in data section, and a good fraction of the testsuite passes, except for some non-deterministic segfaults (perhaps depending on where the image is loaded).

_label lbl;
D.switch_to_section Jump_tables;
D.align ~bytes:4;
D.define_label lbl ~typ:NONE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

List.iter emit_call_gc !call_gc_sites;
emit_call_bound_errors ();
let end_of_function_label = Cmm.new_label () in
D.define_label end_of_function_label ~typ:NONE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use of this new label?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed for DWARF emission, it shouldn't be in this patch.

efa_string = (fun s -> D.bytes (s ^ "\000"))
D.between_this_and_label_offset_32bit ~upper:lbl
~offset_upper:(Targetint.of_int32 ofs));
efa_def_label = (fun l -> D.define_label l ~typ:NONE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark (_label -> ~typ:QWORD).

@alainfrisch
Copy link
Contributor

Keeping the two kinds of NewLabel looks confusing. Previously, the "typ" could only be NONE/DWORD/QWORD (and it is only used for the masm syntax). Couldn't this info be kept in the common layer (so as to get rid of the NewLabel in the intel-specific part)?

@mshinwell
Copy link
Contributor Author

Thanks for looking at this. I don't really see the problem with having the concrete syntax emitters in Asm_directives; it seems fairly unlikely that we're going to be targeting a platform in the near future that doesn't use gas, gas-like syntax or masm.

This layer is actually used by all platforms for the DWARF emitters. I agree that we should port the other backends to use it throughout, but that can be done as a separate piece of work.

@damiendoligez damiendoligez modified the milestone: 4.05-or-later Feb 15, 2017
@mshinwell mshinwell mentioned this pull request Mar 7, 2017
@xavierleroy
Copy link
Contributor

I tried to review and failed because this huge collection of semi-related changes defies reviewing.

A first step would be for @mshinwell to write down a high-level description of what's in this PR, which goes way beyond than factoring out the concrete syntax of directives, but also includes a bunch of other refactorings, new uses of the Targetint interface, many new uses of a Linkage_name interface I'm not familiar with, and probably more.

I also noticed abstractions that are not (abstractions). For example the Target_system interface needs (by construction) to be extended for every new target architecture of CompCert. So, it doesn't belong in asmcomp/. Likewise I'm skeptical that we gain anything by factoring bits and pieces of emit_function in Emitaux.

The game is not to have the smallest possible emitters for each target. The game is to enforce a modular structure where code emitters hide target-related details from the rest of the system. Examples of such details that (I think) this PR leaks outside the emitters are 1- the need to set local symbols to work around MacOS X limitations, and 2- the PPC64 oddities.

@mshinwell
Copy link
Contributor Author

Some of the changes can probably now be pulled out into separate pull requests which will make things easier. I shall try to do this for the Linkage_name and Target_system changes first.

One point on the other comments: things such as the need for set on Mac OS X is exactly what Asm_directives is supposed to abstract from. (This might arguably not be the case for the PPC ELFv1 oddities.) In particular the DWARF emitting code shouldn't need to know about this.

@xavierleroy
Copy link
Contributor

xavierleroy commented Aug 14, 2017

I'm not yet asking for separate PRs. As a first step I'd like to see an up to date and comprehensive description of what's in this PR.

@mshinwell
Copy link
Contributor Author

I think I'm going to do separate PRs for some of this anyway, since thinking about some of the changes in isolation has already raised a few issues, which will need thinking about.

As for the whole, this contains:

  1. The new layer which abstracts platform-specific details of assembly directive emission and provides the necessary "measuring" functions needed for DWARF emission.
  2. Changes to use the new layer instead of direct emission of strings.
  3. Factoring out of a large amount of common code into Emitaux from the emitters. This was done because the work in point 2 made substantial chunks of the code in the emitters very similar or the same. This work showed that there existed various inconsistencies and at least one bug (as I recall in CFI offset adjustment).
  4. To match up with the new abstraction layer Asm_directives and hopefully to improve matters, use of specialised types in the backend rather than generic types. There are two examples: the first is extending the use of Targetint; the second is using Linkage_name instead of string.
  5. A Target_system module which is intended to improve upon the current mechanisms for communicating various platform-specific details to the backend. Some tidying up of rather old names in the configure script comes along with this change. Something that I haven't discussed yet (cf. caml-devel discussion) is that I think keeping this module properly in sync with all supported platforms and ensuring that we have CI machines for all such platforms (subject to reasonable restrictions on officially-supported operating systems on such platforms) has the potential to significantly improve our QA.

The extended use of Targetint works towards a goal of having the whole compiler pipeline handle cross-compilation correctly. In this pull request this conversion only started from Mach but I'm working on the whole thing from the middle-end (Closure or Flambda) onwards, since it's not too hard.

The Linkage_name change (I will probably rename this to Asm_symbol) aims to be more precise about names in the backend. The proposed roadmap is that Symbol.t be used in the middle-end (this is currently only used in Flambda) to represent functions or data lifted to toplevel without saying anything about the mangling conventions. Then in the backend Asm_symbol.t represents the OCaml-specific mangling of such names (which in particular should be able to be demangled by the debugger) and an associated type describes such names equipped with relocation (or other similar) information. Finally, in the emitters, we just use string for example as arguments in the x86-specific DSL---at this stage the symbol names are basically unstructured data.

@lpw25 will have in his forthcoming namespaces proposal an effectively analogous set of renamings in the front-end of the compiler (instead of Ident.t everywhere).

Merging some of these changes together would save a reasonable amount of work, which is why they were done together in the first place---in particular the change to use the new Asm_directives layer together with the refactoring of the emitters. However I think some parts (e.g. the integer and symbol changes, plus Target_system) should be split.

@mshinwell
Copy link
Contributor Author

One further comment: the use of Linkage_name.t is kind of wrong in Flambda at the moment and the plan is to disentangle that, so it would only be used (renamed as Asm_symbol.t) in the backend. Then it could be moved to asmcomp/ where it belongs.

@damiendoligez damiendoligez added this to the 4.07-or-later milestone Oct 11, 2017
@DemiMarie
Copy link
Contributor

I would definitely like to see this merged. Right now it is not possible to debug OCaml native code, which is annoying. Bytecode works but

  • is slow (at runtime)
  • ocamldebug has rather poor IDE integration (it only speaks to emacs)

@bikallem
Copy link

I want to echo similar sentiment as @DemiMarie. Debugging native code in ocaml is very painful right now. Having to use a crude mixture of printf/format is very time-consuming. A good debugger would also be very useful when you are trying to understand a large ocaml codebase. Would definitely like to see the debugging GPRs merged.

@mshinwell
Copy link
Contributor Author

Work is underway to split this pull request into pieces that are easier to review. Watch this space.

@mshinwell
Copy link
Contributor Author

This GPR is going to be superceded.

@mshinwell mshinwell closed this Sep 26, 2018
stedolan added a commit to stedolan/ocaml that referenced this pull request Sep 21, 2022
sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Feb 21, 2023
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023
25188da flambda-backend: Missed comment from PR802 (ocaml#887)
9469765 flambda-backend: Improve the semantics of asynchronous exceptions (new simpler version) (ocaml#802)
d9e4dd0 flambda-backend: Fix `make runtest` on NixOS (ocaml#874)
4bbde7a flambda-backend: Simpler symbols (ocaml#753)
ef37262 flambda-backend: Add opaqueness to Obj.magic under Flambda 2 (ocaml#862)
a9616e9 flambda-backend: Add build system hooks for ocaml-jst (ocaml#869)
045ef67 flambda-backend: Allow the compiler to build with stock Dune (ocaml#868)
3cac5be flambda-backend: Simplify Makefile logic for natdynlinkops (ocaml#866)
c5b12bf flambda-backend: Remove unnecessary install lines (ocaml#860)
ff12bbe flambda-backend: Fix unused variable warning in st_stubs.c (ocaml#861)
c84976c flambda-backend: Static check for noalloc: attributes (ocaml#825)
ca56052 flambda-backend: Build system refactoring for ocaml-jst (ocaml#857)
39eb7f9 flambda-backend: Remove integer comparison involving nonconstant polymorphic variants (ocaml#854)
c102688 flambda-backend: Fix soundness bug by using currying information from typing (ocaml#850)
6a96b61 flambda-backend: Add a primitive to enable/disable the tick thread (ocaml#852)
f64370b flambda-backend: Make Obj.dup use a new primitive, %obj_dup (ocaml#843)
9b78eb2 flambda-backend: Add test for ocaml#820 (include functor soundness bug) (ocaml#841)
8f24346 flambda-backend: Add `-dtimings-precision` flag (ocaml#833)
65c2f22 flambda-backend: Add test for ocaml#829 (ocaml#831)
7b27a49 flambda-backend: Follow-up PR#829 (comballoc fixes for locals) (ocaml#830)
ad7ec10 flambda-backend: Use a custom condition variable implementation (ocaml#787)
3ee650c flambda-backend: Fix soundness bug in include functor (ocaml#820)
2f57378 flambda-backend: Static check noalloc (ocaml#778)
aaad625 flambda-backend: Emit begin/end region only when stack allocation is enabled (ocaml#812)
17c7173 flambda-backend: Fix .cmt for included signatures (ocaml#803)
e119669 flambda-backend: Increase delays in tests/lib-threads/beat.ml (ocaml#800)
ccc356d flambda-backend: Prevent dynamic loading of the same .cmxs twice in private mode, etc. (ocaml#784)
14eb572 flambda-backend: Make local extension point equivalent to local_ expression (ocaml#790)
487d11b flambda-backend: Fix tast_iterator and tast_mapper for include functor. (ocaml#795)
a50a818 flambda-backend: Reduce closure allocation in List (ocaml#792)
96c9c60 flambda-backend: Merge ocaml-jst
a775b88 flambda-backend: Fix ocaml/otherlibs/unix 32-bit build (ocaml#767)
f7c2679 flambda-backend: Create object files internally to avoid invoking GAS (ocaml#757)
c7a46bb flambda-backend: Bugfix for Cmmgen.expr_size with locals (ocaml#756)
b337cb6 flambda-backend: Fix build_upstream for PR749 (ocaml#750)
8e7e81c flambda-backend: Differentiate is_int primitive between generic and variant-only versions (ocaml#749)

git-subtree-dir: ocaml
git-subtree-split: 25188da
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants