Skip to content

Add absolute directory names to bytecode format for ocamldebug to use#14

Closed
jwatzman wants to merge 1 commit intoocaml:trunkfrom
jwatzman:ocamldebug
Closed

Add absolute directory names to bytecode format for ocamldebug to use#14
jwatzman wants to merge 1 commit intoocaml:trunkfrom
jwatzman:ocamldebug

Conversation

@jwatzman
Copy link

Continuation of http://caml.inria.fr/mantis/view.php?id=6270

The need for a long list of -I directives makes interactively using
ocamldebug a pain in the butt. Many folks have solved this with various
find invocations or even Python wrappers, but those lead to other
problems when it might include files you weren't expecting (or miss
things you were). But all of this is really annoying since the tooling
should be able to figure out itself, even heuristically, where your
source files are -- gdb gets this right, why can't we?

This patch implements one of the more important heuristics from gdb: you
typically debug on the same machine you built on, so looking for the
source files and built artifacts in the absolute paths where they were
during compilation is a good first try. We write out absolute paths into
a new structure at the beginning of the debug section and then
automatically append those directories into the load path.

This means mean that if you happen to be debugging on a machine
where the original source and build artifacts are not available in
their original absolute locations, things will work as before, using the
standard load path mechanism. You can also explicitly use -I to prepend
directories to the load path and override the defaults located by this
new mechanism.

I personally find this makes using ocamldebug much more pleasant :)

@gasche
Copy link
Member

gasche commented Feb 22, 2014

I tried to review this patch but failed for two reasons:

  • if I understand correctly, the patch here is just a part of the change proposed in PR#6270: you're getting absolute directory paths to the debug section, but not yet using them from ocamldebug (so the commit message is outdated)
  • there is no textual description of how the patch evolved to address the concerns Alain voiced on Mantis. I understand that separating the directory names from the basenames is precisely meant to avoid increasing the debug section too much, but it is hard to assess without the ocamldebug-part of the patch. In particular, I wonder whether this approach makes absolute paths less precise (eg. a risk that a distinct machine get recognized as having the same filesystem structure while it hasn't, because of path ambiguity), but I cannot tell.

@jwatzman
Copy link
Author

I tried to review this patch but failed for two reasons:

Once again, thanks so much for taking the time to look at this!

you're getting absolute directory paths to the debug section, but not yet using them from ocamldebug

I believe that they are in fact being used. This is the intent of my changes to debugger/program_management.ml, debugger/symbols.ml, and debugger/symbols.mli -- they read the new addition to the debug section and add those directories to the search path.

https://gist.github.com/jwatzman/9162697 is the file I've been using to test these changes. Here's a transcript of my test; I moved the binary -- but not the source -- into a different directory to simulate this being a more complex project across several submodules.

[01:28P][josh@charybdis:~/ocaml-test] $ ~/ocaml-prefix/bin/ocamlc -g -o foo foo.ml
[01:28P][josh@charybdis:~/ocaml-test] $ ./foo
25
[01:29P][josh@charybdis:~/ocaml-test] $ mkdir otherdir
[01:30P][josh@charybdis:~/ocaml-test] $ cd otherdir/
[01:30P][josh@charybdis:~/ocaml-test/otherdir] $ cp ../foo .
[01:30P][josh@charybdis:~/ocaml-test/otherdir] $ ~/ocaml-prefix/bin/ocamldebug foo
    OCaml Debugger version 4.02.0+dev3-2013-12-19

(ocd) break @ Foo 7
Loading program... done.
Breakpoint 1 at 6288: file foo.ml, line 7, characters 2-27
(ocd) r
Time: 19 - pc: 6288 - module Foo
Breakpoint: 1
7   <|b|>let y = frob1 x in
(ocd) bt
Backtrace:
#0 Foo foo.ml:7:2
#1 Foo foo.ml:13:10
#2 Foo foo.ml:17:16
(ocd) next
Time: 23 - pc: 6304 - module Foo
8   <|b|>y + 3
(ocd) p y
y: int = 22
(ocd) 

I also tested this change with an internal project at Facebook and it worked well there too; unfortunately that code not publicly available at this time so I can't give you steps to check this yourself.

there is no textual description of how the patch evolved to address the concerns Alain voiced on Mantis.

I didn't add this to the commit message, but would be happy to do so. Here are his two concerns and my thoughts on them:

  • Location displayed in backtraces shows filenames meaningful only on the development machine.

This is no longer the case; the debug events are unmodified in this version of the patch.

  • The size of the generated bytecode or .cds file could grow significantly, esp. because sharing is lost between all the pos_fname strings (always re-created from absolute_path). This could be mitigated by memoizing the absolute_path function.

The specific concern here, about sharing in pos_fname, is no longer relevant since I don't change pos_fname anymore. The more general concern around increasing the size of the bytecode is potentially still valid. However, I don't think this is a big deal at all -- it only increases by the size of the path length of the //directory// names in a given compilation unit -- not the full source file name. I'm careful to use a Set in order to ensure that the directory names are deduplicated and written into the bytecode only once. This is basically the minimal amount of information you need in order to get the core idea of this patch to work at all -- you have to store directory information somewhere in order to find your original source files.

I think the tradeoff of a tiny bit more space for this is absolutely worth it to avoid the contortions to come up with a list of -I directives at debug time. The experience of just being able to say gdb foo as opposed to searching my bash history for the right ocamldebug incantation involving a find subshell is a huge difference in the usability of the two debuggers on nontrivial projects IMO.

Furthermore, those who really do care about this space efficiency can just turn off -g and not worry about it. I really strongly believe that making debugging as painless as possible when complied in debug mode is absolutely worth it.

I understand that separating the directory names from the basenames is precisely meant to avoid increasing the debug section too much, but it is hard to assess without the ocamldebug-part of the patch.

As above, the ocamldebug part is included here (unless I'm missing something, totally possible of course). I'm not sure what exactly you're getting at "evaluating" here anyways -- just the aforementioned size increase? Again, I'm not sure why we are so worried about increasing the size of the debug section (within reason, which I believe this patch falls well within, as above).

In particular, I wonder whether this approach makes absolute paths less precise (eg. a risk that a distinct machine get recognized as having the same filesystem structure while it hasn't, because of path ambiguity), but I cannot tell.

I'm not sure what you mean by this. Do you mean that a different system that just so happens to have the same directory layout but totally different files in those directories might make ocamldebug confused? I thought we had some protection in place to make sure that at least the structure interfaces matched or something like that -- I remember running into this exact problem when I had a -I list to ocamldebug that was slightly wrong and specified files with the same module names but with modules that were slightly different than what was in the bytecode file, and ocamldebug would exit with an assertion failure.

In any event, again I think the added ease of use of ocamldebug in the common case is more than worth it for this super rare edge case, especially folks who //do// run into this case can just pass the right -I flags manually and things will continue working as before.

Or do I misunderstand and you're getting at something else?

Again, thanks for your feedback, and looking forward to your response!

@avsm
Copy link
Member

avsm commented Feb 22, 2014

In any event, again I think the added ease of use of ocamldebug in the common case is more
than worth it for this super rare edge case, especially folks who //do// run into this case can just
pass the right -I flags manually and things will continue working as before.

I do agree with this sentiment for ocamldebug, as this patch turns a tool I've hardly ever touched for the last 14 years into something that's actually quite useful out of the box...

@gasche
Copy link
Member

gasche commented Feb 22, 2014

Thanks for your detailed reply. Given the new patch is much less invasive than the previous one, I mistakenly assumed that it was only part of the changes. I'll try to give it another look as soon as possible.

@jwatzman
Copy link
Author

No problem! This patch should stand alone (and replace the original version on Mantis), and is indeed much less invasive than that previous version -- thanks for the suggestion to include the information in the debug section, it works out a lot better IMO.

@hhugo
Copy link
Contributor

hhugo commented Feb 23, 2014

I just had a quick look.
Would It be possible to output debug_dirs at the end of the Debug section (after events ?) so it will break less code relying on Debug section structure (I think of js_of_ocaml)

@jwatzman
Copy link
Author

@gasche, gotten a chance to look at this again yet? If we're blocking on @hhugo's request, it sounds quite reasonable to me, and I'd be happy to make that change. Just haven't done so yet in case there are other changes requested.

@gasche
Copy link
Member

gasche commented Mar 21, 2014

I haven't had time to look at it again, sorry. But in any case, I probably won't be the one to make a decision regarding this patch, as I now little about the debugger. Any upstreaming decision would have to be made by someone knowledgeable, such as Alain Frisch or Xavier Clerc.

@jwatzman
Copy link
Author

Updated from jwatzman@f67ba3b to jwatzman@913d2e3 -- moved the new information to the end of the debug section as suggested, which I'm pretty sure means I don't need a format magic number bump or any of the code to skip over the data at the beginning of the section. (Not sure why I put it there to begin with, thanks for pointing out that oversight!)

I will see if I can get Alain Frisch or Xavier Clerc to take a look at this, thanks for giving me the pointers.

@xclerc
Copy link
Contributor

xclerc commented Mar 23, 2014

I only had a first/quick look at the patch. It seems correct,
and also respects the style of the surrounding code. Just
one remark: if it has not been done by your previous patch,
you should "bump" the magic number for cmo files.

@jwatzman
Copy link
Author

Thanks for taking a look!

I had bumped the magic numbers in a previous version of this patch, but un-did it when I moved the new information to the end of the debug section, since it seems things are now backwards-compatible. (Old programs just won't know there is more to read in the debug section before they go off and seek to another section, but the information that currently exists will remain in the same place.)

If that logic is incorrect I'm happy to re-bump the number :) Do I need to bump just the cmo version? I had previously bumped the exec, cmi, cmo, and cma magic numbers, though that was based on intuition ("just bump all the bytecode versions") and some looking back thru previous commits, not any real information.

@alainfrisch
Copy link
Contributor

I have zero experience with the bytecode debugger, so I'll let someone else (xclerc?) decide.

@gasche
Copy link
Member

gasche commented Mar 24, 2014

I just had a look at the code. Three things (only opinions/questions, none of them "blocking"):

  • I find dirs to be a relatively poor name for an exposed variable in the Symbol module; could we use something more descriptive? program_include_dirs? program_loadpath? something else?

The two questions are about the Config.load_path := !Config.load_path @ !Symbols.dirs; line added to Program_management:

  • Parameters.add_path does an Envaux.reset_cache () after changing the load_path variable; shouldn't the same be done here (or this function used instead of direct modification)? A possible use case is the user killing the program, recompiling his application (without quitting the debugger session) with slightly different interfaces, and then re-running the program
  • relatedly, Parameters.add_path, as well as Loadprinter.loadfiles, seem to be careful not to duplicate directory entries in the load_path variable; maybe it would be prudent to respect that inveriant in this assignment (or delegate to add_path)?

@jwatzman
Copy link
Author

I find dirs to be a relatively poor name for an exposed variable in the Symbol module

Agreed, will fix.

The two questions are about the Config.load_path := !Config.load_path @ !Symbols.dirs; line added to Program_management: [...]

I didn't delegate to add_path since that prepends instead of appending. I wanted to make sure that my new additions to the load path ended up at the end, so that user-specified -I directives would have the same effect as before, and even be able to override any load path additions my new code adds. Thanks for pointing out my violation of program invariants here -- I'll either define append_path in parameters.ml or otherwise fix this up. (Or if you think that this whole thing isn't a big deal, I'll just stick them on the front via add_path?)

I'll wait incorporate both of these changes with any additional changes that @xclerc suggests when he responds.

@xclerc
Copy link
Contributor

xclerc commented Mar 24, 2014

I think the idea of appending rather than prepending is good.
Regarding non-duplication of path entries, I think the invariant
is not here for correction but for performance. So, we are safe,
but enforcing it would not hurt.

@jwatzman
Copy link
Author

Okay, so I still need to:

  • Change the name of the dirs variable
  • Call Envaux.reset_cache ()
  • Bump the cmo magic number. @xclerc, can you respond to my earlier question about this -- do I still need to bump since the new data is appended to the end of the debug section, and do I need to bump the exec, cmi, and/or cma versions as well?

Will wait to update patch on answer to question above.

@alainfrisch
Copy link
Contributor

Since you change the file format of cmo files, it seems pretty clear that you need to change the magic number of cmo files. If you still have .cmo files around generated by an older version of the compiler and you try to link them with the new version, it will try to do an extra input_value, which will fail. Or did I missed something?

@xclerc
Copy link
Contributor

xclerc commented Mar 24, 2014

Sorry for missing the magic number part. I second @alainfrisch opinion;
the magic number should be updated for cmo files, cma files (that are
basically concatenations of cmo files), and executable bytecode files.
Unless I missed something, cmi files seems to be left unchanged, so is
their magic number.

@jwatzman
Copy link
Author

Aha, I see. Yeah that makes sense. Will send an updated diff with all the suggested revisions in the next couple days. Thanks again for the reviews!

@jwatzman
Copy link
Author

Updated from jwatzman@913d2e3 to jwatzman@395b924 -- I believe I fixed all the issues requested above. Let me know if there's anything else that still needs to be done! Excited to get this committed :)

@yallop
Copy link
Member

yallop commented Mar 31, 2014

Closing, since this has been merged to trunk (dd7bd5b).

@yallop yallop closed this Mar 31, 2014
@gasche gasche reopened this Mar 31, 2014
@gasche
Copy link
Member

gasche commented Mar 31, 2014

I had to temporarily revert the commit as it broke the backtrace testsuite. To reproduce the failure, you can do:

make world.opt
cd testsuite
make one DIR=tests/backtrace

I'm sorry for not noticing this earlier (code review is good but we should have caught the testsuite failure). I'm not sure where the error comes from; it may be a trivially fixable issue of bootstrap after the magic number change.

@jwatzman
Copy link
Author

Drat, I could have sworn I ran the testsuite myself anyways. Thanks for letting me know -- I'll take a look.

@jwatzman
Copy link
Author

jwatzman commented Apr 1, 2014

I took a quick look at this and sadly it's not as simple as a magic number bump (or at least if it is, it was non-obvious where that needed to happen :)). The backtrace code seems to be trying to read a bad value out of somewhere in the bytecode. I will take a closer look at it in the next few days, but it does look like a legitimate bug in my patch somewhere.

The need for a long list of -I directives makes interactively using
ocamldebug a pain in the butt. Many folks have solved this with various
`find` invocations or even Python wrappers, but those lead to other
problems when it might include files you weren't expecting (or miss
things you were). But all of this is really annoying since the tooling
should be able to figure out itself, even heuristically, where your
source files are -- gdb gets this right, why can't we?

This patch implements one of the more important heuristics from gdb: you
typically debug on the same machine you built on, so looking for the
source files and built artifacts in the absolute paths where they were
during compilation is a good first try. We write out absolute paths into
a new structure at the beginning of the debug section and then
automatically append those directories into the load path.

This means mean that if you happen to be debugging on a machine
where the original source and build artifacts are *not* available in
their original absolute locations, things will work as before, using the
standard load path mechanism. You can also explicitly use -I to prepend
directories to the load path and override the defaults located by this
new mechanism.

I personally find this makes using ocamldebug much more pleasant :)
@jwatzman
Copy link
Author

jwatzman commented Apr 1, 2014

Updated from jwatzman@395b924 to jwatzman@cc5bc17 -- I needed to ignore the new information in a couple of places, notably backtrace.c. (My original diff did this, but I incorrectly thought I didn't need to do so anymore once I moved the debug info after the other stuff. cc @hhugo in case this affects js_of_ocaml.)

The entire test suite now passes, except for tests/lib-threads/testsocket.ml which looks unrelated.

@hhugo
Copy link
Contributor

hhugo commented Apr 3, 2014

Indeed. It also affect js_of_ocaml. thanks

This was referenced Mar 14, 2019
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Feb 20, 2020
stedolan pushed a commit to stedolan/ocaml that referenced this pull request May 24, 2022
* Use effects and coeffects of primitives in middle_end

Currently only used when builtin=true.

* Remove the constraint on builtin=true

Use effect and coeffect of Primitive in Semantics_of_primitives
even for primitives that don't have builtin=true
DMaroo pushed a commit to DMaroo/ocaml that referenced this pull request Jan 10, 2024
Make integer conversions macros and update hyperfine result parsing script
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
import tutorials from ocaml.org and convert to mdx
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants