Skip to content

Win64 stack overflow detection#938

Merged
mshinwell merged 8 commits intoocaml:trunkfrom
oandrieu:win64-stack-overflow
May 9, 2017
Merged

Win64 stack overflow detection#938
mshinwell merged 8 commits intoocaml:trunkfrom
oandrieu:win64-stack-overflow

Conversation

@oandrieu
Copy link
Contributor

This branch enables stack overflow detection for native code on Windows 64. There are mainly two changes compared to the existing Win32 code:

  • use a different Win32 API to register the exception handler ; the previous one (SetUnhandledExceptionFilter) wasn't working anymore with recent MSVC CRT ;
  • adapt the context manipulation in the exception handler for 64 bits

I'm not too sure about the backtrace stashing mechanism: caml_last_return_address and caml_bottom_of_stack have not been updated so it seems the backtrace may be incorrect. But the Unix equivalent in signals_asm.c ought to have the same issue and it does not bypass the stashing …

$(OCAMLOPT) -w a -o $$F.native$(EXE) $$f; \
fi; \
done
$(if $(findstring win32,$(UNIX_OR_WIN32)),:, \
Copy link
Member

@dra27 dra27 Nov 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right - surely config/s-nt.h needs updating to define HAS_STACK_OVERFLOW_DETECTION for Win64?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, HAS_STACK_OVERFLOW_DETECTION enables the very Unix-specific SIGSEGV handling code that handles the stack overflow in signals_asm.c ; that just won't compile on Win32.

byterun/win32.c Outdated

void caml_win32_overflow_detection()
{
SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading, rather than trying it at this stage: is this functionally necessary?

@dra27
Copy link
Member

dra27 commented Nov 29, 2016

This is a great leap towards closing a very tedious piece of behaviour on Windows, thanks!

Just to expand on the problem: since Visual Studio 2005, the Microsoft CRT includes various calls (principally via abort) which clear the exception filter in order to guarantee that the runtime is torn down. MinGW in its default configuration is therefore unaffected because the core msvcrt included in Windows is based on .NET 2002/3's implementation. However, given its latest incarnation (the Universal CRT or UCRT) is now a Windows component again, that "stability" is not guaranteed to last and in fact I think mingw can already be configured to use other versions of the CRT if you wish.

I'm not (yet) 100% convinced about the equivalence of using a vectored exception handler instead of the unhandled filter - but I'm just putting that out there for now (the alternative is to hook SetUnhandledExceptionFilter and effectively disable the CRT from calling it)

@dra27
Copy link
Member

dra27 commented Nov 29, 2016

I haven't delved into the history enough - was supporting Windows '9x the original reason for being unable to detect stack overflow?

@oandrieu
Copy link
Contributor Author

I had a look at the sources of the CRT, trying to find out why SetUnhandledExceptionFilter isn't able to catch the exception — no success.

was supporting Windows '9x the original reason for being unable to detect stack overflow?

er what do you mean ? Stack overflow detection was working, just only on 32-bits and not with the recent MSVC CRT that prevent SetUnhandledExceptionFilter from working.
I simply removed the Win9x code because it's obsolete now, OCaml requires at least WinXP, right ?

@dra27
Copy link
Member

dra27 commented Nov 29, 2016

OK, looked at this in a bit more detail (including building it). I did not realise that 32 bit native stack overflow detection actually worked because as it happens on my machine it's masked by the crash dialog (and possibly the fact that the machine I'm sat at has a slightly old OCaml by default) and had always assumed it wasn't available on 32 or 64-bit and, more importantly, the test for it has always been skipped (I'm very familiar with differences in the testsuite between the platforms......)

I think that it should be refactored to use HAS_STACK_OVERFLOW_DETECTION - the fact that the test was skipped before was a bug and I think that the bug is that s-nt.h should be defining that, rather than extra tests being included in the Makefile itself. The code in asmrun should be refactored so that the signals bit is under a further #ifdef and the Windows stuff under another but both activated by HAS_STACK_OVERFLOW_DETECTION. @shindere may have an opinion on this too - basically, changes should be moving towards allowing a clear configure route for the Windows ports where the features are similarly-named.

Could the relevant section in the manual be updated in this GPR as well (I don't think there's a policy of separating PRs - @gasche, @Octachron?)

@gasche
Copy link
Member

gasche commented Nov 29, 2016

On the contrary, having documentation changes in the same PR is better. (It helps reviewer who can read the doc, and also writing the doc can clarify things and prompt changes to the implementation.)

@oandrieu
Copy link
Contributor Author

@dra27, OK waiting on some other opinion on this because I'm not too hot about the #ifdef twiddling.

@shindere
Copy link
Contributor

shindere commented Nov 29, 2016 via email

@shindere
Copy link
Contributor

shindere commented Nov 29, 2016 via email

@dra27
Copy link
Member

dra27 commented Nov 29, 2016

@shindere - it could be made formal inasmuch as the s.h generated by configure could be converted into a Makefile include as well (or one could be generated at the same time), or potentially stack overflow detection should be included in the output of ocamlc -config... but that's possibly a separate matter? My main point is that the C side should be able to rely on the features detected in m.h and s.h being the same across platforms.

Given that only one (I think?) C file is affected, it doesn't seem too major a thing to need to do. @oandrieu - I'm not clear whether it's that you don't like the idea of the change or don't like the idea of making the change... if the latter, then I'm happy to push a commit showing it!

@oandrieu
Copy link
Contributor Author

oandrieu commented Nov 29, 2016

@shindere - the -4 - 1 corresponds to the Win64 calling convention: caller is supposed to allocate backing space on the stack for the 4 argument-passing registers ; the -1 is the space for the return address (unused here since we never return).

@dra27 - I don't quite see what you have in mind. HAS_STACK_OVERFLOW_DETECTION really means HAS_STACK_OVERFLOW_DETECTION_USING_UNIX_SIGSEGV_HANDLER. This PR is indeed the same feature, but implemented with completely different C code, in a different file. #define-ing it in s-nt.h will just pollute signals_asm.c which will have to exclude _WIN32 from all HAS_STACK_OVERFLOW sections.

@shindere
Copy link
Contributor

shindere commented Dec 3, 2016 via email

@shindere
Copy link
Contributor

shindere commented Dec 3, 2016 via email

@xavierleroy
Copy link
Contributor

I'm not enough of a Win32 guru to review this code, and I'm not sure we'll find enough experts who can. Still, my feeling is that the new code is probably better than the old one, even if not perfect, so we don't have anything to lose by using the new code.

@dra27
Copy link
Member

dra27 commented Dec 6, 2016

I agree that AddVectoredExceptionHandler is a better alternative: I think that the culprit is gs_report.c in the CRT and that this comes about as part of the security features added in VS 2005 (related to the /GS option which my understanding is that release CRTs are compiled with). I couldn't quite get the hacky workaround which is meant to disable it completely to work in order to convince myself that was definitely the cause, annoyingly.

Having done more testing, I have three changes to propose:

  1. I cannot see the motivation for commit 23ab217 (disabling the AppCrash dialog) - why is this necessary? At the moment, I would prefer this commit to be removed.
  2. Either 3647699 or fab3b7a should make the following changes:
  • asmrun/signals_asm.c L308-310 (the call to caml_win32_overflow_detection should be moved to asmrun/startup.c below the call to the caml_init_signals - this code never had any place here given, as you point out, that the handling has nothing to do with signals.
  • The three places in asmrun/signals_asm.c where HAS_STACK_OVERFLOW_DETECTION is tested should be altered to #if defined(HAS_STACK_OVERFLOW_DETECTION) && !defined(_WIN32) (what you call pollution I call correct operation!).
  • #define HAS_STACK_OVERFLOW_DETECTION should be added to config/s-nt.h. I'm not convinced it's necessary to split this into two feature defines (so both Windows and relevant Unices define HAS_STACK_OVERFLOW_DETECTION and the Unices also define HAS_STACK_OVERFLOW_DETECTION_SIGNAL) but I am concerned at how long the broken behaviour went unnoticed because of not setting features correctly in the first place.
  1. Commit 84dc64e is then unnecessary

@xavierleroy, @shindere - what are your thoughts?

@oandrieu
Copy link
Contributor Author

oandrieu commented Dec 7, 2016

disabling the AppCrash dialog - why is this necessary?

Right, it's not. I just stumbled on the function in the documentation. I'm not sure what good is this crash dialog for an OCaml program, but yes it's unrelated to the whole stack overflow detection — I'll remove it.

the call to caml_win32_overflow_detection should be moved to asmrun/startup.c

OK, makes sense.

Commit 84dc64e is then unnecessary

Well the modification to the testcase is still useful to test that the system stack overflow detection has been properly re-set.

@dra27
Copy link
Member

dra27 commented Dec 7, 2016

You are completely right about the new addition to the test case - sorry, I forgot that that there was too! Please do keep that :)

@xavierleroy
Copy link
Contributor

Could we restart discussion and come to a decision for this PR? Are there any pending issues? If not I'm in favor of using this code.

@oandrieu
Copy link
Contributor Author

I've transposed the changes from commit 339bcbb. The stack probe is smaller (4K instead of 32K) but it still appropriately report a Stack_overflow on the testcase of PR#5064.

@xavierleroy
Copy link
Contributor

This is probably too late or too risky at this point for inclusion in 4.05, but I'm still in favor of merging in master soon, so that the code gets some exposure before ending in 4.06.

@mshinwell
Copy link
Contributor

Does this code operate correctly if the fault happens in a thread that is not the main thread?

@oandrieu
Copy link
Contributor Author

@mshinwell Yes I believe so. I found no indication that the protection mechanism is different in non-main threads and tests work as expected.

@mshinwell
Copy link
Contributor

I think this is ready for merge, except: it looks like @oandrieu needs to sign a CLA first.

@damiendoligez
Copy link
Member

I tend to agree with @dra27 on the #define business. We need two pieces of information:

  • the test suite needs to know whether stack overflow is detected, whatever the platform
  • signals_asm.c needs to know whether unix-style stack overflow detection is available.

Since signals_asm.c already has some #ifdef _WIN32 I don't think it's a big deal to add a condition on the #ifdef HAS_STACK_OVERFLOW_DETECTION. I'm not quite sure how that helps with the test suite, though.

@damiendoligez
Copy link
Member

@xavierleroy
Copy link
Contributor

@damiendoligez : @oandrieu sent me a signed CLA on March 15th, so everything is OK on the legal front. (My CLA notes that you might have consulted are not up to date, sorry about that, will update today.)

@mshinwell
Copy link
Contributor

@damiendoligez @dra27 Could you please come to a clear statement as to what you would like to see changed, if anything, in this pull request before it is merged?

@oandrieu Please rebase to resolve conflicts.

@oandrieu oandrieu force-pushed the win64-stack-overflow branch from 9f0df4b to 22abcc1 Compare May 5, 2017 13:27
@oandrieu
Copy link
Contributor Author

oandrieu commented May 5, 2017

@mshinwell done — not sure what the AppVeyor failure is about though.

@mshinwell
Copy link
Contributor

@oandrieu I think if you rebase again now it will be ok.

@oandrieu oandrieu force-pushed the win64-stack-overflow branch from 22abcc1 to 720989c Compare May 9, 2017 08:15
oandrieu added 8 commits May 9, 2017 10:16
for some reason, SetUnHandledExceptionFilter is overridden by the Visual
CRT, so it's not working on the MSVC port (it's working fine with mignw
though).

The AddVectoredExceptionHandler API seems a better fit for this (requires
WinXP)
also modify the test so that it triggers two stack overflows, to
make sure that the Windows stack has been properly re-set by our
exception handler
Win32 overflow detection hasn't much to do with signals.
This is an adaptation of SVN commit r12159:
,----
| PR#5064, PR#5485: try to ensure that 4K words of stack are available
| before calling into C functions, raising a Stack_overflow exception
| otherwise.  This reduces (but does not eliminate) the risk of
| segmentation faults due to stack overflow in C code.
`----
Except that the stack probe is only one page size ahead, so as not to
overstep the guard page marking the end of the stack.
@damiendoligez
Copy link
Member

About HAS_STACK_OVERFLOW_DETECTION, the code will probably have to be changed anyway when we get autoconf working on Windows, so I'm in favor of merging as-is.

@mshinwell
Copy link
Contributor

OK, I'm going to merge this then; I think there's value in having this in sooner rather than later so as to get more testing. We can always tweak it at a later date.

@mshinwell mshinwell merged commit eb00976 into ocaml:trunk May 9, 2017
@shindere
Copy link
Contributor

The MSVC32 port does not compile for the current trunk.
Bisecting shows that the first bad commmit is
eb00976, which is part of this PR:

boot/ocamlrun ./ocamlopt -g -nostdlib -I stdlib -I otherlibs/dynlink -o ocamlc.opt
compilerlibs/ocamlcommon.cmxa compilerlibs/ocamlbytecomp.cmxa driver/main.cmx -cclib "advapi32.lib ws2_32.lib"
** Cannot resolve symbols for stdlib\libasmrun.lib(win32.obj):
_caml_system__code_begin
_caml_system__code_end
File "caml_startup", line 1:
Error: Error during linking
Makefile:862: recipe for target 'ocamlc.opt' failed

@oandrieu: can you see where the porblem comes from?

Are you able to submit a fix for it?

If required I'll be happy to help out with testing on the CI machines.

Thanks!

@oandrieu
Copy link
Contributor Author

@shindere ah crap, sorry.
Yes, a fix is easy enough: oandrieu/ocaml@4ecd375
Should I submit a new PR ? or is there a way to amend/extend this one ?

@shindere
Copy link
Contributor

shindere commented May 12, 2017 via email

@oandrieu
Copy link
Contributor Author

@shindere OK, thanks. I've updated the commit message and rebased on trunk here: oandrieu/ocaml@1a111e1 . It would be great if you could arrange for CI to run on win32,win64 × msvc,mingw .

@shindere
Copy link
Contributor

shindere commented May 12, 2017 via email

@shindere
Copy link
Contributor

shindere commented May 13, 2017 via email

sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Feb 21, 2023
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023
stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023
b11eea1 flambda-backend: Introduce Import_info (ocaml#1036)
bc5b135 flambda-backend: Fix `ocamlobjinfo` on flambda2 .cmx files (ocaml#1029)
c8babbd flambda-backend: Compilation_unit optimisations (ocaml#1035)
e8d3e22 flambda-backend: Use 4.14.0 opam switch for building (includes upgrading ocamlformat to 0.24.1) (ocaml#1030)
eb14a86 flambda-backend: Port PR81 from ocaml-jst (ocaml#1024)
131bc12 flambda-backend: Merge ocaml-jst 2022-12-13 (ocaml#1022)
06c189a flambda-backend: Make stack allocation the default (ocaml#1013)
98debd5 flambda-backend: Initial support for value slots not of value kind (ocaml#946)
deb1714 flambda-backend: Add is_last flag to closinfo words (ocaml#938)
d07fce1 flambda-backend: Disable poll insertion in Configure (ocaml#967)
0f1ce0e flambda-backend: Regenerate ocaml/configure autoconf 2.69 (instead of 2.71) (ocaml#1012)
27132d8 flambda-backend: Fix for spurious typing error related to expanding through functor arguments (ocaml#997)
724fb68 flambda-backend: Use `Compilation_unit.t` instead of `Ident.t` for globals (ocaml#871)
396d5b8 flambda-backend: Add a test for frametable setup in natdynlinked libraries (ocaml#983)
b73ab12 flambda-backend: Fix invocation of `caml_shared_startup` in native dynlink (ocaml#980)
7c7d75a flambda-backend: Fix split_default_wrapper which did not trigger anymore with flambda2 (ocaml#970)
8fb75bd flambda-backend: Port ocaml#11727 and ocaml#11732 (ocaml#965)
fdb7987 flambda-backend: Fix include functor issue after 4.14 merge. (ocaml#948)
9745cdb flambda-backend: Print -dprofile/-dtimings output to stdout like 4.12 (ocaml#943)
5f51f21 flambda-backend: Merge pull request ocaml#932 from mshinwell/4.14-upgrade
841687d flambda-backend: Run make alldepend in ocaml/ (ocaml#936)
72a7658 flambda-backend: Remove reformatting changes only in dynlink/dune (preserving PR889 and adjusting to minimise diff)
6d758cd flambda-backend: Revert whitespace changes in dune files, to match upstream
c86bf6e flambda-backend: Remove duplicate tests for polling
971dbeb flambda-backend: Testsuite fixes
32f8356 flambda-backend: Topeval fix for symbols patch
befea01 flambda-backend: Compilation fixes / rectify merge faults
a84543f flambda-backend: Merge ocaml-jst
8e65056 flambda-backend: Merge ocaml-jst
4d70045 flambda-backend: Remove filename from system frametable (amd64) (ocaml#920)
5e57b7d flambda-backend: Bugfix for runtime frame_descr logic for C frames (ocaml#918)
6423d5e flambda-backend: Merge pull request ocaml#914 from mshinwell/merge-ocaml-jst-2022-10-24
ead605c flambda-backend: Add a missing Extract_exception (ocaml#916)
c8f1481 flambda-backend: Resolve conflicts and add specialise/specialised attributes to Builtin_attributes
cf4d0d3 flambda-backend: Merge fixes (ocaml#21)
c2f742f flambda-backend: Re-enable some tests for Flambda2 (ocaml#881)
3d38d13 flambda-backend: Long frames in frametable (ocaml#797)
85aec7b flambda-backend: Add loop attribute to Builtin_attributes
c0f16e3 flambda-backend: Compilation fixes
90dea23 flambda-backend: Merge flambda-backend/main
5acc6ea flambda-backend: Fixes after merge
e501946 flambda-backend: Merge ocaml-jst
115083b flambda-backend: Merge ocaml-jst
9943b2e flambda-backend: Revert "Revert "Transform tail-recursive functions into recursive continuations (ocaml#893)"" (ocaml#909)
ce339f1 flambda-backend: Fix alloc modes and call kinds for overapplications (ocaml#902)
e6a317c flambda-backend: Revert "Transform tail-recursive functions into recursive continuations (ocaml#893)"
853c488 flambda-backend: Transform tail-recursive functions into recursive continuations (ocaml#893)
5a977e4 flambda-backend: Fix missing End_region primitives on switch arms (ocaml#898)
7fa7f9d flambda-backend: Add missing dependencies to Dune files (ocaml#889)
3cd36f0 flambda-backend: Have Lambda `Pgetglobal` and `Psetglobal` take `Compilation_unit.t` (ocaml#896)
7565915 flambda-backend: [@poll error] attribute (ocaml#745)
9eb9448 flambda-backend: Backport the main safepoints PRs (ocaml#740)
689bdda flambda-backend: Add strict mode for ocamldep (ocaml#892)

git-subtree-dir: ocaml
git-subtree-split: b11eea1
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants