Fix segfault on infinite recursion in some cases#9617
Conversation
src/libexpr/eval.cc
Outdated
|
|
||
| void EvalState::callFunction(Value & fun, size_t nrArgs, Value * * args, Value & vRes, const PosIdx pos) | ||
| { | ||
| if (depth > 10000) |
There was a problem hiding this comment.
Chosen arbitrarily.
There was a problem hiding this comment.
in a strong tradition 😆
Line 202 in 1e3d811
I don't think 10k stack levels is ever reasonable to do, so it seems fair. Magic number should perhaps be extracted to a constexpr though!
There was a problem hiding this comment.
What happens if we stack overflow before that?
There was a problem hiding this comment.
Status quo: segmentation fault on macOS because our segv handler is Linux x86 only, or so I've heard, and a less than helpful error message if our segv handler does actually work.
Fixing the handler is beyond the scope of this specific work.
There was a problem hiding this comment.
Apparently I'm doing something unreasonable(?), because I'm hitting 12k call depth in two expressions (both doing import-from-derivation). Luckily I can use max-call-depth = 20000 to fix it, but I have to distribute this setting to not leak this detail to users.
What does 10k mean when translated into memory usage? Or, what's the call depth at which the Nix evaluator would segfault at before? (I guess it varies depending on platform?) Although 10k sounds a lot, I have no idea where the segfault would typically hit. 100k? 1M?
src/libexpr/eval.cc
Outdated
|
|
||
| void EvalState::callFunction(Value & fun, size_t nrArgs, Value * * args, Value & vRes, const PosIdx pos) | ||
| { | ||
| if (depth > 10000) |
There was a problem hiding this comment.
in a strong tradition 😆
Line 202 in 1e3d811
I don't think 10k stack levels is ever reasonable to do, so it seems fair. Magic number should perhaps be extracted to a constexpr though!
fe7c5de to
9bb7992
Compare
thufschmitt
left a comment
There was a problem hiding this comment.
Thanks,
I'm not the right person to judge on the overall mechanism (it feels a bit brittle to me because of the arbitrary magic value, but it's also incredibly nice to have a stack trace – esp on macOS). I'll let others (@edolstra or @roberth in particular) have a more informed opinion here.
Other than that, I let a couple comments. The implementation is great, my main concern is that it doesn't have an impact on the overall performance.
src/libutil/error.cc
Outdated
| inline bool operator<(const AbstractPos& lhs, const AbstractPos& rhs) | ||
| { | ||
| if (lhs.line != rhs.line) | ||
| return lhs.line < rhs.line; | ||
| if (lhs.column != rhs.column) | ||
| return lhs.column < rhs.column; | ||
| return false; | ||
| } | ||
| inline bool operator> (const AbstractPos& lhs, const AbstractPos& rhs) { return rhs < lhs; } | ||
| inline bool operator<=(const AbstractPos& lhs, const AbstractPos& rhs) { return !(lhs > rhs); } | ||
| inline bool operator>=(const AbstractPos& lhs, const AbstractPos& rhs) { return !(lhs < rhs); } | ||
|
|
||
| inline bool operator<(const Trace& lhs, const Trace& rhs) | ||
| { | ||
| if (lhs.pos != rhs.pos) { | ||
| if (!lhs.pos) | ||
| return true; | ||
| if (!rhs.pos) | ||
| return false; | ||
| return *lhs.pos < *rhs.pos; | ||
| } | ||
|
|
||
| if (lhs.hint.str() != rhs.hint.str()) | ||
| return lhs.hint.str() < rhs.hint.str(); | ||
|
|
||
| if (lhs.frame != rhs.frame) | ||
| return lhs.frame < rhs.frame; | ||
|
|
||
| return false; | ||
| } | ||
| inline bool operator> (const Trace& lhs, const Trace& rhs) { return rhs < lhs; } | ||
| inline bool operator<=(const Trace& lhs, const Trace& rhs) { return !(lhs > rhs); } | ||
| inline bool operator>=(const Trace& lhs, const Trace& rhs) { return !(lhs < rhs); } |
There was a problem hiding this comment.
Unless you care about the exact semantics of the ordering, I think this could be replaced by a bit of nasty cpp:
| inline bool operator<(const AbstractPos& lhs, const AbstractPos& rhs) | |
| { | |
| if (lhs.line != rhs.line) | |
| return lhs.line < rhs.line; | |
| if (lhs.column != rhs.column) | |
| return lhs.column < rhs.column; | |
| return false; | |
| } | |
| inline bool operator> (const AbstractPos& lhs, const AbstractPos& rhs) { return rhs < lhs; } | |
| inline bool operator<=(const AbstractPos& lhs, const AbstractPos& rhs) { return !(lhs > rhs); } | |
| inline bool operator>=(const AbstractPos& lhs, const AbstractPos& rhs) { return !(lhs < rhs); } | |
| inline bool operator<(const Trace& lhs, const Trace& rhs) | |
| { | |
| if (lhs.pos != rhs.pos) { | |
| if (!lhs.pos) | |
| return true; | |
| if (!rhs.pos) | |
| return false; | |
| return *lhs.pos < *rhs.pos; | |
| } | |
| if (lhs.hint.str() != rhs.hint.str()) | |
| return lhs.hint.str() < rhs.hint.str(); | |
| if (lhs.frame != rhs.frame) | |
| return lhs.frame < rhs.frame; | |
| return false; | |
| } | |
| inline bool operator> (const Trace& lhs, const Trace& rhs) { return rhs < lhs; } | |
| inline bool operator<=(const Trace& lhs, const Trace& rhs) { return !(lhs > rhs); } | |
| inline bool operator>=(const Trace& lhs, const Trace& rhs) { return !(lhs < rhs); } | |
| GENERATE_CMP_EXT(inline, AbstraxtPos, me->line, me->column); | |
| GENERATE_CMP_EXT(inline, Trace, me->pos, me->hint.str(), me->frame); |
(where GENERATE_CMP_EXT comes from libutil/comparator.hh)
There was a problem hiding this comment.
std::tie seems like a great middle ground of having specified semantics, but less code.
There was a problem hiding this comment.
In the C++20 future, one might want to consider using the three-way comparison operator <=>. Its implementation allows the compiler to deduce the rest.
This will have multiple advantages:
- we can get rid of preprocessor macros
- strange semantics that are already implemented individually in the different operator implementation might surface in the light of a unified implementation of the
<=>op
There was a problem hiding this comment.
Interesting. Do you know what's blocking this? We do use c++2a. Should we track it somewhere?
There was a problem hiding this comment.
I don't know if anything blocks this at all. Maybe it's just that no one thought about using it, yet.
There was a problem hiding this comment.
one of the fields here is shared_ptr which by default has very undesirable comparison semantics (pointer address).
Well, if the only reason we need that comparison operator is putting stuff in a Set, pointer comparisons is quite reasonable (and as fast as it can get)
There was a problem hiding this comment.
Pointer comparisons are not reasonable, because each Trace is newly constructed. There's no interning. So it's quite common to have position values that are semantically equal (same line and column number) with different pointer values. I've tried this change and it breaks the tests.
That said, I can simplify these comparisons somewhat and I can replace the AbstractPos one with the default.
There was a problem hiding this comment.
I wonder how costly it would be to intern positions. We quite possibly should do that anyhow, but at a future date.
There was a problem hiding this comment.
Looks like LLVM only got <=> for tuples implemented last year: llvm/llvm-project#50396
EDIT: And std::string. Looks like we'll need clang 16 for this.
There was a problem hiding this comment.
I wonder how costly it would be to intern positions. We quite possibly should do that anyhow, but at a future date.
We do have PosTable.
This PR's use case is not in the hot path, so we don't need to explore this now I think.
src/libexpr/eval.cc
Outdated
|
|
||
| void EvalState::callFunction(Value & fun, size_t nrArgs, Value * * args, Value & vRes, const PosIdx pos) | ||
| { | ||
| if (depth > 10000) |
There was a problem hiding this comment.
What happens if we stack overflow before that?
src/libexpr/eval.cc
Outdated
| if (depth > 10000) | ||
| error("stack overflow").atPos(pos).template debugThrow<EvalError>(); | ||
| CallLevel _level(depth); |
There was a problem hiding this comment.
I think we'll want to benchmark this to make sure that it doesn't bear a noticeable evaluation performance impact.
My usual benchmark is a simple nix search nixpkgs blah --option eval-cache false, but maybe @tfc will have something more meaningful to suggest since he's been working heavily on the evaluator performance.
There was a problem hiding this comment.
Nix 2.18.1:
Executed in 13.19 secs fish external
usr time 11.33 secs 0.12 millis 11.33 secs
sys time 1.76 secs 1.38 millis 1.76 secs
This PR:
Executed in 13.57 secs fish external
usr time 12.36 secs 0.15 millis 12.35 secs
sys time 1.10 secs 2.19 millis 1.10 secs
3% slowdown. Seems OK for preventing a segfault.
There was a problem hiding this comment.
3% slowdown. Seems OK for preventing a segfault.
3% isn't really trivial. But I reliably couldn't reproduce it:
Summary
'nixMaster' ran
1.00 ± 0.01 times faster than 'nixFromPR'
So I think it's fine
There was a problem hiding this comment.
Oh, I was probably running an unoptimized build. Thanks for checking it on your end!
src/libutil/error.cc
Outdated
| } | ||
| } else { | ||
| oss << "\n" << ANSI_WARNING "(" << skippedTraces.size() << " duplicate traces omitted)" ANSI_NORMAL << "\n"; | ||
| tracesSeen.clear(); |
There was a problem hiding this comment.
(nit): Why do we care about that (and likely clearing skippedTraces below)?
There was a problem hiding this comment.
I've added a comment to explain this, but I'll copy it here:
Consider a mutually recursive stack trace with:
- 10 entries of A
- 10 entries of B
- 10 entries of A
If we don't clear tracesSeen here, we would print output like this:
- 1 entry of A
- (9 duplicate traces omitted)
- 1 entry of B
- (19 duplicate traces omitted)
This obscures the control flow, which went from A, to B, and back to A again.
In contrast, if we do clear tracesSeen, the output looks like this:
- 1 entry of A
- (9 duplicate traces omitted)
- 1 entry of B
- (9 duplicate traces omitted)
- 1 entry of A
- (9 duplicate traces omitted)
See: tests/functional/lang/eval-fail-mutual-recursion.nix for a test case exercising this property.
|
For a bigger picture please also read #9627 |
src/libutil/error.cc
Outdated
| return lhs.line < rhs.line; | ||
| if (lhs.column != rhs.column) | ||
| return lhs.column < rhs.column; | ||
| return false; |
There was a problem hiding this comment.
What other cases can occur here other than line and column comparison? This makes the code and semantics harder to understand if the code leaves open what else can lead to the false return.
src/libutil/error.cc
Outdated
| inline bool operator<(const AbstractPos& lhs, const AbstractPos& rhs) | ||
| { | ||
| if (lhs.line != rhs.line) | ||
| return lhs.line < rhs.line; | ||
| if (lhs.column != rhs.column) | ||
| return lhs.column < rhs.column; | ||
| return false; | ||
| } | ||
| inline bool operator> (const AbstractPos& lhs, const AbstractPos& rhs) { return rhs < lhs; } | ||
| inline bool operator<=(const AbstractPos& lhs, const AbstractPos& rhs) { return !(lhs > rhs); } | ||
| inline bool operator>=(const AbstractPos& lhs, const AbstractPos& rhs) { return !(lhs < rhs); } | ||
|
|
||
| inline bool operator<(const Trace& lhs, const Trace& rhs) | ||
| { | ||
| if (lhs.pos != rhs.pos) { | ||
| if (!lhs.pos) | ||
| return true; | ||
| if (!rhs.pos) | ||
| return false; | ||
| return *lhs.pos < *rhs.pos; | ||
| } | ||
|
|
||
| if (lhs.hint.str() != rhs.hint.str()) | ||
| return lhs.hint.str() < rhs.hint.str(); | ||
|
|
||
| if (lhs.frame != rhs.frame) | ||
| return lhs.frame < rhs.frame; | ||
|
|
||
| return false; | ||
| } | ||
| inline bool operator> (const Trace& lhs, const Trace& rhs) { return rhs < lhs; } | ||
| inline bool operator<=(const Trace& lhs, const Trace& rhs) { return !(lhs > rhs); } | ||
| inline bool operator>=(const Trace& lhs, const Trace& rhs) { return !(lhs < rhs); } |
There was a problem hiding this comment.
In the C++20 future, one might want to consider using the three-way comparison operator <=>. Its implementation allows the compiler to deduce the rest.
This will have multiple advantages:
- we can get rid of preprocessor macros
- strange semantics that are already implemented individually in the different operator implementation might surface in the light of a unified implementation of the
<=>op
a359356 to
4c114a2
Compare
4c114a2 to
852bd1d
Compare
roberth
left a comment
There was a problem hiding this comment.
A stack limit is a good thing to have, regardless of the other steps we can take (ie #9627 and related issues).
We do need a setting for the limit to make sure we don't regress. See comment.
Scoped out: a page about stack behavior in the manual. This is very needed, but it would increase the scope too much. It would be appreciated very much (and improve and shorten the release note fwiw).
tests/functional/lang/eval-fail-infinite-recursion-lambda.err.exp
Outdated
Show resolved
Hide resolved
|
|
||
| printSkippedTracesMaybe(); | ||
| oss << "\n" << prefix; | ||
| } |
There was a problem hiding this comment.
thought: If the algorithm could be factored out, it'd be easier to understand and improve.
thought: One possible improvement is to make it understand that clusters of repetition can themselves repeat. This happens when you're doing some iterative thing for each node in a tree for example. Probably there's an easier improvement that I'm not thinking of.
There was a problem hiding this comment.
Factored it out, although it should maybe be a class to avoid passing around so much context.
One possible improvement is to make it understand that clusters of repetition can themselves repeat.
Yeah, I considered this but also wasn't sure of an elegant implementation. I'll keep turning it over.
12fcf78 to
be44f2e
Compare
roberth
left a comment
There was a problem hiding this comment.
If you could rename EvalState::depth to EvalState::callDepth and review my suggestions for comments, I think this is good to go.
587417c to
8d39b17
Compare
This fixes a segfault on infinite function call recursion (rather than
infinite thunk recursion) by tracking the function call depth in
`EvalState`.
Additionally, to avoid printing extremely long stack traces, stack
frames are now deduplicated, with a `(19997 duplicate traces omitted)`
message. This should only really be triggered in infinite recursion
scenarios.
Before:
$ nix-instantiate --eval --expr '(x: x x) (x: x x)'
Segmentation fault: 11
After:
$ nix-instantiate --eval --expr '(x: x x) (x: x x)'
error: stack overflow
at «string»:1:14:
1| (x: x x) (x: x x)
| ^
$ nix-instantiate --eval --expr '(x: x x) (x: x x)' --show-trace
error:
… from call site
at «string»:1:1:
1| (x: x x) (x: x x)
| ^
… while calling anonymous lambda
at «string»:1:2:
1| (x: x x) (x: x x)
| ^
… from call site
at «string»:1:5:
1| (x: x x) (x: x x)
| ^
… while calling anonymous lambda
at «string»:1:11:
1| (x: x x) (x: x x)
| ^
… from call site
at «string»:1:14:
1| (x: x x) (x: x x)
| ^
(19997 duplicate traces omitted)
error: stack overflow
at «string»:1:14:
1| (x: x x) (x: x x)
| ^
6cd3513 to
7434cac
Compare
|
Thank you @9999years! |
Fix segfault on infinite recursion in some cases (cherry picked from commit bf1b294) Change-Id: Id137541426ec8536567835953fccf986a3aebf16
Context
Closes #9616.
This fixes a segfault on infinite function call recursion (rather than infinite thunk recursion) by tracking the function call depth in
EvalState.Additionally, to avoid printing extremely long stack traces, stack frames are now deduplicated, with a
(19997 duplicate traces omitted)message. This should only really be triggered in infinite recursion scenarios.Before:
After:
Future work
--option function-call-depth-limit 500.In some cases, this may print a message like(1 duplicate traces omitted), which is kind of unhelpful. It would be nice to only print that message if more than a couple traces are omitted.Priorities
Add 👍 to pull requests you find important.