Skip to content

BUG: Fix segfault due to out of bound pointer in floatstatus check#17751

Merged
charris merged 2 commits intonumpy:masterfrom
seberg:fix-ci-segfaults
Nov 10, 2020
Merged

BUG: Fix segfault due to out of bound pointer in floatstatus check#17751
charris merged 2 commits intonumpy:masterfrom
seberg:fix-ci-segfaults

Conversation

@seberg
Copy link
Copy Markdown
Member

@seberg seberg commented Nov 10, 2020

This should fix the CI segfaults tracked in #17542.

@seberg
Copy link
Copy Markdown
Member Author

seberg commented Nov 10, 2020

I am not quite sure whether the minimum/maximum which also calls this function needs similar treatment. It seems to me that loop is only used for reductions and thus is fine?

@charris charris added 09 - Backport-Candidate PRs tagged should be backported component: numpy._core labels Nov 10, 2020
@charris charris added this to the 1.19.5 release milestone Nov 10, 2020
@charris
Copy link
Copy Markdown
Member

charris commented Nov 10, 2020

Glad you are checking other uses, there were three in the simd file. Looks like the problematic commit was merged last February.

@charris
Copy link
Copy Markdown
Member

charris commented Nov 10, 2020

Looks like we are more likely to get more recent hardware on actions.

@seberg
Copy link
Copy Markdown
Member Author

seberg commented Nov 10, 2020

Yeah, that would explain why it is suddenly so common. I think the other things are fine, but most of our other code just uses dimensions, etc. (variables which are never modified) here. So if that is enough to avoid compiler reordering, (which it probably is due to the volatile flag?), then I should do the same here as it is simpler. E.g.:

npy_clear_floatstatus_barrier((char*)dimensions);

@charris
Copy link
Copy Markdown
Member

charris commented Nov 10, 2020

Does the actual value matter as long as it is valid memory? I vaguely recall the discussion about these barrier functions but have forgotten how they worked.

@seberg
Copy link
Copy Markdown
Member Author

seberg commented Nov 10, 2020

The value doesn't matter, the whole point is that it is types as volatile so that the process can't reorder the instructions/math. I doubt reordering is likely here anyway, since there is a loop, but I am not sure what reorder magic compilers are capable of.

@seberg
Copy link
Copy Markdown
Member Author

seberg commented Nov 10, 2020

But considering that, I am honestly not sure that a stack allocated value like this is a valid "barrier"...

@charris
Copy link
Copy Markdown
Member

charris commented Nov 10, 2020

ISTR that the only reason the variable was passed in was to make volatile work. If so, then it has no other effect.

@charris charris merged commit d6ecc97 into numpy:master Nov 10, 2020
@charris
Copy link
Copy Markdown
Member

charris commented Nov 10, 2020

Thanks Sebastian, lets see if this fixes things.

@seberg seberg deleted the fix-ci-segfaults branch November 10, 2020 23:45
@mattip
Copy link
Copy Markdown
Member

mattip commented Nov 11, 2020

FWIW, when I put this in I used the compiler explorer to test out what assembly was emitted by different constructs. I added the local variable reference to force the compiler not to reorder the barrier call. It didn't seem to matter what variable I used, as long as the compiler did not optimize it away.

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Nov 11, 2020
@charris charris removed this from the 1.19.5 release milestone Nov 11, 2020
@tjb900
Copy link
Copy Markdown

tjb900 commented Dec 10, 2020

Thanks very much for fixing this - it's actually been causing us a reasonable amount of grief. Given the simple and low-risk nature of the final change, just wondering if it's possible to reconsider it for inclusion in 1.19.5?

We've worked around most occurrences of this issue with some python-level code changes (e.g. using np.real(np.conj(x) * x)) but of course some uses of complex np.abs exist in other libraries which are still tripping us up.

Either way, thanks again!

@seberg
Copy link
Copy Markdown
Member Author

seberg commented Dec 10, 2020

@tjb900 it is already included in the 1.19.x branch, just waiting for the actual release.

@tjb900
Copy link
Copy Markdown

tjb900 commented Dec 11, 2020

Ah, of course - I misinterpreted the label changes. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants