ARROW-9285: [C++] Detect unauthorized memory allocations in function kernels#12089
ARROW-9285: [C++] Detect unauthorized memory allocations in function kernels#12089joosthooz wants to merge 13 commits intoapache:mainfrom
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Opening JIRAs ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename pull request title in the following format? or See also: |
|
@joosthooz I have following initial comments:
|
|
@joosthooz Need to change this PR title to: |
cpp/src/arrow/compute/exec.cc
Outdated
There was a problem hiding this comment.
I am curious if hashing all the values-buffers pointers would suffice instead of storing them in a set. We would still need to traverse nested data structures to capture all the child buffers. This could be solved using a Visitor.
There was a problem hiding this comment.
Hashing would be nicer, but it would also trigger the error if a kernel deletes a buffer it doesn't need anymore. And I don't think we care about that, just newly allocated ones. What do you think?
pitrou
left a comment
There was a problem hiding this comment.
Sorry for not posting a more detailed review (for now!), these are only two high-level comments.
cpp/src/arrow/compute/exec.cc
Outdated
There was a problem hiding this comment.
I'm not sure how we want to handle this feature at all, but we definitely don't want to run those checks unconditionally, unless they're extremely trivial. Perhaps this can be enabled only on debug builds? (#ifndef _NDEBUG, for example)
cc @lidavidm
There was a problem hiding this comment.
I think debug-only is fine as this will be most useful for developers.
There was a problem hiding this comment.
The checks are now inside #ifndef NDEBUG.
cpp/src/arrow/compute/api_scalar.cc
Outdated
There was a problem hiding this comment.
These should not be added in the public APIs. Instead, these functions should only exist in the corresponding test file, IMHO.
There was a problem hiding this comment.
I've removed them again, also from the function registry. I'm now using a different way of calling the function through an executor, instead of going through CallFunction
cpp/src/arrow/compute/exec.cc
Outdated
There was a problem hiding this comment.
Nit: to reduce the performance overhead, std::unordered_set<Buffer*> would be better.
cpp/src/arrow/compute/exec.cc
Outdated
There was a problem hiding this comment.
ExecutionError a Gandiva-specific error. Perhaps use Status::Invalid?
|
I think there's a problem with this approach: Besides, child buffers are never preallocated. So what happens is that you must check that preallocated buffers are not changed, but this doesn't prevent other buffers from being allocated as well. |
# Conflicts: # cpp/src/arrow/compute/kernels/CMakeLists.txt
|
Thank you for your contribution. Unfortunately, this pull request has been marked as stale because it has had no activity in the past 365 days. Please remove the stale label or comment below, or this PR will be closed in 14 days. Feel free to re-open this if it has been closed in error. If you do not have repository permissions to reopen the PR, please tag a maintainer. |
Targets https://issues.apache.org/jira/browse/ARROW-9285
If a kernel uses preallocated memory allocation (
MemAllocation::PREALLOCATE), it should not create new buffers. This code aims to check for this.Questions/TODO;
DCHECK_*, it is not possible to test whether a misbehaving kernel is detected. I added an example test with a misbehaving kernel, but for that to work the check should throw an Error that the test can then check for usingASSERT_RAISES_WITH_MESSAGEpreallocate_contiguous_, but there might be additional cases where this check could be useful? For example ifvalidity_preallocated_is true, we could check if there aren't any newly created validity buffers. But the currentAddBuffersToSet()code does not support this.(kernel_->mem_allocation == MemAllocation::PREALLOCATE)but I don't know if that is the right way to go.