Create empty buffer for a buffer specified in the C Data Interface with length zero#8009
Create empty buffer for a buffer specified in the C Data Interface with length zero#8009alamb merged 9 commits intoapache:mainfrom
Conversation
| Some(buf) => Ok(buf), | ||
| Some(buf) => { | ||
| // External libraries may use a dangling pointer for a buffer with length 0. | ||
| // We respect the array length specified in the C Data Interface. Actually, |
There was a problem hiding this comment.
I don't understand this comment. I guess in my mind "if the length is incorrect we can't make a good buffer" is "obvious" but maybe it is not for other readers
There was a problem hiding this comment.
I just wanted to say that if there is a question on if the length is zero, but the pointer isn't a dangling pointer. Previously we simply take any non-null pointer as valid one and didn't consider the length. We ignored the possibility of a dangling pointer.
| } | ||
|
|
||
| #[test] | ||
| #[cfg(not(feature = "force_validate"))] |
There was a problem hiding this comment.
I don't understand why this can't work with force_validate
I ran the test without this line and it seems to work fine:
diff --git a/arrow-array/src/ffi.rs b/arrow-array/src/ffi.rs
index 2ee2fd379e..5b05bcd81d 100644
--- a/arrow-array/src/ffi.rs
+++ b/arrow-array/src/ffi.rs
@@ -621,7 +621,6 @@ mod tests_to_then_from_ffi {
}
#[test]
- #[cfg(not(feature = "force_validate"))]
fn test_decimal_round_trip() -> Result<()> {
// create an array natively
let original_array = [Some(12345_i128), Some(-12345_i128), None]
@@ -1523,7 +1522,6 @@ mod tests_from_ffi {
}
#[test]
- #[cfg(not(feature = "force_validate"))]
fn test_empty_string_with_non_zero_offset() -> Result<()> {
use super::ImportedArrowArray;
use arrow_buffer::{MutableBuffer, OffsetBuffer};
@@ -1677,7 +1675,6 @@ mod tests_from_ffi {
}
#[test]
- #[cfg(not(feature = "force_validate"))]
fn test_utf8_view_ffi_from_dangling_pointer() {
let empty = GenericByteViewBuilder::<StringViewType>::new().finish();
let buffers = empty.data_buffers().to_vec();cargo test -p arrow-array --features=ffitest result: ok. 185 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 24.13s
There was a problem hiding this comment.
If force_validate is enabled, when we call GenericByteViewArray::new_unchecked to create a Utf8View array, even it is "unchecked" one, but force_validate will force to validate the views buffer. But in this test the ScalarBuffer is a invalid one (it uses an unaligned dangling pointer), so there will be a runtime error.
There was a problem hiding this comment.
I ran the test without this line and it seems to work fine:
cargo test -p arrow-array --features=ffi
Because cargo test -p arrow-array --features=ffi doesn't enable force_validate so it works. This new test only works when force_validate is not enabled.
There was a problem hiding this comment.
Because cargo test -p arrow-array --features=ffi doesn't enable force_validate so it works.
Right, so force_validate is not enabled
This new test only works when force_validate is not enabled.
Isn't that same? The test works when force_validate is not enabled 😕
There was a problem hiding this comment.
?
Yea, the test works when force_validate is not enabled. So if you run with cargo test -p arrow-array --features=ffi , it works.
There was a problem hiding this comment.
Hmm, did you want to ask why I added #[cfg(not(feature = "force_validate"))]? It is because our CI runs arrow-array tests with force_validate enabled. If I don't add this line, CI will run the test and fail. So I must add the line to pass CI.
Co-authored-by: Andrew Lamb <[email protected]>
|
Thanks again @viirya |
|
Thank you @alamb |
Which issue does this PR close?
Rationale for this change
The failure was described in the issue. In short, the buffer pointer of an empty buffer array exported from polars is a dangling pointer not aligned. But currently we take the raw pointer from C Data Interface and check its alignment before interpreting it as
ScalarBuffer. Thus it causes the failure case.What changes are included in this PR?
This patch changes FFI module to create an empty buffer for exported buffer with length zero. As we never dereference the dangling pointer, seems it's not necessary to require the alignment for it.
For non empty buffers, we still keep the alignment check.
Are these changes tested?
Added a unit test with necessary utility functions.
Are there any user-facing changes?
No