GH-34154: [Python] Add is_nan method to Array and Expression#34184
GH-34154: [Python] Add is_nan method to Array and Expression#34184jorisvandenbossche merged 11 commits intoapache:mainfrom
is_nan method to Array and Expression#34184Conversation
|
|
78ec952 to
36db249
Compare
|
Is this expected? 🔍 |
|
Looking at the tests in C++, I think it is expected: arrow/cpp/src/arrow/compute/kernels/scalar_validity_test.cc Lines 121 to 132 in 53752ad arrow/cpp/src/arrow/compute/kernels/scalar_validity_test.cc Lines 229 to 244 in 53752ad |
It's not super clear from the assert message, but so it is the "null" that is different (see my inline comment). As Alenka mentioned above, that is indeed expected (and tested on the C++ side) behaviour. In general, predicate functions (checking something, returning a boolean) propagate nulls (in the idea of "you don't know the value, so you also don't know if it is NaN or not, so whether it should be True or False). |
|
I learned something today. Thanks @AlenkaF for pointing that out. Updated the PR with also adding the option of returning |
|
Great, thanks for adding this! |
a56f3fb to
2229918
Compare
2229918 to
2793dd5
Compare
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Code looks good, added some comments on the docstrings.
You also still have a related doctest failure:
__________________ [doctest] pyarrow.lib.ChunkedArray.is_nan ___________________
331 ChunkedArray.is_nan(self)
332
333 Return boolean array indicating the nan values.
334
335 Examples
336 --------
337 >>> import pyarrow as pa
338 >>> import numpy as np
339 >>> arr = pa.chunked_array([[2, np.nan, 4], [4, None, 100]])
340 >>> arr.is_nan()
Differences (unified diff with -expected +actual):
@@ -1,12 +1,10 @@
-<pyarrow.lib.ChunkedArray object at ...>
+<pyarrow.lib.ChunkedArray object at 0x7f91d0ff3360>
[
[
false,
true,
- false
- ],
- [
false,
- None,
+ false,
+ null,
false
]
FYI you can check those locally as well: https://arrow.apache.org/docs/dev/developers/python.html#doctest
python/pyarrow/array.pxi
Outdated
|
|
||
| Returns | ||
| ------- | ||
| array : boolean Array, where null values are None |
There was a problem hiding this comment.
What does this "where null values are None" mean exactly?
There was a problem hiding this comment.
That a null value, will result in a Python None:
arrow/python/pyarrow/tests/test_compute.py
Lines 1639 to 1643 in df625f2
There was a problem hiding this comment.
Ah, OK, I didn't get that at first. Now, I don't think that is something we need to mention here in this docstring (that's true for all functions that return arrow data). And it's also only when converting to native Python objects that this happens, the actual result here is a pyarrow.array, and that doesn't use None (so the explanation would be relevant for a method like Array.to_pylist().
The reason you see it in the tests, is because we allow you to use None to specify a null value when constructing an array (but the None gets converted to a proper "null").
There was a problem hiding this comment.
@jorisvandenbossche Okay, that's fine with me as well. It was not directly obvious for me, but it makes sense now I'm aware of it.
Co-authored-by: Joris Van den Bossche <[email protected]>
is_nan expressionis_nan method to Array and Expression
|
Benchmark runs are scheduled for baseline = b679a96 and contender = 2ff4e3a. 2ff4e3a is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
is_nanfor Python #34154