Closed
Labels: bug (Something isn't working)
Description
Describe the bug
When an IsNull filter is applied to a DataFrame column that contains only null values, the filtered result is empty.
To Reproduce
The following minimal example uses the DataFusion Python bindings, version 27.0.0:
>>> import datafusion
>>> datafusion.__version__
'27.0.0'
import pandas as pd
from datafusion import SessionContext
from datafusion import functions as f
pandas_df = pd.DataFrame({"a": [None], "b": [1]})
ctx = SessionContext()
df = ctx.from_pandas(pandas_df)
df.show()
This prints the DataFrame, in which the value of column a is null (shown as an empty cell):
>>> df.show()
DataFrame()
+---+---+
| a | b |
+---+---+
| | 1 |
+---+---+
Now filter column a with is_null:
df.filter(f.col("a").is_null()).show()
The result is empty:
DataFrame()
++
++
However, if column a also contains non-null values, the filtered result is correct:
>>> pandas_df = pd.DataFrame({"a": [None, 1], "b": [1, 2]})
>>> df = ctx.from_pandas(pandas_df)
>>> df.show()
DataFrame()
+-----+---+
| a | b |
+-----+---+
| | 1 |
| 1.0 | 2 |
+-----+---+
>>> df.filter(f.col("a").is_null()).show()
DataFrame()
+---+---+
| a | b |
+---+---+
| | 1 |
+---+---+
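The two reproductions differ in more than row count. The `1.0` in the second output shows that pandas stored column `a` as floating point (with NaN for the missing value), whereas in the first reproduction the column holds only `None`, and pyarrow typically infers the Arrow `null` type for such a column. Whether this type difference is what triggers the bug is not confirmed here. The stdlib-only sketch below models that difference with a hypothetical `infer_column_type` helper; it is a simplification, not pyarrow's or DataFusion's actual inference logic.

```python
# Hypothetical, simplified model of column type inference (loosely
# mirroring Arrow's behavior): a column whose values are all None gets
# the special "null" type; otherwise the type of its non-null values is
# used. This is NOT pyarrow or DataFusion code.

def infer_column_type(values):
    """Return "null" for an all-None column, else the Python type name
    of the first non-null value (values are assumed homogeneous)."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return "null"
    return type(non_null[0]).__name__


# First reproduction: column "a" holds only None.
print(infer_column_type([None]))        # prints: null
# Second reproduction: pandas stored the values as float64; None stands
# in for the missing (NaN) value here.
print(infer_column_type([None, 1.0]))   # prints: float
```

In this model only the first reproduction yields the `null` type, which matches the one case where the filter misbehaves.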
Expected behavior
The filtered result should contain every row in which the value of column a is null, regardless of whether the column also contains non-null values.
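As a reference for the expected behavior, here is a minimal stdlib-only model of IS NULL filtering over column-oriented data. It assumes a table is a dict mapping column names to lists of values, with `None` standing in for SQL NULL. The helper `filter_is_null` is hypothetical and not part of DataFusion; it only illustrates that the result should not depend on the row count or on the column's type.

```python
# Hypothetical reference model of the expected IS NULL filter semantics.
# A "table" is a dict of column name -> list of values; None plays the
# role of SQL NULL. This is NOT DataFusion's implementation.

def filter_is_null(table, column):
    """Keep every row whose value in `column` is None."""
    # Boolean mask: True where the filtered column is null.
    mask = [v is None for v in table[column]]
    # Apply the same mask to every column so rows stay aligned.
    return {
        name: [v for v, keep in zip(values, mask) if keep]
        for name, values in table.items()
    }


# First reproduction: only null values in "a"; one row should be kept.
print(filter_is_null({"a": [None], "b": [1]}, "a"))
# Second reproduction: mixed values; only the row with b == 1 is kept.
print(filter_is_null({"a": [None, 1.0], "b": [1, 2]}, "a"))
```

Both calls return `{'a': [None], 'b': [1]}`, which is the result the issue expects from DataFusion in both cases.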