
IsNull filter does not seem to work correctly when a column contains only null values #7038

@xuchen-plus

Describe the bug

When an IsNull filter is applied to a DataFrame column that contains only null values, the filtered result is empty.

To Reproduce

I used the DataFusion Python bindings, version 27.0.0, for a simple example:

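>>> import datafusion
>>> datafusion.__version__
'27.0.0'
>>> import pandas as pd
>>> from datafusion import SessionContext
>>> from datafusion import functions as f
>>> # Column "a" holds a single null; column "b" holds a single integer.
>>> pandas_df = pd.DataFrame({"a": [None], "b": [1]})
>>> ctx = SessionContext()
>>> df = ctx.from_pandas(pandas_df)
>>> df.show()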

This prints the DataFrame, in which the value of column a is null:

>>> df.show()
DataFrame()
+---+---+
| a | b |
+---+---+
|   | 1 |
+---+---+

Now filter column a with is_null:

>>> df.filter(f.col("a").is_null()).show()

The result is empty:

DataFrame()
++
++

However, if the column also contains non-null values, the filtered result is correct:

>>> pandas_df = pd.DataFrame({"a": [None, 1], "b": [1, 2]})
>>> df = ctx.from_pandas(pandas_df)
>>> df.show()
DataFrame()
+-----+---+
| a   | b |
+-----+---+
|     | 1 |
| 1.0 | 2 |
+-----+---+
>>> df.filter(f.col("a").is_null()).show()
DataFrame()
+---+---+
| a | b |
+---+---+
|   | 1 |
+---+---+

Expected behavior

The filtered result should contain the rows whose value in column a is null.
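A small pure-Python reference model makes the expected semantics concrete: `col IS NULL` is evaluated row by row, so whether other rows contain non-null values should not change which rows are kept. The helper `filter_is_null` is hypothetical and only illustrates the semantics; it is not part of DataFusion.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def filter_is_null(rows: List[Row], column: str) -> List[Row]:
    """Keep every row whose value in `column` is None, mirroring SQL `column IS NULL`."""
    return [row for row in rows if row[column] is None]


only_nulls = [{"a": None, "b": 1}]
mixed = [{"a": None, "b": 1}, {"a": 1.0, "b": 2}]

# Both inputs should yield the same single row with the null in column "a".
print(filter_is_null(only_nulls, "a"))  # [{'a': None, 'b': 1}]
print(filter_is_null(mixed, "a"))       # [{'a': None, 'b': 1}]
```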

Additional context

No response

Labels: bug (Something isn't working)