COUNT and COUNT DISTINCT produce incorrect results for dictionary arrays with null values

### Describe the bug

DataFusion's COUNT and COUNT DISTINCT aggregate functions produce incorrect results when operating on dictionary arrays that contain null values. The functions appear to be counting dictionary keys rather than properly handling the null values referenced by those keys.

Two specific issues have been identified:

COUNT with dictionary arrays: When a dictionary array has keys that reference null values, COUNT incorrectly counts those null references as valid values instead of ignoring them.

COUNT DISTINCT with all-null dictionary arrays: When a dictionary array contains only keys that reference null values, COUNT DISTINCT should return 0 but may return incorrect results.




### To Reproduce

Issue 1 - COUNT with mixed null/non-null dictionary values:

```rust
use arrow::array::{DictionaryArray, Int32Array, StringArray};
use arrow::datatypes::Int32Type;
use std::sync::Arc;

// Create dictionary with values ["a", null, "c"]
let values = StringArray::from(vec![Some("a"), None, Some("c")]);
// Keys [0, 1, 2, 0, 1] reference: "a", null, "c", "a", null
let keys = Int32Array::from(vec![0, 1, 2, 0, 1]);
let dict_array = DictionaryArray::<Int32Type>::try_new(keys,
Arc::new(values))?;

// COUNT should return 3 (only non-null values: "a", "c", "a")
// But may incorrectly count the null references 
```
Issue 2 - COUNT DISTINCT with all-null dictionary values:

```rust
// Create dictionary where all keys reference null values
let dict_values = StringArray::from(vec![None, Some("abc")]);
let dict_indices = Int32Array::from(vec![0; 5]); // All keys point to
null
let dict_array = DictionaryArray::<Int32Type>::try_new(dict_indices,
Arc::new(dict_values))?;

// COUNT DISTINCT should return 0 since all referenced values are null   
```

### Expected behavior

COUNT: Should only count non-null values in dictionary arrays by properly dereferencing dictionary keys to their actual values and ignoring null references.

COUNT DISTINCT: Should return 0 when all dictionary keys reference null values, and should properly count only distinct non-null values when there's a mix of null and non-null references.

Both functions should handle dictionary arrays by:

Dereferencing dictionary keys to their actual values Applying null-checking logic to the dereferenced values, not the keys Following the same null-handling semantics as regular arrays


### Additional context

This issue affects the correctness of aggregate queries on dictionary-encoded columns, which are commonly used in analytical workloads for memory efficiency. The bug could lead to incorrect query results in production environments.

The issue is present in the core aggregation logic for both regular COUNT and COUNT DISTINCT operations when processing DictionaryArray inputs. The functions need to properly handle the indirection layer that dictionary encoding introduces.  

🔑 Root cause: Arrow’s current DictionaryArray::is_null implementation only inspects the index bitmap (i.e. it checks which keys are null) but does not consult the values buffer to see whether a key points at a null slot. As a result, any code (including DataFusion’s aggregates) that relies on is_null will mis-classify null references as non-null. [Arrow mentioned that they took the approach for performance reasons](https://github.com/apache/arrow-rs/pull/7608).


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

COUNT and COUNT DISTINCT produce incorrect results for dictionary arrays with null values #16339

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

COUNT and COUNT DISTINCT produce incorrect results for dictionary arrays with null values #16339

Description

Describe the bug

To Reproduce

Expected behavior

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions