DictionaryArray::is_null does not consider null values in the dictionary value array #7607

@kosiew

Describe the bug

DictionaryArray::is_null(i) currently returns false for entries where the dictionary index is valid, even if the value it points to in the dictionary value array is null. This results in incorrect behavior when dictionary-encoded arrays are used in higher-level operations like count(distinct ...), which rely on is_null to exclude nulls.

This violates expected null semantics: when the dictionary key is valid but the value it resolves to is null, the entry should still be treated as null.


To Reproduce

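// `Array` must be in scope for the `len` and `is_null` trait methods.
use arrow::array::{Array, ArrayRef, DictionaryArray, Int32Array, StringArray};
use std::sync::Arc;

fn main() {
    // Dictionary values: index 0 is null, index 1 is "abc".
    let dict_values = StringArray::from(vec![None, Some("abc")]);
    // Every key is valid, but all of them point at the null value (index 0).
    let dict_indices = Int32Array::from(vec![0, 0, 0, 0, 0]);
    let dict = DictionaryArray::new(dict_indices, Arc::new(dict_values) as ArrayRef);

    // As described above, each row reports `false` because only the key's validity is checked.
    for i in 0..dict.len() {
        println!("is_null({}) = {}", i, dict.is_null(i));
    }
}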
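
To make the distinction concrete without depending on the arrow crate, here is a minimal sketch of the two notions of nullness at play. `ToyDictionary`, `is_null_physical`, `is_null_logical` and `repro_shape` are hypothetical names invented for this illustration; they are not arrow APIs, and the model ignores validity bitmaps, key types and everything else a real DictionaryArray handles.

```rust
/// Simplified stand-in for a dictionary-encoded string array.
/// Hypothetical type for illustration only; not part of the arrow crate.
struct ToyDictionary {
    /// One key per row; `None` models a null key.
    keys: Vec<Option<usize>>,
    /// Dictionary values; `None` models a null entry in the values array.
    values: Vec<Option<&'static str>>,
}

impl ToyDictionary {
    /// Physical nullness: only the key's own validity is inspected.
    /// This mirrors the behavior the issue reports for `is_null`.
    fn is_null_physical(&self, i: usize) -> bool {
        self.keys[i].is_none()
    }

    /// Logical nullness: a row is null if its key is null, or if the
    /// dictionary value that the key points to is null.
    fn is_null_logical(&self, i: usize) -> bool {
        match self.keys[i] {
            None => true,
            Some(k) => self.values[k].is_none(),
        }
    }
}

/// Same shape as the reproduction: five rows, every key pointing at value 0, which is null.
fn repro_shape() -> ToyDictionary {
    ToyDictionary {
        keys: vec![Some(0); 5],
        values: vec![None, Some("abc")],
    }
}
```

In this model every row of `repro_shape()` is physically non-null but logically null, which is the gap the issue describes. Recent arrow-rs releases also expose `Array::logical_nulls`, which may be relevant for callers that need the second interpretation; the sketch above does not depend on it.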

Additional context

First raised here - apache/datafusion#16228

Labels: question (Further information is requested)