Open
Labels
question: Further information is requested
Description
Describe the bug
DictionaryArray::is_null(i) currently returns false whenever the key at index i is valid, even if that key points to a null entry in the dictionary's values array. As a result, higher-level operations that rely on is_null to exclude nulls, such as count(distinct ...), produce incorrect results on dictionary-encoded arrays.
This violates the expected null semantics: although the dictionary key is not null, the value it resolves to is, so the entry should still be treated as null.
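To make the distinction concrete, here is a minimal std-only sketch of the two possible readings of "null" for a dictionary-encoded column. It is a toy model, not arrow-rs code: `ToyDictionary`, `is_null_key_only`, `is_null_logical`, and `repro` are hypothetical names introduced only for illustration, and the key-only check is my assumption about the behavior this issue observes.

```rust
/// Toy stand-in for a dictionary-encoded string column.
/// This is NOT arrow-rs's DictionaryArray; all names here are hypothetical.
struct ToyDictionary {
    /// One optional key per row; `None` means the key slot itself is null.
    keys: Vec<Option<usize>>,
    /// Dictionary values; an entry may itself be null.
    values: Vec<Option<&'static str>>,
}

impl ToyDictionary {
    /// Key-level check: a row is null only when its key slot is null.
    /// (Assumed to correspond to the behavior reported in this issue.)
    fn is_null_key_only(&self, i: usize) -> bool {
        self.keys[i].is_none()
    }

    /// Logical check: a row is null when its key is null
    /// OR the dictionary value the key points to is null.
    fn is_null_logical(&self, i: usize) -> bool {
        match self.keys[i] {
            None => true,
            Some(k) => self.values[k].is_none(),
        }
    }
}

/// Mirrors the reproduction in this issue: five valid keys,
/// all pointing at dictionary entry 0, which is null.
fn repro() -> ToyDictionary {
    ToyDictionary {
        keys: vec![Some(0); 5],
        values: vec![None, Some("abc")],
    }
}
```

For the data returned by `repro()`, `is_null_key_only` reports every row as non-null, while `is_null_logical` reports every row as null. The second result is what an aggregate such as count(distinct ...) would need in order to skip these rows.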
To Reproduce
// `Array` is imported because `len` and `is_null` are used via the `Array` trait.
use arrow::array::{Array, ArrayRef, DictionaryArray, Int32Array, StringArray};
use std::sync::Arc;

fn main() {
    // Dictionary values: entry 0 is null, entry 1 is "abc".
    let dict_values = StringArray::from(vec![None, Some("abc")]);
    // All indices point to a null value.
    let dict_indices = Int32Array::from(vec![0, 0, 0, 0, 0]);
    let dict = DictionaryArray::new(dict_indices, Arc::new(dict_values) as ArrayRef);

    // Each key is valid, so is_null reports false for every index.
    for i in 0..dict.len() {
        println!("is_null({}) = {}", i, dict.is_null(i));
    }
}

Additional context
First raised here - apache/datafusion#16228