Add a way to get the dictionary index and values array reference #672

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
To work with the value of a dictionary array at a particular index, the following pattern shows up many times in the arrow-rs and datafusion codebase:

    fn do_something<K: ArrowDictionaryKeyType>(
        array: &ArrayRef,
        index: usize,
    ) -> Result<Self> {
        let dict_array = array.as_any().downcast_ref::<DictionaryArray<K>>().unwrap();

        // look up the index in the values dictionary
        let keys_col = dict_array.keys();
        let values_index = keys_col.value(index).to_usize().ok_or_else(|| {
            DataFusionError::Internal(format!(
                "Can not convert index to usize in dictionary of type creating group by value {:?}",
                keys_col.data_type()
            ))
        })?;

        // do actual work with dict_array.values and values_index
        // ...
    }

Repeating this "find the index" code is tedious.

Describe the solution you'd like

Add a function such as the following to DictionaryArray (would love suggestions about better names):

    impl<K: ArrowDictionaryKeyType> DictionaryArray<K> {
        // return the index into the dictionary values for array@index as well
        // as the dictionary values; the index is None if array@index is null
        #[inline]
        fn dict_value(
            &self,
            index: usize,
        ) -> Result<(&ArrayRef, Option<usize>)> {
            // look up the index in the values dictionary
            let keys_col = self.keys();
            if !keys_col.is_valid(index) {
                return Ok((self.values(), None));
            }
            let values_index = keys_col.value(index).to_usize().ok_or_else(|| {
                DataFusionError::Internal(format!(
                    "Can not convert index to usize in dictionary of type creating group by value {:?}",
                    keys_col.data_type()
                ))
            })?;

            Ok((self.values(), Some(values_index)))
        }
    }
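To make the intended semantics concrete without pulling in arrow, here is a minimal, self-contained sketch. `ToyDictionary` and `lookup` are hypothetical stand-ins invented for this illustration (plain `Vec`s instead of Arrow arrays, `String` instead of `DataFusionError`); they are not part of arrow-rs or datafusion. Only the shape of `dict_value` mirrors the proposal: a null key yields `None`, and a key that cannot be converted to `usize` yields an error.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a dictionary-encoded array: `keys[i]` is either
/// `None` (a null slot) or an index into `values`.
struct ToyDictionary {
    keys: Vec<Option<i8>>,
    values: Vec<String>,
}

impl ToyDictionary {
    /// Returns the dictionary values together with the index into them for
    /// row `index`. The index is `None` when the row is null.
    /// Panics if `index` is out of bounds for `keys` (plain Vec indexing).
    fn dict_value(&self, index: usize) -> Result<(&[String], Option<usize>), String> {
        let key = match self.keys[index] {
            None => return Ok((self.values.as_slice(), None)),
            Some(k) => k,
        };
        // A negative key cannot be used as an index, so report it as an error.
        let values_index = usize::try_from(key)
            .map_err(|_| format!("cannot convert dictionary key {} to usize", key))?;
        Ok((self.values.as_slice(), Some(values_index)))
    }
}

/// Hypothetical caller: resolves a row to its string value, or `None` for a null row.
fn lookup(dict: &ToyDictionary, row: usize) -> Result<Option<&str>, String> {
    let (values, values_index) = dict.dict_value(row)?;
    Ok(values_index.map(|i| values[i].as_str()))
}
```

With such a helper, each call site shrinks to one call plus the actual work, and the null check and key conversion live in one place.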

Describe alternatives you've considered

Additional context

Labels: enhancement