Implement std::fmt::Write for StringViewBuilder to permit non contiguous data to be written #6373

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

DataFusion has an optimized version of concat(col1, ...) for StringView, added by @devanbenz in apache/datafusion#12224, that uses a StringViewArrayBuilder. That builder is similar to, but not the same as, StringViewBuilder in arrow: https://github.com/apache/datafusion/blob/9bc39a0522840ed90de2a4d23157de2e192cd00f/datafusion/functions/src/string/common.rs#L464-L536

The major differences are:

  1. You can call write to incrementally build up each string and then call append_offset to finish it. StringViewBuilder requires each input to be a single contiguous string passed to append_value
  2. You can avoid creating the null buffer in the builder and instead pass one in to finish
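To make the two differences concrete, here is a minimal, std-only sketch of that calling pattern. `ToyStringBuilder`, its fields, and `concat_rows_example` are hypothetical names invented for illustration; this is not the DataFusion or arrow-rs code, and the real builders produce Arrow arrays rather than a `Vec<Option<String>>`.

```rust
use std::fmt::{self, Write};

/// Hypothetical toy builder, for illustration only.
pub struct ToyStringBuilder {
    /// Bytes of all strings, stored back to back.
    data: String,
    /// offsets[i]..offsets[i + 1] is the byte range of string i.
    offsets: Vec<usize>,
}

impl ToyStringBuilder {
    pub fn new() -> Self {
        Self { data: String::new(), offsets: vec![0] }
    }

    /// Finish the current string: everything written since the
    /// previous call becomes one value.
    pub fn append_offset(&mut self) {
        self.offsets.push(self.data.len());
    }

    /// Consume the builder. The caller supplies validity (true = valid),
    /// so the builder never tracks nulls itself.
    pub fn finish(self, nulls: Option<Vec<bool>>) -> Vec<Option<String>> {
        let n = self.offsets.len() - 1;
        if let Some(v) = &nulls {
            assert_eq!(v.len(), n, "one validity flag per value");
        }
        (0..n)
            .map(|i| {
                let valid = nulls.as_ref().map_or(true, |v| v[i]);
                valid.then(|| self.data[self.offsets[i]..self.offsets[i + 1]].to_string())
            })
            .collect()
    }
}

impl Write for ToyStringBuilder {
    /// Append a fragment to the string currently being built.
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.data.push_str(s);
        Ok(())
    }
}

/// Builds three rows from non-contiguous pieces; the middle row is null.
pub fn concat_rows_example() -> Vec<Option<String>> {
    let mut b = ToyStringBuilder::new();
    write!(b, "foo").unwrap();
    write!(b, "bar").unwrap();
    b.append_offset(); // row 0: "foobar"
    b.append_offset(); // row 1: nothing written, marked null below
    write!(b, "{}-{}", "a", 1).unwrap();
    b.append_offset(); // row 2: "a-1"
    b.finish(Some(vec![true, false, true]))
}
```

In this sketch the validity flags are computed by the caller (for concat, typically from the input columns) and handed over once at the end, which is the point of difference 2.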

Describe the solution you'd like
I would like the APIs in arrow-rs to be sufficiently complete that we could use the arrow-rs versions rather than our own custom versions in DataFusion.

Describe alternatives you've considered
StringBuilder already allows this, for example:

use std::fmt::Write;
use arrow_array::builder::GenericStringBuilder;
let mut builder = GenericStringBuilder::<i32>::new();

// Write data in multiple `write!` calls
write!(builder, "foo").unwrap();
write!(builder, "bar").unwrap();
// The next call to append_value finishes the current string
// including all previously written strings.
builder.append_value("baz");

let array = builder.finish();
assert_eq!(array.value(0), "foobarbaz");

I think it would be cool to follow the same API for StringViewBuilder, though note that the API may need to be more complicated to ensure we don't get any performance regressions.
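One reason a view builder may be harder: in the Arrow view layout, a string of at most 12 bytes is stored inline in its 16-byte view, while a longer string goes into a data buffer and the view keeps a 4-byte prefix, a buffer index, and an offset. With incremental writes the final length is unknown until the value is finished. The std-only sketch below handles that by staging fragments in a scratch string. `ToyViewBuilder`, `finish_value`, and `view_example` are hypothetical names, and this is only one possible approach under those assumptions, not the arrow-rs implementation (which, among other things, manages multiple data buffers).

```rust
use std::fmt::{self, Write};

/// Strings up to this many bytes fit inline in a 16-byte view.
const MAX_INLINE_LEN: usize = 12;

/// Hypothetical toy view builder, for illustration only.
pub struct ToyViewBuilder {
    /// Fragments written so far for the value being built.
    in_progress: String,
    /// Single data buffer holding the bytes of long strings.
    buffer: Vec<u8>,
    /// One 16-byte view per finished value.
    views: Vec<u128>,
}

/// Read a little-endian u32 starting at byte `at` of a view.
fn read_u32(raw: &[u8; 16], at: usize) -> u32 {
    u32::from_le_bytes([raw[at], raw[at + 1], raw[at + 2], raw[at + 3]])
}

impl ToyViewBuilder {
    pub fn new() -> Self {
        Self { in_progress: String::new(), buffer: Vec::new(), views: Vec::new() }
    }

    /// Finish the current value and encode its view.
    pub fn finish_value(&mut self) {
        let bytes = self.in_progress.as_bytes();
        let len = bytes.len();
        let mut raw = [0u8; 16];
        raw[0..4].copy_from_slice(&(len as u32).to_le_bytes());
        if len <= MAX_INLINE_LEN {
            // Short string: data lives in the view itself, zero padded.
            raw[4..4 + len].copy_from_slice(bytes);
        } else {
            // Long string: copy into the buffer, keep prefix + location.
            let offset = self.buffer.len() as u32;
            self.buffer.extend_from_slice(bytes);
            raw[4..8].copy_from_slice(&bytes[..4]);
            raw[8..12].copy_from_slice(&0u32.to_le_bytes()); // buffer index 0
            raw[12..16].copy_from_slice(&offset.to_le_bytes());
        }
        self.views.push(u128::from_le_bytes(raw));
        self.in_progress.clear();
    }

    /// Decode value `i` back into an owned String.
    pub fn value(&self, i: usize) -> String {
        let raw = self.views[i].to_le_bytes();
        let len = read_u32(&raw, 0) as usize;
        let bytes = if len <= MAX_INLINE_LEN {
            &raw[4..4 + len]
        } else {
            let offset = read_u32(&raw, 12) as usize;
            &self.buffer[offset..offset + len]
        };
        String::from_utf8(bytes.to_vec()).expect("builder only stores valid UTF-8")
    }
}

impl Write for ToyViewBuilder {
    /// Stage a fragment; nothing is encoded until `finish_value`.
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.in_progress.push_str(s);
        Ok(())
    }
}

/// Returns (short value, long value, bytes used in the data buffer).
pub fn view_example() -> (String, String, usize) {
    let mut b = ToyViewBuilder::new();
    write!(b, "foo").unwrap();
    write!(b, "bar").unwrap();
    b.finish_value(); // 6 bytes: stored inline
    write!(b, "a longer ").unwrap();
    write!(b, "string value").unwrap();
    b.finish_value(); // 21 bytes: stored in the data buffer
    (b.value(0), b.value(1), b.buffer.len())
}
```

The scratch string costs an extra copy for long values. Writing directly into the data buffer and rolling back when the finished value turns out to be short would avoid that copy, at the price of more bookkeeping, which is the kind of trade-off the performance concern above points at.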

Additional context

  1. See Improve GenericStringBuilder documentation #6372 for better docs of what StringBuilder does
  2. Ability to append non contiguous strings to StringBuilder #6347 lists more background
