Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
DataFusion has an optimized version of `concat(col1, ...)` for `StringView`, added by @devanbenz in apache/datafusion#12224. It uses a `StringViewArrayBuilder` that is similar to, but not the same as, arrow's `StringViewBuilder`: https://github.com/apache/datafusion/blob/9bc39a0522840ed90de2a4d23157de2e192cd00f/datafusion/functions/src/string/common.rs#L464-L536
The major differences are:
- You can call `write` to incrementally build up each string and then call `append_offset` to create each string. `StringViewBuilder` requires each input to be passed as a single contiguous string.
- You can avoid creating the null buffer while building and instead pass it in to `finish`.
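To make these two differences concrete, here is a minimal, pure-`std` model of a string-view builder that writes fragments directly into its data buffer. It is hypothetical: `ToyStringViewBuilder`, `write_str`, `append_offset` and the `finish(nulls)` signature are illustrative names loosely following the description above, not the DataFusion or arrow-rs code. The view encoding assumes the Arrow StringView layout (4-byte length, then up to 12 inline bytes, or a 4-byte prefix, buffer index and offset), and validity is modeled as a plain `Vec<bool>` rather than a `NullBuffer`.

```rust
/// Hypothetical, simplified string-view builder (not the DataFusion or arrow-rs
/// type). Each view is a little-endian u128: bytes 0..4 hold the length; if the
/// length is <= 12, bytes 4..16 hold the string inline; otherwise bytes 4..8 hold
/// a prefix, 8..12 the buffer index and 12..16 the offset into that buffer.
struct ToyStringViewBuilder {
    views: Vec<u128>,
    buffer: Vec<u8>,      // single data buffer (buffer index 0)
    pending_start: usize, // where the row currently being written starts
}

/// Finished array: views, data buffer, and an optional validity mask.
struct ToyStringViewArray {
    views: Vec<u128>,
    buffer: Vec<u8>,
    nulls: Option<Vec<bool>>, // `true` means the row is valid
}

fn read_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

impl ToyStringViewBuilder {
    fn new() -> Self {
        ToyStringViewBuilder { views: Vec::new(), buffer: Vec::new(), pending_start: 0 }
    }

    /// Appends a fragment to the row being built; the row is not finished yet.
    fn write_str(&mut self, s: &str) {
        self.buffer.extend_from_slice(s.as_bytes());
    }

    /// Finishes the current row from every fragment written since the last call.
    fn append_offset(&mut self) {
        let start = self.pending_start;
        let len = self.buffer.len() - start;
        let mut raw = [0u8; 16];
        raw[0..4].copy_from_slice(&(len as u32).to_le_bytes());
        if len <= 12 {
            // Short string: copy it into the view and reclaim the buffer space.
            raw[4..4 + len].copy_from_slice(&self.buffer[start..]);
            self.buffer.truncate(start);
        } else {
            // Long string: leave the bytes in place, record prefix and location.
            raw[4..8].copy_from_slice(&self.buffer[start..start + 4]);
            raw[8..12].copy_from_slice(&0u32.to_le_bytes());
            raw[12..16].copy_from_slice(&(start as u32).to_le_bytes());
        }
        self.views.push(u128::from_le_bytes(raw));
        self.pending_start = self.buffer.len();
    }

    /// Validity is supplied by the caller instead of being tracked per append.
    fn finish(self, nulls: Option<Vec<bool>>) -> ToyStringViewArray {
        if let Some(n) = &nulls {
            assert_eq!(n.len(), self.views.len(), "null mask length mismatch");
        }
        ToyStringViewArray { views: self.views, buffer: self.buffer, nulls }
    }
}

impl ToyStringViewArray {
    fn len(&self) -> usize {
        self.views.len()
    }

    fn is_valid(&self, i: usize) -> bool {
        self.nulls.as_ref().map_or(true, |n| n[i])
    }

    /// Decodes row `i`; returned owned because inline bytes live in the view.
    fn value(&self, i: usize) -> String {
        let raw = self.views[i].to_le_bytes();
        let len = read_u32(&raw[0..4]) as usize;
        let bytes: &[u8] = if len <= 12 {
            &raw[4..4 + len]
        } else {
            let offset = read_u32(&raw[12..16]) as usize; // buffer index is always 0 here
            &self.buffer[offset..offset + len]
        };
        String::from_utf8(bytes.to_vec()).expect("builder only stores UTF-8")
    }
}
```

In this sketch a null row is simply an empty row finished with `append_offset`, whose validity comes from the mask handed to `finish`. Strings of 12 bytes or fewer end up entirely inside the view, so their bytes are truncated from the data buffer; longer strings stay where they were written and the view records their offset.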
Describe the solution you'd like
I would like the APIs in arrow-rs to be sufficiently complete that we could use the arrow-rs versions rather than our own custom versions in DataFusion.
Describe alternatives you've considered
`StringBuilder` already allows this, for example:

```rust
use std::fmt::Write;
use arrow_array::builder::GenericStringBuilder;

fn main() {
    let mut builder = GenericStringBuilder::<i32>::new();
    // Write data in multiple `write!` calls
    write!(builder, "foo").unwrap();
    write!(builder, "bar").unwrap();
    // The next call to append_value finishes the current string
    // including all previously written strings.
    builder.append_value("baz");
    let array = builder.finish();
    assert_eq!(array.value(0), "foobarbaz");
}
```

I think it would be cool to try to follow the same API for `StringViewBuilder`, though note that the API may need to be more complicated to ensure we don't get any performance regressions.
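One straightforward way to get `StringBuilder`-style semantics on top of a builder that only accepts contiguous values is to stage `write!` output in a scratch `String`. The sketch below is a hypothetical adapter: `IncrementalAdapter` and `AppendContiguous` are made-up names, not arrow-rs APIs, and `Vec<String>` stands in for the real builder. It makes one cost visible: every row is copied once into the scratch buffer and again into the inner builder, which is a plausible source of the performance concern mentioned above.

```rust
use std::fmt::{self, Write};

/// Hypothetical trait standing in for any builder that only accepts
/// complete, contiguous values (like an `append_value(&str)` method).
trait AppendContiguous {
    fn append_contiguous(&mut self, value: &str);
}

/// Hypothetical adapter giving such a builder `StringBuilder`-like semantics:
/// `write!` calls accumulate into a scratch buffer, and `append_value`
/// finishes the current row including everything written so far.
struct IncrementalAdapter<B> {
    inner: B,
    scratch: String,
}

impl<B: AppendContiguous> IncrementalAdapter<B> {
    fn new(inner: B) -> Self {
        IncrementalAdapter { inner, scratch: String::new() }
    }

    fn append_value(&mut self, value: &str) {
        self.scratch.push_str(value);
        // Second copy: scratch buffer -> inner builder's storage.
        self.inner.append_contiguous(&self.scratch);
        // Keep the allocation so the next row can reuse it.
        self.scratch.clear();
    }

    fn into_inner(self) -> B {
        self.inner
    }
}

impl<B> Write for IncrementalAdapter<B> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        self.scratch.push_str(s);
        Ok(())
    }
}

/// Trivial stand-in builder used only to exercise the adapter.
impl AppendContiguous for Vec<String> {
    fn append_contiguous(&mut self, value: &str) {
        self.push(value.to_string());
    }
}
```

A builder that writes fragments straight into its own data buffer would avoid the scratch copy, at the cost of more bookkeeping (for string views, deciding per row whether the finished value is inlined or stored out of line).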
Additional context
- See Improve `GenericStringBuilder` documentation #6372 for better docs of what `StringBuilder` does.
- Ability to append non contiguous strings to `StringBuilder` #6347 lists more background.