[arrow-cast] Support cast from Numeric (Int, UInt, etc) to Utf8View #6719
tlm365 wants to merge 9 commits into apache:main
Signed-off-by: Tai Le Manh <[email protected]>
@Omega359 During the code review, I found that it is possible to implement cast support for all numeric values.
arrow-cast/src/cast/string.rs (outdated)
```rust
let nulls = array.nulls();
for i in 0..array.len() {
    match nulls.map(|x| x.is_null(i)).unwrap_or_default() {
        false => builder.append_value(formatter.value(i).try_to_string()?),
```
It would be more efficient to use the `std::fmt::Write` support, as for `StringArray` above.
As written, this allocates a new `String` for every value, which will be very expensive.
@tustvold Thanks so much for reviewing.
I get it now. Will try to implement it. TYSM ❤️
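To illustrate the reviewer's point, here is a std-only sketch of what "writing through `std::fmt::Write`" buys. `ToyStringBuilder`, `finish_row`, and `build` are hypothetical names, not the arrow-rs API. The sketch assumes, as with `GenericStringBuilder`, that `write_str` appends to the in-progress value and a separate call completes the row. Each integer is formatted straight into one shared buffer, so no temporary `String` is created per value (unlike `try_to_string()`, which returns an owned `String`).

```rust
use std::fmt::Write;

/// Hypothetical, simplified model of a string array builder (not the arrow-rs API):
/// all values share one buffer, and `offsets[i]..offsets[i + 1]` delimits row `i`.
struct ToyStringBuilder {
    values: String,
    offsets: Vec<usize>,
}

impl ToyStringBuilder {
    fn new() -> Self {
        Self { values: String::new(), offsets: vec![0] }
    }

    /// Completes the in-progress row (assumed analogue of calling
    /// `append_value("")` after writing into a `GenericStringBuilder`).
    fn finish_row(&mut self) {
        self.offsets.push(self.values.len());
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn value(&self, i: usize) -> &str {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }
}

impl Write for ToyStringBuilder {
    /// Appends to the in-progress row without completing it.
    fn write_str(&mut self, s: &str) -> std::fmt::Result {
        self.values.push_str(s);
        Ok(())
    }
}

/// Formats every value directly into the builder's shared buffer.
fn build(input: &[i64]) -> ToyStringBuilder {
    let mut builder = ToyStringBuilder::new();
    for v in input {
        write!(builder, "{v}").expect("our write_str never fails");
        builder.finish_row();
    }
    builder
}

fn main() {
    let builder = build(&[42, -7, 1_000]);
    assert_eq!(builder.len(), 3);
    assert_eq!(builder.value(1), "-7");
}
```

The design choice is that a row boundary is an explicit step, so one value may be assembled from any number of `write_str` calls.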
arrow-cast/src/cast/mod.rs (outdated)
```rust
(BinaryView, _) => Err(ArrowError::CastError(format!(
    "Casting from {from_type:?} to {to_type:?} not supported",
))),
(from_type, Utf8View) if from_type.is_primitive() => {
```
I believe this also fixes the Timestamp -> Utf8View issue. It would be good to add tests for temporal -> Utf8View casts to cover this case.
After reviewing the code, I realized that the Timestamp -> Utf8View cast is not supported yet.
The main issue is that the current implementation of `formatter.format.write` only applies to types that implement `DisplayIndex`, whereas the temporal data types are implemented based on `DisplayIndexState`.
I think this deserves a separate PR that handles temporal -> string view casting.
I'll file another PR today to cover the temporal -> Utf8View case unless someone beats me to it.
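The thread above separates two steps: the `(from_type, Utf8View) if from_type.is_primitive()` arm decides *routing*, while formatting the value is a separate step that may still fail for temporal types. A toy model of the routing step only, assuming (as the review comment implies) that `is_primitive()` is true for both numeric and temporal types. `ToyType` and `dispatch` are hypothetical, and non-primitive sources such as `Utf8` simply fall through to the error arm in this toy.

```rust
/// Hypothetical, heavily simplified stand-in for arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyType {
    Int32,
    Float64,
    Timestamp,
    Utf8,
    Utf8View,
}

impl ToyType {
    /// Assumption: "primitive" means numeric or temporal.
    fn is_primitive(&self) -> bool {
        matches!(self, ToyType::Int32 | ToyType::Float64 | ToyType::Timestamp)
    }
}

/// Names the branch that would handle `(from, to)`; models only the match guard.
fn dispatch(from: ToyType, to: ToyType) -> Result<&'static str, String> {
    match (from, to) {
        (f, ToyType::Utf8View) if f.is_primitive() => Ok("value_to_string_view"),
        _ => Err(format!("Casting from {from:?} to {to:?} not supported")),
    }
}

fn main() {
    for from in [ToyType::Int32, ToyType::Float64, ToyType::Timestamp, ToyType::Utf8] {
        println!("{from:?} -> Utf8View: {:?}", dispatch(from, ToyType::Utf8View));
    }
}
```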
```rust
/// ```
pub type StringViewBuilder = GenericByteViewBuilder<StringViewType>;

impl std::fmt::Write for StringViewBuilder {
```
This is also what #6373 contemplates (i.e. I think this PR fixes that ticket as well).
```rust
impl std::fmt::Write for StringViewBuilder {
    fn write_str(&mut self, s: &str) -> std::fmt::Result {
        self.append_value(s);
```
I was writing some tests for this, and it turns out this behaves differently from `StringBuilder`.
Specifically, calling `write_str` doesn't complete the row 🤔
I made a PR showing the problem: tlm365#1
I am working on a potential solution so we can unblock this PR
Here is a proposal for how to implement `std::fmt::Write`: #6777
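The problem raised in this thread is that one logical value can reach `write_str` in several pieces, so an implementation that completes a row on every `write_str` call splits a single value across rows. A std-only sketch, where `RowPerWriteBuilder` and `format_one_value` are hypothetical names that mirror the `self.append_value(s)` implementation under review:

```rust
use std::fmt::Write;

/// Hypothetical builder whose `write_str` completes a row on every call,
/// mirroring the `self.append_value(s)` implementation under review.
#[derive(Default)]
struct RowPerWriteBuilder {
    rows: Vec<String>,
}

impl Write for RowPerWriteBuilder {
    fn write_str(&mut self, s: &str) -> std::fmt::Result {
        // Every call is treated as a complete row.
        self.rows.push(s.to_string());
        Ok(())
    }
}

/// Writes ONE logical value ("a-b") through `write!`.
fn format_one_value() -> RowPerWriteBuilder {
    let (left, right) = ("a", "b");
    let mut builder = RowPerWriteBuilder::default();
    // In practice `write!` issues several `write_str` calls for one format
    // string: each argument and the literal "-" are written separately.
    write!(builder, "{left}-{right}").expect("write_str never fails here");
    builder
}

fn main() {
    let builder = format_one_value();
    // Intended: a single row "a-b". Here the value is split across rows.
    assert!(builder.rows.len() > 1);
    assert_eq!(builder.rows.concat(), "a-b");
    println!("{:?}", builder.rows);
}
```

This is why the `StringBuilder`-style contract (append on `write_str`, complete the row with a separate call) composes with `write!`, while a row-per-call contract does not.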
Add documentation examples for `StringViewBuilder::write_str`
```rust
pub(crate) fn value_to_string_view(
    array: &dyn Array,
    options: &CastOptions,
) -> Result<ArrayRef, ArrowError> {
    let mut builder = StringViewBuilder::with_capacity(array.len());
    let formatter = ArrayFormatter::try_new(array, &options.format_options)?;
    let nulls = array.nulls();
    for i in 0..array.len() {
        match nulls.map(|x| x.is_null(i)).unwrap_or_default() {
            true => builder.append_null(),
            false => formatter.value(i).write(&mut builder)?,
        }
    }
    Ok(Arc::new(builder.finish()))
}
```
I have been dreaming about this PR and how to unblock it.
I think it is currently stalled on figuring out how to handle `std::fmt::Write`.
Here is an alternate proposal:
- Change this implementation to avoid reallocating on each row, but still copy
- Remove the std::fmt::Write implementation (and we can sort that out in a different PR)
So the first point might look something like this:
```rust
pub(crate) fn value_to_string_view(
    array: &dyn Array,
    options: &CastOptions,
) -> Result<ArrayRef, ArrowError> {
    let mut builder = StringViewBuilder::with_capacity(array.len());
    let formatter = ArrayFormatter::try_new(array, &options.format_options)?;
    let nulls = array.nulls();
    // buffer to avoid reallocating on each value
    // TODO: replace with write to builder after https://github.com/apache/arrow-rs/issues/6373
    let mut buffer = String::new();
    for i in 0..array.len() {
        match nulls.map(|x| x.is_null(i)).unwrap_or_default() {
            true => builder.append_null(),
            false => {
                // write to buffer first and then copy into target array
                buffer.clear();
                formatter.value(i).write(&mut buffer)?;
                builder.append_value(&buffer)
            }
        }
    }
    Ok(Arc::new(builder.finish()))
}
```
This will work nicely (well, once it compiles) and would unblock downstream issues. 👍🏻
Let's call it a good team effort!
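The buffer-reuse loop suggested above (format into one `String`, `clear()` it between rows, then copy into the builder) can be modeled without arrow. `ToyViewBuilder` and `to_string_view` are hypothetical stand-ins: the toy stores every value as an (offset, length) pair into one shared buffer and does not model the inlining of short strings that real string views perform. The `nulls` slice uses `true` for a null slot purely to mirror the `is_null(i)` check, and `None` means "no nulls", as with the optional null buffer.

```rust
use std::fmt::Write;

/// Hypothetical, simplified view-style builder: each non-null row records an
/// (offset, len) pair into one shared data buffer; `None` marks a null row.
#[derive(Default)]
struct ToyViewBuilder {
    data: String,
    views: Vec<Option<(usize, usize)>>,
}

impl ToyViewBuilder {
    /// Copies `s` into the builder's own storage and completes the row.
    fn append_value(&mut self, s: &str) {
        self.views.push(Some((self.data.len(), s.len())));
        self.data.push_str(s);
    }

    fn append_null(&mut self) {
        self.views.push(None);
    }

    fn value(&self, i: usize) -> Option<&str> {
        self.views[i].map(|(off, len)| &self.data[off..off + len])
    }
}

/// Mirrors the suggested loop: one reused `String` buffer, copied per row.
fn to_string_view(values: &[i64], nulls: Option<&[bool]>) -> ToyViewBuilder {
    let mut builder = ToyViewBuilder::default();
    let mut buffer = String::new();
    for (i, v) in values.iter().enumerate() {
        match nulls.map(|n| n[i]).unwrap_or_default() {
            true => builder.append_null(),
            false => {
                buffer.clear(); // keeps capacity, so no per-row allocation
                write!(buffer, "{v}").expect("writing to a String cannot fail");
                builder.append_value(&buffer);
            }
        }
    }
    builder
}

fn main() {
    let values = [10, 0, -3];
    let nulls = [false, true, false];
    let out = to_string_view(&values, Some(&nulls[..]));
    assert_eq!(out.value(0), Some("10"));
    assert_eq!(out.value(1), None);
    assert_eq!(out.value(2), Some("-3"));
}
```

The trade-off matches the comment: the buffer is allocated once and reused, but each formatted value is still copied once into the builder.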
Which issue does this PR close?
Closes #6714.
Related to #6373.
Rationale for this change
Add support for casting from numeric types (Int/Float/Decimal) to string view (Utf8View).
What changes are included in this PR?
The cast logic and corresponding unit tests.
Are there any user-facing changes?
No.