Fix RowConverter roundtrip for List of Dictionary#8067
Closed
LiaCastaneda wants to merge 2 commits intoapache:mainfrom
Closed
Fix RowConverter roundtrip for List of Dictionary#8067LiaCastaneda wants to merge 2 commits intoapache:mainfrom
LiaCastaneda wants to merge 2 commits intoapache:mainfrom
Conversation
tustvold
reviewed
Aug 6, 2025
| arrow-buffer = { workspace = true } | ||
| arrow-data = { workspace = true } | ||
| arrow-schema = { workspace = true } | ||
| arrow-cast = { workspace = true} |
Contributor
There was a problem hiding this comment.
We ideally want to avoid doing this, as it adds a significant additional dependency
Contributor
|
I think the issue is that Datafusion is not handling the fact that row encoding "hydrates" dictionaries. It should be updated to understand that |
Contributor
|
See #7165 |
Author
|
Thanks @tustvold , then I understand the solution on @ding-young 's PR #7627 would be what we want? since I understand it keeps the flattened List |
Author
|
closing this one in favour of ding-young's PR |
alamb
added a commit
that referenced
this pull request
Aug 21, 2025
# Which issue does this PR close? - related to #7627 - Related to #4811 # Rationale for this change It was not clear to me what the expected behavior for round trip through row converter was for DictionaryArrays, so let's document what @tustvold says here: #8067 (comment) > I think the issue is that Datafusion is not handling the fact that row encoding "hydrates" dictionaries. It should be updated to understand that List<Dictionary<...>> will be converted to List<...>, much like it already handles this for the non-nested case. Converting back to a dictionary is expensive, and likely pointless, not to mention a breaking change. # What changes are included in this PR? Document expected behavior with english comments and doc test # Are these changes tested? Yes (doctests) # Are there any user-facing changes? More docs, no behavior change
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes apache/datafusion#17012
Rationale for this change
The rountrip
List<Dictionary<(),()>> -> Row -> List<Dictionary<(),()>>is failing:Expected infallible creation of GenericListArray from ArrayDataRef failed: InvalidArgumentError("[Large]ListArray's child datatype Utf8 does not correspond to the List's datatype Dictionary(Int8, Utf8)")What changes are included in this PR?
My approach is to encode back to Dictionary when we convert back the rows to Arrays, a straightforward way to do is is ussing
arrow_cast::castthat under the hood builds a dictionary from an Array.Are these changes tested?
Yes, I added a roundtrip test that verifies that the original Array is the same as the resulting one after the roundtrip
Are there any user-facing changes?
No