ARROW-10261: [Rust] [Breaking] Change List datatype to Box<Field>#8608
ARROW-10261: [Rust] [Breaking] Change List datatype to Box<Field>#8608nevi-me wants to merge 6 commits intoapache:masterfrom
Conversation
alamb
left a comment
There was a problem hiding this comment.
I reviewed all the changes in this PR and they look good to me -- they do what the PR says they do and the rationale for the change on https://issues.apache.org/jira/browse/ARROW-10261 makes sense too
One thought I had that might make the code slightly simpler would be to add a DataType::new_list() function, which would allow code like this:
DataType::List(Box::new(Field::new(
"array",
DataType::Boolean,
true,
))),
to be written like this
DataType::new_list(DataType::Boolean, true)
We can always add such a convenience function later, however. It is definitely not required in this PR
rust/arrow/src/ipc/convert.rs
Outdated
There was a problem hiding this comment.
I wonder if there is any reason to leave this commented out? As in why not remove the old version?
I have the same question for the other instances in this file
There was a problem hiding this comment.
Forgot about it(them), I was struggling with compiling due to lifetime issues, so I commented out those lines in case I needed to revert them. I've now removed them
jorgecarleitao
left a comment
There was a problem hiding this comment.
LGTM. Thanks a lot, @nevi-me !
This is part 1 of potentially 3 parts, it only changes the `arrow` crate `parquet` depends on `arrow`, and `datafusion` depends on both. `datafusion` will thus only compile after `parquet` is fixed.
69198da to
dd8178c
Compare
Yes @alamb, we can address it as a follow-up. There's more work that I also need to do on top of this, I've hardcoded "item" in a bunch of places, and on the Parquet side, there's some compatibility tests that I need to write. @jorgecarleitao happy that we merge this if/when tests pass, or would you like another reviewer to look at this? |
|
Ready to merge if the tests pass. I went through all the changes and they seem reasonable. Most issues I had were the same that @alamb suggested, and the rest seems straightforward given the changes on the DataType. I have a question related to why do we use a |
We need some indirection to break the potential infinite recursion error[E0072]: recursive type `datatypes::Field` has infinite size
--> arrow\src\datatypes.rs:188:1
|
188 | pub struct Field {
| ^^^^^^^^^^^^^^^^ recursive type has infinite size
189 | name: String,
190 | data_type: DataType,
| ------------------- recursive without indirection
|
= help: insert indirection (e.g., a `Box`, `Rc`, or `&`) at some point to make `datatypes::Field` representable |
This changes a list datatype to use
Box<Field>instead ofBox<DataType>.This change is needed in order to make both Parquet and IPC roundtrips pass.
The C++ implementation uses
Field, as it allows for preservice list field names and nullability.There are some implementation details which we did not cover in this PR, and will work on as a follow-up, and these are: