Split out arrow-array crate (#2594)#2769
Conversation
| } | ||
| } | ||
|
|
||
| impl<T> PyArrowConvert for T |
There was a problem hiding this comment.
This can't be implemented as it errors complaining that arrow_schema::DataType could be updated to implement Array + From<ArrayData> which would then cause a conflict with the impl PyArrowConvert for DataType.
Ultimately this impl is not hugely important, as it is just a case of using make_array and Array::data
|
marking as api-change due to removal of |
alamb
left a comment
There was a problem hiding this comment.
This is a pretty epic PR -- I went through it fairly carefully and it looks great to me
| /// assert_eq!(array.keys(), &Int8Array::from(vec![0, 0, 1, 2])); | ||
| /// assert_eq!(array.values(), &values); | ||
| /// ``` | ||
| pub type Int8DictionaryArray = DictionaryArray<Int8Type>; |
There was a problem hiding this comment.
these pub types are new, right?
There was a problem hiding this comment.
| use std::any::Any; | ||
|
|
||
| /// | ||
| /// # Example: Using `collect` |
There was a problem hiding this comment.
💯 for adding basic doc examples to these typedefs
| assert!(!as_decimal_array(&array).is_empty()); | ||
| let result_decimal = as_decimal_array(&array); | ||
| assert_eq!(result_decimal, &array); | ||
| } |
| use crate::builder::*; | ||
|
|
||
| #[test] | ||
| fn test_buffer_builder_availability() { |
There was a problem hiding this comment.
I wonder if this is the kind of thing that should be in in a tests type integration test to ensure that the types are pub and not pub(crate) for example
| fn schema(&self) -> SchemaRef; | ||
|
|
||
| /// Reads the next `RecordBatch`. | ||
| #[deprecated( |
There was a problem hiding this comment.
🎉 This is another breaking API change (nice cleanup
|
|
||
| // export | ||
| array.to_pyarrow(py) | ||
| array.data().to_pyarrow(py) |
There was a problem hiding this comment.
this seems a very reasonable change
(but is it also an API change?)
There was a problem hiding this comment.
Yes, sorry I made this change after I wrote the PR description
| fn schema(&self) -> SchemaRef; | ||
|
|
||
| /// Reads the next `RecordBatch`. | ||
| #[deprecated( |
There was a problem hiding this comment.
Oh, whoops -- maybe we should remove this deprecated API (perhaps as a follow on PR)
|
FWIW it occurs to me we probably need to update the github workflow triggers to reflect this new code organization: SHould probably include |
|
Running the benchmarks, some of the faster benchmarks do show the odd ~10% regression, but we're talking 10s of microseconds here. I'm inclined to think this is not an issue, and if it transpires to be so, we can revisit those kernels. |
|
Benchmark runs are scheduled for baseline = 6bee576 and contender = 06c204c. 06c204c is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
arrow-array cratearrow-array crate (#2594)
Draft as I wish to perform another pass, and double-check the benchmarksWhich issue does this PR close?
Part of #2594
Rationale for this change
Continues the process of splitting apart the crate, so that components can depend on just what they need, compilation parallelizes better, etc...
What changes are included in this PR?
Moves the array, array builders, and record batch definitions into a new arrow-array crate
Are there any user-facing changes?
The deprecated
RecordBatch::concatis removed, otherwise there are no breaking changes 🎉