Use generics in create_hashes#42
Conversation
| .with_precision_and_scale(20, 3) | ||
| .unwrap(); | ||
| let array_ref = Arc::new(array); | ||
| let array_ref: ArrayRef = Arc::new(array); |
There was a problem hiding this comment.
this is the major user facing change -- if you explicitly create an Arc ... something to create_hashes, before this PR the type inference algorithm would automatically determine ArrayRef. Now you need to explicitly declare that you want an ArrayRef
| &random_state, | ||
| hashes_buff, | ||
| )?; | ||
| let hashes = create_hashes([&left.columns()[0]], &random_state, hashes_buff)?; |
There was a problem hiding this comment.
it makes the callsites look much nicer
|
Wrong target repo ?! |
Hi @martin-g -- actually in this case I meant to make a PR in this repo (this is a PR to a branch in the pydantic fork -- so basically merging it will update the PR in the apache repo) |
|
OK! I noticed your comment at apache#18448 (comment) and thought that it might have been a mistake. |
| let mut hashes2 = vec![0; array.len()]; | ||
| create_hashes_from_arrays(&[array.as_ref()], &random_state, &mut hashes2) | ||
| .unwrap(); | ||
| create_hashes([array], &random_state, &mut hashes2).unwrap(); |
There was a problem hiding this comment.
I don't think this works. The whole point is that I want to call create_hashes from a function where I only have &dyn Array, which does not implement AsRef<dyn Array>.
As far as I know there's no way to convert an &dyn Array into AsRef<dyn Arary without some sort of wrapper type:
struct AsRefWrapper<'a>(&'a dyn Array);
impl<'a> AsRef<dyn Array> for AsRefWrapper<'a> {
fn as_ref(&self) -> &dyn Array {
self.0
}
}Or something like that. Which seems... a bit annoying to have to do at each call site.
I think I tried the generics method and ended up with two functions because of this limitation.
Do you know a better way around this?
There was a problem hiding this comment.
I see
I think we should at least add a test for the specific usecase. Here is what I have:
#[test]
fn test_create_hashes_from_dyn_arrays() {
let int_array: ArrayRef = Arc::new(Int32Array::from(vec![1, 2, 3, 4]));
let float_array: ArrayRef =
Arc::new(Float64Array::from(vec![1.0, 2.0, 3.0, 4.0]));
// Verify that we can call create_hashes with only &dyn Array
fn test(arr1: &dyn Array, arr2: &dyn Array) {
let random_state = RandomState::with_seeds(0, 0, 0, 0);
let hashes_buff = &mut vec![0; arr1.len()];
let hashes =
create_hashes([arr1, arr2], &random_state, hashes_buff).unwrap();
assert_eq!(hashes.len(), 4,);
}
test(&*int_array, &*float_array);
}Maybe we can do this with a special trait -- I'll play around with it
There was a problem hiding this comment.
I made a special trait for this -- I don't really know why it is necessary but it does seem to work 🤔
There was a problem hiding this comment.
"creative" could either be good or bad in this context 😆
There was a problem hiding this comment.
I think good! It's an internal implementation detail that users won't have to think about, but it allows us to make the improvements to this function for all callers without introducing a new API. It's good 👍🏻
| Arc::new(Float64Array::from(vec![1.0, 2.0, 3.0, 4.0])); | ||
|
|
||
| // Verify that we can call create_hashes with only &dyn Array | ||
| fn test(arr1: &dyn Array, arr2: &dyn Array) { |
There was a problem hiding this comment.
here is a test showing calling create_hashes with only &dyn Arrays
| // combine hashes with `combine_hashes` for all columns besides the first | ||
| let rehash = i >= 1; | ||
| hash_single_array(array, random_state, hashes_buffer, rehash)?; | ||
| pub trait AsDynArray { |
There was a problem hiding this comment.
i assume we can't implement this for T: AsRef<dyn Array>?
There was a problem hiding this comment.
The answer is no:
conflicting implementations of trait `AsDynArray` for type `&dyn arrow::array::Array`
upstream crates may add a new impl of trait `std::convert::AsRef<(dyn arrow::array::Array + 'static)>` for type `&dyn arrow::array::Array` in future versions
🤷🏻
|
Merged let's continue in apache#18448, thanks! |
This is a proposal for updating this PR
Merging this PR will update apache#18448
Basically the idea is to change the signature of the
create_hashesso it can take any iterator of things that can turn into a &dyn Arrayrather than having multiple functionsI think it makes create_hashes a little more complicated, but the callsites are much easier to understand
This is also mostly backwards compatibable -- the only code I needed to change was tests