Skip to content

ARROW-7364: [Rust] Specialize cast_options.safe casting for performance#2

Merged
seddonm1 merged 1 commit intoseddonm1:safe-castfrom
alamb:alamb/faster_cast
Apr 5, 2021
Merged

ARROW-7364: [Rust] Specialize cast_options.safe casting for performance#2
seddonm1 merged 1 commit intoseddonm1:safe-castfrom
alamb:alamb/faster_cast

Conversation

@alamb
Copy link

@alamb alamb commented Apr 5, 2021

@seddonm1 -- This is a proposed addition to your PR apache#9682, where we added support for CastOptions::safe

As part of that new code, we added several calls to collect() which means an extra copy of the data is made, as well as a check for CastOption::safe for each element.

Changes

This PR improves performance by refactoring the code to use the pre CastOptions codepath when CastOptions::safe is false, and the new codepath when CastOptions::safe is true

Benchmarks:

(I am working on them)

@github-actions
Copy link

github-actions bot commented Apr 5, 2021

@seddonm1
Copy link
Owner

seddonm1 commented Apr 5, 2021

@alamb Yes, this was my next approach. I didn't realise that rust treated closures as different branches which makes the reuse of code less possible. I think this is safe to merge.

@seddonm1 seddonm1 merged commit 9527cab into seddonm1:safe-cast Apr 5, 2021
@alamb alamb deleted the alamb/faster_cast branch April 5, 2021 21:41
@alamb
Copy link
Author

alamb commented Apr 5, 2021

. I didn't realise that rust treated closures as different branches which makes the reuse of code less possible.

The issue I found was that if two branches both return types that implement Iterator of the same item type (e.g. i32) it still won't compile unless the actual underlying iterator type is the same

seddonm1 pushed a commit that referenced this pull request Apr 7, 2021
From a deadlocked run...

```
#0  0x00007f8a5d48dccd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8a5d486f05 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f8a566e7e89 in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
apache#3  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
apache#4  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
apache#5  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
apache#6  0x00007f8a566e827d in arrow::internal::FnOnce<void ()>::FnImpl<arrow::Future<Aws::Utils::Outcome<Aws::S3::Model::ListObjectsV2Result, Aws::S3::S3Error> >::Callback<arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler> >::invoke() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
apache#7  0x00007f8a5650efa0 in arrow::FutureImpl::AddCallback(arrow::internal::FnOnce<void ()>) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
apache#8  0x00007f8a566e67a9 in arrow::fs::(anonymous namespace)::TreeWalker::ListObjectsV2Handler::SpawnListObjectsV2() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
apache#9  0x00007f8a566e723f in arrow::fs::(anonymous namespace)::TreeWalker::WalkChild(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int) () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
apache#10 0x00007f8a566e74b1 in arrow::fs::(anonymous namespace)::TreeWalker::DoWalk() () from /arrow/r/check/arrow.Rcheck/arrow/libs/arrow.so
```

The callback `ListObjectsV2Handler` is being called recursively and the mutex is non-reentrant thus deadlock.

To fix it I got rid of the mutex on `TreeWalker` by using `arrow::util::internal::TaskGroup` instead of manually tracking the #/status of in-flight requests.

Closes apache#9842 from westonpace/bugfix/arrow-12040

Lead-authored-by: Weston Pace <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants