-
Notifications
You must be signed in to change notification settings - Fork 173
fix(rust/core): remove the Mutex around the FFI driver object #2736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
mbrobbel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it specified somewhere that all driver implementations must be thread-safe?
This comment explains the rationale:
arrow-adbc/rust/core/src/driver_manager.rs
Lines 93 to 102 in c6e1ab5
| // According to the ADBC specification, objects allow serialized access from | |
| // multiple threads: one thread may make a call, and once finished, another | |
| // thread may make a call. They do not allow concurrent access from multiple | |
| // threads. | |
| // | |
| // In order to implement this semantics, all FFI objects are wrapped into | |
| // `Mutex`. Hence, we need to deal with multiple locks at once, so care must | |
| // be taken to avoid deadlock and in particular we must avoid "lock inversion". | |
| // The general convention chosen here is to first acquire lock to the driver | |
| // and then acquire lock to the specific object under implementation. |
The driver classes: database, connection, statement, reader... are a whole different story. The driver itself is a static struct of function pointers that once initialized shouldn't be changing. Requiring a lock on the driver for every operation makes ADBC drivers incapable of any parallelism whatsoever. I have very simple SQL queries taking 3s just because I have other threads in the program running SQL queries. And this number could be much higher (minutes!) if I start running more queries that cost more in multiple threads. |
|
It seems we should update that comment, though. |
|
@lidavidm if you're OK with the comment update, please merge. |
|
LGTM, will merge after CI - thanks Felipe |
|
Should we specify that access to the private fields of the driver arrow-adbc/c/include/arrow-adbc/adbc.h Lines 989 to 996 in c6e1ab5
must be serialized by driver ( |
| #[derive(Clone)] | ||
| pub struct ManagedDriver { | ||
| inner: Arc<ManagedDriverInner>, | ||
| inner: Pin<Arc<ManagedDriverInner>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need Pin here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to ensure the FFI_AdbcDriver doesn't get moved around after a pointer is passed to the C side. Instead of wrapping FFI_Driver in a Box (would add another pointer indirection) I pinned this Arc instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think Pin helps here with that? ManagedDriverInner implements Unpin.
I agree. I checked some of the code and it uses these fields only for reference counts (which are atomic). It would be good to have a comment in |
|
There is only one copy. (The others are just synchronized via pre-commit) |
Fixes #2715