Skip to content

Conversation

@felipecrv
Copy link
Contributor

Fixes #2715

@felipecrv felipecrv requested a review from wjones127 as a code owner April 22, 2025 20:17
@github-actions github-actions bot added this to the ADBC Libraries 18 milestone Apr 22, 2025
Copy link
Member

@mbrobbel mbrobbel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it specified somewhere that all driver implementations must be thread-safe?

This comment explains the rationale:

// According to the ADBC specification, objects allow serialized access from
// multiple threads: one thread may make a call, and once finished, another
// thread may make a call. They do not allow concurrent access from multiple
// threads.
//
// In order to implement this semantics, all FFI objects are wrapped into
// `Mutex`. Hence, we need to deal with multiple locks at once, so care must
// be taken to avoid deadlock and in particular we must avoid "lock inversion".
// The general convention chosen here is to first acquire lock to the driver
// and then acquire lock to the specific object under implementation.

@felipecrv
Copy link
Contributor Author

felipecrv commented Apr 22, 2025

Is it specified somewhere that all driver implementations must be thread-safe?

This comment explains the rationale:

// According to the ADBC specification, objects allow serialized access from
// multiple threads: one thread may make a call, and once finished, another
// thread may make a call. They do not allow concurrent access from multiple
// threads.
//
// In order to implement this semantics, all FFI objects are wrapped into
// `Mutex`. Hence, we need to deal with multiple locks at once, so care must
// be taken to avoid deadlock and in particular we must avoid "lock inversion".
// The general convention chosen here is to first acquire lock to the driver
// and then acquire lock to the specific object under implementation.

The driver classes: database, connection, statement, reader... are a whole different story. The driver itself is a static struct of function pointers that once initialized shouldn't be changing.

Requiring a lock on the driver for every operation makes ADBC drivers incapable of any parallelism whatsoever.

I have very simple SQL queries taking 3s just because I have other threads in the program running SQL queries. And this number could be much higher (minutes!) if I start running more queries that cost more in multiple threads.

@felipecrv
Copy link
Contributor Author

#2715 (comment)

@lidavidm
Copy link
Member

It seems we should update that comment, though.

@felipecrv
Copy link
Contributor Author

@lidavidm if you're OK with the comment update, please merge.

@lidavidm
Copy link
Member

LGTM, will merge after CI - thanks Felipe

@lidavidm lidavidm merged commit 17f5514 into apache:main Apr 23, 2025
5 checks passed
@felipecrv felipecrv deleted the remove-big-driver-lock branch April 23, 2025 02:13
@mbrobbel
Copy link
Member

Should we specify that access to the private fields of the driver

/// \brief Opaque driver-defined state.
/// This field is NULL if the driver is unintialized/freed (but
/// it need not have a value even if the driver is initialized).
void* private_data;
/// \brief Opaque driver manager-defined state.
/// This field is NULL if the driver is unintialized/freed (but
/// it need not have a value even if the driver is initialized).
void* private_manager;

must be serialized by driver (private_data) and driver manager (private_manager) implementations?

#[derive(Clone)]
pub struct ManagedDriver {
inner: Arc<ManagedDriverInner>,
inner: Pin<Arc<ManagedDriverInner>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need Pin here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to ensure the FFI_AdbcDriver doesn't get moved around after a pointer is passed to the C side. Instead of wrapping FFI_Driver in a Box (would add another pointer indirection) I pinned this Arc instead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Pin helps here with that? ManagedDriverInner implements Unpin.

@felipecrv
Copy link
Contributor Author

Should we specify that access to the private fields of the driver

/// \brief Opaque driver-defined state.
/// This field is NULL if the driver is unintialized/freed (but
/// it need not have a value even if the driver is initialized).
void* private_data;
/// \brief Opaque driver manager-defined state.
/// This field is NULL if the driver is unintialized/freed (but
/// it need not have a value even if the driver is initialized).
void* private_manager;

must be serialized by driver (private_data) and driver manager (private_manager) implementations?

I agree. I checked some of the code and it uses these fields only for reference counts (which are atomic).

It would be good to have a comment in adbc.h calling the need for thread-safety on accessing these fields, but there are so many copies of adbc.h around...

@lidavidm
Copy link
Member

There is only one copy. (The others are just synchronized via pre-commit)

colin-rogers-dbt pushed a commit to dbt-labs/arrow-adbc that referenced this pull request Jun 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rust: Every ADBC operation acquires a lock on the driver preventing parallel statement execution

3 participants