Skip to content

Conversation

@Pranav2612000
Copy link
Contributor

@Pranav2612000 Pranav2612000 commented Dec 30, 2025

According to golang/go#11100 (comment), the -Wl,-z,nodelete does not work on macos. This causes dlclose to be called when the shared library is dropped, which results in unexpected behaviour ( I've observed the program hangs because Go cannot kill the goroutines and so waits for them to close )

On linux, since the -Wl,-z,nodelete is respected, we do not see this issue.

This PR fixes the problem by ensuring that we explicitly open the library with RTLD_NODELETE flag which ensures that dlclose is not called during unload

Fixes #3840

@Pranav2612000
Copy link
Contributor Author

Thank you @savannahar68 for debugging the issue and helping with the fix

@Pranav2612000 Pranav2612000 force-pushed the fix/fix-hang-during-dlopen-mac branch from 5fdead7 to 4570524 Compare December 30, 2025 16:13
// By default, go builds the libraries with '-Wl -z nodelete' which does not
// unload the go runtime. This isn't respected on mac ( https://github.com/golang/go/issues/11100#issuecomment-932638093 )
// so we need to explicitly load the library with RTLD_NODELETE( which prevents unloading )
if cfg!(target_os = "macos") {
Copy link
Contributor

@eitsupi eitsupi Dec 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the cfg attribute #[cfg(...)] is better than cfg! here.
(In this case, cfg! case will compile on any OS.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Updated the code to use #[cfg(...)]

@Pranav2612000 Pranav2612000 force-pushed the fix/fix-hang-during-dlopen-mac branch from 4570524 to 0bfa253 Compare December 30, 2025 16:46
Comment on lines 121 to 123
#[cfg(target_os = "macos")]
use libloading::os::unix::Library;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(target_os = "macos")]
use libloading::os::unix::Library;

I think we can remove this and just qualify the identifier below.

Comment on lines 395 to 400
Library::open(
Some(filename.as_ref()),
RTLD_LAZY | RTLD_LOCAL | RTLD_NODELETE,
)
.map(Into::into)
.map_err(libloading_error_to_adbc_error)?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Library::open(
Some(filename.as_ref()),
RTLD_LAZY | RTLD_LOCAL | RTLD_NODELETE,
)
.map(Into::into)
.map_err(libloading_error_to_adbc_error)?
libloading::os::unix::Library::open(
Some(filename.as_ref()),
RTLD_LAZY | RTLD_LOCAL | RTLD_NODELETE,
)
.map_err(libloading_error_to_adbc_error)?
.into()

This feels a little simpler and more idiomatic.

Comment on lines 392 to 394
const RTLD_LAZY: i32 = 0x1;
const RTLD_LOCAL: i32 = 0x4;
const RTLD_NODELETE: i32 = 0x80;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could pull these from the libc crate, is it worth it though? cc @eitsupi

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Only those two, but not RTLD_NODELETE. I don't feel super stressed about hard-coding them here to avoid taking on libc as a dep.

@eitsupi
Copy link
Contributor

eitsupi commented Dec 30, 2025

I don't fully understand the nature of the problem, so this may be off the mark, but is this a problem specific to the Rust Driver Manager?
Is there no need to address this in implementations in other languages?

(Just to be clear, I'm not saying everything should be addressed in this PR, just a question.)

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think we should just std::mem::forget the library (bypassing its Drop implementation) and not dlclose anything at all - it's just not safe. (Or remove the platform-specific cfg and do this for all platforms.)

@Pranav2612000
Copy link
Contributor Author

Pranav2612000 commented Dec 31, 2025

I don't fully understand the nature of the problem, so this may be off the mark, but is this a problem specific to the Rust Driver Manager? Is there no need to address this in implementations in other languages?

(Just to be clear, I'm not saying everything should be addressed in this PR, just a question.)

I haven't tried with other languages, but yeah, we'll probably need to do the same for other languages as well. I'm thinking of getting the rust fix in first. Doing the same for other languages should be trivial.

@Pranav2612000
Copy link
Contributor Author

Actually, I think we should just std::mem::forget the library (bypassing its Drop implementation) and not dlclose anything at all - it's just not safe. (Or remove the platform-specific cfg and do this for all platforms.)

I've tested with std:mem::forget and it works too. I'm don't have a strong preference to any approach. Do you think we should use the drop approach? I'll modify the PR to do this.

@lidavidm
Copy link
Member

IIRC, we don't dlclose in the C++ driver manager, or C#. So Rust may be the only one affected.

forget does technically introduce a small leak - I think RTLD_NODELETE is preferable. Maybe (separately) we can try to get this flag defined upstream?

@Pranav2612000
Copy link
Contributor Author

Makes sense. I'll open a separate PR upstream soon to add support for RTLD_NODELETE.

Are there any other changes I should do to make this PR mergeable

@lidavidm
Copy link
Member

I think we can remove the #[cfg] and just have this behavior on all platforms (maybe it doesn't work on Windows?)

@Pranav2612000
Copy link
Contributor Author

I'm not sure about windows. Let me see the current implementation and try te replicate it.

@Pranav2612000
Copy link
Contributor Author

I think we can remove the #[cfg] and just have this behavior on all platforms (maybe it doesn't work on Windows?)

Yeah. Libloading does have separate definitions for windows and unix ( https://github.com/nagisa/rust_libloading/blob/dab97c569b33bd515e16637b8dedbdc696d9ec9c/src/safe.rs#L6 ).
While we get the changes merged upstream, I'm updating this PR to mirror this - cfg conditions on whether the os is windows or unix ( rather than on mac )

@Pranav2612000 Pranav2612000 force-pushed the fix/fix-hang-during-dlopen-mac branch from 0bfa253 to 252ccd3 Compare December 31, 2025 10:55
@Pranav2612000
Copy link
Contributor Author

Pushed. Can you folks take another look? Meanwhile, I'll work on getting the flag, and some other improvements merged upstream to libloading

};
#[cfg(not(unix))]
let library = unsafe {
libloading::Library::new(filename.as_ref()).map_err(libloading_error_to_adbc_error)?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linked GitHub issue notes there's a Windows equivalent flag; is that possible to provide to libloading?

On Windows, a similar GetModuleHandleEx(GET_MODULE_HANDLE_EX_FLAG_PIN, ...) is required for the same effect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a windows expert, but I see that this flag is not passed when we load the library. During loading, the LoadLibraryExW function is called ( Ref ).

The way to do it in windows is to call the GetModuleHandleExW function with this flag. libloading supports this using the pin function ( Ref ). Let's call this fn in the windows cfg, so that we have the same behaviour across all platforms?

Copy link
Contributor Author

@Pranav2612000 Pranav2612000 Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed this change for windows. Can you take a look? I'm also opening up a conversation upstream to try and get this changes merged into libloading Upstream discussion thread -> nagisa/rust_libloading#193

@lidavidm lidavidm changed the title feat(rust/driver_manager): prevent dlclose call on macos feat(rust/driver_manager): don't dlclose drivers Jan 5, 2026
@lidavidm
Copy link
Member

lidavidm commented Jan 5, 2026

It seems tests are failing?

---- test_connection_cancel stdout ----

thread 'test_connection_cancel' (13755) panicked at driver_manager/tests/driver_manager_sqlite.rs:32:90:
called `Result::unwrap()` on an `Err` value: Error { message: "Error with dynamic library: invalid mode parameter", status: Internal, vendor_code: 0, sqlstate: [0, 0, 0, 0, 0], details: None }

@Pranav2612000
Copy link
Contributor Author

Weird. I tested again on mac and linux and I don't see this error. In the CI, for a test, I also see the driver not found

---- test_database_implicit_uri stdout ----

thread 'test_database_implicit_uri' (13767) panicked at driver_manager/tests/driver_manager_sqlite.rs:62:6:
called `Result::unwrap()` on an `Err` value: Error { message: "Driver not found: adbc_driver_sqlite", status: NotFound, vendor_code: 0, sqlstate: [0, 0, 0, 0, 0], details: None }

Could it be an intermittent issue with the way the driver was built? I'm trying to reproduce this error, but meanwhile can you retry the pipeline once?

@Pranav2612000 Pranav2612000 force-pushed the fix/fix-hang-during-dlopen-mac branch 4 times, most recently from 475d31e to 137b2a4 Compare January 5, 2026 05:48
@Pranav2612000
Copy link
Contributor Author

@lidavidm The tests are passing now. Here's the changes I did

  • Modified the RTLD_NODELETE to be of type c_int instead of i32.
  • Update libloading to 0.8.8 from 0.8 ( 0.8 was pinned to version 0.8.0 which did not support the pin method for windows )

let library: libloading::Library = unsafe {
use std::os::raw::c_int;

const RTLD_NODELETE: c_int = 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears this is different on my system:

/usr/include/x86_64-linux-gnu/bits/dlfcn.h
41:#define RTLD_NODELETE	0x01000

It appears to be 0x80 on macOS. Where did 8 come from?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need some #[cfg] to define this, and fall back to 0xFFFFFFFF on Unix-like platforms besides linux (glibc) and macOS?

Copy link
Contributor Author

@Pranav2612000 Pranav2612000 Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. You're right. It's 0x80 for mac ( Ref -> https://github.com/realthunder/mac-headers/blob/88d4348d732fe95385faef950eb660626f9ff711/usr/include/dlfcn.h#L72 )

Maybe we need some #[cfg] to define this, and fall back to 0xFFFFFFFF on Unix-like platforms besides linux (glibc) and macOS?

Yeah. Doing this. Or do you think we should print error/warning and not 'OR' with RTLD_NODELETE for unknown OS ? Something like this ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not spam the console.

@Pranav2612000 Pranav2612000 force-pushed the fix/fix-hang-during-dlopen-mac branch from 950483e to eb5873f Compare January 5, 2026 07:53
)) {
0x1000
} else {
0xFFFF
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also wait, shouldn't this be 0 since we're OR-ing it? (Sorry, I messed this up)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. I also missed it. Fixed now.

@Pranav2612000 Pranav2612000 force-pushed the fix/fix-hang-during-dlopen-mac branch from eb5873f to d0d2c94 Compare January 5, 2026 07:56
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@lidavidm lidavidm changed the title feat(rust/driver_manager): don't dlclose drivers fix(rust/driver_manager): don't dlclose drivers Jan 6, 2026
@lidavidm lidavidm merged commit 5a88ac3 into apache:main Jan 6, 2026
9 checks passed
@Pranav2612000
Copy link
Contributor Author

Thank you for merging this @lidavidm .
Do you have a rough estimate on when the crate will be updated?

@lidavidm
Copy link
Member

lidavidm commented Jan 7, 2026

Hopefully this week (because you happened to coincide with the release plans)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rust driver_manager: initializing the driver creates some dangling resources which hangs application during shutdown

4 participants