Skip to content

Conversation

@jinkachy
Copy link
Contributor

@jinkachy jinkachy commented Mar 16, 2025

Purpose of this pull request

This pull request fixes an issue with JDBC driver selection when dealing with heterogeneous data sources. In the current implementation, DriverManager may select the wrong driver when multiple compatible drivers are available in the classpath. This can cause problems when transferring data between similar databases (like Greenplum to PostgreSQL or MySQL 5 to MySQL 8), where driver "drift" can occur.

The underlying issue exists in the DriverManager's getConnection method implementation: it iterates through all registered drivers and uses the first one that successfully connects to the given URL. As shown in the Java source code, it doesn't prioritize drivers by class name but simply tries each registered driver in sequence:

for(DriverInfo aDriver : registeredDrivers) {
    if(isDriverAllowed(aDriver.driver, callerCL)) {
        try {
            Connection con = aDriver.driver.connect(url, info);
            if (con != null) {
                // Success!
                return (con);
            }
        } catch (SQLException ex) {
            if (reason == null) {
                reason = ex;
            }
        }
    }
}

This behavior means that if multiple drivers can handle the same URL format (e.g., both MySQL 5 and MySQL 8 drivers can handle "jdbc:mysql://..." URLs), the one loaded first will be used, regardless of which one is more appropriate for the specific database version being connected to.

The fix explicitly finds and uses the driver specified by class name before falling back to the default DriverManager behavior, ensuring that the correct driver is used for the connection.

Does this PR introduce any user-facing change?

This PR improves the reliability of JDBC connections when working with heterogeneous data sources, but doesn't change any user-facing API or configuration. Users may notice improved stability when transferring data between similar database systems that require different drivers.

How was this patch tested?

The implementation was tested with various database connections where driver drift could occur, particularly focusing on:

  • MySQL 5 to MySQL 8 connections
  • PostgreSQL to Greenplum connections
  • Multiple JDBC drivers in the classpath

The tests verified that the correct driver is selected when explicitly specified, even when multiple compatible drivers are available.

Check list

  • The main modification is an improvement to the org.apache.seatunnel.connectors.seatunnel.jdbc.catalog.AbstractJdbcCatalog#getConnection method, prioritizing the use of the specified driver class to establish connections and avoid driver drift issues
  • Related changes include ensuring the driverClass parameter is correctly passed to the getConnection method, allowing the specified driver priority to take effect

@liunaijie
Copy link
Member

@jinkachy thanks for your contribution, please also update the document to describe this feature, and add test case.

@jinkachy
Copy link
Contributor Author

Thank you for the review and feedback. I've updated the documentation to describe this feature and added a test case that verifies the driver selection mechanism. The test confirms that our implementation correctly prioritizes explicitly specified drivers regardless of their registration order in DriverManager. @liunaijie

Driver driver = drivers.nextElement();
if (StringUtils.equals(driver.getClass().getName(), driverClass)) {
try {
return driver.connect(url, info);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jinkachy jinkachy force-pushed the dev2 branch 3 times, most recently from 5b3b85b to 542b26b Compare March 18, 2025 13:36
Copy link
Member

@Hisoka-X Hisoka-X left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jinkachy . cc @hailin0

@hailin0 hailin0 merged commit a5aafa7 into apache:dev Mar 24, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants