Use Bulk Copy API for batch insert operation by peterbae · Pull Request #686 · microsoft/mssql-jdbc

peterbae · 2018-04-26T00:24:51Z

Improves the batch insert operation performance against Azure DW.

Fixes issue #331.

codecov-io · 2018-04-26T00:42:27Z

Codecov Report

Merging #686 into dev will increase coverage by 0.35%.
The diff coverage is 59.33%.

@@             Coverage Diff              @@
##                dev     #686      +/-   ##
============================================
+ Coverage     48.03%   48.39%   +0.35%     
- Complexity     2631     2741     +110     
============================================
  Files           118      120       +2     
  Lines         26753    27210     +457     
  Branches       4493     4589      +96     
============================================
+ Hits          12852    13167     +315     
- Misses        11773    11873     +100     
- Partials       2128     2170      +42

| Flag | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| #JDBC42 | `47.89% <59.33%> (+0.43%)` | `2693 <85> (+111)` | ⬆️ |
| #JDBC43 | `48.28% <59.33%> (+0.31%)` | `2736 <85> (+109)` | ⬆️ |

| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...om/microsoft/sqlserver/jdbc/SQLServerResource.java | `100% <ø> (ø)` | `4 <0> (ø)` | ⬇️ |
| .../microsoft/sqlserver/jdbc/SQLServerDataSource.java | `44.73% <0%> (-0.47%)` | `66 <0> (ø)` | |
| .../com/microsoft/sqlserver/jdbc/SQLServerDriver.java | `77.11% <100%> (+0.12%)` | `25 <0> (ø)` | ⬇️ |
| ...n/java/com/microsoft/sqlserver/jdbc/Parameter.java | `61.92% <100%> (-0.76%)` | `64 <1> (+1)` | |
| ...om/microsoft/sqlserver/jdbc/SQLServerBulkCopy.java | `54.61% <100%> (+2.06%)` | `261 <2> (+22)` | ⬆️ |
| ...oft/sqlserver/jdbc/SQLServerBulkCSVFileRecord.java | `46.34% <25%> (+0.41%)` | `28 <4> (-2)` | ⬇️ |
| ...sqlserver/jdbc/SQLServerBulkBatchInsertRecord.java | `42.28% <42.28%> (ø)` | `27 <27> (?)` | |
| .../microsoft/sqlserver/jdbc/SQLServerBulkCommon.java | `47.22% <47.22%> (ø)` | `5 <5> (?)` | |
| ...m/microsoft/sqlserver/jdbc/SQLServerStatement.java | `59.42% <70%> (-0.08%)` | `136 <3> (+1)` | |
| ...oft/sqlserver/jdbc/SQLServerPreparedStatement.java | `54.42% <72.26%> (+3.81%)` | `210 <39> (+50)` | ⬆️ |
... and 12 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 09d7967...096d78e.

rene-ye · 2018-05-02T19:31:19Z

+            case java.sql.Types.TIMESTAMP:
+            case microsoft.sql.Types.DATETIMEOFFSET:
+                // The precision is just a number long enough to hold all types of temporal data, doesn't need to be exact precision.
+                columnMetadata.put(positionInTable, new ColumnMetadata(colName, jdbcType, 50, scale, dateTimeFormatter));


Can we make the "50", which I'm assuming is the precision, a const so that changing it is easier when necessary?

The comment on line 217 explains why this is unnecessary.

Temporal data being < 50 precision might not be true forever. If it's an arbitrarily assigned number then it can be just as arbitrarily changed. And also because it's arbitrary, we might not know what number to look for if we do have a need to update it.

Precision here should be the one passed by the user. I'm assuming 50 is used as the precision based on the BulkCopy CSV implementation. It makes sense in the case of CSV, because temporal types can be stored in CSV using any of the supported string literal formats (https://docs.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql?view=sql-server-2017#supported-string-literal-formats-for-datetime), and we pass them as varchar/nvarchar to the server.

rene-ye · 2018-05-02T19:37:59Z

+
+    boolean isAzureDW() throws SQLServerException, SQLException {
+        if (null == isAzureDW) {
+            try (Statement stmt = this.createStatement(); ResultSet rs = stmt.executeQuery("SELECT CAST(SERVERPROPERTY('EngineEdition') as INT)");)


We don't seem to catch/handle any exceptions from this try statement. Is there a reason why we're using it here?

Normally, when we create a new Statement or ResultSet object, it's good practice to close it after we're done. However, the try-with-resources syntax (which I've used here) allows those objects to get closed automatically. You can read more on this page: https://blogs.oracle.com/weblogicserver/using-try-with-resources-with-jdbc-objects
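To illustrate the point about try-with-resources without a catch block, here is a minimal sketch. The `Tracked` class is a hypothetical stand-in for a Statement or ResultSet (it is not part of the driver); it only records when it is closed.

```java
import java.util.ArrayList;
import java.util.List;

final class TryWithResourcesSketch {
    /** Hypothetical stand-in for a Statement or ResultSet: records when it is closed. */
    static final class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    /** Opens two resources, uses them, and returns the order of events. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        // No catch or finally: both resources are still closed on exit,
        // in the reverse of the order in which they were declared.
        try (Tracked stmt = new Tracked("stmt", log);
             Tracked rs = new Tracked("rs", log)) {
            log.add("use rs");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [use rs, close rs, close stmt]
    }
}
```

Any exception thrown inside the block would still propagate to the caller; the statement only guarantees that `close()` runs first.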

rene-ye · 2018-05-02T19:42:12Z

+                // Base data type: int
+                final int ENGINE_EDITION_FOR_SQL_AZURE_DW = 6;
+                rs.next();
+                int engineEdition = rs.getInt(1);


Change block to
isAzureDW = rs.getInt(1) == ENGINE_EDITION_FOR_SQL_AZURE_DW
or even more concise but possibly a little confusing
return (rs.getInt(1) == ENGINE_EDITION_FOR_SQL_AZURE_DW)

I think it's fine as it is right now; I merely adopted this part of the code from our fx framework anyway.

We should at least remove
if (boolean condition) {
bool = true
} else {
bool = false
}
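The simplification asked for above might look like the following sketch. The constant and its value come from the diff; the class name and the method taking the edition as a parameter are assumptions made so the example stands alone.

```java
final class EngineEditionSketch {
    // Value taken from the diff above: EngineEdition 6 identifies SQL Azure DW.
    static final int ENGINE_EDITION_FOR_SQL_AZURE_DW = 6;

    /** Assigns the comparison result directly instead of branching to true/false. */
    static boolean isAzureDW(int engineEdition) {
        return engineEdition == ENGINE_EDITION_FOR_SQL_AZURE_DW;
    }

    public static void main(String[] args) {
        System.out.println(isAzureDW(6)); // true
        System.out.println(isAzureDW(5)); // false
    }
}
```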

rene-ye · 2018-05-02T19:45:35Z

+
+                String destinationTableName = tableName;
+                // Get destination metadata
+                try (SQLServerResultSet rs = ((SQLServerStatement) connection.createStatement())


try with resources, but we don't handle any exceptions.

see my comment above

rene-ye · 2018-05-02T19:56:32Z

+
+                // ignore all comments
+                if (localUserSQL.substring(0, 2).equalsIgnoreCase("/*")) {
+                    int temp = localUserSQL.indexOf("*/") + 2;


This logic is done quite a lot. Can we maybe make a private function that takes in a string and returns what we need. So the final result for all these if blocks would be something like
if(localUserSQL.substring(0, 2).equalsIgnoreCase("/*")) { return newPrivateFunction("*/"); }
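One possible shape for such a private helper is sketched below. The name `skipLeadingComments` is hypothetical and this is not the code that was merged; it handles `/* */` and `--` comments at the start of the string only and does not handle nested block comments.

```java
final class LeadingCommentSketch {
    /** Returns sql with leading whitespace and leading comments removed. */
    static String skipLeadingComments(String sql) {
        String s = sql.trim();
        while (true) {
            if (s.startsWith("/*")) {
                // Search from index 2 so that "/*/" is not treated as a closed comment.
                int end = s.indexOf("*/", 2);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated /* comment in: " + sql);
                }
                s = s.substring(end + 2).trim();
            } else if (s.startsWith("--")) {
                // A line comment runs to the end of the line (or of the string).
                int end = s.indexOf('\n');
                s = (end < 0) ? "" : s.substring(end + 1).trim();
            } else {
                return s;
            }
        }
    }

    public static void main(String[] args) {
        // Prints: INSERT INTO t VALUES (?)
        System.out.println(skipLeadingComments("/* a */ -- b\nINSERT INTO t VALUES (?)"));
    }
}
```

With a helper like this, each parsing routine can strip comments once up front instead of repeating the `indexOf("*/") + 2` arithmetic in every branch.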

rene-ye · 2018-05-02T20:04:42Z

+            return isInsert(temp.substring(index));
+        }
+        char c = temp.charAt(0);
+        if (c != 'i' && c != 'I')


We were burned before using this kind of logic in an attempt to parse SQL (in pstmt metadata caching), is this not breakable? What if I put a use statement before my insert, will it not flag my SQL as an insert statement?

If the user decides to put a use statement, it won't be flagged as an insert statement, and the old implementation for handling batch statements will get executed instead. This isInsert statement is currently only being used for queries that come through as an execute batch statement - we don't expect the user to run thousands of the same batch query with a use statement in front of it.

Are the following lines here to eliminate non-insert queries faster? Ideally, the SQL passed to this method is always an insert, so these lines can be removed?

char c = temp.charAt(0); if (c != 'i' && c != 'I') return false;

Insert isn't the only way to use batch - there are many ways we could be calling isInsert on a non-insert query. I don't really think this fast elimination is worth the space, though, so I'm going to remove it anyway.
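The fallback behaviour described in this thread can be illustrated with a simplified sketch. This is not the driver's `isInsert`: it omits comment handling and only checks whether the trimmed statement begins with the `insert` keyword, so a statement prefixed with `USE` is reported as a non-insert and would take the regular batch path.

```java
final class IsInsertSketch {
    private static final String KEYWORD = "insert";

    /** True only if the statement starts with the INSERT keyword (case-insensitive). */
    static boolean isInsert(String sql) {
        String s = sql.trim();
        if (!s.regionMatches(true, 0, KEYWORD, 0, KEYWORD.length())) {
            return false;
        }
        // Require whitespace after the keyword so identifiers such as
        // "inserted_rows" are not mistaken for an INSERT statement.
        return s.length() > KEYWORD.length()
                && Character.isWhitespace(s.charAt(KEYWORD.length()));
    }

    public static void main(String[] args) {
        System.out.println(isInsert("  INSERT INTO t VALUES (?)"));       // true
        System.out.println(isInsert("USE db; INSERT INTO t VALUES (?)")); // false
    }
}
```

Because the whole-keyword comparison already rejects anything that does not start with `i` or `I`, the separate first-character check discussed above adds no extra filtering.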

v-mabarw

I only gave this a quick skim through and have a couple of extra thoughts:

- Have we tested the performance impact on the normal execution path?
- Is this scenario also applicable to SQL Server Parallel Data Warehouse (PDW)?

v-mabarw · 2018-05-15T23:00:12Z

+
+    boolean isAzureDW() throws SQLServerException, SQLException {
+        if (null == isAzureDW) {
+            try (Statement stmt = this.createStatement(); ResultSet rs = stmt.executeQuery("SELECT CAST(SERVERPROPERTY('EngineEdition') as INT)");)


Is there any way to avoid an extra round trip to the server to get this info? For example, can we get any useful info from the TDS pre-login or login packets?

I don't think we can avoid making a call to get the table metadata, but as it turns out, we were making the exact same FMTONLY call in the SQLServerBulkCopy::getDestinationMetadata method (which is called as part of the bulk copy process shortly down the line). So instead of making the FMTONLY call to the database twice in the same fashion, the driver will now store the resultset from the first FMTONLY so it can be re-used in getDestinationMetadata. This means we won't be making any additional trips to the database compared to the way it was working before.
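The general idea of paying for a round trip once and reusing the result, which the `null == isAzureDW` check in the diff also relies on, can be sketched as follows. The `IntSupplier` is a hypothetical stand-in for the server query; the real driver code is not shown in this thread beyond the hunk above.

```java
import java.util.function.IntSupplier;

final class CachedEngineCheckSketch {
    private final IntSupplier engineEditionQuery; // hypothetical stand-in for the round trip
    private Boolean isAzureDW;                    // null until the first lookup

    CachedEngineCheckSketch(IntSupplier engineEditionQuery) {
        this.engineEditionQuery = engineEditionQuery;
    }

    /** Queries the "server" on the first call only; later calls reuse the cached answer. */
    boolean isAzureDW() {
        if (null == isAzureDW) {
            // 6 is the EngineEdition value for SQL Azure DW quoted in the diff.
            isAzureDW = (engineEditionQuery.getAsInt() == 6);
        }
        return isAzureDW;
    }

    public static void main(String[] args) {
        int[] roundTrips = {0};
        CachedEngineCheckSketch conn = new CachedEngineCheckSketch(() -> {
            roundTrips[0]++;
            return 6;
        });
        conn.isAzureDW();
        conn.isAzureDW();
        System.out.println(roundTrips[0]); // 1
    }
}
```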

v-mabarw · 2018-05-15T23:03:51Z

+        }
+
+        // It shouldn't come here. If we did, something is wrong.
+        throw new IllegalArgumentException("localUserSQL");


Why did you pick throwing an IllegalArgumentException (for this & the other parse routines)? It's probably better to have a descriptive error message instead.

I decided to throw an IllegalArgumentException because I thought it best described the symptom. It would be easy to throw an exception detailing at which index the parsing failed if the parsing actually failed at some point, but for these instances where the parsing fails due to getting to the end of the parsing method (which it shouldn't - this means the user's SQL query was incorrect somehow), all we know is that the user's argument was incorrect. I do agree that I could give a better description, though - I'll add that in.
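A more descriptive message might be built along these lines. The helper name and the wording of the message are assumptions for illustration, not what the driver ended up using.

```java
final class ParseErrorSketch {
    /**
     * Builds an exception that says what the parser was looking for
     * and which statement it gave up on.
     */
    static IllegalArgumentException parseFailure(String localUserSQL, String expected) {
        return new IllegalArgumentException(
                "Could not parse the batch statement: expected " + expected
                        + " before the end of \"" + localUserSQL + "\"");
    }

    public static void main(String[] args) {
        System.out.println(parseFailure("INSERT INTO t", "a VALUES clause").getMessage());
    }
}
```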

ulvii · 2018-06-26T00:34:04Z

        }
    }
+
+    public void testExecuteBatch1UseBulkCopyAPI() {


This applies to all the tests, don't you need to set useBulkCopyForBatchInsert property to true?

ulvii · 2018-06-26T01:07:08Z

+     */
+    @Test
+    @Tag("slow")
+    public void testStatementPoolingUseBulkCopyAPI() throws SQLException, NoSuchFieldException, SecurityException, 


Looks like the new tests are just duplicates of the existing tests with minor changes. Can't we refactor this?

ulvii · 2018-06-26T01:12:51Z

            fail(TestResource.getResource("R_executeBatchFailed") + ": " + e.getMessage());
        }
    }
+


This test is almost the exact copy of testExecuteBatch1(). Please refactor.
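One way to remove this kind of duplication is to move the common body into a single method that takes the connection property as a parameter, leaving two thin entry points. The sketch below only builds the connection string and returns a description, so that it runs without a database; the method names mirror the tests mentioned above, but the bodies are placeholders.

```java
final class SharedTestBodySketch {
    /** Appends the property quoted in this PR when bulk copy is requested. */
    static String connectionStringFor(String base, boolean useBulkCopy) {
        return useBulkCopy ? base + ";useBulkCopyForBatchInsert=true;" : base;
    }

    /** Shared body; a real test would open the connection and run the batch here. */
    static String executeBatchScenario(String base, boolean useBulkCopy) {
        return "executeBatch against " + connectionStringFor(base, useBulkCopy);
    }

    // The two tests differ only in the flag they pass to the shared body.
    static String testExecuteBatch1(String base) {
        return executeBatchScenario(base, false);
    }

    static String testExecuteBatch1UseBulkCopyAPI(String base) {
        return executeBatchScenario(base, true);
    }

    public static void main(String[] args) {
        System.out.println(testExecuteBatch1UseBulkCopyAPI("jdbc:sqlserver://localhost"));
    }
}
```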

ulvii · 2018-06-26T01:13:06Z

+        int retValue[] = {0, 0, 0};
+        int updCountLength = 0;
+        try {
+            String sPrepStmt = "update ctstable2 set PRICE=PRICE*20 where TYPE_ID=?";


sPrepStmt is never closed.

It's closed.

ulvii · 2018-06-26T01:14:04Z

+                retValue[i++] = rs.getInt(1);
+            }
+
+            pstmt1.close();


Better move this into a try-with-resources block

It's handled at the end of the test anyhoo.

The connection itself is available in the class, they're all static objects (don't know why), but why create a new one anywhere else in the class?

the modifyConnectionForBulkCopyAPI method modifies the connection object. If we don't create a new connection object (with the try block), the tests that come after this one will be affected by the change I made to the connection (since the only other connection object available is static). Therefore, to prevent that happening, I need to make a new connection object just to test my PR.

Also, the connection object is static because originally the test could just re-use the same connection object that doesn't need to change, but now we need to change our connection object accordingly.

It's handled at the end of the test anyhoo.

Not if executeQuery fails.

I think it'll be closed in terminateVariation() (this is what I meant by the end of the test).
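The concern about a failing `executeQuery` can be shown with a small sketch. `FakeStatement` is a hypothetical stand-in whose query always fails; the example says nothing about what `terminateVariation()` does later, only about what happens at the point of failure.

```java
final class CloseOnFailureSketch {
    /** Hypothetical statement whose query always fails. */
    static final class FakeStatement implements AutoCloseable {
        boolean closed;

        void executeQuery() {
            throw new IllegalStateException("query failed");
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Manual close: the close() call is skipped when executeQuery throws. */
    static boolean closedWithManualClose() {
        FakeStatement pstmt = new FakeStatement();
        try {
            pstmt.executeQuery();
            pstmt.close(); // never reached
        } catch (IllegalStateException expected) {
            // swallowed so the sketch can report the state afterwards
        }
        return pstmt.closed;
    }

    /** try-with-resources: close() runs before the catch block is entered. */
    static boolean closedWithTryWithResources() {
        FakeStatement pstmt = new FakeStatement();
        try (FakeStatement s = pstmt) {
            s.executeQuery();
        } catch (IllegalStateException expected) {
            // swallowed so the sketch can report the state afterwards
        }
        return pstmt.closed;
    }

    public static void main(String[] args) {
        System.out.println(closedWithManualClose());      // false
        System.out.println(closedWithTryWithResources()); // true
    }
}
```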

ulvii · 2018-06-26T01:14:28Z

+            }
+        }
+        catch (BatchUpdateException b) {
+            fail("BatchUpdateException :  Call to executeBatch is Failed!");


Call to executeBatch Failed!

ulvii · 2018-06-26T01:15:36Z

+            fail("BatchUpdateException :  Call to executeBatch is Failed!");
+        }
+        catch (SQLException sqle) {
+            fail("Call to executeBatch is Failed!");


These catch blocks make no sense.

I agree that they could handle exception better here, but this isn't my code. I'm just gonna fix it anyways though.

ulvii · 2018-06-26T01:17:16Z

            fail(TestResource.getResource("R_addBatchFailed") + ": " + e.getMessage());
        }
    }
+


The comments for testExecuteBatch1UseBulkCopyAPI() apply to this test too.

ulvii · 2018-06-26T01:18:39Z

+     * 
+     * @throws Exception
+     */
+    @Test


Duplicate of Repro47239large. Please refactor.

ulvii · 2018-06-26T01:18:52Z

+    @DisplayName("Regression test for using 'large' methods")
+    public void Repro47239largeUseBulkCopyAPI() throws Exception {
+
+        assumeTrue("JDBC42".equals(Utils.getConfiguredProperty("JDBC_Version")), TestResource.getResource("R_incompatJDBC"));


This can be removed. assumeTrue("JDBC42".equals(Utils.getConfiguredProperty("JDBC_Version")), TestResource.getResource("R_incompatJDBC"));

ulvii · 2018-06-26T01:20:20Z

+            error = "RAISERROR ('raiserror level 11',11,1) WITH LOG";
+            severe = "RAISERROR ('raiserror level 20',20,1) WITH LOG";
+        }
+        con.close();


Use try-with-resources

ulvii · 2018-06-26T01:21:06Z

+        }
+        catch (Exception ignored) {
+        }
+        stmt.close();


Try-with-resources.

ulvii · 2018-06-26T01:23:14Z

+        }
+
+        try {
+            stmt.executeLargeUpdate("drop table " + tableName);


Please use Utils.dropTableIfExists()

ulvii · 2018-06-26T01:26:09Z

        conn.close();
    }
+
+    /**


This is very similar to Repro47239largeUseBulkCopyAPI(). Please apply the comments to this test too.

ulvii · 2018-06-26T01:28:15Z

Duplicate of batchWithLargeStringTest(). Please refactor.

ulvii · 2018-06-26T01:29:26Z

+
+    @Test
+    public void batchWithLargeStringTestUseBulkCopyAPI() throws SQLException {
+        Connection con = DriverManager.getConnection(connectionString + ";useBulkCopyForBatchInsert=true;");


Connection is never closed.

ulvii · 2018-06-26T01:30:07Z

+        // create a table with two columns
+        boolean createPrimaryKey = false;
+        try {
+            stmt.execute("if object_id('" + testTable + "', 'U') is not null\ndrop table " + testTable + ";");


Please use Utils.dropTableIfExists()
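For context, a helper of this kind typically wraps the same conditional drop that the test builds inline. The sketch below is hypothetical: the signature of the project's `Utils.dropTableIfExists()` is not shown in this thread, so this only reproduces the SQL from the hunk above as a reusable function.

```java
final class DropTableSketch {
    /**
     * Builds the conditional DROP statement used inline in the test above.
     * Assumes tableName is a trusted identifier without single quotes.
     */
    static String dropTableIfExistsSql(String tableName) {
        return "if object_id('" + tableName + "', 'U') is not null\ndrop table " + tableName + ";";
    }

    public static void main(String[] args) {
        System.out.println(dropTableIfExistsSql("testTable"));
    }
}
```

Centralising the statement means the plain `drop table` calls elsewhere in the tests no longer fail when the table does not exist.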

ulvii · 2018-06-26T01:32:32Z

+        }
+
+        try {
+            stmt.executeUpdate("drop table " + tableName);


Please use the method from Utils class.

ulvii · 2018-06-26T01:32:56Z

+    @BeforeEach
+    public void testSetup() throws TestAbortedException, Exception {
+        connection = DriverManager.getConnection(connectionString + ";useBulkCopyForBatchInsert=true;");
+        stmt = (SQLServerStatement) connection.createStatement();


Better move this into a try-with-resources block

The stmt gets cleaned up during the cleanup stage. I think that's okay?

lilgreenbird · 2018-06-26T17:15:31Z

+                // SERVERPROPERTY('EngineEdition') can be used to determine whether the db server is SQL Azure. 
+                // It should return 6 for SQL Azure DW. This is more reliable than @@version or serverproperty('edition').
+                // Reference:  http://msdn.microsoft.com/en-us/library/ee336261.aspx
+                // 


This topic is no longer available
We're sorry—the topic that you requested is no longer available. Use the search box to find related information.

Use Bulk Copy API for batch insert operation

b727aa0

peterbae requested review from cheenamalhotra, rene-ye and ulvii April 26, 2018 00:25

peterbae requested a review from AfsanehR-zz April 26, 2018 16:44

peterbae added 5 commits April 27, 2018 14:02

Parse bug fixing and test added

52da9aa

bug fix + additional tests

5bb79a2

change reflection for testing

5cf28ad

more test changes

59d29d7

Add parsing logic for -- comment

dc42708

rene-ye reviewed May 2, 2018

peterbae and others added 7 commits May 3, 2018 13:42

Merge branch 'dev' of https://github.com/Microsoft/mssql-jdbc into ba…

f67cad1

…tch-insert-improvement

refactoring

dca4cb5

Merge branch 'dev' into batch-insert-improvement

1c6a186

Merge branch 'dev' of https://github.com/Microsoft/mssql-jdbc into ba…

80112c5

…tch-insert-improvement

Merge branch 'batch-insert-improvement' of https://github.com/peterba…

8ed14fe

…e/mssql-jdbc into batch-insert-improvement

Bug fix / testing change

b67ccfb

Reflect comment change

a677b28

v-mabarw reviewed May 15, 2018

peterbae added 5 commits May 25, 2018 10:22

Refactor two Bulk files into a common parent

60b437e

javadoc changes

8604ff4

fix problem with precision / scale

34d8bb1

fix issue with setting all to true

39060b7

make bamoo fixes

d80908e

peterbae added 3 commits June 25, 2018 09:25

Merge branch 'batch-insert-improvement' of https://github.com/peterba…

d2d5d23

…e/mssql-jdbc into batch-insert-improvement

Added getter/setter public for the useBulkCopyForBatchInsert connecti…

2369bb2

…on property.

Change implementation of child classes a bit

bae637c

ulvii reviewed Jun 26, 2018

peterbae added 2 commits June 26, 2018 15:05

Fix bamboo problem + refactor test code

b8afb58

Replace all connection and statements with try blocks

096d78e

ulvii approved these changes Jun 27, 2018

cheenamalhotra approved these changes Jun 27, 2018

rene-ye approved these changes Jun 27, 2018

lilgreenbird approved these changes Jun 27, 2018

peterbae merged commit 6ddd9d0 into microsoft:dev Jun 27, 2018

peterbae mentioned this pull request Jul 4, 2018

Support For Batch or Bulk Inserts to Azure SQL Data Warehouse (SQLDW) #331

Closed
