Use Bulk Copy API for batch insert operation#686
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #686 +/- ##
============================================
+ Coverage 48.03% 48.39% +0.35%
- Complexity 2631 2741 +110
============================================
Files 118 120 +2
Lines 26753 27210 +457
Branches 4493 4589 +96
============================================
+ Hits 12852 13167 +315
- Misses 11773 11873 +100
- Partials 2128 2170 +42
Continue to review full report at Codecov.
|
| case java.sql.Types.TIMESTAMP: | ||
| case microsoft.sql.Types.DATETIMEOFFSET: | ||
| // The precision is just a number long enough to hold all types of temporal data, doesn't need to be exact precision. | ||
| columnMetadata.put(positionInTable, new ColumnMetadata(colName, jdbcType, 50, scale, dateTimeFormatter)); |
There was a problem hiding this comment.
Can we make the "50", which I'm assuming is the precision, a const so changing it is easier when necessary.
There was a problem hiding this comment.
The comment on line 217 explains why this is unnecessary.
There was a problem hiding this comment.
Temporal data being < 50 precision might not be true forever. If it's an arbitrarily assigned number than it can be just as arbitrarily changed. And also because it's arbitrary, we might not know what number to look for if we do have a need to update it.
There was a problem hiding this comment.
Precision here should be the one passed by user. I'm assuming 50 is used as precision based on the BulkCopy CSV implementation, it makes sense in case of CSV, because temporal types can be stored in CSV using any of the supported string literal format (https://docs.microsoft.com/en-us/sql/t-sql/data-types/datetime-transact-sql?view=sql-server-2017#supported-string-literal-formats-for-datetime) and we pass it as varchar/nvarchar to Server.
|
|
||
| boolean isAzureDW() throws SQLServerException, SQLException { | ||
| if (null == isAzureDW) { | ||
| try (Statement stmt = this.createStatement(); ResultSet rs = stmt.executeQuery("SELECT CAST(SERVERPROPERTY('EngineEdition') as INT)");) |
There was a problem hiding this comment.
We don't seem to catch/handle any exceptions from this try statement, is there a reason why we're using it here.
There was a problem hiding this comment.
Normally, when we create a new Statement or ResultSet object, it's good practice to close it after we're done. However, the try-with-resources syntax (which i've used here) allows those objects to get closed automatically. You can read more on this page: https://blogs.oracle.com/weblogicserver/using-try-with-resources-with-jdbc-objects
| // Base data type: int | ||
| final int ENGINE_EDITION_FOR_SQL_AZURE_DW = 6; | ||
| rs.next(); | ||
| int engineEdition = rs.getInt(1); |
There was a problem hiding this comment.
Change block to
isAzureDW = rs.getInt(1) == ENGINE_EDITION_FOR_SQL_AZURE_DW
or even more concise but possibly a little confusing
return (rs.getInt(1) == ENGINE_EDITION_FOR_SQL_AZURE_DW)
There was a problem hiding this comment.
i think right now is fine, i merely adopted the same part of code from our fx framework anyways.
There was a problem hiding this comment.
We should at least remove
if (boolean condition) {
bool = true
} else {
bool = false
}
|
|
||
| String destinationTableName = tableName; | ||
| // Get destination metadata | ||
| try (SQLServerResultSet rs = ((SQLServerStatement) connection.createStatement()) |
There was a problem hiding this comment.
try with resources, but we don't handle any exceptions.
There was a problem hiding this comment.
see my comment above
|
|
||
| // ignore all comments | ||
| if (localUserSQL.substring(0, 2).equalsIgnoreCase("/*")) { | ||
| int temp = localUserSQL.indexOf("*/") + 2; |
There was a problem hiding this comment.
This logic is done quite a lot. Can we maybe make a private function that takes in a string and returns what we need. So the final result for all these if blocks would be something like
if(localUserSQL.substring(0, 2).equalsIgnoreCase("/*")) { return newPrivateFunction("*/"); }
| return isInsert(temp.substring(index)); | ||
| } | ||
| char c = temp.charAt(0); | ||
| if (c != 'i' && c != 'I') |
There was a problem hiding this comment.
We were burned before using this kind of logic in an attempt to parse SQL (in pstmt metadata caching), is this not breakable? What if I put a use statement before my insert, will it not flag my SQL as an insert statement?
There was a problem hiding this comment.
If the user decides to put a use statement, it won't be flagged as an insert statement, and the old implementation for handling batch statements will get executed instead. This isInsert statement is currently only being used for queries that come through as an execute batch statement - we don't expect the user to run thousands of the same batch query with a use statement in front of it.
There was a problem hiding this comment.
Are the following lines here to eliminate non-insert queries faster? Ideally, the SQL passed to this method is always an insert, so these lines can be removed?
char c = temp.charAt(0);
if (c != 'i' && c != 'I')
return false;
There was a problem hiding this comment.
Insert isn't the only way to use batch - there's many ways we would be calling isInsert on a non-insert query. I don't really think this fast elimination is worth the space, so i'm going to remove it anyways though.
v-mabarw
left a comment
There was a problem hiding this comment.
I only gave this a quick skim through and have a couple of extra thoughts:
- have we tested the performance impact on the normal execution path?
- is this scenario also applicable to SQL Server Parallel Data Warehouse (PDW)?
|
|
||
| boolean isAzureDW() throws SQLServerException, SQLException { | ||
| if (null == isAzureDW) { | ||
| try (Statement stmt = this.createStatement(); ResultSet rs = stmt.executeQuery("SELECT CAST(SERVERPROPERTY('EngineEdition') as INT)");) |
There was a problem hiding this comment.
Is there any way to avoid an extra round trip to the server to get this info? For example, can we get any useful info from the TDS pre-login or login packets?
There was a problem hiding this comment.
I don't think we can avoid making a call to get the table metadata, but as it turns out, we were making an exact same FMTONLY call in SQLServerBulkCopy::getDestinationMetadata method (which is called as part of the bulk copy process shortly down the line). So instead of making the FMTONLY call to the database twice in the same fashion, the driver will now store the resultset from the first FMTONLY so it can be re-used in getDestinationMetadata. This means we won't be making any additional trips to the database compared to the way it was working before.
| } | ||
|
|
||
| // It shouldn't come here. If we did, something is wrong. | ||
| throw new IllegalArgumentException("localUserSQL"); |
There was a problem hiding this comment.
Why did you pick throwing an IllegalArgumentException (for this & the other parse routines)? It's probably better to have a descriptive error message instead.
There was a problem hiding this comment.
I decided to throw an IllegalArgumentException because I thought it best described the symptom. It would be easy to throw an exception detailing at which index the parsing failed if the parsing actually failed at some point, but for these instances where the parsing fails due to getting to the end of the parsing method (which it shouldn't - this means the user's SQL query was incorrect somehow), all we know is that the user's argument was incorrect. I do agree that I could give a better description, though - I'll add that in.
| } | ||
| } | ||
|
|
||
| public void testExecuteBatch1UseBulkCopyAPI() { |
There was a problem hiding this comment.
This applies to all the tests, don't you need to set useBulkCopyForBatchInsert property to true?
| */ | ||
| @Test | ||
| @Tag("slow") | ||
| public void testStatementPoolingUseBulkCopyAPI() throws SQLException, NoSuchFieldException, SecurityException, |
There was a problem hiding this comment.
Looks like the new tests are just duplicates of the existing tests with minor changes. Can't we refactor this?
| fail(TestResource.getResource("R_executeBatchFailed") + ": " + e.getMessage()); | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
This test is almost the exact copy of testExecuteBatch1(). Please refactor.
| int retValue[] = {0, 0, 0}; | ||
| int updCountLength = 0; | ||
| try { | ||
| String sPrepStmt = "update ctstable2 set PRICE=PRICE*20 where TYPE_ID=?"; |
| retValue[i++] = rs.getInt(1); | ||
| } | ||
|
|
||
| pstmt1.close(); |
There was a problem hiding this comment.
Better move this into a try-with-resources block
There was a problem hiding this comment.
It's handled at the end of the test anyhoo.
There was a problem hiding this comment.
The connection itself is available in the class, they're all static objects (don't know why), but why create a new one anywhere else in the class?
There was a problem hiding this comment.
the modifyConnectionForBulkCopyAPI method modifies the connection object. If we don't create a new connection object (with the try block), the tests that come after this one will be affected by the change I made to the connection (since the only other connection object available is static). Therefore, to prevent that happening, I need to make a new connection object just to test my PR.
Also, they connection object is static because originally the test could just re-use the same connection object that doesn't need to change, but now we need to change our connection object accordingly.
There was a problem hiding this comment.
It's handled at the end of the test anyhoo.
- Not if
executeQueryfails.
There was a problem hiding this comment.
i think it'll be closed in the terminateVariation() (this is what I meant by end of the test)
| } | ||
| } | ||
| catch (BatchUpdateException b) { | ||
| fail("BatchUpdateException : Call to executeBatch is Failed!"); |
There was a problem hiding this comment.
Call to executeBatch Failed!
| fail("BatchUpdateException : Call to executeBatch is Failed!"); | ||
| } | ||
| catch (SQLException sqle) { | ||
| fail("Call to executeBatch is Failed!"); |
There was a problem hiding this comment.
These catch blocks make no sense.
There was a problem hiding this comment.
I agree that they could handle exception better here, but this isn't my code. I'm just gonna fix it anyways though.
| fail(TestResource.getResource("R_addBatchFailed") + ": " + e.getMessage()); | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
The comments for testExecuteBatch1UseBulkCopyAPI() apply to this test too.
| * | ||
| * @throws Exception | ||
| */ | ||
| @Test |
There was a problem hiding this comment.
Duplicate of Repro47239large. Please refactor.
| @DisplayName("Regression test for using 'large' methods") | ||
| public void Repro47239largeUseBulkCopyAPI() throws Exception { | ||
|
|
||
| assumeTrue("JDBC42".equals(Utils.getConfiguredProperty("JDBC_Version")), TestResource.getResource("R_incompatJDBC")); |
There was a problem hiding this comment.
This can be removed. assumeTrue("JDBC42".equals(Utils.getConfiguredProperty("JDBC_Version")), TestResource.getResource("R_incompatJDBC"));
| error = "RAISERROR ('raiserror level 11',11,1) WITH LOG"; | ||
| severe = "RAISERROR ('raiserror level 20',20,1) WITH LOG"; | ||
| } | ||
| con.close(); |
| } | ||
| catch (Exception ignored) { | ||
| } | ||
| stmt.close(); |
| } | ||
|
|
||
| try { | ||
| stmt.executeLargeUpdate("drop table " + tableName); |
There was a problem hiding this comment.
Please use Utils.dropTableIfExists()
| conn.close(); | ||
| } | ||
|
|
||
| /** |
There was a problem hiding this comment.
This is very similar to Repro47239largeUseBulkCopyAPI(). Please apply the comments to this test too.
| } | ||
|
|
||
| } | ||
|
|
There was a problem hiding this comment.
Duplicate of batchWithLargeStringTest(). Please refactor.
|
|
||
| @Test | ||
| public void batchWithLargeStringTestUseBulkCopyAPI() throws SQLException { | ||
| Connection con = DriverManager.getConnection(connectionString + ";useBulkCopyForBatchInsert=true;"); |
| // create a table with two columns | ||
| boolean createPrimaryKey = false; | ||
| try { | ||
| stmt.execute("if object_id('" + testTable + "', 'U') is not null\ndrop table " + testTable + ";"); |
There was a problem hiding this comment.
Please use Utils.dropTableIfExists()
| } | ||
|
|
||
| try { | ||
| stmt.executeUpdate("drop table " + tableName); |
There was a problem hiding this comment.
Please use the method from Utils class.
| @BeforeEach | ||
| public void testSetup() throws TestAbortedException, Exception { | ||
| connection = DriverManager.getConnection(connectionString + ";useBulkCopyForBatchInsert=true;"); | ||
| stmt = (SQLServerStatement) connection.createStatement(); |
There was a problem hiding this comment.
Better move this into a try-with-resources block
There was a problem hiding this comment.
the stmt gets cleaned up during the cleanup stage - i think that's okay?
| // SERVERPROPERTY('EngineEdition') can be used to determine whether the db server is SQL Azure. | ||
| // It should return 6 for SQL Azure DW. This is more reliable than @@version or serverproperty('edition'). | ||
| // Reference: http://msdn.microsoft.com/en-us/library/ee336261.aspx | ||
| // |
There was a problem hiding this comment.
This topic is no longer available
We're sorry—the topic that you requested is no longer available. Use the search box to find related information.
Improves the batch insert operation performace against Azure DW.
Fixes issue #331.