Driver version or jar name
mssql-jdbc version 6.4.0.jre8
SQL Server version
We support MS SQL Server versions back to 2012, and have been testing on version 11.0.5058.0
Client operating system
We support Windows/Linux/Mac/various other flavors of UNIX; we have been testing with the client and JDBC driver on Linux (with SQL Server on Windows, obviously)
Java/JVM version
1.8
Table schema
(large, complex, and irrelevant to this issue)
Problem description
We recently switched our code from the old sqljdbc42 4.2.6420.100 MS SQL Server JDBC driver to mssql-jdbc version 6.4.0.jre8, and encountered a significant (O(20-35%)) performance hit when doing so, which we have been investigating. Our performance test setup uses a repeating pattern of O(100) distinct read queries of large SQL text length (up to around 16 KB) generated by Hibernate, with some queries repeated many times within each overall repeat pattern, each individual query returning only a small amount of data, and with the client and database on two different machines in the same datacenter. Thus prepared statement caching is a big win -- the connection pool we are using no longer provides it, which is why we switched to mssql-jdbc 6.4.0.jre8.
Using the YourKit Java profiling tool, I tracked down part of the problem to the hashing code for the JDBC driver's prepared statement cache, which uses SHA-1 hashing (delegated to the Java security system, which in turn delegates to our chosen security provider library). In this test setup, 10-15% of the total test time was spent performing SHA-1 hashing just to check the prepared statement cache -- quite unacceptable overhead for a JDBC driver! (We experimented with simply turning the caching off, but alas that made things even slower -- despite its current heavy hashing overhead, the prepared statement cache is still 5-10% faster on average than repreparing every statement.)
Here is the relevant section of the stack trace (in caller-first order) when this SHA-1 behavior was happening:
... (Hikari connection pool library) ...
ProxyPreparedStatement.java:52 com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery()
SQLServerPreparedStatement.java:401 com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(TDSCommand)
SQLServerStatement.java:204 com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(TDSCommand)
SQLServerStatement.java:224 com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(TDSCommand)
SQLServerConnection.java:2713 com.microsoft.sqlserver.jdbc.TDSCommand.execute(TDSWriter, TDSReader)
IOBuffer.java:7344 com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute()
SQLServerPreparedStatement.java:479 com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement$PrepStmtExecCmd)
SQLServerPreparedStatement.java:536 com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.reuseCachedHandle(boolean, boolean, String)
SQLServerPreparedStatement.java:960 com.microsoft.sqlserver.jdbc.SQLServerConnection$Sha1HashKey.&lt;init&gt;(String, String, String)
SQLServerConnection.java:126 com.microsoft.sqlserver.jdbc.SQLServerConnection$Sha1HashKey.&lt;init&gt;(String)
SQLServerConnection.java:130 java.security.MessageDigest.digest(byte[])
MessageDigest.java:410 java.security.MessageDigest.update(byte[])
... (BouncyCastle security provider library) ...
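For a stand-alone sense of scale, the per-lookup cost of SHA-1 on a statement of this size can be measured with a micro-benchmark along these lines (a rough sketch, not the driver's code; 64-bit FNV-1a stands in here for any fast non-cryptographic hash such as CityHash128):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class HashCostSketch {

    // Minimal 64-bit FNV-1a, standing in for any fast non-cryptographic hash.
    static long fnv1a64(byte[] data) {
        long h = 0xcbf29ce484222325L;        // FNV-1a 64-bit offset basis
        for (byte b : data) {
            h ^= (b & 0xffL);
            h *= 0x100000001b3L;             // FNV-1a 64-bit prime
        }
        return h;
    }

    public static void main(String[] args) throws Exception {
        // Build a ~16 KB SQL-like string, comparable to our longest Hibernate queries.
        StringBuilder sb = new StringBuilder();
        while (sb.length() < 16 * 1024) {
            sb.append("select c1, c2, c3 from some_table where c1 = ? and c2 in (?, ?, ?) ");
        }
        byte[] sql = sb.toString().getBytes(StandardCharsets.UTF_8);

        final int iterations = 20_000;

        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        long t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sha1.update(sql);
            sha1.digest();                   // digest() also resets the instance
        }
        long shaMs = (System.nanoTime() - t0) / 1_000_000;

        long sink = 0;
        t0 = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink ^= fnv1a64(sql);
        }
        long fnvMs = (System.nanoTime() - t0) / 1_000_000;

        System.out.println("SHA-1:  " + shaMs + " ms for " + iterations + " hashes");
        System.out.println("FNV-1a: " + fnvMs + " ms for " + iterations + " hashes (sink=" + sink + ")");
    }
}
```

Absolute numbers will vary by JVM and security provider (we saw BouncyCastle on the hot path, per the trace above), but the point is that every cache lookup pays one full hash of the SQL text.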
Since what is being hashed is obviously the SQL statement text before parameter values are applied, it should not be under untrusted end-user control (and if it is, it's extremely hard not to leave yourself open to DROP-TABLE-style SQL injection attacks from your end users). So there should be no concern about a malicious end user deliberately forcing cache collisions by crafting two arbitrary SQL queries carefully chosen to SHA-1 hash to the same value (as opposed to choosing arbitrary parameter values). Thus using a cryptographic-strength hash is wasteful in this context -- any hash of sufficient size and with sufficiently good hashing properties to avoid accidental collisions will do, and the selection should be based on performance, where any crypto-strength hash is going to lose.
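To make that concrete: a non-cryptographic cache key needs nothing beyond the ordinary hashCode/equals contract. A hypothetical sketch (not the driver's actual Sha1HashKey; the field names are illustrative):

```java
// Hypothetical sketch of a cheap prepared-statement cache key; field names are
// illustrative, not the driver's. String.hashCode() is computed once and cached
// on the String instance, so repeated lookups for the same statement are nearly
// free, and equals() guards against the accidental collisions a weak hash allows.
final class CheapStmtKey {
    private final String sql;
    private final String typeDefinitions;

    CheapStmtKey(String sql, String typeDefinitions) {
        this.sql = sql;
        this.typeDefinitions = typeDefinitions;
    }

    @Override
    public int hashCode() {
        return 31 * sql.hashCode() + typeDefinitions.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CheapStmtKey)) return false;
        CheapStmtKey k = (CheapStmtKey) o;
        return sql.equals(k.sql) && typeDefinitions.equals(k.typeDefinitions);
    }
}
```

Because equals() compares the full SQL text, even a deliberately engineered hashCode collision costs only an extra string comparison, never a wrong cache hit.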
I brought this issue to the attention of Brett Wooldridge, who I understand originally contributed this code, see:
https://groups.google.com/forum/#!topic/hikari-cp/OTsKIFd3joY
and he tells me that he submitted it using CityHash128, a fast (high-throughput-speed) non-cryptographic-strength 128-bit hash, see:
#291 (review)
It appears that somehow this got switched (back?) to SHA-1 before 6.4 was released. I strongly recommend switching back to CityHash128 as he contributed it [or, if there is some license issue with CityHash128, to some comparable high-throughput non-cryptographic-strength hash] -- the performance with SHA-1 is unacceptably slow.
[If for some reason you feel that a few customers may have a security profile where SHA-1 is actually necessary -- because their end users can issue arbitrary SQL queries, yet are somehow blocked from SQL injection attacks -- then I strongly recommend making the choice between fast hashing and cryptographic-strength hashing configurable, e.g. via a JDBC URL parameter. The documentation for that parameter should state that cryptographic-strength hashing imposes a significant performance overhead and should be avoided except in security scenarios where it is actually necessary (and that constructing a scenario where it is necessary but that isn't open to SQL injection attacks is challenging).]
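If such configurability were added, the dispatch could be as simple as the following sketch. To be clear, the algorithm-selection parameter here is entirely made up for illustration; no such property exists in mssql-jdbc:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: choose the cache-key hash from a (made-up) connection
// property value, defaulting to the fast non-cryptographic path.
public class StatementKeyHasher {

    static byte[] hashKey(String sql, String algorithm) throws NoSuchAlgorithmException {
        byte[] bytes = sql.getBytes(StandardCharsets.UTF_8);
        if ("SHA-1".equalsIgnoreCase(algorithm)) {
            // Crypto-strength path: only for the (rare) scenario described above.
            return MessageDigest.getInstance("SHA-1").digest(bytes);
        }
        // Fast path: 64-bit FNV-1a, standing in for CityHash128 or similar.
        long h = 0xcbf29ce484222325L;
        for (byte b : bytes) {
            h ^= (b & 0xffL);
            h *= 0x100000001b3L;
        }
        byte[] out = new byte[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (byte) (h >>> (8 * i));
        }
        return out;
    }
}
```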
Expected behavior and actual behavior
Expected behavior: SQL hashing for the prepared statement cache takes negligible time
Actual behavior: SHA-1 SQL hashing for the prepared statement cache takes the majority of all time used by the JDBC driver, and 10-15% of the entire run-time in our performance test case
Repro code
See the problem description above -- in short, turn prepared statement caching on and exercise the cache with any realistic Hibernate-like read load that repeats fairly long read queries, each individually retrieving a fairly small amount of data.
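In lieu of a full client/server setup, the lookup pattern itself can be simulated stand-alone: ~100 distinct long query strings looked up repeatedly, once keying the map by a SHA-1 digest of the SQL (analogous to what we observed in 6.4.0) and once keying it by the String itself. This is a rough sketch of the access pattern only, not the driver's cache implementation:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

public class CacheLookupSketch {

    // Generate `count` distinct ~16 KB SQL-like strings.
    static String[] buildQueries(int count) {
        String[] queries = new String[count];
        for (int q = 0; q < count; q++) {
            StringBuilder sb = new StringBuilder();
            while (sb.length() < 16 * 1024) {
                sb.append("select c").append(q).append(" from t").append(q).append(" where k = ? ");
            }
            queries[q] = sb.toString();
        }
        return queries;
    }

    public static void main(String[] args) throws Exception {
        String[] queries = buildQueries(100);
        int rounds = 200;

        // SHA-1-keyed cache: every lookup pays a full digest of the SQL text.
        Map<String, Integer> shaCache = new HashMap<>();
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        long t0 = System.nanoTime();
        long shaHits = 0;
        for (int r = 0; r < rounds; r++) {
            for (String q : queries) {
                String key = java.util.Base64.getEncoder()
                        .encodeToString(sha1.digest(q.getBytes(StandardCharsets.UTF_8)));
                if (shaCache.putIfAbsent(key, 1) != null) shaHits++;
            }
        }
        long shaMs = (System.nanoTime() - t0) / 1_000_000;

        // String-keyed cache: String.hashCode() is cached after the first use.
        Map<String, Integer> strCache = new HashMap<>();
        t0 = System.nanoTime();
        long strHits = 0;
        for (int r = 0; r < rounds; r++) {
            for (String q : queries) {
                if (strCache.putIfAbsent(q, 1) != null) strHits++;
            }
        }
        long strMs = (System.nanoTime() - t0) / 1_000_000;

        System.out.println("SHA-1 keys:  " + shaMs + " ms, hits=" + shaHits);
        System.out.println("String keys: " + strMs + " ms, hits=" + strHits);
    }
}
```

Both variants see identical hit counts; only the per-lookup keying cost differs, which is the overhead the profiler attributed to SHA-1.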