Cleaner and tighter. #291
Conversation
|
@TobiasSQL Again, possibly best viewed by just pulling my branch local (note it is |
f987f68 to 9e73ba8
Codecov Report
@@ Coverage Diff @@
## metadataCaching #291 +/- ##
=====================================================
- Coverage 37.89% 37.45% -0.44%
+ Complexity 1992 1750 -242
=====================================================
Files 107 106 -1
Lines 27959 24606 -3353
Branches 4846 3993 -853
=====================================================
- Hits 10595 9217 -1378
+ Misses 15553 13762 -1791
+ Partials 1811 1627 -184
Continue to review full report at Codecov.
|
|
@brettwooldridge, thanks for signing the contribution license agreement. We will now validate the agreement and then the pull request. |
c05f431 to 2310df8
|
@TobiasSQL Sorry I kept poking at this, squashing and pushing tweaks. I'll resolve the above conflicts that have surfaced. I have a separate change set I'm working on, built on top of these changes, that is showing 3-10x improvements in transactions per second across the board (both prepared statements and regular statements). Rather than continually piling on the |
2310df8 to e82748f
|
Rebased. All tests passing. Feel free to ask about any particular change. |
e82748f to 1134b5e
1134b5e to 8296420
|
In summary... Observation: The prepared statement handle cache and prepared metadata cache are largely orthogonal. For example, the handle cache can be enabled but the metadata cache disabled, and vice versa.
Running the multi-threaded oltpbenchmark harness's JPAB benchmark with ~75,000 transactions and eight clients under the profiler also showed zero thread contentions on any of the cache collections (handle or metadata). |
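As a rough sketch of how the caching behavior discussed in this PR is tuned, the connection properties that appear later in this thread can be set independently in the URL (host, database, and values below are illustrative, not recommendations):

```java
public class ConnectionUrlExample {
    // Property names are taken verbatim from the benchmark URL later in this
    // thread; the host, database name, and values are illustrative only.
    static String buildUrl() {
        return "jdbc:sqlserver://localhost:1433;databaseName=test;"
            + "statementPoolingCacheSize=100;"               // prepared statement handle cache size
            + "serverPreparedStatementDiscardThreshold=100;" // batch size for un-prepare round trips
            + "enablePrepareOnFirstPreparedStatementCall=false;";
    }

    public static void main(String[] args) {
        System.out.println(buildUrl());
    }
}
```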
|
Thanks a ton @brettwooldridge! I have started going through this and merging with changes I have been working on, posting comments as I go :-) |
|
@brettwooldridge, thanks a lot for your help in improving the performance of our JDBC driver! We are currently working on stabilizing the code to prepare for our next release. This includes running a bunch of automated tests in our test framework, which we haven't managed to put on GitHub yet. I've told @TobiasSQL that we need to merge these metadata caching changes ASAP to start stabilizing. He and I agree that we should take your updated API names. @TobiasSQL is working on merging changes from your branch, and will continue to take more after stabilizing. Please let us know if there's anything, especially related to public API names, that you strongly feel we should get into the next release. Again, thank you so much for your help thus far. We really appreciate your time! 😄 -Andrea |
|
@ajlam @TobiasSQL Thanks to everyone for moving the driver forward, and huge kudos to Microsoft for open sourcing the driver in the first place! I don't envision any changes to the public API. I am working on a set of caching improvements based on some profiling runs that I think will provide a large performance increase, particularly for patterns employed by ORM frameworks. I don't have a solid completion date, but possibly within the week. |
|
Thanks @brettwooldridge, I just finished a push into metadataCaching fixing some issues found by tests in our old infrastructure (related to cursors). Also cleaned up public APIs a bit, would be great if you take a peek at those (especially related to "discard actions/unprepare") so we can make any changes before the next public release. As @ajlam mentioned, once we have things stabilized for the release we should continue the optimizations, hopefully we can then just take your PRs directly :-) |
|
Some comparables of current progress using the oltpbenchmark JPAB harness, using the v6.1.6-preview driver. More work to do ... the big wins (e.g. ResultSet metadata caching) aren't in there yet. |
|
|
static class Sha1HashKey {
    private byte[] bytes;
static class CityHash128Key {
Can you share why you switched to this hash from SHA1?
Simply performance. On a local benchmark I measured, for 100,000 hashes:
SHA1 Hash Total ns: 124926909 (~125ms)
CityHashCrc128 Total ns: 23457110 (~23.5ms)
CityHash128 passes SMHasher with no collisions, which is what we're really after, as much as performance.
Makes sense :-) Just wanted to dbl check. Thx!
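For context, the SHA-1 side of that measurement can be reproduced with a small JDK-only micro-benchmark (input size and iteration count are illustrative; CityHashCrc128 is not in the JDK, so only the baseline side is shown):

```java
import java.security.MessageDigest;
import java.util.Random;

public class HashKeyBench {
    // Times SHA-1 over N short inputs, mirroring the 100,000-hash measurement above.
    public static long timeSha1(int iterations) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] input = new byte[64];           // illustrative key size
        new Random(42).nextBytes(input);
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            md.reset();
            byte[] digest = md.digest(input);  // SHA-1 always yields 20 bytes
            if (digest.length != 20) throw new AssertionError();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SHA-1 total ns for 100,000 hashes: " + timeSha1(100_000));
    }
}
```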
|
|
/** Get prepared statement cache entry if exists, if not parse and create a new one */
static ParsedSQLCacheItem getOrCreateCachedParsedSQLMetadata(Sha1HashKey key, String sql) throws SQLServerException {
static ParsedSQLCacheItem getOrCreateCachedParsedSQL(CityHash128Key key, String sql) throws SQLServerException {
It seems clearer to rename the class to ParsedSQLMetadata rather than removing the word from the method. I.e. fundamentally this method returns metadata derived from parsing the SQL text.
@TobiasSQL I disagree here. "Metadata" in the JDBC sense is typically either database metadata (Connection.getMetaData()), column metadata (PreparedStatement.getParameterMetaData()), or result metadata (ResultSet.getMetaData()). In this case, the cache is simply the pre-processed SQL text of the statement, without respect to type information.
I see. Then I agree with you, makes sense :-)
    return false;
else
    return handleRefCount.compareAndSet(0, -999);
public int getRefCount() {
I am proposing abstracting the ref counting to this method by using "higher level" methods that return boolean values depending on success/failure. Basically tryAddReference returns true only if the add was successful and the cached handle can be used, likewise discardHandle only returns true if the handle can actually get killed. These are both atomic and remove the need for checking the actual ref. count outside of this class.
Without giving it too much thought, that seems reasonable.
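A minimal sketch of that abstraction, using the -999 sentinel visible in the diff above (the method names follow the comment and are assumptions, not the driver's actual code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ref-counted handle: callers never inspect the raw count.
class PreparedStatementHandle {
    private final AtomicInteger refCount = new AtomicInteger(0);

    /** Returns true only if the handle is still live and may be used. */
    boolean tryAddReference() {
        for (;;) {
            int count = refCount.get();
            if (count < 0) return false;                       // already discarded
            if (refCount.compareAndSet(count, count + 1)) return true;
        }
    }

    void removeReference() {
        refCount.decrementAndGet();
    }

    /** Returns true only if this caller won the right to un-prepare the handle. */
    boolean tryDiscardHandle() {
        // -999 sentinel marks the handle as dead, as in the diff above.
        return refCount.compareAndSet(0, -999);
    }
}
```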
// * another thread is already doing the work of un-preparing, OR
// * number of handles in the map have not crossed the threshold (optimization)
final int threshold = getServerPreparedStatementDiscardThreshold();
if (!force && (isUnprepareInProgress.get() || threshold >= preparedStatementHandleMap.size()))
The use of the concurrent queue before removes the need for "isUnprepareInProgress" and two of these could actually be executed concurrently. I know this uses a single cache but the queue seems like a simpler solution that solves the issue of unpreparing in batches.
@TobiasSQL The only issue with a queue is that a handle is either in or out. I'm not sure how the semantics work in that case. In theory, a handle that reaches refcount 0 is reclaimable, but that is the case immediately after a statement is executed. Before, the tie to the metadata cache, and the eviction, artificially kept handles alive. But if that relationship is severed, as discussed above, claiming handles as soon as they reach refcount 0 seems inefficient.
In the case of the concurrent collection, a handle that has reached refcount 0 always has the possibility of being "rescued" by a new statement execution -- but only because it remains in the live collection. In the case of a true unprepare queue, once a handle is enqueued, its fate is sealed.
Now, maybe something like an eviction-based collection, dedicated to handles (separate from the metadata) could work, if the behavior is LRU in nature...
Thoughts?
Agreed. We can't move handles to the queue simply at refcount 0, they need to actually fall out of the cache or be explicitly discarded before being moved to the queue (which is the case right now).
The downside with the queue approach is that handles live for some time in the queue where you could argue that they could be re-used but they won't be because they are no longer in the cache. My thought is that the queue simplifies the discard process and that these serve two distinct purposes, one is for batching un-prepares together to make sure we don't do unnecessary round trips and the cache is there to make us re-use handles to both reduce round trips and to minimize network traffic (lesser payload).
@TobiasSQL I think it is workable either way you want to go. In practice, I would recommend that users set both the metadata cache size and batch handle discard thresholds to the same value, which basically ensures a large amount of handle reuse and minimizes network chatter (and server CPU due to re-prepares).
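The batching idea under discussion can be sketched roughly like this (class and method names are hypothetical; handles are stood in for by ints):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: discarded/evicted handles are enqueued, and the actual
// sp_unprepare round trip only happens once the batch threshold is crossed.
class DiscardedHandleQueue {
    private final ConcurrentLinkedQueue<Integer> discarded = new ConcurrentLinkedQueue<>();
    private final int threshold;

    DiscardedHandleQueue(int threshold) {
        this.threshold = threshold;
    }

    void enqueue(int handle) {
        discarded.add(handle);
    }

    /** Drains a batch once the threshold is crossed; returns an empty list otherwise. */
    List<Integer> drainBatchIfNeeded() {
        if (discarded.size() < threshold) return new ArrayList<>();
        List<Integer> batch = new ArrayList<>();
        Integer h;
        while ((h = discarded.poll()) != null) batch.add(h);
        return batch;  // caller would issue one batched un-prepare for these
    }
}
```

As noted above, the trade-off is that once a handle is in this queue it can no longer be "rescued" by a later execution.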
|
|
// Attempt to borrow the statement handle, if statement handle caching is enabled, this will be 0 if not.
prepStmtHandle = connection.borrowStatementHandle(cacheKey);
if (0 < prepStmtHandle)
This removes the use of "has already been executed once", doesn't it? Seems like we will always do sp_prepexec followed by sp_execute in this case. The sp_executesql use removes the need for creating handles for statements that are used only once which is a very common use case. Maybe I am missing something.
I think you're missing something -- unless I'm missing something. 😁
If a new PreparedStatement is created, for which no handle exists, the handle remains at 0 (uninitialized) and therefore sp_executesql will be used. Once sp_prepexec has run (i.e. the same statement is executed a second time) then the handle will be cached. Subsequent PreparedStatements with the same SQL text will acquire the same handle, and skip the sp_executesql, going directly to sp_execute. Right?
Correctamundo :-) The issue I have is the following scenario:
- New stmt with SQL text & param definitions A is created & executed & closed: sp_executesql
- New stmt with SQL text & param definitions A is created & executed & closed: ?
In this implementation #2 will do sp_executesql again where I would argue it should move to sp_prepexec so the third execution starts using a handle.
@TobiasSQL I pushed a small change to account for this. Basically, I "cheat" a little by leveraging the parsed SQL cache. Presence of a query in the parsed SQL cache is proof that we have executed that SQL before, even without knowledge of handles and whatnot.
An edge-case of my change is that execution of SQL by a SQLServerStatement also creates a cache entry. Later execution of identical SQL through a SQLServerPreparedStatement would be viewed as isExecutedAtLeastOnce. Not a bad thing by any means, but worth pointing out.
I was going to rebase my change onto mssql/metadataCaching, but basically every change was a merge conflict, so I opted to make the spot change for this in my metadataCaching2 branch.
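The execution-path progression discussed in this thread can be condensed into a small decision sketch (the names are illustrative, not the driver's internals):

```java
// Hypothetical condensation of the logic discussed above:
//   first sighting of the SQL text        -> sp_executesql (no handle created)
//   seen before, but no cached handle yet -> sp_prepexec   (creates and caches a handle)
//   cached handle available               -> sp_execute    (re-uses the handle)
enum ExecutionPath { SP_EXECUTESQL, SP_PREPEXEC, SP_EXECUTE }

class ExecutionPlanner {
    static ExecutionPath choose(boolean hasCachedHandle, boolean executedAtLeastOnce) {
        if (hasCachedHandle) return ExecutionPath.SP_EXECUTE;
        if (executedAtLeastOnce) return ExecutionPath.SP_PREPEXEC;
        return ExecutionPath.SP_EXECUTESQL;
    }
}
```

The change described above amounts to deriving `executedAtLeastOnce` from the parsed SQL cache rather than from per-instance state.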
|
|
/** Cache of prepared statement handles */
private ConcurrentLinkedHashMap<Sha1HashKey, PreparedStatementCacheItem> preparedStatementCache;
private ConcurrentLinkedHashMap<CityHash128Key, PreparedStatementMetadata> preparedStatementMetadataCache;
After looking at this quite a bit it seems reasonable to assume the use of parameter metadata is different from statement handles. As such we should probably separate these into two caches, this would also make the implementation cleaner. Doing this now in metadataCaching branch. Thoughts? Seems like you are heading in that direction as well.
I keep going back and forth on this. In general, it is a Good Idea™ to set them in lock-step in terms of size, but abstractly there is no implementation reason to couple them. I'm fine with snipping the thread between them -- only two places to cut.
@TobiasSQL Do you want me to sever the coupling between the two caches and push it to my metadataCache2 branch? Or is that something you want to look at post-6.1.8?
I already did that in metadataCaching :-)
proof that this is not the first time we have executed this statement.
}
else {
    parsedSql = parseAndCacheSQL(cacheKey, sql);
}
I actually like this for another reason. The parsed sql cache is across connections, whereas the handle cache is per-connection. In connection pooling environments, users often have pools of say 50 (or even 100) connections, so executing PreparedStatement A only has a 1:50 chance of executing on the same connection later -- and therefore could avoid getting prepared for quite a while, even though it may be executed semi-frequently.
Using presence in a cache that cuts across connections as the indicator of isExecutedAtLeastOnce is going to "recognize" multiple identical executions on separate connections immediately. For ORMs this is going to be a big boost, because the generated SQL is always identical -- Hibernate uses PreparedStatements across the board -- but there is additionally almost always a pool present.
I was thinking the same thing at first but then arrived at the opposite conclusion. My comment is that since prepared handles can only be reused on the same connection, the fact that a statement gets re-used across connections is moot. In addition we don't include parameter definitions in the parsed SQL key (which we should not do, so that is great), which we need to in the handle cache. So my conclusion is that we really should not "skip" sp_executesql for the same text across connections.
@TobiasSQL Hang on a sec. 😁 This is a JDBC driver, right? I think if you survey the landscape you'll find that PreparedStatements involving identical SQL are overwhelmingly executed across connections, by a large margin.
JEE containers, Spring containers, web containers (Tomcat/Jetty), JPA persistence, almost without fail acquire a connection from a pool, execute some business logic involving static (but parameterized) SQL, via PreparedStatement, and return the connection to the pool. The next web request that comes in, timer that fires, or event arrives triggers its execution again. The sooner that commonly executed SQL gets prepared across all connections in the pool, the faster the application will respond.
Skipping sp_executesql after observing multiple executions of identical SQL is not the same as skipping sp_prepexec and going directly to sp_execute. Which, by the way, is what my speedify branch does, because it caches input parameter metadata and goes directly to sp_execute for identical SQL with identical parameter types, additionally using the fReuseMetaData flag (and fNoMetaData, because the column info metadata coming out is cached as well).
Functionally, using sp_executesql within the PreparedStatement is purely optional, but I do recognize the case for dynamic SQL executed via PreparedStatement. If it is truly dynamic, the behavior above is exactly as you would expect/want. If it is quasi-dynamic, wherein identical SQL is being executed with some frequency, I would argue that it would benefit from the optimizations that (will) come with a prepared handle by skipping the sp_executesql ... and even the sp_prepexec (after the third execution) when those changes land.
Keep in mind that "is being executed with some frequency" means multiple executions of identical SQL within the timeframe in which parsed SQL has not rolled out of the cache. With a lot of (truly) dynamic SQL the cache is going to roll very often. We're talking about identical SQL executed within a narrow time horizon being "promoted" to sp_prepexec (regardless of connection affiliation).
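The cross-connection signal being argued for here can be sketched like this (a hypothetical registry, not the driver's actual classes; a real parsed-SQL cache would store parsed items and be size-bounded):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the parsed-SQL cache is process-wide, so a hit in it
// proves the SQL text was executed before on *some* connection, while prepared
// handles themselves remain per-connection.
class ParsedSqlRegistry {
    private static final Map<String, String> PARSED_SQL = new ConcurrentHashMap<>();

    /** Returns true if this SQL text has been seen before on any connection. */
    static boolean isExecutedAtLeastOnce(String sql) {
        // putIfAbsent returns the previous value, i.e. non-null on a cache hit.
        return PARSED_SQL.putIfAbsent(sql, sql) != null;
    }
}
```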
Providing a real world example from just yesterday...
I was trying to run the oltpbench/epinions benchmark against my speedify branch, but I wasn't seeing any performance gain at all. Zero. So, I added some logging to see what was going on...
The benchmark code does this:
- Grab a connection from the pool (in a try-with-resources block)
- Prepare a statement (conn.prepareStatement(sql))
- Set parameters and execute
- Return the connection to the pool (exit the try)
- Rinse and repeat
No single PreparedStatement is ever executed more than once. So, no sp_prepexec was occurring, only sp_executesql.
Then I realized ... that is what my company's application, and most Java web applications, do in 90% of the cases.
- The user types a search, enters an order, etc. and then clicks "go" (or "save", or "search")
- The web request comes in, grab a connection from the pool
- Prepare a statement
- Set parameters and execute
- Close the connection (back to the pool), and return results back to the user
Again, individual PreparedStatement instances are rarely executed multiple times, so no sp_prepexec. However, there are thousands of users doing this! That SQL needs to be prepared!
And a user of SQL Server JDBC shouldn't have to force prepare-on-first-execute across the board. But you can see, in order to actually realize any of the optimizations we've been making on this branch, that is exactly what I had to do.
And not only for the epinions benchmark, but also the jpab benchmark. jpab is pure Hibernate, so that is exactly what Hibernate is doing -- executing, one-shot, without reuse (of the instance). Surveying the ~15 benchmarks in that suite, developed by different individuals, nearly all of them require that.
I see why your test would run only sp_executesql with your previous implementation. However, if you kept "already executed once" as part of the handle cache you would not see that behavior, this is why I added that in the first place :-)
I guess the difference of treating the parse cache hit as a "first execution" is that we assume that we will have >2 executions of the same handle per connection rather than > 1. This doesn't seem like a huge difference to me. We should also remember the cost of the unprepare calls. At the end of the day the re-use of the handle will only occur within the same connection.
So I definitely agree with the scenario you describe, just not convinced it is a huge difference to keep the "has executed once" on the parse SQL level (or that it is even better).
Just to clarify, as far as I see it this would be the only difference between the two implementations, i.e. at the end of the day you will get handle re-use in both. For the "single call on connection" case the connection level is better, for the other cases I would say difference will be negligible (happy to be proven wrong though :-) ):
1 call:
sp_executesql | sp_prepexec
              | sp_unprepare
2 calls:
sp_executesql | sp_prepexec
sp_prepexec   | sp_execute
sp_unprepare  | sp_unprepare
3+ calls:
sp_executesql | sp_prepexec
sp_prepexec   | sp_execute
sp_execute    | sp_execute
...           | ...
sp_unprepare  | sp_unprepare
> I see why your test would run only sp_executesql with your previous implementation. However, if you kept "already executed once" as part of the handle cache you would not see that behavior, this is why I added that in the first place :-)
I don't understand how that would have made any difference in the oltpbench epinions or jpab benchmarks (or Hibernate in general), where a single PreparedStatement instance is never executed more than once. If execution for a single statement never reaches sp_prepexec then no handle would be created, right? The benchmarks below seem to bear this out.
> For the "single call on connection" case the connection level is better, for the other cases I would say difference will be negligible (happy to be proven wrong though :-)
If the difference is negligible, I don't know what we're doing. 😁 Fortunately, the difference is not negligible. This is the oltpbench/epinions benchmark with the current metadataCaching branch, with enablePrepareOnFirstPreparedStatementCall=false:
Benchmark: EPINIONS {com.oltpbenchmark.benchmarks.epinions.EpinionsBenchmark}
Configuration: config/sample_epinions_config.xml
Type: SQLSERVER
Driver: com.microsoft.sqlserver.jdbc.SQLServerDriver
URL: jdbc:sqlserver://10.0.40.11:1433;databaseName=epinions;statementPoolingCacheSize=100;serverPreparedStatementDiscardThreshold=100;enablePrepareOnFirstPreparedStatementCall=false;
Isolation: TRANSACTION_READ_COMMITTED
Scale Factor: 100.0
17:26:53,982 (DBWorkload.java:260) INFO - ======================================================================
17:26:53,994 (DBWorkload.java:790) INFO - Creating 10 virtual terminals...
[main] INFO com.zaxxer.hikari.HikariDataSource - HikariPool-1 - Starting...
[main] INFO com.zaxxer.hikari.HikariDataSource - HikariPool-1 - Start completed.
17:26:54,627 (DBWorkload.java:795) INFO - Launching the EPINIONS Benchmark with 1 Phase...
17:26:54,639 (ThreadBench.java:325) INFO - PHASE START :: [Workload=EPINIONS] [Serial=false] [Time=120] [Rate=10000] [Arrival=REGULAR] [Ratios=[10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 10.0, 20.0]] [ActiveWorkers=10]
17:26:54,640 (ThreadBench.java:472) INFO - MEASURE :: Warmup complete, starting measurements.
17:28:54,636 (ThreadBench.java:428) INFO - TERMINATE :: Waiting for all terminals to finish ..
17:28:54,639 (ThreadBench.java:233) INFO - Starting WatchDogThread
17:28:54,787 (DBWorkload.java:801) INFO - ======================================================================
17:28:54,787 (DBWorkload.java:802) INFO - Rate limited reqs/s: Results(nanoSeconds=120000780351, measuredRequests=468917) = 3907.6162557312264 requests/sec
17:28:54,789 (DBWorkload.java:675) INFO - Output Raw data into file: results/oltpbench.22.csv
17:28:58,977 (DBWorkload.java:613) INFO - ======================================================================
17:28:58,978 (DBWorkload.java:614) INFO - Workload Histograms:
GetReviewItemById/01 [37559] ****************************************
GetReviewsByUser/02 [37486] ****************************************
GetAverageRatingB... [37548] ****************************************
GetItemAverageRat... [37485] ****************************************
GetItemReviewsByT... [37568] ****************************************
UpdateUserName/06 [37808] ****************************************
UpdateItemTitle/07 [37365] ***************************************
UpdateReviewRatin... [37629] ****************************************
UpdateTrustRating/09 [74947] ********************************************************************************
Again, because no PreparedStatement instance is ever executed more than once, there are no handles to be shared. Here is the same test with enablePrepareOnFirstPreparedStatementCall=true:
GetReviewItemById/01 [46663] ***************************************
GetReviewsByUser/02 [47003] ****************************************
GetAverageRatingB... [46806] ****************************************
GetItemAverageRat... [47010] ****************************************
GetItemReviewsByT... [47139] ****************************************
UpdateUserName/06 [46886] ****************************************
UpdateItemTitle/07 [46643] ***************************************
UpdateReviewRatin... [47369] ****************************************
UpdateTrustRating/09 [93408] ********************************************************************************
And oltpbench/jpab (Hibernate) benchmark with enablePrepareOnFirstPreparedStatementCall=false:
Persist/01 [ 5641] *****
Retrieve/02 [85171] ********************************************************************************
Update/03 [17168] ****************
Delete/04 [ 5780] *****
And with enablePrepareOnFirstPreparedStatementCall=true:
Persist/01 [ 6352] *****
Retrieve/02 [92136] ********************************************************************************
Update/03 [18669] ****************
Delete/04 [ 6128] *****
Thanks for the analysis!
I will take a deeper look into this. This statement doesn't make sense to me.
> Again, because no PreparedStatement instance is ever executed more than once, there are no handles to be shared.
That is why we have the cache and move directly to sp_prepexec if we see that another instance has already done sp_executesql before. This should make the new instance re-use the state from the previous instance and create the handle. Subsequent instances should re-use the handle.
// Because sp_executesql was already called on this SQL-text use
// regular prep/exec pattern.
if (cachedPreparedStatementHandle.hasExecutedAtLeastOnce())
    isExecutedAtLeastOnce = true;
This may simply be an indication that the default cache size of 10 is too small.
I am trying to figure out a failure we have with multi-threaded JDBC-batch execution, as soon as I am done there I will dig into this :-)
Ok, after wasting my time with my own analysis I concluded you are right @brettwooldridge :-) A larger cache size would fix the problem, just would need to be substantially larger...
What I completely missed is that the cache hit ratio with enablePrepareOnFirstPreparedStatementCall=true is 1/queriesPerCacheSize given uniformly random usage of unique queries. This would mean 100% cache hit ratio where cache size is at least as large as total number of queries and 50% where the cache size is half the number of unique queries. The cache hit ratio for the default of enablePrepareOnFirstPreparedStatementCall=false is 1/queriesPerCacheSize^2 (i.e. where cache size is half the number of queries the hit rate goes from 50% to 25%).
When using your change depending on the parsing cache for isExecutedAtLeastOnce the cache hit ratio is close to the same as enablePrepareOnFirstPreparedStatementCall=false while still using sp_executesql when unique queries are executed (or queries are rarely re-used). Consider me completely on-board with the change of having isExecutedAtLeastOnce be dependent on the parsed SQL text cache!
I have pushed this change to metadataCaching and hope to get it into the release :-)
Thanks for being persistent @brettwooldridge!
PS I also believe I fixed the final bug for the tests that aren't on GitHub yet :-)
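The hit-ratio approximation above can be sanity-checked numerically for an illustrative case (cache size 50, 100 unique queries, uniformly random usage, so queriesPerCacheSize = 2):

```java
public class HitRatioExample {
    // Approximations from the discussion above: with q = queriesPerCacheSize,
    // enablePrepareOnFirstPreparedStatementCall=true gives ~1/q and the
    // default of =false gives ~1/q^2.
    static double prepareOnFirstHitRatio(double q)  { return 1.0 / q; }
    static double prepareOnSecondHitRatio(double q) { return 1.0 / (q * q); }

    public static void main(String[] args) {
        double q = 100.0 / 50.0;  // unique queries / cache size, illustrative
        System.out.println("prepare-on-first  hit ratio: " + prepareOnFirstHitRatio(q));   // 0.5
        System.out.println("prepare-on-second hit ratio: " + prepareOnSecondHitRatio(q));  // 0.25
    }
}
```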
|
@TobiasSQL: I guess your metadataCaching includes this fix too. If so we can close this PR. |
|
Yes, I think I have picked up most everything from this particular PR into my PR. Brett has been doing a bunch more work that he was planning to issue a PR for as soon as my changes make it back to master (when do you expect that to happen?). @brettwooldridge is that accurate? |
|
@TobiasSQL That is accurate. |
|
Closing this PR as @TobiasSQL already included this in his patch. @TobiasSQL: Your patch is already in Master & DEV.
|
|
@v-nisidh, oh, I did not realize (maybe I should have looked...). Hey @brettwooldridge that means we are ready for you to start PRing your speedify stuff :-) |
I did rename one or two public methods, but AFAIK they are all still pre-release, so now is the time to get them right.
I tested performance between the before/after and found no difference. I did add a multi-threaded test to try to detect any race conditions (haven't found any in a few hundred runs), and it can also serve as a decent quick test balloon on performance because it is designed to generate maximum contention.