perf: cache prepared statements across .prepareStatement calls#319
perf: cache prepared statements across .prepareStatement calls#319davecramer merged 2 commits intopgjdbc:masterfrom
Conversation
95b0ff4 to
280f5e3
Compare
54454c1 to
d8c95d4
Compare
This saves a bit of cycles when encoding the query to UTF8 (the whole query can be encoded at once)
This allows to use server-side prepared statements when application uses the same SQL multiple times
d8c95d4 to
5642abc
Compare
There was a problem hiding this comment.
This should probably be called toIntArray as it is explicitly converting from a List.
perf: cache prepared statements across .prepareStatement calls
There was a problem hiding this comment.
Pretty sure he does mean mebibytes. It explicitly means 2^20 vs 10^6.
There was a problem hiding this comment.
Do we want to use this ? Seems a bit esoteric ?
Dave Cramer
On 28 June 2015 at 11:20, Sehrope Sarkuni [email protected] wrote:
In doc/pgjdbc.xml
#319 (comment):
(see <varname>prepareThreshold</varname>) even if the prepared statement isclosed after each execution. The value of 0 disables the cache.<note><para>Each connection has its own statement cache.</para></note></para></listitem></varlistentry>
<varlistentry><term><varname>preparedStatementCacheSizeMiB</varname> = <type>int</type></term><listitem><para>Determine the maximum size (in mebibytes) of the prepared queries cachePretty sure he does mean mebibytes. It explicitly means 2^20 vs 10^6.
—
Reply to this email directly or view it on GitHub
https://github.com/pgjdbc/pgjdbc/pull/319/files#r33423738.
There was a problem hiding this comment.
Well technically it's the correct term for it.
There was a problem hiding this comment.
Dave Cramer
On 28 June 2015 at 12:02, Sehrope Sarkuni [email protected] wrote:
In doc/pgjdbc.xml
#319 (comment):
(see <varname>prepareThreshold</varname>) even if the prepared statement isclosed after each execution. The value of 0 disables the cache.<note><para>Each connection has its own statement cache.</para></note></para></listitem></varlistentry>
<varlistentry><term><varname>preparedStatementCacheSizeMiB</varname> = <type>int</type></term><listitem><para>Determine the maximum size (in mebibytes) of the prepared queries cacheWell technically it's the correct term for it.
Yes, but not very well known.
—
Reply to this email directly or view it on GitHub
https://github.com/pgjdbc/pgjdbc/pull/319/files#r33424088.
|
Haven't had a chance to go through this yet (skimming through it now) but overall I like the idea. I wonder if it'd be worth it to have this type of caching done at the Driver level so that it can be shared across multiple connections. That way a pool of connections would share a single cache, even as connections are added/removed from the pool (ex: pool expiration). For that to happen it'd at least have to be thread safe. I don't think there's a direct concurrent replacement for LinkedHashMap so that'd need to be sorted out. Also, we may need separate caches per connection URL as the parsing of a particular query is dependent on connection level settings. It's a trade off of code complexity vs a wider gain. For a pool of size N using K memory, it'd be (N-1) x K savings so I guess it'd be worth it for a large pool or diverse workload. If it's too complicated (either to implement or we just decide it's too risky) then I wouldn't want to hold off the version of the patch that is only done within a connection. Just something to consider... |
|
One more thought ... how bout an option to not cache queries that are larger than a given size? In most real-world applications the majority of queries are relatively small though every now and then there will be some dynamic SQL that is huge. Usually they're large dynamically generated reporting queries. Although they'd eventually get washed out of the cache, it's pointless to cache them as they're one offs. At the very least we shouldn't bother trying to cache things that are greater than the max size of the cache. Otherwise one gigantic query would evict all the prior smaller queries (and itself!). |
|
On 28 June 2015 at 11:37, Sehrope Sarkuni [email protected] wrote:
|
|
On 28 June 2015 at 11:42, Sehrope Sarkuni [email protected] wrote:
good point!
|
I was only thinking about caching the driver side parsing of the sql String. The idea being that there is no need to repeatedly parse the same sql as it'll always have the same bind var positions and native representation (i.e. I haven't really gone through the patch and didn't realize that it caches the server side prepared statements and is intended to reuse those as well. Thinking that through a bit, I wonder if the two things can be split out. There's no reason you have to use server side prepared statements to benefit from driver side parse caching right? |
It is not in JDK, but it is does support concurrency and expiration.
I've filed #344 to cover that.
I did intend to split those two items, however I prefer filing another PR to cover "inter-connection" caching of query texts. Here is the issue: #345 @davecramer, technically I did not expected to get PR merged in as-is while still having non-resolved comments (I was having a good vacation, so I was not very responsive). I'll file another "cleanup" PR. |
Make fields of JdbcCallParseInfo final, simplify CallableQueryKey's equals/hashCode
|
On 29 June 2015 at 05:25, Vladimir Sitnikov [email protected]
I'd like to release snapshots. I can wait to release a version until it is |
Here's the cleanup PR: #346 |
|
Merged, thanks! Dave Cramer On 29 June 2015 at 08:20, Vladimir Sitnikov [email protected]
|
This allows to use server-side prepared statements when application uses the same SQL multiple times.
Major difference from existing
setPrepareThresholdis with current fix application does not have to reference exactly the samePreparedStatement. When a new statement is prepared, it is checked against existing statements in the cache. If there is a match, parsing result and server-side named query (!) are reused.Note: this fix comes on top of #311.
Note: I believe the code is feature complete
Cache size is configurable. It defaults to 256 queries and 5 MiB maximum per connection.
Feel free to review, add your comments, and merge.
Some measurements:
On a simple
select ?, ?, ?...the response time is 20-40% better (78us -> 46us for 20 binds).Execution of cached statements allocates 50% less garbage.
Before
After
Pathological case when
prepareStatementis called with random sql texts.Before
After