
perf: cache prepared statements across .prepareStatement calls#319

Merged
davecramer merged 2 commits into pgjdbc:master from Gordiychuk:cachePreparedStatements on Jun 28, 2015

Conversation

@vlsi
Member

@vlsi vlsi commented Jun 13, 2015

This allows the driver to use server-side prepared statements when an application uses the same SQL multiple times.

The major difference from the existing setPrepareThreshold is that, with this fix, the application does not have to reference exactly the same PreparedStatement instance. When a new statement is prepared, it is checked against the existing statements in the cache. If there is a match, the parsing result and the server-side named query (!) are reused.

Note: this fix comes on top of #311.

Note: I believe the code is feature complete.

Cache size is configurable. It defaults to 256 queries and 5 MiB maximum per connection.
Feel free to review, add your comments, and merge.
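
To put the defaults above in context, here is a minimal sketch of overriding the limits via connection properties. Hedged: preparedStatementCacheSizeMiB is the name used in this PR's doc/pgjdbc.xml change, while preparedStatementCacheQueries is an assumed name for the count-based limit.

```java
import java.util.Properties;

// Sketch only: overriding the cache limits discussed above.
// "preparedStatementCacheSizeMiB" appears in this PR's doc changes;
// "preparedStatementCacheQueries" is an assumed name for the count limit.
public class CacheConfigSketch {
    public static Properties cacheProperties() {
        Properties props = new Properties();
        props.setProperty("preparedStatementCacheQueries", "512"); // default: 256
        props.setProperty("preparedStatementCacheSizeMiB", "8");   // default: 5 MiB
        return props;
    }
}
```

These properties would then be passed to DriverManager.getConnection(url, props) in the usual way.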

Some measurements:
On a simple select ?, ?, ?..., response time is 20-40% better (78 us -> 46 us for 20 binds).
Execution of cached statements allocates 50% less garbage.

Before

Benchmark                                           (bindCount)  Mode  Cnt     Score      Error   Units
ParseStatement.bindExecuteFetch                               0  avgt   10    42,883 ±    0,519   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm           0  avgt   10  1160,077 ±    0,255    B/op
ParseStatement.bindExecuteFetch                               1  avgt   10    48,658 ±    0,483   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm           1  avgt   10  1600,087 ±    0,289    B/op
ParseStatement.bindExecuteFetch                              10  avgt   10    64,844 ±    0,657   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm          10  avgt   10  3896,114 ±    0,379    B/op
ParseStatement.bindExecuteFetch                              20  avgt   10    78,440 ±    0,772   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm          20  avgt   10  6576,138 ±    0,458    B/op

After

Benchmark                                           (bindCount)  Mode  Cnt     Score     Error   Units
ParseStatement.bindExecuteFetch                               0  avgt   10    32,450 ±   0,195   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm           0  avgt   10   608,057 ±   0,190    B/op
ParseStatement.bindExecuteFetch                               1  avgt   10    34,650 ±   0,907   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm           1  avgt   10   792,061 ±   0,201    B/op
ParseStatement.bindExecuteFetch                              10  avgt   10    41,341 ±   0,315   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm          10  avgt   10  1464,073 ±   0,241    B/op
ParseStatement.bindExecuteFetch                              20  avgt   10    46,994 ±   0,352   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm          20  avgt   10  2224,083 ±   0,275    B/op

Pathological case: prepareStatement is called with random SQL texts.

  1. The test passes; no OutOfMemoryError occurs
  2. Response time is unchanged
  3. Memory allocation is slightly higher than before (4496 -> 4560 B/op)

Before

Benchmark                                                         (bindCount)  (unique)  Mode  Cnt     Score      Error   Units
ParseStatement.bindExecuteFetch                                            10      true  avgt   10    64,710 ±    0,495   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm                        10      true  avgt   10  4496,114 ±    0,378    B/op

After

Benchmark                                                         (bindCount)  (unique)  Mode  Cnt     Score      Error   Units
ParseStatement.bindExecuteFetch                                            10      true  avgt   10    65,289 ±    0,568   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm                        10      true  avgt   10  4560,115 ±    0,384    B/op

@vlsi vlsi force-pushed the cachePreparedStatements branch 4 times, most recently from 95b0ff4 to 280f5e3 on June 14, 2015 07:00
@vlsi vlsi force-pushed the cachePreparedStatements branch 3 times, most recently from 54454c1 to d8c95d4 on June 15, 2015 08:21
vlsi added 2 commits June 15, 2015 11:40
- This saves a bit of cycles when encoding the query to UTF8 (the whole query can be encoded at once)
- This allows to use server-side prepared statements when application uses the same SQL multiple times
@vlsi vlsi force-pushed the cachePreparedStatements branch from d8c95d4 to 5642abc on June 15, 2015 08:40
Contributor
This should probably be called toIntArray as it is explicitly converting from a List.

Member Author
makes sense

davecramer added a commit that referenced this pull request Jun 28, 2015
perf: cache prepared statements across .prepareStatement calls
@davecramer davecramer merged commit f82be90 into pgjdbc:master Jun 28, 2015
Comment thread doc/pgjdbc.xml
Member
did you mean megabytes?

Member
Pretty sure he does mean mebibytes. It explicitly means 2^20 vs 10^6.

Member
Do we want to use this? Seems a bit esoteric?

Dave Cramer

On 28 June 2015 at 11:20, Sehrope Sarkuni [email protected] wrote:

> In doc/pgjdbc.xml, #319 (comment), on "Determine the maximum size (in mebibytes) of the prepared queries cache":
>
> Pretty sure he does mean mebibytes. It explicitly means 2^20 vs 10^6.

Member
Well technically it's the correct term for it.

Member
Dave Cramer

On 28 June 2015 at 12:02, Sehrope Sarkuni [email protected] wrote:

> Well technically it's the correct term for it.

Yes, but not very well known.

@sehrope
Member

sehrope commented Jun 28, 2015

Haven't had a chance to go through this yet (skimming through it now) but overall I like the idea.

I wonder if it'd be worth it to have this type of caching done at the Driver level so that it can be shared across multiple connections. That way a pool of connections would share a single cache, even as connections are added/removed from the pool (ex: pool expiration). For that to happen it'd at least have to be thread safe. I don't think there's a direct concurrent replacement for LinkedHashMap so that'd need to be sorted out. Also, we may need separate caches per connection URL as the parsing of a particular query is dependent on connection level settings.

It's a trade-off of code complexity vs. a wider gain. For a pool of size N using K memory, it'd be (N-1) x K in savings, so I'd guess it's worth it for a large pool or diverse workload. If it's too complicated (either to implement, or we just decide it's too risky) then I wouldn't want to hold up the version of the patch that works only within a connection. Just something to consider...
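
A driver-level cache along these lines could be sketched, purely as an illustration of the idea (not part of this patch), by synchronizing an access-ordered LinkedHashMap and keying on both the connection URL and the SQL text:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a thread-safe, count-bounded LRU keyed by
// (connection URL, SQL text), as a shared driver-level cache would need.
// Coarse synchronization; a real implementation might prefer something
// like ben-manes' ConcurrentLinkedHashMap mentioned later in the thread.
public class SharedParseCache<V> {
    private final Map<String, V> cache;

    public SharedParseCache(final int maxEntries) {
        this.cache = Collections.synchronizedMap(
            new LinkedHashMap<String, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                    return size() > maxEntries; // evict least-recently-used
                }
            });
    }

    private static String key(String url, String sql) {
        return url + '\u0000' + sql; // NUL separator avoids key collisions
    }

    public V get(String url, String sql) {
        return cache.get(key(url, sql));
    }

    public void put(String url, String sql, V parsed) {
        cache.put(key(url, sql), parsed);
    }

    public int size() {
        return cache.size();
    }
}
```

Keying on the URL reflects the point above that parsing depends on connection-level settings, so different URLs must not share entries.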

@sehrope
Member

sehrope commented Jun 28, 2015

One more thought... how about an option to not cache queries that are larger than a given size? In most real-world applications the majority of queries are relatively small, though every now and then there will be some huge dynamic SQL, usually large dynamically generated reporting queries. Although they'd eventually get washed out of the cache, it's pointless to cache them as they're one-offs.

At the very least we shouldn't bother trying to cache things that are greater than the max size of the cache. Otherwise one gigantic query would evict all the prior smaller queries (and itself!).
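
The guard suggested here could be as simple as a size check before insertion. An illustrative sketch with made-up names, using a byte-size-bounded LRU that refuses entries bigger than the whole cache:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the suggestion above (names are hypothetical): a
// byte-size-bounded LRU that skips entries larger than the whole cache,
// so one gigantic query cannot flush everything else out.
public class BoundedQueryCache {
    private final long maxSizeBytes;
    private long currentSizeBytes;
    private final LinkedHashMap<String, String> cache =
        new LinkedHashMap<>(16, 0.75f, true); // access order => LRU

    public BoundedQueryCache(long maxSizeBytes) {
        this.maxSizeBytes = maxSizeBytes;
    }

    public void put(String sql, String parsed) {
        long entrySize = sql.length() * 2L; // rough 2-bytes-per-char estimate
        if (entrySize > maxSizeBytes) {
            return; // too big to ever fit: skip caching entirely
        }
        if (cache.put(sql, parsed) == null) {
            currentSizeBytes += entrySize;
        }
        // Evict least-recently-used entries until under the byte budget.
        Iterator<Map.Entry<String, String>> it = cache.entrySet().iterator();
        while (currentSizeBytes > maxSizeBytes && it.hasNext()) {
            Map.Entry<String, String> eldest = it.next();
            currentSizeBytes -= eldest.getKey().length() * 2L;
            it.remove();
        }
    }

    public String get(String sql) { return cache.get(sql); }
    public int entryCount() { return cache.size(); }
}
```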

@davecramer
Member

On 28 June 2015 at 11:37, Sehrope Sarkuni [email protected] wrote:

> I wonder if it'd be worth it to have this type of caching done at the Driver level so that it can be shared across multiple connections.

Not sure how you do this, as server-side statements are per connection?

@davecramer
Member

On 28 June 2015 at 11:42, Sehrope Sarkuni [email protected] wrote:

> At the very least we shouldn't bother trying to cache things that are greater than the max size of the cache. Otherwise one gigantic query would evict all the prior smaller queries (and itself!).

Good point!

@sehrope
Member

sehrope commented Jun 28, 2015

Not sure how you do this as server side statements are per connection ??

I was only thinking about caching the driver side parsing of the sql String. The idea being that there is no need to repeatedly parse the same sql as it'll always have the same bind var positions and native representation (i.e. SELECT ? always goes to SELECT $1). That would improve perf (less parsing) and GC (less parsing means less garbage).

I haven't really gone through the patch and didn't realize that it caches the server side prepared statements and is intended to reuse those as well. Thinking that through a bit, I wonder if the two things can be split out. There's no reason you have to use server side prepared statements to benefit from driver side parse caching right?
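
The driver-side parse being discussed can be illustrated with a much-simplified placeholder rewriter. The real pgjdbc parser handles many more cases (comments, dollar quoting, escape sequences), so treat this only as a sketch of the SELECT ? to SELECT $1 idea:

```java
// Simplified illustration of the JDBC-to-PostgreSQL placeholder rewrite
// mentioned above: "SELECT ?" becomes "SELECT $1". Skips ? inside
// single-quoted literals; the real pgjdbc parser handles far more cases.
public class PlaceholderRewriter {
    public static String toNative(String sql) {
        StringBuilder out = new StringBuilder(sql.length() + 8);
        int param = 0;
        boolean inQuote = false;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '\'') {
                inQuote = !inQuote;
                out.append(c);
            } else if (c == '?' && !inQuote) {
                out.append('$').append(++param); // 1-based parameter index
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Since the rewrite depends only on the SQL text, its result is exactly the kind of thing a driver-side parse cache could store and reuse.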

@vlsi
Member Author

vlsi commented Jun 29, 2015

I don't think there's a direct concurrent replacement for LinkedHashMap so that'd need to be sorted out

It is not in the JDK, but it does support concurrency and expiration.
Here it is: https://github.com/ben-manes/concurrentlinkedhashmap
Not sure if we need that, but your point is valid.

At the very least we shouldn't bother trying to cache things that are greater than the max size of the cache. Otherwise one gigantic query would evict all the prior smaller queries (and itself!).

I've filed #344 to cover that.

Thinking that through a bit, I wonder if the two things can be split out. There's no reason you have to use server side prepared statements to benefit from driver side parse caching right?

I did intend to split those two items; however, I prefer to file another PR to cover "inter-connection" caching of query texts.

Here is the issue: #345

@davecramer, technically I did not expect the PR to get merged as-is while there were still unresolved comments (I was having a good vacation, so I was not very responsive). I'll file another "cleanup" PR.

vlsi added a commit to Gordiychuk/pgjdbc that referenced this pull request Jun 29, 2015
Make fields of JdbcCallParseInfo final, simplify CallableQueryKey's equals/hashCode
@davecramer
Member

On 29 June 2015 at 05:25, Vladimir Sitnikov [email protected] wrote:

> @davecramer, technically I did not expected to get PR merged in while still having non-resolved comments. I'll file another "cleanup" PR.

I'd like to release snapshots. I can wait to release a version until it is cleaned up. Note that my merging it evoked constructive criticism.

@vlsi
Member Author

vlsi commented Jun 29, 2015

I can wait to release a version until it is cleaned up.

Here's the cleanup PR: #346

@davecramer
Member

Merged, thanks!

Dave Cramer

4 participants