
perf: cache prepared statements across .prepareStatement calls#319

Merged
davecramer merged 2 commits into pgjdbc:master from Gordiychuk:cachePreparedStatements on Jun 28, 2015

Conversation

@vlsi
Member

@vlsi vlsi commented Jun 13, 2015

This allows the driver to use server-side prepared statements when an application uses the same SQL multiple times.

The major difference from the existing setPrepareThreshold is that, with this fix, the application does not have to reference exactly the same PreparedStatement instance. When a new statement is prepared, it is checked against the existing statements in the cache. If there is a match, the parsing result and the server-side named query (!) are reused.

Note: this fix comes on top of #311.

Note: I believe the code is feature complete.

Cache size is configurable. It defaults to 256 queries and 5 MiB maximum per connection.
Feel free to review, add your comments, and merge.
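
To put the defaults above in context, here is a minimal sketch of overriding the limits via connection properties. Hedged: preparedStatementCacheSizeMiB is the name used in this PR's doc/pgjdbc.xml change, while preparedStatementCacheQueries is an assumed name for the count-based limit.

```java
import java.util.Properties;

// Sketch only: overriding the cache limits discussed above.
// "preparedStatementCacheSizeMiB" appears in this PR's doc changes;
// "preparedStatementCacheQueries" is an assumed name for the count limit.
public class CacheConfigSketch {
    public static Properties cacheProperties() {
        Properties props = new Properties();
        props.setProperty("preparedStatementCacheQueries", "512"); // default: 256
        props.setProperty("preparedStatementCacheSizeMiB", "8");   // default: 5 MiB
        return props;
    }
}
```

These properties would then be passed to DriverManager.getConnection(url, props) in the usual way.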

Some measurements:
On a simple select ?, ?, ?..., response time is 20-40% better (78 us -> 46 us for 20 binds).
Execution of cached statements allocates 50% less garbage.

Before

Benchmark                                           (bindCount)  Mode  Cnt     Score      Error   Units
ParseStatement.bindExecuteFetch                               0  avgt   10    42,883 ±    0,519   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm           0  avgt   10  1160,077 ±    0,255    B/op
ParseStatement.bindExecuteFetch                               1  avgt   10    48,658 ±    0,483   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm           1  avgt   10  1600,087 ±    0,289    B/op
ParseStatement.bindExecuteFetch                              10  avgt   10    64,844 ±    0,657   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm          10  avgt   10  3896,114 ±    0,379    B/op
ParseStatement.bindExecuteFetch                              20  avgt   10    78,440 ±    0,772   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm          20  avgt   10  6576,138 ±    0,458    B/op

After

Benchmark                                           (bindCount)  Mode  Cnt     Score     Error   Units
ParseStatement.bindExecuteFetch                               0  avgt   10    32,450 ±   0,195   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm           0  avgt   10   608,057 ±   0,190    B/op
ParseStatement.bindExecuteFetch                               1  avgt   10    34,650 ±   0,907   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm           1  avgt   10   792,061 ±   0,201    B/op
ParseStatement.bindExecuteFetch                              10  avgt   10    41,341 ±   0,315   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm          10  avgt   10  1464,073 ±   0,241    B/op
ParseStatement.bindExecuteFetch                              20  avgt   10    46,994 ±   0,352   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm          20  avgt   10  2224,083 ±   0,275    B/op

Pathological case: prepareStatement is called with random SQL texts.

  1. The test passes; no OutOfMemoryError occurs
  2. Response time is unchanged
  3. Memory allocation is slightly higher than before (4496 -> 4560 B/op)

Before

Benchmark                                                         (bindCount)  (unique)  Mode  Cnt     Score      Error   Units
ParseStatement.bindExecuteFetch                                            10      true  avgt   10    64,710 ±    0,495   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm                        10      true  avgt   10  4496,114 ±    0,378    B/op

After

Benchmark                                                         (bindCount)  (unique)  Mode  Cnt     Score      Error   Units
ParseStatement.bindExecuteFetch                                            10      true  avgt   10    65,289 ±    0,568   us/op
ParseStatement.bindExecuteFetch:·gc.alloc.rate.norm                        10      true  avgt   10  4560,115 ±    0,384    B/op

@vlsi vlsi force-pushed the cachePreparedStatements branch 4 times, most recently from 95b0ff4 to 280f5e3 on June 14, 2015 07:00
@vlsi vlsi force-pushed the cachePreparedStatements branch 3 times, most recently from 54454c1 to d8c95d4 on June 15, 2015 08:21
vlsi added 2 commits June 15, 2015 11:40
- This saves a bit of cycles when encoding the query to UTF8 (the whole query can be encoded at once)
- This allows to use server-side prepared statements when application uses the same SQL multiple times
@vlsi vlsi force-pushed the cachePreparedStatements branch from d8c95d4 to 5642abc on June 15, 2015 08:40
Contributor
This should probably be called toIntArray as it is explicitly converting from a List.

Member Author
makes sense

davecramer added a commit that referenced this pull request Jun 28, 2015
perf: cache prepared statements across .prepareStatement calls
@davecramer davecramer merged commit f82be90 into pgjdbc:master Jun 28, 2015
Comment thread doc/pgjdbc.xml
Member
did you mean megabytes?

Member
Pretty sure he does mean mebibytes. It explicitly means 2^20 vs 10^6.

Member
Do we want to use this? Seems a bit esoteric?

Dave Cramer

On 28 June 2015 at 11:20, Sehrope Sarkuni [email protected] wrote:

> In doc/pgjdbc.xml, #319 (comment), on "Determine the maximum size (in mebibytes) of the prepared queries cache":
>
> Pretty sure he does mean mebibytes. It explicitly means 2^20 vs 10^6.

Member
Well technically it's the correct term for it.

Member
Dave Cramer

On 28 June 2015 at 12:02, Sehrope Sarkuni [email protected] wrote:

> Well technically it's the correct term for it.

Yes, but not very well known.

@sehrope
Member

sehrope commented Jun 28, 2015

Haven't had a chance to go through this yet (skimming through it now) but overall I like the idea.

I wonder if it'd be worth it to have this type of caching done at the Driver level so that it can be shared across multiple connections. That way a pool of connections would share a single cache, even as connections are added/removed from the pool (ex: pool expiration). For that to happen it'd at least have to be thread safe. I don't think there's a direct concurrent replacement for LinkedHashMap so that'd need to be sorted out. Also, we may need separate caches per connection URL as the parsing of a particular query is dependent on connection level settings.

It's a trade-off of code complexity vs. a wider gain. For a pool of size N using K memory, it'd be (N-1) x K in savings, so I'd guess it's worth it for a large pool or diverse workload. If it's too complicated (either to implement, or we just decide it's too risky) then I wouldn't want to hold up the version of the patch that works only within a connection. Just something to consider...
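
A driver-level cache along these lines could be sketched, purely as an illustration of the idea (not part of this patch), by synchronizing an access-ordered LinkedHashMap and keying on both the connection URL and the SQL text:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a thread-safe, count-bounded LRU keyed by
// (connection URL, SQL text), as a shared driver-level cache would need.
// Coarse synchronization; a real implementation might prefer something
// like ben-manes' ConcurrentLinkedHashMap mentioned later in the thread.
public class SharedParseCache<V> {
    private final Map<String, V> cache;

    public SharedParseCache(final int maxEntries) {
        this.cache = Collections.synchronizedMap(
            new LinkedHashMap<String, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                    return size() > maxEntries; // evict least-recently-used
                }
            });
    }

    private static String key(String url, String sql) {
        return url + '\u0000' + sql; // NUL separator avoids key collisions
    }

    public V get(String url, String sql) {
        return cache.get(key(url, sql));
    }

    public void put(String url, String sql, V parsed) {
        cache.put(key(url, sql), parsed);
    }

    public int size() {
        return cache.size();
    }
}
```

Keying on the URL reflects the point above that parsing depends on connection-level settings, so different URLs must not share entries.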

@sehrope
Member

sehrope commented Jun 28, 2015

One more thought... how about an option to not cache queries that are larger than a given size? In most real-world applications the majority of queries are relatively small, though every now and then there will be some huge dynamic SQL, usually large dynamically generated reporting queries. Although they'd eventually get washed out of the cache, it's pointless to cache them as they're one-offs.

At the very least we shouldn't bother trying to cache things that are greater than the max size of the cache. Otherwise one gigantic query would evict all the prior smaller queries (and itself!).
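
The guard suggested here could be as simple as a size check before insertion. An illustrative sketch with made-up names, using a byte-size-bounded LRU that refuses entries bigger than the whole cache:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the suggestion above (names are hypothetical): a
// byte-size-bounded LRU that skips entries larger than the whole cache,
// so one gigantic query cannot flush everything else out.
public class BoundedQueryCache {
    private final long maxSizeBytes;
    private long currentSizeBytes;
    private final LinkedHashMap<String, String> cache =
        new LinkedHashMap<>(16, 0.75f, true); // access order => LRU

    public BoundedQueryCache(long maxSizeBytes) {
        this.maxSizeBytes = maxSizeBytes;
    }

    public void put(String sql, String parsed) {
        long entrySize = sql.length() * 2L; // rough 2-bytes-per-char estimate
        if (entrySize > maxSizeBytes) {
            return; // too big to ever fit: skip caching entirely
        }
        if (cache.put(sql, parsed) == null) {
            currentSizeBytes += entrySize;
        }
        // Evict least-recently-used entries until under the byte budget.
        Iterator<Map.Entry<String, String>> it = cache.entrySet().iterator();
        while (currentSizeBytes > maxSizeBytes && it.hasNext()) {
            Map.Entry<String, String> eldest = it.next();
            currentSizeBytes -= eldest.getKey().length() * 2L;
            it.remove();
        }
    }

    public String get(String sql) { return cache.get(sql); }
    public int entryCount() { return cache.size(); }
}
```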

@davecramer
Member

On 28 June 2015 at 11:37, Sehrope Sarkuni [email protected] wrote:

> I wonder if it'd be worth it to have this type of caching done at the Driver level so that it can be shared across multiple connections.

Not sure how you do this, as server-side statements are per connection?

@davecramer
Member

On 28 June 2015 at 11:42, Sehrope Sarkuni [email protected] wrote:

> At the very least we shouldn't bother trying to cache things that are greater than the max size of the cache. Otherwise one gigantic query would evict all the prior smaller queries (and itself!).

Good point!

@sehrope
Member

sehrope commented Jun 28, 2015

Not sure how you do this as server side statements are per connection ??

I was only thinking about caching the driver side parsing of the sql String. The idea being that there is no need to repeatedly parse the same sql as it'll always have the same bind var positions and native representation (i.e. SELECT ? always goes to SELECT $1). That would improve perf (less parsing) and GC (less parsing means less garbage).

I haven't really gone through the patch and didn't realize that it caches the server side prepared statements and is intended to reuse those as well. Thinking that through a bit, I wonder if the two things can be split out. There's no reason you have to use server side prepared statements to benefit from driver side parse caching right?
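
The driver-side parse being discussed can be illustrated with a much-simplified placeholder rewriter. The real pgjdbc parser handles many more cases (comments, dollar quoting, escape sequences), so treat this only as a sketch of the SELECT ? to SELECT $1 idea:

```java
// Simplified illustration of the JDBC-to-PostgreSQL placeholder rewrite
// mentioned above: "SELECT ?" becomes "SELECT $1". Skips ? inside
// single-quoted literals; the real pgjdbc parser handles far more cases.
public class PlaceholderRewriter {
    public static String toNative(String sql) {
        StringBuilder out = new StringBuilder(sql.length() + 8);
        int param = 0;
        boolean inQuote = false;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '\'') {
                inQuote = !inQuote;
                out.append(c);
            } else if (c == '?' && !inQuote) {
                out.append('$').append(++param); // 1-based parameter index
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Since the rewrite depends only on the SQL text, its result is exactly the kind of thing a driver-side parse cache could store and reuse.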

@vlsi
Member Author

vlsi commented Jun 29, 2015

I don't think there's a direct concurrent replacement for LinkedHashMap so that'd need to be sorted out

It is not in the JDK, but it does support concurrency and expiration.
Here it is: https://github.com/ben-manes/concurrentlinkedhashmap
Not sure if we need that, but your point is valid.

At the very least we shouldn't bother trying to cache things that are greater than the max size of the cache. Otherwise one gigantic query would evict all the prior smaller queries (and itself!).

I've filed #344 to cover that.

Thinking that through a bit, I wonder if the two things can be split out. There's no reason you have to use server side prepared statements to benefit from driver side parse caching right?

I did intend to split those two items; however, I prefer to file another PR to cover "inter-connection" caching of query texts.

Here is the issue: #345

@davecramer, technically I did not expect the PR to get merged as-is while there were still unresolved comments (I was having a good vacation, so I was not very responsive). I'll file another "cleanup" PR.

vlsi added a commit to Gordiychuk/pgjdbc that referenced this pull request Jun 29, 2015
Make fields of JdbcCallParseInfo final, simplify CallableQueryKey's equals/hashCode
@davecramer
Member

On 29 June 2015 at 05:25, Vladimir Sitnikov [email protected] wrote:

> @davecramer, technically I did not expected to get PR merged in while still having non-resolved comments. I'll file another "cleanup" PR.

I'd like to release snapshots. I can wait to release a version until it is cleaned up. Note that my merging it evoked constructive criticism.

@vlsi
Member Author

vlsi commented Jun 29, 2015

I can wait to release a version until it is cleaned up.

Here's the cleanup PR: #346

@davecramer
Member

Merged, thanks!

Dave Cramer

4 participants