-
Notifications
You must be signed in to change notification settings - Fork 5.4k
Description
In today's ADO.NET, connection pooling is purely an internal concern of ADO providers - no pooling API is exposed to the user in any way. It seems to be an unwritten contract that providers are supposed to pool by default, and disable pooling if the Pooling=false connection string parameter is used. The following is a (too lengthy) analysis of possible directions to get the conversation started.
Disadvantages of the current model
- Since the pool isn't exposed,
DbConnection.Open()must look up the pool internally, keyed on the connection string. A naïve implementation using a dictionary would add a dictionary string lookup for each pooled open, which is a significant overhead. - Pooling has to be reimplemented (efficiently!) by each provider.
- The provider-implemented pool is forced upon the user - not really possible to have competing pool implementations.
- Since pooling is internal to the provider, DbConnection is typically a lightweight façade to a more heavyweight internal "physical connection" class. The relationship and bookkeeping between these two objects adds complexity (and possibly some perf hit). The DbConnection façade is also newed-up and disposed, adding a needless allocation.
Option 1: An ADO.NET user-facing pool API
The connection string lookup imposed by the internal pooling mechanism could be mitigating by adding a simple, user-facing API. User code would look something like this:
var factory = DbProviderFactories.GetFactory(providerName);
var connFactory = factory.GetConnectionFactory(connectionString);
using (var conn = connFactory.GetConnection()) { ... }Regarding the naming, instead of GetConnectionFactory():
- We could have
GetPool(), although seems to imply that all providers implement pooling, but some shouldn't (e.g. in-memory DBs or sqlite) - We could have
GetDatabase(), although databases and pools aren't the same thing (e.g. you can have multiple pools for the same database0. - Maybe
GetConnectionProvider()? - ConnectionFactory is abstract enough that it could be used for other purposes (e.g. a wrapper of other connection factories which returns connections that allow interception of certain events, or whatever).
- JDBC calls it
DataSourcewhich isn't too bad IMHO.
Option 2: A complete pooling API (not just user facing)
The above only provides a user-facing abstraction which allows avoiding the lookup (and possibly opens up possibility for composing connection provider), but the pool itself is still implemented by each provider. We could go further and provide a more complete abstraction that allows writing connection pools. This would allow pooling implementations which aren't connected to any specific provider (and which could be used with any provider). Users would be able to select a 3rd-party pooling implementation which fits their specific application needs, rather than being locked into a single, provider-implemented pool.
This mainly has to do with defining standard public APIs for communication between the ADO provider and the pool in use. For example, when a connection is returned to the pool its state may need to be reset in a provider-specific, the DbConnection would have to expose that API. There may be other things as well.
If we go down this route, we could also optionally provide a provider-independent, highly efficient connection pool that could be used instead of the providers' ones.
Comparison with JDBC
JDBC has had competing pool implementations for a very long time, which can be used with any JDBC database provider. A nice inspiration is https://brettwooldridge.github.io/HikariCP/, which claims to be the highest-performing pool implementation. The JDBC API includes several abstractions for manaing pooling (ConnectionPoolDataSource, PooledConnection), although more research is needed to understand exactly how the pooling model operates there.
Things to keep in mind
- Connections enlisted in a TransactionScope and which are closed before the TransactionScope is disposed: they cannot be reused until the transaction commits/rolls back, and need to be returned if the user opens another connection with the same database and transaction to avoid escalation to a distributed transaction. This would all have some impact on the pooling.
- Including/omitting authentication information in the key. When using "integrated security" the username can be omitted, I know SqlServer has some provisions for including that information alongside the connection string in the key.
- Ability to include some arbitrary data on the pool. For example, Npgsql caches the password-less connection string on the pool (
Persist Security Info=false). There could be other things.
/cc @anpete @ajcvickers @divega @davidfowl
Edit by @roji 15/3: Added "things to keep in mind" with some new points.