Database Optimization - PostgreSQL Performance Guide
Table of Contents
1. Query Optimization
2. Index Strategies
3. Connection Pooling
4. Partitioning
5. Replication
6. Vacuum and Maintenance
7. Query Planning
8. Monitoring and Analysis
9. Caching Strategies
10. Schema Design
1. Query Optimization
Understanding Query Execution
PostgreSQL executes queries through multiple stages: parsing the SQL statement, analyzing and rewriting it, planning the optimal execution path, and finally executing the plan. Understanding these stages helps optimize queries effectively.
The query planner uses statistics about table data distribution to estimate costs
of different execution paths. These statistics are gathered by the ANALYZE
command and stored in system catalogs. Outdated statistics lead to suboptimal
query plans.
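To check whether statistics are current and what the planner believes about a column, the system catalogs can be queried directly; a minimal sketch (the users table and country column are illustrative):
-- When was each table last analyzed?
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_autoanalyze NULLS FIRST;
-- Distribution statistics the planner currently holds for one column
SELECT attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'users' AND attname = 'country';
-- Refresh statistics for a single table
ANALYZE users;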
Cost-based optimization considers multiple factors: I/O operations required,
CPU processing, memory usage, and network transfer for distributed queries.
The planner estimates costs for different strategies (sequential scans, index scans,
nested loops, hash joins, merge joins) and selects the lowest-cost plan.
EXPLAIN and EXPLAIN ANALYZE
EXPLAIN shows the query execution plan without running the query. It displays estimated costs, row counts, and operation types. This helps identify expensive operations before execution.
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';
EXPLAIN ANALYZE actually executes the query and shows real timing and
row counts. This reveals discrepancies between planner estimates and actual
execution.
EXPLAIN ANALYZE SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2023-01-01';
Output includes: operation types (Seq Scan, Index Scan, Hash Join), estimated vs actual rows, execution time per operation, and total query time. High discrepancies between estimated and actual rows suggest outdated statistics.
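When I/O is suspected, the BUFFERS option adds shared-buffer hit and read counts per plan node; a sketch reusing the query above:
EXPLAIN (ANALYZE, BUFFERS)
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2023-01-01';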
Sequential Scans vs Index Scans
Sequential scans read entire tables row by row. They're efficient for queries accessing large portions of a table (>5-10% of rows). They benefit from sequential I/O and operating system read-ahead.
Index scans use indexes to find specific rows. They’re efficient for selective
queries. However, index scans require random I/O which is slower. Bitmap
index scans combine index efficiency with sequential data retrieval.
The planner chooses between scans based on selectivity. For highly selective
queries (finding few rows), index scans are faster. For low selectivity (finding
many rows), sequential scans are faster despite the index existing.
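To see how the planner weighs the two strategies for a specific query, sequential scans can be temporarily discouraged in a session (for diagnosis only, not as a permanent setting); a minimal sketch:
-- Plan the planner prefers
EXPLAIN SELECT * FROM orders WHERE created_at > '2023-01-01';
-- Plan when sequential scans are discouraged
SET enable_seqscan = off;
EXPLAIN SELECT * FROM orders WHERE created_at > '2023-01-01';
RESET enable_seqscan;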
Join Optimization
PostgreSQL supports multiple join algorithms: nested loop joins, hash joins,
and merge joins. Each has different performance characteristics.
Nested loop joins iterate the outer table and for each row, scan the inner table.
They’re efficient when the inner table has an index on the join column and the
outer table is small.
Hash joins build a hash table from one input (typically the smaller table) and
probe it with rows from the other input. They’re efficient for large tables when
neither has an index, especially for equijoins.
Merge joins require both inputs sorted on the join column. When both tables
have indexes on join columns or query includes ORDER BY, merge joins can
be very efficient.
Join order matters significantly. The planner tries different orders for queries
with multiple joins. Smaller result sets should be computed first to reduce
subsequent join input sizes.
-- PostgreSQL automatically optimizes join order
SELECT u.name, p.title, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN products p ON o.product_id = p.id
WHERE u.country = 'US' AND o.created_at > NOW() - INTERVAL '30 days';
Subquery Optimization
Correlated subqueries execute once per outer row and can be slow. They’re
often better expressed as joins.
-- Slow: correlated subquery
SELECT u.name,
(SELECT COUNT(*) FROM orders WHERE user_id = u.id) as order_count
FROM users u;
-- Fast: join with aggregation
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name;
Subqueries in WHERE clauses can often be replaced with EXISTS, which short-circuits as soon as it finds the first match.
-- Slower
WHERE user_id IN (SELECT id FROM premium_users WHERE active = true)
-- Faster
WHERE EXISTS (SELECT 1 FROM premium_users WHERE id = user_id AND active = true)
Window Functions
Window functions perform calculations across rows related to the current row.
They’re more efficient than self-joins for ranking, running totals, and moving
averages.
SELECT
date,
sales,
SUM(sales) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_7day_total,
RANK() OVER (PARTITION BY product_id ORDER BY sales DESC) as sales_rank
FROM daily_sales;
Window functions process rows only once, while equivalent self-joins would scan
tables multiple times.
Common Table Expressions (CTEs)
CTEs improve query readability by breaking complex queries into named subqueries. In PostgreSQL 12+, non-recursive CTEs can be inlined and optimized with the main query.
WITH recent_orders AS (
SELECT user_id, SUM(total) as total_spent
FROM orders
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY user_id
)
SELECT u.name, ro.total_spent
FROM users u
JOIN recent_orders ro ON u.id = ro.user_id
WHERE ro.total_spent > 1000;
The MATERIALIZED keyword forces the CTE to be evaluated once before the main query runs. This can improve performance when the CTE is referenced multiple times or when preventing inlining is desired.
WITH expensive_calculation AS MATERIALIZED (
SELECT user_id, complex_aggregate(data)
FROM large_table
GROUP BY user_id
)
SELECT * FROM expensive_calculation WHERE ...;
Limiting Result Sets
Always use LIMIT when fetching partial results, especially for pagination.
LIMIT allows the database to stop scanning once enough rows are found.
-- Pagination
SELECT * FROM users ORDER BY created_at DESC LIMIT 20 OFFSET 100;
However, large OFFSETs are inefficient as the database must scan and skip
those rows. Keyset pagination using WHERE clauses on indexed columns is
more efficient:
-- First page
SELECT * FROM users ORDER BY id DESC LIMIT 20;
-- Next page (assuming last id from previous page was 9523)
SELECT * FROM users WHERE id < 9523 ORDER BY id DESC LIMIT 20;
2. Index Strategies
B-Tree Indexes
B-Tree indexes are the default and most common index type. They support
equality and range queries on sortable data types. B-Trees maintain sorted
order, enabling efficient ORDER BY and range scans.
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created ON orders(created_at);
B-Tree indexes work left-to-right. Multi-column indexes can be used for queries
filtering on prefix columns:
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Uses index
SELECT * FROM orders WHERE user_id = 123 AND created_at > '2023-01-01';
-- Uses index (only first column)
SELECT * FROM orders WHERE user_id = 123;
-- Cannot use index efficiently
SELECT * FROM orders WHERE created_at > '2023-01-01';
Partial Indexes
Partial indexes index only rows matching a condition. They’re smaller and
faster for queries targeting that subset.
CREATE INDEX idx_active_users ON users(email) WHERE active = true;
The planner uses this index only for queries whose WHERE clause implies active = true. It's much smaller than an index over all users when most users are inactive.
Expression Indexes
Expression indexes index the result of expressions or functions. They enable
indexed lookups for transformed values.
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Uses index
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
Expression indexes are useful for case-insensitive searches, date extractions,
JSON field access, and calculated fields.
Covering Indexes
Covering indexes include all columns needed by a query, allowing index-only
scans without accessing the table. This dramatically reduces I/O.
CREATE INDEX idx_users_email_name ON users(email) INCLUDE (name);
-- Index-only scan possible
SELECT name FROM users WHERE email = 'user@example.com';
The INCLUDE clause adds columns to the index without making them part of
the index structure. They’re available for index-only scans but don’t contribute
to index ordering.
GIN Indexes
Generalized Inverted Indexes (GIN) index composite values: arrays, full-text
search, JSONB. They’re efficient for containment and text search queries.
CREATE INDEX idx_products_tags ON products USING GIN(tags);
-- Efficient array containment query
SELECT * FROM products WHERE tags @> ARRAY['electronics', 'sale'];
For JSONB:
CREATE INDEX idx_users_metadata ON users USING GIN(metadata);
-- Efficient JSON queries
SELECT * FROM users WHERE metadata @> '{"premium": true}';
SELECT * FROM users WHERE metadata ? 'vip_since';
GiST Indexes
Generalized Search Tree (GiST) indexes support geometric data, full-text search,
and custom types. They enable range queries on complex types.
CREATE INDEX idx_locations_point ON locations USING GIST(coordinates);
-- Nearest-neighbor search; the GiST index supports ordering by distance
SELECT * FROM locations ORDER BY coordinates <-> point(40.7128, -74.0060) LIMIT 10;
BRIN Indexes
Block Range Indexes (BRIN) are extremely small indexes for large tables with natural clustering (like time-series data). They store min/max values for ranges of physical blocks.
CREATE INDEX idx_logs_created ON logs USING BRIN(created_at);
BRIN indexes are much smaller than B-Tree indexes but less precise. They’re
ideal for append-only tables with natural time ordering.
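Precision is controlled by the pages_per_range storage parameter (default 128): smaller ranges make the index larger but more selective. A sketch:
-- One summary entry per 32 heap pages instead of the default 128
CREATE INDEX idx_logs_created_fine ON logs USING BRIN(created_at)
WITH (pages_per_range = 32);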
Index Maintenance
Indexes require maintenance to remain efficient. UPDATE and DELETE operations can leave dead index entries. VACUUM reclaims this space.
-- Rebuild index to eliminate bloat
REINDEX INDEX idx_users_email;
-- Rebuild all indexes on a table
REINDEX TABLE users;
Large indexes can be rebuilt concurrently without blocking writes:
CREATE INDEX CONCURRENTLY idx_new ON users(email);
DROP INDEX idx_old;
Index Monitoring
Monitor index usage to identify unused indexes that waste space and slow writes:
SELECT
schemaname,
relname,
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
An idx_scan of zero means the index has never been used since statistics were last reset; such indexes can usually be dropped, unless they enforce a primary key or unique constraint.
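Before dropping candidates, it helps to see how much space each index occupies; a sketch using the same statistics view:
SELECT
indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
idx_scan
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;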
3. Connection Pooling
Why Connection Pooling
Database connections are expensive to create: TCP handshake, authentication, session initialization. Creating connections for each request is wasteful. Connection pools maintain reusable connections.
Without pooling, applications create and destroy connections repeatedly. With
pooling, connections are borrowed from the pool, used, and returned for reuse.
PgBouncer
PgBouncer is a lightweight connection pooler for PostgreSQL. It maintains a
pool of connections to the database and multiplexes client connections onto
them.
PgBouncer supports three pooling modes:
Session pooling: connections assigned to clients for session duration. Lowest
overhead but doesn’t reduce connection count much.
Transaction pooling: connections returned after each transaction. Most efficient, but incompatible with session-level features (prepared statements, LISTEN/NOTIFY, temporary tables).
Statement pooling: connections returned after each statement. Maximum efficiency but breaks multi-statement transactions.
Configuration example:
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 5
This configuration allows 1000 client connections sharing 25 database connections.
Application-Level Pooling
Many application frameworks include connection pooling. SQLAlchemy in
Python:
from sqlalchemy import create_engine
engine = create_engine(
'postgresql://user:pass@localhost/db',
pool_size=20,
max_overflow=10,
pool_timeout=30,
pool_recycle=3600,
)
Pool size is tuned based on concurrent request load, available database connections, and database server capacity. Too few connections cause contention; too many exhaust database resources.
Connection Pool Sizing
Optimal pool size depends on workload characteristics. For CPU-bound workloads, pool size should match CPU core count. For I/O-bound workloads, larger pools improve throughput.
A common formula: pool_size = (core_count * 2) + effective_spindle_count
For cloud databases, consult provider recommendations. Over-provisioning
wastes resources; under-provisioning causes request queuing.
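Before settling on a pool size, compare the server-wide connection limit against actual usage; a minimal sketch:
-- Configured server-wide limit
SHOW max_connections;
-- Current connections grouped by state
SELECT state, COUNT(*)
FROM pg_stat_activity
GROUP BY state;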
Connection Health Checks
Pooled connections can become stale: network issues, database restarts, firewalls
dropping idle connections. Health checks detect and remove stale connections.
engine = create_engine(
'postgresql://user:pass@localhost/db',
pool_pre_ping=True, # Test connection before use
)
Health checks add latency to connection acquisition but prevent errors from
stale connections.
Connection Lifecycle
Connections have lifecycle hooks: on acquisition, before return, on timeout.
These enable custom logic like setting session variables or logging slow queries.
from sqlalchemy import event
@event.listens_for(engine, "connect")
def receive_connect(dbapi_conn, connection_record):
# Set session parameters
cursor = dbapi_conn.cursor()
cursor.execute("SET statement_timeout = '30s'")
cursor.close()
4. Partitioning
What is Partitioning
Partitioning splits large tables into smaller physical pieces while maintaining a
single logical table. Queries can target specific partitions, scanning less data.
Partitioning improves: query performance for queries targeting specific partitions, bulk loading and deleting (drop a partition instead of running DELETE), and index sizes (smaller indexes per partition).
Range Partitioning
Range partitioning divides data by value ranges, typically time-based:
CREATE TABLE measurements (
id SERIAL,
measured_at TIMESTAMP NOT NULL,
value NUMERIC
) PARTITION BY RANGE (measured_at);
CREATE TABLE measurements_2023_01 PARTITION OF measurements
FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');
CREATE TABLE measurements_2023_02 PARTITION OF measurements
FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');
CREATE INDEX ON measurements_2023_01 (measured_at);
CREATE INDEX ON measurements_2023_02 (measured_at);
Queries filtering on measured_at only scan relevant partitions. Partition pruning eliminates scanning irrelevant partitions.
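Pruning can be confirmed with EXPLAIN: only the matching partitions should appear in the plan. A sketch against the example table:
EXPLAIN SELECT * FROM measurements
WHERE measured_at BETWEEN '2023-01-15' AND '2023-01-20';
-- Expect a scan of measurements_2023_01 only, not measurements_2023_02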
List Partitioning
List partitioning assigns specific values to partitions:
CREATE TABLE users (
id SERIAL,
country VARCHAR(2),
name VARCHAR(100)
) PARTITION BY LIST (country);
CREATE TABLE users_us PARTITION OF users
FOR VALUES IN ('US');
CREATE TABLE users_uk PARTITION OF users
FOR VALUES IN ('UK');
CREATE TABLE users_other PARTITION OF users
DEFAULT;
Hash Partitioning
Hash partitioning distributes data evenly across partitions:
CREATE TABLE orders (
id SERIAL,
user_id INTEGER,
total NUMERIC
) PARTITION BY HASH (user_id);
CREATE TABLE orders_p0 PARTITION OF orders
FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p1 PARTITION OF orders
FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p2 PARTITION OF orders
FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p3 PARTITION OF orders
FOR VALUES WITH (MODULUS 4, REMAINDER 3);
Hash partitioning supports partition pruning only for equality conditions on the partition key, but it distributes load evenly and enables partition-wise operations.
Partition Maintenance
New partitions are created as needed. For time-series data, automate partition
creation:
CREATE OR REPLACE FUNCTION create_monthly_partition()
RETURNS void AS $$
DECLARE
partition_date DATE := DATE_TRUNC('month', NOW() + INTERVAL '1 month');
partition_name TEXT := 'measurements_' || TO_CHAR(partition_date, 'YYYY_MM');
BEGIN
EXECUTE FORMAT('
CREATE TABLE IF NOT EXISTS %I PARTITION OF measurements
FOR VALUES FROM (%L) TO (%L)',
partition_name,
partition_date,
partition_date + INTERVAL '1 month'
);
END;
$$ LANGUAGE plpgsql;
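The function can be called manually ahead of each month, or scheduled. As a sketch, assuming the pg_cron extension is installed (it is not part of core PostgreSQL):
-- Create next month's partition now
SELECT create_monthly_partition();
-- Or schedule it for the 25th of every month (requires pg_cron)
SELECT cron.schedule('create-partitions', '0 0 25 * *', 'SELECT create_monthly_partition()');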
Old partitions can be detached and archived:
-- Detach partition
ALTER TABLE measurements DETACH PARTITION measurements_2022_01;
-- Archive or drop
DROP TABLE measurements_2022_01;
5. Replication
Streaming Replication
Streaming replication continuously streams Write-Ahead Log (WAL) records
from primary to standby servers. Standby servers replay WAL to maintain
synchronized copies.
Primary configuration in postgresql.conf:
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
Standby configuration:
hot_standby = on
Create replication user on primary:
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';
Configure standby recovery:
# standby.signal file presence enables standby mode
primary_conninfo = 'host=primary_host port=5432 user=replicator password=secret'
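On the primary, the pg_stat_replication view lists connected standbys and how far each has received and replayed WAL; a quick check:
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;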
Synchronous vs Asynchronous Replication
Asynchronous replication doesn't wait for standby confirmation; the primary commits transactions immediately. This maximizes performance but risks data loss if the primary fails before WAL reaches the standby.
Synchronous replication waits for at least one standby to confirm WAL receipt
before committing. This guarantees zero data loss but adds latency.
synchronous_commit = on
synchronous_standby_names = 'standby1,standby2'
Logical Replication
Logical replication replicates data changes at the logical level (INSERT, UPDATE, DELETE) rather than physical WAL. This enables selective replication (specific tables), cross-version replication, and bi-directional replication.
Create publication on source:
CREATE PUBLICATION my_pub FOR TABLE users, orders;
Create subscription on destination:
CREATE SUBSCRIPTION my_sub
CONNECTION 'host=source_host dbname=mydb user=replicator password=secret'
PUBLICATION my_pub;
Logical replication enables zero-downtime migrations and multi-master setups.
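Subscription progress can be checked on the destination through the built-in statistics view; a minimal sketch:
SELECT subname, received_lsn, latest_end_lsn, last_msg_receipt_time
FROM pg_stat_subscription;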
Read Replicas
Standby servers can serve read queries (hot standby), distributing read load across multiple servers. Applications route reads to replicas and writes to the primary.
Replicas have replication lag: the delay between a commit on the primary and its application on the replica. Applications must tolerate eventual consistency or use synchronous replication for critical reads.
Monitor replication lag:
SELECT
pg_last_wal_receive_lsn() AS receive_lsn,
pg_last_wal_replay_lsn() AS replay_lsn,
pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn()) AS lag_bytes,
NOW() - pg_last_xact_replay_timestamp() AS lag_time;
6. Vacuum and Maintenance
MVCC and Dead Tuples
PostgreSQL uses Multi-Version Concurrency Control (MVCC). Updates create new row versions rather than modifying rows in place. Old versions become "dead tuples" once no active transaction can still see them.
Dead tuples waste space and slow queries. VACUUM reclaims space from dead
tuples and updates statistics.
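Per-table dead tuple counts show whether autovacuum is keeping up; a sketch:
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;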
Autovacuum
Autovacuum automatically runs VACUUM and ANALYZE on tables. It's configured globally or per table:
# postgresql.conf
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
Autovacuum vacuums a table when its dead tuples exceed threshold + (scale_factor * row count).
For frequently updated tables, more aggressive settings prevent bloat:
ALTER TABLE users SET (
autovacuum_vacuum_scale_factor = 0.05,
autovacuum_analyze_scale_factor = 0.02
);
Manual VACUUM
Manual VACUUM is needed for: bulk operations producing many dead tuples,
before critical queries requiring current statistics, or when autovacuum can’t
keep up.
-- Standard vacuum
VACUUM users;
-- Vacuum and analyze
VACUUM ANALYZE users;
-- Full vacuum (rewrites table, locks exclusively, reclaims all space)
VACUUM FULL users;
VACUUM FULL is rarely needed and very disruptive. It should be run during
maintenance windows.
ANALYZE
ANALYZE updates planner statistics. Accurate statistics enable optimal query
plans. ANALYZE samples table data and stores distribution statistics.
ANALYZE users;
After bulk data changes, ANALYZE ensures the planner knows about new data
distribution.
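For columns with skewed distributions, the sample size ANALYZE uses can be raised per column, at the cost of longer ANALYZE runs and slightly slower planning; a hedged example (the country column is illustrative):
ALTER TABLE users ALTER COLUMN country SET STATISTICS 500;
ANALYZE users;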
Reindexing
B-Tree indexes accumulate bloat from updates. REINDEX rebuilds indexes to
eliminate bloat:
REINDEX TABLE users;
REINDEX INDEX idx_users_email;
For production systems, use CONCURRENTLY to avoid locking:
REINDEX TABLE CONCURRENTLY users;
This comprehensive guide covers PostgreSQL optimization from query tuning to
infrastructure. Regular monitoring, measurement, and iterative improvement
are key to maintaining optimal database performance as data and workloads
grow.