PostgreSQL Distributed
Architectures & Best Practices
Formerly: founding engineer at Citus Data, architect at Microsoft
Today’s talk on PostgreSQL Distributed
Many distributed database talks discuss algorithms for distributed query planning,
transactions, etc.
In distributed systems, trade-offs are more important than algorithms.
Vendors and even many papers rarely talk about trade-offs.
Many different PostgreSQL distributed system architectures with different trade-offs exist.
Experiment: Discuss PostgreSQL distributed systems architecture trade-offs by example.
Single machine PostgreSQL
PostgreSQL on a single machine can be incredibly fast
No network latency
Millions of IOPS
Microsecond disk latency
Low cost / fast hardware
Can co-locate application server
Single machine PostgreSQL?
PostgreSQL on a single machine comes with operational hazards
Machine/DC failure (downtime)
Disk failure (data loss)
System overload (difficult to scale)
Disk full (downtime)
PostgreSQL Distributed (in the cloud)
Fixing the operational hazards of single machine PostgreSQL requires a distributed setup.
The cloud enables flexible distributed setups, with resources shared between customers for
high efficiency and resiliency.
Goals of distributed database architecture
Goal: Offer same functionality and transactional semantics as single node
RDBMS, with superior properties
Mechanisms:
Replication - Place copies of data on different machines
Distribution - Place partitions of data on different machines
Decentralization - Place different DBMS activities on different machines
Reality: Concessions in terms of performance, transactional semantics,
functionality, and/or operational complexity
PostgreSQL Distributed Layers
Distributed architectures can hook in at different layers — many are orthogonal!
Client: Manual sharding, load balancing, write to multiple endpoints
Pooler: Load balancing and sharding (e.g. pgbouncer, pgcat)
Query engine: Transparent sharding (e.g. Citus, Aurora Limitless), DSQL
Logical data layer: Active-active, federation (e.g. BDR, postgres_fdw)
Storage manager: DBMS-optimized cloud storage (e.g. Aurora, Neon)
Data files, WAL: Read replicas, hot standby
Disk: Cloud block storage (e.g. Amazon EBS, Azure Premium SSD)
Practical view of Distributed PostgreSQL
Today we will cover:
• Network-attached block storage
• Read replicas
• DBMS-optimized cloud storage
• Transparent Sharding
• Active-active deployments
• Distributed key-value stores with SQL
Two questions:
1) What are the trade-offs?
Latency, Efficiency, Cost, Scalability, Availability, Consistency, Complexity, …
2) For which workloads?
Lookups, analytical queries, small updates, large transforms, batch loads, …
The perils of latency: Synchronous protocol
Transactions are performed step-by-step on each session.
[Diagram: timeline of a client session talking to PostgreSQL]
BEGIN;
SELECT - may need to read from disk
UPDATE - writes to the heap (asynchronously flushed to disk)
COMMIT; - writes to the write-ahead log (synchronously flushed to disk)
Each statement is a separate network round trip, and locks are held until commit.
Max throughput per session = 1 / avg. response time
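For example (illustrative numbers): with an average transaction response time of 1 ms, one session can complete at most ~1,000 transactions/sec; at 20 ms (e.g. a cross-region round trip), only ~50/sec.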
The perils of latency: Connection limits
Max overall throughput: #sessions / avg. response time
[Diagram: multiple application servers, each holding connections to a single PostgreSQL server]
The number of connections is limited by the application architecture; the number of server processes is limited by memory and contention.
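For example (illustrative numbers): 500 sessions with an average response time of 2 ms cap out at 500 / 0.002 s = 250,000 transactions/sec, no matter how fast the hardware is.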
Network-attached block storage
Network-attached block storage
[Diagram: PostgreSQL runs in a VM; the hypervisor attaches over the network to a multi-tenant block storage API within a single AZ/DC]
Network-attached storage
Pros:
Higher durability (replication)
Higher uptime (replace VM, reattach)
Fast backups and replica creation (snapshots)
Disk is resizable
Cons:
Higher disk latency (~20μs -> ~1000μs)
Lower IOPS (~1M -> ~10k IOPS)
Crash recovery on restart takes time
Cost can be high
General guideline: Always use; durability & availability are more important than performance.
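One way to see how much of a query's time goes to storage reads (a minimal sketch; the table and filter are illustrative):
SET track_io_timing = on;   -- requires superuser or the corresponding SET privilege
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM items WHERE item_id = 87;
-- the "I/O Timings" line in the output shows time spent waiting on reads,
-- which grows noticeably on network-attached storage compared to local NVMe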
Read replicas
Read replicas
Readable replicas can help you scale read throughput, reduce latency through cross-region
replication, and improve availability through auto-failover.
[Diagram: a primary streams physical replication (data files + WAL) to two replicas]
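On the primary, replication state and lag are visible in a standard view (a minimal sketch):
SELECT client_addr, state, replay_lag FROM pg_stat_replication;
-- one row per connected replica; replay_lag approximates how far behind each replica is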
Scaling read throughput
Readable replicas can help you scale read throughput (when reads are CPU or I/O
bottlenecked) by load balancing queries across replicas.
[Diagram: clients connect through a load-balancing layer (several options) to the primary and a scale-out set of replicas]
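One simple client-side option (an illustrative sketch; host names are hypothetical): libpq accepts multiple hosts and, in recent versions, can prefer standbys and randomize the host order:
postgresql://replica-a,replica-b,primary/app?target_session_attrs=prefer-standby&load_balance_hosts=random
(target_session_attrs=prefer-standby needs a PostgreSQL 14+ client; load_balance_hosts=random needs 16+.)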
Eventual read-your-writes consistency
Read replicas can lag behind the primary, so you cannot always read your own writes.
[Diagram: a client runs INSERT INTO shopping_cart on the primary (lsn=9); a load-balanced SELECT .. FROM shopping_cart lands on replica A (lsn=8) or replica B (lsn=7), neither of which has replayed the write yet]
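A minimal sketch for detecting staleness (the functions are standard PostgreSQL; the LSN value is illustrative):
-- on the primary, immediately after the write:
SELECT pg_current_wal_lsn();                     -- e.g. 0/1A2B3C4D
-- on a replica, before trusting the read:
SELECT pg_last_wal_replay_lsn() >= '0/1A2B3C4D'::pg_lsn AS caught_up;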
No monotonic read consistency
Load-balancing across read replicas can cause your reads to go back and forth in time.
[Diagram: after an INSERT on the primary (lsn=9), consecutive SELECT count(*) queries are load-balanced across replica A (lsn=9) and replica B (lsn=7), so a later read can return an older result]
Poor cache usage
If all replicas are treated equally, they all end up with the same pages in cache.
[Diagram: lookups for id=1 and id=2 are load-balanced across replica A and replica B, so both replicas end up caching the same pages (id=1, id=2, …)]
If working set >> memory, all replicas get bottlenecked on disk I/O.
Read scaling trade-offs
Pros:
Read throughput scales linearly
Low latency stale reads if read replica is closer than primary
Lower load on primary
Cons:
Eventual read-your-writes consistency
No monotonic read consistency
Poor cache usage
General guideline: Consider at >100k reads/sec or heavy CPU bottleneck, but avoid for dependent transactions and large working sets.
DBMS-optimized storage
Like Aurora, Neon, AlloyDB
DBMS-optimized storage
Cloud storage that can perform background page writes autonomously, which saves on
write I/O from the primary. Also optimized for other DBMS needs (e.g. read replicas).
[Diagram: with regular cloud storage, the primary writes both WAL and pages through the block storage API; with DBMS-optimized storage, the primary writes only WAL and replicas read pages from the shared storage]
DBMS-optimized storage trade-offs
Pros:
Potential performance benefits by avoiding page writes from primary
No long crash recovery
Replicas can reuse storage, incl. hot standby
Less rigid than network-attached storage implementations (faster reattach, branching, ...)
Cons:
Write latency is high by default
High cost / pricing
PostgreSQL is not designed for it
General guideline: Consider using for complex workloads, but measure whether price-performance under load is better than a bigger machine.
Transparent sharding
Like Citus
Transparent sharding
Distribute tables by a shard key and/or replicate tables across multiple (primary) nodes.
Queries & transactions are transparently routed / parallelized.
[Diagram: a load balancer in front of three primaries (one acting as coordinator), each holding users and items shards (u1/i1 … u6/i6)]
Tables can be co-located by the shard key to enable local joins, foreign keys, etc.
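For example, in Citus (a sketch; table and column names are illustrative):
SELECT create_distributed_table('users', 'user_id');
SELECT create_distributed_table('items', 'user_id', colocate_with => 'users');
-- items shards are placed on the same nodes as the matching users shards,
-- so joins and foreign keys on user_id stay local to each node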
Single shard queries for operational workloads
Scale capacity for handling a high rate of single shard key queries:
insert into items (user_id, …) values (123, …);
[Diagram: the insert is routed to the single node holding shard i4, the items shard for user 123]
Per-statement latency can be a bottleneck!
Data loading in a sharded system
Pipelining through COPY can make data loading a lot more efficient and scalable
COPY items FROM STDIN WITH (format 'csv')
[Diagram: rows from the COPY stream are pipelined to the nodes that own the target shards]
Compute-heavy queries
Compute-heavy queries (shard key joins, json, vector, …) get the most relative benefit
select compute_stuff(…) from users join items using (user_id) where user_id = 123 …
[Diagram: the query is routed to the single node holding user 123's users and items shards, where the join and compute run locally]
Multi-shard queries for analytical workloads
Parallel multi-shard queries can quickly answer analytical queries across shard keys:
select country, count(*) from items, users where … group by 1 order by 2 desc limit 10;
[Diagram: the query is fanned out to all nodes in parallel and the per-shard results are merged]
Multi-shard queries for operational workloads
Multi-shard queries add significant overhead for simple non-shard-key queries
select * from items where item_id = 87;
[Diagram: because item_id is not the shard key, the query is sent to every items shard even though only one of them holds the row]
Multi-shard queries for analytical workloads
Snapshot isolation is a challenge (involves trade-offs):
select country, count(*) from items, users where … group by 1 order by 2 desc limit 10;
[Diagram: while the multi-shard query runs, a concurrent transaction commits across two nodes:]
BEGIN;
INSERT INTO items VALUES (123, …);   (routed to one node)
INSERT INTO items VALUES (456, …);   (routed to another node)
COMMIT;
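A sketch of the resulting anomaly, assuming no distributed snapshot is used (the concurrent analytical session is hypothetical):
-- concurrent analytical session:
SELECT count(*) FROM items;
-- each node takes its local snapshot at a slightly different moment, so the count
-- may include the row for item 123 but not the one for item 456,
-- i.e. the query can observe a half-committed transaction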
Sharding trade-offs
Pros:
Scale throughput for reads & writes (CPU & IOPS)
Scale memory for large working sets
Parallelize analytical queries, batch operations
Cons:
High read and write latency
Data model decisions have a high impact on performance
Snapshot isolation concessions
General guideline: Use for multi-tenant apps; otherwise use for large working sets (>100GB) or compute-heavy queries.
Active-active
Like BDR, pgactive, pgEdge, …
Active-active / n-way replication
Accept writes from any node, use logical replication to asynchronously exchange and
consolidate writes.
[Diagram: three primaries, each serving local reads and writes, exchange changes asynchronously; two of them concurrently run:]
UPDATE counters SET val = val + 1
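A sketch of why concurrent writes are risky here, assuming row-level last-writer-wins conflict resolution (a common default, not necessarily what every tool does):
-- val = 10 everywhere; both nodes execute concurrently:
UPDATE counters SET val = val + 1;   -- node A computes 11, node B computes 11
-- after the asynchronous exchange, last-writer-wins keeps one version of the row:
-- val ends up as 11 and one of the two increments is silently lost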
Active-active / n-way replication
All nodes can survive network partitions by accepting writes locally, but there is no linear history
(CAP).
[Diagram: the same three primaries continue to accept reads and writes locally during a network partition]
Active-active trade-offs
Pros:
Very high read and write availability
Low read and write latency
Read throughput scales linearly
Cons:
Eventual read-your-writes consistency
No monotonic read consistency
No linear history (updates might conflict after commit)
General guideline: Consider only for simple data models (e.g. queues) and only if you really need the benefits.
Distributed SQL
Like Yugabyte, CockroachDB, Spanner
Distributed key-value storage with SQL (DSQL)
Tables are stored on distributed key-value stores, shards replicated using Paxos/Raft.
Distributed transactions with snapshot isolation via global timestamps (HLC or TrueTime).
[Diagram: four PostgreSQL-like nodes; the users and items tables are stored as replicated key ranges (e.g. i1-100, i101-200, u11-20, u21-30), each range placed on three of the four nodes]
Distributed key-value storage trade-offs
Pros:
Good read and write availability (shard-level failover)
Single table, single key operations scale well
No additional data modelling steps or snapshot isolation concessions
Cons:
Many internal operations incur high latency
No local joins in current implementations
Less mature and optimized than PostgreSQL
General guideline: Just use PostgreSQL ;) but for simple apps, the availability benefits can be useful.
Conclusion
PostgreSQL can be distributed at different layers: client, pooler, query engine, logical data layer, storage manager, data files/WAL, disk.
Each architecture can introduce severe trade-offs. Almost nothing comes for free.
Keep asking:
What do I really want?
Which architecture achieves that?
What are the trade-offs?
What can my application tolerate? (can I change it?)
Questions?
[email protected]