0% found this document useful (0 votes)
25 views10 pages

PostgreSQL Database Architecture

Uploaded by

Kaushal Sutar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
25 views10 pages

PostgreSQL Database Architecture

Uploaded by

Kaushal Sutar
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 10

Understanding PostgreSQL database architecture is key to mastering how it stores,

manages, and processes data. Here's a breakdown of the core components and their roles:

🧠 High-Level PostgreSQL Architecture

PostgreSQL follows a client-server architecture. It has the following main components:

Client <---> PostgreSQL Server (Postmaster) <--> Data Files (Storage)


|
--> Background Processes
--> Shared Memory

Image taken from the internet.

1. 🧍‍♂️ Client

● The frontend—applications, tools, or users that send SQL commands to the server.

● Communicate with the server via libpq (PostgreSQL's C API) or other drivers (JDBC,
psycopg, etc.).
2. 🧠 PostgreSQL Server (Postmaster)

This is the main daemon process. It listens for client connections and manages:

● Authentication

● Connection pooling

● Process spawning (forks child processes)

3. ⚙️ Backend Processes

Each client connection gets its own backend process. This process:

● Parses and plans queries

● Accesses data from disk or shared memory

● Returns results to the client

There are also background processes:

Process Description

checkpointer Writes dirty pages to disk at checkpoints

writer Flushes modified buffers to disk in the background

wal writer Writes WAL (Write-Ahead Log) records to disk

autovacuum Reclaims storage by removing dead tuples

bgworker Custom background workers (extensions, etc.)


stats Gathers DB usage statistics
collector

4. 🧊 Shared Memory

Shared between all PostgreSQL processes. Key parts:

● Buffer Pool (Shared Buffers): Caches table and index pages.

● WAL Buffers: Temporarily stores WAL data before writing.

● Work Memory: For sorts, hashes, etc. (per backend).

● Locks & LWLocks: To coordinate access.

5. 🗃️ Storage Layer

Data is stored in files on disk in the data directory ($PGDATA). Important subfolders:

● base/ → actual table data

● pg_wal/ → WAL (Write-Ahead Logs)

● pg_clog/ → transaction commit status

● pg_stat/ → runtime statistics

Each table/index corresponds to one or more files.

6. ✍️ Write-Ahead Logging (WAL)

To ensure durability and crash recovery:


● Changes are written to WAL first (sequential writes)

● Then applied to data files later (checkpointing)

This allows crash recovery by replaying WAL.

7. 📊 System Catalogs

PostgreSQL stores metadata in its own internal tables (in pg_catalog):

● pg_class → tables and indexes

● pg_attribute → columns

● pg_stat_activity → current queries and connections

🧩 Putting It Together

When a query is executed:

1. Client sends SQL to server

2. Server process parses, plans, and executes it

3. Results sent back to client

4. Modified data written to WAL first

5. Eventually flushed to data files by background processes

Let’s dive deeper into the PostgreSQL database architecture, focusing on the critical
internal components, their interaction, and how queries are processed.
1. Query Lifecycle: From Client to Data

Step-by-Step Flow of a Query (e.g., SELECT * FROM users WHERE id =


5;)
Stage Description

1. Client Sends the SQL query to the PostgreSQL server.

2. Postmaster Accepts the connection, forks a backend process to handle the


session.

3. Parser Converts SQL into a parse tree (SELECT → what table, what column,

what condition).

4. Planner/Optimizer Chooses the best execution plan using statistics, indexes, cost
estimates.

5. Executor Executes the plan, accesses data from shared buffers or disk,
returns result.

🔍 Example
SELECT name FROM users WHERE id = 5;

1. The planner checks if there's an index on id.

2. If found, uses Index Scan, else Sequential Scan.

3. Fetches name from the matching row.

4. Sends the result to the client.

2. Backend Processes (Detailed)


Each PostgreSQL session runs as a separate OS process (forked from postmaster). Key
background processes:

Process Role

Checkpointer Periodically writes dirty pages (modified buffers) to disk for


durability.

WAL Writer Flushes WAL buffers to disk to ensure WAL is persistent.

Background Writer Frees memory by writing less-used data from shared buffers
to disk.

Autovacuum Removes dead tuples and prevents table bloat (important for
MVCC).

Stats Collector Tracks query usage, index usage, table activity, etc.

Logical Replication Handle replication of data between PostgreSQL instances.


Workers

3. Shared Memory Areas (Memory Architecture)

🔹 Shared Buffers

● Stores most recently used table/index pages.

● Acts as a cache to reduce disk I/O.

Think of it like PostgreSQL’s internal "RAM-based cache".

🔹 WAL Buffers

● Temporary space for uncommitted changes before they're flushed to WAL logs.

🔹 Work Memory (per backend)


● Temporary memory used for query operations like sorting, hashing, joins.

🔹 Maintenance Work Memory

● Used for VACUUM, CREATE INDEX, ANALYZE.

4. Write-Ahead Logging (WAL)


WAL ensures durability and crash recovery.

💡 How WAL Works:

1. Before modifying a data file, PostgreSQL writes a log record to WAL.

2. WAL is flushed to disk (fsync) before confirming a transaction commit.

3. If the system crashes, WAL is replayed to restore consistency.

📂 WAL Files

● Stored in $PGDATA/pg_wal/

● Each file is 16MB by default.

● Used in replication, point-in-time recovery (PITR).

5. MVCC – Multi-Version Concurrency Control


MVCC allows concurrent reads and writes without locks.

🔸 What it means:
● Every transaction sees a snapshot of the database.

● Updates don't overwrite data—they create new versions (tuples).

● Old tuples are marked as "dead" and cleaned up by autovacuum.

🔹 Tuple Metadata:

Each row has hidden fields:

● xmin: ID of the transaction that inserted it.

● xmax: ID of the transaction that deleted/updated it.

6. Storage Engine & Filesystem


Tables and indexes are stored as binary heap files.

📂 Data Directory ($PGDATA)

● base/: contains subdirectories for each database.

● pg_wal/: write-ahead logs.

● global/: system-wide metadata.

● pg_tblspc/: tablespaces (alternative locations for data).

🧱 File Structure:

● Each table is stored in one or more files named after the OID of the table.

● If a table grows beyond 1GB, PostgreSQL splits it into multiple segments (e.g., 12345,
12345.1, 12345.2, etc.)
7. System Catalogs
PostgreSQL stores its metadata in regular tables (called system catalogs).

Catalog Table Description

pg_class Info about tables and indexes.

pg_attribute Info about columns.

pg_namespace Info about schemas.

pg_stat_activ Active queries and


ity connections.

pg_locks Lock monitoring.

8. Query Planning & Optimization (More Depth)


PostgreSQL generates multiple plans and chooses the cheapest.

🔹 Key Plan Types:

● Seq Scan: Full table scan.

● Index Scan: Uses B-tree or GIN index.

● Bitmap Scan: Efficient for multiple matches.

● Nested Loop / Hash Join / Merge Join: For joining tables.

🔍 View Execution Plan


EXPLAIN ANALYZE SELECT * FROM users WHERE email = '[email protected]';

You’ll see the chosen scan type, costs, row estimates, and actual execution time.
Summary Diagram (Mental Model)
Client

Postmaster (accepts connection)

Backend Process (per user)
├─ Parser → Planner → Executor
├─ Shared Buffers (data cache)
├─ WAL Buffer

Data Directory (on disk)
├─ Base (table data)
├─ pg_wal (logs)
├─ global (metadata)

You might also like