PostgreSQL lingo

Navigating confusing Postgres terminology (HOT, TOAST! 🍞)

17 September 2020

Marsel Mavletkulov

Relation

PostgreSQL is object-relational database management system

In fact, Bob is 18.

func person(name string, age int) bool { ... }

fmt.Println("Is Bob 18?", person("Bob", 18)) // true
fmt.Println("Is Bob 21?", person("Bob", 21)) // false
2

Relational calculus (declarative query language)

Since predicates' values are bool, you can make new predicates based on existing ones using logical operations (OR, AND, NOT)

A database query is a predicate expressed as a logical formula

Database needs to find attribute values that make the formula true

3

Relation as set

Predicate can be written down as math set of factual attributes (values where formula returns true) which in turn can be represented as a table

An element in that set is called a tuple (or row in table representation)

In fact, Bob is 18, Alice is 21.

{
    (name="Bob", age=18),   // person("Bob", 18) == true
    (name="Alice", age=21), // person("Alice", 21) == true
}

| name  | age
| ---   | ---
| Bob   | 18
| Alice | 21
4

Tuple

An immutable record with typed attributes is called a tuple (or cortege)

(
    name string = "Bob"
    age int = 18
)

The term originated as an abstraction of the sequence: single, couple, triple, quadruple, quintuple, sextuple, septuple, octuple, ..., n‑tuple

* etymology

5

Postgres relation

Postgres query result is a relation

# SELECT * FROM person;
name  | age
------+-----
Bob   |  18
Alice |  21

pg_class is where you can find description of relations such as tables, views, indexes, sequences

# SELECT relname FROM pg_class;
   relname
-------------
person
pg_statistic
pg_type
...
6

Relation files

Where is a relation stored on disk?

# SELECT pg_relation_filepath('person') AS table, pg_relation_filepath('person_name') as index;
     table       |      index
-----------------+------------------
base/16488/16493 | base/16488/16500

# \! ls /usr/local/pgsql/data/base/16488 | grep 16493
16493
16493_fsm
16493_vm

16493 is a main fork that contains table tuples, 16500 contains index tuples

Each file (~1 GB) is logically divided by 8 KB blocks called pages

Page starts with a header which describes page content (e.g., table page, index page)

7

Postgres tuple

Multiple processes (readers and writers) concurrently work with person table

# INSERT INTO person(name, age) VALUES ('Eve', 19); -- Current transaction id is 29410.
# UPDATE person SET age=80 WHERE name='Eve'; -- Current transaction id is 29411.
# UPDATE person SET age=20 WHERE name='Eve'; -- Current transaction id is 29412.

Eve row has three versions each represented with two numbers (transaction IDs)

xmin  |  xmax |  data
------+-------+--------
29410 | 29411 | Eve,18
29411 | 29412 | Eve,80
29412 |     0 | Eve,20
8

Heap-only tuple (HOT)

"Heap" in this case means "table", its data (a bunch of table tuples)

ctid  | t_ctid | hhu | hot |  xmin |  xmax |  data
------+--------+-----+-----+-------+-------+--------
(0,1) |  (0,2) |  t  |     | 29410 | 29411 | Eve,18
(0,2) |  (0,3) |  t  |  t  | 29411 | 29412 | Eve,80
(0,3) |  (0,3) |     |  t  | 29412 |     0 | Eve,20
9

Heap-only tuple update

Table updates must be reflected in its indices

HOT update mechanism prevents creation of wasteful index tuples and facilitates in-page cleaning (less work for vacuum)

It works when an updated field is not part of any index and table page has free space

10

The oversized attribute storage technique (TOAST)

Postgres requires a tuple to fit into a single page, but 8 KB is not enough in practise

The oversized attribute storage technique allows to compress/store a large attribute value (e.g., text, numeric, json) in a special table called TOAST

This makes the original table smaller because it contains a reference to a TOAST table, but there is an overhead when a large field is frequently used

# SELECT relname, relfilenode FROM pg_class WHERE oid = (
    -- Find OID (16497) of TOAST table by table name "person" for which TOAST table was created.
    SELECT reltoastrelid FROM pg_class WHERE relname = 'person'
);
   relname     | relfilenode
---------------+-------------
pg_toast_16493 |       16497
11

TOAST strategies

TOAST is triggered when a tuple is wider than 1/4 of a page, TOAST_TUPLE_THRESHOLD (2 KB)

It compresses and/or moves field values until the tuple is shorter than TOAST_TUPLE_TARGET (2 KB)

12

Buffer cache

Buffer cache is an array of buffers located in shared memory of Postgres processes

A buffer stores a page and where it was read from, its state

When buffer cache becomes full

Write ahead log (WAL) is a stream of executed actions (segment files of 16 MB)

checkpointer process periodically writes all dirty pages on disk (this helps to keep WAL size fairly small), bgwriter writes dirty pages that will likely be evicted soon, backend also writes dirty pages on disk if there are not enough buffers for a query

13

Buffers in explain

You can see how many buffers were used in a query plan

# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM person;
...
Buffers: shared hit=2 read=1
I/O Timings: read=50
14

Conclusion

Tables, views, indexes, sequences are relations

Table rows, indexed rows are tuples (table tuples, index tuples); elements of a relation

Heap-only tuple (HOT) is a table tuple that doesn't have pointers from index tuples

The oversized attribute storage technique (TOAST) makes sure a tuple fits into a page

Page is a logical block (8 KB) of a segment file where tuples are physically stored

Buffer cache is in-memory representation of array of pages (one buffer per page)

15

References

16

Thank you

Marsel Mavletkulov

Use the left and right arrow keys or click the left and right edges of the page to navigate between slides.
(Press 'H' or navigate to hide this message.)