PostgreSQL Database Performance Optimization

The document discusses performance tuning for MySQL and PostgreSQL, focusing on troubleshooting slow queries and visualizing query plans. It covers how to identify slow queries using pg_stat_statements in PostgreSQL and similar techniques in MySQL, including the use of pt-query-digest for analyzing query performance. Additionally, it highlights the importance of indexing and provides SQL queries to retrieve statistics on query execution times and resource usage.


https://www.linkedin.com/in/ahmed-mohamed-423583151

Day 4 MySQL and PostgreSQL performance tuning


PostgreSQL
PostgreSQL performs very well as a DBMS thanks to MVCC, which keeps multiple versions of rows to reduce conflicts and deadlocks.
For today we already have the pagila sample database loaded, and we will see how to troubleshoot a slow query: how to locate slow queries and how to view the query plan in a more visual way.

Query plan
We have the query below running on the pagila database; you can execute it once to confirm it works. This query is intentionally written to be potentially inefficient.

SELECT c.first_name, c.last_name,
       COUNT(r.rental_id) AS total_rentals,
       SUM(p.amount) AS total_paid
FROM customer c
JOIN rental r ON c.customer_id = r.customer_id
JOIN payment p ON r.rental_id = p.rental_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY total_paid DESC
LIMIT 10;

To get the execution plan and actual runtime statistics we will use EXPLAIN ANALYZE followed by the query itself.

EXPLAIN ANALYZE
SELECT c.first_name, c.last_name,
       COUNT(r.rental_id) AS total_rentals,
       SUM(p.amount) AS total_paid
FROM customer c
JOIN rental r ON c.customer_id = r.customer_id
JOIN payment p ON r.rental_id = p.rental_id
GROUP BY c.customer_id
ORDER BY SUM(p.amount) DESC
LIMIT 10;

In the ANALYZE output:

Look for high cost values.
Look for a high actual time, the time in milliseconds spent on that step.
Look for a Seq Scan (Sequential Scan): it means the table is being fully scanned, which may mean an index is needed on the relevant column; you can see from the picture that the scan is done on the payment table (a possible index is sketched below).
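
If the plan keeps choosing a sequential scan on payment for this join, one option is an index on the join column. The statement below is only a sketch: the index name is made up, and pagila may already ship with an equivalent index, so check the table's existing indexes first.

-- Hypothetical index on the payment join column; verify it does not already exist (\d payment)
CREATE INDEX IF NOT EXISTS idx_payment_rental_id ON payment (rental_id);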

Visualizing the Query Plan

Reading the query plan as text output is not a great way to analyze performance, and for a huge query it will take you a lot of time to work through. That is why it is best to visualize the query plan. One great website is https://explain.depesz.com/, which lets you paste the EXPLAIN ANALYZE output and then visualizes the query.
Run EXPLAIN (ANALYZE, BUFFERS) followed by the query and paste the output into the website.

EXPLAIN (ANALYZE, BUFFERS)
SELECT c.first_name, c.last_name,
       COUNT(r.rental_id) AS total_rentals,
       SUM(p.amount) AS total_paid
FROM customer c
JOIN rental r ON c.customer_id = r.customer_id
JOIN payment p ON r.rental_id = p.rental_id
GROUP BY c.customer_id
ORDER BY SUM(p.amount) DESC
LIMIT 10;

You can see that the site marks problems in the query plan, and from there you can understand which section of the query is taking the most time to execute.
For this query the LIMIT and Sort steps are taking most of the query time, so they need to be optimized.
You might need to check the columns being sorted, limited, or hashed and create an index on them.

Note that EXPLAIN ANALYZE will execute the query in order to provide live statistics for the query plan.
If you don't want to execute the query, you can use plain EXPLAIN followed by the query; it will return the estimated execution plan.
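
For instance, running plain EXPLAIN on the same report query returns only the planner's estimates without touching the data:

EXPLAIN
SELECT c.first_name, c.last_name,
       COUNT(r.rental_id) AS total_rentals,
       SUM(p.amount) AS total_paid
FROM customer c
JOIN rental r ON c.customer_id = r.customer_id
JOIN payment p ON r.rental_id = p.rental_id
GROUP BY c.customer_id
ORDER BY SUM(p.amount) DESC
LIMIT 10;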

Identifying slow queries


To identify poorly performing queries you need to add pg_stat_statements to the preloaded libraries in postgresql.conf. If you followed Day 1 (installing and configuring PostgreSQL), you saw that we already edited the PostgreSQL parameters and enabled pg_stat_statements there.
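
For reference, the relevant line in postgresql.conf looks like the one below; changing shared_preload_libraries requires a server restart.

# postgresql.conf: load the statistics collector for pg_stat_statements at startup
shared_preload_libraries = 'pg_stat_statements'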

The best starting point is to run the query below; it shows which sessions are running queries, their usernames and client addresses, and the wait events causing delays in query execution.

SELECT pid, usename, client_addr, state, wait_event, query
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid();

The query below is also useful to see whether a query is waiting to acquire a lock;
it will show the blocked PID and the blocking PID.

SELECT
    -- Details of the process holding the lock
    pl.pid AS blocking_pid,
    pa.query AS blocking_query,
    -- Details of the process being blocked
    al.pid AS blocked_pid,
    a.query AS blocked_query,
    to_char(a.query_start, 'YYYY-MM-DD HH24:MI:SS') AS blocked_query_started
FROM pg_locks al
JOIN pg_stat_activity a ON al.pid = a.pid
JOIN pg_locks pl
    ON al.locktype = pl.locktype
    AND al.database IS NOT DISTINCT FROM pl.database
    AND al.relation IS NOT DISTINCT FROM pl.relation
    AND al.page IS NOT DISTINCT FROM pl.page
    AND al.tuple IS NOT DISTINCT FROM pl.tuple
    AND al.virtualxid IS NOT DISTINCT FROM pl.virtualxid
    AND al.transactionid IS NOT DISTINCT FROM pl.transactionid
    AND al.classid IS NOT DISTINCT FROM pl.classid
    AND al.objid IS NOT DISTINCT FROM pl.objid
    AND al.objsubid IS NOT DISTINCT FROM pl.objsubid
    AND al.pid <> pl.pid
JOIN pg_stat_activity pa ON pl.pid = pa.pid
WHERE
    NOT al.granted;
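
A shorter alternative, if you only need to know which sessions are blocked and by whom, is the pg_blocking_pids() function (available since PostgreSQL 9.6). The query below is just a sketch of that approach:

-- Sessions that are currently blocked, with the PIDs of the sessions blocking them
SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;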

What is pg_stat_statements ?
It is a PostgreSQL extension that tracks execution statistics for all SQL statements executed on your server. For every unique query, it
records key metrics like:

How many times the query was executed.


The total time spent executing that query.
The average execution time.
How much data it read from memory vs. disk.
How many rows it returned.

This information is invaluable for identifying slow, resource-intensive, or frequently run queries that are prime candidates for optimization.

To start, we need to enable the extension on the database:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
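
As a quick sanity check that the extension is active and collecting data (this assumes pg_stat_statements is in shared_preload_libraries and the server was restarted), you can simply count its rows:

-- Returns a row count instead of an error if the extension is working
SELECT count(*) FROM pg_stat_statements;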

Then we can run various queries to retrieve statements based on the metric we specify.

Query 1: Find the Top 10 Most Time-Consuming Queries Overall

This query shows which queries are responsible for the most cumulative time spent on the server. These are often the best candidates for
optimization.

SELECT
total_exec_time,
calls,
mean_exec_time,
query
FROM
pg_stat_statements
ORDER BY
total_exec_time DESC
LIMIT 10;

Query 2: Find the Top 10 Most Frequently Executed Queries
This helps you find "chatty" application behavior. A query might be very fast individually, but if it's called millions of times, it can create
significant load.

SELECT
calls,
total_exec_time,
mean_exec_time,
query
FROM
pg_stat_statements
ORDER BY
calls DESC
LIMIT 10;

Query 3: Find Queries that Read the Most from Disk


Queries that read a lot from disk (instead of from memory/cache) are often slow due to I/O waits. This can indicate missing indexes.


SELECT
(shared_blks_read + local_blks_read + temp_blks_read) AS total_disk_reads,
query
FROM
pg_stat_statements
ORDER BY
total_disk_reads DESC
LIMIT 10;

shared_blks_read: Data read from disk for your tables/indexes. High numbers here often point to missing indexes.
temp_blks_read: Data read from temporary on-disk files, often caused by large sorts or joins that don't fit in work_mem.
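
As an illustrative tweak only (the value below is arbitrary and should be sized to your workload and memory), work_mem can be raised for a single session so that large sorts and hashes stay in memory instead of spilling to temporary files:

-- Raise the per-operation sort/hash memory for this session only (example value)
SET work_mem = '64MB';
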
Resetting the Statistics

After you perform a major optimization (like adding an index or rewriting a query), you may want to reset the statistics to get a fresh baseline.

You can clear all collected statistics by running:


SELECT pg_stat_statements_reset();

MySQL
MySQL has system views, called the performance schema, that hold statistics for everything running in MySQL.
In addition, showing the execution plan for a query is very similar to PostgreSQL:
use EXPLAIN ANALYZE followed by the query for live statistics,
or EXPLAIN followed by the query for the estimated query plan.

This query filters by an actor's first and last name, which is often a source of inefficiency if not indexed correctly.
-- Find all films for a specific actor and their stock levels

SELECT f.title, s.store_id, COUNT(i.inventory_id) AS number_in_stock
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
JOIN inventory i ON f.film_id = i.film_id
JOIN store s ON i.store_id = s.store_id
WHERE a.first_name = 'PENELOPE' AND a.last_name = 'GUINESS'
GROUP BY f.title, s.store_id
ORDER BY f.title;

Get the execution plan:

EXPLAIN
SELECT f.title, s.store_id, COUNT(i.inventory_id) AS number_in_stock
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
JOIN inventory i ON f.film_id = i.film_id
JOIN store s ON i.store_id = s.store_id
WHERE a.first_name = 'PENELOPE' AND a.last_name = 'GUINESS'
GROUP BY f.title, s.store_id
ORDER BY f.title;
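
If you want actual runtime statistics rather than estimates, the same statement can be prefixed with EXPLAIN ANALYZE instead (assuming MySQL 8.0.18 or later, where EXPLAIN ANALYZE is available; note that it executes the query):

EXPLAIN ANALYZE
SELECT f.title, s.store_id, COUNT(i.inventory_id) AS number_in_stock
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
JOIN inventory i ON f.film_id = i.film_id
JOIN store s ON i.store_id = s.store_id
WHERE a.first_name = 'PENELOPE' AND a.last_name = 'GUINESS'
GROUP BY f.title, s.store_id
ORDER BY f.title;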

Visualizing the Query Plan

You can paste the query plan into the following website, MySQL Visual Explain, to get a more visual query plan that makes it easier to identify the cost of each step. Follow along with the guide there and you will get a much cleaner view of the query plan.

Identifying Active Sessions and Locks

Use the SHOW FULL PROCESSLIST command to see what all the connected threads are doing.

-- Show all active connections and their queries


SHOW FULL PROCESSLIST;

Look at the Time column to see how long a query has been running and the Info column for the query text.
This is your go-to command for seeing "what's running right now."

How to Check for Locks and Deadlocks:

SELECT
r.trx_id AS waiting_trx_id,
r.trx_mysql_thread_id AS waiting_thread,
r.trx_query AS waiting_query,
b.trx_id AS blocking_trx_id,
b.trx_mysql_thread_id AS blocking_thread,
b.trx_query AS blocking_query
FROM
performance_schema.data_lock_waits AS w
JOIN
information_schema.innodb_trx AS b ON b.trx_id = w.blocking_engine_transaction_id
JOIN
information_schema.innodb_trx AS r ON r.trx_id = w.requesting_engine_transaction_id;

Run the query multiple times; if no result is returned, that means there is no blocking.

Using the Percona Toolkit

I personally prefer to use the Percona Toolkit for faster troubleshooting. It runs various scripts and returns the overall status of MySQL, including wait types, hardware metrics to identify whether there are any hardware bottlenecks, and the status of each database with query statistics.

Install the Percona Toolkit by following the installation steps in the documentation: https://docs.percona.com/percona-toolkit/
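
As a sketch, on Debian/Ubuntu the toolkit can typically be installed from the distribution or Percona repository with the package manager (the package name percona-toolkit is assumed here):

sudo apt-get install percona-toolkit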

For MySQL we will use pt-query-digest. The most common approach is to enable the slow query log on MySQL, either in my.cnf for a permanent setting, or temporarily by using SET GLOBAL from inside MySQL.

mysql -uroot -p

SET GLOBAL slow_query_log = 'ON';


SET GLOBAL log_output = 'FILE';
-- The next line is the most important part

SET GLOBAL long_query_time = 0;

long_query_time = 0 : This tells MySQL to log every single query, not just ones that take a long time. This gives pt-query-digest a
complete picture of your workload.

You can find the location of your slow query log file by running:

SHOW VARIABLES LIKE 'slow_query_log_file';

The default is often /var/lib/mysql/your-hostname-slow.log .

Run pt-query-digest. This is the simplest and most common use case: it reads the specified log file and prints a detailed report to your screen.

sudo pt-query-digest /var/lib/mysql/mysql-test-slow.log

Look for queries that have a high query time or a high lock time.

Remember, after you finish using pt-query-digest, to change the long_query_time parameter back from 0 to avoid filling up your disk and adding unnecessary overhead:

SET GLOBAL long_query_time = 10; -- Or your previous default


SET GLOBAL slow_query_log = 'OFF';

Analyzing the PROCESSLIST
This mode repeatedly checks SHOW FULL PROCESSLIST for a set duration to see what queries are running right now.

pt-query-digest --processlist h=localhost,u=root,p=your_password --run-time 30 > current_activity_report.txt

This will check the process list every second for 30 seconds and create a report.

pt-deadlock-logger

pt-deadlock-logger is another essential utility from the Percona Toolkit. While pt-query-digest is for analyzing general query performance, pt-deadlock-logger has a very specific and critical purpose: to continuously monitor for and create a permanent record of MySQL deadlocks.

When a deadlock occurs in InnoDB, MySQL resolves it automatically by choosing one transaction as a "victim" and rolling it back.
Information about this deadlock is then printed to the output of the SHOW ENGINE INNODB STATUS command.

The problem is that this output is ephemeral. It only ever shows the most recent deadlock detected. If another deadlock occurs five minutes
later, the information about the first one is lost forever. This makes it incredibly difficult to debug intermittent deadlock issues.

pt-deadlock-logger solves this by running as a background process (a daemon), constantly checking for deadlocks, and when it finds
one, it extracts the relevant information and saves it to a file or a database table. This gives you a complete historical log of every deadlock
that has occurred.

Interactive Testing (Print to Screen)

This is great for testing your connection. It will check for deadlocks every 5 seconds for a total of one minute and print any findings to your
terminal.

pt-deadlock-logger --run-time=60s --interval=5s h=localhost,u=pt_user,p=a_very_secure_password
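
To actually keep a permanent history, the tool is normally left running and pointed at a destination table. The command below is only a sketch: the database and table names are placeholders, and the flags assume a recent Percona Toolkit where --dest and --create-dest-table are available.

pt-deadlock-logger --daemonize --run-time=4h --interval=30s --dest D=percona,t=deadlocks --create-dest-table h=localhost,u=pt_user,p=a_very_secure_password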
