Day 4 MySQL and PostgreSQL performance tuning
PostgreSQL
PostgreSQL performs very well thanks to its MVCC (multi-version concurrency control), which keeps multiple versions of rows to reduce conflicts and deadlocks.
For today we have Pagila already loaded, and we will see how to troubleshoot a slow query:
how to locate slow queries and how to view the query plan in a more visual way.
Query plan
We have the query below running on the Pagila database; you can execute it once to confirm it works fine. This query is intentionally written to be potentially inefficient.
SELECT c.first_name, c.last_name, COUNT(r.rental_id) AS total_rentals, SUM(p.amount) AS total_paid
FROM customer c
JOIN rental r ON c.customer_id = r.customer_id
JOIN payment p ON r.rental_id = p.rental_id
GROUP BY c.customer_id, c.first_name, c.last_name
ORDER BY total_paid DESC
LIMIT 10;
To get the execution plan and actual runtime stats, we use EXPLAIN ANALYZE followed by the query itself:
EXPLAIN ANALYZE
SELECT c.first_name, c.last_name, COUNT(r.rental_id) AS total_rentals, SUM(p.amount) AS total_paid
FROM customer c
JOIN rental r ON c.customer_id = r.customer_id
JOIN payment p ON r.rental_id = p.rental_id
GROUP BY c.customer_id
ORDER BY SUM(p.amount) DESC
LIMIT 10;
In the EXPLAIN ANALYZE output:
Look for high cost values.
Look for high actual time values; this is the time in milliseconds spent on that step.
Look for a Seq Scan (sequential scan), which means the table is being fully scanned and may need an index on the relevant column. As you can see from the screenshot, the sequential scan is done on the payment table.
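If the sequential scan really is the bottleneck, one possible fix is to index the join column and re-run EXPLAIN ANALYZE. This is just a sketch: the index name is arbitrary, and Pagila may already ship an index on this column, so check with \d payment first.
-- Illustrative index on the join column used between payment and rental
CREATE INDEX IF NOT EXISTS idx_payment_rental_id ON payment (rental_id);
-- Re-run the EXPLAIN ANALYZE above and compare the new plan and timings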
Visualizing the Query Plan
Reading the query plan as plain text is not a great way to analyze performance, and for a huge query it takes much more of your time to work through. That is why it is best to visualize the query plan. One great website is https://explain.depesz.com/, which lets you paste EXPLAIN ANALYZE output and then visualizes the query for you.
Run EXPLAIN (ANALYZE, BUFFERS) followed by the query and paste the output into the website:
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.first_name, c.last_name, COUNT(r.rental_id) AS total_rentals, SUM(p.amount) AS total_paid
FROM customer c
JOIN rental r ON c.customer_id = r.customer_id
JOIN payment p ON r.rental_id = p.rental_id
GROUP BY c.customer_id
ORDER BY SUM(p.amount) DESC
LIMIT 10;
You can see that the site marks the problems in the query plan, and from there you can understand which part of the query is taking the most time to execute.
For this query, the sort and limit steps take most of the query time, so they need to be optimized.
You might need to check the columns involved in the sort, limit, or hash steps and create an index on those columns.
Note that EXPLAIN ANALYZE actually executes the query in order to provide live statistics for the query plan.
If you don't want to execute the query, you can use plain EXPLAIN followed by the query; it returns only the estimated execution plan.
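For example, plain EXPLAIN on the same query returns just the planner's cost and row estimates without running it:
EXPLAIN
SELECT c.first_name, c.last_name, COUNT(r.rental_id) AS total_rentals, SUM(p.amount) AS total_paid
FROM customer c
JOIN rental r ON c.customer_id = r.customer_id
JOIN payment p ON r.rental_id = p.rental_id
GROUP BY c.customer_id
ORDER BY SUM(p.amount) DESC
LIMIT 10;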
Identifying Slow Queries
To identify poorly performing queries you need to add pg_stat_statements to shared_preload_libraries in postgresql.conf. If you followed Day 1 (installing and configuring PostgreSQL), you saw that we already edited the PostgreSQL parameters and enabled pg_stat_statements.
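For reference (if it wasn't already done in Day 1), the setting in postgresql.conf looks like the line below and requires a server restart; you can then verify it from psql:
shared_preload_libraries = 'pg_stat_statements'
SHOW shared_preload_libraries;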
The best starting point is to run the query below; it shows which sessions are running queries, their usernames and client addresses, and the wait event causing a delay in query execution.
SELECT pid, usename, client_addr, state, wait_event
FROM pg_stat_activity
WHERE state = 'active' AND pid <> pg_backend_pid();
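A slightly extended variation of the same view that I find handy adds how long each statement has been running and its text (column names are all standard pg_stat_activity columns):
SELECT pid, usename, client_addr, state, wait_event_type, wait_event,
       now() - query_start AS running_for, query
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
ORDER BY running_for DESC;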
The query below is also useful to see whether a query is waiting to acquire a lock; it shows the blocked PID and the blocking PID.
SELECT
  -- Details of the process holding the lock
  pl.pid AS blocking_pid,
  pa.query AS blocking_query,
  -- Details of the process being blocked
  al.pid AS blocked_pid,
  a.query AS blocked_query,
  to_char(a.query_start, 'YYYY-MM-DD HH24:MI:SS') AS blocked_query_started
FROM pg_locks al
JOIN pg_stat_activity a ON al.pid = a.pid
JOIN pg_locks pl
  ON al.locktype = pl.locktype
  AND al.database IS NOT DISTINCT FROM pl.database
  AND al.relation IS NOT DISTINCT FROM pl.relation
  AND al.page IS NOT DISTINCT FROM pl.page
  AND al.tuple IS NOT DISTINCT FROM pl.tuple
  AND al.virtualxid IS NOT DISTINCT FROM pl.virtualxid
  AND al.transactionid IS NOT DISTINCT FROM pl.transactionid
  AND al.classid IS NOT DISTINCT FROM pl.classid
  AND al.objid IS NOT DISTINCT FROM pl.objid
  AND al.objsubid IS NOT DISTINCT FROM pl.objsubid
  AND al.pid <> pl.pid
JOIN pg_stat_activity pa ON pl.pid = pa.pid
WHERE
  NOT al.granted;
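On PostgreSQL 9.6 or newer there is also a much shorter alternative using the built-in pg_blocking_pids() function, which returns the PIDs blocking a given backend:
SELECT pid AS blocked_pid,
       pg_blocking_pids(pid) AS blocking_pids,
       wait_event_type,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;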
What is pg_stat_statements ?
It is a PostgreSQL extension that tracks execution statistics for all SQL statements executed on your server. For every unique query, it
records key metrics like:
How many times the query was executed.
The total time spent executing that query.
The average execution time.
How much data it read from memory vs. disk.
How many rows it returned.
This information is invaluable for identifying slow, resource-intensive, or frequently run queries that are prime candidates for optimization.
To start, we need to enable the extension on the database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
Then we can run various queries to retrieve statements based on the metric we specify.
Query 1: Find the Top 10 Most Time-Consuming Queries Overall
This query shows which queries are responsible for the most cumulative time spent on the server. These are often the best candidates for
optimization.
SELECT
total_exec_time,
calls,
mean_exec_time,
query
FROM
pg_stat_statements
ORDER BY
total_exec_time DESC
LIMIT 10;
Query 2: Find the Top 10 Most Frequently Executed Queries
This helps you find "chatty" application behavior. A query might be very fast individually, but if it's called millions of times, it can create
significant load.
SELECT
calls,
total_exec_time,
mean_exec_time,
query
FROM
pg_stat_statements
ORDER BY
calls DESC
LIMIT 10;
Query 3: Find Queries that Read the Most from Disk
Queries that read a lot from disk (instead of from memory/cache) are often slow due to I/O waits. This can indicate missing indexes.
SELECT
(shared_blks_read + local_blks_read + temp_blks_read) AS total_disk_reads,
query
FROM
pg_stat_statements
ORDER BY
total_disk_reads DESC
LIMIT 10;
shared_blks_read : Data read from disk for your tables/indexes. High numbers here often point to missing indexes.
temp_blks_read : Data read from temporary on-disk files, often caused by large sorts or joins that don't fit in work_mem .
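To put those counters in context, you can compute a rough per-query cache hit percentage (a sketch; hits come from shared buffers, reads come from disk or the OS cache):
SELECT round(100.0 * shared_blks_hit
             / nullif(shared_blks_hit + shared_blks_read, 0), 2) AS cache_hit_pct,
       shared_blks_read,
       temp_blks_read,
       query
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;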
Resetting the Statistics
After you perform a major optimization (like adding an index or rewriting a query), you may want to reset the statistics to get a fresh baseline.
You can clear all collected statistics by running:
SELECT pg_stat_statements_reset();
MySQL
MySQL has a set of system views called the Performance Schema that holds statistics for everything running in MySQL.
In addition, showing the execution plan for a query is very similar to PostgreSQL:
use EXPLAIN ANALYZE followed by the query for the actual (live) plan,
or EXPLAIN followed by the query for the estimated query plan.
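As a rough MySQL counterpart of pg_stat_statements, the Performance Schema keeps per-statement-digest statistics. A sketch of pulling the top time consumers (timers are in picoseconds, hence the divisions; this assumes the default statements_digest consumer is enabled):
SELECT DIGEST_TEXT,
       COUNT_STAR AS calls,
       ROUND(SUM_TIMER_WAIT / 1e12, 2) AS total_exec_sec,
       ROUND(AVG_TIMER_WAIT / 1e9, 2) AS avg_exec_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;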
This query filters by an actor's first and last name, which is often a source of inefficiency if not indexed correctly.
-- Find all films for a specific actor and their stock levels
SELECT f.title, s.store_id, COUNT(i.inventory_id) AS number_in_stock
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
JOIN inventory i ON f.film_id = i.film_id
JOIN store s ON i.store_id = s.store_id
WHERE a.first_name = 'PENELOPE' AND a.last_name = 'GUINESS'
GROUP BY f.title, s.store_id
ORDER BY f.title;
Get the execution plan:
EXPLAIN
SELECT f.title, s.store_id, COUNT(i.inventory_id) AS number_in_stock
FROM actor a
JOIN film_actor fa ON a.actor_id = fa.actor_id
JOIN film f ON fa.film_id = f.film_id
JOIN inventory i ON f.film_id = i.film_id
JOIN store s ON i.store_id = s.store_id
WHERE a.first_name = 'PENELOPE' AND a.last_name = 'GUINESS'
GROUP BY f.title, s.store_id
ORDER BY f.title;
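If the plan shows a full table scan on actor for that WHERE clause, an index on the filtered columns is the usual fix. This is only a sketch: the index name is illustrative, and Sakila may already ship an index on last_name, so check SHOW INDEX FROM actor first.
-- Illustrative composite index covering the WHERE clause on actor
CREATE INDEX idx_actor_last_first ON actor (last_name, first_name);
-- Then re-run the EXPLAIN above and compare the plans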
Visualizing the Query Plan
You can paste the query plan into the following website, MySQL Visual Explain, to get a more visual query plan that makes it easier to identify the costs in the plan. Follow along with the site's guide and you will get a much cleaner query plan.
Identifying Active Sessions and Locks
Use the SHOW FULL PROCESSLIST command to see what all the connected threads are doing.
-- Show all active connections and their queries
SHOW FULL PROCESSLIST;
Look at the Time column to see how long a query has been running and the Info column for the query text.
This is your go-to command for seeing "what's running right now."
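If you prefer something you can filter and sort, the same information is exposed through information_schema.PROCESSLIST (or performance_schema.processlist on recent MySQL 8.0 releases), for example:
SELECT ID, USER, HOST, DB, TIME, STATE, LEFT(INFO, 100) AS query_snippet
FROM information_schema.PROCESSLIST
WHERE COMMAND <> 'Sleep'
ORDER BY TIME DESC;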
How to Check for Locks and Deadlocks:
SELECT
r.trx_id AS waiting_trx_id,
r.trx_mysql_thread_id AS waiting_thread,
r.trx_query AS waiting_query,
b.trx_id AS blocking_trx_id,
b.trx_mysql_thread_id AS blocking_thread,
b.trx_query AS blocking_query
FROM
performance_schema.data_lock_waits AS w
JOIN
information_schema.innodb_trx AS b ON b.trx_id = w.blocking_engine_transaction_id
JOIN
information_schema.innodb_trx AS r ON r.trx_id = w.requesting_engine_transaction_id;
Run the query multiple times; if no result is returned, that means there is no blocking.
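If a blocking transaction needs to be cleared manually, you can terminate its thread using the blocking_thread value from the result (1234 below is just a placeholder):
KILL 1234; -- replace with the blocking_thread id returned by the query above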
Using Percona Toolkit
I personally prefer to use Percona Toolkit for faster troubleshooting. It runs various scripts and returns an overall status of MySQL, including wait types, hardware metrics to identify whether there are any hardware bottlenecks, and the status of each database with query statistics.
Install Percona Toolkit by following the steps at this link: https://docs.percona.com/percona-toolkit/
For MySQL we will use pt-query-digest. The most common approach is to enable the slow query log, either from my.cnf for a permanent setting or temporarily using SET GLOBAL from inside MySQL:
mysql -uroot -p
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL log_output = 'FILE';
-- The next line is the most important part
SET GLOBAL long_query_time = 0;
long_query_time = 0 : This tells MySQL to log every single query, not just ones that take a long time. This gives pt-query-digest a
complete picture of your workload.
You can find the location of your slow query log file by running:
SHOW VARIABLES LIKE 'slow_query_log_file';
The default is often /var/lib/mysql/your-hostname-slow.log .
Run pt-query-digest. This is the simplest and most common use case: it reads the specified log file and prints a detailed report to your screen.
sudo pt-query-digest /var/lib/mysql/mysql-test-slow.log
Look for queries that have a high query time or a high lock time.
Remember, after you finish using pt-query-digest, to change long_query_time back from 0 to avoid filling up your disk and adding unnecessary overhead:
SET GLOBAL long_query_time = 10; -- Or your previous default
SET GLOBAL slow_query_log = 'OFF';
Analyzing the PROCESSLIST
This mode repeatedly checks SHOW FULL PROCESSLIST for a set duration to see what queries are running right now.
pt-query-digest --processlist h=localhost,u=root,p=your_password --run-time 30 > current_activity_report.txt
This will check the process list every second for 30 seconds and create a report.
pt-deadlock-logger
This is another essential utility from the Percona Toolkit. While pt-query-digest is for analyzing general query performance, pt-deadlock-logger has a very specific and critical purpose: to continuously monitor for and create a permanent record of MySQL deadlocks.
When a deadlock occurs in InnoDB, MySQL resolves it automatically by choosing one transaction as a "victim" and rolling it back.
Information about this deadlock is then printed to the output of the SHOW ENGINE INNODB STATUS command.
The problem is that this output is ephemeral. It only ever shows the most recent deadlock detected. If another deadlock occurs five minutes
later, the information about the first one is lost forever. This makes it incredibly difficult to debug intermittent deadlock issues.
pt-deadlock-logger solves this by running as a background process (a daemon), constantly checking for deadlocks, and when it finds
one, it extracts the relevant information and saves it to a file or a database table. This gives you a complete historical log of every deadlock
that has occurred.
Interactive Testing (Print to Screen)
This is great for testing your connection. It will check for deadlocks every 5 seconds for a total of one minute and print any findings to your
terminal.
pt-deadlock-logger --run-time=60s --interval=5s h=localhost,u=pt_user,p=a_very_secure_password
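To keep a permanent history instead of printing to the terminal, pt-deadlock-logger can write each deadlock into a table via --dest. The sketch below uses placeholder DSN values; check the Percona Toolkit documentation for the exact option behavior on your version.
pt-deadlock-logger h=localhost,u=pt_user,p=a_very_secure_password \
  --dest D=percona,t=deadlocks --create-dest-table \
  --daemonize --run-time=24h --interval=30s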