Evaluation de requêtes et
optimisation
Prérequis abordés en L2-L3
En L2:
●
SQL – requêtes simples et complexes
●
Algèbre relationnelle – requêtes simples
En L3
●
Stockage des données en mémoire
●
Indexation
Cette partie du cours
Objectifs:
●
Principes de l’évaluation de requêtes
●
Principaux algorithmes et coûts associés
●
Savoir analyser un “query plan” (dans postgreSQL)
●
Pistes pour améliorer des requêtes lentes
Attention, concepts communs à plusieurs SGBD
mais les détails varient selon les systèmes
(cours basé sur le cours de C. Sirangelo Univ Paris7)
Rappels - stockage
Enregistrement - tuple
Bloc
une entête
et plusieurs enregistrements
unité de chargement des
données en mémoire principale
Fichier de données - table
collection de blocs
organisation séquentielle ou en tas
Rappels – index
Objectif : accélérer l’accès aux tuples d’une table ayant une
valeur particulière d’un (ou plusieurs) attributs appelé clé de
recherche de l’index.
Ensemble de
<clé, pointeur>
ie un dictionnaire
Accès aux données : Chercher la valeur de clé dans l’index
puis charger le(s) blocs correspondants aux enregistrements.
(dans un indexe dense)
Rappels – indexes
Ensemble de
<clé, pointeur>
ie un dictionnaire
Types d’index: dense/non-dense, primaire/secondaire
Dans tous les cas, la manipulation d’un fichier de données
indexé nécéssite des opérations sur les index:
Recherche pour une valeur de clé
Insertion d’une couple <clé, pointeur>
Suppression d’un couple <clé, pointeur>
Rappels – indexes
Avantage de la création d’un index sur une table:
évaluation efficace des requêtes portant sur la clé de
recherche correspondante
Inconvénient:
mise à jour de l’index en cas de modification de la table
(pose problème en cas de mises à jour très fréquentes)
Structures de données utilisées dans les implémentations
Arbres : B+-tree et B-tree
Tables de hachage : hash
D’autres structures existent pour des utilisations spécifiques
Rappels – indexes
Exemple de Table de hachage
Rappels – indexes
Exemple de B+-tree
pointeurs vers les enregistrements
Dans un B-tree, les couples <clé, pointeur enregistrement>
sont répartis entre les noeuds internes et les feuilles.
Rappels – indexation
Syntaxe en postgreSQL:
CREATE INDEX name ON table [USING method](columns);
DROP INDEX name;
Par défaut les indexes postgreSQL sont un genre de B-tree.
Le type hash existe aussi.
(Il était déconseillé jusqu’à la version 10. voir la doc)
Rappels – Algèbre relationelle
SQL / Calcul relationnel
Déclaratif - décrit ce qu'on veut avoir en résultat
– cad les propriétés des tuples résultats
Algèbre relationnelle (Õ, , x, , , –, ⋈...)
Procédural – décrit comment calculer le résultat
– une combinaison d'opérations sur des ensembles.
Utile pour représenter un plan d'exécution.
A la base de l'optimisation de requête pour SQL
Algèbre relationnelle = Calcul relationnel (sûr)
= SQL (sans agrégat ni groupement)
Rappels – Algèbre relationelle
SQL : SELECT DISTINCT nom, intitule
FROM enseignant NATURAL JOIN enseigne NATURAL JOIN cours
WHERE dpt = info;
Requête d’algèbre relationelle équivalente:
Deux autres réécritures possibles:
...
Q1 = select avatar from participe where idS=200;
Q2 = select avatar from participe where idS=145;
Q3 = select avatar from participe where idS=60;
Q1 = select avatar from participe where idS=200;
QUERY PLAN
----------------------------------------------------------------------------------
Index Scan using participe_pkey on participe (cost=0.28..37.02 rows=20 width=7)
Index Cond: (ids = 200)
Q2 = select avatar from participe where idS=145;
QUERY PLAN
-------------------------------------------------------------------------------
Bitmap Heap Scan on participe (cost=5.44..50.32 rows=150 width=7)
Recheck Cond: (ids = 145)
-> Bitmap Index Scan on participe_pkey (cost=0.00..5.41 rows=150 width=0)
Index Cond: (ids = 145)
Q3 = select avatar from participe where idS=60;
QUERY PLAN
----------------------------------------------------------------------------------
Index Scan using participe_pkey on participe (cost=0.28..37.02 rows=20 width=7)
Index Cond: (ids = 60)
Q1 = select avatar from participe where idS=200;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Index Scan using participe_pkey on participe (cost=0.28..37.02 rows=20 width=7)
(actual time=0.016..0.018 rows=3 loops=1)
Index Cond: (ids = 200)
Planning Time: 0.070 ms
Execution Time: 0.041 ms
Q2 = select avatar from participe where idS=145;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on participe (cost=5.44..50.32 rows=150 width=7) (actual time=0.028..0.052 rows=150 loops=1)
Recheck Cond: (ids = 145)
Heap Blocks: exact=3
-> Bitmap Index Scan on participe_pkey (cost=0.00..5.41 rows=150 width=0)
(actual time=0.019..0.019 rows=150 loops=
Index Cond: (ids = 145)
Planning Time: 0.059 ms
Execution Time: 0.085 ms
Q3 = select avatar from participe where idS=60;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Index Scan using participe_pkey on participe (cost=0.28..37.02 rows=20 width=7)
(actual time=0.158..0.171 rows=14 loops=1)
Index Cond: (ids = 60)
Planning Time: 0.157 ms
Execution Time: 0.226 ms
Q4 = select idS, surnom, avatar from participe natural join soiree where lieu ='caen';
Q5 = select P.idS, surnom, avatar from participe P, soiree S where P.idS=S.idS and lieu = 'caen';
Q6 = select idS, surnom, avatar from participe where
exists ( select idS from soiree where participe.idS=soiree.idS and lieu = 'caen');
Q7 = select idS, surnom, avatar from participe where
idS in (select idS from soiree where participe.idS=soiree.idS and lieu = 'caen');
Q4 = select idS, surnom, avatar from participe natural join soiree where lieu ='caen';
Q5 = select P.idS, surnom, avatar from participe P, soiree S where P.idS=S.idS and lieu = 'caen';
Q6 = select idS, surnom, avatar from participe where
exists ( select idS from soiree where participe.idS=soiree.idS and lieu = 'caen');
QUERY PLAN
---------------------------------------------------------------------
Hash Join (cost=4.61..131.67 rows=302 width=18)
Hash Cond: (participe.ids = soiree.ids)
-> Seq Scan on participe (cost=0.00..109.07 rows=6707 width=18)
-> Hash (cost=4.50..4.50 rows=9 width=4)
-> Seq Scan on soiree (cost=0.00..4.50 rows=9 width=4)
Filter: ((lieu)::text = 'caen'::text)
Q7 = select idS, surnom, avatar from participe where
idS in (select idS from soiree where participe.idS=soiree.idS and lieu = 'caen');
QUERY PLAN
-----------------------------------------------------------------------------
Seq Scan on participe (cost=0.00..16901.72 rows=3354 width=18)
Filter: (SubPlan 1)
SubPlan 1
-> Seq Scan on soiree (cost=0.00..5.00 rows=1 width=4)
Filter: ((participe.ids = ids) AND ((lieu)::text = 'caen'::text))
Q4 = select idS, surnom, avatar from participe natural join soiree where lieu ='caen';
Q5 = select P.idS, surnom, avatar from participe P, soiree S where P.idS=S.idS and lieu = 'caen';
Q6 = select idS, surnom, avatar from participe where
exists ( select idS from soiree where participe.idS=soiree.idS and lieu = 'caen');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Hash Join (cost=4.61..131.67 rows=302 width=18) (actual time=0.149..1.154 rows=281 loops=1)
Hash Cond: (participe.ids = soiree.ids)
-> Seq Scan on participe (cost=0.00..109.07 rows=6707 width=18) (actual time=0.006..0.527 rows=6707 loops=1)
-> Hash (cost=4.50..4.50 rows=9 width=4) (actual time=0.025..0.026 rows=9 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on soiree (cost=0.00..4.50 rows=9 width=4) (actual time=0.008..0.022 rows=9 loops=1)
Filter: ((lieu)::text = 'caen'::text)
Rows Removed by Filter: 191
Planning Time: 0.175 ms
Execution Time: 1.182 ms
Q7 = select idS, surnom, avatar from participe where
idS in (select idS from soiree where participe.idS=soiree.idS and lieu = 'caen');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on participe (cost=0.00..16901.72 rows=3354 width=18) (actual time=10.439..105.118 rows=281 loops=1)
Filter: (SubPlan 1)
Rows Removed by Filter: 6426
SubPlan 1
-> Seq Scan on soiree (cost=0.00..5.00 rows=1 width=4) (actual time=0.015..0.015 rows=0 loops=6707)
Filter: ((participe.ids = ids) AND ((lieu)::text = 'caen'::text))
Rows Removed by Filter: 195
Planning Time: 0.109 ms
Execution Time: 105.149 ms
Q8 = select surnom from participe where avatar = 'citrouille' and avatar = 'schtroumpf';
Q9 = (select surnom from participe where avatar = 'citrouille') intersect
(select surnom from participe where avatar = 'schtroumpf');
Q10 = select surnom from participe where avatar = 'citrouille' and
surnom in (select surnom from participe where avatar = 'schtroumpf');
Q8 = select surnom from participe where avatar = 'citrouille' and avatar = 'schtroumpf';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Result (cost=0.00..128.00 rows=1 width=7)
One-Time Filter: false
-> Seq Scan on participe (cost=0.00..128.00 rows=1 width=7)
Filter: ((avatar)::text = 'citrouille'::text)
(4 rows)
Q8 = select surnom from participe where avatar = 'citrouille' and avatar = 'schtroumpf';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Result (cost=0.00..128.00 rows=1 width=7) (actual time=0.002..0.003 rows=0 loops=1)
One-Time Filter: false
-> Seq Scan on participe (cost=0.00..128.00 rows=1 width=7) (never executed)
Filter: ((avatar)::text = 'citrouille'::text)
Planning Time: 0.158 ms
Execution Time: 0.035 ms
(6 rows)
Q9 = (select surnom from participe where avatar = 'citrouille') intersect
(select surnom from participe where avatar = 'schtroumpf');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
HashSetOp Intersect (cost=0.00..269.56 rows=269 width=72)
-> Append (cost=0.00..267.62 rows=775 width=72)
-> Subquery Scan on "*SELECT* 1" (cost=0.00..131.06 rows=306 width=11)
-> Seq Scan on participe (cost=0.00..128.00 rows=306 width=7)
Filter: ((avatar)::text = 'citrouille'::text)
-> Subquery Scan on "*SELECT* 2" (cost=0.00..132.69 rows=469 width=11)
-> Seq Scan on participe participe_1 (cost=0.00..128.00 rows=469 width=7)
Filter: ((avatar)::text = 'schtroumpf'::text)
(8 rows)
Q10 = select surnom from participe where avatar = 'citrouille' and
surnom in (select surnom from participe where avatar = 'schtroumpf');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=133.86..267.52 rows=156 width=7)
Hash Cond: ((participe.surnom)::text = (participe_1.surnom)::text)
-> Seq Scan on participe (cost=0.00..128.00 rows=306 width=7)
Filter: ((avatar)::text = 'citrouille'::text)
-> Hash (cost=128.00..128.00 rows=469 width=7)
-> Seq Scan on participe participe_1 (cost=0.00..128.00 rows=469 width=7)
Filter: ((avatar)::text = 'schtroumpf'::text)
(7 rows)
Q9 = (select surnom from participe where avatar = 'citrouille') intersect
(select surnom from participe where avatar = 'schtroumpf');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
HashSetOp Intersect (cost=0.00..269.56 rows=269 width=72) (actual time=4.613..4.650 rows=101 loops=1)
-> Append (cost=0.00..267.62 rows=775 width=72) (actual time=0.036..4.228 rows=775 loops=1)
-> Subquery Scan on "*SELECT* 1" (cost=0.00..131.06 rows=306 width=11) (actual time=0.035..1.967 rows=306 loops=1)
-> Seq Scan on participe (cost=0.00..128.00 rows=306 width=7) (actual time=0.032..1.879 rows=306 loops=1)
Filter: ((avatar)::text = 'citrouille'::text)
Rows Removed by Filter: 6494
-> Subquery Scan on "*SELECT* 2" (cost=0.00..132.69 rows=469 width=11) (actual time=0.038..2.113 rows=469 loops=1)
-> Seq Scan on participe participe_1 (cost=0.00..128.00 rows=469 width=7) (actual time=0.037..1.984 rows=469 loops=1)
Filter: ((avatar)::text = 'schtroumpf'::text)
Rows Removed by Filter: 6331
Planning Time: 0.206 ms
Execution Time: 4.734 ms
(12 rows)
Q10 = select surnom from participe where avatar = 'citrouille' and
surnom in (select surnom from participe where avatar = 'schtroumpf');
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=133.86..267.52 rows=156 width=7) (actual time=2.357..4.267 rows=119 loops=1)
Hash Cond: ((participe.surnom)::text = (participe_1.surnom)::text)
-> Seq Scan on participe (cost=0.00..128.00 rows=306 width=7) (actual time=0.031..1.792 rows=306 loops=1)
Filter: ((avatar)::text = 'citrouille'::text)
Rows Removed by Filter: 6494
-> Hash (cost=128.00..128.00 rows=469 width=7) (actual time=2.276..2.277 rows=469 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 26kB
-> Seq Scan on participe participe_1 (cost=0.00..128.00 rows=469 width=7) (actual time=0.024..2.051 rows=469 loops=1)
Filter: ((avatar)::text = 'schtroumpf'::text)
Rows Removed by Filter: 6331
Planning Time: 0.871 ms
Execution Time: 4.323 ms
(12 rows)
Q11 = select surnom from personne where not exists (
select avatar from participe p1 where not exists (
select * from participe p2 where p2.avatar =p1.avatar
and p2.surnom = personne.surnom));
Q12 = select surnom from personne where not exists (
select avatar from participe p1 where avatar not in (
select avatar from participe p2 where p2.surnom = personne.surnom));
Q13 = select surnom from personne where not exists (
select avatar from participe p1 where personne.surnom not in (
select surnom from participe p2 where p2.avatar =p1.avatar));
Q11 = select surnom from personne where not exists (
select avatar from participe p1 where not exists (
select * from participe p2 where p2.avatar =p1.avatar
and p2.surnom = personne.surnom));
Execution Time: 4.663 ms
Q12 = select surnom from personne where not exists (
select avatar from participe p1 where avatar not in (
select avatar from participe p2 where p2.surnom = personne.surnom));
Execution Time: 1017477.510 ms
Q13 = select surnom from personne where not exists (
select avatar from participe p1 where personne.surnom not in (
select surnom from participe p2 where p2.avatar =p1.avatar));
Execution Time: 674.501 ms
Q14 = Select surnom, age, date, avatar, entree from soiree natural join personne
natural join participe where entree >10 and age <20
QUERY PLAN
---------------------------------------------------------------------------
Hash Join (cost=31.79..168.95 rows=1364 width=26)
Hash Cond: ((participe.surnom)::text = (personne.surnom)::text)
-> Hash Join (cost=5.62..134.85 rows=3011 width=22)
Hash Cond: (participe.ids = soiree.ids)
-> Seq Scan on participe (cost=0.00..111.00 rows=6800 width=18)
-> Hash (cost=4.51..4.51 rows=89 width=12)
-> Seq Scan on soiree (cost=0.00..4.51 rows=89 width=12)
Execution Time: 3.248 ms
Filter: (entree > 10)
-> Hash (cost=20.50..20.50 rows=453 width=11)
-> Seq Scan on personne (cost=0.00..20.50 rows=453 width=11)
Filter: (age < 20)
(11 rows)
set join_collapse_limit to 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------
Hash Join (cost=213.00..953.86 rows=1364 width=26)
Hash Cond: ((soiree.ids = participe.ids) AND ((personne.surnom)::text = (participe.surnom)::text))
-> Nested Loop (cost=0.00..529.20 rows=40317 width=23)
-> Seq Scan on personne (cost=0.00..20.50 rows=453 width=11)
Filter: (age < 20)
-> Materialize (cost=0.00..4.96 rows=89 width=12)
-> Seq Scan on soiree (cost=0.00..4.51 rows=89 width=12)
Filter: (entree > 10) Execution Time: 20.448 ms
-> Hash (cost=111.00..111.00 rows=6800 width=18)
-> Seq Scan on participe (cost=0.00..111.00 rows=6800 width=18)
(10 rows)
leia-citrouille=> select * from pg_catalog.pg_stats where tablename='participe' and attname='ids';
-[ RECORD 1 ]----------+---------------------------------------------------------------------------------------------------------------
schemaname | public
tablename | participe
attname | ids
inherited |f
null_frac |0
avg_width |4
n_distinct | 201
most_common_vals | {145,141,62,188,11,59,78,104,140,162}
most_common_freqs | {0.0220588,0.00882353,0.00867647,0.00867647,0.00852941,0.00852941,0.00852941,0.00
histogram_bounds | {0,24,41,64,80,95,115,134,159,180,200}
correlation | 0.994659
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |