Lecture 3: SQL cont.
Outline
Unions, intersections, differences
(6.2.5, 6.4.2)
Subqueries (6.3)
Aggregations (6.4.3 6.4.6)
Hint for reading the textbook: read the entire chapter 6 !
Recommended reading from SQL for Nerds: chapter 4, More
complex queries (you will find it very useful for subqueries)
2
Cartesian Product
SELECT
SELECT **
FROM
FROM Games
Games
Game
Color
Baketball
Footbal
Tennis
SELECT
SELECT **
FROM
FROM Colors
Colors
Color_ID
Color_Nam
e
RED
GREEN
Cartesian Product
SELECT
SELECT **
FROM
FROM Games,
Games,Colors
Colors
Game
Color
Color_ID
Color_Name
Baketball
RED
Baketball
GREEN
Footbal
RED
Footbal
GREEN
Tennis
RED
Tennis
GREEN
4
Cartesian Product
SELECT
SELECT **
FROM
FROM Games,
Games,Colors
Colors
WHERE
WHEREColor
Color==Color_ID
Color_ID
Game
Color
Color_ID
Color_Name
Baketball
RED
Baketball
GREEN
Footbal
RED
Footbal
GREEN
Tennis
RED
Tennis
GREEN
5
Cartesian Product
SELECT
SELECT **
FROM
FROM Games,
Games,Colors
Colors
WHERE
WHEREColor
Color==Color_ID
Color_ID
Game
Color
Color_ID
Color_Name
Baketball
RED
Footbal
GREEN
Tennis
RED
Renaming Columns
Product
SELECT
SELECT
FROM
FROM
WHERE
WHERE
PName
Price
Category
Manufacturer
Gizmo
$19.99
Gadgets
GizmoWorks
Powergizmo
$29.99
Gadgets
GizmoWorks
SingleTouch
$149.99
Photography
Canon
MultiTouch
$203.99
Household
Hitachi
Pname
PnameAS
ASprodName,
prodName,Price
PriceAS
ASaskPrice
askPrice
Product
Product
Price
Price>>100
100
Query with
renaming
prodName
askPrice
SingleTouch
$149.99
MultiTouch
$203.99
7
Union, Intersection, Difference
(SELECT
(SELECT name
name
FROM
FROM Person
Person
WHERE
WHERE City=Seattle)
City=Seattle)
UNION
UNION
(SELECT
(SELECT name
name
FROM
FROM Person,
Person,Purchase
Purchase
WHERE
WHERE buyer=name
buyer=nameAND
ANDstore=The
store=TheBon)
Bon)
Similarly, you can use INTERSECT and EXCEPT.
8
You must have the same attribute names (otherwise:
rename)
First Unintuitive SQLism
SELECT
SELECT DISTINCT
DISTINCTR.A
R.A
FROM
FROM R,
R,S,
S,TT
WHERE
WHERE R.A=S.A
R.A=S.A OR
OR R.A=T.A
R.A=T.A
Looking for R (S U T)
But what happens if T is empty?
9
((SELECT
SELECTR.A
R.AFROM
FROMRR))
INTERSECT
INTERSECT
((
((SELECT
SELECTS.A
S.AFROM
FROMSS))
UNION
UNION
((SELECT
SELECTT.A
T.AFROM
FROMTT))
))
10
Conserving Duplicates
(SELECT
(SELECT name
name
FROM
FROM Person
Person
WHERE
WHERE City=Seattle)
City=Seattle)
UNION
UNION ALL
ALL
(SELECT
(SELECT name
name
FROM
FROM Person,
Person,Purchase
Purchase
WHERE
WHERE buyer=name
buyer=nameAND
ANDstore=The
store=TheBon)
Bon)
11
Subqueries (Static..)
A subquery producing a single value:
SELECT
SELECTPurchase.product
Purchase.product
FROM
FROM Purchase
Purchase
WHERE
WHERE buyer
buyer==
(SELECT
(SELECT name
name
FROM
FROM Person
Person
WHERE
WHERE ssn
ssn==123456789);
123456789);
In this case, the subquery returns one value.
If it returns more, its a run-time error
12
Can say the same thing without a subquery:
SELECT
SELECTPurchase.product
Purchase.product
FROM
FROM Purchase,
Purchase,Person
Person
WHERE
WHERE buyer
buyer==name
nameAND
ANDssn
ssn==123456789
123456789
This is equivalent to the previous one when the ssn is a key
and 123456789 exists in the database;
otherwise they are different.
Why??
13
Subqueries Returning Relations
Find companies who manufacture products bought by Joe Blow.
SELECT
SELECT Company.name
Company.name
FROM
FROM Company,
Company,Product
Product
WHERE
WHERE Company.name=Product.maker
Company.name=Product.maker
AND
AND Product.name
Product.name IN
IN
(SELECT
(SELECTPurchase.product
Purchase.product
FROM
FROM Purchase
Purchase
WHERE
WHEREPurchase.buyer
Purchase.buyer==Joe
JoeBlow);
Blow);
Here the subquery returns a set of values: no more
runtime errors
14
Subqueries Returning Relations
Equivalent to:
SELECT
SELECT Company.name
Company.name
FROM
FROM Company,
Company,Product,
Product,Purchase
Purchase
WHERE
WHERE Company.name=
Company.name=Product.maker
Product.maker
AND
AND Product.name
Product.name ==Purchase.product
Purchase.product
AND
AND Purchase.buyer
Purchase.buyer==Joe
JoeBlow
Blow
Is this query equivalent to the previous one ?
Beware of duplicates !
15
Removing Duplicates
SELECT
SELECT Company.name
Company.name
FROM
FROM Company,
Company,Product,
Product,Purchase
Purchase
WHERE
WHERE Company.name=
Company.name=Product.maker
Product.maker
AND
AND Product.name
Product.name ==Purchase.product
Purchase.product
AND
ANDPurchase.buyer
Purchase.buyer==Joe
JoeBlow
Blow
Multiple copies
SELECT
SELECTDISTINCT
DISTINCTCompany.name
Company.name
FROM
FROM Company,
Company,Product,
Product,Purchase
Purchase
WHERE
WHERE Company.name=
Company.name=Product.maker
Product.maker
AND
AND Product.name
Product.name ==Purchase.product
Purchase.product
AND
ANDPurchase.buyer
Purchase.buyer==Joe
JoeBlow
Blow
Single copies
16
Removing Duplicates
SELECT
SELECTDISTINCT
DISTINCTCompany.name
Company.name
FROM
FROM Company,
Company,Product
Product
WHERE
WHERE Company.name=
Company.name=Product.maker
Product.maker
AND
AND Product.name
Product.name IN
IN
(SELECT
(SELECTPurchase.product
Purchase.product
FROM
FROM Purchase
Purchase
WHERE
WHEREPurchase.buyer
Purchase.buyer==Joe
JoeBlow)
Blow)
SELECT
SELECTDISTINCT
DISTINCTCompany.name
Company.name
FROM
FROM Company,
Company,Product,
Product,Purchase
Purchase
WHERE
WHERE Company.name=
Company.name=Product.maker
Product.maker
AND
AND Product.name
Product.name ==Purchase.product
Purchase.product
AND
AND Purchase.buyer
Purchase.buyer==Joe
JoeBlow
Blow 17
Now
they are
equivalent
Subqueries Returning Relations
You can also use: s > ALL R
s > ANY R
EXISTS R
Product ( pname, price, category, maker)
Find products that are more expensive than all those produced
By Gizmo-Works
SELECT
SELECT name
name
FROM
FROM Product
Product
WHERE
WHERE price
price>> ALL
ALL(SELECT
(SELECTprice
price
FROM
FROM Purchase
Purchase
WHERE
WHERE maker=Gizmo-Works)
maker=Gizmo-Works)
18
Question for Database Fans
and their Friends
Can we express this query as a single SELECTFROM-WHERE query, without subqueries ?
Hint: show that all SFW queries are monotone
(figure out what this means). A query with ALL
is not monotone
19
Conditions on Tuples
SELECT
SELECT DISTINCT
DISTINCTCompany.name
Company.name
FROM
FROM Company,
Company,Product
Product
WHERE
WHERE Company.name=
Company.name=Product.maker
Product.maker
AND
AND (Product.name,price)
(Product.name,price) IN
IN
(SELECT
(SELECTPurchase.product,
Purchase.product,Purchase.price)
Purchase.price)
FROM
FROM Purchase
Purchase
WHERE
WHEREPurchase.buyer
Purchase.buyer==Joe
JoeBlow);
Blow);
May not work in MySQL...
20
Correlated Queries
Movie (title, year, director, length)
Find movies whose title appears more than once.
correlation
SELECT
SELECTDISTINCT
DISTINCTtitle
title
FROM
FROM Movie
MovieAS
ASxx
WHERE
WHERE year
year<>
<>ANY
ANY
(SELECT
(SELECT year
year
FROM
FROM Movie
Movie
WHERE
WHERE title
title== x.title);
x.title);
Note (1) scope of variables (2) this can still be expressed as single SFW
21
Complex Correlated Query
Product ( pname, price, category, maker, year)
Find products (and their manufacturers) that are more expensive
than all products made by the same manufacturer before 1972
SELECT
SELECTDISTINCT
DISTINCT pname,
pname,maker
maker
FROM
FROM Product
ProductAS
ASxx
WHERE
WHERE price
price>>ALL
ALL (SELECT
(SELECT price
price
FROM
FROM Product
ProductAS
ASyy
WHERE
WHERE x.maker
x.maker==y.maker
y.makerAND
ANDy.year
y.year<<1972);
1972);
Powerful, but much harder to optimize !
22
Aggregation
SELECT
SELECT
FROM
FROM
WHERE
WHERE
AVG(price)
AVG(price)
Product
Product
maker=Toyota
maker=Toyota
SQL supports several aggregation operations:
SUM, MIN, MAX, AVG, COUNT
23
Aggregation: Count
SELECT
SELECT COUNT(*)
COUNT(*)
FROM
FROM Product
Product
WHERE
WHERE year
year>>1995
1995
Except COUNT, all aggregations apply to a single attribute
24
Aggregation: Count
COUNT applies to duplicates, unless otherwise stated:
SELECT Count(category)
FROM Product
WHERE year > 1995
same as Count(*)
Better:
SELECT Count(DISTINCT category)
FROM Product
WHERE year > 1995
25
Simple Aggregation
Purchase(product, date, price, quantity)
Example 1: find total sales for the entire database
SELECT Sum(price * quantity)
FROM
Purchase
Example 1: find total sales of bagels
SELECT Sum(price * quantity)
FROM
Purchase
WHERE product = bagel
26
Simple
Aggregations
Purchase
27
Grouping and Aggregation
Usually, we want aggregations on certain parts of the relation.
Purchase(product, date, price, quantity)
Example 2: find total sales after 10/1 per product.
SELECT
SELECT product,
product,Sum(price*quantity)
Sum(price*quantity)AS
ASTotalSales
TotalSales
FROM
Purchase
FROM
Purchase
WHERE
date
WHERE
date>>10/1
10/1
GROUP
GROUPBY
BYproduct
product
Lets see what this means
28
Grouping and Aggregation
1. Compute the FROM and WHERE clauses.
2. Group by the attributes in the GROUP BY
3. Select one tuple for every group (and apply aggregation)
SELECT can have (1) grouped attributes or (2) aggregates.
29
First compute the FROM-WHERE clauses
(date > 10/1) then GROUP BY product:
Product
Date
Price
Quantity
Banana
10/19
0.52
17
Banana
10/22
0.52
Bagel
10/20
0.85
20
Bagel
10/21
0.85
15
30
Then, aggregate
Product
TotalSales
Bagel
$29.75
Banana
$12.48
SELECT
SELECT product,
product,Sum(price*quantity)
Sum(price*quantity)AS
ASTotalSales
TotalSales
FROM
Purchase
FROM
Purchase
WHERE
date
WHERE
date>>10/1
10/1
GROUPBY
GROUPBY product
product
31
GROUP BY v.s. Nested Queries
SELECT
SELECT product,
product,Sum(price*quantity)
Sum(price*quantity)AS
ASTotalSales
TotalSales
FROM
Purchase
FROM
Purchase
WHERE
WHERE date
date>>10/1
10/1
GROUP
GROUPBY
BY product
product
SELECT
SELECTDISTINCT
DISTINCT x.product,
x.product,(SELECT
(SELECTSum(y.price*y.quantity)
Sum(y.price*y.quantity)
FROM
FROM Purchase
Purchaseyy
WHERE
WHEREx.product
x.product==y.product
y.product
AND
ANDy.date
y.date>>10/1)
10/1)
AS
ASTotalSales
TotalSales
FROM
Purchase
FROM
Purchasexx
WHERE
32
WHERE x.date
x.date>>10/1
10/1
Another Example
For every product, what is the total sales and max quantity sold?
SELECT
product,
SELECT
product,Sum(price
Sum(price**quantity)
quantity)AS
ASSumSales
SumSales
Max(quantity)
Max(quantity)AS
ASMaxQuantity
MaxQuantity
FROM
Purchase
FROM
Purchase
GROUP
GROUPBY
BYproduct
product
Product
SumSales
MaxQuantity
Banana
$12.48
17
Bagel
$29.75
20
33
HAVING Clause
Same query, except that we consider only products that had
at least 100 buyers.
SELECT
product,
SELECT
product,Sum(price
Sum(price**quantity)
quantity)
FROM
Purchase
FROM
Purchase
WHERE
date
WHERE
date>>9/1
9/1
GROUP
GROUPBY
BYproduct
product
HAVING
HAVING Sum(quantity)
Sum(quantity)>>100
100
HAVING clause contains conditions on aggregates.
34
General form of Grouping and
Aggregation
SELECT S
FROM
R1,,Rn
WHERE C1
GROUP BY a1,,ak
HAVING C2
S = may contain attributes a1,,ak and/or any aggregates
but NO OTHER ATTRIBUTES
C1 = is any condition on the attributes in R1,,Rn
C2 = is any condition on aggregate expressions
Why ?
35
General form of Grouping and
Aggregation
SELECT S
FROM
R1,,Rn
WHERE C1
GROUP BY a1,,ak
HAVING C2
Evaluation steps:
1.
Compute the FROM-WHERE part, obtain a table with all attributes
in R1,,Rn
2.
Group by the attributes a1,,ak
3.
4.
Compute the aggregates in C2 and keep only groups satisfying C2
Compute aggregates in S and return the result
36
Aggregation
Author(login,name)
Document(url, title)
Wrote(login,url)
Mentions(url,word)
37
Find all authors who wrote at least 10
documents:
Attempt 1: with nested queries
This is
SQL by
a novice
SELECT
SELECTDISTINCT
DISTINCTAuthor.name
Author.name
FROM
Author
FROM
Author
WHERE
count(SELECT
WHERE
count(SELECTWrote.url
Wrote.url
FROM
FROMWrote
Wrote
WHERE
WHEREAuthor.login=Wrote.login)
Author.login=Wrote.login)
>>10
10
38
Find all authors who wrote at least 10
documents:
Attempt 2: SQL style (with GROUP BY)
SELECT
SELECT Author.name
Author.name
FROM
Author,
FROM
Author,Wrote
Wrote
WHERE
WHERE Author.login=Wrote.login
Author.login=Wrote.login
GROUP
GROUPBY
BYAuthor.name
Author.name
HAVING
HAVING count(wrote.url)
count(wrote.url)>>10
10
This is
SQL by
an expert
39
No need for DISTINCT: automatically from GROUP
BY
Find all authors who have a vocabulary
over 10000 words:
SELECT
SELECT Author.name
Author.name
FROM
Author,
FROM
Author,Wrote,
Wrote,Mentions
Mentions
WHERE
WHERE Author.login=Wrote.login
Author.login=Wrote.loginAND
ANDWrote.url=Mentions.url
Wrote.url=Mentions.url
GROUP
GROUPBY
BY Author.name
Author.name
HAVING
HAVING count(distinct
count(distinctMentions.word)
Mentions.word)>>10000
10000
Look carefully at the last two queries: you may
be tempted to write them as a nested queries,
40
but in SQL we write them best with GROUP BY