SQL: Part I
Introduction to Databases
CompSci 316 Fall 2014
2
Announcements (Tue. Sep. 15)
• Homework #1 due today 11:59pm
• Ben will have his office hours today 6-8pm in Link
(instead of tomorrow)
• This week only, for Homework #1
• Homework #2 to be posted on website tonight
3
SQL
• SQL: Structured Query Language
• Pronounced “S-Q-L” or “sequel”
• The standard query language supported by most DBMS
• A brief history
• IBM System R
• ANSI SQL89
• ANSI SQL92 (SQL2)
• ANSI SQL99 (SQL3)
• ANSI SQL 2003 (added OLAP, XML, etc.)
• ANSI SQL 2006 (added more XML)
• ANSI SQL 2008, …
4
Creating and dropping tables
• _
_ _
• _
• Examples
!"
#$# % $
& $ # '" '""
( '"
$# (
$# & $ #
$#
)) * % $ )) $ $% $ +
)) ,- $ . # +
)) ,- $ + + +++& $ #+++ /
)) 0 $ +++& +++
5
Basic queries: SFW statement
•, 1
2 ( 1
34
• Also called an SPJ (select-project-join) query
• Corresponds to (but not really equivalent to)
relational algebra query:
, ,…, !" × × ⋯×
6
Example: reading a table
•, 5 2 (
• Single-table query, so no cross product here
• 34 clause is optional
• 5 is a short hand for “all columns”
7
Example: selection and projection
• Name of users under 18
•, 2 ( 34 6'7
• When was Lisa born?
•, 8"'9)
2 (
34 : ; ;
•, list can contain expressions
• Can also use built-in functions such as , , , ,, etc.
• String literals (case sensitive) are enclosed in single
quotes
8
Example: join
• ID’s and names of groups with a user whose name
contains “Simpson”
•, & $ #+ & $ #+
2 ( ( & $ #
34 + : ( +
< ( + : & $ #+
< + /= ;>, # $ >;
• /= matches a string against a pattern
• > matches any sequence of 0 or more characters
• Okay to omit _ in _ _
if _ is unique
9
Example: rename
• ID’s of all pairs of users that belong to one group
• Relational algebra query:
.& !, .& !
' ( ) ⋈ .+ !, .+ !-∧- .& !/ .& ! ' ( )
• SQL:
, '+ , ' 8+ , 8
2 ( ( , ' ( , 8
34 '+ : 8+
< '+ ? 8+
• , keyword is completely optional
10
A more complicated example
• Names of all groups that Lisa and Ralph are both in
, +
2 ( ' 8 ( ' ( 8 & $ #
34 '+ : ; ; < 8+ : ; # ;
< '+ : '+ < 8+ : 8+
< '+ : + < 8+ : +
Tip: Write the 2 ( clause first, then 34 , and
then ,
11
Why SFW statements?
• Out of many possible ways of structuring SQL
statements, why did the designers choose
, )2 ()34 ?
• A large number of queries can be written using only
selection, projection, and cross product (or join)
• Any query that uses only these operators can be written
in a canonical form: 0 1 × ⋯×
• Example: 2. ,3.4 ⋈1 5 ⋈1 6.7 18 9
= 2. ,3.4,6.7 1 ∧1 ∧18 × 5 × 9
•, )2 ()34 captures this canonical form
12
Set versus bag semantics
• Set
• No duplicates
• Relational model and algebra use set semantics
• Bag
• Duplicates allowed
• Number of duplicates is significant
• SQL uses bag semantics by default
13
Set versus bag example
gid
+ !( )
#
$
Member uid gid
'98 # 1
'8! $
7@A
gid
7@A $ ,
2 ( ( #
9@B
$
9@B $
1 1
$
$
1
14
A case for bag semantics
• Efficiency
• Saves time of eliminating duplicates
• Which one is more useful?
• ;+< => )
•, 2 (
• The first query just returns all possible user ages
• The second query returns the user age distribution
• Besides, SQL provides the option of set semantics
with /, /< keyword
15
Forcing set semantics
• ID’s of all pairs of users that belong to one group
•, '+ , ' 8+ , 8
2 ( ( , ' ( , 8
34 '+ : 8+
< '+ ? 8+
• Say Lisa and Ralph are in both the book club and the student
government
•, /, /< '+ , ' 8+
, 8 1
• With /, /< , all duplicate ( ', 8) pairs are removed
from the output
16
Semantics of SFW
•, C /, /< D ? ? 1 ?
2 ( 1
34
• For each in :
For each in : … …
For each in :
If is true over , , …, :
Compute and output ? , ? , …, ? as a row
If /, /< is present
Eliminate duplicate rows in output
• , , …, are often called tuple variables
17
SQL set and bag operations
• </ <, E , /< ,
• Set semantics
• Duplicates in input tables, if any, are first eliminated
• Duplicates in result are also eliminated (for </ <)
• Exactly like set ∪, −, and ∩ in relational algebra
• </ < , E , /< ,
• Bag semantics
• Think of each row as having an implicit count (the
number of times it appears in the table)
• Bag union: sum up the counts from two tables
• Bag difference: proper-subtract the two counts
• Bag intersection: take the minimum of the two counts
18
Examples of bag operations
' 8
fruit fruit
## ##
## $
$ $
, 5 2 ( ' , 5 2 ( ' , 5 2 ( '
</ < E /< ,
, 5 2 ( 8 , 5 2 ( 8 , 5 2 ( 8
fruit fruit fruit
## ## ##
## $
$
##
$
$
19
Examples of set versus bag operations
Poke (uid1, uid2, timestamp)
• , ' 2 ( $F
E
, 8 2 ( $F
• Users who poked others but never got poked by others
• , ' 2 ( $F
E
, 8 2 ( $F
• Users who poked others more than others poke them
20
SQL features covered so far
•, )2 ()34 statements (select-project-
join queries)
• Set and bag operations
Next: how to nest SQL queries
21
Table expression
• Use query result as a table
• In set and bag operations, 2 ( clauses, etc.
• A way to “nest” queries
• Example: names of users who poked others more
than others poked them
•, /, /<
2 (
, ' , 2 ( $F
E
, 8 , 2 ( $F
,
34 + : +
22
Scalar subqueries
• A query that returns a single row can be used as a
value in 34 ,, , etc.
• Example: users at the same age as Bart
•, 5
2 ( What’s Bart’s age?
34 : ,
2 (
34 : ; ;
• Runtime error if subquery returns more than one row
• Under what condition will this error never occur?
• What if the subquery returns no rows?
• The answer is treated as a special value < , and the
comparison with < will fail
23
/< subqueries
• C /< > D ) checks if C is in the result of
> D )
• Example: users at the same age as (some) Bart
•, 5
2 ( What’s Bart’s age?
34 /< ,
2 (
34 : ; ;
24
E/, , subqueries
• E/, , > D ) checks if the result of
> D ) is non-empty
• Example: users at the same age as (some) Bart
•, 5
2 ( ,
34 E/, , , 5 2 (
34 : ; ;
< : +
• This happens to be a correlated subquery—a subquery
that references tuple variables in surrounding queries
25
Semantics of subqueries
•, 5
2 ( ,
34 E/, , , 5 2 (
34 : ; ;
< : +
• For each row in
• Evaluate the subquery with the value of +
• If the result of the subquery is not empty, output +5
• The DBMS query optimizer may choose to process
the query in an equivalent, but more efficient way
(example?)
26
Scoping rule of subqueries
• To find out which table a column belongs to
• Start with the immediately surrounding query
• If not found, look in the one surrounding that; repeat if
necessary
• Use _ _ notation and ,
(renaming) to avoid confusion
27
Another example
•, 5 2 (
34 E/, ,
, 5 2 ( (
34 : +
< E/, ,
, 5 2 ( (
34 : +
< 6? +
• Users who join at least two groups
28
Quantified subqueries
• A quantified subquery can be used syntactically as a
value in a 34 condition
• Universal quantification (for all):
1 34 C > D ) 1
• True iff for all in the result of > D ) , C- -
• Existential quantification (exists):
1 34 C <G > D ) 1
• True iff there exists some in > D ) result such that
C- -
Beware
• In common parlance, “any” and “all” seem to be synonyms
• In SQL, <G really means “some”
29
Examples of quantified subqueries
• Which users are the most popular?
•, 5
2 (
34 #$# ?: , #$# 2 (
•, 5
2 (
34 <
#$# 6 <G , #$# 2 (
Use < to negate a condition
30
More ways to get the most popular
• Which users are the most popular?
•, 5
2 ( ,
34 < E/, ,
, 5 2 (
34 #$# ? +#$#
•, 5 2 (
34 < /<
, '+
2 ( , ' , 8
34 '+#$# 6 8+#$#
31
SQL features covered so far
•, )2 ()34 statements
• Set and bag operations
• Table expressions, subqueries
• Subqueries allow queries to be written in more
declarative ways (recall the “most popular” query)
• But in many cases they don’t add expressive power
• Try translating other forms of subqueries into [< ] E/, ,,
which in turn can be translated into join (and difference)
• Watch out for number of duplicates though
Next: aggregation and grouping
32
Aggregates
• Standard SQL aggregate functions: < , , (,
H&, (/<, ( E
• Example: number of users under 18, and their
average popularity
•, < 5 H& #$#
2 (
34 6 '7
• < 5 counts the number of rows
33
Aggregates with /, /<
• Example: How many users are in some group?
•, < /, /<
2 ( (
is equivalent to:
•, < 5
2 ( , /, /< 2 ( (
34
Grouping
•, 1 2 ( 1 34 1
& G > _ E_ >
• Example: compute average popularity for
each age group
•, H& #$#
2 (
& G
35
Semantics of & G
, 1 2 ( 1 34 1 & G 1
• Compute 2 ( (×)
• Compute 34 ( )
• Compute & G: group rows according to the
values of & G columns
• Compute , for each group ( )
• For aggregation functions with /, /< inputs, first
eliminate duplicates within the group
Number of groups =
number of rows in the final output
36
Example of computing & G
, H& #$# 2 ( & G
uid name age pop
'98 '" "+I
Compute & G: group
7@A 7 "+A rows according to the values
'8! ( $ '" "+8 of & G columns
9@B # 7 "+!
uid name age pop
'98 '" "+I
Compute , '8! ( $ '" "+8
for each group 7@A 7 "+A
9@B # 7 "+!
age avg_pop
'" "+@@
7 "+@"
37
Aggregates with no & G
• An aggregate query with no & G clause =
all rows go into one group
, H& #$# 2 (
Group all rows Aggregate over
into one group the whole group
uid name age pop uid name age pop
'98 '" "+I '98 '" "+I avg_pop
7@A 7 "+A 7@A 7 "+A "+@8@
'8! ( $ '" "+8 '8! ( $ '" "+8
9@B # 7 "+! 9@B # 7 "+!
38
Restriction on ,
• If a query uses aggregation/group by, then every
column referenced in , must be either
• Aggregated, or
• A& G column
This restriction ensures that any ,
expression produces only one value for each group
39
Examples of invalid queries
•,
2 ( & G
• Recall there is one output row per group
• There can be multiple uid values per group
•, ( E #$# 2 (
• Recall there is only one group for an aggregate query
with no & G clause
• There can be multiple uid values
• Wishful thinking (that the output uid value is the one
associated with the highest popularity) does NOT work
Another way of writing the “most popular” query?
40
4 H/<&
• Used to filter groups based on the group properties
(e.g., aggregate values, & G column values)
•, 1 2 ( 1 34 1 & G 1
4 H/<&
• Compute 2 ( (×)
• Compute 34 ( )
• Compute & G: group rows according to the values
of & G columns
• Compute 4 H/<& (another over the groups)
• Compute , ( ) for each group that passes
4 H/<&
41
4 H/<& examples
• List the average popularity for each age group with
more than a hundred users
•, H& #$#
2 (
& G
4 H/<& < 5 ? '""
• Can be written using 34 and table expressions
• Find average popularity for each age group over 10
•, H& #$#
2 (
& G
4 H/<& ? '"
• Can be written using 34 without table expressions
42
SQL features covered so far
•, )2 ()34 statements
• Set and bag operations
• Table expressions, subqueries
• Aggregation and grouping
• More expressive power than relational algebra
Next: ordering output rows
43
G
•, C /, /< D 1
2 ( 1 34 1 & G 1 4 H/<& 1
G _ C , J , D 1
• , = ascending, , = descending
• Semantics: After , list has been computed
and optional duplicate elimination has been carried
out, sort the output according to G
specification
44
G example
• List all users, sort them by popularity (descending)
and name (ascending)
•, #$#
2 (
G #$# ,
• , is the default option
• Strictly speaking, only output columns can appear in
G clause (although some DBMS support more)
• Can use sequence numbers instead of names to refer to
output columns: G 9 , 8
45
SQL features covered so far
•, )2 ()34 statements
• Set and bag operations
• Table expressions, subqueries
• Aggregation and grouping
• Ordering
Next: < ’s, outerjoins, data modification,
constraints, …