Relation Definition
CSE 444: Database Internals
Database is collection of relations
Relation R is subset of S1 x S2 x x Sn
Where Si is the domain of attribute i
n is number of attributes of the relation
Lecture 2
Review of the
Relational Model and SQL
Relation is basically a table with rows & columns
SQL uses word table to refer to relations
Magda Balazinska - CSE 444, Spring 2011
Magda Balazinska - CSE 444, Spring 2011
Properties of a Relation
More Definitions
Each row represents an n-tuple of R
Ordering of rows is immaterial
All rows are distinct
Ordering of columns is significant
Relation schema: describes column heads
Relation name
Name of each field (or column, or attribute)
Domain of each field
Because two columns can have same domain
But columns are labeled so
Applications need not worry about order
They can simply use the names
Degree (or arity) of relation: nb attributes
Domain of each column is a primitive type
Database schema: set of all relation schemas
Relation consists of a relation schema and instance
Magda Balazinska - CSE 444, Spring 2011
Magda Balazinska - CSE 444, Spring 2011
Even More Definitions
Example
Relation schema
Relation instance: concrete table content
Supplier(sno: integer, sname: string, scity: string, sstate: string)
Set of tuples (also called records) matching the
schema
Relation instance
Cardinality of relation instance: nb tuples
Database instance: set of all relation
instances
Magda Balazinska - CSE 444, Spring 2011
sno
sname
scity
sstate
s1
city 1
WA
s2
city 1
WA
s3
city 2
MA
s4
city 2
MA
Magda Balazinska - CSE 444, Spring 2011
Integrity Constraints
Key Constraints
Key constraint: certain minimal subset of fields
is a unique identifier for a tuple
Integrity constraint
Condition specified on a database schema
Restricts data that can be stored in db instance
Candidate key
Minimal set of fields
That uniquely identify each tuple in a relation
DBMS enforces integrity constraints
Ensures only legal database instances exist
Simplest form of constraint is domain constraint
Attribute values must come from attribute domain
Magda Balazinska - CSE 444, Spring 2011
Foreign Key Constraints
Foreign key
Field that refers to tuples in another relation
Typically, this field refers to the primary key of other
relation
Can pick another field as well
Key Constraint SQL Examples
Magda Balazinska - CSE 444, Spring 2011
CREATE TABLE Part (
pno integer,
pname varchar(20),
psize integer,
pcolor varchar(20),
PRIMARY KEY (pno)
);
Magda Balazinska - CSE 444, Spring 2011
10
Key Constraint SQL Examples
CREATE TABLE Supply(
sno integer,
pno integer,
qty integer,
price integer
);
Magda Balazinska - CSE 444, Spring 2011
One candidate key can be selected as primary key
Key Constraint SQL Examples
A relation can refer to a tuple in another relation
Magda Balazinska - CSE 444, Spring 2011
Primary key
CREATE TABLE Supply(
sno integer,
pno integer,
qty integer,
price integer,
PRIMARY KEY (sno,pno)
);
11
Magda Balazinska - CSE 444, Spring 2011
12
Key Constraint SQL Examples
Key Constraint SQL Examples
CREATE TABLE Supply(
sno integer,
pno integer,
qty integer,
price integer,
PRIMARY KEY (sno,pno),
FOREIGN KEY (sno) REFERENCES Supplier,
FOREIGN KEY (pno) REFERENCES Part
);
Magda Balazinska - CSE 444, Spring 2011
CREATE TABLE Supply(
sno integer,
pno integer,
qty integer,
price integer,
PRIMARY KEY (sno,pno),
FOREIGN KEY (sno) REFERENCES Supplier
ON DELETE NO ACTION,
FOREIGN KEY (pno) REFERENCES Part
ON DELETE CASCADE
);
Magda Balazinska - CSE 444, Spring 2011
13
General Constraints
14
Relational Queries
Table constraints serve to express complex
constraints over a single table
Query inputs and outputs are relations
CREATE TABLE Part (
pno integer,
pname varchar(20),
psize integer,
pcolor varchar(20),
PRIMARY KEY (pno),
CHECK ( psize > 0 )
);
Query evaluation
Input: instances of input relations
Output: instance of output relation
Note: Also possible to create constraints over many tables
Magda Balazinska - CSE 444, Spring 2011
Magda Balazinska - CSE 444, Spring 2011
15
Relational Algebra
Relational Operators
Query language associated with relational model
Selection: condition(S)
Condition is Boolean combination (,) of terms
Term is: attr. op constant, attr. op attr.
Op is: <, <=, =, , >=, or >
Queries specified in an operational manner
A query gives a step-by-step procedure
Relational operators
Take one or two relation instances as argument
Return one relation instance as result
Easy to compose into relational algebra expressions
Magda Balazinska - CSE 444, Spring 2011
16
17
Projection: list-of-attributes(S)
Union (), Intersection (), Set difference (),
Cross-product or cartesian product ()
Join: R S = (R S)
Division: R/S, Rename (R(F),E)
Magda Balazinska - CSE 444, Spring 2011
18
Selection & Projection Examples
Patient
Relational Operators
zip,disease(Patient)
no
name
zip
disease
zip
disease
p1
98125
flu
98125
flu
p2
98125
heart
98125
heart
p3
98120
lung
98120
lung
p4
98120
heart
98120
heart
disease=heart(Patient)
Selection: condition(S)
Condition is Boolean combination (,) of terms
Term is: attr. op constant, attr. op attr.
Op is: <, <=, =, , >=, or >
zip (disease=heart(Patient))
no
name
zip
disease
zip
p2
98125
heart
98120
p4
98120
heart
98125
Magda Balazinska - CSE 444, Spring 2011
Projection: list-of-attributes(S)
Union (), Intersection (), Set difference (),
Cross-product or cartesian product ()
Join: R S = (R S)
Division: R/S, Rename (R(F),E)
Magda Balazinska - CSE 444, Spring 2011
19
Cross-Product Example
AnonPatient P
Different Types of Join
Voters V
age
zip
disease
name
age
zip
54
98125
heart
p1
54
98125
20
98120
flu
p2
20
98120
Theta-join: R
P.zip
disease
name
V.age
V.zip
54
98125
heart
p1
54
98125
54
98125
heart
p2
20
98120
20
98120
flu
p1
54
98125
20
98120
flu
p2
20
98120
Magda Balazinska - CSE 444, Spring 2011
Equijoin: R
= (R x S)
= A ((R x S))
Join condition consists only of equalities
Projection A drops all redundant attributes
Natural join: R
S = A ((R x S))
Equijoin
Equality on all fields with same name in R and in S
Magda Balazinska - CSE 444, Spring 2011
21
Theta-Join Example
AnonPatient P
Join of R and S with a join condition
Cross-product followed by selection
P V
P.age
20
22
Equijoin Example
AnonPatient P
Voters V
Voters V
age
zip
disease
name
age
zip
age
zip
disease
name
age
zip
54
98125
heart
p1
54
98125
54
98125
heart
p1
54
98125
20
98120
flu
p2
20
98120
20
98120
flu
p2
20
98120
P.age=V.age P.zip=A.zip P.age < 50
P.age
P.zip
disease
name
V.age
V.zip
20
98120
flu
p2
20
98120
Magda Balazinska - CSE 444, Spring 2011
23
P.age=V.age
age
P.zip
disease
name
V.zip
54
98125
heart
p1
98125
20
98120
flu
p2
98120
Magda Balazinska - CSE 444, Spring 2011
24
Natural Join Example
AnonPatient P
Voters V
age
zip
disease
name
age
zip
54
98125
heart
p1
54
98125
20
98120
flu
p2
20
98120
More Joins
Outer join
Include tuples with no matches in the output
Use NULL values for missing attributes
age
zip
disease
name
54
98125
heart
p1
20
98120
flu
p2
Variants
Left outer join
Right outer join
Full outer join
Magda Balazinska - CSE 444, Spring 2011
25
Outer Join Example
AnonPatient P
zip
disease
name
age
zip
54
98125
heart
p1
54
98125
20
98120
flu
p2
20
98120
33
98120
lung
Q1: Names of patients who have heart disease
name(Voter (disease=heart (AnonPatient))
age
zip
disease
name
54
98125
heart
p1
20
98120
flu
p2
33
98120
lung
null
Magda Balazinska - CSE 444, Spring 2011
26
Example of Algebra Queries
Voters V
age
Magda Balazinska - CSE 444, Spring 2011
27
Magda Balazinska - CSE 444, Spring 2011
28
Extended Operators
of Relational Algebra
More Examples
Duplicate elimination ()
Relations
Supplier(sno,sname,scity,sstate)!
!Part(pno,pname,psize,pcolor)!
!Supply(sno,pno,qty,price)!
Since commercial DBMSs operate on multisets
not sets
Aggregate operators ()
Min, max, sum, average, count
Q2: Name of supplier of parts with size greater than 10
sname(Supplier Supply (psize>10 (Part))
Grouping operators ()
Q3: Name of supplier of red parts or parts with size greater than 10
sname(Supplier Supply (psize>10 (Part) pcolor=red (Part) ) )
Partitions tuples of a relation into groups
Aggregates can then be applied to groups
Sort operator ()
(Many more examples in the book)
Magda Balazinska - CSE 444, Spring 2011
29
Magda Balazinska - CSE 444, Spring 2011
30
Structured Query Language: SQL
SQL Query
Basic form: (plus many many more bells and whistles)
Influenced by relational calculus (see 344)
Declarative query language
Multiple aspects of the language
SELECT <attributes>
FROM <one or more relations>
WHERE <conditions>
Data definition language
Statements to create, modify tables and views
Data manipulation language
Statements to issue queries, insert, delete data
More
Magda Balazinska - CSE 444, Spring 2011
Simple SQL Query
Product
32
Simple SQL Query
PName
Price
Category
Manufacturer
PName
Price
Category
Manufacturer
Gizmo
$19.99
Gadgets
GizmoWorks
Gizmo
$19.99
Gadgets
GizmoWorks
Powergizmo
$29.99
Gadgets
GizmoWorks
Powergizmo
$29.99
Gadgets
GizmoWorks
SingleTouch
$149.99
Photography
Canon
SingleTouch
$149.99
Photography
Canon
MultiTouch
$203.99
Household
Hitachi
MultiTouch
$203.99
Household
Hitachi
SELECT *
FROM
Product
WHERE category=Gadgets
selection
Magda Balazinska - CSE 444, Spring 2011
31
Product
SELECT PName, Price, Manufacturer
FROM
Product
WHERE Price > 100
PName
Price
Category
Manufacturer
Gizmo
$19.99
Gadgets
GizmoWorks
Powergizmo
$29.99
Gadgets
GizmoWorks
Magda Balazinska - CSE 444, Spring 2011
33
Details
selection and
projection
PName
Price
SingleTouch
$149.99
Manufacturer
Canon
MultiTouch
$203.99
Hitachi
Magda Balazinska - CSE 444, Spring 2011
34
Eliminating Duplicates
Category
Case insensitive:
SELECT DISTINCT category
FROM Product
Same: SELECT Select select
Same: Product product
Different: Seattle seattle
Gadgets
Photography
Household
Compare to:
Constants:
Category
abc - yes
abc - no
SELECT category
FROM Product
Gadgets
Gadgets
Photography
Household
Magda Balazinska - CSE 444, Spring 2011
35
Magda Balazinska - CSE 444, Spring 2011
36
Ordering the Results
Joins
Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country)
SELECT pname, price, manufacturer
FROM Product
WHERE category=gizmo AND price > 50
ORDER BY price, pname
Find all products under $200 manufactured in Japan;
return their names and prices.
Ties are broken by the second attribute on the ORDER BY list, etc.
Ordering is ascending, unless you specify the DESC keyword.
Magda Balazinska - CSE 444, Spring 2011
37
Tuple Variables
Person(pname, address, worksfor)
Company(cname, address)
SELECT DISTINCT pname, address
FROM
Person, Company
WHERE worksfor = cname
SELECT PName, Price
FROM
Product, Company
WHERE Manufacturer=CName AND Country=Japan
AND Price <= 200
Magda Balazinska - CSE 444, Spring 2011
38
Nested Queries
Nested query
Which
address ?
Why do we need them?
Enables to refer to a table that must itself be computed
SELECT DISTINCT Person.pname, Company.address
FROM
Person, Company
WHERE Person.worksfor = Company.cname
SELECT DISTINCT x.pname, y.address
FROM
Person AS x, Company AS y
Magda Balazinska
444, Spring 2011
WHERE x.worksfor
=- CSE
y.cname
Query that has another query embedded within it
The embedded query is called a subquery
39
Subqueries Returning Relations
Company(name, city)
Product(pname, maker)
Purchase(id, product, buyer)
Return cities where one can find companies that manufacture
products bought by Joe Blow
SELECT Company.city
FROM Company
WHERE Company.name IN
(SELECT Product.maker
FROM Purchase , Product
WHERE Product.pname=Purchase.product
Magda Balazinska - CSE 444, Spring 2011
AND Purchase .buyer = Joe Blow);
Subqueries can appear in
WHERE clause (common)
FROM clause (less common)
HAVING clause (less common)
Magda Balazinska - CSE 444, Spring 2011
40
Subqueries Returning Relations
You can also use: s > ALL R
s > ANY R
EXISTS R
Product ( pname, price, category, maker)
Find products that are more expensive than all those produced
By Gizmo-Works
SELECT name
FROM Product
WHERE price > ALL (SELECT price
FROM Purchase
Magda Balazinska - CSE 444, Spring 2011
WHERE maker=Gizmo-Works)
Correlated Queries
Aggregation
Movie (title, year, director, length)
Find movies whose title appears more than once.
correlation
SELECT DISTINCT title
FROM Movie AS x
WHERE year <> ANY
(SELECT year
FROM Movie
WHERE title = x.title);
SELECT avg(price)
FROM
Product
WHERE maker=Toyota
SELECT count(*)
FROM Product
WHERE year > 1995
SQL supports several aggregation operations:
sum, count, min, max, avg
Except count, all aggregations apply to a single attribute
Note (1) scope of variables (2) this can still be expressed as single SFW
Magda Balazinska - CSE 444, Spring 2011
43
Magda Balazinska - CSE 444, Spring 2011
44
Grouping and Aggregation
SELECT S
FROM
R1,,Rn
WHERE C1
GROUP BY a1,,ak
HAVING C2
Conceptual evaluation steps:
1.
Evaluate FROM-WHERE, apply condition C1
2.
Group by the attributes a1,,ak
3.
Apply condition C2 to each group (may have aggregates)
4.
Compute aggregates in S and return the result
Read more about it in the book...
Magda Balazinska - CSE 444, Spring 2011
45