Normalization
1
E-R Diagram for a University Enterprise
5th Normal Form
A relation R is in 5NF if for all join dependencies at least
one of the following holds.
(a) (R1, R2, ..., Rn) is a trivial join-dependency.
(b) Every Ri is a candidate key for R.
3
5th Normal Form
• A table is said to be in the 5NF iff it is in
4NF and every join dependency in it is
implied by the candidate keys.
• Sometimes its impossible to break the table into 2
tables, that is when you can use the rules of 5NF
to normalize.
• Generally a table in 4th NF is always in 5th NF, but
sometimes real world constraint will cause the
Relation to be not comply with 5th NF.
4
5th Normal Form(cont’)
• Join Dependencies: They are basically
generalization of MVD.
• A condition where the natural join of all its
projections results in the reconstruction of
R.
• If such a condition is present then that
relation should be replaced with the
tables that consist of its projections.
5
5th Normal Form(cont’)
The psychiatrist is able
to offer reimbursable
treatment to patients who
suffer from the given
condition and who are
insured by the given
insurer. Psychiatrist-to-
Insurer-to-Condition is
necessary in order to
model the situation
correctly.
6
5th Normal Form(cont’)
• Suppose, however, that the following rule
applies: When a psychiatrist is authorized
to offer reimbursable treatment to
patients insured by Insurer P, and the
psychiatrist is able to treat condition C,
then – in the event that the Insurer P
covers condition C – it must be true that
the psychiatrist is able to provide
treatment to patients who suffer from
condition C and are insured by Insurer P. 7
5th (cont’)
These are all the possible projections of the Previous table. And
if (R1 |X| R2) or (R2 |X| R3) or (R1 |X| R3) result in R then
there are MVD (4th NF), and if NJ of {R1, R2, R3} results in R
8
then JD exist and the original table is not in 5th NF
5th (cont’)
• Only in rare situations does a 4NF table
not conform to 5NF. These are situations
in which a complex real-world constraint
governing the valid combinations of
attribute values in the 4NF table is not
implicit in the structure of that table.
9
Fifth Normal Form
• A relation R is in 5NF – also called
projection-join normal form, if and only if
every nontrivial join dependency that is
satisfied by R is implied by the candidate
key(s) of R
• It is the most general form possible for
projection-based normalization
10
DKNF
• DKNF offers a complete solution to the
problem of avoiding modification
abnormalities
• Domain/key normal form (DKNF). A
key uniquely identifies each row in a
table.
• By enforcing key and domain restrictions,
the database is assured of being freed
from any modification inconsistency.
11
DKNF
• Ronald Fagin (1981) proved that if a
Relation is in DKNF then it is free from
any anomalies(redundancies). Including
the ones caused by FDs, MVDs, JDs.
• DKNF seems simple enough then why all
the hoopla about 1NF, 2NF, 3NF, BCNF,
4NF, 5NF
12
DKNF
DKNF not always achievable, and there
is no formal definition to verify if a relation
schema is in DKNF
In short, sets of single-theme tables will
most likely be in DKNF.
13
Denormalization
• Denormalization is said to be necessary to
improve performance
• Technically normalization is a model
concept, not related to stored files
• In practice, denormalization will speed up
some queries, and drag down others
14
SQL
15
Example Schema
• Sailors(sid: integer, sname: string, rating: integer, age: real);
• Boats(bid: integer, bname: string, color: string);
• Reserves(sid: integer, bid: integer, day: date);
R1 sid bid day
Example Instances
22 101 10/10/96
58 103 11/12/96
• “Sailors” and sid sname rating age
S1
“Reserves” relations for
22 dustin 7 45.0
our examples.
31 lubber 8 55.5
58 rusty 10 35.0
S2 sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
SQL Overview
• CREATE TABLE <name> ( <field> <domain>, … )
• INSERT INTO <name> (<field names>)
VALUES (<field values>)
• DELETE FROM <name>
WHERE <condition>
• UPDATE <name>
SET <field name> = <value>
WHERE <condition>
• SELECT <fields>
FROM <name>
WHERE <condition>
Foreign Keys in SQL
• Only students listed in the Students relation should
be allowed to enroll for courses.
CREATE TABLE Enrolled
(sid CHAR(20), cid CHAR(20), grade CHAR(2),
PRIMARY KEY (sid,cid),
FOREIGN KEY (sid) REFERENCES Students )
Enrolled
sid cid grade Students
sid name login age gpa
53666 Carnatic101 C
53666 Reggae203 B 53666 Jones jones@cs 18 3.4
53650 Topology112 A 53688 Smith smith@eecs 18 3.2
53666 History105 B 53650 Smith smith@math 19 3.8
Enforcing Referential Integrity
• Consider Students and Enrolled; sid in Enrolled is
a foreign key that references Students.
• What should be done if an Enrolled tuple with a
non-existent student id is inserted? (Reject it!)
• What should be done if a Students tuple is
deleted?
– Also delete all Enrolled tuples that refer to it.
– Disallow deletion of a Students tuple that is referred to.
– Set sid in Enrolled tuples that refer to it to a default sid.
– (In SQL, also: Set sid in Enrolled tuples that refer to it to
a special value null, denoting `unknown’ or
`inapplicable’.)
• Similar if primary key of Students tuple is updated.
Referential Integrity in SQL
• SQL/92 and SQL:1999
support all 4 options on
deletes and updates. CREATE TABLE Enrolled
– Default is NO ACTION (sid CHAR(20),
(delete/update is cid CHAR(20),
rejected) grade CHAR(2),
– CASCADE (also delete PRIMARY KEY (sid,cid),
all tuples that refer to FOREIGN KEY (sid)
deleted tuple) REFERENCES Students
– SET NULL / SET ON DELETE CASCADE
ON UPDATE SET DEFAULT )
DEFAULT (sets foreign
key value of referencing
tuple)
Banking Example
branch (branch_name, branch_city, assets)
customer (customer_name, customer_street, customer_city)
account (account_number, branch_name, balance)
loan (loan_number, branch_name, amount)
depositor (customer_name, account_number)
borrower (customer_name, loan_number)
The Rename Operation
• SQL allows renaming relations and attributes using the
as clause:
old-name as new-name
• E.g. Find the name, loan number and loan amount of
all customers; rename the column name loan_number
as loan_id.
select customer_name, borrower.loan_number as loan_id, amount
from borrower, loan
where borrower.loan_number = loan.loan_number
Tuple Variables
• Tuple variables are defined in the from clause via the
use of the as clause.
• Find the customer names and their loan numbers and
amount for all customers having a loan at some branch.
select customer_name, T.loan_number, [Link]
from borrower as T, loan as S
where T.loan_number = S.loan_number
Find the names of all branches that have greater assets than
some branch located in Brooklyn.
select distinct T.branch_name
from branch as T, branch as S
where [Link] > [Link] and S.branch_city = 'Brooklyn'
Keyword as is optional and may be omitted
borrower as T ≡ borrower T
Some database such as Oracle require as to be omitted
String Operations
• SQL includes a string-matching operator for comparisons on
character strings. The operator “like” uses patterns that are
described using two special characters:
– percent (%). The % character matches any substring.
– underscore (_). The _ character matches any character.
• Find the names of all customers whose street includes the
substring “Main”.
select customer_name
from customer
where customer_street like '% Main%'
• Match the name “Main%”
like 'Main\%' escape '\'
• SQL supports a variety of string operations such as
– concatenation (using “||”)
– converting from upper to lower case (and vice versa)
– finding string length, extracting substrings, etc.
Ordering the Display of Tuples
• List in alphabetic order the names of all
customers having a loan in Perryridge branch
select distinct customer_name
from borrower, loan
where borrower loan_number = loan.loan_number
and branch_name = 'Perryridge'
order by customer_name
• We may specify desc for descending order
or asc for ascending order, for each attribute;
ascending order is the default.
Example: order by customer_name desc
Set Operations
• The set operations union, intersect, and except
operate on relations and correspond to the relational
algebra operations
• Each of the above operations automatically
eliminates duplicates; to retain all duplicates use the
corresponding multiset versions union all, intersect
all and except all.
Suppose a tuple occurs m times in r and n times in
s, then, it occurs:
• m + n times in r union all s
• min(m,n) times in r intersect all s
• max(0, m – n) times in r except all s
Set Operations
Find all customers who have a loan, an account, or both:
(select customer_name from depositor)
union
(select customer_name from borrower)
Find all customers who have both a loan and an account.
(select customer_name from depositor)
intersect
(select customer_name from borrower)
Find all customers who have an account but no loan.
(select customer_name from depositor)
except
(select customer_name from borrower)
Queries With GROUP BY and
HAVING
SELECT [DISTINCT]
target-list
FROM relation-list
WHERE qualification
GROUP BY grouping-list
HAVING group-
• The target-listqualification
contains (i) attribute names (ii) terms with
aggregate operations (e.g., MIN ([Link])).
– The attribute list (i) must be a subset of grouping-list.
Intuitively, each answer tuple corresponds to a group, and
these attributes must have a single value per group. (A
group is a set of tuples that have the same value for all
attributes in grouping-list.)
Aggregate Functions – Having Clause
• Find the names of all branches where the average
account balance is more than $1,200.
select branch_name, avg (balance)
from account
group by branch_name
having avg (balance) > 1200
Note: predicates in the having clause are applied after the
formation of groups whereas predicates in the where
clause are applied before forming groups
Aggregate Functions – Having and Where
Clause
• Find the average balance for each customer who lives in
Harrison and has at least three accounts.
Select d.customer_name, avg (balance)
from account a , depositor d, customer c
Where d.account_number=a.account_number and
d.customer_name=c.customer_name and
customer_city=‘Harrison’
group by d.customer_name
having count(d.account_number) >= 3
Find age of the youngest sailor with age 18,
for each rating with at least 2 such sailors
Sailors instance:
SELECT [Link], MIN ([Link])
AS minage sid sname rating age
FROM Sailors S 22 dustin 7 45.0
WHERE [Link] >= 18 29 brutus 1 33.0
GROUP BY [Link] 31 lubber 8 55.5
HAVING COUNT (*) > 1 32 andy 8 25.5
58 rusty 10 35.0
64 horatio 7 35.0
rating minage 71 zorba 10 16.0
Answer relation: 3 25.5 74 horatio 9 35.0
7 35.0 85 art 3 25.5
8 25.5 95 bob 3 63.5
96 frodo 3 25.5
Find age of the youngest sailor with age 18,
for each rating with at least 2 such sailors.
rating age rating age
7 45.0 1 33.0
1 33.0 3 25.5
8 55.5 3 63.5 rating minage
8 25.5 3 25.5 3 25.5
10 35.0 7 45.0 7 35.0
7 35.0 7 35.0 8 25.5
10 16.0 8 55.5
9 35.0 8 25.5
3 25.5
9 35.0
3 63.5
10 35.0
3 25.5
Division
• Not supported as a primitive operator, but useful for
expressing queries like:
Find sailors who
have reserved all boats.
• Let A have x2 |fields,
x , yx
and Bonly field y:
y; By have
A
– A/B =
– i.e., A/B contains all x tuples (sailors) such that
for every y tuple (boat) in B, there is an xy tuple
in A.
– Or: If the set of y values (boats) associated with
an x value (sailor) in A contains all y values in B,
the x value is in A/B.
• In general, x and y can be any lists of fields; y is the
Examples of Division A/B
sno pno pno pno pno
s1 p1 p2 p2 p1
s1 p2 p4 p2
s1 p3 B1 p4
s1 p4
B2
s2 p1 sno B3
s2 p2 s1
s3 p2 s2 sno
s4 p2 s3 s1 sno
s4 p4 s4 s4 s1
A A/B1 A/B2 A/B3
Division (con’t)
Expressing A/B Using Basic Operators
• Division is not essential op; just a useful shorthand.
– (Also true of joins, but joins are so common that systems
implement joins specially.)
• Idea: For A/B, compute all x values that are not
`disqualified’ by some y value in B.
– x value is disqualified if by attaching y value from B, we
obtain an xy tuple that is not in A.
Disqualified x values: x (( x ( A) B) A)
A/B: x ( A) all disqualified tuples
Nested Subqueries
• SQL provides a mechanism for the nesting
of subqueries.
• A subquery is a select-from-where
expression that is nested within another
query.
• A common use of subqueries is to perform
tests for set membership, set
comparisons, and set cardinality.
(1)SELECT [Link]
FROM Sailors S
Division in SQL WHERE NOT EXISTS
((SELECT [Link]
FROM Boats B)
ind sailors who’ve reserved all boats. EXCEPT
(SELECT [Link]
• Let’s do it the hard way, FROM Reserves R
without EXCEPT: WHERE [Link]=[Link]))
(2) [Link]
ELECT
ROM Sailors S
HERE NOT EXISTS (SELECT [Link]
FROM Boats B
Sailors S such that ... WHERE NOT EXISTS (SELECT [Link]
FROM Reserves R
there is no boat B without ... WHERE [Link]=B.
AND [Link]=S
a Reserves tuple showing S reserved B
Null Values
• It is possible for tuples to have a null value, denoted by
null, for some of their attributes
• null signifies an unknown value or that a value does not
exist.
• The predicate is null can be used to check for null
values.
– Example: Find all loan number which appear in the
loan relation with null values for amount.
select loan_number
from loan
where amount is null
• The result of any arithmetic expression involving null is
null
– Example: 5 + null returns null
• However, aggregate functions simply ignore nulls
Null Values and Three Valued Logic
• Any comparison with null returns unknown
– Example: 5 < null or null <> null or null = null
• Three-valued logic using the truth value unknown:
– OR: (unknown or true) = true,
(unknown or false) = unknown
(unknown or unknown) = unknown
– AND: (true and unknown) = unknown,
(false and unknown) = false,
(unknown and unknown) = unknown
– NOT: (not unknown) = unknown
– “P is unknown” evaluates to true if predicate P evaluates
to unknown
• Result of where clause predicate is treated as false if
it evaluates to unknown
Null Values and Aggregates
• Total all loan amounts
select sum (amount )
from loan
– Above statement ignores null amounts
• All aggregate operations except
count(*) ignore tuples with null
values on the aggregated attributes.
Derived Relations
o SQL allows a subquery expression to be used in the from
clause
o Find the average account balance of those branches where
the average account balance is greater than $1200.
select branch_name, avg_balance
from (select branch_name, avg (balance)
from account
group by branch_name )
as branch_avg ( branch_name, avg_balance )
where avg_balance > 1200
o Note that we do not need to use the having clause, since
we compute the temporary (view) relation branch_avg in the
from clause, and the attributes of branch_avg can be used
directly in the where clause.
Correlated Subqueries
SELECT outer1, outer2, ...
FROM table1 alias1
WHERE outer1 operator
(SELECT inner1
FROM table2 alias2
WHERE alias1.outer2 =
alias2.inner1);
The subquery references a column from
a table in the parent query.
Using Correlated Subqueries
• Find all employees who make more
than the average salary in their
department.
SQL> SELECT empno, sal, deptno
2 FROM emp outer
Each time the outer query
is processed the
3 WHERE sal > (SELECT AVG(sal) inner query is
4 FROM emp inner evaluated.
5 WHERE [Link] = [Link]);
EMPNO
EMPNO SAL
SAL DEPTNO
DEPTNO
--------
-------- ---------
--------- ---------
---------
7839
7839 5000
5000 10
10
7698
7698 2850
2850 30
30
7566
7566 2975
2975 20
20
...
...
66 rows
rows selected.
selected.
View Definition
• A relation that is not of the conceptual model but
is made visible to a user as a “virtual relation” is
called a view.
• A view is defined using the create view statement
which has the form
create view v as < query expression >
where <query expression> is any legal SQL
expression. The view name is represented by v.
• Once a view is defined, the view name can be
used to refer to the virtual relation that the view
generates.
Example Queries
• A view consisting of branches and their customers
create view all_customer as
(select branch_name, customer_name
from depositor, account
where depositor.account_number = account.account_number )
union
(select branch_name, customer_name
from borrower, loan
where borrower.loan_number = loan.loan_number )
Find all customers of the Perryridge branch
select customer_name
from all_customer
where branch_name = 'Perryridge'