CSEN 501 – CSEN501 - Databases I
Lecture 5:
The Relational Algebra
Prof. Dr. Slim Abdennadher
Dr. Nada Sharaf
German University Cairo, Faculty of Media Engineering and Technology
©
Abdennadher GUC–MET
CSEN 501
Relational Data Manipulation Languages
Variety languages used by relational database
management systems
Procedural languages: The user tells the system how to
manipulate the data, e.g. Relational Algebra
Declarative languages: the user states what data is needed
but not exactly how it is to be located, e.g. Relational
Calculus and SQL
Graphical languages: allowing the user to give an example
or an illustration of what data should be found, e.g. QBE
©
Abdennadher GUC–MET
CSEN 501
Queries and query languages
Query: A question about the data in a database.
Query: A statement requesting the retrieval of information
from a database.
Example: Find the names of students who are taking
CSEN501.
Query language: language in which queries are
expressed.
Query languages versus programming languages!
Query languages are not intended to be used for complex
calculations.
Query languages support easy and efficient access to large
data sets.
©
Abdennadher GUC–MET
CSEN 501
Relational Algebra
Relational Algebra is a set of 6 operators that act on
tables to produce tables.
Just as we operate numbers with arithmetic, we operate on
tables with relational algebra.
Key to understanding SQL and query processing and
optimization.
SQL is, roughly speaking, a generalization of relational
algebra.
Internal languages: an SQL query is rewritten as relational
algebra expression, which can in turn be rewritten into a
more efficient form and evaluated using a bunch of well
developed algorithms.
©
Abdennadher GUC–MET
CSEN 501
Relational Algebra
A set of operations (functions), each of which takes a
relation (or relations) as input and produces a relation as
output.
Basic operations: using these we can build up
sophisticated database queries.
Projection
Selection
Union
Difference
Product
Renaming
Additional operations: Intersection, Join, Division.
©
Abdennadher GUC–MET
CSEN 501
Preliminaries
Review: a relational database is a collection of data.
A query is applied to relation instances and the result of a
query is also a relation instance.
Schemas of input relations for a query are fixed. The query
will run regardless of instances.
The schema for a result of a given query is also fixed. It will
be determined by the query.
Example schemas:
Student(sid:int, sname:string, gpa:real)
Course(cid:string, cname:string, credit:integer,
teacher:string)
Enroll(sid:int, cid:string, grade:string)
-------------------
©
Abdennadher GUC–MET
CSEN 501
Example Instances
Student: Enroll:
sid firstName gpa sid cid grade
1 Dina 1.0 1 501 A
2 Ahmed 2.3 2 502 A
3 Maria 0.7 3 501 C
4 Ali 1.0 3 502 B
Course
cid cname credits teacher
501 db 6 slim
502 tc 6 haytham
Question: Both Dina and Ali have a GPA of 1.0. Is Student
legal?
©
Abdennadher GUC–MET
CSEN 501
Projection
Given a list of column names A and a relation R.
πA (R): extracts the columns in A from the relation R.
Example:
sid sname gpa sid gpa
1 Dina 1.0 1 1.0
Student: 2 Ahmed 2.3 πsid,gpa (Student) 2 2.3
3 Maria 0.7 3 0.7
The result of the projection can be visualized as a vertical partition of
the relation into two relations.
Questions:
What is the schema of the result? Recall Student has
schema
Student(sid:int, sname:string, gpa:real)
What is the query (in English)?
©
Abdennadher GUC–MET
CSEN 501
Projection - continued
Suppose the result of πA (R) has duplicate values.
Example
Student πgpa (Student)
sid sname gpa
1 Dina 1.0 gpa
2 Ahmed 2.3 1.0
3 Maria 0.7 2.3
4 Ali 1.0 0.7
In relational algebra, the answer is always a set (has to
eliminate duplicates).
However, SQL and some other languages return, by
default, a bag (don’t eliminate duplicates)
©
Abdennadher GUC–MET
CSEN 501
Selection
Given a condition C and a relation R.
σC (R): extracts those rows from the relation R that satisfy
C.
Student σgpa≤2.0 (Student)
sid sname gpa
1 Dina 1.0 sid sname gpa
2 Ahmed 2.3 1 Dina 1.0
3 Maria 0.7 3 Maria 0.7
4 Ali 1.0 4 Ali 1.0
The result of the selection can be visualized as a horizontal
partition of the relation into two sets of tuples.
Questions:
What is the schema of the result? Recall Student has
schema Student(sid:int, sname:string, gpa:real)
What is the query (in English)?
©
Abdennadher GUC–MET
CSEN 501
Selection - What can go into the condition?
Condition C in σC (R) is built up from
Boolean operations on the field names: <; ≤; =; 6=; ≥; >.
Example: gpa ≤ 2.0, sname = Ali.
Predicates constructed from these using ∧ (and) , ∨ (or) , ¬
(not).
Question: What is the result of σgpa≤2.0∧sname=Ali (Student):
Student
sid sname gpa
1 Dina 1.0
2 Ahmed 2.3
3 Maria 0.7
4 Ali 1.0
σgpa≤2.0∧sname=Ali (Student)
sid sname gpa
4 Ali 1.0
©
Abdennadher GUC–MET
CSEN 501
Set Operations
Set operations: S ∪ T , S − T , S ∩ T
Union (S ∪ T ): a relation that includes all tuples that are
either in S or in T or in both S and T. Duplicate tuples are
eliminated.
Intersection (S ∩ T ): a relation that includes all tuples that
are in both S and T.
Difference (S − T ): a relation that includes all tuples that
are in S but not in T.
Condition: All these operations must be union-compatible:
Same number of fields (same arity).
Corresponding fields have the same domain (same type).
Question:
Recall Student and Course given above. Can we write
Student ∪ Course?
What is the schema of the result of a set operation?
©
Abdennadher GUC–MET
CSEN 501
Set Operations - Union
Student1 Student2
sid sname gpa sid sname gpa
1 Dina 1.0 1 Dina 1.0
2 Ahmed 2.3 2 Ahmed 2.3
3 Maria 0.7 3 Maria 0.7
4 Ali 1.0 5 Amira 1.0
Student1 ∪ Student2
sid sname gpa
1 Dina 1.0
2 Ahmed 2.3
3 Maria 0.7
4 Ali 1.0
5 Amira 1.0
©
Abdennadher GUC–MET
CSEN 501
Set Operations - Intersection
Student1 Student2
sid sname gpa sid sname gpa
1 Dina 1.0 1 Dina 1.0
2 Ahmed 2.3 2 Ahmed 2.3
3 Maria 0.7 3 Maria 0.7
4 Ali 1.0 5 Amira 1.0
Student1 ∩ Student2
sid sname gpa
1 Dina 1.0
2 Ahmed 2.3
3 Maria 0.7
©
Abdennadher GUC–MET
CSEN 501
Set Operations - Difference
Student1 Student2
sid sname gpa sid sname gpa
1 Dina 1.0 1 Dina 1.0
2 Ahmed 2.3 2 Ahmed 2.3
3 Maria 0.7 3 Maria 0.7
4 Ali 1.0 5 Amira 1.0
Student1 − Student2
sid sname gpa
4 Ali 1.0
Student2 − Student1
sid sname gpa
5 Amira 1.0
©
Abdennadher GUC–MET
CSEN 501
Set Operations - Intersection
In relational algebra, basic set operations are union and
set difference only.
We can implement the other set operations using those
basic operations.
For example, for any relations S and T, we can already
express S ∩ T = S − (S − T )
It is mathematically nice to have fewer operators, however
operations like set difference may be less efficient than
intersection.
©
Abdennadher GUC–MET
CSEN 501
Product
Product S × T connects two relations S and T that are not
necessarily union-compatible.
Student:
sid sname gpa Course
cid cname credits teacher
1 Dina 1.0
501 db 6 slim
2 Ahmed 2.3
502 tc 6 haytham
3 Maria 0.7
Student × Course
sid sname gpa cid cname credits teacher
1 Dina 1.0 501 db 6 slim
2 Ahmed 2.3 501 db 6 slim
3 Maria 0.7 501 db 6 slim
1 Dina 1.0 502 tc 6 haytham
2 Ahmed 2.3 502 tc 6 haytham
3 Maria 0.7 502 tc 6 haytham
©
Abdennadher GUC–MET
CSEN 501
Cartesian Product
Each row of S is paired with each row of T.
Schema of the result has one field per field of S and T.
Example: The schema of Student × Course
(sid, sname, gpa, cid, cname, credits, teacher)
Question:
What is the primary key of S × T in general?
Answer: Primary key of S and primary key of T.
Cardinality: Suppose that S has n rows and T has m rows.
What is the cardinality of S × T?
Answer: n × m
©
Abdennadher GUC–MET
CSEN 501
Product - continued
What happens when we form a product of two relations
with columns having the same name?
Student Enroll:
sid sname gpa
sid cid grade
1 Dina 1.0
1 501 A
2 Ahmed 2.3
2 502 A
3 Maria 0.7
May vary among systems: Common answer is to suffix the
attribute names with 1 and 2:
Student×Enroll
sid:1 sname gpa sid:2 cid grade
1 Dina 1.0 1 501 A
2 Ahmed 2.3 1 501 A
3 Maria 0.7 1 501 A
1 Dina 1.0 2 502 A
... ... ... ... ... ...
©
Abdennadher GUC–MET
CSEN 501
Product - continued
Products are hardly used alone; they are typically used in
conjuction with a selection.
Example: σsid:1=sid:2∧cid=501 (Student × Enroll)
sid:1 sname gpa sid:2 cid grade
1 Dina 1.0 1 501 A
What does this query do (in English)?
Suppose we want to find the names and grades of
students who are taking 501. How to write the query?
πsname,grade (σsid:1=sid:2∧cid=501 (Student × Enroll))
sname grade
Dina A
©
Abdennadher GUC–MET
CSEN 501
Joins - Conditional Join
The combination of a selection and a join is so common
that it has a special symbol and name. S 1C T is defined
to be σC (S × T )
Example: Student1 1sid:1=sid:2 Enroll is
sid:1 sname gpa sid:2 cid grade
1 Dina 1.0 1 501 A
2 Ahmed 2.3 2 502 A
Questions:
What is the result schema? Assume that S(A1; . . . ; An) and
T (B1; . . . ; Bm), the join S 1C T results a relation with the
attributes (A1; . . . ; An; B1; . . . ; Bm)
Conditional join is in general more efficient than cross
product. Why?
The condition C in a conditional join is usually an equality
or conjunction of equalities (EquiJoin).
©
Abdennadher GUC–MET
CSEN 501
Natural Join
S 1 T : special case of conditional join, equality on
common fields of S and T
Equality condition only
On all common fields
Leave only one copy of these fields in the resulting relation.
Example:
S S1T
A B C T A B C D
A B D
1 2 a 1 2 a d
1 2 d
1 2 b 1 2 a e
1 2 e
1 3 c 1 2 b d
1 4 d
2 1 g 1 2 b e
Question: What if S and T have no fields in
common?Answer: Cartesian Product
©
Abdennadher GUC–MET
CSEN 501
Natural Join - Example
Student:
sid sname gpa Enroll
sid cid grade
1 Dina 1.0
1 501 A
2 Ahmed 2.3
2 502 A
3 Maria 0.7
Student 1 Enroll
sid sname gpa cid grade
1 Dina 1.0 501 A
2 Ahmed 2.3 502 A
©
Abdennadher GUC–MET
CSEN 501
Queries - Example (I)
Student(sid:int, sname:string, gpa:real)
Course(cid:string, cname:string, credit:integer,
teacher:string)
Enroll(sid:int, cid:string, grade:string)
------------------
1 Find the names of the students:
πsname (Student)
2 Find the courses taught by Slim
σteacher=Slim (Course)
3 Find the titles of courses taught by Slim
πcname (σteacher=Slim (Course))
These queries involve a single relation: Unary operations
The result of a query is also a relation and therefore can be used as
input of another query.
©
Abdennadher GUC–MET
CSEN 501
Queries - Example (II)
Student(sid:int, sname:string, gpa:real)
Course(cid:string, cname:string, credit:integer,
teacher:string)
Enroll(sid:int, cid:string, grade:string)
------------------
Find the names of students who are taking 501.
Two relations: use (natural) join or product
Fields: projection
Condition: selection
Solutions
πsname (σcid=501 (Student 1 Enroll))
πsname (Student 1 (σcid=501 (Enroll)))
©
Abdennadher GUC–MET
CSEN 501
Renaming - Another Operator
It is simpler to break down a complex sequence of
operations by specifying intermediate result relations.
Example:
πsname (Student 1 (σcid=501 (Enroll)))
is equivalent to
temp1 ← σcid=501 (Enroll)
temp2 ← Student 1 temp1
Result ← πsname (temp2)
The same technique can be used to rename the attributes
in the intermediate and result relations.
Example: R(firstName) ← πsname (temp2)
©
Abdennadher GUC–MET
CSEN 501
Renaming - Another Operator
General Rename operation when applied to a relation R of
degree n is denoted by one of the following three forms:
ρS(B1;...;Bn) (R): renames both the relation and the attributes
ρS (R): renames the relation only
ρ(B1;...;Bn) (R): renames the attributes only
ρ denotes the rename operator
S the new relation name
B1; . . . ; Bn the new attribute names
If the attributes of R are A1; . . . ; An in that order, then each
Ai is renamed as Bi .
©
Abdennadher GUC–MET
CSEN 501
Renaming
Sometimes, we have to use renaming
Student(sid:int, sname:string, gpa:real)
Course(cid:string, cname:string, credit:integer,
teacher:string)
Enroll(sid:int, cid:string, grade:string)
Find the pairs of sids (students) enrolled in the same
course
What does Enroll 1 Enroll return
πE1.sid,E2.sid (ρE1 Enroll 1E1.cid=E2.cid ρE1 Enroll)
©
Abdennadher GUC–MET
CSEN 501
Database Instance
sid sname gpa
cid cname credit ...
1 Dina 1.0
1 db 6 ...
2 Ahmed 2.3
2 tc 6 ...
3 Maria 0.7
3 media 4 ...
4 Omar 2.0
sid cid
1 1
1 3
Enroll: 2 1
2 2
2 3
3 1
What are the ids of students enrolled in ALL courses
All Students - Students not enrolled in some course(s)
©
Abdennadher GUC–MET
CSEN 501
Consider πsid,cid Student × Course vs. Enroll
Which records are different and which records are the
same
Consider a student taking all courses vs. a student missing
some courses
πsid,cid Student × Course:
sid cid
1 1
Enroll:
1 2 sid cid
1 3 1 1
2 1
2 2
1 3
2 3 2 1
3 1 2 2
3 2
3 3 2 3
4 1 3 1
4 2
4 3
©
Abdennadher GUC–MET
CSEN 501
Queries - Example (III)
Student(sid:int, sname:string, gpa:real)
Course(cid:string, cname:string, credit:integer,
teacher:string)
Enroll(sid:int, cid:string, grade:string)
Find the student ids of students who are taking all courses.
Build a relation with all possible pairs of sids and cids.
Allpairs ← be πsid (Student) × πcid (Course)
Find the set of (sid, cid) pairs for which sid is not taking cid.
Let temp1 ←Allpairs - π(sid,cid) (Enroll)
Find the set of sids of students who are not taking some course.
Let temp2 ←beπsid (temp1)
Find the set of sids of students who are taking all courses.
πsid (Student) − temp2
©
Abdennadher GUC–MET
CSEN 501
Division - Example
R
A B
a1 b1
a2 b1
a3 b1
a4 b1 S
A T=R/S
a1 b2 B
a1
a3 b2 b1
a2
a2 b3 b4
a3
a3 b3
a4 b3
a1 b4
a2 b4
a3 b4
©
Abdennadher GUC–MET
CSEN 501
Division - Another Operator
Find the sids of students who are taking all courses.
π(sid,cid) (Enroll)/πcid (Course)
In general: R/S
The schema of S must be a proper subset of the schema
of R, e.g. {cid} ⊂ {cid, sid}.
The schema of the result is the set difference of the
schema of R and the schema of S.
For every tuple t in the result and every tuple s in S, t s (t
appended onto s) is in the first relation R.
©
Abdennadher GUC–MET
CSEN 501
Division - Example
Student(sid:int, sname:string, gpa:real)
Course(cid:string, cname:string, credit:integer,
teacher:string)
Enroll(sid:int, cid:string, grade:string)
Find the names of students who are taking all courses.
ρ(tempid, π(sid,cid) (Enroll)/πcid (Course)
πsname (Student 1 tempid)
Find the names of students who are taking all courses
taught by Slim.
ρ(tempid, π(sid,cid) (Enroll)/πcid (σ(teacher=Slim) (Course)
πsname (Student 1 tempid)
©
Abdennadher GUC–MET
CSEN 501
What We Cannot Compute with Relational Algebra?
Arithmetic operations, e.g, 3 + 3.
Aggregate, e.g. the number of students who are taking
CSEN501, or the average GPA of all students.
IN SQL, these are possible - SQL has numerous
extensions to relational algebra.
Recursive queries: given a relation parent() compute the
ancestor. These are not possible in SQL either.
Complex structures, e.g. lists, arrays, nested relations, . . .
SQL cannot handle complex structures either, but they are
possible in object-oriented data models and query
languages.
©
Abdennadher GUC–MET
CSEN 501
Outer Joins
Inner joins only show relevant tuples
In other words, a student not enrolled in any course will
never be part of the output of the query Student 1 Enroll
However, we might still need to see the data of this student.
But what would be put in place of the cid NULL
You have to decide which side do you need the full
information from.
It is a normal join i.e. you have to think of the correct
joining condition
©
Abdennadher GUC–MET
CSEN 501
Left Outer Join
Student ./ Enroll
sid sname gpa cid grade
1 dina 1.0 1 a
1 dina 1.0 3 b
2 ahmed 2.3 1 a
2 ahmed 2.3 2 a
2 ahmed 2.3 3 b
3 Maria 0.7 1 a
4 omar 3.0 NULL NULL
©
Abdennadher GUC–MET
CSEN 501
Right Outer Join
Enroll ./ Student
sid cid grade sname gpa
1 1 a dina 1.0
1 3 b dina 1.0
2 1 a ahmed 2.3
2 2 a ahmed 2.3
2 3 b ahmed 2.3
3 1 a Maria 0.7
4 NULL NULL omar 3.0
©
Abdennadher GUC–MET
CSEN 501
Full Outer Join
Shows full information from both sides
Can you think of an example where it is needed
Car ./ Person
©
Abdennadher GUC–MET
CSEN 501
Relational Schema Example
employee(fname, lname, ssn, bdate, address, sex,
salary, dnr, superssn)
department(dname, dnr, dMangssn)
departmentLocation(dnr,dlocation)
project(pname,pnr,plocation,dnr)
workson(ssn,pnr,hours)
dependent(ssn,dependentname,sex,bdate,relationship)
©
Abdennadher GUC–MET
CSEN 501
Summary - What you should remember!
What are query languages?
Relational Algebra: A set of operations (functions), each of
which takes a relation (or relations) as input and produces
a relation as output.
Basic Operations: projection, selection, union, difference,
product, renaming
Additional Operations: intersection, division, join (very
useful)
What we cannot do with relational algebra.
©
Abdennadher GUC–MET
CSEN 501