Relational algebra
Query Algebra
Can use the relational algebra: an algebra of relations
Not seen by the user. User sees SQL
Makes it easier to manipulate relations, however:
o SQL uses “bags” (allows duplicate rows) not “sets”
SQL adds more operations:
o Aggregation
o Grouping
o Sorting
Different algebraic variations of a query are called plans
Relational Algebra
The results of an operator are themselves relations
The relational algebra forms a closed system, and operators can be composed:
Op3(Op2(Op1(R)))
Relational Algebra operator
Set operators: Union, intersection, difference; in SQL, UNION, INTERSECT and EXCEPT
Selection; in SQL a select with optional where clause
Projection: chooses columns; in SQL specify them in a select, rename them, and compute
them
Product: in SQL a join with no conditions
Join: in SQL several types of joins
Extension operators
Elimination of duplicates: turns a bag into a set; in SQL use distinct in the select clause
Grouping: in SQL use GROUP BY and aggregate functions
Sorting: in SQL use ORDER BY
An algebra of bags
R ⋃ S : a tuple is in the result the number of times it is in R plus the number of times in S
R ∩ S : a tuple is in the result the minimum of times it is in R and in S
R – S : a tuple is in the result the number of times it is in R minus the times it is in S, but not
fewer than 0 times
R and S must have the same schema
If R and S are sets
R ∩ S and R – S would give the same result as na algebra of sets
R ⋃ S would not: an element in both sets would appear twice in the result
In SQL
UNION, INTERSECT and EXCEPT eliminate duplicates from the result
Unless we specify the keyword ALL
select …. UNION ALL select ….
Selection operator
σc (R) : applies condition c to relation R
Result is the bag of rows that satisfies c
Condition c may involve:
Arithmetic,…., string operators, comparisons, and boolean operators (AND, …)
In SQL we can have sub-queries and operators such as EXISTS
Sub-queries should be expressed as joins in this relational algebra (cannot be expressed as
conditions)
Projection operator
πL (R) : projects the relation R into the list L
The list L can also rename attributes, and use string and numeric operations
The schema of the resulting relation is L
Note that the result may have duplicates even if R does not have
Product operator
R × S : the schema of the resulting relation has all attributes from R plus all from S (qualified
if they have the same name)
A tuple in the result is formed by a tuple of R concatenated by a tuple of S
And this for every tuple of R
Join operator
R ⋈ S : equivalent to πL ( sc (R x S))
Natural join: compares common columns of both tables using a condition c
o Should make sure common columns exist
o Projects only one of each equated attribute
Cartesian join: the product of two tables
θ-join: or equi-join if θis the = operator
o The equi-join comparing two attributes is the most common form of join (also called
inner join)
o Equated columns appear in the result
Renaming
ρ(R)
Used to rename result relations or columns
ρects ← credits(course)
Duplicate elimination
δ (R) : the result has no duplicates, as with SQL DISTINCT keyword
Converts a relation from a multiset (allows duplicates) to a set (no duplicates)
Sorting operator
τL (R)
Converts a bag to a list (implies ordering)
The tuples in relation R are sorted according to the sorting attributes (in ascending order by
default)
In SQL, ORDER BY the attributes in the list L
To order by descending order use the keyword DESC
If another operator is applied next, order is lost
Grouping
γL (R)
Aggregates tuples within a relation into subsets
Aggregation uses a function f in a single attribute (min, max, count, sum, avg)
The elements in L are either:
o A grouping attribute
o An aggregation function
Outer joins
Outer joins allow dangling tuples (no matching pair) to appear in the result by padding them
with ꓕ(NULL) values
Left outer joins keep dangling tuples from the left relation
Right outer joins keep dangling tuples from the right relation
Semi-join
R ⋉ S and R ⋊ S
Join and keep only the schema at left (right)
Anti-joins
R⊳S
Returns tuples of R having no matching in S
Also known as anti-semi-join, can be writen as
R⊳S=R−R⋉S
Division
R÷S
Returns tuples of R with a match in all tuples of S There is no corresponding SQL operator,
but there are several alternatives
Can answer questions with “give all tuples that match a given condition”
Can be translated as:
Properties of the operators
Natural join, multiplication, union and intersction are associative and commutative
o R ⋈ S ≡ R ⋈ S, (R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T)
o R × S ≡ R × S, (R × S) × T ≡ R × (S × T)
o R ⋃ S ≡ R ⋃ S, (R ⋃ S) ⋃ T ≡ R ⋃ (S ⋃ T)
o R ∩ S ≡ R ∩ S, (R ∩ S) ∩ T ≡ R ∩ (S ∩ T)
Selection is commutative:
o σ1 (σ 2 (R)) ≡ σ 2 (σ 1 (R))
Difference is neither associative nor commutative
Teta join may not be associative (natural join is)
Equivalence rules
Expression trees
Operators can be chained, resulting in expression trees
Leaves of the tree are names of relations
Interior nodes are operators
Execution proceeds bottom-up
There are generally equivalent trees
Example tree
A typical SPJ (select-project-join) expression
Write the relational algebra expression and the SQL of:
Transformations
Expression trees can be improved by the DBMS
Improvements depend on heuristics and the information from the catalog (size of columns,
number of tuples)
There is a special module in charge of optimizing queries: the query optimizer!