0% found this document useful (0 votes)
66 views108 pages

Unit 2

The document outlines the process of transforming an Entity-Relationship (ER) model into a relational database schema through ER to relational mapping. It details how entities, attributes, and relationships are converted into tables, columns, and constraints, while also covering concepts like primary keys, foreign keys, and integrity constraints. Additionally, it explains the structure and components of a relational database schema, emphasizing the importance of maintaining data integrity and consistency.

Uploaded by

mainak2003patra
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
66 views108 pages

Unit 2

The document outlines the process of transforming an Entity-Relationship (ER) model into a relational database schema through ER to relational mapping. It details how entities, attributes, and relationships are converted into tables, columns, and constraints, while also covering concepts like primary keys, foreign keys, and integrity constraints. Additionally, it explains the structure and components of a relational database schema, emphasizing the importance of maintaining data integrity and consistency.

Uploaded by

mainak2003patra
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 108

Unit 2

Relational database design using ER to relational mapping


● Relational database, one common approach is to start with an Entity-Relationship
(ER) model and then convert it into a relational schema.
● The process of transforming an ER diagram to a relational schema is known as ER
to relational mapping.
● This process involves converting the entities, attributes, and relationships from the
ER diagram into tables, columns, and constraints in a relational database.
Mapping Entities to Relations (Tables):
● Entities in an ER diagram become relations (tables) in the relational schema.
● Each entity typically maps to a table, and the attributes of the entity become the
columns (attributes) of the table.

Steps:

● For each strong entity (an entity with a unique identifier), create a table where:
○ The primary key of the entity becomes the primary key of the table.
○ The attributes of the entity become the columns of the table.
Mapping Weak Entities to Relations

● A weak entity is an entity that cannot exist without a strong (owner) entity and relies on a
foreign key to identify it.
● The weak entity has a partial key, which is combined with the primary key of the strong
entity to form the primary key for the weak entity.

Steps:

● For a weak entity:


○ Create a table for the weak entity.
○ Use the primary key of the strong entity as a foreign key in the weak entity’s table.
○ The weak entity's partial key is included as part of the primary key of the new
○ table.
Partial Key
Mapping Attributes to Columns

● Attributes of entities and relationships in the ER diagram map directly to columns in the
relational schema.
● simple data type —--> mapped directly to a column.
● multivalued —------> handled with an additional table to represent these multiple values.

Mapping Relationships to Relations

● One-to-One Relationship:
a. If two entities are involved in a one-to-one relationship, include a foreign key in
either of the tables.
b. The foreign key will reference the primary key of the other table.

Many-Many: ?
Mapping Generalization and Specialization

● Generalization and specialization represent a hierarchy where an entity set is


divided into more specific entity sets.
● These can be mapped to relational models by creating separate tables for each
subtype and using foreign keys to represent the relationships between the generalized
entity and its specialized subtypes.

Mapping Aggregation

● Aggregation refers to the process of grouping multiple entities and relationships into
a higher-level relationship.
● This is typically mapped into a new table in the relational schema.
Summary of ER to Relational Mapping Steps:
1. Entities become relations (tables).
2. Attributes of entities become columns.
3. Weak entities are handled by including a foreign key from the strong entity and
combining it with the weak entity's partial key.
4. Relationships are mapped by adding foreign keys and creating junction tables in the
case of many-to-many relationships.
5. Generalization/Specialization and aggregation relationships are handled by creating
additional tables and linking them to the parent entity.
Relational Model Concepts
The Relational Model is a framework for organizing data into tables (or relations)
1. Relation (Table)

● A relation is a table that consists of rows and columns.


● Columns represent attributes (or fields), and rows represent records (or tuples).
● A table must have a unique name within a database.

2. Tuple (Row)

● A tuple is a single data record in a relation (i.e., a row in a table).


● Each tuple consists of values for each of the attributes in the relation.
3. Attribute (Column)

● An attribute is a column in a relation that describes a property or characteristic of the


entities represented by the relation.
● Each attribute has a domain, which is the set of allowable values it can hold.

4. Domain

● A domain refers to the set of valid values that an attribute can take.
● For example, a domain for the "Age" attribute could be integers from 0 to 100.

5. Primary Key

● A primary key is one or more attributes (columns) that uniquely identify each tuple in a
relation.
● No two tuples in a relation can have the same primary key value.
● It ensures the uniqueness of each record.
6. Foreign Key

● A foreign key is an attribute (or set of attributes) in one relation that refers to the primary
key of another relation.
● Foreign keys establish relationships between different relations (tables) and maintain
referential integrity.

7. Relationship

● The relationship between relations is established through foreign keys.


● For example, a "Student" relation and a "Course" relation might be related through a foreign
key in the "Course" relation pointing to the primary key of the "Student" relation.
8. Integrity Constraints

● These are rules that help maintain the accuracy and consistency of data within the
database.
● Common integrity constraints include:
○ Entity Integrity: Ensures that every relation has a primary key and that no
tuple can have a null primary key.
○ Referential Integrity: Ensures that foreign keys point to valid records in
related tables.
○ Domain Integrity: Ensures that attributes only contain values that are valid
according to their defined domain.
9. Relational Algebra and Relational Calculus

● Relational algebra is a set of operations (such as select, project, join, union, etc.)
used to query and manipulate data in relational databases.
● Relational calculus is a non-procedural query language used to express queries
based on mathematical logic.

10. Normalization

● Normalization is the process of organizing data in a relational database to reduce


redundancy and improve data integrity.
● It involves dividing large tables into smaller ones and defining relationships
between them based on certain rules (normal forms).
11. Schema

● A schema is the structure or blueprint of the database, describing how data is


organized in the database, including the tables, attributes, and relationships
between them.
● It is typically represented as a set of definitions of tables, constraints, and other
elements.
12. Instance

● A database instance refers to the data in the database at any given moment in time.
This is a snapshot of the database's state.

13. SQL (Structured Query Language)

● SQL is the language used to interact with relational databases. It allows users to
define, manipulate, and query data stored in relational tables.

The relational model provides a solid foundation for modern database systems, and its
principles continue to be widely used in database design and management.
Characteristics of Relations
In the relational model, a relation is essentially a table, and its characteristics define the
structure and integrity of the data stored within it. The key characteristics of relations are:

1. Unique Name

● Each relation (table) must have a unique name within a database.


● The name serves as an identifier and should clearly describe the type of data stored in
the table (e.g., Students, Orders, Employees).

2. Tuples (Rows)

● A relation consists of a finite set of tuples, which are the rows in the table.
● Each tuple represents a unique record or data entry in the relation.
● All tuples in a relation are distinct, meaning no two rows are exactly the same.
3. Attributes (Columns)

● A relation consists of a set of attributes, which are the columns in the table.
● Each attribute represents a specific characteristic or property of the data (e.g.,
"Name", "Age", "Address").
● The number of attributes is fixed for the relation, but the number of tuples (rows)
can vary.

4. Order of Tuples

● The order of tuples (rows) in a relation does not matter in the relational model.
The data is not dependent on the order in which the rows are stored.
● This is a key distinction from other data models (such as hierarchical or network
models), where the order can be significant.
5. Order of Attributes

● The order of attributes (columns) in a relation is also not significant.


● While the order of attributes can be arranged for human convenience, it does not
affect the logical structure of the relation.

6. Atomicity (Attribute Values)

● Each attribute in a relation must hold atomic (indivisible) values, meaning each
value in the table is considered a single unit.
● For example, the attribute Phone Number must hold a single value, not a list of
phone numbers or a composite value.
● This characteristic is referred to as First Normal Form (1NF), a basic
requirement for relational databases.
7. Domain of Attributes

● Each attribute has a domain, which defines the set of valid values it can take.
● For example, a Date attribute may have a domain of valid dates, and an Age
attribute may have a domain of integers between 0 and 120.
● The values in each attribute must belong to the specified domain.

8. No Duplicate Tuples

● A relation cannot have duplicate tuples (rows). Each tuple must be unique
within the relation.
● This ensures that each record in the relation can be uniquely identified.
12. Relational Integrity Constraints

● A relation must satisfy various integrity constraints to maintain the consistency and
validity of the data, including:
○ Entity Integrity: Ensures that every tuple in a relation has a unique and non-null
primary key.
○ Referential Integrity: Ensures that foreign keys in a relation correspond to
primary keys in other relations.
○ Domain Integrity: Ensures that the values in each attribute adhere to the defined
domain for that attribute.
13. Homogeneity of Attributes

● All values in an attribute must be of the same type or data domain.


● For example, all values in a Salary attribute should be numeric values, not a
mix of numbers and text.

14. Closure of Operations

● The relational model supports operations (such as select, project, join, union,
etc.) that allow for the transformation and combination of relations in a logical
manner.
● The result of a relational operation (e.g., joining two relations) is always a
relation itself.
Relational Model Constraints and Relational Database Schemas

1. Entity Integrity Constraint

● Definition: This constraint ensures that every relation (table) has a primary key
and that no tuple (row) in the relation can have a null value for the primary key
attribute(s).
● Reason: The primary key uniquely identifies each tuple in the relation, and null
values would violate this uniqueness.
● Example: In a table Employees, the Employee_ID attribute might be the primary
key. Each employee must have a unique Employee_ID, and this field cannot
contain a null value.
2. Referential Integrity Constraint

● Definition: This constraint ensures that foreign keys in a relation must


correspond to valid primary keys in the related relation.
● Reason: It maintains consistency between related tables. A foreign key value
must either match an existing primary key value or be null (if allowed).
● Example: In a Orders table, a Customer_ID foreign key might reference the
Customer_ID primary key in the Customers table. Each Customer_ID in the
Orders table must exist in the Customers table.
3. Domain Integrity Constraint

● Definition: This constraint ensures that all values in a column (attribute) are valid
according to the domain of that attribute.
● Reason: It enforces data type consistency and ensures that only valid values can be inserted
into an attribute.
● Example: If the Age attribute in the Employee table has an integer domain, no non-integer
values (like text or symbols) can be inserted into the Age column.

4. Key Constraints

● Reason: A relation can have only one primary key, but it can have multiple candidate keys
(alternative unique identifiers). The primary key is chosen from the candidate keys.
● Example: In a Students table, both Student_ID and Email might uniquely identify each
student, so both can be candidate keys. The Student_ID is chosen as the primary key.
5. Check Constraints

● Reason: It allows for the enforcement of domain-specific rules beyond just the data
type.
● Example: In an Orders table, a check constraint might ensure that the Order_Total is
greater than zero (CHECK (Order_Total > 0)).

6. Uniqueness Constraints

● Reason: It ensures that no two rows in the table have the same value for the specified
attribute(s).
● Example: A Product_Code in a Products table might have a uniqueness constraint,
ensuring that no two products can have the same product code.
7. NOT NULL Constraints

● Reason: It enforces that certain attributes must always contain valid,


non-missing data.
● Example: In a Customers table, the Customer_Name attribute might have a
NOT NULL constraint to ensure that every customer has a name.
Relational Database Schema
A relational database schema defines the logical structure of the database, including the
relations (tables), attributes (columns), data types, and the relationships between tables.
1. Definition of Schema

● A schema is a collection of database objects (such as tables, views, and indexes) that
define the structure of the database.
● A schema is typically defined using a DDL (Data Definition Language), which
includes statements such as CREATE TABLE, ALTER TABLE, and DROP
TABLE.
● In many relational database systems, schemas are organized in a hierarchical fashion,
where each schema belongs to a specific database or user.
2. Components of a Relational Database Schema

● Relations (Tables): The schema specifies the tables that make up the database and defines
the structure of each table (the attributes or columns, and the data types of those columns).
● Attributes (Columns): For each relation, the schema defines the attributes (fields) and the
data types of those attributes (e.g., VARCHAR, INTEGER, DATE).
● Primary Key: Each relation will have a primary key that uniquely identifies each tuple in
the table.
● Foreign Key: If a table refers to another table, a foreign key is used to establish the
relationship.
● Constraints: The schema defines integrity constraints, such as NOT NULL, UNIQUE,
CHECK, DEFAULT, etc., that ensure the integrity of the data.
● Indexes: Schemas also define indexes, which help speed up query processing by providing
efficient access paths to data in the relations.
Example of Relational Database Schema

Let’s consider a database schema for a university database. The schema might include the
following tables:

1. Student Table:
○ Student_ID (Primary Key)
○ First_Name
○ Last_Name
○ DOB (Date of Birth)
○ Email (Unique)
2. Course Table:
○ Course_ID (Primary Key)
○ Course_Name
○ Department
1. Enrollment Table:

○ Enrollment_ID (Primary Key)


○ Student_ID (Foreign Key, references Student(Student_ID))
○ Course_ID (Foreign Key, references Course(Course_ID))
○ Grade

In this schema:

● Student_ID in the Enrollment table is a foreign key that references the Student_ID in
the Student table.
● Course_ID in the Enrollment table is a foreign key that references the Course_ID in the
Course table.
● The Grade attribute in the Enrollment table might have a CHECK constraint to ensure
that the grade is a valid value (e.g., A, B, C, D, F).
Summary

● Relational Model Constraints are the rules (e.g., entity integrity, referential integrity,
domain integrity, etc.) that ensure the validity, consistency, and accuracy of data in a
relational database.
● A Relational Database Schema is a formal blueprint of the database, defining its
structure, tables, attributes, relationships, constraints, and other objects that manage
and enforce the organization and integrity of the data stored within the database.

These constraints and schema components help in maintaining data consistency, ensuring
efficient query processing, and providing clear relationships between different data entities.
Relational Model Notation
● The Relational Model uses a specific set of notations to describe the structure of a
database.
● The relationships between tables, and the constraints that ensure data integrity.
● This notation is vital for representing and communicating the design of relational databases.
1. Relation (Table) Notation
The general notation for a relation is:

R(A1, A2, ..., An)


Where:
● R is the name of the relation (table).
● A1, A2, ..., An are the attributes (columns) of the relation.
2. Domain Notation

A domain is the set of allowable values for an attribute in a relation.

The domain notation is used to describe the data type or the set of valid values for
each attribute.

The general notation for a domain is

Di = {v1, v2, ..., vn}


Where:

● Di is the domain of attribute Ai.


● v1, v2, ..., vn are the valid values that attribute Ai can take.

In practice, a domain is often represented as a data type (e.g., INTEGER, VARCHAR,


DATE).
3. Tuple Notation

A tuple represents a single row in a relation (table). The tuple consists of values for
each attribute in the relation.

The general notation for a tuple is:

t = (v1, v2, ..., vn)

Where:

● t is the tuple.
● v1, v2, ..., vn are the values corresponding to the attributes of the relation.
4. Primary Key Notation

A primary key is a set of attributes that uniquely identify each tuple in a relation.
The primary key is typically underlined in the notation.

The general notation for a primary key is:

PK(A1, A2, ..., Ak)

Where:

● PK denotes the primary key.


● A1, A2, ..., Ak are the attributes that form the primary key.
5. Foreign Key Notation

A foreign key is an attribute (or a set of attributes) in one relation that refers to the
primary key of another relation. It is used to represent relationships between tables.

The general notation for a foreign key is:

FK(A1, A2, ..., Ak) → R(B1, B2, ..., Bk)

Where:

● FK(A1, A2, ..., Ak) represents the foreign key attributes in the relation.
● R(B1, B2, ..., Bk) represents the relation (table) that the foreign key refers to, and
B1, B2, ..., Bk are the primary key attributes of the referenced relation.
6. Multivalued Attribute Notation

A multivalued attribute is an attribute that can hold multiple values for a single
tuple. The general notation for a multivalued attribute is:

R(A1, A2, ..., An)


M(A1, A2, ..., An, B1, B2, ..., Bm)

Where:

● M represents the new relation for the multivalued attribute.


● A1, A2, ..., An are the attributes of the original relation.
● B1, B2, ..., Bm are the attributes that represent the multivalued attribute.
7. Generalization and Specialization Notation

In the relational model, generalization and specialization refer to the inheritance of attributes and relationships from a parent
entity to child entities. This is typically represented in the ER model but can also be captured in the relational model using
separate tables for each specialized entity.

Example (Generalization):

Suppose we have a generalized Employee entity with two subtypes: Manager and Engineer. The notation would look like:

scss
Copy
Employee(EmployeeID: INTEGER, Name: VARCHAR, HireDate: DATE)
PK(EmployeeID)

Manager(EmployeeID: INTEGER, Department: VARCHAR)


PK(EmployeeID)
FK(EmployeeID) → Employee(EmployeeID)

Engineer(EmployeeID: INTEGER, SkillSet: VARCHAR)


PK(EmployeeID)
FK(EmployeeID) → Employee(EmployeeID)

Here:

● The Employee table represents the general entity.


7. Set and Relational Algebra Notation

The standard operations include union (∪), intersection (∩), difference (−), and
cartesian product (×).

These operations are used to query and combine data in the relational model.

Example:

For two relations R and S, the union operation would be represented as: R ∪ S

Where:

● R and S are relations.


● ∪ denotes the union of two relations.

Similarly, you can perform intersection, difference, and cartesian product operations
using similar notations.
Summary of Key Relational Model Notations:

1. Relation (Table): R(A1, A2, ..., An)


2. Domain: D = {v1, v2, ..., vn}
3. Tuple: t = (v1, v2, ..., vn)
4. Primary Key: PK(A1, A2, ..., Ak)
5. Foreign Key: FK(A1, A2, ..., Ak) → R(B1, B2, ..., Bk)
6. Multivalued Attribute: Represented as a separate relation with the original table
and additional attributes.
7. Relational Algebra: Operations like ∪, ∩, −, and × to manipulate relations
Populated database state
Each relation will have many tuples in its current relation state.

The relational database state is a union of all the individual relation states.

Whenever the database is changed, a new state arises.

Basic operations for changing the database:


INSERT a new tuple in a relation
DELETE an existing tuple from a relation
MODIFY an attribute of an existing tuple
Update Operations on Relations
● INSERT a tuple.
● DELETE a tuple.
● MODIFY a tuple.
● Integrity constraints should not be violated by the update operations.
● Several update operations may have to be grouped together.
● Updates may propagate to cause other updates automatically. This may be
necessary to maintain integrity constraints.
In case of integrity violation, several actions can be taken:
● Cancel the operation that causes the violation (RESTRICT or REJECT
option)
● Perform the operation but inform the user of the violation
● Trigger additional updates so the violation is corrected (CASCADE option,
SET NULL option)
● Execute a user-specified error-correction routine
Possible violations for each operation
INSERT may violate any of the constraints:
Domain constraint:
If one of the attribute values provided for the new tuple is not of the specified attribute
domain
Key constraint:
If the value of a key attribute in the new tuple already exists in another tuple in the relation
Referential integrity:
If a foreign key value in the new tuple references a primary key value that does not exist in
the referenced relation
Entity integrity:
If the primary key value is null in the new tuple
DELETE may violate only referential integrity:
If the primary key value of the tuple being deleted is referenced from other tuples in
the database
Can be remedied by several actions: RESTRICT, CASCADE, SET NULL
RESTRICT option: reject the deletion
CASCADE option: propagate the new primary key value into the foreign keys of the
referencing tuples
SET NULL option: set the foreign keys of the referencing tuples to NULL
One of the above options must be specified during database design for each foreign
key constraint
UPDATE may violate domain constraint and NOT NULL constraint on an attribute being
modified
Any of the other constraints may also be violated, depending on the attribute being updated:
Updating the primary key (PK):
Similar to a DELETE followed by an INSERT
Need to specify similar options to DELETE
Updating a foreign key (FK):
May violate referential integrity
Updating an ordinary attribute (neither PK nor FK):
Can only violate domain constraints
Relational Algebra
● Procedural query Language
● Set of operations on relation(s)
● Set of algebraic operations.
● Input: one or more tables.
● Output : one Relation.
● Theoretical foundation.
● Optimized Query.
Relational Algebra Overview
● Relational algebra is the basic set of operations for the relational model
● These operations enable a user to specify basic retrieval requests (or queries)
● The result of an operation is a new relation, which may have been formed from
one or more input relations
● This property makes the algebra “closed” (all objects in relational algebra are
relations)
● The algebra operations thus produce new relations.
● These can be further manipulated using operations of the same algebra
● A sequence of relational algebra operations forms a relational algebra
expression
● The result of a relational algebra expression is also a relation that represents the
result of a database query (or retrieval request)
Relational Algebra consists of several groups of operations
Unary Relational Operations
SELECT (symbol: (σ))
PROJECT (symbol: (𝛑))
RENAME (symbol: (𝜌))
Relational Algebra Operations From Set Theory
UNION ( ), INTERSECTION ( ), DIFFERENCE (or MINUS, – )
CARTESIAN PRODUCT ( x )
Binary Relational Operations
JOIN (several variations of JOIN exist)
DIVISION
Additional Relational Operations
OUTER JOINS, OUTER UNION
AGGREGATE FUNCTIONS (These compute summary of information: for
example, SUM, COUNT, AVG, MIN, MAX)
Unary Relational Operations: SELECT
The SELECT operation to select a subset of the tuples from a relation based on a selection
condition.
The selection condition acts as a filter
Tuples satisfying the condition are selected whereas the other tuples are discarded (filtered out)
Examples:
Select the EMPLOYEE tuples whose department number is 4:
σDNO = 4 (EMPLOYEE)
Select the employee tuples whose salary is greater than $30,000:
σSALARY > 30,000 (EMPLOYEE)
● In general, the select operation is denoted by σ<selection condition>(R) where
the symbol σ(sigma) is used to denote the select operator
● The selection condition is a Boolean (conditional) expression specified on the
attributes of relation R
● Tuples that make the condition true are selected appear in the result of the operation
● Tuples that make the condition false are filtered out discarded from the result of the
operation
SELECT Operation Properties
● The SELECT operation σ <selection condition>(R) produces a relation S that has the same
schema (same attributes) as R
● SELECT σ is commutative:
○ σ <condition1>(σ < condition2> (R)) = σ <condition2> (σ < condition1> (R))
● Because of commutativity property, a cascade (sequence) of SELECT operations may
be applied in any order:
○ σ <cond1>(σ <cond2> (σ <cond3> (R)) = σ <cond2> (σ <cond3> (σ <cond1> ( R)))
● A cascade of SELECT operations may be replaced by a single selection with a
conjunction of all the conditions:
○ σ <cond1>(σ < cond2> (σ <cond3>(R)) = σ <cond1> AND < cond2> AND < cond3>(R)))
● The number of tuples in the result of a SELECT is less than (or equal to) the number of
tuples in the input relation R
Unary Relational Operations: PROJECT
● PROJECT Operation is denoted by 𝛑 (pi)
● This operation keeps certain columns (attributes) from a relation and discards the
other columns.
● PROJECT creates a vertical partitioning
● The list of specified columns (attributes) is kept in each tuple
● The other attributes in each tuple are discarded

Example: To list each employee’s first and last name and salary, the following is used:

𝛑 LNAME, FNAME,SALARY(EMPLOYEE)
The general form of the project operation is:

𝛑<attribute list>(R)

𝛑 (pi) is the symbol used to represent the project operation

<attribute list> is the desired list of attributes from relation R.

The project operation removes any duplicate tuples

This is because the result of the project operation must be a set of tuples

Mathematical sets do not allow duplicate elements.


PROJECT Operation Properties

The number of tuples in the result of projection 𝛑<list>(R) is always less or equal to the
number of tuples in R

If the list of attributes includes a key of R, then the number of tuples in the result of
PROJECT is equal to the number of tuples in R

PROJECT is not commutative

𝛑<list1> (𝛑 <list2> (R) ) = 𝛑 <list1> (R) as long as <list2> contains the attributes in <list1>
Relational Algebra consists of several groups of operations

Unary Relational Operations

SELECT (symbol: σ (sigma))

PROJECT (symbol: 𝛑 (pi))

RENAME (symbol: ⍴ (rho))


THE RELATIONAL ALGEBRA AND CALCULUS

✔ Outline
✔ Relational Algebra
✔ Unary Relational Operations
✔ Relational Algebra Operations From Set Theory
✔ Binary Relational Operations
✔ Additional Relational Operations
✔ Examples of Queries in Relational Algebra
✔ Relational Calculus
✔ Tuple Relational Calculus
✔ Domain Relational Calculus
✔ Example Database Application (COMPANY)
DATABASE STATE FOR COMPANY
EXAMPLES OF APPLYING SELECT AND PROJECT OPERATIONS
RELATIONAL ALGEBRA OVERVIEW
Relational Algebra consists of several groups of operations
● Unary Relational Operations
SELECT (symbol: σ (sigma))
PROJECT (symbol: π (pi))
RENAME (symbol: ρ (rho))
● Relational Algebra Operations From Set Theory
UNION ( ∪ ), INTERSECTION ( ∩ ), DIFFERENCE (or MINUS, – ), CARTESIAN PRODUCT ( x )
● Binary Relational Operations
JOIN (several variations of JOIN exist) ,DIVISION
● Additional Relational Operations
OUTER JOINS, OUTER UNION
AGGREGATE FUNCTIONS (These compute summary of information: for example, SUM, COUNT,
AVG, MIN, MAX)
RELATIONAL ALGEBRA EXPRESSIONS
● We may want to apply several relational algebra operations one after the other

○ Either we can write the operations as a single relational algebra expression by nesting
the operations, or

○ We can apply one operation at a time and create intermediate result relations.

● In the latter case, we must give names to the relations that hold the intermediate results.
SINGLE EXPRESSION VERSUS SEQUENCE OF RELATIONAL OPERATIONS
(EXAMPLE)

● To retrieve the first name, last name, and salary of all employees who work in
department number 5, we must apply a select and a project operation
● We can write a single relational algebra expression as follows:
○ πFNAME, LNAME, SALARY(σ DNO=5(EMPLOYEE))
● OR We can explicitly show the sequence of operations, giving a name to each
intermediate relation:
○ DEP5_EMPS ← σ DNO=5(EMPLOYEE)
○ RESULT ← π FNAME, LNAME, SALARY (DEP5_EMPS)
UNARY RELATIONAL OPERATIONS: RENAME

● The RENAME operator is denoted by ρ (rho)


● In some cases, we may want to rename the attributes of a relation or the relation name or both
○ Useful when a query requires multiple operations
○ Necessary in some cases (see JOIN operation later)
● The general RENAME operation ρ can be expressed by any of the following forms:
○ ρS (B1, B2, …, Bn )(R) changes both:
■ the relation name to S, and
■ the column (attribute) names to B1, B1, …..Bn
○ ρS(R) changes:
■ the relation name only to S
○ ρ(B1, B2, …, Bn )(R) changes:
■ the column (attribute) names only to B1, B1, …..Bn
UNARY RELATIONAL OPERATIONS: RENAME (CONTD.)
● For convenience, we also use a shorthand for renaming attributes in an intermediate
relation:
○ If we write:
• RESULT ← π FNAME, LNAME, SALARY (DEP5_EMPS)
• RESULT will have the same attribute names as DEP5_EMPS (same attributes as
EMPLOYEE)
• If we write:
• RESULT (F, M, L, S, B, A, SX, SAL, SU, DNO)← π FNAME, LNAME, SALARY
(DEP5_EMPS)
• The 10 attributes of DEP5_EMPS are renamed to F, M, L, S, B, A, SX, SAL, SU,
DNO, respectively
EXAMPLE OF APPLYING MULTIPLE OPERATIONS AND RENAME
RELATIONAL ALGEBRA OPERATIONS FROM
SET THEORY: UNION
● UNION Operation
○ Binary operation, denoted by ∪
○ The result of R ∪ S, is a relation that includes all tuples that are either in R or in
S or in both R and S
○ Duplicate tuples are eliminated
○ The two operand relations R and S must be “type compatible” (or UNION
compatible)
■ R and S must have same number of attributes
■ Each pair of corresponding attributes must be type compatible (have same
or compatible domains)
RELATIONAL ALGEBRA OPERATIONS FROM SET THEORY: UNION
Example:

● To retrieve the social security numbers of all employees who either work in department 5
(RESULT1 below) or directly supervise an employee who works in department 5
(RESULT2 below)
● We can use the UNION operation as follows:
DEP5_EMPS ← σDNO=5 (EMPLOYEE)

RESULT1 ← π SSN(DEP5_EMPS)

RESULT2(SSN) ← πSUPERSSN(DEP5_EMPS)

RESULT ← RESULT1 ∪ RESULT2

● The union operation produces the tuples that are in either RESULT1 or RESULT2 or both
EXAMPLE OF THE RESULT OF A UNION OPERATION

UNION Example
RELATIONAL ALGEBRA OPERATIONS FROM SET THEORY

● Type Compatibility of operands is required for the binary set operation UNION ∪,
(also for INTERSECTION ∩, and SET DIFFERENCE –)
● R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) are type compatible if:
○ They have the same number of attributes, and
○ The domains of corresponding attributes are type compatible (i.e.
dom(Ai)=dom(Bi) for i=1, 2, ..., n).
● The resulting relation for R1∪R2 (also for R1∩R2, or R1–R2,) has the same attribute
names as the first operand relation R1 (by convention)
RELATIONAL ALGEBRA OPERATIONS FROM SET THEORY: INTERSECTION

● INTERSECTION is denoted by ∩
● The result of the operation R ∩ S, is a relation that includes all tuples that are in both
R and S
○ The attribute names in the result will be the same as the attribute names in R
● The two operand relations R and S must be “type compatible”
RELATIONAL ALGEBRA OPERATIONS FROM
SET THEORY: SET DIFFERENCE (CONT.)
● SET DIFFERENCE (also called MINUS or EXCEPT) is denoted by –
● The result of R – S, is a relation that includes all tuples that are in R but not in S
○ The attribute names in the result will be the same as the attribute names in R
● The two operand relations R and S must be “type compatible”
EXAMPLE TO ILLUSTRATE THE RESULT OF UNION, INTERSECT, AND DIFFERENCE
SOME PROPERTIES OF UNION, INTERSECT,
AND DIFFERENCE
● Notice that both union and intersection are commutative operations; that is
○ R ∪ S = S ∪ R, and R ∩ S = S ∩ R
● Both union and intersection can be treated as n-ary operations applicable to any number
of relations as both are associative operations
○ R ∪ (S ∪ T) = (R ∪ S) ∪ T
○ (R ∩ S) ∩ T = R ∩ (S ∩ T)
● The minus operation is not commutative; that is, in general
○ R–S≠S–R
RELATIONAL ALGEBRA OPERATIONS FROM SET THEORY: CARTESIAN
PRODUCT
CARTESIAN (or CROSS) PRODUCT Operation
○ This operation is used to combine tuples from two relations in a combinatorial
fashion.
○ Denoted by R(A1, A2, . . ., An) x S(B1, B2, . . ., Bm)
○ Result is a relation Q with degree n + m attributes:
■ Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
○ The resulting relation state has one tuple for each combination of tuples—one
from R and one from S.
○ Hence, if R has nR tuples (denoted as |R| = nR ), and S has nS tuples, then R x S
will have nR * nS tuples.
○ The two operands do NOT have to be "type compatible”
RELATIONAL ALGEBRA OPERATIONS FROM
SET THEORY: CARTESIAN PRODUCT (CONT.)
● Generally, CROSS PRODUCT is not a meaningful operation
○ Can become meaningful when followed by other operations
● Example (not meaningful):
○ FEMALE_EMPS ← σ gender =’F’(EMPLOYEE)
○ EMPNAMES ← π FNAME, LNAME, SSN (FEMALE_EMPS)
○ EMP_DEPENDENTS ← EMPNAMES x DEPENDENT
● EMP_DEPENDENTS will contain every combination of EMPNAMES and
DEPENDENT
○ whether or not they are actually related
RELATIONAL ALGEBRA OPERATIONS FROM SET THEORY: CARTESIAN
PRODUCT (CONT.)
● To keep only combinations where the DEPENDENT is related to the EMPLOYEE, we
add a SELECT operation as follows
● Example (meaningful):
○ FEMALE_EMPS ← σ gender =’F’(EMPLOYEE)
○ EMPNAMES ← π FNAME, LNAME, SSN (FEMALE_EMPS)
○ EMP_DEPENDENTS ← EMPNAMES x DEPENDENT
○ ACTUAL_DEPS ← σ SSN=ESSN(EMP_DEPENDENTS)
○ RESULT ← π FNAME, LNAME, DEPENDENT_NAME (ACTUAL_DEPS)
● RESULT will now contain the name of female employees and their dependents
EXAMPLE OF
APPLYING
CARTESIAN
PRODUCT
BINARY RELATIONAL OPERATIONS: JOIN

JOIN Operation (denoted by ⨝ )


● The sequence of CARTESIAN PRODUCT followed by SELECT is used quite commonly to
identify and select related tuples from two relations
● A special operation, called JOIN combines this sequence into a single operation
● This operation is very important for any relational database with more than a single relation,
because it allows us combine related tuples from various relations
● The general form of a join operation on two relations R(A1, A2, . . ., An) and S(B1, B2, . . .,
Bm) is:
● R ⨝ <join condition>S
● where R and S can be any relations that result from general relational algebra expressions.
BINARY RELATIONAL OPERATIONS: JOIN
(CONT.)
Example: Suppose that we want to retrieve the name of the manager of each department.

● To get the manager’s name, we need to combine each DEPARTMENT tuple with the
EMPLOYEE tuple whose SSN value matches the MGRSSN value in the department tuple.
● We do this by using the join operation.
● DEPT_MGR ← DEPARTMENT MGRSSN=SSN
EMPLOYEE
● MGRSSN=SSN is the join condition
● Combines each department record with the employee who manages the department
● The join condition can also be specified as DEPARTMENT.MGRSSN= EMPLOYEE.SSN
EXAMPLE OF APPLYING THE JOIN
OPERATION
SOME PROPERTIES OF JOIN
Consider the following JOIN operation:
○ R(A1, A2, . . ., An) S(B1, B2, . . ., Bm)
■ R.Ai=S.Bj
○ Result is a relation Q with degree n + m attributes:
■ Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
○ The resulting relation state has one tuple for each combination of tuples—r from R
and s from S, but only if they satisfy the join condition r[Ai]=s[Bj]
○ Hence, if R has nR tuples, and S has nS tuples, then the join result will generally
have less than nR * nS tuples.
○ Only related tuples (based on the join condition) will appear in the result
SOME PROPERTIES OF JOIN
● The general case of JOIN operation is called a Theta-join: R S
■ theta
● The join condition is called theta
● Theta can be any general boolean expression on the attributes of R and S; for
example:
○ R.Ai<S.Bj AND (R.Ak=S.Bl OR R.Ap<S.Bq)
● Most join conditions involve one or more equality conditions “AND”ed together;
for example:
○ R.Ai=S.Bj AND R.Ak=S.Bl AND R.Ap=S.Bq
BINARY RELATIONAL OPERATIONS:
EQUIJOIN
EQUIJOIN Operation

● The most common use of join involves join conditions with equality comparisons only

● Such a join, where the only comparison operator used is =, is called an EQUIJOIN.

○ In the result of an EQUIJOIN we always have one or more pairs of attributes (whose
names need not be identical) that have identical values in every tuple.

○ The JOIN seen in the previous example was an EQUIJOIN.


BINARY RELATIONAL OPERATIONS:
NATURAL JOIN OPERATION
NATURAL JOIN Operation

○ Another variation of JOIN called NATURAL JOIN — denoted by * — was created


to get rid of the second (superfluous) attribute in an EQUIJOIN condition.
because one of each pair of attributes with identical values is superfluous

○ The standard definition of natural join requires that the two join attributes, or each
pair of corresponding join attributes, have the same name in both relations
○ If this is not the case, a renaming operation is applied first.
BINARY RELATIONAL OPERATIONS NATURAL JOIN (CONTD.)
Example: To apply a natural join on the DNUMBER attributes of DEPARTMENT and
DEPT_LOCATIONS, it is sufficient to write:
○ DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS
● Only attribute with the same name is DNUMBER
● An implicit join condition is created based on this attribute:
○ DEPARTMENT.DNUMBER=DEPT_LOCATIONS.DNUMBER
Another example: Q ← R(A,B,C,D) * S(C,D,E)
○ The implicit join condition includes each pair of attributes with the same name,
“AND”ed together:
■ R.C=S.C AND R.D.S.D
○ Result keeps only one attribute of each such pair:
■ Q(A,B,C,D,E)
EXAMPLE OF NATURAL JOIN
OPERATION
COMPLETE SET OF RELATIONAL
OPERATIONS
● The set of operations including SELECT σ, PROJECT π , UNION ∪,
DIFFERENCE − , RENAME ρ, and CARTESIAN PRODUCT X is called a
complete set because any other relational algebra expression can be expressed
by a combination of these five operations.
● For example:
○ R ∩ S = (R ∪ S ) – ((R − S) ∪ (S − R))
○ R ⨝ <join condition>
S = σ <join condition> (R X S)
BINARY RELATIONAL OPERATIONS:
DIVISION
DIVISION Operation

● The division operation is applied to two relations

R(Z) ÷ S(X), where X subset Z. Let Y = Z - X (and hence Z = X ∪ Y); that is, let
Y be the set of attributes of R that are not attributes of S.

● The result of DIVISION is a relation T(Y) that includes a tuple t if tuples tR appear in R
with tR [Y] = t, and with
tR [X] = ts for every tuple ts in S.

● For a tuple t to appear in the result T of the DIVISION, the values in t must appear in R
in combination with every tuple in S.
EXAMPLE OF DIVISION
RECAP OF
RELATIONAL
ALGEBRA
OPERATIONS
ADDITIONAL RELATIONAL OPERATIONS: AGGREGATE FUNCTIONS
AND GROUPING
● A type of request that cannot be expressed in the basic relational algebra is to specify
mathematical aggregate functions on collections of values from the database.
● Examples of such functions include retrieving the average or total salary of all
employees or the total number of employee tuples.
○ These functions are used in simple statistical queries that summarize information
from the database tuples.
● Common functions applied to collections of numeric values include
○ SUM, AVERAGE, MAXIMUM, and MINIMUM.
● The COUNT function is used for counting tuples or values.
AGGREGATE FUNCTION OPERATION
● Use of the Aggregate Functional operation ℱ
○ ℱMAX Salary (EMPLOYEE) retrieves the maximum salary value from the
EMPLOYEE relation
○ ℱMIN Salary (EMPLOYEE) retrieves the minimum Salary value from the
EMPLOYEE relation
○ ℱSUM Salary (EMPLOYEE) retrieves the sum of the Salary from the EMPLOYEE
relation
○ ℱCOUNT SSN, AVERAGE Salary (EMPLOYEE) computes the count (number) of
employees and their average salary
■ Note: count just counts the number of rows, without removing duplicates
USING GROUPING WITH AGGREGATION
● The previous examples all summarized one or more attributes for a set of tuples
○ Maximum Salary or Count (number of) Ssn
● Grouping can be combined with Aggregate Functions
● Example: For each department, retrieve the DNO, COUNT SSN, and AVERAGE SALARY
● A variation of aggregate operation ℱ allows this:
○ Grouping attribute placed to left of symbol
○ Aggregate functions to right of symbol
○ DNO
ℱCOUNT SSN, AVERAGE Salary (EMPLOYEE)
● Above operation groups employees by DNO (department number) and computes the count
of employees and average salary per department
EXAMPLES OF APPLYING AGGREGATE FUNCTIONS AND GROUPING
ILLUSTRATING AGGREGATE FUNCTIONS AND GROUPING
ADDITIONAL RELATIONAL
OPERATIONS (CONT.)
● Recursive Closure Operations
○ Another type of operation that, in general, cannot be specified in the basic original
relational algebra is recursive closure.
■ This operation is applied to a recursive relationship.
○ An example of a recursive operation is to retrieve all SUPERVISEES of an
EMPLOYEE e at all levels — that is, all EMPLOYEE e’ directly supervised by e;
all employees e’’ directly supervised by each employee e’; all employees e’’’
directly supervised by each employee e’’; and so on.
ADDITIONAL RELATIONAL OPERATIONS
(CONT.)
● Although it is possible to retrieve employees at each level and then take their union, we
cannot, in general, specify a query such as “retrieve the supervisees of ‘James Borg’ at
all levels” without utilizing a looping mechanism.
● The SQL3 standard includes syntax for recursive closure.
ADDITIONAL RELATIONAL OPERATIONS
(CONT.)
ADDITIONAL RELATIONAL OPERATIONS
(CONT.)
✔ The OUTER JOIN Operation
✔ In NATURAL JOIN and EQUIJOIN, tuples
without a matching (or related) tuple are
eliminated from the join result
✔ Tuples with null in the join attributes are also eliminated
✔ This amounts to loss of information.

✔ A set of operations, called OUTER joins, can


be used when we want to keep all the tuples
in R, or all those in S, or all those in both
relations in the result of the join, regardless
of whether or not they have matching tuples
in the other relation.
ADDITIONAL RELATIONAL OPERATIONS
(CONT.)
● The left outer join operation keeps every tuple in the first or left relation
R in R S; if no matching tuple is found in S, then the attributes of S in
the join result are filled or “padded” with null values.
● A similar operation, right outer join, keeps every tuple in the second or
right relation S in the result of R S.
● A third operation, full outer join, denoted by keeps all tuples in
both the left and the right relations when no matching tuples are found,
padding them with null values as needed.
ADDITIONAL RELATIONAL OPERATIONS
(CONT.)
ADDITIONAL RELATIONAL OPERATIONS
(CONT.)
✔ OUTER UNION Operations
✔ The outer union operation was developed to take the union of tuples
from two relations if the relations are not type compatible.
✔ This operation will take the union of tuples in two relations R(X, Y) and
S(X, Z) that are partially compatible, meaning that only some of their
attributes, say X, are type compatible.
✔ The attributes that are type compatible are represented only once in
the result, and those attributes that are not type compatible from
either relation are also kept in the result relation T(X, Y, Z).
ADDITIONAL RELATIONAL OPERATIONS (CONT.)

● Example: An outer union can be applied to two relations whose schemas are
STUDENT(Name, SSN, Department, Advisor) and INSTRUCTOR(Name, SSN, Department,
Rank).
○ Tuples from the two relations are matched based on having the same combination of
values of the shared attributes— Name, SSN, Department.
○ If a student is also an instructor, both Advisor and Rank will have a value; otherwise, one
of these two attributes will be null.
○ The result relation STUDENT_OR_INSTRUCTOR will have the following attributes:
● STUDENT_OR_INSTRUCTOR (Name, SSN, Department, Advisor, Rank)
EXAMPLES OF QUERIES IN RELATIONAL
ALGEBRA
✔ Q1: Retrieve the name and address of all
employees who work for the ‘Research’
department.
✔ RESEARCH_DEPT ← σ DNAME=’Research’
(DEPARTMENT)
✔ RESEARCH_EMPS ← (RESEARCH_DEPT DNUMBER=

DNOEMPLOYEE
EMPLOYEE)
✔ RESULT ← π FNAME, LNAME, ADDRESS
(RESEARCH_EMPS)

✔ Q6: Retrieve the names of employees who have


no dependents.
✔ ALL_EMPS ← π SSN(EMPLOYEE)
✔ EMPS_WITH_DEPS(SSN) ← π ESSN(DEPENDENT)

You might also like