0% found this document useful (0 votes)
2 views44 pages

DBMS Notes Unit 2

Uploaded by

thombreasmita21
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views44 pages

DBMS Notes Unit 2

Uploaded by

thombreasmita21
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

19

Syllabus

Unit Introduction to Databases and DBMS 7


I hrs

1. Database Basics

o Definition, importance, and applications of databases. o Data vs.


Information, Database vs. File System.

o Characteristics of databases.

2. Database Management Systems

o DBMS architecture: Levels of abstraction (physical, logical, and view


levels).

oTypes of DBMS: Relational, NoSQL, Object-Oriented. o Database


models: Hierarchical, Network, and Relational models.

3. Database Environment

o Role of database administrators (DBAs) and database designers.

o Overview of database users and their roles.

Unit Relational Database Design and Normalization 8


II hrs

1. Relational Model Concepts

o Attributes, tuples, relations, and schema.

o Keys: Primary, candidate, foreign, and superkeys.

2. Relational Algebra and Operations

o Basic operations: Selection, projection, union,

intersection, difference, and Cartesian product.

o Advanced operations: Joins, division, aggregation, and grouping.

3. Database Design Process

o Entity-Relationship (ER) modeling: Entities, attributes, relationships, and


ER diagrams.

o Conversion of ER diagrams to relational schema.

4. Normalization
19
o Functional dependencies.

o Normal forms: 1NF, 2NF, 3NF, and BCNF (with

examples).

o Denormalization and its applications.

Unit SQL and Query Optimization 8


III hrs

D Y Patil College of Engineering, Akurdi, Pune


An Autonomous Institute from AY 2024-25, Affiliated to Savitribai
Phule Pune University, Pune Department of Information
Technology

Second Year Engineering SY B Tech Semester III (2024 Course)

1. Structured Query Language (SQL)

o Basics of SQL: Data definition (CREATE, ALTER, DROP) and data


manipulation (INSERT, UPDATE,

DELETE).

o Querying data: SELECT, WHERE, GROUP BY,

HAVING, ORDER BY, LIMIT.

o Joins: Inner, outer, self-joins, and cross-joins.

o Subqueries and set operations.

2. Advanced SQL Features

o Views, Indexes, and Triggers.

o Stored Procedures and Functions.


19
o Role of constraints (NOT NULL, UNIQUE, CHECK, DEFAULT).

3. Query Optimization

o Query execution plans.

o Optimization strategies: Indexing and query rewriting.

Unit Transactions, Concurrency, and Recovery 8hrs


IV

1. Transaction Management

o ACID properties: Atomicity, Consistency, Isolation, Durability.

o Transaction states and lifecycle.

2. Concurrency Control

o Problems in concurrent transactions: Lost updates,

uncommitted data, and inconsistent analysis.

o Lock-based protocols: Shared, exclusive locks, and two phase locking.

o Deadlock handling in DBMS.

3. Database Recovery

o Types of failures: System crash, media failure.

o Recovery techniques: Log-based recovery, shadow

paging.

o Checkpointing and its significance.

Unit Emerging Trends in Databases 8 hrs


V
19
1. NoSQL Databases

o Types of NoSQL databases: Key-value stores, document databases,


column-family stores, graph databases.

o Use cases and advantages of NoSQL over relational databases.

o Introduction to popular NoSQL systems (MongoDB, Cassandra,


Redis).

2. Big Data and Distributed Databases

o Basics of distributed databases and CAP theorem.

Unit 2

Conceptual Database Design - Entity Relationship(ER) Modeling:


Database Design Techniques
1. ER Modeling (Top down Approach)
2. Normalization (Bottom Up approach)

What is ER Modeling?
A graphical technique for understanding and organizing the data independent of the actual database
implementation
We need to be familiar with the following terms to go further.
Entity

Any thing that has an independent existence and about which we collect data. It is also known as entity type.
In ER modeling, notation for entity is given below.

Entity instance
Entity instance is a particular member of the entity type.
Example for entity instance : A particular employee
Regular Entity
An entity which has its own key attribute is a regular entity.
Example for regular entity : Employee.
19
Weak entity
An entity which depends on other entity for its existence and doesn't have any key attribute of its own is a weak

entity.
Example for a weak entity : In a parent/child relationship, a parent is considered as a strong entity and the child
is a weak entity.
In ER modeling, notation for weak entity is given below.

Attributes

Properties/characteristics which describe entities are called attributes.


In ER modeling, notation for attribute is given below.

Domain of Attributes

The set of possible values that an attribute can take is called the domain of the attribute. For example, the
attribute day may take any value from the set {Monday, Tuesday ... Friday}. Hence this set can be termed as the
domain of the attribute day.
Key attribute

The attribute (or combination of attributes) which is unique for every entity instance is called key attribute.
E.g the employee_id of an employee, pan_card_number of a person [Link] the key attribute consists of two or
more attributes in combination, it is called a composite key.
In ER modeling, notation for key attribute is given below.

Simple attribute

If an attribute cannot be divided into simpler components, it is a simple attribute.


Example for simple attribute : employee_id of an employee.

Composite attribute
If an attribute can be split into components, it is called a composite attribute.
Example for composite attribute : Name of the employee which can be split into First_name, Middle_name, and
Last_name.
Single valued Attributes
20
If an attribute can take only a single value for each entity instance, it is a single valued attribute.
example for single valued attribute : age of a student. It can take only one value for a particular student.
Multi-valued Attributes
If an attribute can take more than one value for each entity instance, it is a multi-valued attribute. Multi-valued
example for multi valued attribute : telephone number of an employee, a particular employee may have multiple
telephone numbers.
In ER modeling, notation for multi-valued attribute is given below.

Stored Attribute

An attribute which need to be stored permanently is a stored attribute


Example for stored attribute : name of a student
Derived Attribute

An attribute which can be calculated or derived based on other attributes is a derived attribute.
Example for derived attribute : age of employee which can be calculated from date of birth and current date.
In ER modeling, notation for derived attribute is given below.

Relationships

Associations between entities are called relationships


Example : An employee works for an organization. Here "works for" is a relation between the entities employee
and organization.
In ER modeling, notation for relationship is given below.

However in ER Modeling, To connect a weak Entity with others, you should use a weak relationship notation as
given below
21

Degree of a Relationship

Degree of a relationship is the number of entity types involved. The n-ary relationship is the general form for
degree n. Special cases are unary, binary, and ternary ,where the degree is 1, 2, and 3, respectively.
Example for unary relationship : An employee ia a manager of another employee
Example for binary relationship : An employee works-for department.
Example for ternary relationship : customer purchase item from a shop keeper
Cardinality of a Relationship

Relationship cardinalities specify how many of each entity type is allowed. Relationships can have four possible
connectivities as given below.
1. One to one (1:1) relationship
2. One to many (1:N) relationship
3. Many to one (M:1) relationship
4. Many to many (M:N) relationship
The minimum and maximum values of this connectivity is called the cardinality of the relationship
Example for Cardinality – One-to-One (1:1)

Employee is assigned with a parking space.

One employee is assigned with only one parking space and one parking space is assigned to only one
employee. Hence it is a 1:1 relationship and cardinality is One-To-One (1:1)
In ER modeling, this can be mentioned using notations as given below
22

Example for Cardinality – One-to-Many (1:N)

Organization has employees

One organization can have many employees , but one employee works in only one organization. Hence it is a
1:N relationship and cardinality is One-To-Many (1:N)
In ER modeling, this can be mentioned using notations as given below

Example for Cardinality – Many-to-One (M :1)

It is the reverse of the One to Many relationship. employee works in organization

One employee works in only one organization But one organization can have many employees. Hence it is a
M:1 relationship and cardinality is Many-to-One (M :1)
24
In ER modeling, this can be mentioned using notations as given below.

Cardinality – Many-to-Many (M:N)

Students enrolls for courses

One student can enroll for many courses and one course can be enrolled by many students. Hence it is a M:N
relationship and cardinality is Many-to-Many (M:N)
In ER modeling, this can be mentioned using notations as given below

Relationship Participation

1. Total
In total participation, every entity instance will be connected through the relationship to another instance of the
other participating entity types
2. Partial
Example for relationship participation
Consider the relationship - Employee is head of the department.
Here all employees will not be the head of the department. Only one employee will be the head of the
department. In other words, only few instances of employee entity participate in the above relationship. So
employee entity's participation is partial in the said relationship.
1. However each department will be headed by some employee. So department entity's participation is total in the
said relationship
24
Advantages and Disadvantages of ER Modeling ( Merits and Demerits of ER Modeling )
Advantages

1. ER Modeling is simple and easily understandable. It is represented in business users language and it can be
understood by non-technical specialist.
2. Intuitive and helps in Physical Database creation.
3. Can be generalized and specialized based on needs.
4. Can help in database design.
5. Gives a higher level description of the system.
Disadvantages

1. Physical design derived from E-R Model may have some amount of ambiguities or inconsistency.
2. Sometime diagrams may lead to misinterpretations

Relational Model
The relational model is the primary data model for commercial data processing applications. It attained its primary
position because of its simplicity, which eases the job of the programmer, compared to earlier data models
such as the network model or the hierarchical model.
Structure of Relational Databases:

A relational database consists of a collection of tables, each of which is assigned a unique name. For example,
consider the instructor [Link] table has column headers: ID, name, dept name, and salary. Each row of this
table records information about an instructor, consisting of the instructor’s ID, name, dept name, and salary.
Similarly, the course table stores information about courses, consisting of a course id, title, dept name, and
credits, for each course. Note that each instructor is identified by the value of the column ID, while each
course is identified by the value of the column course id.

Thus, a row in the prefer table indicates that two courses are related in the sense that one course is a
prerequisite for the other. As another example, we consider the table instructor, a row in the table can be thought
of as representing the relationship between a specified ID and the corresponding values for name, dept name,
and salary values.
28

Figure 1.5: The instructor relation (2.1)

In general, a row in a table represents a relationship among a set of values. Since a table is a collection of such
relationships, there is a close correspondence between the concept of table and the mathematical concept of
relation, from which the relational data model takes its name. In mathematical terminology, a tuple is simply a
sequence (or list) of values. A relationship between n values is represented mathematically by an n-tuple of
values, i.e., a tuple with n values, which corresponds to a row in a table.

Figure: 1.6: The course relation (2.2)

Figure: 1.7: The prereq relation. (2.3)

Thus, in the relational model the term relation is used to refer to a table, while the term tuple is used to refer to
a row. Similarly, the term attribute refers to a column of a table.
Examining Figure 1.5, we can see that the relation instructor has four attributes:
ID, name, dept name, and salary.

We use the term relation instance to refer to a specific instance of a relation, i.e., containing a specific set of
rows. The instance of instructor shown in Figure 1.5 has 12 tuples, corresponding to 12 instructors.

In this topic, we shall be using a number of different relations to illustrate the various concepts underlying the
relational data model. These relations represent part of a university. They do not include all the data an actual
28
university database would contain, in order to simplify our presentation.

The order in which tuples appear in a relation is irrelevant, since a relation is a set of tuples. Thus, whether the
tuples of a relation are listed in sorted order, as in Figure 1.5, or are unsorted, as in Figure 1.8, does not matter;
the relations in the two figures are the same, since both contain the same set of tuples. For ease of exposition,
we will mostly show the relations sorted by their first attribute. For each attribute of a relation, there is a set of
permitted values, called the domain of that attribute. Thus, the domain of the salary attribute of the instructor
relation is the set of all possible salary values, while the domain of the name attribute is the set of all possible
instructor names.

We require that, for all relations r, the domains of all attributes of r be atomic. A domain is atomic if elements of
the domain are considered to be indivisible units.

Figure: 1.8: Unsorted display of the instructor relation. (2-4)

For example, suppose the table instructor had an attribute phone number, which can store a set of phone
numbers corresponding to the instructor. Then the domain of phone number would not be atomic, since an
element of the domain is a set of phone numbers, and it has subparts, namely the individual phone numbers
in the set.

The null value is a special value that signifies that the value is unknown or does not exist. For example, suppose
as before that we include the attribute phone number in the instructor relation. It may be that an instructor does
not have a phone number at all, or that the telephone number is unlisted. We would then have to use the null
value to signify that the value is unknown or does not exist. We shall see later that null values cause a number of
difficulties when we access or update the database, and thus should be eliminated if at all possible. We shall
assume null values are absent initially.

Database Schema

When we talk about a database, we must differentiate between the database schema, which is the logical
design of the database, and the database instance, which is a snapshot of the data in the database at a given
instant in time. The concept of a relation corresponds to the programming-language notion of a variable, while
the concept of a relation schema corresponds to the programming-language notion of type definition.
28

Figure 1.9: The department relation.(2-5)

similarly the contents of a relation instance may change with time as the relation is updated. In contrast, the
schema of a relation does not generally change. Although it is important to know the difference between a
relation schema and a relation instance, we often use the same name, such as instructor, to refer to both the
schema and the instance. Where required, we explicitly refer to the schema or to the instance, for example “the
instructor schema,” or “an instance of the instructor relation.” However, where it is clear whether we mean the
schema or the instance, we simply use the relation name.

Consider the department relation of Figure 1.9. The schema for that relation is

department (dept name, building, budget)

Note that the attribute dept name appears in both the instructor schema and the department schema. This
duplication is not a coincidence. Rather, using common attributes in relation schemas is one way of relating
tuples of distinct relations.

For example, suppose we wish to find the information about all the instructors who work in the Watson building.
We look first at the department relation to find the dept name of all the departments housed in Watson. Then, for
each such department, we look in the instructor relation to find the information about the instructor associated
with the corresponding dept name.

Let us continue with our university database example. Each course in a university may be offered multiple times,
across different semesters, or even within a [Link] need a relation to describe each individual
offering, or section, of the class. The schema is
section (course id, sec id, semester, year, building, room number, time slot id)

Figure 1.10 shows a sample instance of the section relation. We need a relation to describe the association
between instructors and the class sections that they teach. The relation schema to describe this association is
teaches (ID, course id, sec id, semester, year)
28
Figure 1.10: The section relation.(2-6)

Figure 1.11 shows a sample instance of the teaches relation. As you can imagine, there are many more relations
maintained in a real university database. In addition to those relations we have listed already, instructor,
department, course, section, prereq, and teaches,we use the following relations in this text:

Figure: 1.11: The teaches relation.(2-7)

• student (ID, name, dept name, tot cred)


• advisor (s id, i id)
• takes (ID, course id, sec id, semester, year, grade)
• classroom (building, room number, capacity)
• time slot (time slot id, day, start time, end time)
29

Keys

We must have a way to specify how tuples within a given relation are distinguished. This is expressed in terms of their
attributes. That is, the values of the attribute values of a tuple must be such that they can uniquely identify the tuple. In other
words, no two tuples in a relation are allowed to have exactly the same value for all attributes

These Are
 Super key
 Candidate Key
 Primary Key
 Foreign Key

Super Key : A superkey is a set of one or more attributes that, taken collectively, allow us to identify uniquely
a tuple in the relation. For example, the ID attribute of the relation instructor is sufficient to distinguish one
instructor tuple from another. Thus, ID is a superkey. The name attribute of instructor, on the other hand, is not a
superkey, because several instructors might have the same name. Formally, let R denote the set of attributes
in the schema of relation r. If we say that a subset K of R is a superkey for r , we are restricting consideration
to instances of relations r in which no two distinct tuples have the same values on all attributes in K. That is, if
t1 and t2 are in r and t1 = t2, then t1.K = t2.K.

Example

Let‟s consider a table called Students:

StudentID Name Email Phone


101 Alice alice@[Link] 123-456-7890
102 Bob bob@[Link] 987-654-3210
103 Charlie charlie@[Link] 456-789-1230

Possible Super Keys:

1. {StudentID}
2. {Email}
3. {Phone}
4. {StudentID, Name}
5. {Email, Phone}
6. {StudentID, Email, Phone}

All of these sets can uniquely identify a student, so they are super keys.

candidate keys A superkey may contain extraneous attributes. For example, the combination of ID and name
is a superkey for the relation instructor. If K is a superkey, then so is any superset of K. We are often
interested in superkeys for which no proper subset is a superkey. Such minimal superkeys are called candidate
keys.

It is possible that several distinct sets of attributes could serve as a candidate key. Suppose that a combination
of name and dept name is sufficient to distinguish among members of the instructor relation. Then, both {ID} and
{name, dept name} are candidate keys. Although the attributes ID and name together can distinguish instructor
tuples, their combination, {ID, name}, does not form a candidate key, since the attribute ID alone is a candidate key.
29
primary key A key (whether primary, candidate, or super) is a property of the entire relation, rather than of the
individual tuples. Any two individual tuples in the relation are prohibited from having the same value on the
key attributes at the same time. The designation of a key represents a constraint in the real-world
enterprise being modeled.

Primary keys must be chosen with care.


As we noted, the name of a person is obviously not sufficient, because there may be many people with the same
name. In the United States, the social-security number attribute of a person would be a candidate key. Since
non-U.S. residents usually do not have social-security numbers, international enterprises must generate
their own unique identifiers.

foreign key It is customary to list the primary key attributes of a relation schema before the other attributes; for
example, the dept name attribute of department is listed first, since it is the primary key. Primary key
attributes are also underlined. A relation, say r1, may include among its attributes the primary key of another
relation, say r2. This attribute is called a foreign key from r1, referencing r2.

The relation r1 is also called the referencing relation of the foreign key dependency, and r2 is called the referenced
relation of the foreign key. For example, the attribute dept name in instructor is a foreign key from instructor, referencing
department, since dept name is the primary key of department. In any database instance, given any tuple, say ta, from the
instructor relation, there must be some tuple, say tb, in the department relation such that the value of the dept name
attribute of ta is the same as the value of the primary key, dept name, of tb.
The constraint from section to teaches is an example of a referential integrity constraint; a referential integrity
constraint requires that the values appearing in specified attributes of any tuple in the referencing relation also
appear in specified attributes of at least one tuple in the referenced relation.

Schema Diagrams

A database schema, along with primary key and foreign key dependencies, can be depicted by schema
diagrams. Figure 1.12 shows the schema diagram for our university organization. Each relation appears as a
box, with the relation name at the top in blue, and the attributes listed inside the box. Primary key attributes are
shown underlined. Foreign key dependencies appear as arrows from the foreign key attributes of the referencing
relation to the primary key of the referenced relation.

Figure 1.12 : Schema diagram for the university database.

Referential integrity constraints other than foreign key constraints are not shown explicitly in schema diagrams.
We will study a different diagrammatic representation called the entity-relationship diagram.

Dept of CSE, Unit-2 Page


17
Relational Algebra and Calculus
RELATIONAL ALGEBRA
Relational algebra is one of the two formal query languages associated with the re-
lational model. Queries in algebra are composed using a collection of operators. A
fundamental property is that every operator in the algebra accepts (one or two) rela-
tion instances as arguments and returns a relation instance as the result. This property
makes it easy to compose operators to form a complex query—a relational algebra
expression is recursively defined to be a relation, a unary algebra operator applied to a
single expression, or a binary algebra operator applied to two expressions.

1. Unary algebra:
 Selection (σ)
 Projection (π)
 Rename ρ(R)
2. Binary algebra:

Selection and Projection


Relational algebra includes operators to select rows from a relation (σ) and to project
columns (π). These operations allow us to manipulate data in a single relation. Con- sider
the instance of the Sailors relation shown in Figure 4.2, denoted as S2. We can retrieve
rows corresponding to expert sailors by using the σ operator. The expression,
σ<condion>(R2)
evaluates to the relation shown in Figure 4.4. The subscript rating>8 specifies the
selection criterion to be applied while retrieving tuples.
sname rating
yuppy 9
Lubber 8
Guppy 5
Rusty 10
sid sname rating age
28 Yuppy 9 35.0
58 Rusty 10 35.0
Figure 4.4 σrating>8(S2) Figure 4.5πsname,rating(S2)

Dept of CSE, Unit-2 Page


18
The selection operator σ specifies the tuples to retain through a selection condition. In
general, the selection condition is a boolean combination (i.e., an expression using the
logical connectives ∧ and ∨) of terms that have the form attribute op constant or
attribute1 op attribute2, where op is one of the comparison operators <, <=, =, =, >=,
or >. The reference to an attribute can be by position (of the form .i or i) or by name
(of the form .name or name). The schema of the result of a selection is the schema of
the input relation instance
The projection operator π allows us to extract columns from a relation; for example, we
can find out all sailor names and ratings by using π. The expression πsname,rating(S2)

Suppose that we wanted to find out only the ages of sailors. The expression

πage(S2)
a single tuple with age=35.0 appears in the result of the projection. This follows from
the definition of a relation as a set of tuples. In
practice, real systems often omit the expensive step of eliminating duplicate tuples,
leading to relations that are multisets. However, our discussion of relational algebra
and calculus assumes that duplicate elimination is always done so that relations are
always sets of tuples.
We can compute the names and ratings of highly rated sailors by combining two of the
preceding queries. The expression
πsname,rating(σrating>8(S2))

age sname rating


yuppy 9
35.0 10
55.5

Figure 4.6 πage(S2) Figure 4.7 πsname,rating(σrating>8(S2))

Set Operations

The following standard operations on sets are also available in relational algebra: union (U),
intersection (∩), set-difference (−), and cross-product (×).

Dept of CSE, Unit-2 Page


19
 Union: R u S returns a relation instance containing all tuples that occur in either
relation instance R or relation instance S (or both). R and S must be union-
compatible, and the schema of the result is defined to be identical to the schema
of R.
 Intersection: R ∩ S returns a relation instance containing all tuples that occur in
both R and S. The relations R and S must be union-compatible, and the schema of
the result is defined to be identical to the schema of R.
 Set-difference: R − S returns a relation instance containing all tuples that occur
in R but not in S. The relations R and S must be union-compatible, and the
schema of the result is defined to be identical to the schema of R.
 Cross-product: R × S returns a relation instance whose schema contains all the
fields of R (in the same order as they appear in R) followed by all the fields of S
(in the same order as they appear in S). The result of R × S contains one tuple
〈 r, s 〉 (the concatenation of tuples r and s) for each pair of tuples r ∈ R, s ∈ S.

The cross-product opertion is sometimes called Cartesian product.


We now illustrate these definitions through several examples. The union of S1 and
S2 is shown in Figure 4.8. Fields are listed in order; field names are also inherited
from S1. S2 has the same field names, of course, since it is also an instance of
[Link] general, fields of S2 may have different names; recall that we require only
domains to match. Note that the result is a set of tuples. Tuples that appear in both
S1 and S2 appear only once in S1 ∪ S2. Also, S1 ∪ R1 is not a valid operation
because the two relations are not union-compatible. The intersection of S1 and S2 is
shown in Figure 4.9, and the set-difference S1 − S2 is shown in Figure 4.10.
sid sname rating age
22 Dustin 7 45.0
31 Lubber 8 55.5
58 Rusty 10 35.0
28 Yuppy 9 35.0
44 Guppy 5 35.0

Figure 4.8 S1 ∪ S2

Dept of CSE, Unit-2 Page


20
sid rating age sid rating age
31 8 55.5
58 10 35.0 22 7 45.0

Figure 4.9 S1 ∩ S2 Figure 4.10 S1 − S2

The result of the cross-product S1 × R1 is shown in Figure 4.11 The fields in S1


× R1 have the same domains as the
corresponding fields in R1 and S1. In Figure 4.11 sid is listed in parentheses
to
emphasize that it is not an inherited field name; only the corresponding domain is
inherited.

(sid) sname rating age (sid) bid day


22 Dustin 7 45.0 22 101 10/10/96
22 Dustin 7 45.0 58 103 11/12/96
31 Lubber 8 55.5 22 101 10/10/96
31 Lubber 8 55.5 58 103 11/12/96
58 Rusty 10 35.0 22 101 10/10/96
58 Rusty 10 35.0 58 103 11/12/96
Figure 4.11 S1 × R1

Renaming

We introduce a renaming operator ρ for this purpose. The expression ρ(R(F ), E)


takes an arbitrary relational algebra expression E and returns an instance of a (new)
relation called R. R contains the same tuples as the result of E, and has the same
schema as E, but some fields are renamed. The field names in relation R are the
same as in E, except for fields renamed in the renaming list F.

For example, the expression ρ(C(1 → sid1, 5 → sid2), S1 × R1) returns a relation
that contains the tuples shown in Figure 4.11 and has the following schema: C(sid1:
integer, sname: string, rating: integer, age: real, sid2: integer, bid: integer,day:
dates).

It is customary to include some additional operators in the algebra, but they can all be
defined in terms of the operators that we have defined thus far. (In fact, the renaming
operator is only needed for syntactic convenience, and even the ∩ operator is redundant; R
∩ S can be defined as R − (R − S).) We will consider these additional operators,and
their definition in terms of the basic operators, in the next two subsections.

Dept of CSE, Unit-2 Page


21
Joins

The join operation is one of the most useful operations in relational algebra and is
the most commonly used way to combine information from two or more relations.
Although a join can be defined as a cross-product followed by selections and projections,
joins arise much more frequently in practice than plain cross-products.

joins have received a lot of attention, and there are several variants of
the join operation.

Condition Joins

The most general version of the join operation accepts a join condition c and a pair of
relation instances as arguments, and returns a relation instance. The join condition is
identical to a selection condition in form. The operation is defined as follows:

R 𝝰⊳c S = σc(R × S)

Thus 𝝰⊳ is defined to be a cross-product followed by a selection. Note that the condition


c can (and typically does) refer to attributes of both R and S. The reference to an

attribute of a relation, say R, can be by position (of the form R.i) or by name (of the
form [Link]).As an example, the result of S1 𝝰⊳[Link]<[Link] R1 is shown in Figure 4.12.
Because sid appears in both S1 and R1, the corresponding fields in the result of the cross-
product S1 × R1 (and therefore in the result of S1 𝝰⊳[Link]<[Link] R1) are unnamed.
Domains are inherited from the corresponding fields of S1 and R1.

(sid) sname rating age (sid) bid day


22 Dustin 7 45.0 58 103 11/12/96
31 Lubber 8 55.5 58 103 11/12/96

Figure 4.12 S1 𝝰⊳[Link]<[Link] R1

Equijoin

Dept of CSE, Unit-2 Page


22
A common special case of the join operation R 𝝰⊳ S is when the join condition con-
sists solely of equalities (connected by ∧) of the form R.name1 = S.name2, that is,
equalities between two fields in R and S. In this case, obviously, there is some redun-
dancy in retaining both attributes in the result. For join conditions that contain only
such equalities, the join operation is refined by doing an additional projection in which
S.name2 is dropped. The join operation with this refinement is called equijoin.
The schema of the result of an equijoin contains the fields of R (with the same names
and domains as in R) followed by the fields of S that do not appear in the join
conditions. If this set of fields in the result relation includes two fields that inherit the
same name from R and S, they are unnamed in the result relation.

We illustrate S1 𝝰⊳[Link]=[Link] R1 in Figure 4.13. Notice that only one field called sid

appears in the result.


sid sname rating age bid day
22 Dustin 7 45.0 101 10/10/96
58 Rusty 10 35.0 103 11/12/96

Figure 4.13 S1 𝝰⊳[Link]=[Link] R1

Natural Join

A further special case of the join operation R 𝝰⊳ S is an equijoin in which equalities


are specified on all fields having the same name in R and S. In this case, we can
simply omit the join condition; the default is that the join condition is a collection of
equalities on all common fields. We call this special case a natural join, and it has the
nice property that the result is guaranteed not to have two fields with the same name.

The equijoin expression S1 𝝰⊳[Link]=[Link] R1 is actually a natural join and can simply be
denoted as S1 𝝰⊳ R1, since the only common field is sid. If the two relations have no
attributes in common, S1 𝝰⊳ R1 is simply the cross-product.
Division

The division operator is useful for expressing certain kinds of queries, for example:
“Find the names of sailors who have reserved all boats.” Understanding how to use the
basic operators of the algebra to define division is a useful exercise. However,

Dept of CSE, Unit-2 Page


23
the division operator does not have the same importance as the other operators—it
is not needed as often, and database systems do not try to exploit the semantics of
division by implementing it as a distinct operator (as, for example, is done with the
join operator).

We discuss division through an example. Consider two relation instances A and B in


which A has (exactly) two fields x and y and B has just one field y, with the same
domain as in A. We define the division operation A/B as the set of all x values (in
the form of unary tuples) such that for every y value in (a tuple of) B, there is a tuple
〈x,y〉in A.

Another way to understand division is as follows. For each x value in (the first column
of) A, consider the set of y values that appear in (the second field of) tuples of A with
that x value. If this set contains (all y values in) B, the x value is in the result of A/B.

An analogy with integer division may also help to understand division. For integers A
and B, A/B is the largest integer Q such that Q ∗ B ≤ A. For relation instances A
and B, A/B is the largest relation instance Q such that Q × B ⊆ A.

Division is illustrated in Figure 4.14. It helps to think of A as a relation listing the


parts supplied by suppliers, and of the B relations as listing parts. A/Bi computes
suppliers who supply all parts listed in relation instance Bi.

Expressing A/B in terms of the basic algebra operators is an interesting exercise, and
the reader should try to do this before reading further. The basic idea is to compute
all x values in A that are not disqualified. An x value is disqualified if by attaching a

y value from B, we obtain a tuple 〈 x,y 〉 that is not in A. We can compute disqualified tuples
using the algebra expression

πx((πx(A) × B) − A)
Thus we can define A/B as

πx(A) − πx((πx(A) × B) − A)

Dept of CSE, Unit-2 Page


24
To understand the division operation in full generality, we have to consider the case
when both x and y are replaced by a set of attributes.
More Examples of Relational Algebra Queries

We illustrate queries using thei nstances S3 of Sailors, R2 of Reserves, and B1 of Boats,


shown in Figures 4.15,4.16,and4.17, respectively.

sid sname rating age sid bid day


22 7 45.0 22 101
29 1 33.0 22 102
31 8 55.5 22 103
32 Andy 8 25.5 22 104
58 10 35.0 31 102
64 Horatio 7 35.0 31 103
71 Zorba 10 16.0 31 104
74 Horatio 9 35.0 64 101
85 3 25.5 64 102
95 Bob 3 63.5 74 103

Figure 4.15An Instance S3 of Sailors Figure 4.16An Instance R2 of Reserves

bid bname color


101 Interlak blue
102 Interlak red
103 Clipper green
104 Marine red

Figure 4.17 An Instance B1 of Boats

(Q1) Find the names of sailors who have reserved boat 103.

This query can be written as follows:


πsname((σbid=103Reserves) 𝝰⊳Sailors)

We first compute the set of tuples in Reserves with bid = 103 and then take the
natural join of this set with Sailors. This expression can be evaluated on instances
of Reserves and Sailors. Evaluated on the instances R2 and S3, it yields a relation
that contains just one field, called sname, and three tuples 〈Dustin〉, 〈 Horatio〉,and

Dept of CSE, Unit-2 Page


25
Lubber 〉 . (Observe that there are two sailors called Horatio, and only one of them has
reserved a red boat.)

We can break this query into smaller pieces using the renaming operator ρ:
ρ(T emp1, σbid=103Reserves)
ρ(T emp2, T emp1 𝝰⊳ Sailors) πsname(Temp2)
T emp1 denotes an intermediate relation that identifies reservations of boat 103. T
emp2 is another intermediate relation, and it denotes sailors who have made a
reservation in the set T emp1. The instances of these relations when evaluating this
query on the instances R2 and S3 are illustrated in Figures 4.18 and 4.19. Finally, we
extract the sname column from T emp2.
Figure 4.18Instance of T emp1 Figure 4.19 Instance of T emp2

sid bid day sid sname rating age bid day


22 103 22 7 45.0 103
31 103 31 8 55.5 103
74 103 74 Horatio 9 35.0 103

πsname(σbid=103(Reserves𝝰⊳Sailors))

The DBMS translates an SQL query into (an extended form of) relational algebra, and
then looks for other algebra expressions that will produce the same answers but are
cheaper to evaluate. If the user‟s query is first translated into the expression
πsname(σbid=103(Reserves𝝰⊳Sailors))
a good query optimizer will find the equivalent expression

πsname((σbid=103Reserves) 𝝰⊳ Sailors)

(Q2) Find the names of sailors who have reserved a red boat.

πsname((σcolor=′red′ Boats) 𝝰⊳ Reserves 𝝰⊳ Sailors


This query involves a series of two joins. First we choose (tuples describing) red boats.
Then we join this set with Reserves (natural join, with equality specified on the bid
column) to identify reservations of red boats. Next we join the resulting intermediate

Dept of CSE, Unit-2 Page 26


relation with Sailors (natural join, with equality specified on the sid column) to retrieve the
names of sailors who have made reservations of red boats. Finally, we project the sailors‟
names. The answer, when evaluated on the instances B1, R2 and S3, contains the names
Dustin, Horatio, and [Link] equivalent expression is:
πsname(πsid((πbidσcolor=′red′ Boats) 𝝰⊳ Reserves) 𝝰⊳ Sailors)

The reader is invited to rewrite both of these queries by using ρ to make the interme-
diate relations explicit and to compare the schemas of the intermediate relations. The
second expression generates intermediate relations with fewer fields (and is therefore
likely to result in intermediate relation instances with fewer tuples, as well). A rela-
tional query optimizer would try to arrive at the second expression if it is given the
first.
(Q3) Find the colors of boats reserved by Lubber.

πcolor((σsname=′Lubber′ Sailors) 𝝰⊳ Reserves 𝝰⊳ Boats)

This query is very similar to the query we used to compute sailors who reserved red boats.
On instances B1, R2, and S3, the query will return the colors gren and red.

(Q4) Find the names of sailors who have reserved at least one boat.

πsname(Sailors 𝝰⊳ Reserves)

(Q5) Find the names of sailors who have reserved a red or a green boat.

ρ(T empboats, (σcolor=′red′ Boats) U (σcolor=′green′ Boats))

πsname(Tempboats 𝝰⊳Reserves 𝝰⊳Sailors)


We identify the set of all boats that are either red or green (Tempboats, which contains
boats with the bids 102, 103, and 104 on instances B1, R2, and S3). Then we join with
Reserves to identify sids of sailors who have reserved one of these boats; this gives us sids
22, 31, 64, and 74 over our example instances. Finally, we join (an intermediate relation
containing this set of sids) with Sailors to find the names of Sailors with these sids. This
gives us the names Dustin, Horatio, and Lubber on the instances B1, R2, and S3.
Another equivalent definition is the following:
ρ(T empboats, (σcolor=′red′∨ color=′green′ Boats))

πsname(Tempboats 𝝰⊳Reserves 𝝰⊳ Sailors)

Dept of CSE, Unit-2 Page 27


(Q6) Find the names of sailors who have reserved a red and a green boat

ρ(T empboats2, (σcolor=′red′ Boats) ∩ (σcolor=′green′

Boats)) πsname(Tempboats2 𝝰⊳ Reserves 𝝰⊳ Sailors)

However, this solution is incorrect—it instead tries to compute sailors who have re-
served a boat that is both red and green. (Since bid is a key for Boats, a boat can be
only one color; this query will always return an empty answer set.) The correct
approach is to find sailors who have reserved a red boat, then sailors who have reserved a
green boat, and then take the intersection of these two sets:
ρ(T empred, πsid((σcolor=′red′ Boats) 𝝰⊳ Reserves))

ρ(T empgreen, πsid((σcolor=′green′ Boats) 𝝰⊳ Reserves))

πsname((Tempred ∩ Tempgreen) 𝝰⊳ Sailors)

The two temporary relations compute the sids of sailors, and their intersection identifies
sailors who have reserved both red and green boats. On instances B1, R2, and S3, the
sids of sailors who have reserved a red boat are 22, 31, and 64. The sids of sailors who
have reserved a green boat are 22, 31, and 74. Thus, sailors 22 and 31 have reserved
both a red boat and a green boat; their names are Dustin and Lubber.
This formulation of Query Q6 can easily be adapted to find sailors who have reserved
red or green boats (Query Q5); just replace ∩ by ∪ :
ρ(T empred, πsid((σcolor=′red′ Boats) 𝝰⊳ Reserves))

ρ(T empgreen, πsid((σcolor=′green′ Boats) 𝝰⊳ Reserves))

πsname((Tempred ∪ Tempgreen) 𝝰⊳Sailors)


In the above formulations of Queries Q5 and Q6, the fact that sid (the field over which
we compute union or intersection) is a key for Sailors is very important. Consider the
following attempt to answer Query Q6:

ρ(T empred, πsname((σcolor=′red′ Boats) 𝝰⊳Reserves 𝝰⊳ Sailors))

ρ(T empgreen, πsname((σcolor=′green′ Boats) 𝝰⊳Reserves 𝝰⊳


Sailors)) T empred ∩T empgreen

This attempt is incorrect for a rather subtle reason. Two distinct sailors with the same
name, such as Horatio in our example instances, may have reserved red and

Dept of CSE, Unit-2 Page 28


green boats, respectively. In this case, the name Horatio will (incorrectly) be included
in the answer even though no one individual called Horatio has reserved a red boat
and a green boat. The cause of this error is that sname is being used to identify sailors
(while doing the intersection) in this version of the query, but sname is not a key.
(Q7) Find the names of sailors who have reserved at least two boats.
ρ(Reservations, πsid,sname,bid(Sailors 𝝰⊳ Reserves))
ρ(Reservationpairs(1 → sid1, 2 → sname1, 3 → bid1, 4 → sid2,
5 → sname2,6 → bid2),Reservations × Reservations)
πsname1σ(sid1=sid2) ∩ (bid1=bid2)Reservationpairs

First we compute tuples of the form 〈 sid,sname,bid 〉 , where sailor sid has made a
reservation for boat bid; this set of tuples is the temporary relation Reservations.
Next we find all pairs of Reservations tuples where the same sailor has made both
reservations and the boats involved are distinct. Here is the central idea: In order
to show that a sailor has reserved two boats, we must find two Reservations tuples
involving the same sailor but distinct boats. Over instances B1, R2, and S3, the
sailors with sids 22, 31, and 64 have each reserved at least two boats. Finally, we
project the names of such sailors to obtain the answer, containing the names Dustin,
Horatio, and Lubber.

Notice that we included sid in Reservations because it is the key field identifying sailors, and
we need it to check that two Reservations tuples involve the same sailor. As noted in the
previous example, we can‟t use sname for this purpose.

(Q8) Find the sids of sailors with age over 20 who have not reserved a red boat.

πsid(σage>20Sailors) −πsid((σcolor=′red′ Boats) 𝝰⊳ Reserves 𝝰⊳ Sailors)

(Q9) Find the names of sailors who have reserved all boats.

The use of the word all (or every) is a good indication that the division operation might
be applicable:
ρ(T empsids, (πsid,bidReserves)/(πbidBoats))

πsname(Tempsids 𝝰⊳ Sailors)

(Q10) Find the names of sailors who have reserved all boats called Interlake.

ρ(T empsids, (πsid,bidReserves)/(πbid(σbname=′Interlake′ Boats))) πsname(Tempsids 𝝰⊳

Sailors)

RELATIONAL CALCULUS

Relational calculus is an alternative to relational algebra. In contrast to the algebra,


which is procedural, the calculus is nonprocedural, or declarative, in that it allows
us to describe the set of answers without being explicit about how they should be
Dept of CSE, Unit-2 Page 29
computed. Relational calculus has had a big influence on the design of commercial
query languages such as SQL and, especially, Query-by-Example (QBE).
The variant of the calculus that we present in detail is called the tuple relational
calculus (TRC). Variables in TRC take on tuples as values. In another variant, called

Dept of CSE, Unit-2 Page 30


the domain relational calculus (DRC), the variables range over field values. TRC has
had more of an influence on SQL, while DRC has strongly influenced QBE.
Tuple Relational Calculus

A tuple variable is a variable that takes on tuples of a particular relation schema as


values. That is, every value assigned to a given tuple variable has the same number and
type of fields. A tuple relational calculus query has the form { T | p(T) }, where T is a
tuple variable and p(T ) denotes a formula that describes T ; we will shortly define
formulas and queries rigorously. The result of this query is the set of all tuples t for which
the formula p(T ) evaluates to true with T = t. The language for writing formulas p(T ) is
thus at the heart of TRC and is essentially a simple subset of first-order logic. As a simple
example, consider the following query.
(Q11) Find all sailors with a rating above 7.

{S I S E Sailors ^ S. rating > 7}

When this query is evaluated on an instance of the Sailors relation, the tuple variable S is
instantiated successively with each tuple, and the test [Link]>7 is applied. The answer
contains those instances of S that pass this test. On instance S3 of Sailors, the answer
contains Sailors tuples with sid 31, 32, 58, 71, and 74.
Syntax of TRC Queries

We now define these concepts formally, beginning with the notion of a formula. Let Rel
be a relation name, R and S be tuple variables, a an attribute of R, and b an attribute
of S. Let op denote an operator in the set {<,>,=,≤, ≥, =}. An atomic formula is one
of the following:
a. R E Rel
b. R.a op S.b
c. R.a op constant, or constant op R.a
A formula is recursively defined to be one of the following, where p and q are them-
selves formulas, and p(R) denotes a formula in which the variable R appears:
any atomic formula
¬ p, p V q, p ^ q, or p  q

 R(p(R)), where R is a tuple variable

Dept of CSE, Unit-2 Page 31


 R(p(R)), where R is a tuple variable

We observe that every variable in a TRC formula appears in a subformula that is


atomic, and every relation schema specifies a domain for each field; this observation
ensures that each variable in a TRC formula has a well-defined domain from which
values for the variable are drawn. That is, each variable has a well-defined type, in the
programming language sense. Informally, an atomic formula R ∈ Rel gives R the type
of tuples in Rel, and comparisons such as R.a op S.b and R.a op constant induce type
restrictions on the field R.a. If a variable R does not appear in an atomic formula of
the form R E Rel (i.e., it appears only in atomic formulas that are comparisons), we
will follow the convention that the type of R is a tuple whose fields include all (and
only) fields of R that appear in the formula.

We will not define types of variables formally, but the type of a variable should be clear in
most cases, and the important point to note is that comparisons of values having
different types should always fail. (In discussions of relational calculus, the simplifying
assumption is often made that there is a single domain of constants and that this is the
domain associated with each field of each relation.)

A TRC query is defined to be expression of the form {T | p(T)}, where T is the only free
variable in the formula p.
Semantics of TRC Queries

What does a TRC query mean? More precisely, what is the set of answer tuples for a
given TRC query? The answer to a TRC query {T | p(T)}, as we noted earlier, is the
set of all tuples t for which the formula p(T ) evaluates to true with variable T assigned
the tuple value t. To complete this definition, we must state which assignments of tuple
values to the free variables in a formula make the formula evaluate to true.
A query is evaluated on a given instance of the database. Let each free variable in a
formula F be bound to a tuple value. For the given assignment of tuples to variables,

Dept of CSE, Unit-2 Page 32


with respect to the given database instance, F evaluates to (or simply „is‟) true if one
of the following holds:
 F is an atomic formula R  Rel, and R is assigned a tuple in the instance
of relation Rel.
 F is a comparison R.a op S.b, R.a op constant, or constant op R.a, and the
tuples assigned to R and S have field values R.a and S.b that make the comparison
true.
 F is of the form ¬p, and p is not true; or of the form p ^ q, and both p and q are
true; or of the form p V q, and one of them is true, or of the form p  q and q is
true whenever4 p is true.
 F is of the form R(p(R)), and there is some assignment of tuples to the free
variables in p(R), including the variable R,5 that makes the formula p(R) true.
 F is of the form R(p(R)), and there is some assignment of tuples to the
free variables in p(R) that makes the formula p(R) true no matter what
tuple is assigned to R.
Examples of TRC Queries

We now illustrate the calculus through several examples, using the instances B1 of
Boats, R2 of Reserves, and S3 of Sailors shown in Figures 4.15, 4.16, and 4.17. We will
use parentheses as needed to make our formulas unambiguous. Often, a formula p(R)
includes a condition RRel, and the meaning of the phrases some tuple R and for all
tuples R is intuitive. We will use the notation R  Rel(p(R)) for  R(R  Rel  p(R)).
Similarly, we use the notation R  Rel(p(R)) for R(R  Rel  p(R)).
(Q12) Find the names and ages of sailors with a rating above 7 .

{P | S  Sailors([Link] > 7  [Link] = [Link]  [Link] = [Link])}


This query illustrates a useful convention: P is considered to be a tuple variable with
exactly two fields, which are called name and age, because these are the only fields of
P that are mentioned and P does not range over any of the relations in the query;
that is, there is no subformula of the form P  Relname. The result of this query is
a relation with two fields, name and age. The atomic formulas [Link] = [Link]

Dept of CSE, Unit-2 Page 33


and [Link] = [Link] give values to the fields of an answer tuple P . On instances B1,
R2, and S3, the answer is the set of tuples 〈Lubber, 55.5〉, 〈Andy, 25.5〉, 〈Rusty, 35.0〉,

〈Zorba, 16.0〉, and 〈Horatio,35.0〉.

(Q13) Find the sailor name, boat id, and reservation date for each reservation

{P | R Reserves S  Sailors

([Link] = [Link]  [Link] = [Link]  [Link] = [Link]  [Link] = [Link])}


For each Reserves tuple, we look for a tuple in Sailors with the same sid. Given a pair
of such tuples, we construct an answer tuple P with fields sname, bid, and day bycopying
the corresponding fields from these two tuples. This query illustrates how we can combine
values from different relations in each answer tuple. The answer to this query on
instances B1, R2, and S3 is shown in Figure 4.20.

sname bid day


Dustin 10 10/10/98
Dustin 102 10/10/98
Dustin 103 10/8/98
Dustin 104 10/7/98
Lubber 102 11/10/98
Lubber 103 11/6/98
Lubber 104 11/12/98
Horati 101 9/5/98
Horati 102 9/8/98
Horati 103 9/8/98

Figure 4.20 Answer to Query Q13

(Q1) Find the names of sailors who have reserved boat 103.

{P | S  Sailors R  Reserves([Link] = [Link]  [Link] = 103 [Link] = [Link])}


This query can be read as follows: “Retrieve all sailor tuples for which
there exists a tuple in Reserves, having the same value in the sid field, and
with bid = 103.” That is, for each sailor tuple, we look for a tuple in
Reserves that shows that this sailor has reserved boat 103. The answer tuple
P contains just one field, sname

Dept of CSE, Unit-2 Page 34


(Q2) Find the names of sailors who have reserved a red boat.

{P | S  Sailors R  Reserves([Link] = [Link]  [Link] =

[Link]  B Boats([Link] = [Link]  [Link] =′red′))}


This query can be read as follows: “Retrieve all sailor tuples S for which there exist
tuples R in Reserves and B in Boats such that [Link] = [Link], [Link] = [Link], and
[Link] =′red′.” Another way to write this query, which corresponds more closely to this
reading, is as follows:

(Q7) Find the names of sailors who have reserved at least two boats.

{P | S  Sailors R1  Reserves R2  Reserves ([Link] = [Link]

 [Link] = [Link]  [Link] ≠ [Link]  [Link] = [Link])}


Contrast this query with the algebra version and see how much simpler the calculus
version is. In part, this difference is due to the cumbersome renaming of fields in the
algebra version, but the calculus version really is simpler.

(Q9) Find the names of sailors who have reserved all boats.

{P | S  Sailors B  Boats

(R  Reserves([Link] = [Link] [Link] = [Link] [Link] = [Link]))}


This query was expressed using the division operator in relational algebra. Notice how
easily it is expressed in the calculus. The calculus query directly reflects how we might
express the query in English: “Find sailors S such that for all boats B there is a Reserves
tuple showing that sailor S has reserved boat B.”
(Q14) Find sailors who have reserved all red boats.

{S | S  Sailors  B  Boats

([Link] =′red′ (R ∈ Reserves([Link] = [Link]  [Link] = [Link])))}


This query can be read as follows: For each candidate (sailor), if a boat is red, the
sailor must have reserved it. That is, for a candidate sailor, a boat being red must
imply the sailor having reserved it. Observe that since we can return an entire sailor
tuple as the answer instead of just the sailor‟s name, we have avoided introducing a
new free variable (e.g., the variable P in the previous example) to hold the answer
values. On instances B1, R2, and S3, the answer contains the Sailors tuples with sids 22

Dept of CSE, Unit-2 Page 35


expression of the form p q is logically equivalent to ¬p q:

{S | S  Sailors B  Boats

([Link] ≠′red′  (R  Reserves([Link] = [Link]  [Link] = [Link])))}


This query should be read as follows: “Find sailors S such that for all boats B, either
the boat is not red or a Reserves tuple shows that sailor S has reserved boat B.”
Domain Relational Calculus

A domain variable is a variable that ranges over the values in the domain of some
attribute (e.g., the variable can be assigned an integer if it appears in an attribute
whose domain is the set of integers). A DRC query has the form {〈x1, x2, . . . , xn〉 |
p(〈x1,x2,.. ., xn〉)}, where each xi is either a domain variable or a constant and
p( 〈 x1,x2,.. ., xn 〉 ) denotes a DRC formula whose only free variables are thevari-ables

among the xi, 1 ≤ i ≤ n. The result of this query is the set of all tuples 〈x1, x2,..

.,xn 〉 for which the formula evaluates to true.

A DRC formula is defined in a manner that is very similar to the definition of a TRC
formula. The main difference is that the variables are now domain variables. Let op
denote an operator in the set {<, >, =, ≤, ≥, =} and let X and Y be domain variables. An
atomic formula in DRC is one of the following:

〈 x1,x2,...,xn 〉 ∈ Rel, where Rel is a relation with n attributes; each xi, 1 ≤ i ≤ n is either a
variable or a constant.

X op Y

X op constant, or constant op X

A formula is recursively defined to be one of the following, where p and q are them-
selves formulas, and p(X) denotes a formula in which the variable X appears:

Dept of CSE, Unit-2 Page 36


¬ p, p V q, p  q, or p  q

 X(p(X)), where X is a domain variable


 X(p(X)), where X is a domain variable
Examples of DRC Queries

We now illustrate DRC through several examples. The reader is invited to compare
these with the TRC versions.

(Q11) Find all sailors with a rating above 7.

{〈I, N, T, A〉 | 〈I, N, T, A〉  Sailors  T > 7}

This differs from the TRC version in giving each attribute a (variable) name. The
condition 〈 I, N, T, A 〉  Sailors ensures that the domain variables I, N , T , and
Aarerestricted to be fields of the same tuple. In comparison with the TRC query, we can
say T > 7 instead of [Link] > 7, but we must specify the tuple 〈 I, N, T, A 〉 in the result,
rather than just S.

(Q1) Find the names of sailors who have reserved boat 103.

{〈N 〉 | I, T, A(〈I, N, T, A〉  Sailors

Ir, Br, D(〈Ir, Br, D〉  Reserves  Ir = I  Br = 103))}


(Q2) Find the names of sailors who have reserved a red boat.

{〈N 〉 | I, T, A(〈I, N, T, A〉  Sailors

I, Br, D〉  Reserves  〈Br, BN,′red′〉  Boats)}


(Q7) Find the names of sailors who have reserved at least two boats.

{〈N 〉| I, T, A(〈I, N, T, A〉 Sailors  Br1, Br2, D1, D2(〈I, Br1, D1〉Reserves 
〈I, Br2, D2 〉  Reserves  Br1 = Br2)
Notice how the repeated use of variable I ensures that the same sailor has reserved both
the boats in question.

(Q9) Find the names of sailors who have reserved all boats.

{〈N〉|I,T, A(〈I, N,T,A〉Sailors

B, BN, C(¬(〈B, BN, C〉  Boats) 


30
(〈Ir, Br, D〉 Reserves(I = Ir  Br = B))))}

This query can be read as follows: “Find all values of N such that there is some tuple
〈 I, N,T,A 〉 in Sailors satisfying the following condition: for every 〈 B, BN,C 〉 , either

this is not a tuple in Boats or there is some tuple 〈 Ir, Br, D 〉 in Reserves that proves

that Sailor I has reserved boat B.” The ∀ quantifier allows the domain variables B,
BN, and C to range over all values in their respective attribute domains, and the
pattern „¬( 〈 B, BN, C 〉  Boats)∨ ‟ is necessary to restrict attention to those values
that appear in tuples of Boats.

FUNCTIONAL DEPENDENCIES

A functional dependency (FD) is a kind of IC that generalizes the concept of a key. Let R be a
relation schema and let X and Y be nonempty sets of attributes in R. We say that an instance r of R
1
satisfies the FD X ! Y if the following holds for every
pair of tuples t1 and t2 in r:

If t1:X = t2:X, then t1:Y = t2:Y .

We use the notation t1:X to refer to the projection of tuple t1 onto the attributes in X, in a natural
extension of our TRC notation (see Chapter 4) t:a for referring to attribute a of tuple t. An FD X ! Y
essentially says that if two tuples agree on the values in attributes X, they must also agree on the
values in attributes Y.
30

Normalization

Database normalization is a database design principle for organizing data in an organized and consistent
way.

It helps you avoid redundancy and maintain the integrity of the database. It also helps you eliminate
undesirable characteristics associated with insertion, deletion, and updating.

Advantages of Normalization
Normalization eliminates data redundancy and ensures that each piece of data is stored in only one place,
reducing the risk of data inconsistency and making it easier to maintain data accuracy.
By breaking down data into smaller, more specific tables, normalization helps ensure that each table
stores only relevant data, which improves the overall data integrity of the database.
Normalization simplifies the process of updating data, as it only needs to be changed in one place rather
than in multiple places throughout the database.
Normalization enables users to query the database using a variety of different criteria, as the data is
organized into smaller, more specific tables that can be joined together as needed.
Normalization can help ensure that data is consistent across different applications that use the same
database, making it easier to integrate different applications and ensuring that all users have access to
accurate and consistent data.
Disadvantages of Normalization
Normalization can result in increased performance overhead due to the need for additional join
operations and the potential for slower query execution times.
Normalization can result in the loss of data context, as data may be split across multiple tables and
require additional joins to retrieve.
Proper implementation of normalization requires expert knowledge of database design and the
normalization process.
Normalization can increase the complexity of a database design, especially if the data model is not well
understood or if the normalization process is not carried out correctly.
Types of Normalization:
1. 1NF
2. 2NF
3. 3NF
4. BCNF

1NF: each column in a table contains atomic, indivisible values. There should be no repeating
groups, and each column should have a unique name.
30
Consider an unnormalized table that stores customer orders:

OrderID Customer Products


1 John Apples, Bananas, Oranges
2 Alice Grapes, Strawberries
3 Bob Lemons, Limes
This table violates 1NF because the Products column contains a list of items. To bring it to 1NF, we
split the products into separate rows:
OrderID Customer Product
1 John Apples
1 John Bananas
1 John Oranges
2 Alice Grapes
2 Alice Strawberries
3 Bob Lemons
3 Bob Limes

2NF: 2NF eliminates partial dependencies. A table is in 2NF if it‟s in 1NF and all non-key
attributes are functionally dependent on the entire primary key.
StudentID CourseID CourseName Instructor
1 101 Math Prof. Smith
1 102 Physics Prof. Johnson
2 101 Math Prof. Smith
3 103 History Prof. Davis
This table violates 2NF because the Instructor attribute depends on both StudentID and CourseID.
To achieve 2NF, we split the table into two separate tables:
Students Table:
StudentID StudentName
1 John
2 Alice
3 Bob
Courses Table:
CourseID CourseName Instructor
101 Math Prof. Smith
102 Physics Prof. Johnson
103 History Prof. Davis
Now, the Instructor attribute depends only on the CourseID, and the table is in 2NF.
30

3NF: A stricter version of 3NF, BCNF ensures that every non-trivial functional dependency is a
superkey. This means that no partial dependencies or transitive dependencies are allowed.

EmployeeID ProjectID ProjectName Manager


1 101 ProjectA John
1 102 ProjectB Alice
2 101 ProjectA John
3 103 ProjectC Bob
This table violates 3NF because the Manager attribute depends on the EmployeeID, not directly on the
primary key. To bring it to 3NF, we split the table into two separate tables:
Employees Table:
EmployeeID EmployeeName
1 John
2 Alice
3 Bob
Projects Table:
ProjectID ProjectName
101 ProjectA
102 ProjectB
103 ProjectC
EmployeeProjects Table:
EmployeeID ProjectID
1 101
1 102
2 101
3 103

BCNF : BCNF is a stricter version of 3NF. To illustrate BCNF, consider a table that stores information
about professors and their research areas:

ProfessorID ResearchArea OfficeNumber


1 Artificial Intelligence 101
2 Machine Learning 102
3 Artificial Intelligence 103
This table violates BCNF because there is a non-trivial functional dependency
between ResearchArea and OfficeNumber (i.e., the office number depends on the research area). To
achieve BCNF, we split the table into two separate tables:
Professors Table:
ProfessorID ProfessorName
1 Prof. Smith
30
ProfessorID ProfessorName
2 Prof. Johnson
3 Prof. Davis
ResearchAreas Table:
ResearchArea OfficeNumber
Artificial Intelligence 101
Machine Learning 102
ProfessorResearch Table:
ProfessorID ResearchArea
1 Artificial Intelligence
2 Machine Learning
3 Artificial Intelligence
Now, the table is in BCNF because there are no non-trivial functional dependencies.
9
14

You might also like