DBMS Notes Unit 2
DBMS Notes Unit 2
Syllabus
1. Database Basics
o Characteristics of databases.
3. Database Environment
4. Normalization
19
o Functional dependencies.
examples).
DELETE).
3. Query Optimization
1. Transaction Management
2. Concurrency Control
3. Database Recovery
paging.
Unit 2
What is ER Modeling?
A graphical technique for understanding and organizing the data independent of the actual database
implementation
We need to be familiar with the following terms to go further.
Entity
Any thing that has an independent existence and about which we collect data. It is also known as entity type.
In ER modeling, notation for entity is given below.
Entity instance
Entity instance is a particular member of the entity type.
Example for entity instance : A particular employee
Regular Entity
An entity which has its own key attribute is a regular entity.
Example for regular entity : Employee.
19
Weak entity
An entity which depends on other entity for its existence and doesn't have any key attribute of its own is a weak
entity.
Example for a weak entity : In a parent/child relationship, a parent is considered as a strong entity and the child
is a weak entity.
In ER modeling, notation for weak entity is given below.
Attributes
Domain of Attributes
The set of possible values that an attribute can take is called the domain of the attribute. For example, the
attribute day may take any value from the set {Monday, Tuesday ... Friday}. Hence this set can be termed as the
domain of the attribute day.
Key attribute
The attribute (or combination of attributes) which is unique for every entity instance is called key attribute.
E.g the employee_id of an employee, pan_card_number of a person [Link] the key attribute consists of two or
more attributes in combination, it is called a composite key.
In ER modeling, notation for key attribute is given below.
Simple attribute
Composite attribute
If an attribute can be split into components, it is called a composite attribute.
Example for composite attribute : Name of the employee which can be split into First_name, Middle_name, and
Last_name.
Single valued Attributes
20
If an attribute can take only a single value for each entity instance, it is a single valued attribute.
example for single valued attribute : age of a student. It can take only one value for a particular student.
Multi-valued Attributes
If an attribute can take more than one value for each entity instance, it is a multi-valued attribute. Multi-valued
example for multi valued attribute : telephone number of an employee, a particular employee may have multiple
telephone numbers.
In ER modeling, notation for multi-valued attribute is given below.
Stored Attribute
An attribute which can be calculated or derived based on other attributes is a derived attribute.
Example for derived attribute : age of employee which can be calculated from date of birth and current date.
In ER modeling, notation for derived attribute is given below.
Relationships
However in ER Modeling, To connect a weak Entity with others, you should use a weak relationship notation as
given below
21
Degree of a Relationship
Degree of a relationship is the number of entity types involved. The n-ary relationship is the general form for
degree n. Special cases are unary, binary, and ternary ,where the degree is 1, 2, and 3, respectively.
Example for unary relationship : An employee ia a manager of another employee
Example for binary relationship : An employee works-for department.
Example for ternary relationship : customer purchase item from a shop keeper
Cardinality of a Relationship
Relationship cardinalities specify how many of each entity type is allowed. Relationships can have four possible
connectivities as given below.
1. One to one (1:1) relationship
2. One to many (1:N) relationship
3. Many to one (M:1) relationship
4. Many to many (M:N) relationship
The minimum and maximum values of this connectivity is called the cardinality of the relationship
Example for Cardinality – One-to-One (1:1)
One employee is assigned with only one parking space and one parking space is assigned to only one
employee. Hence it is a 1:1 relationship and cardinality is One-To-One (1:1)
In ER modeling, this can be mentioned using notations as given below
22
One organization can have many employees , but one employee works in only one organization. Hence it is a
1:N relationship and cardinality is One-To-Many (1:N)
In ER modeling, this can be mentioned using notations as given below
One employee works in only one organization But one organization can have many employees. Hence it is a
M:1 relationship and cardinality is Many-to-One (M :1)
24
In ER modeling, this can be mentioned using notations as given below.
One student can enroll for many courses and one course can be enrolled by many students. Hence it is a M:N
relationship and cardinality is Many-to-Many (M:N)
In ER modeling, this can be mentioned using notations as given below
Relationship Participation
1. Total
In total participation, every entity instance will be connected through the relationship to another instance of the
other participating entity types
2. Partial
Example for relationship participation
Consider the relationship - Employee is head of the department.
Here all employees will not be the head of the department. Only one employee will be the head of the
department. In other words, only few instances of employee entity participate in the above relationship. So
employee entity's participation is partial in the said relationship.
1. However each department will be headed by some employee. So department entity's participation is total in the
said relationship
24
Advantages and Disadvantages of ER Modeling ( Merits and Demerits of ER Modeling )
Advantages
1. ER Modeling is simple and easily understandable. It is represented in business users language and it can be
understood by non-technical specialist.
2. Intuitive and helps in Physical Database creation.
3. Can be generalized and specialized based on needs.
4. Can help in database design.
5. Gives a higher level description of the system.
Disadvantages
1. Physical design derived from E-R Model may have some amount of ambiguities or inconsistency.
2. Sometime diagrams may lead to misinterpretations
Relational Model
The relational model is the primary data model for commercial data processing applications. It attained its primary
position because of its simplicity, which eases the job of the programmer, compared to earlier data models
such as the network model or the hierarchical model.
Structure of Relational Databases:
A relational database consists of a collection of tables, each of which is assigned a unique name. For example,
consider the instructor [Link] table has column headers: ID, name, dept name, and salary. Each row of this
table records information about an instructor, consisting of the instructor’s ID, name, dept name, and salary.
Similarly, the course table stores information about courses, consisting of a course id, title, dept name, and
credits, for each course. Note that each instructor is identified by the value of the column ID, while each
course is identified by the value of the column course id.
Thus, a row in the prefer table indicates that two courses are related in the sense that one course is a
prerequisite for the other. As another example, we consider the table instructor, a row in the table can be thought
of as representing the relationship between a specified ID and the corresponding values for name, dept name,
and salary values.
28
In general, a row in a table represents a relationship among a set of values. Since a table is a collection of such
relationships, there is a close correspondence between the concept of table and the mathematical concept of
relation, from which the relational data model takes its name. In mathematical terminology, a tuple is simply a
sequence (or list) of values. A relationship between n values is represented mathematically by an n-tuple of
values, i.e., a tuple with n values, which corresponds to a row in a table.
Thus, in the relational model the term relation is used to refer to a table, while the term tuple is used to refer to
a row. Similarly, the term attribute refers to a column of a table.
Examining Figure 1.5, we can see that the relation instructor has four attributes:
ID, name, dept name, and salary.
We use the term relation instance to refer to a specific instance of a relation, i.e., containing a specific set of
rows. The instance of instructor shown in Figure 1.5 has 12 tuples, corresponding to 12 instructors.
In this topic, we shall be using a number of different relations to illustrate the various concepts underlying the
relational data model. These relations represent part of a university. They do not include all the data an actual
28
university database would contain, in order to simplify our presentation.
The order in which tuples appear in a relation is irrelevant, since a relation is a set of tuples. Thus, whether the
tuples of a relation are listed in sorted order, as in Figure 1.5, or are unsorted, as in Figure 1.8, does not matter;
the relations in the two figures are the same, since both contain the same set of tuples. For ease of exposition,
we will mostly show the relations sorted by their first attribute. For each attribute of a relation, there is a set of
permitted values, called the domain of that attribute. Thus, the domain of the salary attribute of the instructor
relation is the set of all possible salary values, while the domain of the name attribute is the set of all possible
instructor names.
We require that, for all relations r, the domains of all attributes of r be atomic. A domain is atomic if elements of
the domain are considered to be indivisible units.
For example, suppose the table instructor had an attribute phone number, which can store a set of phone
numbers corresponding to the instructor. Then the domain of phone number would not be atomic, since an
element of the domain is a set of phone numbers, and it has subparts, namely the individual phone numbers
in the set.
The null value is a special value that signifies that the value is unknown or does not exist. For example, suppose
as before that we include the attribute phone number in the instructor relation. It may be that an instructor does
not have a phone number at all, or that the telephone number is unlisted. We would then have to use the null
value to signify that the value is unknown or does not exist. We shall see later that null values cause a number of
difficulties when we access or update the database, and thus should be eliminated if at all possible. We shall
assume null values are absent initially.
Database Schema
When we talk about a database, we must differentiate between the database schema, which is the logical
design of the database, and the database instance, which is a snapshot of the data in the database at a given
instant in time. The concept of a relation corresponds to the programming-language notion of a variable, while
the concept of a relation schema corresponds to the programming-language notion of type definition.
28
similarly the contents of a relation instance may change with time as the relation is updated. In contrast, the
schema of a relation does not generally change. Although it is important to know the difference between a
relation schema and a relation instance, we often use the same name, such as instructor, to refer to both the
schema and the instance. Where required, we explicitly refer to the schema or to the instance, for example “the
instructor schema,” or “an instance of the instructor relation.” However, where it is clear whether we mean the
schema or the instance, we simply use the relation name.
Consider the department relation of Figure 1.9. The schema for that relation is
Note that the attribute dept name appears in both the instructor schema and the department schema. This
duplication is not a coincidence. Rather, using common attributes in relation schemas is one way of relating
tuples of distinct relations.
For example, suppose we wish to find the information about all the instructors who work in the Watson building.
We look first at the department relation to find the dept name of all the departments housed in Watson. Then, for
each such department, we look in the instructor relation to find the information about the instructor associated
with the corresponding dept name.
Let us continue with our university database example. Each course in a university may be offered multiple times,
across different semesters, or even within a [Link] need a relation to describe each individual
offering, or section, of the class. The schema is
section (course id, sec id, semester, year, building, room number, time slot id)
Figure 1.10 shows a sample instance of the section relation. We need a relation to describe the association
between instructors and the class sections that they teach. The relation schema to describe this association is
teaches (ID, course id, sec id, semester, year)
28
Figure 1.10: The section relation.(2-6)
Figure 1.11 shows a sample instance of the teaches relation. As you can imagine, there are many more relations
maintained in a real university database. In addition to those relations we have listed already, instructor,
department, course, section, prereq, and teaches,we use the following relations in this text:
Keys
We must have a way to specify how tuples within a given relation are distinguished. This is expressed in terms of their
attributes. That is, the values of the attribute values of a tuple must be such that they can uniquely identify the tuple. In other
words, no two tuples in a relation are allowed to have exactly the same value for all attributes
These Are
Super key
Candidate Key
Primary Key
Foreign Key
Super Key : A superkey is a set of one or more attributes that, taken collectively, allow us to identify uniquely
a tuple in the relation. For example, the ID attribute of the relation instructor is sufficient to distinguish one
instructor tuple from another. Thus, ID is a superkey. The name attribute of instructor, on the other hand, is not a
superkey, because several instructors might have the same name. Formally, let R denote the set of attributes
in the schema of relation r. If we say that a subset K of R is a superkey for r , we are restricting consideration
to instances of relations r in which no two distinct tuples have the same values on all attributes in K. That is, if
t1 and t2 are in r and t1 = t2, then t1.K = t2.K.
Example
1. {StudentID}
2. {Email}
3. {Phone}
4. {StudentID, Name}
5. {Email, Phone}
6. {StudentID, Email, Phone}
All of these sets can uniquely identify a student, so they are super keys.
candidate keys A superkey may contain extraneous attributes. For example, the combination of ID and name
is a superkey for the relation instructor. If K is a superkey, then so is any superset of K. We are often
interested in superkeys for which no proper subset is a superkey. Such minimal superkeys are called candidate
keys.
It is possible that several distinct sets of attributes could serve as a candidate key. Suppose that a combination
of name and dept name is sufficient to distinguish among members of the instructor relation. Then, both {ID} and
{name, dept name} are candidate keys. Although the attributes ID and name together can distinguish instructor
tuples, their combination, {ID, name}, does not form a candidate key, since the attribute ID alone is a candidate key.
29
primary key A key (whether primary, candidate, or super) is a property of the entire relation, rather than of the
individual tuples. Any two individual tuples in the relation are prohibited from having the same value on the
key attributes at the same time. The designation of a key represents a constraint in the real-world
enterprise being modeled.
foreign key It is customary to list the primary key attributes of a relation schema before the other attributes; for
example, the dept name attribute of department is listed first, since it is the primary key. Primary key
attributes are also underlined. A relation, say r1, may include among its attributes the primary key of another
relation, say r2. This attribute is called a foreign key from r1, referencing r2.
The relation r1 is also called the referencing relation of the foreign key dependency, and r2 is called the referenced
relation of the foreign key. For example, the attribute dept name in instructor is a foreign key from instructor, referencing
department, since dept name is the primary key of department. In any database instance, given any tuple, say ta, from the
instructor relation, there must be some tuple, say tb, in the department relation such that the value of the dept name
attribute of ta is the same as the value of the primary key, dept name, of tb.
The constraint from section to teaches is an example of a referential integrity constraint; a referential integrity
constraint requires that the values appearing in specified attributes of any tuple in the referencing relation also
appear in specified attributes of at least one tuple in the referenced relation.
Schema Diagrams
A database schema, along with primary key and foreign key dependencies, can be depicted by schema
diagrams. Figure 1.12 shows the schema diagram for our university organization. Each relation appears as a
box, with the relation name at the top in blue, and the attributes listed inside the box. Primary key attributes are
shown underlined. Foreign key dependencies appear as arrows from the foreign key attributes of the referencing
relation to the primary key of the referenced relation.
Referential integrity constraints other than foreign key constraints are not shown explicitly in schema diagrams.
We will study a different diagrammatic representation called the entity-relationship diagram.
1. Unary algebra:
Selection (σ)
Projection (π)
Rename ρ(R)
2. Binary algebra:
Suppose that we wanted to find out only the ages of sailors. The expression
πage(S2)
a single tuple with age=35.0 appears in the result of the projection. This follows from
the definition of a relation as a set of tuples. In
practice, real systems often omit the expensive step of eliminating duplicate tuples,
leading to relations that are multisets. However, our discussion of relational algebra
and calculus assumes that duplicate elimination is always done so that relations are
always sets of tuples.
We can compute the names and ratings of highly rated sailors by combining two of the
preceding queries. The expression
πsname,rating(σrating>8(S2))
Set Operations
The following standard operations on sets are also available in relational algebra: union (U),
intersection (∩), set-difference (−), and cross-product (×).
Figure 4.8 S1 ∪ S2
Renaming
For example, the expression ρ(C(1 → sid1, 5 → sid2), S1 × R1) returns a relation
that contains the tuples shown in Figure 4.11 and has the following schema: C(sid1:
integer, sname: string, rating: integer, age: real, sid2: integer, bid: integer,day:
dates).
It is customary to include some additional operators in the algebra, but they can all be
defined in terms of the operators that we have defined thus far. (In fact, the renaming
operator is only needed for syntactic convenience, and even the ∩ operator is redundant; R
∩ S can be defined as R − (R − S).) We will consider these additional operators,and
their definition in terms of the basic operators, in the next two subsections.
The join operation is one of the most useful operations in relational algebra and is
the most commonly used way to combine information from two or more relations.
Although a join can be defined as a cross-product followed by selections and projections,
joins arise much more frequently in practice than plain cross-products.
joins have received a lot of attention, and there are several variants of
the join operation.
Condition Joins
The most general version of the join operation accepts a join condition c and a pair of
relation instances as arguments, and returns a relation instance. The join condition is
identical to a selection condition in form. The operation is defined as follows:
R 𝝰⊳c S = σc(R × S)
attribute of a relation, say R, can be by position (of the form R.i) or by name (of the
form [Link]).As an example, the result of S1 𝝰⊳[Link]<[Link] R1 is shown in Figure 4.12.
Because sid appears in both S1 and R1, the corresponding fields in the result of the cross-
product S1 × R1 (and therefore in the result of S1 𝝰⊳[Link]<[Link] R1) are unnamed.
Domains are inherited from the corresponding fields of S1 and R1.
Equijoin
We illustrate S1 𝝰⊳[Link]=[Link] R1 in Figure 4.13. Notice that only one field called sid
Natural Join
The equijoin expression S1 𝝰⊳[Link]=[Link] R1 is actually a natural join and can simply be
denoted as S1 𝝰⊳ R1, since the only common field is sid. If the two relations have no
attributes in common, S1 𝝰⊳ R1 is simply the cross-product.
Division
The division operator is useful for expressing certain kinds of queries, for example:
“Find the names of sailors who have reserved all boats.” Understanding how to use the
basic operators of the algebra to define division is a useful exercise. However,
Another way to understand division is as follows. For each x value in (the first column
of) A, consider the set of y values that appear in (the second field of) tuples of A with
that x value. If this set contains (all y values in) B, the x value is in the result of A/B.
An analogy with integer division may also help to understand division. For integers A
and B, A/B is the largest integer Q such that Q ∗ B ≤ A. For relation instances A
and B, A/B is the largest relation instance Q such that Q × B ⊆ A.
Expressing A/B in terms of the basic algebra operators is an interesting exercise, and
the reader should try to do this before reading further. The basic idea is to compute
all x values in A that are not disqualified. An x value is disqualified if by attaching a
y value from B, we obtain a tuple 〈 x,y 〉 that is not in A. We can compute disqualified tuples
using the algebra expression
πx((πx(A) × B) − A)
Thus we can define A/B as
πx(A) − πx((πx(A) × B) − A)
(Q1) Find the names of sailors who have reserved boat 103.
We first compute the set of tuples in Reserves with bid = 103 and then take the
natural join of this set with Sailors. This expression can be evaluated on instances
of Reserves and Sailors. Evaluated on the instances R2 and S3, it yields a relation
that contains just one field, called sname, and three tuples 〈Dustin〉, 〈 Horatio〉,and
We can break this query into smaller pieces using the renaming operator ρ:
ρ(T emp1, σbid=103Reserves)
ρ(T emp2, T emp1 𝝰⊳ Sailors) πsname(Temp2)
T emp1 denotes an intermediate relation that identifies reservations of boat 103. T
emp2 is another intermediate relation, and it denotes sailors who have made a
reservation in the set T emp1. The instances of these relations when evaluating this
query on the instances R2 and S3 are illustrated in Figures 4.18 and 4.19. Finally, we
extract the sname column from T emp2.
Figure 4.18Instance of T emp1 Figure 4.19 Instance of T emp2
πsname(σbid=103(Reserves𝝰⊳Sailors))
The DBMS translates an SQL query into (an extended form of) relational algebra, and
then looks for other algebra expressions that will produce the same answers but are
cheaper to evaluate. If the user‟s query is first translated into the expression
πsname(σbid=103(Reserves𝝰⊳Sailors))
a good query optimizer will find the equivalent expression
πsname((σbid=103Reserves) 𝝰⊳ Sailors)
(Q2) Find the names of sailors who have reserved a red boat.
The reader is invited to rewrite both of these queries by using ρ to make the interme-
diate relations explicit and to compare the schemas of the intermediate relations. The
second expression generates intermediate relations with fewer fields (and is therefore
likely to result in intermediate relation instances with fewer tuples, as well). A rela-
tional query optimizer would try to arrive at the second expression if it is given the
first.
(Q3) Find the colors of boats reserved by Lubber.
This query is very similar to the query we used to compute sailors who reserved red boats.
On instances B1, R2, and S3, the query will return the colors gren and red.
(Q4) Find the names of sailors who have reserved at least one boat.
πsname(Sailors 𝝰⊳ Reserves)
(Q5) Find the names of sailors who have reserved a red or a green boat.
However, this solution is incorrect—it instead tries to compute sailors who have re-
served a boat that is both red and green. (Since bid is a key for Boats, a boat can be
only one color; this query will always return an empty answer set.) The correct
approach is to find sailors who have reserved a red boat, then sailors who have reserved a
green boat, and then take the intersection of these two sets:
ρ(T empred, πsid((σcolor=′red′ Boats) 𝝰⊳ Reserves))
The two temporary relations compute the sids of sailors, and their intersection identifies
sailors who have reserved both red and green boats. On instances B1, R2, and S3, the
sids of sailors who have reserved a red boat are 22, 31, and 64. The sids of sailors who
have reserved a green boat are 22, 31, and 74. Thus, sailors 22 and 31 have reserved
both a red boat and a green boat; their names are Dustin and Lubber.
This formulation of Query Q6 can easily be adapted to find sailors who have reserved
red or green boats (Query Q5); just replace ∩ by ∪ :
ρ(T empred, πsid((σcolor=′red′ Boats) 𝝰⊳ Reserves))
This attempt is incorrect for a rather subtle reason. Two distinct sailors with the same
name, such as Horatio in our example instances, may have reserved red and
First we compute tuples of the form 〈 sid,sname,bid 〉 , where sailor sid has made a
reservation for boat bid; this set of tuples is the temporary relation Reservations.
Next we find all pairs of Reservations tuples where the same sailor has made both
reservations and the boats involved are distinct. Here is the central idea: In order
to show that a sailor has reserved two boats, we must find two Reservations tuples
involving the same sailor but distinct boats. Over instances B1, R2, and S3, the
sailors with sids 22, 31, and 64 have each reserved at least two boats. Finally, we
project the names of such sailors to obtain the answer, containing the names Dustin,
Horatio, and Lubber.
Notice that we included sid in Reservations because it is the key field identifying sailors, and
we need it to check that two Reservations tuples involve the same sailor. As noted in the
previous example, we can‟t use sname for this purpose.
(Q8) Find the sids of sailors with age over 20 who have not reserved a red boat.
(Q9) Find the names of sailors who have reserved all boats.
The use of the word all (or every) is a good indication that the division operation might
be applicable:
ρ(T empsids, (πsid,bidReserves)/(πbidBoats))
πsname(Tempsids 𝝰⊳ Sailors)
(Q10) Find the names of sailors who have reserved all boats called Interlake.
Sailors)
RELATIONAL CALCULUS
When this query is evaluated on an instance of the Sailors relation, the tuple variable S is
instantiated successively with each tuple, and the test [Link]>7 is applied. The answer
contains those instances of S that pass this test. On instance S3 of Sailors, the answer
contains Sailors tuples with sid 31, 32, 58, 71, and 74.
Syntax of TRC Queries
We now define these concepts formally, beginning with the notion of a formula. Let Rel
be a relation name, R and S be tuple variables, a an attribute of R, and b an attribute
of S. Let op denote an operator in the set {<,>,=,≤, ≥, =}. An atomic formula is one
of the following:
a. R E Rel
b. R.a op S.b
c. R.a op constant, or constant op R.a
A formula is recursively defined to be one of the following, where p and q are them-
selves formulas, and p(R) denotes a formula in which the variable R appears:
any atomic formula
¬ p, p V q, p ^ q, or p q
We will not define types of variables formally, but the type of a variable should be clear in
most cases, and the important point to note is that comparisons of values having
different types should always fail. (In discussions of relational calculus, the simplifying
assumption is often made that there is a single domain of constants and that this is the
domain associated with each field of each relation.)
A TRC query is defined to be expression of the form {T | p(T)}, where T is the only free
variable in the formula p.
Semantics of TRC Queries
What does a TRC query mean? More precisely, what is the set of answer tuples for a
given TRC query? The answer to a TRC query {T | p(T)}, as we noted earlier, is the
set of all tuples t for which the formula p(T ) evaluates to true with variable T assigned
the tuple value t. To complete this definition, we must state which assignments of tuple
values to the free variables in a formula make the formula evaluate to true.
A query is evaluated on a given instance of the database. Let each free variable in a
formula F be bound to a tuple value. For the given assignment of tuples to variables,
We now illustrate the calculus through several examples, using the instances B1 of
Boats, R2 of Reserves, and S3 of Sailors shown in Figures 4.15, 4.16, and 4.17. We will
use parentheses as needed to make our formulas unambiguous. Often, a formula p(R)
includes a condition RRel, and the meaning of the phrases some tuple R and for all
tuples R is intuitive. We will use the notation R Rel(p(R)) for R(R Rel p(R)).
Similarly, we use the notation R Rel(p(R)) for R(R Rel p(R)).
(Q12) Find the names and ages of sailors with a rating above 7 .
(Q13) Find the sailor name, boat id, and reservation date for each reservation
{P | R Reserves S Sailors
(Q1) Find the names of sailors who have reserved boat 103.
(Q7) Find the names of sailors who have reserved at least two boats.
(Q9) Find the names of sailors who have reserved all boats.
{P | S Sailors B Boats
{S | S Sailors B Boats
A domain variable is a variable that ranges over the values in the domain of some
attribute (e.g., the variable can be assigned an integer if it appears in an attribute
whose domain is the set of integers). A DRC query has the form {〈x1, x2, . . . , xn〉 |
p(〈x1,x2,.. ., xn〉)}, where each xi is either a domain variable or a constant and
p( 〈 x1,x2,.. ., xn 〉 ) denotes a DRC formula whose only free variables are thevari-ables
among the xi, 1 ≤ i ≤ n. The result of this query is the set of all tuples 〈x1, x2,..
A DRC formula is defined in a manner that is very similar to the definition of a TRC
formula. The main difference is that the variables are now domain variables. Let op
denote an operator in the set {<, >, =, ≤, ≥, =} and let X and Y be domain variables. An
atomic formula in DRC is one of the following:
〈 x1,x2,...,xn 〉 ∈ Rel, where Rel is a relation with n attributes; each xi, 1 ≤ i ≤ n is either a
variable or a constant.
X op Y
X op constant, or constant op X
A formula is recursively defined to be one of the following, where p and q are them-
selves formulas, and p(X) denotes a formula in which the variable X appears:
We now illustrate DRC through several examples. The reader is invited to compare
these with the TRC versions.
This differs from the TRC version in giving each attribute a (variable) name. The
condition 〈 I, N, T, A 〉 Sailors ensures that the domain variables I, N , T , and
Aarerestricted to be fields of the same tuple. In comparison with the TRC query, we can
say T > 7 instead of [Link] > 7, but we must specify the tuple 〈 I, N, T, A 〉 in the result,
rather than just S.
(Q1) Find the names of sailors who have reserved boat 103.
{〈N 〉| I, T, A(〈I, N, T, A〉 Sailors Br1, Br2, D1, D2(〈I, Br1, D1〉Reserves
〈I, Br2, D2 〉 Reserves Br1 = Br2)
Notice how the repeated use of variable I ensures that the same sailor has reserved both
the boats in question.
(Q9) Find the names of sailors who have reserved all boats.
This query can be read as follows: “Find all values of N such that there is some tuple
〈 I, N,T,A 〉 in Sailors satisfying the following condition: for every 〈 B, BN,C 〉 , either
this is not a tuple in Boats or there is some tuple 〈 Ir, Br, D 〉 in Reserves that proves
that Sailor I has reserved boat B.” The ∀ quantifier allows the domain variables B,
BN, and C to range over all values in their respective attribute domains, and the
pattern „¬( 〈 B, BN, C 〉 Boats)∨ ‟ is necessary to restrict attention to those values
that appear in tuples of Boats.
FUNCTIONAL DEPENDENCIES
A functional dependency (FD) is a kind of IC that generalizes the concept of a key. Let R be a
relation schema and let X and Y be nonempty sets of attributes in R. We say that an instance r of R
1
satisfies the FD X ! Y if the following holds for every
pair of tuples t1 and t2 in r:
We use the notation t1:X to refer to the projection of tuple t1 onto the attributes in X, in a natural
extension of our TRC notation (see Chapter 4) t:a for referring to attribute a of tuple t. An FD X ! Y
essentially says that if two tuples agree on the values in attributes X, they must also agree on the
values in attributes Y.
30
Normalization
Database normalization is a database design principle for organizing data in an organized and consistent
way.
It helps you avoid redundancy and maintain the integrity of the database. It also helps you eliminate
undesirable characteristics associated with insertion, deletion, and updating.
Advantages of Normalization
Normalization eliminates data redundancy and ensures that each piece of data is stored in only one place,
reducing the risk of data inconsistency and making it easier to maintain data accuracy.
By breaking down data into smaller, more specific tables, normalization helps ensure that each table
stores only relevant data, which improves the overall data integrity of the database.
Normalization simplifies the process of updating data, as it only needs to be changed in one place rather
than in multiple places throughout the database.
Normalization enables users to query the database using a variety of different criteria, as the data is
organized into smaller, more specific tables that can be joined together as needed.
Normalization can help ensure that data is consistent across different applications that use the same
database, making it easier to integrate different applications and ensuring that all users have access to
accurate and consistent data.
Disadvantages of Normalization
Normalization can result in increased performance overhead due to the need for additional join
operations and the potential for slower query execution times.
Normalization can result in the loss of data context, as data may be split across multiple tables and
require additional joins to retrieve.
Proper implementation of normalization requires expert knowledge of database design and the
normalization process.
Normalization can increase the complexity of a database design, especially if the data model is not well
understood or if the normalization process is not carried out correctly.
Types of Normalization:
1. 1NF
2. 2NF
3. 3NF
4. BCNF
1NF: each column in a table contains atomic, indivisible values. There should be no repeating
groups, and each column should have a unique name.
30
Consider an unnormalized table that stores customer orders:
2NF: 2NF eliminates partial dependencies. A table is in 2NF if it‟s in 1NF and all non-key
attributes are functionally dependent on the entire primary key.
StudentID CourseID CourseName Instructor
1 101 Math Prof. Smith
1 102 Physics Prof. Johnson
2 101 Math Prof. Smith
3 103 History Prof. Davis
This table violates 2NF because the Instructor attribute depends on both StudentID and CourseID.
To achieve 2NF, we split the table into two separate tables:
Students Table:
StudentID StudentName
1 John
2 Alice
3 Bob
Courses Table:
CourseID CourseName Instructor
101 Math Prof. Smith
102 Physics Prof. Johnson
103 History Prof. Davis
Now, the Instructor attribute depends only on the CourseID, and the table is in 2NF.
30
3NF: A stricter version of 3NF, BCNF ensures that every non-trivial functional dependency is a
superkey. This means that no partial dependencies or transitive dependencies are allowed.
BCNF : BCNF is a stricter version of 3NF. To illustrate BCNF, consider a table that stores information
about professors and their research areas: