0% found this document useful (0 votes)
15 views47 pages

DBMS Module-3 Notes

The document provides notes on Database Management Systems (DBMS) focusing on normalization and database design theory. It outlines informal design guidelines for relation schemas, addressing issues such as redundancy, NULL values, and spurious tuples, while emphasizing the importance of clear semantics and functional dependencies. The guidelines aim to ensure efficient database design that minimizes anomalies and optimizes data integrity.

Uploaded by

akashn.23cs
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
15 views47 pages

DBMS Module-3 Notes

The document provides notes on Database Management Systems (DBMS) focusing on normalization and database design theory. It outlines informal design guidelines for relation schemas, addressing issues such as redundancy, NULL values, and spurious tuples, while emphasizing the importance of clear semantics and functional dependencies. The guidelines aim to ensure efficient database design that minimizes anomalies and optimizes data integrity.

Uploaded by

akashn.23cs
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 47

MODULE –3 NOTES DBMS- BCS403

MODULE 3 NOTES
ON
DATABASE MANAGEMENT SYSTEMS( BCS403)
2024 – 2025

B. E IV Semester

Department of Computer Science & Engineering

Sai Vidya Institute of Technology


Bengaluru -560064
Madhura N, Assistant Professor
Dr. Varun E, Associate Professor
Dr. Shashikumar D R, Professor and HOD

Web: www.saividya.ac.in
https://www.facebook.com/SaiVidyaInstituteOfTechnology

Page 1
MODULE –3 NOTES DBMS- BCS403

CHAPTER 1

NORMALIZATION: DATABASE DESIGN THEORY


▪ Relational Schema--consists of a number of attributes

▪ Relational database schema- consists of a number of relation schemas

▪ Assume that attributes are grouped to form a relation schema by using the common sense of the
database designer or by mapping a database schema design from a conceptual data model such as
the ER data model.

▪ These models make the designer identify entity types and relationship types and their respective
attributes, which leads to a natural and logical grouping of the attributes into relations.

▪ Database Design- coming up with a ‘good’ schema is very important.

14.1 Informal Design Guidelines for Relation Schemas

Four informal guidelines that may be used as measures to determine the quality of relation
schema design:

1. Making sure that the semantics of the attributes is clear in the schema

2. Reducing the redundant information in tuples

3. Reducing the NULL values in tuples

4. Disallowing the possibility of generating spurious tuples.

These measures are not always independent of one another .

14.1.1 Imparting Clear Semantics to Attributes in Relations


• Semantics of a relation refers to its meaning resulting from the interpretation of attribute values
in a tuple .

• Whenever we group attributes to form a relation schema, we assume that attributes belonging to
one relation have certain real-world meaning and a proper interpretation associated with them.

• The easier it is to explain the semantics of the relation, the better the relation schema design will
be.

Page 2
MODULE –3 NOTES DBMS- BCS403

Guideline 1

• Design a relation schema so that it is easy to explain its meaning.

• Do not combine attributes from multiple entity types and relationship types into a single relation.

• If a relation schema corresponds to one entity type or one relationship type, it is straightforward to interpret
and to explain its meaning.

• If the relation corresponds to a mixture of multiple entities and relationships, semantic ambiguities will
result and the relation cannot be easily explained.

Page 3
MODULE –3 NOTES DBMS- BCS403

Examples of Violating Guideline 1

Fig 2: Schema diagram for company

• Both the relation schemas have clear semantics .

• A tuple in the EMP_DEPT relation schema represents a single employee but includes additional information—
the name (Dname) of the department for which the employee works and the Social Security number
(Dmgr_ssn) of the department manager.

• A tuple in the EMP_PROJ relates an employee to a project but also includes the employee name (Ename),
project name (Pname), and project location (Plocation) .

• Although there is nothing wrong logically with these two relations, they violate Guideline 1 by mixing attributes
from distinct real-world entities.

➢ EMP_DEPT mixes attributes of employees and departments.

➢ EMP_PROJ mixes attributes of employees and projects and the WORKS_ON relationship.

• They may be used as views, but they cause problems when used as base relations.

14.1.2 Redundant Information in Tuples and Update Anomalies


• One goal of schema design is to minimize the storage space used by the base relations.

• Grouping attributes into relation schemas has a significant effect on storage space.

• For example, compare the space used by the two base relations EMPLOYEE and DEPARTMENT with that
for an EMP_DEPT base relation.

Page 4
MODULE –3 NOTES DBMS- BCS403

• In EMP_DEPT, the attribute values pertaining to a particular department (Dnumber, Dname, Dmgr_ssn) are
repeated for every employee who works for that department.

• In contrast, each department’s information appears only once in the DEPARTMENT relation. Only the
department number Dnumber is repeated in the EMPLOYEE relation for each employee who works in that
department as a foreign key.

Page 5
MODULE –3 NOTES DBMS- BCS403

• Storing natural joins of base relations leads to an additional problem referred to as update
anomalies. These can be classified into:

➢ insertion anomalies

➢ deletion anomalies,

➢ modification anomalies

Page 6
MODULE –3 NOTES DBMS- BCS403
Insertion Anomalies

Insertion anomalies can be differentiated into two types, illustrated by the following
examples based on the EMP_DEPT relation:

1. To insert a new employee tuple into EMP_DEPT, we must include either the attribute values
for the department that the employee works for, or NULLs

✓ For example, to insert a new tuple for an employee who works in department number 5, we
must enter all the attribute values of department 5 correctly so that they are consistent with
the corresponding values for department 5 in other tuples in EMP_DEPT

✓ In the design of Employee in fig 14.2, we do not have to worry about this consistency
problem because we enter only the department number in the employee tuple; all other
attribute values of department 5 are recorded only once in the database, as a single tuple in
the DEPARTMENT relation

2. It is difficult to insert a new department that has no employees as yet in the EMP_DEPT
relation. The only way to do this is to place NULL values in the attributes for employee

✓ This violates the entity integrity for EMP_DEPT because Ssn is its primary key.

✓ This problem does not occur in the design of Figure 14.2 because a department is entered
in the DEPARTMENT relation whether or not any employees work for it, and whenever an
employee is assigned to that department, a corresponding tuple is inserted in EMPLOYEE.

Deletion Anomalies

The problem of deletion anomalies is related to the second insertion anomaly situation
just discussed.

✓ If we delete from EMP_DEPT an employee tuple that happens to represent the last
employee working for a particular department, the information concerning that
department is lost from the database.

Page 7
MODULE –3 NOTES DBMS- BCS403
✓ This problem does not occur in the database of Figure 14.2 because DEPARTMENT
tuples are stored separately.

Modification Anomalies

• In EMP_DEPT, if we change the value of one of the attributes of a particular department—


say, the manager of department 5—we must update the tuples of all employees who work in
that department; otherwise, the database will become inconsistent.

• If we fail to update some tuples, the same department will be shown to have two different
values for manager in different employee tuples, which would be wrong

Guideline 2

• Relations Design the base relation schemas so that no insertion, deletion, or modification
anomalies are present in the relations.

• If any anomalies are present, note them clearly and make sure that the programs that update
the database will operate correctly.

• The second guideline is consistent with and, in a way, a restatement of the first guideline.

• These guidelines may sometimes have to be violated in order to improve the performance of
certain queries.

14.1.3 NULL Values in Tuples

• If many of the attributes do not apply to all tuples in the relation, we end up with many NULLs
in those tuples
➢ this can waste space at the storage level.
➢ may lead to problems with understanding the meaning of the attributes.
➢ may also lead to problems with specifying JOIN operations.
➢ how to account for them when aggregate operations such as COUNT or SUM are applied.

Page 8
MODULE –3 NOTES DBMS- BCS403
• SELECT and JOIN operations involve comparisons; if NULL values are present, the results may
become unpredictable.

• Moreover, NULLs can have multiple interpretations, such as the following:

➢ The attribute does not apply to this tuple. For example, Visa_status may not apply to U.S.
students.

➢ The attribute value for this tuple is unknown. For example, the Date_of_birth may be
unknown for an employee.

➢ The value is known but absent; that is, it has not been recorded yet. For example, the
Home_Phone_Number for an employee may exist, but may not be available and recorded yet.

Guideline 3

• As far as possible, avoid placing attributes in a base relation whose values may frequently
be NULL.

• If NULLs are unavoidable, make sure that they apply in exceptional cases only and do
not apply to a majority of tuples in the relation.

• Using space efficiently and avoiding joins with NULL values are the two overriding
criteria that determine whether to include the columns that may have NULLs in a relation
or to have a separate relation for those columns with the appropriate key columns.

• For example, if only 15 percent of employees have individual offices, there is little
justification for including an attribute Office_number in the EMPLOYEE relation; rather,
a relation EMP_OFFICES (Essn, Office_number) can be created to include tuples for
only the employees with individual offices.

Page 9
MODULE –3 NOTES DBMS- BCS403

14.1.4 Generation of Spurious Tuples

• Consider the two relation schemas EMP_LOCS and EMP_PROJ1 which can be used
instead of the single EMP_PROJ

• A tuple in EMP_LOCS means that the employee whose name is Ename works on some
project whose location is Plocation.

• A tuple in EMP_PROJ1 refers to the fact that the employee whose Social Security number is
Ssn works Hours per week on the project whose name, number, and location are Pname,
Pnumber, and Plocation.

Page 10
MODULE –3 NOTES DBMS- BCS403

• Suppose that we used EMP_PROJ1 and EMP_LOCS as the base relations instead of EMP_PROJ.
This produces a particularly bad schema design because we cannot recover the information that was
originally in EMP_PROJ from EMP_PROJ1 and EMP_LOCS.

• If we attempt a NATURAL JOIN operation on EMP_PROJ1 and EMP_LOCS, the result produces
many more tuples than the original set of tuples in EMP_PROJ.

• Additional tuples that were not in EMP_PROJ are called spurious tuples because they represent
spurious information that is not valid.

• The spurious tuples are marked by asterisks (*).

• Decomposing EMP_PROJ into EMP_LOCS and EMP_PROJ1 is undesirable because when we JOIN
them back using NATURAL JOIN, we do not get the correct original information.

• This is because in this case Plocation is the attribute that relates EMP_LOCS and EMP_PROJ1, and
Plocation is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.

Page 11
MODULE –3 NOTES DBMS- BCS403

Guideline 4

• Design relation schemas so that they can be joined with equality conditions on attributes that are
appropriately related (primary key, foreign key) pairs in a way that guarantees that no spurious tuples
are generated.

• Avoid relations that contain matching attributes that are not (foreign key, primary key) combinations
because joining on such attributes may produce spurious tuples.

14.1.5 Summary and Discussion of Design Guidelines


• We informally discussed situations that lead to problematic relation schemas and we proposed
informal guidelines for a good relational design .

• The problems we pointed out, which can be detected without additional tools of analysis, are as
follows:

✓ Anomalies that cause redundant work to be done during insertion into and modification of a
relation, and that may cause accidental loss of information during a deletion from a relation.

✓ Waste of storage space due to NULLs and the difficulty of performing selections, aggregation
operations, and joins due to NULL

14.2 Functional Dependencies


Formal tool for analysis of relational schemas that enables us to detect and describe some of the problems
in precise terms.

Definition of Functional Dependency

• A functional dependency is a constraint between two sets of attributes from the database.

• Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also
in R, (written X Y) if and only if each X value is associated with at most one Y value.

• X is the determinant set and Y is the dependent attribute. Thus, given a tuple and the values of the
attributes in X, one can determine the corresponding value of the Y attribute.

• The abbreviation for functional dependency is FD or f.d. The set of attributes X is called the left-hand
side of the FD, and Y is called the right-hand side.

• A functional dependency is a property of the semantics or meaning of the attributes.

Page 12
MODULE –3 NOTES DBMS- BCS403

• The database designers will use their understanding of the semantics of the attributes of R to specify
the functional dependencies that should hold on all relation states (extensions) r of R.

Consider the relation schema EMP_PROJ;

• From the semantics of the attributes and the relation, we know that the following functional
dependencies should hold:

a. Ssn→Ename

b. Pnumber →{Pname, Plocation}

c. {Ssn, Pnumber}→Hours

• These functional dependencies specify that

(a) the value of an employee’s Social Security number (Ssn) uniquely determines the employee name
(Ename)

(b) the value of a project’s number (Pnumber) uniquely determines the project name (Pname) and
location (Plocation), and

(c) a combination of Ssn and Pnumber values uniquely determines the number of hours the employee
currently works on the project per week (Hours).

• Alternatively,we say that Ename is functionally determined by (or functionally dependent on) Ssn, or
given a value of Ssn, we know the value of Ename, and so on.

• Relation extensions r(R) that satisfy the functional dependency constraints are called legal relation
states (or legal extensions) of R.

• A functional dependency is a property of the relation schema R, not of a particular legal relation state r
of R.

• Therefore, an FD cannot be inferred automatically from a given relation extension r but must be
defined explicitly by someone who knows the semantics of the attributes of R

Page 13
MODULE –3 NOTES DBMS- BCS403

Figure 14.7 A relation state of TEACH with a possible functional dependency TEXT → COURSE.
However, TEACHER → COURSE, TEXT → TEACHER and COURSE → TEXT are ruled out.

See the illustrative example relation in Figure 14.8.

The following FDs may hold because the four tuples in the current extension have no violation of these
constraints:
B→C
C→B
{A, B} → C
{A, B}→ D
{C, D }→ B.

The following do not hold because we already have violations of them in the given extension:

A → B (tuples 1 and 2 violate this constraint)

B →A (tuples 2 and 3 violate this constraint)

D → C (tuples 3 and 4 violate it)

Page 14
MODULE –3 NOTES DBMS- BCS403

Diagrammatic notation for displaying FDs

• Each FD is displayed as a horizontal line

• The left-hand-side attributes of the FD are connected by vertical lines to the line representing the FD

• The right-hand-side attributes are connected by the lines with arrows pointing toward the attributes.

Types of Functional Dependencies


1) A Functional Dependency X -> Y is a full functional dependency if removal of any attribute A from
X means that the dependency does not hold any more.

Ex: {SSN , Pnumber } -> Hours is a full dependency (neither {SSN -> Hours} nor {Pnumber->
Hour} holds.

2) A Functional Dependency X -> Y is a partial functional dependency if removal of any attribute A


from X means that the dependency still holds.

Ex: {Ssn , Pnumber} -> Ename is a partial dependency because Ssn-> Enmae holds.

3) A Functional Dependency X -> Y is a transitive dependency if there exists a set of attributes Z in R


such that X-> Z and Z -> Y holds.

Ex: The dependency ssn -> Dmgr_ssn is transitive in EMP_DEPT because of the dependencies

Ssn ->Dnumber and Dnumber -> Dmgr_Ssn.

4) Trivial Functional dependency : If a functional dependency (FD) X -> Y holds , where Y is the
subset of X, then it is called the trivial FD.

Non –Trivial : If an FD X -> Y holds, where Y is not a subset of X , then it is called a non-trivial FD.

Page 15
MODULE –3 NOTES DBMS- BCS403

14.3 Normal Forms Based on Primary Keys


14.3.1 Normalization of Relations

• The normalization process, as first proposed by Codd (1972a), takes a relation schema through a series
of tests to certify whether it satisfies a certain normal form.

• Initially, Codd proposed three normal forms, which he called first, second, and third normal form.

• All these normal forms are based on a single analytical tool: the functional dependencies among the
attributes of a relation.

• A fourth normal form (4NF) and a fifth normal form (5NF) were proposed, based on the concepts of
multivalued dependencies and join dependencies, respectively.

• Normalization of data can be considered a process of analyzing the given relation schemas based on
their FDs and primary keys to achieve the desirable properties of

(1) minimizing redundancy and

(2) minimizing the insertion, deletion, and update anomalies.

Normalization:

• The process of decomposing unsatisfactory "bad“ relations by breaking up their attributes into smaller
relations.

• Definition: The normal form of a relation refers to the highest normal form condition that it meets, and
hence indicates the degree to which it has been normalized.

• Denormalization: The process of storing the join of higher normal form relations as a base relation—
which is in a lower normal form.

14.3.2 Definitions of Keys and Attributes Participating in Keys


Definition. A superkey of a relation schema R = {A1, A2, … , An} is a set of attributes S ⊆ R with the
property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K
not to be a superkey anymore.

Page 16
MODULE –3 NOTES DBMS- BCS403

The difference between a key and a superkey is that a key has to be minimal; that is, if we have a key K
= {A1, A2, … , Ak} of R, then K − {Ai} is not a key of R for any Ai, 1 ≤ i ≤ k.

Example: {Ssn} is a key for EMPLOYEE, whereas {Ssn}, {Ssn, Ename}, {Ssn, Ename, Bdate}, and
any set of attributes that includes Ssn are all superkeys.

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is
arbitrarily designated to be the primary key, and the others are called secondary keys.

Definition. An attribute of relation schema R is called a prime attribute of R if it is a member of some


candidate key of R. An attribute is called nonprime if it is not a prime attribute—that is, if it is not a
member of any candidate key.

14.3.4 First Normal Form

• It states that the domain of an attribute must include only atomic (simple, indivisible) values and
that the value of any attribute in a tuple must be a single value from the domain of that attribute.

• Defined to disallow multivalued attributes, composite attributes, and their combinations.

• Hence, 1NF disallows having a set of values, a tuple of values, or a combination of both as an
attribute value for a single tuple.

• Consider the DEPARTMENT relation schema, whose primary key is Dnumber, and suppose
that we extend it by including the Dlocations attribute as shown in Figure 14.9(a).

• We assume that each department can have a number of locations. As we can see, this is not in
1NF because Dlocations is not an atomic attribute, as illustrated by the first tuple in Figure
14.9(b).

There are two ways we can look at the Dlocations attribute:


• The domain of Dlocations contains atomic values, but some tuples can have a set of these values.
In this case, Dlocations is not functionally dependent on the primary key Dnumber.

• The domain of Dlocations contains sets of values and hence is nonatomic. In this case,
Dnumber → Dlocations because each set is considered a single member of the attribute domain.

Page 17
MODULE –3 NOTES DBMS- BCS403

There are three main techniques to achieve first normal form for such a relation:

1. Remove the attribute Dlocations that violates 1NF and place it in a separate relation
DEPT_LOCATIONS along with the primary key Dnumber of DEPARTMENT. The primary key
of this relation is the combination {Dnumber, Dlocation}. A distinct tuple in
DEPT_LOCATIONS exists for each location of a department. This decomposes the non-1NF
relation into two 1NF relations.

2. Expand the key so that there will be a separate tuple in the original DEPARTMENT relation for
each location of a DEPARTMENT. In this case, the primary key becomes the combination
{Dnumber, Dlocation}. This solution has the disadvantage of introducing redundancy in the
relation as shown in Figure. 14.9 C.

3. If a maximum number of values is known for the attribute for example, if it is known that at most
three locations can exist for a department replace the Dlocations attribute by three atomic
attributes: Dlocation1, Dlocation2, and Dlocation3. This solution has the disadvantage of

introducing NULL values if most departments have fewer than three locations. Querying on this
attribute becomes more difficult; for example, consider how you would write the query: List the
departments design.

• Of the three solutions, the first is generally considered best because it does not suffer
from redundancy and it is completely general, having no limit placed on a maximum
number of values

• First normal form also disallows multivalued attributes that are themselves composite.

• These are called nested relations because each tuple can have a relation within it.
Page 18
MODULE –3 NOTES DBMS- BCS403

• Figure above shows how the EMP_PROJ relation could appear if nesting is allowed.

• Each tuple represents an employee entity, and a relation PROJS(Pnumber, Hours) within each
tuple employee works on each project.

• The schema of this EMP_PROJ relation can be represented as follows:


EMP_PROJ(Ssn, Ename, {PROJS(Pnumber, Hours)})

• Ssn is the primary key of the EMP_PROJ relation and Pnumber is the partial key of the nested
relation; that is, within each tuple, the nested relation must have unique values of Pnumber.

• To normalize this into 1NF, we remove the nested relation attributes into a new relation and
propagate the primary key into it; the primary key of the new relation will combine the partial
key with the primary key of the original relation.

• Decomposition and primary key propagation yield the schemas EMP_PROJ1 and EMP_PROJ2.

Page 19
MODULE –3 NOTES DBMS- BCS403

14.3.5 Second Normal Form


• Second normal form (2NF) is based on the concept of full functional dependency.

• A functional dependency X -> Y is a full functional dependency if removal of any attribute A


from X means that the dependency does not hold any more; that is, For any attribute A ε X, (X-
{A}) does not functionally determine Y.

• A functional dependency X→Y is a partial dependency if some attribute A ε X can be removed


from X and the dependency still holds; that is, for some A ε X, (X–{A}) → Y.

• In the above figure {Ssn, Pnumber} → Hours is a full dependency (neither Ssn → Hours nor
Pnumber→Hours holds)

• {Ssn,Pnumber}→Ename is partial because Ssn→Ename holds.

Definition. A relation schema R is in 2NF if every nonprime attribute A in R is fully functionally


dependent on the primary key of R.

• The test for 2NF involves testing for functional dependencies whose left-hand side attributes are
part of the primary key.

• If the primary key contains a single attribute, the test need not be applied at all.

• The EMP_PROJ relation is in 1NF but is not in 2NF.

• The nonprime attribute Ename violates 2NF because of FD2, as do the nonprime attributes
Pname and Plocation because of FD3.

• The functional dependencies FD2 and FD3 make Ename, Pname, and Plocation partially
dependent on the primary key {Ssn, Pnumber} of EMP_PROJ, thus violating the 2NF test.

• If a relation schema is not in 2NF, it can be second normalized or 2NF normalized into a number
of 2NF relations in which nonprime attributes are associated only with the part of the
primary key on which they are fully functionally dependent.

Page 20
MODULE –3 NOTES DBMS- BCS403

• Therefore, the functional dependencies FD1, FD2, and FD3 lead to the decomposition of
EMP_PROJ into the three relation schemas EP1, EP2, and EP3 shown in Figure below, each of
which is in 2NF.

14.3.6 Third Normal Form


• Transitive functional dependency
A functional dependency X Y in a relation schema R is transitive dependency if there exists a
set of attribute Z that are neither a primary nor a subset of any key of R(candidate key) and both
X Z and Y Z holds.
• Example:

• SSN DMGRSSN is a transitive FD since SSN -> DNUMBER and DNUMBER -> DMGRSSN
hold.

Dnumber is neither a key itself nor a subset of the key of EMP_DEPT

• SSN ENAME is non-transitive since there is no set of attributes X where SSN -> X and
X ->ENAME.

Page 21
MODULE –3 NOTES DBMS- BCS403

Definition: A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute
A in R is transitively dependent on the primary key.

The relation schema EMP_DEPT is in 2NF, since no partial dependencies on a key exist. However,
EMP_DEPT is not in 3NF because of the transitive dependency of Dmgr_ssn (and also Dname) on Ssn
via Dnumber.

We can normalize EMP_DEPT by decomposing it into the two 3NF relation schema ED1 and ED2.

• ED1 and ED2 represent independent entity facts about employees and departments.

• A NATURAL JOIN operation on ED1 and ED2 will recover the original relation EMP_DEPT
without generating spurious tuples

Problematic FD
• Left-hand side is part of primary key
• Left-hand side is a non-key attribute

2NF and 3NF normalization remove these problem FDs by decomposing the original relation into new
relations

In general, we want to design our relation schemas so that they have neither partial nor transitive
dependencies because these types of dependencies cause the update anomalies.

Page 22
MODULE –3 NOTES DBMS- BCS403
14.4 General Definitions of Second and Third Normal Forms

14.4.1 General Definition of Second and Third Normal Form

• Takes into account all candidate keys of a relation into account.

Definition of 2NF: A relation schema R is in second normal form (2NF) if every nonprime attribute A
in R is not partially dependent on any key of R.

Consider the relation schema LOTS which describes parcels of land for sale in various counties of a
state.

Suppose that there are two candidate keys: Property_id# and {County_name, Lot#}; that is, lot numbers
are unique only within each county, but Property_id# numbers are unique across counties for the entire
state.

Page 23
MODULE –3 NOTES DBMS- BCS403

Based on the two candidate keys Property_id# and {County_name, Lot#}, the functional dependencies
FD1 and FD2 hold.
FD1:Property_id → {County_name,Lot#,Area,Price,Tax_rate}
FD2:{County_name,Lot#} →{Property_id, Area,Price,Tax_rate}
FD3: County_name → Tax_rate
FD4: Area →Price

• We choose Property_id# as the primary key, but no special consideration will be given to this key
over the other candidate key.
• FD3 says that the tax rate is fixed for a given county (does not vary lot by lot within the same
county).
• FD4 says that the price of a lot is determined by its area regardless of which county it is in.
• The LOTS relation schema violates the general definition of 2NF because Tax_rate is partially
dependent on the candidate key {County_name, Lot#}, due to FD3
• To normalize LOTS into 2NF, we decompose it into the two relations LOTS1 and LOTS2.

• We construct LOTS1A by removing the attribute Price that violates 3NF from LOTS1 and placing
it with Area (the lefthand side of FD4 that causes the transitive dependency) into another relation
LOTS1B.
• Both LOTS1A and LOTS1B are in 3NF.

Definition of 3NF: A relation schema R is in third normal form (3NF) if, whenever a non-trivial
functional dependency X -> A holds in R, either (a) X is a super key of R, (b) A is a prime attribute of
R.

Page 24
MODULE –3 NOTES DBMS- BCS403

• According to this definition, LOTS2 is in 3NF


• FD4 in LOTS1 violates 3NF because Area is not a superkey and Price is not a prime attribute in
LOTS1
• To normalize LOTS1 into 3NF, we decompose it into the relation schemas LOTS1A and
LOTS1B

• We construct LOTS1A by removing the attribute Price that violates 3NF from LOTS1 and
placing it with Area (the lefthand side of FD4 that causes the transitive dependency) into another
relation LOTS1B.

• Both LOTS1A and LOTS1B are in 3NF

14.4.2 Boyce-Codd Normal Form


• Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it was found to
be stricter than 3NF.
• Every relation in BCNF is also in 3NF; however, a relation in 3NF is not necessarily in BCNF.

Definition. A relation schema R is in BCNF if whenever a nontrivial functional dependency X-> A


holds in R, then X is a superkey of R.

Page 25
MODULE –3 NOTES DBMS- BCS403

• The formal definition of BCNF differs from the definition of 3NF in that condition (b) of 3NF,
which allows A to be prime, is absent from BCNF. That makes BCNF a stronger normal form
compared to 3NF.
• In our example, FD5 violates BCNF in LOTS1A because AREA is not a superkey of LOTS1A.
• FD5 satisfies 3NF in LOTS1A because County_name is a prime attribute (condition b), but this
condition does not exist in the definition of BCNF.
• We can decompose LOTS1A into two BCNF relations LOTS1AX and LOTS1AY. This
decomposition loses the functional dependency FD2 because its attributes no longer coexist in
the same relation after decomposition.

• In practice, most relation schemas that are in 3NF are also in BCNF.
• Only if X A holds in a relation schema R with X not being a superkey and A being a prime
attribute will R be in 3NF but not in BCNF.
Example: consider the relation TEACH with the following dependencies:

Page 26
MODULE –3 NOTES DBMS- BCS403

TEACH(Student, Course, Instructor)


FD1: {Student, Course} → Instructor -- (Each student takes a course with one instructor.)
FD2: Instructor → Course -- means that each instructor teaches one course
{Student, Course} is a candidate key for this relation.
The dependencies shown follow the pattern in Figure below with Student as A, Course as B, and
Instructor as C
So this relation is in 3NF but not in BCNF

Instructor → Course.
Instructor is not a superkey, so this FD violates BCNF.

If an instructor teaches only one course, then repeating their name in multiple rows with the same course
(but different students) is redundant.
Decomposition of Relations not in BCNF
We split TEACH into two tables:
1. TEACH1(Instructor, Course)
(Removes redundancy: each instructor appears only once with the course.)
2. TEACH2(Student, Instructor) (Connects students to instructors.)

Now both tables are in BCNF:


• In TEACH1, Instructor is the key.
• In TEACH2, {Student, Instructor} is the key and there are no violating FDs.

Now we test - Can we get back the original TEACH(Student, Course, Instructor) relation by joining
TEACH1 and TEACH2?
TEACH1 ⨝ TEACH2
-- join on common attribute: Instructor

Page 27
MODULE –3 NOTES DBMS- BCS403

• Rather, the process of normalization through decomposition must also confirm the existence of
additional properties that the relational schemas, taken together, should possess. These would
include two properties:
➢ The nonadditive join or lossless join property, which guarantees that the spurious tuple
generation problem does not occur with respect to the relation schemas created after
decomposition.
➢ The dependency preservation property, which ensures that each functional dependency is
represented in some individual relation resulting after decomposition.

Definition: A decomposition of a relation R into sub-relations R1, R2, ..., Rn is said to have the lossless
join (nonadditive) property if the natural join of all these sub-relations results in exactly the original
relation R, without introducing any spurious tuples.

If R1 ∩ R2 → R1 - R2 or R1 ∩ R2 → R2 - R1 is in F+, then the decomposition is lossless.

• R1 = TEACH1(Instructor, Course)
• R2 = TEACH2(Instructor, Student)
• Common attribute = Instructor

14.4.3 Multivalued Dependency and Fourth Normal Form


A relation will be in 4NF if it is in
--- Boyce Codd Normal Form
--- and has no –multi valued dependency.
A multivalued dependency exists in a relation when:
• One attribute (or set of attributes) independently determines multiple values of two other
attributes.
• These attributes are not functionally dependent on each other.
Three conditions for multi –valued dependency are
--- If A →→ B , for a single value of A, more than one value of B exists.
--- Relation should have at –least 3 columns.
--- For relation R(A,B,C) , B and C should be independent.

Page 28
MODULE –3 NOTES DBMS- BCS403

If all these cases satisfy, then we can say that the relation may have multi –valued dependency.

For example, consider the relation EMP shown in Figure below:

EMP(Ename, Pname, Dname)


Employee Ename works on Pname and has dependent Dname.

• Smith works on both X and Y.


• Smith has dependents John and Anna.
• Every project is repeated for each dependent and vice versa.
• This is not due to a functional dependency, but because of a multivalued dependency:
• Ename →→ Pname
• Ename →→ Dname
BCNF (Boyce-Codd Normal Form) only handles functional dependencies (FDs). But in this case:
• There are no functional dependencies.
• Still, redundancy exists — so BCNF fails to remove it.
Thus, we need something stronger: → Fourth Normal Form (4NF).

From EMP(Ename, Pname, Dname), we observe:


• Ename →→ Pname
• Ename →→ Dname
This means:
• Projects and Dependents are independent of each other but are tied to Ename.

Page 29
MODULE –3 NOTES DBMS- BCS403

A multivalued dependency is non-trivial if:


• Y is not a subset of X
• X ∪ Y ≠ entire schema
Here:
• Ename →→ Pname is non-trivial
• Ename →→ Dname is also non-trivial
So, EMP violates 4NF.

Apply 4NF Rule


If X →→ Y is a non-trivial MVD, then X must be a superkey.
In EMP:
• X = Ename
• But key of EMP is (Ename, Pname, Dname)
• So Ename is not a superkey → violates 4NF

Decompose EMP into:


1. EMP_PROJECTS(Ename, Pname)
Represents: Projects worked on by employees
2. EMP_DEPENDENTS(Ename, Dname)
Represents: Dependents of employees

Each table now:


• Contains only one MVD
• That MVD becomes trivial (i.e., the key includes both attributes)
• So, both are in 4NF.

Page 30
MODULE –3 NOTES DBMS- BCS403

14.4.4 Join Dependencies and Fifth Normal Form


We have normalized up to:
• 1NF: No repeating groups
• 2NF: No partial dependencies
• 3NF: No transitive dependencies
• BCNF: Left side of every functional dependency is a superkey.
• 4NF: No multivalued dependencies (MVDs)
But even after 4NF, redundancy may still exist due to join dependencies (JDs) that cannot be
captured by functional or multivalued dependencies.
A Join Dependency means:
A relation R can be reconstructed by joining its n projections (subsets of attributes), without losing or
gaining any data.
JD(R1, R2, ..., Rn) means:
πR1(R) ⋈ πR2(R) ⋈ ... ⋈ πRn(R) = R
A relation is in 5NF (also called Project-Join Normal Form, PJNF) if:
For every non-trivial JD(R1, R2, ..., Rn) that holds, each Ri is a superkey of R.

The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

Decomposing the relation SUPPLY into the 5NF relations R1, R2, R3.

Page 31
MODULE –3 NOTES DBMS- BCS403

CHAPTER 2
SQL
Introduction
• SQL was called SEQUEL (Structured English Query Language) and was designed and
implemented at IBM Research.The SQL language may be considered one of the major reasons
for the commercial success of relational databases. SQL is a comprehensive database language.
• It has statements for data definitions, queries, and updates. Hence, it is both a DDL and a DML.
• In addition, it has facilities for defining views on the database, for specifying security and
authorization, for defining integrity constraints, and for specifying transaction controls. It also
has rules for embedding SQL statements into a general-purpose programming language such as
Java, COBOL, or C/C++.

6.1 SQL Data Definition and Data Types


SQL uses the terms table, row, and column for the formal relational model terms relation, tuple, and
attribute, respectively. The main SQL command for data definition is the CREATE statement, which
can be used to create schemas, tables (relations), domains, views, assertions and triggers.

6.1.1 Schema and Catalog Concepts in SQL

• An SQL schema is identified by a schema name, and includes an authorization identifier to


indicate the user or account who owns the schema, as well as descriptors for each element in the
schema. Schema elements include tables, constraints, views, domains, and other constructs (such
as authorization grants) that describe the schema. A schema is created via the CREATE
SCHEMA statement .

• For example, the following statement creates a schema called COMPANY, owned by the

CREATE SCHEMA COMPANY AUTHORIZATION ’Jsmith’;

• In general, not all users are authorized to create schemas and schema elements. The privilege to
create schemas, tables, and other constructs must be explicitly granted to the relevant user
accounts by the system administrator or DBA.

Page 32
MODULE –3 NOTES DBMS- BCS403

• SQL uses the concept of a catalog ---a named collection of schemas in an SQL environment.
A catalog always contains a special schema called INFORMATION_SCHEMA, which
provides information on all the schemas in the catalog and all the element descriptors in these
schemas. Integrity constraints such as referential integrity can be defined between relations only
if they exist in schemas within the same catalog. Schemas within the same catalog can also share
certain elements, such as domain definitions.

6.1.2 The CREATE TABLE Command in SQL

• The CREATE TABLE command is used to specify a new relation by giving it a name and
specifying its attributes and initial constraints. The attributes are specified first, and each attribute is
given a name, a data type to specify its domain of values, and any attribute constraints, such as
NOT NULL.

• The key, entity integrity, and referential integrity constraints can be specified within the CREATE
TABLE statement after the attributes are declared, or they can be added later using the ALTER
TABLE command.

• Typically, the SQL schema in which the relations are declared is implicitly specified in the
environment in which the CREATE TABLE statements are executed. Alternatively, we can
explicitly attach the schema name to the relation name, separated by a period. For example, by
writing

CREATE TABLE COMPANY.EMPLOYEE ...

rather than

CREATE TABLE EMPLOYEE ...

Page 33
MODULE –3 NOTES DBMS- BCS403

The relations declared through CREATE TABLE statements are called base tables.

Examples:

Page 34
MODULE –3 NOTES DBMS- BCS403

6.1.3 Attribute Data Types and Domains in SQL

Basic data types

1. Numeric data types includes


• integer numbers of various sizes (INTEGER or INT, and SMALLINT)
• floating-point (real) numbers of various precision (FLOAT or REAL, and DOUBLE
PRECISION).
• Formatted numbers can be declared by using DECIMAL(i,j) or DEC(i,j) or NUMERIC(i,j)
where
i - precision, total number of decimal digits
j - scale, number of digits after the decimal point

2. Character-string data types


• fixed length CHAR(n) or CHARACTER(n), where n is the number of characters
• varying length VARCHAR(n) or CHAR VARYING(n) or CHARACTER VARYING(n), where
n is the maximum number of characters
• When specifying a literal string value, it is placed between single quotation marks (apostrophes),
and it is case sensitive
• For fixed length strings, a shorter string is padded with blank characters to the right
• Padded blanks are generally ignored when strings are compared.
• Another variable-length string data type called CHARACTER LARGE OBJECT or CLOB is
also available to specify columns that have large text values, such as documents
• The CLOB maximum length can be specified in kilobytes (K), megabytes (M), or gigabytes (G)
• For example, CLOB(20M) specifies a maximum length of 20 megabytes.

3. Bit-string data types are either of


• fixed length n BIT(n) or varying length BIT VARYING(n), where n is the maximum number of
bits.
• The default for n, the length of a character string or bit string, is 1.
• Literal bit strings are placed between single quotes but preceded by a B to distinguish them from
character strings for example , B’10101’.
• Another variable-length bitstring data type called BINARY LARGE OBJECT or BLOB is also
available to specify columns that have large binary values, such as images.
• The maximum length of a BLOB can be specified in kilobits (K), megabits (M), or gigabits (G)
• For example, BLOB(30G) specifies a maximum length of 30 gigabits.

4. A Boolean data type has the traditional values of TRUE or FALSE. In SQL, because of the presence
of NULL values, a three-valued logic is used, so a third possible value for a Boolean data type is
UNKNOWN.

5. The DATE data type has ten positions, and its components are YEAR, MONTH, and DAY in
the form YYYY-MM-DD.

Page 35
MODULE –3 NOTES DBMS- BCS403

6. The TIME data type has at least eight positions, with the components HOUR, MINUTE, and
SECOND in the form HH:MM:SS.

Only valid dates and times should be allowed by the SQL implementation.

7. TIME WITH TIME ZONE data type includes an additional six positions for specifying the
displacement from the standard universal time zone, which is in the range +13:00 to 12:59 in units of
HOURS:MINUTES. If WITH TIME ZONE is not included, the default is the local time zone for the
SQL session.

Additional data types

1. Timestamp data type (TIMESTAMP) includes the DATE and TIME fields, plus a minimum of six
positions for decimal fractions of seconds and an optional WITH TIME ZONE qualifier.

2. INTERVAL data type. This specifies an interval a relative value that can be used to increment or
decrement an absolute value of a date, time, or timestamp. Intervals are qualified to be either
YEAR/MONTH intervals or DAY/TIME intervals.

It is possible to specify the data type of each attribute directly or a domain can be declared, and the
domain name used with the attribute Specification. This makes it easier to change the data type for a
domain that is used by numerous attributes in a schema, and improves schema readability. For example,
we can create a domain SSN_TYPE by the following statement:

CREATE DOMAIN SSN_TYPE AS CHAR(9);

We can use SSN_TYPE in place of CHAR(9) for the attributes Ssn and Super_ssn ofEMPLOYEE,
Mgr_ssn of DEPARTMENT, Essn of WORKS_ON, and Essn of DEPENDENT.

6.2 Specifying Constraints in SQL

Basic constraints that can be specified in SQL as part of table creation:


• key and referential integrity constraints
• Restrictions on attribute domains and NULLs
• constraints on individual tuples within a relation.

6.2.1 Specifying Attribute Constraints and Attribute Defaults

• Because SQL allows NULLs as attribute values, a constraint NOT NULL may be specified if
NULL is not permitted for a particular attribute. This is always implicitly specified for the
attributes that are part of the primary key of each relation, but it can be specified for any other
attributes whose values are required not to be NULL.

Page 36
MODULE –3 NOTES DBMS- BCS403

• It is also possible to define a default value for an attribute by appending the clause DEFAULT
<value> to an attribute definition. The default value is included in any new tuple if an explicit
value is not provided for that attribute.

CREATE TABLE DEPARTMENT


(...,

Mgr_ssn CHAR(9) NOT NULL DEFAULT ‘888665555’

-----------------
-------------------
)

Another type of constraint can restrict attribute or domain values using the CHECK clause following an
attribute or domain definition . For example, suppose that department numbers are restricted to integer
numbers between 1 and 20; then, we can change the attribute declaration of Dnumber in the
DEPARTMENT table to the following:

Dnumber INT NOT NULL CHECK (Dnumber > 0 AND Dnumber < 21);

The CHECK clause can also be used in conjunction with the CREATE DOMAIN statement. For
example, we can write the following statement:

CREATE DOMAIN D_NUM AS INTEGER


CHECK (D_NUM > 0 AND D_NUM < 21);

We can then use the created domain D_NUM as the attribute type for all attributes that refer to
department number such as Dnumber of DEPARTMENT, Dnum of PROJECT, Dno of EMPLOYEE,
and so on.

6.2.2 Specifying Key and Referential Integrity Constraints

The PRIMARY KEY clause specifies one or more attributes that make up the primary key of a relation.
If a primary key has a single attribute, the clause can follow the attribute directly. For example, the
primary key of DEPARTMENT can be specified as:
Dnumber INT PRIMARY KEY;

The UNIQUE clause can also be specified directly for a secondary key if the secondary key is a
single attribute, as in the following example:
Dname VARCHAR(15) UNIQUE;

Referential integrity is specified via the FOREIGN KEY clause


FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn),
FOREIGN KEY (Dno) REFERENCES DEPARTMENT(Dnumber)

Page 37
MODULE –3 NOTES DBMS- BCS403

A referential integrity constraint can be violated when tuples are inserted or deleted, or when a foreign
key or primary key attribute value is modified. The default action that SQL takes for an integrity
violation is to reject the update operation that will cause a violation, which is known as the RESTRICT
option.

The schema designer can specify an alternative action to be taken by attaching a referential triggered
action clause to any foreign key constraint. The options include SET NULL, CASCADE, and SET
DEFAULT.

An option must be qualified with either ON DELETE or ON UPDATE

FOREIGN KEY(Dno) REFERENCES DEPARTMENT(Dnumber) ON DELETE SET DEFAULT


ON UPDATE CASCADE

FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn) ON DELETE SET NULL ON


UPDATE CASCADE

FOREIGN KEY (Dnumber) REFERENCES DEPARTMENT(Dnumber) ON DELETE CASCADE


ON UPDATE CASCADE

In general, the action taken by the DBMS for SET NULL or SET DEFAULT is the same for both ON
DELETE and ON UPDATE: The value of the affected referencing attributes is changed to NULL for
SET NULL and to the specified default value of the referencing attribute for SET DEFAULT.

The action for CASCADE ON DELETE is to delete all the referencing tuples whereas the action for
CASCADE ON UPDATE is to change the value of the referencing foreign key attribute(s) to the
updated (new) primary key value for all the referencing tuples .

It is the responsibility of the database designer to choose the appropriate action and to specify it in the
database schema. As a general rule, the CASCADE option is suitable for “relationship” relations, such
as WORKS_ON;

6.2.3 Giving Names to Constraints


The names of all constraints within a particular schema must be unique. A constraint name is used to
identify a particular constraint in case the constraint must be dropped later and replaced with another
constraint.

6.2.4 Specifying Constraints on Tuples Using CHECK


In addition to key and referential integrity constraints, which are specified by special keywords, other
table constraints can be specified through additional CHECK clauses at the end of a CREATE TABLE
statement. These can be called tuple-based constraints because they apply to each tuple individually and
are checked whenever a tuple is inserted or modified

Page 38
MODULE –3 NOTES DBMS- BCS403

For example, suppose that the DEPARTMENT table had an additional attribute Dept_create_date,
which stores the date when the department was created. Then we could add the following CHECK
clause at the end of the CREATE TABLE statement for the DEPARTMENT table to make sure
manager’s start date is later than the department creation date.

CHECK (Dept_create_date <= Mgr_start_date);

6.3 Basic Retrieval Queries in SQL

SQL has one basic statement for retrieving information from a database: the SELECT statement.

6.3.1 The SELECT-FROM-WHERE Structure of Basic SQL Queries

The basic form of the SELECT statement, sometimes called a mapping or a select-from-where block, is
formed of the three clauses SELECT, FROM, and WHERE and has the following form:

SELECT <attribute list>


FROM <table list>
WHERE <condition>;

where ,

<attribute list> is a list of attribute names whose values are to be retrieved by the query
<table list> is a list of the relation names required to process the query.
<condition> is a conditional (Boolean) expression that identifies the tuples to be retrieved by the query

The SELECT clause of SQL specifies the attributes whose values are to be retrieved, which are called
the projection attributes.
The WHERE clause specifies the Boolean condition that must be true for any retrieved tuple, which is
known as the selection condition.

Page 39
MODULE –3 NOTES DBMS- BCS403

In the WHERE clause selection condition that chooses the particular tuple of interest in the
DEPARTMENT table, because Dname is an attribute of DEPARTMENT. The condition Dnumber =
Dno is called a join condition, because it combines two tuples: one from DEPARTMENT and one from
EMPLOYEE, whenever the value of Dnumber in DEPARTMENT is equal to the value of Dno in
EMPLOYEE.A query that involves only selection and join conditions plus projection attributes is
known as a select-project-join query.

6.3.2 Ambiguous Attribute Names, Aliasing, Renaming, and Tuple Variables

In SQL, the same name can be used for two or more attributes as long as the attributes are indifferent
relations. If this is the case, and a multitable query refers to two or more attributes with the same name,
we must qualify the attribute name with the relation name to prevent ambiguity.

This is done by prefixing the relation name to the attribute name and separating the two by a period.

Page 40
MODULE –3 NOTES DBMS- BCS403

• Example: Retrieve the name and address of all employees who work for the ‘Research’
department.

SELECT Fname, EMPLOYEE.Name, Address


FROM EMPLOYEE, DEPARTMENT
WHERE DEPARTMENT.Name=‘Research’ AND
DEPARTMENT.Dnumber = EMPLOYEE.Dnumber;

The ambiguity of attribute names also arises in the case of queries that refer to the same relation twice.

In this case, we are required to declare alternative relation names E and S, called aliases or tuple
variables, for the EMPLOYEE relation. An alias can follow the key word AS, as shown in Q8, or it can
directly follow the relation name—for example, by writing EMPLOYEE E, EMPLOYEE S in the
FROM clause of Q8.

It is also possible to rename the relation attributes within the query in SQL by giving them aliases. For
example, if we write

EMPLOYEE AS E(Fn, Mi, Ln, Ssn, Bd, Addr, Sex, Sal, Sssn, Dno)

in the FROM clause, Fn becomes an alias for Fname, Mi for Minit, Ln for Lname, and so on.

6.3.3 Unspecified WHERE Clause and Use of the Asterisk

A missing WHERE clause indicates no condition on tuple selection; hence, all tuples of the relation
specified in the FROM clause qualify and are selected for the query result. If more than one relation is
specified in the FROM clause and there is no WHERE clause, then the CROSS PRODUCT all possible
tuple combinations of these relations is selected.

Page 41
MODULE –3 NOTES DBMS- BCS403

To retrieve all the attribute values of the selected tuples, we do not have to list the attribute names
explicitly in SQL; we just specify an asterisk (*), which stands for all the attributes. For example, the
following query retrieves all the attribute values of any EMPLOYEE who works in DEPARTMENT
number 5.

6.3.4 Tables as Sets in SQL

SQL usually treats a table not as a set but rather as a multiset; duplicate tuples can appear more than
once in a table, and in the result of a query. SQL does not automatically eliminate duplicate tuples in the
results of queries, for the following reasons:

• Duplicate elimination is an expensive operation. One way to implement it is to sort the tuples
first and then eliminate duplicates.
• The user may want to see duplicate tuples in the result of a query.
• When an aggregate function is applied to tuples, in most cases we do not want to eliminate
duplicates.

If we do want to eliminate duplicate tuples from the result of an SQL query, we use the keyword
DISTINCT in the SELECT clause, meaning that only distinct tuples should remain in the result.

Example : Retrieve the salary of every employee and all distinct salary values

(a) SELECT ALL Salary FROM EMPLOYEE;


(b) SELECT DISTINCT Salary FROM EMPLOYEE;

Page 42
MODULE –3 NOTES DBMS- BCS403

SQL has directly incorporated some of the set operations from mathematical set theory, which are also
part of relational algebra. There are
▪ set union (UNION)
▪ set difference (EXCEPT) and
▪ set intersection (INTERSECT)

The relations resulting from these set operations are sets of tuples; that is, duplicate tuples are eliminated
from the result. These set operations apply only to union-compatible relations, so we must make sure
that the two relations on which we apply the operation have the same attributes and that the attributes
appear in the same order in both relations.

6.3.5 Substring Pattern Matching and Arithmetic Operators

Several more features of SQL

The first feature allows comparison conditions on only parts of a character string, using the LIKE
comparison operator. This can be used for string pattern matching. Partial strings are specified using two
reserved characters:
▪ % replaces an arbitrary number of zero or more characters
▪ _ (underscore) replaces a single character

For example, consider the following query: Retrieve all employees whose address is in Houston, Texas.

Page 43
MODULE –3 NOTES DBMS- BCS403

If an underscore or % is needed as a literal character in the string, the character should be preceded by
an escape character, which is specified after the string using the keyword ESCAPE.

For example, ‘AB\_CD\%EF’ ESCAPE ‘\’ represents the literal string ‘AB_CD%EF’ because \ is
specified as the escape character. Any character not used in the string can be chosen as the escape
character.
Also, we need a rule to specify apostrophes or single quotation marks (‘ ’) if they are to be included in a
string because they are used to begin and end strings. If an apostrophe (’) is needed, it is represented as
two consecutive apostrophes (”) so that it will not be interpreted as ending the string.

Another feature allows the use of arithmetic in queries. The standard arithmetic operators for addition
(+), subtraction (−), multiplication (*), and division (/) can be applied to numeric values or attributes
with numeric domains.

For example, suppose that we want to see the effect of giving all employees who work on the
‘ProductX’ project a 10% raise; we can issue Query 13 to see what their salaries would become. This
example also shows how we can rename an attribute in the query result using AS in the SELECT clause.

Page 44
MODULE –3 NOTES DBMS- BCS403

6.3.6 Ordering of Query Results

SQL allows the user to order the tuples in the result of a query by the values of one or more of the
attributes that appear in the query result, by using the ORDER BY clause.

Example: Retrieve a list of employees and the projects they are working on, ordered by department and,
within each department, ordered alphabetically bylast name, then first name.

SELECT D.Dname, E.Lname, E.Fname, P.Pname


FROM DEPARTMENT D, EMPLOYEE E, WORKS_ON W, PROJECT P
WHERE D.Dnumber= E.Dno AND E.Ssn= W.Essn AND W.Pno= P.Pnumber
ORDER BY D.Dname, E.Lname, E.Fname;

The default order is in ascending order of values.We can specify the keyword DESC if we want to see
the result in a descending order of values. The keyword ASC can be used to specify ascending order
explicitly.
For example, if we want descending alphabetical order on Dname and ascending order on Lname,
Fname, the ORDER BY clause can be written as

ORDER BY D.Dname DESC, E.Lname ASC, E.Fname ASC

6.3 INSERT, DELETE, and UPDATE Statements in SQL

6.3.1 The INSERT Command


INSERT is used to add a single tuple to a relation. We must specify the relation name and a list of values
for the tuple. The values should be listed in the same order in which the corresponding attributes were
specified in the CREATE TABLE command.

A second form of the INSERT statement allows the user to specify explicit attribute names that
correspond to the values provided in the INSERT command. The values must include all attributes with
NOT NULL specification and no default value. Attributes with NULL allowed or DEFAULT values are
the ones that can be left out.

Page 45
MODULE –3 NOTES DBMS- BCS403

A variation of the INSERT command inserts multiple tuples into a relation in conjunction with creating
the relation and loading it with the result of a query. For example, to create a temporary table that has
the employee last name, project name, and hours per week for each employee working on a project, we
can write the statements in U3A and U3B:

A table WORKS_ON_INFO is created by U3A and is loaded with the joined information retrieved from
the database by the query in U3B. We can now query WORKS_ON_INFO as we would any other
relation;

6.4.2 The DELETE Command

The DELETE command removes tuples from a relation. It includes a WHERE clause, similar to that
used in an SQL query, to select the tuples to be deleted. Tuples are explicitly deleted from only one table
at a time. The deletion may propagate to tuples in other relations if referential triggered actions are
specified in the referential integrity constraints of the DDL.

Depending on the number of tuples selected by the condition in the WHERE clause, zero, one, or several
tuples can be deleted by a single DELETE command. A missing WHERE clause specifies that all tuples
in the relation are to be deleted; however, the table remains in the database as an empty table.

Page 46
MODULE –3 NOTES DBMS- BCS403

6.4.3 The UPDATE Command

The UPDATE command is used to modify attribute values of one or more selected Tuples. An
additional SET clause in the UPDATE command specifies the attributes to be modified and their new
values. For example, to change the location and controlling department number of project number 10 to
‘Bellaire’ and 5, respectively, we use U5:

As in the DELETE command, a WHERE clause in the UPDATE command selects the tuples to be
modified from a single relation. However, updating a primary key value may propagate to the foreign
key values of tuples in other relations if such a referential triggered action is specified in the referential
integrity constraints of the DDL.

Several tuples can be modified with a single UPDATE command. An example is to give all employees
in the ‘Research’ department a 10% raise in salary, as shown in U6.

Each UPDATE command explicitly refers to a single relation only. To modify multiple relations, we
must issue several UPDATE commands.

6.4 Additional Features of SQL

▪ SQL has various techniques for specifying complex retrieval queries, including nested queries,
aggregate functions, grouping, joined tables, outer joins, and recursive queries; SQL views, triggers,
and assertions; and commands for schema modification.
▪ SQL has various techniques for writing programs in various programming languages that include
SQL statements to access one or more databases.
▪ SQL has transaction control commands. These are used to specify units of database processing for
concurrency control and recovery purposes. SQL has language constructs for specifying the
granting and revoking of privileges to users.
▪ SQL has language constructs for creating triggers. These are generally referred to as active database
techniques, since they specify actions that are automatically triggered by events such as database
updates.
▪ SQL has incorporated many features from object-oriented models to have more powerful
capabilities, leading to enhanced relational systems known as object-relational.

Page 47

You might also like