0% found this document useful (0 votes)
24 views271 pages

Database Module

This document provides an introduction to database systems, covering key concepts such as the definition of databases, the differences between file-based and database systems, and the roles of Database Management Systems (DBMS). It outlines the characteristics and benefits of database approaches, including data redundancy control, data sharing, integrity enforcement, and security measures. Additionally, the document discusses data models, schemas, and instances, emphasizing the importance of data abstraction and the various levels of data models.

Uploaded by

Azemeraw Amare
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
24 views271 pages

Database Module

This document provides an introduction to database systems, covering key concepts such as the definition of databases, the differences between file-based and database systems, and the roles of Database Management Systems (DBMS). It outlines the characteristics and benefits of database approaches, including data redundancy control, data sharing, integrity enforcement, and security measures. Additionally, the document discusses data models, schemas, and instances, emphasizing the importance of data abstraction and the various levels of data models.

Uploaded by

Azemeraw Amare
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 271

Unit one

Introduction to Database System


Objectives

After going through this UNIT, you will be in a position to:

 Explain the concept of database


 Appreciate the features of database management system (DBMS)
 Differentiate between file system and database system
 Know different types of data models
 Explain the structure of database
 Know the architecture of database system
 Discuss who the user of a database are
 Discuss different types of database models
 Explain database schemas and instances

Structures

Introduction to database system

Database systems concepts

Data models, schemas and instances

Database architecture and database users

1
Introduction

All of us are familiar with the term data. In fact, unknowingly we come across data in our day to
day life every day. The age of a person, price of potato, number of students in a school, pin code
of a city, etc. are some examples of data. In our life we have to remember so much of data. But it
is easier for us to remember all information for a few individuals. For example, you may be in a
position to tell accurately the age, height, complexion, income, educational qualification,
residential address, etc. of your close friends. But it is too difficult for you to memorize all these
information for a large number of individuals. Let us consider the example of National Open
School (NOS). Every year about one lakh students take admission in NOS. If you are asked to
memorize records of date of birth, subjects offered and postal address of all these students, it will
not be possible for you.

To deal with such problems we construct a database. We arrange all information about students
in a tabular form. We keep all the records so that if I am asked, „How many students are there in
Economics?‟ I am in a position to answer.

Section 1: Database Systems Concepts

Section objectives

At the end of this section, you will find yourself being able to:

 Define what a file system is


 Define what a database system is
 Discuss the demerits of file based system
 Describe the advantages of database system
 Understand the characteristics of Database System
 Know what a DBMS is

1.1. File-based system and Database System

There are two approaches to data storage: File-based and Database

2
a) File-based Approach:

File system was used before the advent of Database systems. One way to keep the information
on a computer is to store it in the permanent files. The system has a number of application
programs; each of them is defined to manipulate the data files. These application programs have
been written on request of the users in the organization. New application will be added to the
system as the need arises. The system just described is called the file-based system.

Consider a traditional banking system which uses the file-based system in managing the
organization‟s data in the picture below. As we can see, there are different departments in the
Bank, each of them has its own applications which manage and manipulate different data files.
For banking system, the programs can be the one to debit or credit an account, find the balance
of an account, add a new mortgage loan or generate monthly statements etc.

Figure 1.1: File-based approach for banking system

Keeping organizational information in this approach has a number of disadvantages, including:

 Data Redundancy: Since files and applications are created by different programmer of
various departments over long period of time, it might lead to several problems:
 Inconsistency in data format: Updates to files may result in inconsistent data across the
organization. For example, if an accounting application updates a customer name
without notifying other application areas that also maintain customer name, customer
name will be stored differently for different applications. The same information may be

3
kept in several different places (files). Data inconsistency which means various copies of
the same data is conflicting; waste storage space and duplication of effort.
 Data Isolation: A file-based data storage approach makes it difficult for other
applications to access data not owned by their application. Data owned by other
applications may be stored in a format not consistent with the retrieval capabilities of
another application. It is difficult for new applications to retrieve the appropriate data
which might be stored in various files.
 Integrity problems: Data values must satisfy certain consistency constraints which are
specified in the application programs. File-based approaches to data storage are tied to
applications rather than the entities to which the files refer. File-based approaches do
not recognize relationships between entities until such information is needed by an
application. It is difficult to add or change the programs to enforce new constraints.
 Security problems: There are constraints regarding access privileges and application is
added to the system in the ad-hoc manner so it is difficult to enforce those constraints.
 Concurrent – access anomalies: Data may be accessed by many applications that have
not been coordinated previously so it is not easy to provide a strategy to support multiple
users to update data simultaneously.

These difficulties have prompted the development of a new approach in managing large amount
of organizational information – database approach. In the following section, we shall see the
concepts that have been introduced to get over the problems mentioned.

b) Database Approach

Database approaches to data storage support the sharing of data across multiple applications with
multiple users. Databases are structured in a way that is meaningful to an organization. For
example, if an organization maintains information on suppliers and the geographic areas they
service, there would be a link in the database between the suppliers and geographic areas.
Databases reduce data redundancy.

A Database Management System (DBMS) is the software that handles all database accesses.
A DBMS presents a logical view of the data to the users. How this data is stored and retrieved is
hidden from the users. A DBMS ensures that the data is consistent across the database and
controls who can access what data.

4
Database and database technology play an important role in most of social areas where
computers are used, including business, education, medicine etc. To understand the fundamental
of database system, we start from introducing the basic concepts in this area.

1.2 Definition of Database System and DBMS

Database is a shared collection of related data which will be used to support the activities of
particular organization. Database can be viewed as a repository of data that is defined once and
then is accessed by various users. A database has the following properties:

It is a representation of some aspect of the real world; or perhaps, a collection of data elements
(facts) representing real-world information. Database is logical coherent and internally
consistent.

Database is designed, built, and populated with data for a specific purpose.

Database Management System (DBMS) is a collection of programs that enable users to create,
maintain database and control all the access to the database. The primary goal of the DBMS is to
provide an environment that is both convenient and efficient for users to retrieve and store
information.

Application programs accesses the data stored in the database by sending requests to the DBMS.
Some examples of DBMS software are MS-SQL, MS-Access, Oracle, Informix, etc.

5
Figure 1.2: The components of a database system

With the database approach, we can have the traditional banking system as shown in the
following picture.

Figure 1.3: Database approach for banking system

1.3 Characteristics of Database approach

There are a number of characteristics that distinguish the database approach from the file-based
approach. In this section, we describe in detail some of those important characteristics.

6
1. Self-Describing Nature of a Database System: Database System contains not only the
database itself but also the descriptions of data structure and constraints (meta-data).
This information is used by the DBMS software or database users if needed. This
separation makes database system totally different from traditional file-based system in
which data definition is a part of application programs.
2. Insulation between Program and Data: In the filed-based system, the structure of the
data files is defined in the application programs so if user wants to change the structure
of a file, all the programs access to that files might need to be changed. On the other
hand, in database approach, data structure is stored in the system catalog not in the
programs so such changes might not occur.
3. Support multiple views of data: A view is a subset of the database which is defined
and dedicated for particular users of the system. Multiple users in the system might have
different views of the system. Each view might contain only the interested data of a user
or a group of user.
4. Sharing of data and Multiuser system: A multiuser database system must allow
multiple users access the database at the same time. As the result, the multiuser DBMS
must have concurrency control strategies to ensure that several user try to access the
same data item at a time do so in the manner so that the data always be correct.

1.4 Benefits of Database Approach

1. Controlling Data Redundancy

Users of traditional file processing systems, each had a “copy” of relevant data, causing

 Duplication of effort
 Wastage of storage space
 Inconsistent data

Stud-No Name Degree Subject Grade


90 Smith BA COMP182 A
87 Brown BA COMP182 A
98 James BSc COMP181 B
90 Smith BIT COMP181 B

7
Fig. a file consisting of students‟ list in the registrar office

Stud-No Name Degree Finance-Type

90 Smith BIT Self

87 Brown BA UQ Scholar

98 Harrison BSc Self

Fig. a file consisting of students‟ list in the finance office

In the Database approach, ideally each data item is stored in only one place in the database.
However, in some cases, redundancy still exists to improving system performance, but such
redundancy is controlled and kept to minimum.

2. Data Sharing

The integration of the whole data in an organization leads to the ability to produce more
information from a given amount of data

3. Enforcing Integrity Constraints

DBMSs should provide capabilities to define and enforce certain constraints such as data type,
data uniqueness and other constraints.

The DBMS has the capability to define and enforce integrity constraints which are restrictions
placed on the data, based on the semantics or meaning of the data. Examples of constraints are
just like the following:

 Every Subject must have a unique Code


 A student cannot have 2 different grades for the same subject
 A student cannot enroll in more than four 12-credit subjects in a semester
 Student No must be a 9 digit integer
 DBMS cannot check spelling or typing errors, for example, if C was entered
as the grade of a student getting A, the DBMS will not identify the error!

8
4. Restricting Unauthorized Access

Different user groups may have different access privileges (Create/Alter, Update, and Retrieve),
which are controlled through DBMS security sub-system, through the use of Accounts &
Passwords called user accounts. For example,

 Casual users may not have access to confidential data, e.g. medical records, salary
packages, police reports
 Parametric users may be given update access, but are generally not allowed to change
the structure of data
 Database administrators (DBAs) generally have highest privileges, create user accounts
and enforce restrictions

5. Data Independence

The system data descriptions are separated from the application programs. Changes to the data
structure is handled by the DBMS and not embedded in the program.

6. Transaction Processing

The DBMS must include concurrency control subsystem to ensure that several users trying to
update the same data do so in a controlled manner so that the result of the updates is correct.

7. Providing multiple views of data

A view may be a subset of the database. Various users may have different views of the database
itself. Users may not need to be aware of how and where the data they refer to is stored.

8. Providing backup and recovery facilities

DBMS provides facility to recover from hardware and software failures through its backup and
recovery sub-system. If the computer system fails in the middle of a complex update program,
the recovery subsystem is responsible for making sure that the database is restored to the stage it
was in before the program started executing.

9
Section 2: Data Models, Schemas and Instances

Section objectives

At the end of this section, learners should be able to:

 Know what a data model is


 Describe different types of data models
 Discuss advantages and disadvantages of each data model
 Define and understand database schema and database instances

2.1 Data Model

Data model is a collection of concepts that can be used to describe the structure of database.
Structure of database means data types, relationships and constraints. In addition, most data
model includes a set of basic operations for specifying retrievals and modifications on the
database. Data Model provides a means to achieve Data Abstraction. Data Abstraction refers to
the hiding of certain details of how the data are stored and maintained. With several levels of
abstraction, the user‟s view of the database is simplified and this leads to the improved
understanding of data.

There are three levels of abstractions:

1. View level: The highest level of abstraction describes only part of the entire database.
Many users will not be concerned with the large database. Instead, they need to access
only a part of it so that view level abstraction is defined. There are many views for the
same database. View – corresponds to the third view of data: What part of the data is
seen by a specific application
2. Logical/conceptual level: This level describes what data are stored in the whole
database. Conceptual – corresponds to the second view of data: What we want the data
to express and what relationships between data we must express, what “ story” data tells,
are all data necessary for the “story‟ are discussed.
3. Physical level: The lowest level of abstraction describes how the data are actually stored.
Physical – corresponds to the first view of data: How data is stored, how is it accessed,

10
how data is modified, is data ordered, how data is allocated to computer memory and/or
peripheral devices, how data items are actually represented (ASCI, EBCDIC, …)

2.2 Categories of Data Model

1. High-level Conceptual Data models: Provide concepts that are close to the way people
perceive data to present the data. Typical example of this type is entity – relationship model
which uses main concepts like entities, attributes, and relationships. An entity represents
real-world objects such as an employee, a project. An entity has some attributes which
represents properties of entity such as employee‟s name, address, birth-date. A relationship
represents association among entities for example a works on relationship between employee
and project.
2. Record-based Logical Data models: Provide concepts that can be understood by the user
but not too far from the way data is stored in the computer. Three well-known data models
of this type are relational data model, network data model and hierarchical data model.

a) The Relational data model: Represents data as relations. Here is an example of


relational schema for the SUPERMARKET database

EMPS (ENAME, SALARY) MANAGERS (ENAME)

DEPTS (DNAME, DEPT#) SUPPLIERS (SNAME, SADDR)

ITEMS (INAME, ITEM#) ORDERS (O#, DATE)

WORKS_IN (ENAME, DNAME) MANAGES (ENAME, DNAME)

CARRIES (INAME, DNAME) PLACED_BY (O#, CNAME)

CUSTOMERS(CNAME,CADDR,BALANCE) INCLUDES (O#, INAME, QUANTITY)

SUPPLIES (SNAME, INAME, PRICES)

Advantages of RDBMS

The relational database management system offers the following advantages:

 Flexible and well-established

11
 Sound theoretical foundation and use over many years has resulted in stable,
standardized products available.
 Standard data access language through SQL.
 Costs and risks associated with large development efforts and with large databases are
well understood.
 The fundamental structure, i.e., a table, is easily understood and the design and
normalization process is well defined.

Weaknesses of RDBMS

 Performance problems associated with re-assembling simple data structures into their
more complicated real-world representations.
 Lack of support for complex base types, e.g., drawings
 SQL is limited when accessing complex data.
 Knowledge of the database structure is required to create ad hoc queries.
 Locking mechanisms defined by RDBMSs do not allow design transactions to be
supported, e.g., the "check in" and "check out" type of feature that would allow an
engineer to modify a drawing over the course of several working days.

b) The Network model

In a network database structure, data is represented by records and links. There is no limit to the
number of immediate parent segments or the number of immediate child segments a given record
occurrence may have. The network database structure allows the modeling of many-to-many
relationships. Represents data as record types and also represents a limited type of one-to-many
relationships, called set type. The figure below shows a schema in network model notation.

12
Figure 1.4: Sample schema in network model

The advantages of a network database are:

 Efficient representation of some structures (varies


widely with physical implementation),
 More flexibility than a hierarchical approach (all
relationships can be represented without redundancy).

Weaknesses of Network Databases :

 machine performance (physical implementations that


perform well for one type of network may perform
poorly for another type),
 maintainability (changing relationships may require
physical reorganization of data),
 Lack of robustness. A failure in the system can leave
dangling references to data which must somehow be
recovered,
 Update overheads in multi-key databases.

13
c) The Hierarchical model:

In the hierarchical structure, data is represented by a simple tree structure. The record type at the
top of the tree is usually known as the "root." The simple hierarchical structure consists of a root
and a single dependent record type. In general, the root may have any number of dependent
records, each of which may have any number of lower-level dependents, and so on, to any
number of levels. The hierarchy view contains records of different types connected by links.

Hierarchical relationships of records are explicitly defined in the data structure. A parent record
can have many child records but a child record can have only one parent. There are no many-to-
many relationships between records. No dependent record within a hierarchical data structure
can exist without its parent record. For this reason, records must be seen in context.

Here is the schema in hierarchical model notation.

Figure 1.5: Sample schema in hierarchical model notation

The advantages of a hierarchical database are:

 Efficient representation of hierarchical structures,


 Efficient single key search and access time (if the hierarchical structure corresponds to
application views of the data),
 Fast update performance where locality of reference exists (locality of reference states that
performance is significantly enhanced when the processing is close to the data being
processed).

The disadvantages of a hierarchical database are:

 Lack of flexibility (non-hierarchical relationships are awkward to represent; redundancy

14
may be required),
 Poor performance for non-hierarchical accesses,
 Lack of maintainability (changing relationships may require physical reorganization of
data).

d) Object-Oriented Data Model

With the increased awareness of the object-oriented paradigm, many DBMSs claim to be object-
oriented databases. The object-oriented model provides the basic DBMS characteristics. In
addition Object-oriented Database Management Systems provide the following features:

 Support for Complex Objects: An ODBMS must provide support for data types that allow
storage of complex objects. An RDBMS supports only basic types, (e.g., integer,
character), that are assembled into table rows to define objects.
 Identity: An object must be uniquely identified.
 Encapsulation: Hiding information and only allowing access through the public interface is
a basic concept in object-orientation. Strict implementations enforce access to data
through programs; however, some ODBMS products allow query access to the data.
 Classes and Types: An ODBMS must allow definition of classes and data types by the
user. In addition these must not be restricted by the implementation. User defined types
must be treated the same as the product supplier's base types in the database functionality.
 Inheritance: Inheritance is a basic concept in object-orientation that is a major factor in re-
use. Inheritance of data and methods is a primary feature of an object-oriented
programming language (OOPL).
 Dynamic Binding and Polymorphism: This characteristic allows different implementations
of semantically similar methods with the same name in different classes.
 Computational Completeness: The integrated language must be computationally complete.
For example, SQL is relationally complete since a SELECT statement can be constructed
to retrieve any subset of data within an RDBMS. SQL is not computationally complete (it
is sometimes called a data sublanguage) since there are computations that cannot be done
using SQL constructs. The ODBMS language cannot have this restriction.
 Extensibility: The user must be able to add non-restricted types to the database

15
Strengths of ODBMS

 Representations of complex structures allow creation of a model that more closely


resembles its real world counterpart.
 Storage of complex objects results in better performance.
 The model exhibits better coupling and cohesion characteristics. This results in a more
maintainable and flexible database.

Weaknesses of ODBMS

 Lack of maturity in the products. This leads to more risk in employing the product. There
is also a lack of product standardization. Database standards are emerging.
 There is no equivalent of SQL for object-oriented databases. Work is being done on
establishing standards.
 There is much less experience with project development. Hence, there is a lack of
information on costs and schedules. In addition, there is added training required for the
team because of the lack of familiarity with the object paradigm.
 There is no consistent, solid theoretical basis to support ODBMS products. This raises
questions about the completeness of the implementations. In addition, there is a plethora of
analysis and design approaches.

3. Physical Data models: Provide concepts that describe how data is actually stored in the
computer.

2.3 Selecting a Database

Database technology may not always be needed, depending on the sophistication of the system's
data structures, complexity of data, and the need for features provided by databases.

In selecting a database, it is important to consider, in addition to the actual data management


capabilities, the extent of the development environment provided or supported. Since the
physical view of data can be separated from the application view, the development environment
provided or supported will have a significant effect on the ability to generate new applications or

16
maintain existing applications using this data. Application engineers and developers do not need
to know about the storage structure or access strategy of data to develop or modify applications.

Support for a standard query language, such as SQL, is also an important criterion. By
developing applications using a standard query language, flexibility is achieved since the
underlying database product can be changed or modified without necessarily affecting the
applications that access the data. Note, however, that some DBMSs have added functionality not
supported by the standard query language. Use the added functionality with caution.

2.4 Database Instances and Schemas

The description of a database is called the DB Schema. It is specified during the DB design. It is
not expected to be changed frequently. It is usually represented by diagrams (Schema diagram)
(see next figure). Each object of the schema (such as student and course) is called a schema
construct. The Schema diagram specifies some aspect (names of records and data items) and
some other aspects as data type and relationships are not specified. The schema is called the
“intension”. The stored description of the schema is called the “meta-data”. The description of
the database which is designed in the early stage and is not expected to change frequently is
called the database schema. Database system has several schemas. The following are examples
of schemas:

Student
Name StudentNo Class Major

Course
CourseName CourseNumber CreditHours Department

The data in the DB at a particular moment in time is called a Database State or Snapshot or the
Current Set of Occurrence or Instances. Its initial state is generated when the DB is first
populated with data and then it moves from valid state to another valid state during data
manipulation. It is called the “extension”. Since information can be inserted to or deleted from
database at anytime, database changes over time. At a particular moment, the collection of
information stored in the database is called an instance of the database.

17
Section 3: Database Architecture and database users

Section objectives

After successful completion of this section, you will be able to:

 Fully understand the architecture of a database (ANSI/SPARC Architecture)


 Know the concept of data independence
 Know the different types of data base users
 Know different types of database languages
 Understand levels of data abstruction

3.1. ANSI/SPARC Architecture (Three –level Architecture)

Three – level architecture for database system is proposed to archive the characteristics of the
database approach. The goal of this architecture is to separate the applications programs from the
physical database so the actual details of how data is organized are hidden from the users.

Figure 1.6: Three- level Architecture

18
As we can see from the picture above, there are three levels of schemas in the database
architecture

a) External level: In this highest level, there exists a number of views which of is
defined a part of the actual database. Each view is provided for a user or a group of
users so that it helps in simplified the interaction between the user and system. Also
high-level conceptual data model or implementation data model can be used at this
level.
b) Conceptual level: Conceptual Schema in this level describes the logical structure of
the whole database. The entire database is described using simple logical concepts
such as objects, their properties or relationships. Thus the complexity of the
implementation detail of the data will be hidden from the users. Conceptual level has a
conceptual schema which describes the structure of the DB. It hides the details of the
physical storage structure. Also it describes the entities, data types, relationships,
constraints and user operations. High-level conceptual data model or implementation
data model can be used at this level.
c) Internal level: Internal Schema in this level describes how the data are actually
stored, how to access the data. Internal schema has a Physical Data Model

3.1.1 Notes on the Three-Schema architecture

It is a convenient tool for the user to understand and visualize the schema levels of the DB
system. Most of the DBMSs do not separate the three levels completely. Some DBMSs may
include physical details in the conceptual schema. Most of the DBMSs include external views in
the conceptual data model. DBMSs perform mappings to transform requests and results between
the levels. Mappings between levels are a time-consuming process, so some DBMSs do not
support the external views.

3.1.2. Data Independence

Data independence is the ability to modify the schema in one level without affecting the schema
in the higher level. There are two levels of data independence:

Logical data independence is the ability to make change in the conceptual schema without
causing a change in the user views or application program. Conceptual schema may be changed

19
to expand the DB as adding new data items or reducing it by removing data items. The
applications involved in the modified data items can only be changed without any effect on the
remaining applications.

Physical data independence is the ability to make change in the internal schema without causing
a change in the conceptual schema or application program. Internal schema may be changed due
to some file reorganization to meet certain system performance.

Physical data independence seem to be easier to achieve since the way the data is organized in
the memory affect only the performance of the system. Meanwhile, the application program
depends much on the logical structure of the data that they access.

3.2. Database Languages

Data Definition Language (DDL): This is used to define the conceptual and internal
schemas for a database system. It is not procedural language, rather a language for describing the
types of entities and relationships among them in terms of a particular data model.

These are the commands for creating tables, defining relationships, and controlling other aspects
of the database that are more structural than data related.

Data Manipulation Language (DML): This is used to manipulate the database, which
typically includes retrieval, insertion, deletion, and modification of the data.

Data Control Language (DCL): This is a set of commands that generally control
permissions on the data, such as defining access rights. Many database users will never use these
commands, because they work in larger company environments where one or more database
administrators are employed specifically to manage the database, and usually one of their roles is
to control permissions.

3.3. Database Users

There are different types of users of the Database. These are of the following:

I. End users

20
These are people whose jobs require access to database for querying, updating and generating a
report. End users might be one of the following:

Naïve users: Naïve users are people who use the existing application programs to perform their
daily tasks. They do not know about database too much, invoke application programs that are
prepared already.

Sophisticated users: They are people who use their own way to access to the database. This
means they do not use the application program provided in the system. Instead, they might
define their own application or describe their need directly in a query language. They know
advanced use of the system and can use the system and packages on the top of the system.

Specialized users/ DBMS system users: Maintain the personal database by using ready –made
program packages that provide easy-to-use menu. Write specialized database applications that do
not fit into the traditional data processing framework.

II. Application Programmers

People implement specific application programs to access the stored data. This kind of user
needs to be familiar with the DBMSs to accomplish their task. Know how to interact with the
system but may not know how DBMS is designed.

III. Database Administrators(DBAs)

A person or a group of people in the organization who is responsible for authorizing the access
to the database, monitoring its use and managing all the resource to support the use of the whole
database system. Coordinates all the activities of the database system; the database administrator
has a good understanding of the enterprise‟s information resources and needs.

Database administrator's duties include:

 Schema definition
 Storage structure and access method definition
 Schema and physical organization modification
 Granting user authority to access the database
 Specifying integrity constraints

21
 Acting as liaison with users
 Monitoring performance and responding to changes in requirements

IV. Database Designers

Database designers are responsible for identifying the data to be stored in the database and for
choosing appropriate structures to represent and store this data. These tasks are mostly
undertaken before the database is actually implemented and populated with data. It is the
responsibility of database designers to communicate with all prospective database users, in order
to understand their requirements, and to come up with a design that meets these requirements.

In many cases, the designers are on the staff of the DBA and may be assigned other staff
responsibilities after the database design is completed. Database designers typically interact with
each potential group of users and develop a view of the database that meets the data and
processing requirements of this group. These views are then analyzed and integrated with the
views of other user groups. The final database design must be capable of supporting the
requirements of all user groups.

22
UNIT TWO

DATA MODELING USING ENTITY-RELATIONSHIP


MODEL

Objectives

After successful completion of this UNIT, you will be able to:

 Define entity, attribute and relationship


 Formulate ER diagram from specifications of needs of an organization
 Differentiate entity types, attribute types and relationship types
 Understand problems of Entity Relationship data model
 Define and understand EER
 Construct EER diagram

 Identify additional concepts in EER model

Structures

Data modeling using ER model

Database application development process

Entity relationship model (ER Model)

Enhanced-ER (EER) model concepts

Introduction

Conceptual modeling is an important phase in the database design process. In this UNIT, we
will discuss briefly about data modeling and then focus on modeling concepts of the Entity –
Relationship Model, which is popular high-level conceptual data model.

23
Section 1: Database Application Development Process

Section objectives

After going through this section, you will be able to:

 Know the steps of database design process


 Use the knowledge you have got to design a real database application

1.2. Database Application Development

Database Application Development is the process of obtaining real-world requirements, analyze


requirement, design the data and functions of the system and then implement the operations in
the system.

The figure below shows the main phases of the database application development process.

Figure 2.1: The main phases of the database application development process

As we can see, the first step is Requirement Collection and Analysis. During this step, the
database designers have to interview the customers (database users) to understand the be-built
system, obtain and document the data and functional requirements. The result of this step is a
document including the detail requirements of the users.

24
Data Modeling is the next step in the process. This step is sometimes considered as a high-level
and abstract design phase (conceptual design). The aims of this phase are:

 Describes what data is contained in the database (E.g. entities: students, lecturers,
courses, subjects etc.)
 Describes the relationships between data items (E.g. Students are supervised by
Lecturers; Lecturers teach Courses )
 Describes the constraints on data (E.g. Student Number has exact 8 digits; A subject has
4 or 6 unit of credits only)

The data items, the relationships and constraints all are expressed using the concepts provided by
the high-level data model. Because these concepts do not include the implementation details so
the results of the data modeling process is a (semi) formal representation of the database
structure. This result is quite easy to understand so it is used as reference to make sure that all the
user‟s requirements are met.

The third step is Database Design. During this step, we might have two sub steps called Database
Logical Design which define a database in a data model of a specific DBMS, and Database
Physical Design which define the internal database storage structure, file organization or
indexing techniques. The last two steps shown are Database Implementation and
Operations/Interfaces Building focus on create instance of schema and implementing operations
and user interfaces.

In the database design phases, data is represented using a certain data model. Data Model is a
collection of conceptual concepts or notations for describing data, data relationships, and data
semantics and data constraints. Most data models also include a set of basic operations for
manipulating data in the database.

As mentioned under “the three levels of data models”, data models are either:

1. Conceptual models
o Database is considered as a collection of entities (objects) of various kinds.
o These models provide a flexible data structuring capabilities.
o The typical example of this model is entity-relationship model, object-oriented
model or semantic data model.

25
2. Record based logical models
o Database is considered as a collection of fixed – size record.
o These models are closer to the physical level or file structure so they are easier
to implement.
o The three most well-known models of this kind are relational data model,
network data model or hierarchical data model.
3. Physical models
o Provide concepts that describe the details of how data is stored in the computer‟s
memory

In the next section, we will discuss the Entity-Relationship Data Model in more detail.

Section 2: Entity Relationship Model (ER Model)

Section objectives

At the end of this section, you will be able to:

 Fully understand an ER database model


 Define entities, attributes and relationships
 Identify entities, attributes and relationships from specifications
 Draw ER diagram from specifications

Introduction

Entity Relationship Data Model has existed for 30 years (original description: Chen, ACM
Transactions on Database Systems, 1(1), 1976).

The ER model is well suited to data modeling for use with databases because it is fairly abstract
and is easy to discuss and explain. ER models are readily translated to relations.

ER modeling is based on two concepts: Entities, that is, things. E.g. Prof. Ba, Course Database
System and Relationships, that is, associations or interactions between entities. E.g.. Prof. Ba
teaches course Database Systems. ER models (or ER schemas) are represented by ER diagrams.

26
In this section, we will use a sample database called the COMPANY database to illustrate the
concepts of Entity Relationship Model. This database contains the information about employees,
departments and projects:

There are several departments in the company. Each department has a unique identification, a
name, location of the office and a particular employee who manages the department. A
department controls a number of projects, each of which has unique name, a unique number and
the budget. Each employee has name, an identification number, address, salary, birth date. An
employee is assigned to one department but can join in several projects. We need to record the
start date of the employee in each project. We also need to know the direct supervisor of each
employee. We want to keep track of the dependents of the employees. Each dependent has name,
birth-date and relationship with the employee.

2.1. Entity, Entity Set and Entity Type

Entity

Entity is an object in the real world with an independent existence and can be differentiated from
other objects.

An entity might be an object with physical existence like a lecturer, a student, a car or an object
with conceptual existence like a course, a job, a position, etc.

An Entity Type defines a collection of similar entities.

An Entity Set is a collection of entities of an entity type at a point of time. In ER diagram, an


entity type is represented by a name in a box

Figure 2.2: Entity type in ER diagram

27
2.2. Attribute

Each entity is described by a set of attributes that model “properties of interest”.

E.g. Employee = (Name, Address, Age, Salary)

Each attribute has a name, associated with an entity and is associated with a domain of legal
values. However the information about attribute domain is not presented on the ER diagram

In the diagram, each attribute is represented by an oval with a name inside.

Figure 2.3: Attribute diagram

Types of Attributes

I) SIMPLE attributes are attributes that are drawn from the atomic value domains

E.g. Name = {John}; Age = {23}

II) COMPOSITE attributes: Attributes that consist of a hierarchy of attributes

E.g. Address may consist of “Number”, “Street” and “Suburb” → Address = {59 + „Meek
Street‟ + „Kingsford‟}

III) SINGLE VALUED attributes: Attributes that have only one value for each entity

E.g. Name, Age for EMPLOYEE

IV) MULTIVALUED attributes: Attributes that have a set of values for each entity

28
E.g. Degrees of a person: „BSc‟, „MIT‟, „PhD‟

V) DERIVED attributes: Attributes Contain values that are calculated from other attributes

E.g. Age can be derived from attribute Date-Of-Birth. In this situation, Date-Of-Birth might be
called Stored Attribute.

Figure 2.4: Notation of Composite attribute in ER Diagram

Figure 2.5: Notation of Multi-valued attribute in ER Diagram

29
Figure 2.6: Notation of Derived attribute in ER Diagram

Keys

An important constraint on the entities is the key. Key is an attribute or a group of attributes
whose values can be used to uniquely identify individual entity in an entity set.

For example, for the entity EMPLOYEE = {EID, Name, Address, Age, Salary}

 Definite keys are any set that involves EID


 Possible keys might be {Name, Address}
 Unlikely keys: {Name}, {Age}

There is a key that is chosen by the database designer used as an identifying mechanism for the
whole entity set: primary key. This key is indicated by underlying attributes in the ER model.

Figure 2.7: Key Notation

2.3. Relationship, Relationship Set and Relationship Type

Relationship is an association among several entities

E.g. Employee (John) joins Project (Netlife)

 The set of similar associations at a point of time is called the Relationship Set

Eg. The following “joins in” is a relationship set

Employee (John) joins in Project (Netlife)

Employee (Mark) joins in Project (GreenNet)

30
Employee (Sara) joins in Project (mFORM)

Employee (Steve) joins in Project (Netlife)

Relationship type is a collection of relationships among entities from a certain set of entity
types.

We can define the relationship type more formally.

Definition: Consider n ≥ 2 possibly non-distinct entity types E1, E2, En. Then a relationship is a
tuple (e1, e2… en) is a subset of E1 x E2 x … x En.

A relationship type R is a subset of the Cartesian Product E1 x E2 x … x En. R has degree n

In the ER diagram, relationship type is represented by a diamond with a name inside and is
connected by straight line to the rectangles representing entity type.

Figure 2.8: Relationship type notation

Degree of Relationship Type

The degree of a relationship type is the number of participating entities types.

 Unary (Recursive) relationship type is the relationship that involves only one entity
type. However, the same entity type participates in a relationship type in different roles.
For example, the figure below shows the Supervise relationship type which relates an
Employee and a Supervisor who is also an employee. So in this relationship, one
employee has the role of supervisor, another has the role of supervisee.

31
Figure 2.9: Supervise Relationship

In ER model, we have the diagram

Figure 2.10: Supervisor Notation

 Binary relationship type: This relationship type has two entity types link together. This
is the most common relationship.

For example the “Joins in” relationship between EMPLOYEE and PROJECT

32
Figure 2.11: "Joins in" Relationship

The ER diagram for this relationship type is :

Figure 2.12: "Joins in" relationship notation

 Ternary relationship type: If there are three entity types link together, the
relationship is called ternary relationship.

For example: The Supply relationship associates a SUPPLIER, a PART and a PROJECT.

33
Figure 2.13: Ternary relationship example

The ER diagram for this relationship type is:

Figure 2.14: Ternary relationship notation

Relationship Type Constraints

The constraints of relationship types limit the possible combination of entities that participate in
the relationship set. Two main types of constraints is mapping cardinality and participation.

I. Mapping Cardinality

Mapping Cardinality describes the maximum number of entities that a given entity can be
associated with via a relationship. In this section, we consider only the cardinality constraint for

34
the binary relationship. The possible cardinality for binary relationship types are: One - to- One
(1-1), One – to – Many (1 – N) and Many – to – Many (M-N).

I. One to one relationship: Given two entity sets A and B, there is a one-to-one
relationship between A and B if each entity in the set A is associated with at most one
entity in the set B and vice versa each entity in the set B is associated with at most one
entity in the set A.

Figure 15: One-to-one relationship example

Example: An employee if is a manager then can manage one department and a department can
have only one manager.

Figure 16: One-to-one relationship notation

II. One-to-many relationship: There is a one-to-many relationship sets associates two


entities sets A and B if each entity in A is associated with several entities in B however,
each entity in B is associated with at most one entity in A.

35
Figure 2.17: One-to-many relationship example

Example: There are many employees work in a department however; an employee can work for
only one department.

Figure 2.18: One-to-many relationship notation

III. Many-to-many relationship: There is a many-to-many relationship sets associates two


entities sets A and B if each entity in A is associated with several entities in B however,
each entity in B is associated with several entities in A.

Figure 2.19: Many-to-many relationship example

Example: An employee can join in several projects and a project can have several employees

36
Figure 2.20: Many-to-many relationship notation

II. Participation

The participation constraints specify whether the existence of an entity depends on its being
related to another entity via the relationship type.

Participation in relationship set R by entity set A may be

IV. Total: It means every entity a in A participates in at least 1 relationship in R

E.g. every project has at least 1 employee joined in it.

V. Partial: It means only some a in A participate in relationships in R

E.g. not every employee in the company joins in a project.

Figure 2.21: Participation in relationship notation

Attributes of Relationship Types

The relationship types can have attributes like entity types. For 1-1 and 1-N relationship types,
attributes of the relationship can be migrated to one of the participating entity types. However,
for M-N relationship types, attributes must be specified as the relationship attribute.

Example: The Start_Date attribute records the date on which an employee joins in a project

37
Figure 2.22: Attribute in relationship example

Weak Entity Type

Weak entity types are those whose existence depends on the existence of another entity type.
Entities belonging to weak entity type are identified by being related to specific entities from
another entity type which is called strong entity type.

The weak entity types do not have key attributes of their own. The keys of this type are partially
or totally derived from strong entity types.

In ER diagram, weak entity and strong/weak relationship are denoted by double box/diamond

Example: Entity type DEPENDENT related to EMPLOYEE. This entity type is used to keep
track of the dependents of each employee via 1-N relationship. DEPENDENT entity has Name,
Birthdate and Relationship. Two dependents of two distinct employees might have the same
{Name, Birthdate, Relationship} values but they are totally different. They are identified as
distinct only after considering the association with particular employee entities to which the
dependents relate. So the key of the DEPENDENT might be {EID, Name} in which EID is the
attribute from the strong entity type EMPLOYEE.

38
Figure 2.23: Weak entity type notation

Entity Relationship Diagram for COMPANY database

Modeling Entities:

Figure 2.24: Entities of COMPANY database

Figure 2.25: Entities of COMPANY database

39
Modeling Relationships

Problems with ER Models

There are several problems that may arise when designing a conceptual data model. These are
known as connection traps. There are two main types of connection traps:

3.1. Fan traps

A fan trap occurs when a model represents a relationship between entity types, but the pathway
between certain entity occurrences is ambiguous. It occurs when 1:M relationships fan out from
a single entity.

A single site contains many departments and employs many staffs. However, which staff works
in a particular department? The fan trap is resolved by restructuring the original ER model to
represent the correct association.

40
3.2. Chasm traps

A chasm trap occurs when a model suggests the existence of a relationship between entity types,
but the pathway does not exist between certain entity occurrences. It occurs where there is a
relationship with partial participation, which forms part of the pathway between entities that are
related.

is_allocated oversees
Branch Staff Property
n 0 0

A single branch is allocated to many staffs who oversee the management of properties for rent.
Not all staffs oversee property and not all property is managed by a member of staff.

• What properties are available at a branch?


• The partial participation of Staff and Property in the oversee relation means that some
properties cannot be associated with a branch office through a member of staff.
• We need to add the missing relationship which is called „has‟ between the Branch and
the Property entities.
• You need to therefore be careful when you remove relationships which you consider to
be redundant.

41
Section 3: Enhanced-ER (EER) Model Concepts

Section objectives

At the end of this section, you will be in a position to:

 Understand additional concepts in EER model


 Draw EER diagram
 Use generalization and specialization processes to design database
 Understand some object-oriented concepts in database design

Introduction

EER (Enhanced Entity Relationship) data model includes all modeling concepts of basic ER
database model. It also includes additional concepts: subclasses/superclasses,
specialization/generalization, categories, and attribute inheritance. The resulting model is called
the enhanced-ER or Extended ER (E2R or EER) model. It is used to model applications more
completely and accurately if needed and it includes some object-oriented concepts, such as
inheritance

3.1. Subclasses and Superclasses

An entity type may have additional meaningful sub groupings of its entities

Example: EMPLOYEE may be further grouped into SECRETARY, ENGINEER, MANAGER,


TECHNICIAN, SALARIED_EMPLOYEE, HOURLY_EMPLOYEE, etc.

– Each of these groupings is a subset of EMPLOYEE entities


– Each is called a subclass of EMPLOYEE
– EMPLOYEE is the superclass for each of these subclasses

These are called superclass/subclass relationships.

– Example: EMPLOYEE/SECRETARY, EMPLOYEE/TECHNICIAN

42
These are also called IS-A relationships (SECRETARY IS-A EMPLOYEE, TECHNICIAN IS-A
EMPLOYEE, …).

Note: An entity that is member of a subclass represents the same real-world entity as some
member of the superclass

– The Subclass member is the same entity in a distinct specific role


– An entity cannot exist in the database merely by being a member of a subclass;
it must also be a member of the superclass
– A member of the superclass can be optionally included as a member of any
number of its subclasses
• Example: A salaried employee who is also an engineer belongs to the two subclasses
ENGINEER and SALARIED_EMPLOYEE
– It is not necessary that every entity in a superclass be a member of some
subclass

Attribute Inheritance in Superclass / Subclass Relation

An entity that is member of a subclass inherits all attributes of the entity as a member of the
superclass. It also inherits all relationships

3.2. Specialization and Generalzation

a) Specialization

Is the process of defining a set of subclasses of a superclass. The set of subclasses is based upon
some distinguishing characteristics of the entities in the superclass.

Example: {SECRETARY, ENGINEER, TECHNICIAN} is a specialization of EMPLOYEE


based upon job type.

– May have several specializations of the same superclass

Example: Another specialization of EMPLOYEE based in method of pay is


{SALARIED_EMPLOYEE, HOURLY_EMPLOYEE}.

43
– Superclass/subclass relationships and specialization can be diagrammatically
represented in EER diagrams
– Attributes of a subclass are called specific attributes. For example, TypingSpeed
of SECRETARY
– The subclass can participate in specific relationship types. For example,
BELONGS_TO of HOURLY_EMPLOYEE

Example of a Specialization

b) Generalization

Generalization is the reverse of the specialization process. Several classes with common features
are generalized into a superclass; original classes become its subclasses.

Example: CAR, TRUCK generalized into VEHICLE; both CAR, TRUCK become subclasses of
the superclass VEHICLE.

– We can view {CAR, TRUCK} as a specialization of VEHICLE


– Alternatively, we can view VEHICLE as a generalization of CAR and TRUCK

Generalization and Specialization

– Arrow pointing to the generalized superclass represents a generalization


– Arrows pointing to the specialized subclasses represent a specialization
– We do not use this notation because it is often subjective as to which process is
more appropriate for a particular situation

44
– We advocate not drawing any arrows in these situations
– A superclass or subclass represents a set of entities
– Shown in rectangles in EER diagrams (as are entity types)
– Sometimes, all entity sets are simply called classes, whether they are entity
types, superclasses, or subclasses

3.3. Constraints on Specialization and Generalization

If we can determine exactly those entities that will become members of each subclass by a
condition, the subclasses are called predicate-defined (or condition-defined) subclasses

– Condition is a constraint that determines subclass members


– Display a predicate-defined subclass by writing the predicate condition next to
the line attaching the subclass to its superclass

If all subclasses in a specialization have membership condition on same attribute of the


superclass, specialization is called an attribute defined-specialization

– Attribute is called the defining attribute of the specialization


– Example: JobType is the defining attribute of the specialization {SECRETARY,
TECHNICIAN, ENGINEER} of EMPLOYEE

If no condition determines membership, the subclass is called user-defined

– Membership in a subclass is determined by the database users by applying an


operation to add an entity to the subclass
– Membership in the subclass is specified individually for each entity in the
superclass by the user

Two other conditions apply to a specialization/generalization:

I. Disjointness Constraint:

Specifies that the subclasses of the specialization must be disjointed (an entity can be a member
of at most one of the subclasses of the specialization). It is specified by d in EER diagram. If not

45
disjointed, overlap; that is the same entity may be a member of more than one subclass of the
specialization. Overlap is specified by o in EER diagram

II. Completeness Constraint:


 Total specifies that every entity in the superclass must be a member of some subclass in the
specialization/ generalization. It is shown in EER diagrams by a double line.
 Partial allows an entity not to belong to any of the subclasses. Shown in EER diagrams by a
single line

Hence, we have four types of specialization/generalization:

 Disjoint, total
 Disjoint, partial
 Overlapping, total
 Overlapping, partial

Note: Generalization usually is total because the superclass is derived from the subclasses.

Example of disjoint partial Specialization

Specialization / Generalization Hierarchies, Lattices and Shared Subclasses

A subclass may itself have further subclasses specified on it. This forms a hierarchy or a lattice.

 Hierarchy has a constraint that every subclass has only one superclass (called single
inheritance)

46
 In a lattice, a subclass can be subclass of more than one superclass (called multiple
inheritances)

In a lattice or hierarchy, a subclass inherits attributes not only of its direct superclass, but also of
all its predecessor superclasses. A subclass with more than one superclass is called a shared
subclass

We can have specialization hierarchies or lattices, or generalization hierarchies or lattices

In specialization, start with an entity type and then define subclasses of the entity type by
successive specialization (top down conceptual refinement process)

In generalization, start with many entity types and generalize those that have common properties
(bottom up conceptual synthesis process). In practice, the combination of two processes is
employed.

Specialization / Generalization Lattice Example (UNIVERSITY)

47
Categories (UNION TYPES)

All of the superclass/subclass relationships we have seen thus far have a single superclass. A
shared subclass is subclass in more than one distinct superclass/subclass relationships, where
each relationship has a single superclass (multiple inheritance). In some cases, we need to model
a single superclass/subclass relationship with more than one superclass.

Superclasses represent different entity types. Such a subclass is called a category or UNION
TYPE.

Example: Database for vehicle registration, vehicle owner can be a person, a bank (holding a lien
on a vehicle) or a company.

Category (subclass) OWNER is a subset of the union of the three superclasses COMPANY,
BANK, and PERSON.

A category member must exist in at least one of its superclasses

Note: The difference from shared subclass, which is subset of the intersection of its superclasses
(shared subclass member must exist in all of its superclasses).

Example of categories (UNION TYPES)

48
49
UNIT THREE

RELATIONAL DATA MODEL

Unit objectives

After successful completion of this unit, you will be able to:

 Define a relation(table), record(tuple), field(attribute)


 Appreciate the features of relational database model
 Know the domain of attributes
 Understand other constraints that should hold on the data in the database
 Transform ER models to Relational models
 Transform EER models to Relational models
 Realize relation schema

Structures

Relational data model

Fundamental concepts in relational data model

Mapping ER designs to relational schemas

Mapping ER model constraints to relations

Introduction

The relational data model is introduced by C.F.Codd in 1970. Currently, it is considered as the
most widely used data model.

50
The relational model has provided basis for:

 Research on theory of data/relationship/constraint


 Numerous database design methodologies
 The standard database access language SQL
 Almost all modern commercial database management systems

The relational data model describes the world as “a collection of inter-related relations (or
tables)“

Section 1: Fundamental concepts in Relational Data Model

Section objectives

At the end of this section, you will be in a position to:

 Use allowable values for attributes or fields


 Define domain, relation, record
 Understand the different constraints that should hold on data in the database

1.1. Domain: A domain D is the original sets of atomic values used to model data. By atomic,
we mean that each value in the domain is indivisible as far as the relational model is
concerned. For example:

The domain of day shift is the set of all possible days: {Mon, Tue, Wed…}. The domain of
salary is the set of all floating-point numbers greater than 0 and less than 200,000 (say). The
domain of name is the set of character strings that represents names of person.

1.2. Relation (Relation state): A relation is a subset of the Cartesian product of a list of
domains characterized by a name.

Given n domains denoted by D1, D2, …, Dn , r is a relation defined on these domains if


r⊆D1×D2×...×Dn

51
Relation can be viewed as a “table”. In that table, each row represents a tuple of data values and
each column represents an attribute.

4.4. Attribute: A column of a relation designated by name. The name associated should be
meaningful. Each attributes associates with a domain.

4.5. A relation schema denoted by R is a list of attributes (A1, A2, …, An)

The degree of the relation is the number of attributes of its relation schema.

The cardinality of the relation is the number of tuples in the relation.

Example of relation, relation schema and attribute:

EMPLOYEE is Relation Name

Eid, Name, Birthdate, Salary, Department are called Attributes (or Columns)

Each row of the tables is called Tuple (or Row/Record)

Figure 1: Sample relation

Relation schema: EMPLOYEE(Eid, Name, Birthdate, Salary, Department)

3.4. Characteristic of relations

 Ordering of Tuples in a relation has no effect: A tuple is a set of values. A relation is


a set of tuples. Since a relation is a set, there is no ordering on rows.

52
 Ordering of Values within a tuple is not important: The order of attributes and their
values within a relation is not important as long as the correspondence between
attributes and values is maintained. Thus the following is a different representation of
the above EMPLOYEE relation

EMPLOYEE
Department Eid Name Birthdate Salary
D1 20001234 Nguyễn Thành 22/11/1970 30000
D1 20012322 Đỗ Văn Khôi 1/2/1978 27000
D2 20000980 Vũ Minh 30/4/1967 50000
D3 19991323 Nguyễn Thành 10/10/1966 50000

 Values and NULL values in the tuple: Each value in a tuple is atomic. That means
each value cannot be divided into smaller components. Hence, the composite and multi-
valued attributes are not allowed in a relation.

In a relation, there can be several values of attributes that may be unknown or currently not
applicable. The following table contains some NULL values in it:

EMPLOYEE
Department Eid Name Birthdate Salary Cell-Phone
D1 20001234 Nguyễn Thành 22/11/1970 30000 0912304101
D1 20012322 Đỗ Văn Khôi 1/2/1978 27000 NULL
D2 20000980 Vũ Minh 30/4/1967 50000 0903132123
D3 19991323 Nguyễn Thành 10/10/1966 50000 NULL

The data shows that there are employees with no cell phone because the number might be
unknown at this point of time.

3.5. Constraints in Relational Data

Constraint is a very important feature in relational model. In fact, relational model support a
well-defined theory of constraint on attributes or tables.

53
Constraint is useful because it allows designer to specify the semantics of data in database and it
is the rules to enforce DBMSs to check that new data satisfies the semantics.

Integrity constraint

Relation allows us to represent data and association. Domain restricts the values of attributes in
the relation and it is a constraint of relational model. However, there are real –world semantics
on data that cannot specify if use only domain. We need more specific way to state what data
values are/are not allowed, what format is suitable for an attributes. For example, employee
number must be unique; employees‟ age is in the range [23, 65]. Such information is provided in
logical statements called integrity constraints.

There are several kinds of integrity constraints:

1. Key constraint

A relation is a set of tuples. By definition, all elements in a set are distinct hence all tuple in a
relation must be distinct.

In relational model, tuples have no identity like object –identification. Tuple identity is totally
value-based. Therefore, we need key constraint that is the way of uniquely identify a tuple.

Given a relation schema R with U is the list of attributes, there are a set K which is a subset of U.
If in a relation r of E with any two distinct tuples t1 and t2 we have the constraint that t1[K] ≠
t2[K] then K is called a superkey of the relation schema R.

A superkey can have redundant attributes. A superkey that has no such attributes is called a
candidate key. Formally, we can define a candidate key as following.

Given a relation schema R, K‟ is called a candidate key of R if K‟ satisfies two constraints:

 K‟ is a superkey of R
 There is no such proper subset of K‟ that is also a superkey of R or K‟ is the minimum
superkey)

54
Since a relation schema may have more than one candidate key thus there is a chosen candidate
key whose values are used to identify tuples in the relation. Such key is primary key. Primary
key is usually the most simple candidate key (i.e. key with single attribute or small number of
attributes)

In relational database, key are often implemented by introducing an attribute specially designed
for being key.

Example: What could be the superkey, candidate key and primary key of the following relation?

EMPLOYEE
Department Eid Name Birthdate Salary Cell-Phone
D1 20001234 Nguyễn Thành 22/11/1970 30000 0912304101
D1 20012322 Đỗ Văn Khôi 1/2/1978 27000 NULL
D2 20000980 Vũ Minh 30/4/1967 50000 0903132123
D3 19991323 Nguyễn Thành 10/10/1966 50000 NULL

Superkey: {Name, Salary} , {Eid}, {Name, Birthdate}

Candidate key: {Eid}

Primary key: {Eid}

2. Entity constraint: No attribute in the primary key can be NULL. This is because, NULL
values for the primary key means we cannot identify some tuples. For example, in the
EMPLOYEE relation showed above, Cell-Phone cannot be a key since we cannot use this
attribute to identify employees 20012322 and employee 19991323.

3. Referential constraint: The constraint that is specified between two relations and
maintains the correspondence between tuples in these relations. It means the reference from a
tuple in one relation to other relation must be valid.

Examples of Referential integrity constraint

In the Bank Database (From Data Modeling Note) : The ACCOUNT relation need to take note
the BRANCH where each account is held so in implementation, in each tuple of ACCOUNT

55
relation, there is an attribute such as branch name to identify the associate BRANCH . The
referential integrity constraint must state that the branch-name attribute in the ACCOUNT
relation refer to a valid branch (i.e. existing branch)

In the Company Database: In the EMPLOYEE relation, we need to store the information about
the department where each employee works in. The attribute Department in EMPLOYEE
relation is used for that purpose so the values appears in the columns Department must be taken
from set of values identify the Department number in the relation DEPARTMENT if this relation
exists.

Referential constraint in relational model relates to notation of foreign key.

A set of attributes FK in a relation schema R1 is foreign key if

The attributes in FK correspond to the attributes in the primary key of another relation schema
R2

The value for FK in each tuple of R1 either occur as values of primary key of a tuple in R2 or is
entirely NULL

In a database of many relations, there are usually many foreign keys. They provide the “glue”
that links individual relations into a cohesive database structure.

The following figure describes an example of referential integrity constraint in a part of the
BANK database

56
Figure 2: Referential integrity constraints in part of BANK database

4, Semantic constraints: This is a special kind of constraints that may have to be enforced
in relational database. Such constraints describe the semantics of data in the database or
sometimes called the rules on data. For example, in the COMPANY database, we have the rule
“An employee cannot take a part in more than 5 projects” or “Salary of an employee cannot
exceed the salary of the employee‟s manager”

3.6. Relational Database with Examples

Relations, keys, foreign keys and integrity constraints provide a complete toolkit for building
relational databases. A relational database consists of many relations and tuples in relations are
related in various ways. Here, we will define relational database schema and relational database
instance.

A relational database schema is:

 A set of relation schemas S = {R1, R2, … , Rn} , and


 A set of integrity constraints

A relational database instance is:

 A set of relations (relation states) {r1(R1) , r2(R2) , … , rn(Rn) } where all of the
integrity constraints are satisfied.

57
Constraint Checking

Relational database instance is changing over time. At a moment of time, we can have an
instance that satisfied all the constraints but when some update operations performed, we must
re-check the constraints. There are three basic update operations on relations: insert a new
record, delete an existing record and modify an existing record.

Consider the following relations ACCOUNT, BRANCH, CUSTOMER and ACCOUNT-


HOLDRE

ACCOUNT BRANCH
branch-Name balance account-Number branch-Name address assets
Hashanah 20000 C-12894349 Hashanah Hai Ba Trung 900000000
Dingo 20000 C-12894350 DongDo Dong Da 400000000
Dingo 3500 S-141510751 ThangLong Hoan Kiem 500000000
Hashanah 50000 S-520522620

CUSTOMER
customerNumber Name address homeBranch
111111 Anh Hai Ba Trung HaThanh
121314 Van Anh Hai Ba Trung Dong Do
515016 Son Hoan Kiem HaThanh

ACCOUNT-HOLDER
customerNumber accountNumber
111111 C-12894349
121314 C-12894350
121314 S-141510751
515016 S-520522620
111111 C-12894350

58
Domain constraint checking:

 For insert operation, we need to check attribute value for type and other domain
restrictions.
 For delete operation, it is no need to check any domain constraints
 For update operation, it is also needed to check attribute value for type and other domain
restrictions.

The following changes satisfy domain constraints

 Insert Account(HaThanh, 50000, S-20071280)


 Insert Account(HaThan, 20000, C-20072242) ( it looks ok but actually the data value is
not correct)
 Update Account(HaThanh, 50000, S-20071280) to Account(HaThanh, 60000, S-
20071280)

The changes that do not satisfy domain constraints:

 Insert Account(HaThanh, 5000USD, S-20071280)


 Insert Account(DongDo, -20, C-12894349)
 Update Account(HaThanh, 50000, S-34252525) to Account(60000, HaThanh, S-
34252525)

Key constraint checking:

 For insert operation, it is need to check the key value does not occur in any existing tuple
in the relation.
 For delete operation, it is no need to check any domain key constraints
 For update operation, if the key value is modified then need the same check as for
insertion.

Changes that satisfy key constraints:

 Insert Account(DongDo, 20000, C-12894350) (there is no account with that account


number in the current relation)

59
 Insert Account-Holder(12334, C-12894350) (ok, but no such customer with number
12334)
 Update Account(HaThanh, 50000, S-34252525) to Account(60000, HaThanh, S-
34252525) (key is not modified)

Changes that do not satisfied key constraints:

 Insert Account(DongDo, 50000, C-12894350) (key is alredy there in the relation)


 Update Account(DongDo, 20000, C-12894350) to Account(DongDo, 20000, C-
12894349) ( the account C-12894349 is already in the relation)

Referential integrity constraint checking:

 For insert operation, it is need to check the foreign keys occur as primary keys in the
referenced relation.
 For delete operation, check all relations that have foreign keys referring to this relation
 An update need to treat as delete - then – insert for referential constraints checking.

Changes that satisfy referential constraint

 Insert Account(ThangLong, 5000, C-12891230)


 Insert Account-Holder(111111, C-12891230)
 Update Customer(515016, Son, Hoan Kiem, HaThanh) to Customer(515016, Son, Hoan
Kiem, ThangLong)
 Delete Account-Holder(111111, C-12894350)

Changes that do not satisfy referential constraint

 Insert Account-Holder(12334, C-12894350) ( no such customer)


 Insert Customer(222222, Nha, DongDa, An Binh) ( no such branch)
 Delete Customer with customerNumber = „111111‟ ( this is not acceptable since there
are tuples in Account-Holder relation refer to this customer)

Deletion can violate referential constraint when the tuple being deleted is referenced by the
foreign keys from others tuples in a different relation. Several approaches are considered to
handle this kind of violation.

60
 The first approach simply disallows the deletion.
 The second approach: User must find the referring tuple then either delete them
manually or change their foreign key to an acceptable value or NULL value ( not
possible if the foreign key also forms part of the primary key such as in the Account-
Holder relation)
 The third approach: attempt to remove all referring tuple automatically (cascade)

When the referential constraint is specified in the database during the creation phase, the DBMSs
will allow user to specify which of the above approach applies when a violation occur.

Section 2: Mapping ER Designs to Relational Schemas

Section objectives

After successful completion of this section, you will be able to:

 Transform ER diagrams to relational schemas


 Transform EER diagrams to relational schemas
 Genuinely decide how to decompose M-N relationships in to 1-N relationships

Introduction

One useful strategy of database design is: Firstly, performing data modeling using conceptual
model (ER model or Object-Oriented Model) then transform this conceptual to relational model.
This section will introduce some “rule” to map an ER diagram in to a relational database
schema.

2.1 Mapping or Transformation

 Mapping Regular Entity Types: An entity type E with atomic attributes A1, A2, ... An

maps to a relation R with attributes (columns) A1, A2, ... An . In addition, one of the

keys of the entity type E will be chosen as the primary key for R. If the chosen keys is

composite then the primary keys of R is made up of several attributes.

61
Example:

Figure 3: Mapping of regular Entity Types

 Composite attributes are mapped by concatenation or flattering. Concatenating means


we keep the composite attributes only, ignore the components of it. Flattering, on the
other hands, means including only the simple component attributes of a composite
attribute.

Example:

Figure 4: Mapping of Composite attribute

3.6.1. Multi-valued attributes are mapped by using additional relation schema. For each
multi-valued attribute A, create a new relation R. This relation will include attribute

62
corresponding to A plus the primary key K of the entity type or relationship type that has
A as attribute. Combination of A and K becomes primary key of R.

Example:

Figure 5: Mapping of multi-valued attributes

 Derived attribute: A is mapped to an ordinary attribute in a relation corresponding to


the entity type which has A as attribute.
 Mapping Weak Entities: For each weak entity type W with the strong entity type E,
create a relation R and include all attributes of W as attributes of R (attributes are
mapped as mentioned in the previous section). In addition, include in R also the primary
key K of the relation that corresponding to the strong entity E. Keys of the relation R is
the combination of K and the partial key of the weak entity type (if any)

Example: Consider the below entity relationship diagram, weak entity type DEPENDENT is
mapped into relation EMP-DEPENTDENT. Primary key of this relation is {EID, Name}
because Name can be considered as partial key of DEPENDENT.

63
Figure 6: Mapping of Week Entities

 Mapping 1-1 Binary Relationship Type: For each 1-1 Binary Relationship Type R,
identify relation A and B that correspond to the entity types participating in R. The
relationship itself can be mapped using one of the following approaches:
o Using foreign key: This is the most popular approach. Choose one of the
participating relation, say A ( usually one with total participation) ; include in A
as the foreign key the primary key of B. If R has attributes then put them in the
relation A as well.
o Merged relation: If both participations of two related entity type are total then
we can merge two entity types and the relationship into a relation.
o Defining relationship relation: Define a relation R to represent the
relationship. All attributes of relationship is included in R. In addition, put the
primary keys of two relations A , B into R. The primary key of R is combination
of primary keys of A and B.

Example: Consider the relationship Manages between EMPLOYEE and DEPARTMENT, in this
relationship the participation of DEPARTMENT is total.

64
Figure 7: Mapping of 1-1 Binary Relationship Type

 Mapping 1-N Binary Relationship Type: For each 1-N Binary Relationship Type R,
identify two relations A and B correspond to two entity types participating in R. A represents
the entity type at 1-side and B represents the entity type at N-side. Include the primary key
of A as the foreign key in B. This foreign key represents the relationship type R.

Example:

Figure 8: Mapping of 1-N Binary Relationship Type

 Mapping M-N Binary Relationship Type: For each M-N Binary Relationship Type

65
R, identify two relation A, B represent two entity type participating in R. Create a new
relation S to represent R. Include in S as foreign keys the primary keys of A and B and all
the simple attributes of R. The combination of primary keys of A and B will make the
primary key of S.

Example:

Figure 9: Mapping of M-N Binary Relationship Type

 Mapping Unary Relationship Type: For each Unary Relationship Type R, we identify
the relation A represents the entity type involves in this relationship. Include in this
relation one more time as foreign key the primary of itself. The two keys (primary key
and foreign key) are the same but they represents two entities of different roles relate to
this relationship.

Example

66
Figure 10: Mapping of Unary Relationship Type

 Mapping Ternary Relationship Type: For each n-ary ( > 2 ) Relationships create a
new relation to represent the relationship. The primary key of the new relation is the
combination of the primary keys of the participating entities that hold the N (many) side.
In most cases of an n-ary relationship all the participating entities hold a many side.

Figure 11: Mapping of Ternary Relationship Type

67
Section 4: Mapping EER Model Constructs to Relations

Section objectives

After successful completion of this UNIT, you will be able to:

 Map Specialization or Generalization


 Map Union Types (Categories)

4.1 Mapping Specialization or Generalization

We can use different options to map Specialization or Generalization.

Option A: Multiple relations-Superclass and subclasses

Option B: Multiple relations-Subclass relations only

Option C: Single relation with one type attribute

Option D: Single relation with multiple type attributes

Option A: Multiple relations-Superclass and subclasses

Create a relation L for C with attributes Attrs(L) = {k,a1,…an} and PK(L) = k. Create a relation
Li for each subclass Si, 1 < i < m, with the attributesAttrs(Li) = {k} U {attributes of Si} and
PK(Li)=k. This option works for any specialization (total or partial, disjoint of over-lapping).

Example

68
Option B: Multiple relations-Subclass relations only

Create a relation Li for each subclass Si, 1 < i < m, with the attributes Attr(Li) = {attributes of
Si} U {k,a1…,an} and PK(Li) = k. This option only works for a specialization whose subclasses
are total (every entity in the superclass must belong to (at least) one of the subclasses.

Example

69
Option C: Single relation with one type attribute

Create a single relation L with attributes Attrs(L) = {k,a1,…an} U {attributes of S1} U…U
{attributes of Sm} U {t} and PK(L) = k. The attribute t is called a type (or discriminating)
attribute that indicates the subclass to which each tuple belongs

Example

70
Option D: Single relation with multiple type attributes

Create a single relation schema L with attributes Attrs(L) = {k,a1,…an} U {attributes of S1}
U…U {attributes of Sm} U {t1, t2,…,tm} and PK(L) = k. Each ti, 1 < I < m, is a Boolean type
attribute indicating whether a tuple belongs to the subclass Si.

Example

4.2 Mapping of Union Types (Categories)

For mapping a category whose defining superclass have different keys, it is customary to specify
a new key attribute, called a surrogate key, when creating a relation to correspond to the category

71
In the example below we can create a relation OWNER to correspond to the OWNER category
and include any attributes of the category in this relation. The primary key of the OWNER
relation is the surrogate key, which we called OwnerId.

72
UNIT FOUR

RELATIONAL ALGEBRA

Unit objectives

At the end of this UNIT, you will be able to:

 Use relational algebra unary operations to retrieve data from the database
 Use relational algebra binary operations to retrieve data from the database
 Use relational algebra set operations to retrieve data from the database

 Understand the relation between relational algebra operations and SQL operations

Structures

Relational algebra

Relational algebra operations from set theory

Other relational algebra operations

Introduction

Relational Algebra (RA) can be viewed as a data manipulation language for relational model. It
consists of several basic operations which enable users to specify retrieval requests. RA is called
a procedural language in which users need to specify how to retrieve the expected data.

Relational Algebra has the following components:

 Operands: Relations or Variables that represent relations


 Operators that map relations to relations
 Rules for combining operands and operators to relational algebra expression

73
 Rules for evaluating those expressions

Operations of relational algebra include the followings:

 Union, Intersect, Set Difference, Cartesian Product are operations based on set theory
 Select, Project, Join, and Division are operations developed especially for relational
databases.

Figure 1: Example Database

Section 1: Relational Algebra Operations from Set Theory

Section objectives

After completing this section, you will be able to:

 Use different types of relational algebra operations from set theory


 Compare these operations with SQL operations

Definition: Two relations r(A1, A2, …, An) and s(B1, B2, …, Bn) are union compatible if they
have the same degree n and dom(Ai) = dom(Bi) for 1 ≤ i ≤ n.

This means two union compatible relations have the same number of attributes and each
corresponding pair of attributes have the same domain

74
1.1 UNION Operation

The UNION operation combines two union compatible relations into a single relation via set
union of sets of tuples.

 Notation: r1∪r2

Figure 2: Union Operation Notation

 r1∪r2={t∣t∈r1∨t∈r2{ where r1(R) and r2(R)


 Result size: ∣r1∪r2∣≤∣r1∣+∣r2∣
 Result schema: R
 Producing the result of UNION
o Make a copy of relation r1
o For each tuple t in relation r2, check whether it is in the result or not. If it is not
already in the result then place it there.
 Example:

Figure 3: Union Operation Example

1.2 INTERSECTION Operation

The INTERSECTION operation combines two union compatible relations into a single relation
via set intersection of sets of tuples.

75
 Notation: r1∩r2

Figure 4: Intersection Operation Notation

 r1∩r2={t∣t∈r1∧t∈r2{ where r1(R) and r2(R)


 Result size:

∣r1∩r2∣≤min(∣r1∣,∣r2∣)

 Result schema: R
 Producing the result of INTERSECTION
o Initially, result set is empty
o For each tuple t in relation r1, if t is in the relation r2 then place t in the result
set.

Example

Figure 5: Intersection Operation Example

1.3 SET DIFFERENCE Operation

The DIFFERENCE operation finds the set of tuples that exist in one relation but do not occur in
the other union compatible relation

 Notation: r1 \ r2

76
Figure 6: Difference Operation Notation

r1={t∣t∈r1∧t∉r2{

 where r1(R) and r2(R)


 Result schema: R
 Producing the result of the DIFFERENCE operation
o Initially, result set is empty
o For each tuple in r1, check whether it appear in r2 or not. If it does not then
place it in the result set. Otherwise, ignores it

Example

Figure 7: Difference Operation Example

1.4 CARTESIAN PRODUCT Operation

The PRODUCT operation combines information from two relations pair wise on tuples.

 Notation: r x s
 r×s={(t1,t2)∣t1∈r∧t2∈s { where r(R) and s(S)
 Each tuple in the result contains all attributes in r and s, possibly with some fields
renamed to avoid ambiguity. The result set contains all possible tuple that can be
constructed from one tuple in r and one tuple in s.
 Result schema: If we have R(A1, A2, …, An) and S(B1, B2, …, Bm) then the list of
attributes in Result is (A1, A2, …, An, B1, B2, …, Bm)

77
 Result size: ∣r×s∣=∣r∣∗∣s∣
 Producing the result of PRODUCT operation:
o For each tuple in r, form new tuples by pair it with each tuple in s
o Place all of these new tuples in the result set
 Example

Figure 8: Cartesian Product Operation Notation

Figure 9: Cartesian Product Operation Example

 Note: As we can notice, the CARTESIAN PRODUCT operation by itself is not a useful
querying mechanism since the result size is large. However, it is an extremely important
operation of relational algebra since it is the basic mechanism for combining information
across relations. We will discuss about this topic in more detail in Query Processing
lecture.

78
Section 2: Other Relational Algebra Operations

Section objectives

After this section, you will be able to:

 Use unary operations to retrieve data from the database


 Use other binary operations retrieve data from the database
 Compare and contrast relational algebra operations with SQL operations

2.1 SELECT Operation

The SELECT operation is a unary operation. It means the input of this operation is only one
relation and its output is also a relation.

The SELECT operation returns a subset of the tuples from a relation that satisfies a selection
condition. The SELECT operation can be viewed as a horizontal filter of the relation. It
partitions the input relation into two sets of tuples: those tuples that satisfy the condition are
selected; those tuples that do not satisfy the condition are discarded.

 Notation:

σselection−condition>(r)

Figure 10: Select Operation Notation

 σF(r)={t∣t∈r∧F(t) { where r(R) and F is a Boolean expression on attributes in R

The selection condition is made up of a number of clauses of the form

 <attribute name> <comparison op> <constant value> OR

79
 <attribute name 1> <comparison op> <attribute name 2>

In the clause, the comparison operations could be one of the following: ≤, ≥, ≠, =, >, < . Clauses
are connected by Boolean operators : and, or , not

 Result size ∣σF(r)∣≤∣r∣


 Result schema: R
 Producing the result of the SELECT operation
o Selection condition F is evaluate for each tuple in r, with the attribute variables
in F set to their values in the tuples
o Any tuple t that F(t) = true is placed in the result set
o Other tuples are not included in the result.
 Example: Retrieve the Id, Name, Suburb of students who live in Bundoora

Figure 11: Select Operation Example - 1

 Example: Retrieve the Id, Name, Suburb of student whose name is Mary or students who
live in Bundoora

Figure 12: Select Operation Example – 2

80
2.2 PROJECT Operation

The PROJECT operation is another unary operation. This operation returns a set of tuples
containing a subset of the attributes in the original relation. Thus, as we state that the SELECT
operation selects some rows and discards the others. The PROJECT operation, on the other hand,
selects some columns of the relation and discards the other column. The PROJECT operation
can be viewed as the vertical filter of the relation.

 Notation: πattribute−list(r)

Figure 13: Project Operation Notation

 πX(r)={t[X]∣t∈r { where r(R)


 Result size: ∣πX(r)∣≤∣r∣
 Result schema: R‟(X)
 Producing the result of PROJECT operation
o Take each tuple in the original relation, extract the values of the specified
attributes
o Form new tuple from these values and place the new tuple in the result if it is
not already there. This steps includes the duplicate removal phase, this makes
the result of PROJECT operation a relation

 Example: Retrieve the suburbs that are stored in database

Figure 14: Project Operation Example - 1

81
 Retrieve the name of the subjects and department which is responsible for the subject

Figure 15: Project Operation Example - 2

2.3 JOIN Operation

The JOIN operation is used to combine related tuples from two relations into a single tuple. The
Theta-JOIN is a specialized product containing only pairs that match on a supplied condition
called join-condition.

 Notation: r⊲⊳join−conditions

 r⊲⊳Cs={(t1,t2)∣t1∈r∧t2∈s∧C(t1,t2) { where r(R) , s(S)


 Similar to PRODUCT, each tuple in the result of JOIN operation contains all attributes
from two original relations. However, in this operation one tuple in R and one tuple in S
can be combined together to form a tuple in the result if the combination satisfies the
join condition.
 Join condition is of the form:

<condition> AND <condition> AND …AND <condition>

where <condition> is a comparison between one attribute in R and one attribute in S, provided
that these two attributes have the same domain.

 Result size: ∣r⊲⊳Cs∣≤∣r∣∗∣s∣


 Result schema: If we have R(A1, A2, …, An) and S(B1, B2, …, Bm) then the list of
attributes in Result is (A1, A2, …, An, B1, B2, …, Bm)
 Producing the result of JOIN operation:
o For each tuple in r, form new tuples by pair it with each tuple in s

82
o If the new tuple satisfies the specified condition, then place it in the result set.

Example:

Figure 16: Join Operation Example - 1

Figure 17: Join Operation Example - 2

Figure 18: Join Operation Example - 3

Variations of JOIN

EQUI-JOIN: A JOIN where the only comparison operator used in the join condition is “=” is
called EQUI-JOIN. The result of Equijoin always has one or more pairs of attributes that have
identical values in every tuple.

83
Example:

Figure 19: Natural Join Operation Example - 1

NATURAL JOIN: The Natural Join operation is a specialized product where the result tuple
contains only pairs of tuples that match on their common attributes with one of each pair of
common attributes is eliminated. The standard definition of Natural Join requires that the two
join attributes have the same name. Therefore, we can see that Natural Join is created to get rid
of the duplicate columns in an Equijoin.

 Notation: r * s
 Natural Join can be defined using other operation r∗s=πR∪S(σcondition(r×s))

where r(R) and s(S) and condition is Boolean expression (A1 = B1) AND (A2 = B2 ) AND …
AND (Ak = Bk) with Ai is the attribute in r , Bi is the attribute in s and (Ai, Bi) is a pair of
common attributes.

Producing the result of Natural Join

 For each tuple in relation r, compare common attributes with those in each tuple of s
 If two tuples match in their common attributes then combine tuples, remove duplicate
attributes and add to the result.

 Example: From the example of Equijoin, assume that the attribute list in s now is ( E,
F,B) instead of (E, F, G) then we can have the expression r * s .

84
Figure 20

 Retrieve the information of student who enrolls in at least one course.

Figure 21: Natural Join Operation Example - 2

2.4 DIVISION Operation

The Division Operation is defined on two relation r(U1) and s(U2) where U2 is the subset of U1
and s is not an empty relation: r÷s={t∣t∈r(U1−U2)∧satisfy { where satisfy=∀ts∈s(∃tr∈r(tr[U2]=ts
∧tr[U1−U2]=t ))

This means that for a tuple t to appear in the result of Division, the values in t must appear in r in
combination with every tuple in s.

The Division is very useful for a special kind of query such as “Retrieve the name of the student
who enrolls in all course teach by Professor Ba”

 Producing the result of the Division operation


o Consider each subset of tuple in r that match on t[U1 – U2]
o For this subset of tuples, take the values t[U2] from each. If this covers all tuples
in s then add t[U1 – U2] in the result.
 Example: Retrieve the name of subject that is taught in all courses

85
Figure 22: Division Operation Example

Data Manipulation with Relational Algebra Expression

Sample Database

In this session, we use the COMPANY database in the examples for illustrating the use of
Relational Algebra for answering several queries.

The relational database schema for the COMPANY database is specified as below

EMPLOYEE (EID, Name, Bdate, Address, Salary, DeptId)

DEPARTMENT(DeptId, Dname, Office, Mng-EID)

PROJECT(Code, Name, Budget, DeptId)

JOIN(EID, PCode, StartDate)

EMP-DEPENDENT(EID, Dependent-Name, Bdate, Relationship)

Sample Queries

Query 1: Find all employees whose salary is greater than 30.000

σsalary>30000(EMPLOYEE)

Query 2: Find the name, address of employees who works for department number 1

πName,Address(σDeptId=1(EMPLOYEE))

Query 3: Find the name of the department that employee John Smith works for.

86
πDname(σName='JohnSmith'(EMPLOYEE∗DEPARTMENT))

Query 4: Find the name, relationship of all the dependents of employees who works for
Department Human Resource

πDependent−Name,Relationship(σDName='HumanResource'
(EMP−DEPENDENT,EMPLOYEE, DEPARTMENT))

87
UNIT FIVE
INTRODUCTION TO SQL
Objectives

After successful completion of this unit, students will be able to:

 Define SQL
 Know DML and DDL well
 Use DDL statements to create, alter and drop objects in SQL
 Use DML statements to manipulate data in the database
 Use built-in functions in SQL
 Use aggregate functions in SQL

Structures

SQL definitions and statements

What is SQL?

SQL statements

SQL aggregate functions

SQL scalar functions

88
Section 1: SQL Definition and SQL statements

Section objectives

After completing this section, you will be able to:

 Define SQL
 Know DML and DDL well
 Use DDL statements to create, alter and drop objects in SQL
 Use DML statements to manipulate data in the database

1.1 What is SQL?

 SQL stands for Structured Query Language


 SQL lets you access and manipulate databases
 SQL is an ANSI (American National Standards Institute) standard

SQL is a standard language for accessing and manipulating databases.

What Can SQL do?

 SQL can execute queries against a database


 SQL can retrieve data from a database
 SQL can insert records in a database
 SQL can update records in a database
 SQL can delete records from a database
 SQL can create new databases
 SQL can create new tables in a database
 SQL can create stored procedures in a database
 SQL can create views in a database
 SQL can set permissions on tables, procedures, and views

SQL is a Standard - BUT....

89
Although SQL is an ANSI (American National Standards Institute) standard, there are many
different versions of the SQL language.

However, to be compliant with the ANSI standard, they all support at least the major commands
(such as SELECT, UPDATE, DELETE, INSERT, WHERE) in a similar manner.

Note: Most of the SQL database programs also have their own proprietary extensions in addition
to the SQL standard!

RDBMS

RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL, and
for all modern database systems like MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft
Access. The data in RDBMS is stored in database objects called tables. A table is a collection of
related data entries and it consists of columns and rows.

Database Tables

A database most often contains one or more tables. Each table is identified by a name (e.g.
"Customers" or "Orders"). Tables contain records (rows) with data. Below is an example of a
table called "Persons":

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

The table above contains three records (one for each person) and five columns (P_Id, LastName,
FirstName, Address, and City).

1.2. SQL Statements

Most of the actions you need to perform on a database are done with SQL statements.

The following SQL statement will select all the records in the "Persons" table:

90
SELECT * FROM Persons

Keep in Mind That SQL is not case sensitive

Should there be Semicolon after SQL Statements?

Some database systems require a semicolon at the end of each SQL statement.

Semicolon is the standard way to separate each SQL statement in database systems that allow
more than one SQL statement to be executed in the same call to the server.

We are using MS Access and SQL Server 2000 and we do not have to put a semicolon after each
SQL statement, but some database programs force you to use it.

1.2.1. SQL DML and DDL

SQL can be divided into two parts: The Data Manipulation Language (DML) and the Data
Definition Language (DDL).

The query and update commands form the DML part of SQL:

 SELECT - extracts data from a database


 UPDATE - updates data in a database
 DELETE - deletes data from a database
 INSERT INTO - inserts new data into a database

The DDL part of SQL permits database tables to be created or deleted. It also define indexes
(keys), specify links between tables, and impose constraints between tables. The most important
DDL statements in SQL are:

 CREATE DATABASE - creates a new database


 ALTER DATABASE - modifies a database
 CREATE TABLE - creates a new table
 ALTER TABLE - modifies a table
 DROP TABLE - deletes a table
 CREATE INDEX - creates an index (search key)

91
 DROP INDEX - deletes an index

SQL SELECT Statement

The SELECT statement is used to select data from a database.

The result is stored in a result table, called the result-set.

SQL SELECT Syntax

SELECT * FROM
SELECT column_name(s)
And
table_name
FROM table_name

Note: SQL is not case sensitive. SELECT is the same as select.

SQL SELECT Example

The "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Now we want to select the content of the columns named "LastName" and "FirstName" from the
table above.

We use the following SELECT statement:

SELECT LastName,FirstName FROM Persons

The result-set will look like this:

LastName FirstName
Hansen Ola

92
Svendson Tove
Pettersen Kari

SELECT * Example

Now we want to select all the columns from the "Persons" table.

We use the following SELECT statement:

SELECT * FROM Persons

Tip: The asterisk (*) is a quick way of selecting all columns!

The result-set will look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

The SQL SELECT DISTINCT Statement

In a table, some of the columns may contain duplicate values. This is not a problem, however,
sometimes you will want to list only the different (distinct) values in a table.

The DISTINCT keyword can be used to return only distinct (different) values.

SQL SELECT DISTINCT Syntax

SELECT DISTINCT column_name(s) FROM table_name

SELECT DISTINCT Example

The "Persons" table:

P_Id LastName FirstName Address City

93
1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Now we want to select only the distinct values from the column named "City" from the table
above.

We use the following SELECT statement: The result-set will look like this:

SELECT City
DISTINCT City Sandnes
FROM Persons Stavanger

The WHERE Clause

The WHERE clause is used to extract only those records that fulfill a specified criterion.

SQL WHERE Syntax

SELECT column_name(s)

FROM table_name

WHERE column_name operator value

WHERE Clause Example The "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Now we want to select only the persons living in the city "Sandnes" from the table above.

94
We use the following SELECT statement:

SELECT * FROM Persons

WHERE City='Sandnes'

The result-set will look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes

Quotes around Text Fields

SQL uses single quotes around text values (most database systems will also accept double
quotes).

Although, numeric values should not be enclosed in quotes.

For text value For numeric values:

This is correct: This is correct:

SELECT * FROM Persons WHERE SELECT * FROM Persons WHERE Year=1965


FirstName='Tove'
This is wrong:
This is wrong:
SELECT * FROM Persons WHERE Year='1965'
SELECT * FROM Persons WHERE
FirstName=Tove

Operators Allowed in the WHERE Clause

95
With the WHERE clause, the following operators can be used:

Operator Description
= Equal
<> Not equal
> Greater than
< Less than
>= Greater than or equal
<= Less than or equal
BETWEEN Between an inclusive range
LIKE Search for a pattern
IN If you know the exact value you want to return for at least one of the columns

Note: In some versions of SQL the <> operator may be written as !=

The AND & OR operators are used to filter records based on more than one condition.

The AND & OR Operators

The AND operator displays a record if both the first condition and the second condition is true.

The OR operator displays a record if either the first condition or the second condition is true.

AND Operator Example

The "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Now we want to select only the persons with the first name equal to "Tove" AND the last name
equal to "Svendso

96
We use the following SELECT statement: The result-set will look like this

SELECT * FROM Persons


P_Id LastName FirstName Address City
WHERE FirstName='Tove'
2 Svendson Tove Borgvn 23 Sandnes
AND LastName='Svendson'

The result-set will look like this:

OR Operator Example

Now we want to select only the persons with the first name equal to "Tove" OR the first name
equal to "Ola":

We use the following SELECT statement: The result-set will look like this:

SELECT * FROM Persons P_Id LastName FirstName Address City


Timoteivn
WHERE FirstName='Tove' 1 Hansen Ola Sandnes
10

OR FirstName='Ola' 2 Svendson Tove Borgvn 23 Sandnes

Combining AND & OR

You can also combine AND and OR (use parenthesis to form complex expressions).

Now we want to select only the persons with the last name equal to "Svendson" AND the first
name equal to "Tove" OR to "Ola

97
We use the following SELECT statement: The result-set will look like this:

SELECT * FROM Persons P_Id LastName FirstName Address City


WHERE 2 Svendson Tove Borgvn 23 Sandnes

LastName='Svendson'

AND (FirstName='Tove' OR
FirstName='Ola')

The ORDER BY keyword is used to sort the result-set.

The ORDER BY Keyword

The ORDER BY keyword is used to sort the result-set by a specified column.

The ORDER BY keyword sorts the records in ascending order by default. If you want to sort the
records in a descending order, you can use the DESC keyword.

SQL ORDER BY Syntax

SELECT column_name(s)

FROM table_name

ORDER BY column_name(s) ASC|DESC

ORDER BY Example

The "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes

98
3 Pettersen Kari Storgt 20 Stavanger
4 Nilsen Tom Vingvn 23 Stavanger

Now we want to select all the persons from the table above, however, we want to sort the
persons by their last name.

We use the following SELECT statement: The result-set will look like this:

SELECT * P_I LastNam FirstNam


Address City
FROM Persons d e e
Timoteiv
ORDER BY 1 Hansen Ola Sandnes
n 10
LastName
Vingvn Stavang
4 Nilsen Tom
23 er
Stavang
3 Pettersen Kari Storgt 20
er
Borgvn
2 Svendson Tove Sandnes
23

ORDER BY DESC Example

Now we want to select all the persons from the table above, however, we want to sort the
persons descending by their last name.

We use the following SELECT statement:

SELECT * FROM Persons

ORDER BY LastName DESC

99
The result-set will look like this:

P_Id LastName FirstName Address City


2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
4 Nilsen Tom Vingvn 23 Stavanger
1 Hansen Ola Timoteivn 10 Sandnes

The INSERT INTO Statement

The INSERT INTO statement is used to insert a new row in a table.

SQL INSERT INTO Syntax

It is possible to write the INSERT INTO statement in two forms.

The first form doesn't specify the column names where the data will be inserted, only their
values:

INSERT INTO table_name

VALUES (value1, value2, value3,...)

The second form specifies both the column names and the values to be inserted:

INSERT INTO table_name (column1, column2, column3,...)

VALUES (value1, value2, value3,...)

SQL INSERT INTO Example

We have the following "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes

100
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Now we want to insert a new row in the "Persons" table.

We use the following SQL statement:

INSERT INTO Persons

VALUES (4,'Nilsen', 'Johan', 'Bakken 2', 'Stavanger')

The "Persons" table will now look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
4 Nilsen Johan Bakken 2 Stavanger

Insert Data Only in Specified Columns

It is also possible to only add data in specific columns.

The following SQL statement will add a new row, but only add data in the "P_Id", "LastName"
and the "FirstName" column

INSERT INTO Persons (P_Id, LastName, FirstName)

VALUES (5, 'Tjessem', 'Jakob')

The "Persons" table will now look like this:

101
P_Id LastName FirstName Address City
1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
4 Nilsen Johan Bakken 2 Stavanger
5 Tjessem Jakob

The UPDATE Statement

The UPDATE statement is used to update existing records in a table.

SQL UPDATE Syntax

UPDATE table_name

SET column1=value, column2=value2,...

WHERE some_column=some_value

Note: Notice the WHERE clause in the UPDATE syntax. The WHERE clause specifies which
record or records that should be updated. If you omit the WHERE clause, all records will be
updated!

SQL UPDATE Example

The "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
4 Nilsen Johan Bakken 2 Stavanger
5 Tjessem Jakob

102
Now we want to update the person "Tjessem, Jakob" in the "Persons" table.

We use the following SQL statement:

UPDATE Persons

SET Address='Nissestien 67', City='Sandnes'

WHERE LastName='Tjessem' AND FirstName='Jakob'

The "Persons" table will now look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
4 Nilsen Johan Bakken 2 Stavanger
5 Tjessem Jakob Nissestien 67 Sandnes

SQL UPDATE Warning

Be careful when updating records. If we had omitted the WHERE clause in the example above,
like this

UPDATE Persons

SET Address='Nissestien 67', City='Sandnes'

The "Persons" table would have looked like this:

P_Id LastName FirstName Address City


1 Hansen Ola Nissestien 67 Sandnes
2 Svendson Tove Nissestien 67 Sandnes
3 Pettersen Kari Nissestien 67 Sandnes

103
4 Nilsen Johan Nissestien 67 Sandnes
5 Tjessem Jakob Nissestien 67 Sandnes

The DELETE Statement

The DELETE statement is used to delete rows in a table.

SQL DELETE Syntax

DELETE FROM table_name

WHERE some_column=some_value

Note: Notice the WHERE clause in the DELETE syntax. The WHERE clause specifies which
record or records that should be deleted. If you omit the WHERE clause, all records will be
deleted!

SQL DELETE Example

The "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
4 Nilsen Johan Bakken 2 Stavanger
5 Tjessem Jakob Nissestien 67 Sandnes

Now we want to delete the person "Tjessem, Jakob" in the "Persons" table.

We use the following SQL statement:

104
DELETE FROM Persons

WHERE LastName='Tjessem' AND FirstName='Jakob'

The "Persons" table will now look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
4 Nilsen Johan Bakken 2 Stavanger

Delete All Rows

It is possible to delete all rows in a table without deleting the table. This means that the table
structure, attributes, and indexes will be intact:

DELETE FROM table_name

or

DELETE * FROM table_name

Note: Be very careful when deleting records. You cannot undo this statement!

The TOP Clause

The TOP clause is used to specify the number of records to return.

The TOP clause can be very useful on large tables with thousands of records. Returning a large
number of records can impact on performance.

Note: Not all database systems support the TOP clause.

105
SQL Server Syntax

SELECT TOP number|percent column_name(s)

FROM table_name

SQL TOP Example

The "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
4 Nilsen Tom Vingvn 23 Stavanger

Now we want to select only the two first records in the table above.

We use the following SELECT statement:

SELECT TOP 2 * FROM Persons

The result-set will look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes

SQL TOP PERCENT Example

The "Persons" table:

106
P_Id LastName FirstName Address City
1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
4 Nilsen Tom Vingvn 23 Stavanger

Now we want to select only 50% of the records in the table above.

We use the following SELECT statement:

SELECT TOP 50 PERCENT * FROM Persons

The result-set will look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
The LIKE Operator

The LIKE operator is used to search for a specified pattern in a column.

SQL LIKE Syntax

SELECT column_name(s)
FROM table_name
WHERE column_name
LIKE pattern

LIKE Operator Example

The "Persons" table:

107
P_Id LastName FirstName Address City
1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Now we want to select the persons living in a city that starts with "s" from the table above.

We use the following SELECT statement:

SELECT * FROM Persons

WHERE City LIKE 's%'

The "%" sign can be used to define wildcards (missing letters in the pattern) both before and
after the pattern.

The result-set will look like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Next, we want to select the persons living in a city that ends with an "s" from the "Persons"
table.

108
We use the following SELECT statement:

SELECT * FROM Persons WHERE City LIKE '%s'

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
The result-set will look like this:

Next, we want to select the persons living in a city that contains the pattern "tav" from the
"Persons" table.

We use the following SELECT statement: The result-set will look like this:

SELECT * FROM Persons P_Id LastName FirstName Address City


3 Pettersen Kari Storgt 20 Stavanger
WHERE City LIKE '%tav%'

It is also possible to select the persons living in a city that NOT contains the pattern "tav" from
the "Persons" table, by using the NOT keyword.

We use the following SELECT statement: The result-set will look like this:

SELECT * FROM Persons P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
WHERE City NOT LIKE
2 Svendson Tove Borgvn 23 Sandnes
'%tav%'

1.2.2. SQL Wildcards

SQL wildcards can substitute for one or more characters when searching for data in a database.

109
SQL wildcards must be used with the SQL LIKE operator.

With SQL, the following wildcards can be used:

Wildcard Description
% A substitute for zero or more characters
_ A substitute for exactly one character
[charlist] Any single character in charlist
[^charlist] or Any single character not in charlist

[!charlist]

SQL Wildcard Examples

We have the following "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Using the % Wildcard

Now we want to select the persons living in a city that starts with "sa" from the "Persons" table.

We use the following SELECT statement: The result-set will look like this:

SELECT * FROM Persons P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
WHERE City LIKE 'sa%'
2 Svendson Tove Borgvn 23 Sandnes

Next, we want to select the persons living in a city that contains the pattern "nes" from the
"Persons" table.

110
We use the following SELECT statement:
The result-set will look like this:
SELECT * FROM Persons
P_Id LastName FirstName Address City
WHERE City LIKE '%nes%' 1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes

Using the _ Wildcard

Now we want to select the persons with a first name that starts with any character, followed by
"la" from the "Persons" table.

We use the following SELECT statement: The result-set will look like this:

SELECT * FROM Persons P_I LastNam FirstNam


Address City
d e e
WHERE FirstName LIKE '_la'
Timoteiv Sandne
1 Hansen Ola
n 10 s

Next, we want to select the persons with a last name that starts with "S", followed by any
character, followed by "end", followed by any character, followed by "on" from the "Persons"
table.

We use the following SELECT statement: The result-set will look like this:

SELECT * FROM Persons P_Id LastName FirstName Address City


2 Svendson Tove Borgvn 23 Sandnes
WHERE LastName LIKE
'S_end_on'

Using the [charlist] Wildcard

111
Now we want to select the persons with a last name that starts with "b" or "s" or "p" from the
"Persons" table.

We use the following SELECT statement: P_Id LastName FirstName Address City
2 Svendson Tove Borgvn 23 Sandnes
SELECT * FROM Persons
3 Pettersen Kari Storgt 20 Stavanger

WHERE LastName LIKE


'[bsp]%'

The result-set will look like this:

Next, we want to select the persons with a last name that do not start with "b" or "s" or "p" from
the "Persons" table.

We use the following SELECT statement: The result-set will look like this:

SELECT * FROM Persons P_I LastNam FirstNam


Address City
d e e
WHERE LastName LIKE '[!bsp]%'
Timoteiv Sandne
1 Hansen Ola
n 10 s

The IN Operator

The IN operator allows you to specify multiple values in a WHERE clause.

SQL IN Syntax

SELECT column_name(s)

FROM table_name

WHERE column_name IN (value1,value2,...)

IN Operator Example

The "Persons" table:

112
P_Id LastName FirstName Address City
1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Now we want to select the persons with a last name equal to "Hansen" or "Pettersen" from the
table above.

We use the following SELECT statement: The result-set will look like this:

SELECT * FROM Persons P_Id LastName FirstName Address City


Timoteivn
WHERE LastName IN 1 Hansen Ola Sandnes
10
('Hansen','Pettersen')
3 Pettersen Kari Storgt 20 Stavanger

BETWEEN Operator Example

The "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Now we want to select the persons with a last name alphabetically between "Hansen" and
"Pettersen" from the table above.

113
We use the following SELECT statement: The result-set will look like this:

SELECT * FROM Persons P_I LastNam FirstNam


Address City
d e e
WHERE LastName
Timoteiv Sandne
1 Hansen Ola
n 10 s
BETWEEN 'Hansen' AND 'Pettersen'

Note: The BETWEEN operator is treated differently in different databases.

In some databases a person with the LastName of "Hansen" or "Pettersen" will not be listed
(BETWEEN only selects fields that are between and excluding the test values).

In other databases a person with the last name of "Hansen" or "Pettersen" will be listed
(BETWEEN selects fields that are between and including the test values).

And in other databases a person with the last name of "Hansen" will be listed, but "Pettersen"
will not be listed (BETWEEN selects fields between the test values, including the first test value
and excluding the last test value).

Therefore: Check how your database treats the BETWEEN operator.

Example 2

To display the persons outside the range in the previous example, use NOT BETWEEN:

SELECT * FROM Persons

WHERE LastName

NOT BETWEEN 'Hansen' AND 'Pettersen'

The result-set will look like this:

114
P_Id LastName FirstName Address City
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

1.2.3. SQL Alias

You can give a table or a column another name by using an alias. This can be a good thing to do
if you have very long or complex table names or column names.

An alias name could be anything, but usually it is short.

SQL Alias Syntax for Tables

SELECT column_name(s)

FROM table_name

AS alias_name

SQL Alias Syntax for Columns

SELECT column_name AS alias_name

FROM table_name

Alias Example

Assume we have a table called "Persons" and another table called "Product_Orders". We will
give the table aliases of "p" an "po" respectively.

Now we want to list all the orders that "Ola Hansen" is responsible for.

We use the following SELECT statement:

SELECT po.OrderID, p.LastName, p.FirstName

115
FROM Persons AS p,

Product_Orders AS po

WHERE p.LastName='Hansen'

WHERE p.FirstName='Ola'

The same SELECT statement without aliases:

SELECT Product_Orders.OrderID, Persons.LastName, Persons.FirstName

FROM Persons,

Product_Orders

WHERE Persons.LastName='Hansen'

WHERE Persons.FirstName='Ola'
As you'll see from the two SELECT statements above; aliases can make queries easier to both
write and to read.

1.2.4. SQL JOIN

The JOIN keyword is used in an SQL statement to query data from two or more tables, based on
a relationship between certain columns in these tables.

Tables in a database are often related to each other with keys.

A primary key is a column (or a combination of columns) with a unique value for each row.
Each primary key value must be unique within the table. The purpose is to bind data together,
across tables, without repeating all of the data in every table.

Look at the "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes

116
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Note that the "P_Id" column is the primary key in the "Persons" table. This means that no two
rows can have the same P_Id. The P_Id distinguishes two persons even if they have the same
name.

Next, we have the "Orders" table:

O_Id OrderNo P_Id


1 77895 3
2 44678 3
3 22456 1
4 24562 1
5 34764 15

Note that the "O_Id" column is the primary key in the "Orders" table and that the "P_Id" column
refers to the persons in the "Persons" table without using their names.

Notice that the relationship between the two tables above is the "P_Id" column.

Different SQL JOINs

Before we continue with examples, we will list the types of JOIN you can use, and the
differences between them.

 JOIN: Return rows when there is at least one match in both tables
 LEFT JOIN: Return all rows from the left table, even if there are no matches in the
right table
 RIGHT JOIN: Return all rows from the right table, even if there are no matches in the
left table
 FULL JOIN: Return rows when there is a match in one of the tables

117
SQL INNER JOIN Keyword

The INNER JOIN keyword return rows when there is at least one match in both tables.

SQL INNER JOIN Syntax

SELECT column_name(s)

FROM table_name1

INNER JOIN table_name2

ON table_name1.column_name=table_name2.column_name

PS: INNER JOIN is the same as JOIN.

SQL INNER JOIN Example

The "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

The "Orders" table:

O_Id OrderNo P_Id


1 77895 3
2 44678 3
3 22456 1
4 24562 1
5 34764 15

Now we want to list all the persons with any orders.

118
We use the following SELECT statement:

SELECT Persons.LastName, Persons.FirstName, Orders.OrderNo

FROM Persons

INNER JOIN Orders

ON Persons.P_Id=Orders.P_Id

ORDER BY Persons.LastName

The result-set will look like this:

LastName FirstName OrderNo


Hansen Ola 22456
Hansen Ola 24562
Pettersen Kari 77895
Pettersen Kari 44678
The INNER JOIN keyword return rows when there is at least one match in both tables. If there
are rows in "Persons" that do not have matches in "Orders", those rows will NOT be listed.

SQL LEFT JOIN Keyword

The LEFT JOIN keyword returns all rows from the left table (table_name1), even if there are no
matches in the right table (table_name2).

SQL LEFT JOIN Syntax

SELECT column_name(s)

FROM table_name1

LEFT JOIN table_name2

ON table_name1.column_name=table_name2.column_name

119
PS: In some databases LEFT JOIN is called LEFT OUTER JOIN.

SQL LEFT JOIN Example

The "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

The "Orders" table:

O_Id OrderNo P_Id


1 77895 3
2 44678 3
3 22456 1
4 24562 1
5 34764 15

Now we want to list all the persons and their orders - if any, from the tables above.

We use the following SELECT statement:


The result-set will look like this:
SELECT Persons.LastName,
Persons.FirstName, Orders.OrderNo LastName FirstName OrderNo
Hansen Ola 22456
FROM Persons Hansen Ola 24562
Pettersen Kari 77895
LEFT JOIN Orders
Pettersen Kari 44678
ON Persons.P_Id=Orders.P_Id Svendson Tove

ORDER BY Persons.LastName

120
The LEFT JOIN keyword returns all the rows from the left table (Persons), even if there are no
matches in the right table (Orders).

SQL RIGHT JOIN Keyword

The RIGHT JOIN keyword Return all rows from the right table (table_name2), even if there are
no matches in the left table (table_name1).

SQL RIGHT JOIN Syntax

SELECT column_name(s)

FROM table_name1

RIGHT JOIN table_name2

ON table_name1.column_name=table_name2.column_name

PS: In some databases RIGHT JOIN is called RIGHT OUTER JOIN.

SQL RIGHT JOIN Example

The "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

The "Orders" table:

O_Id OrderNo P_Id


1 77895 3
2 44678 3
3 22456 1

121
4 24562 1
5 34764 15

Now we want to list all the orders with containing persons - if any, from the tables above.

We use the following SELECT statement: The result-set will look like this:

SELECT Persons.LastName, LastName FirstName OrderNo


Persons.FirstName, Orders.OrderNo Hansen Ola 22456
Hansen Ola 24562
FROM Persons
Pettersen Kari 77895
RIGHT JOIN Orders Pettersen Kari 44678
34764
ON Persons.P_Id=Orders.P_Id

ORDER BY Persons.LastName

The RIGHT JOIN keyword returns all the rows from the right table (Orders), even if there are no
matches in the left table (Persons).

SQL FULL JOIN Keyword

The FULL JOIN keyword return rows when there is a match in one of the tables.

SQL FULL JOIN Syntax

SELECT column_name(s)

FROM table_name1

FULL JOIN table_name2

ON table_name1.column_name=table_name2.column_name

122
SQL FULL JOIN Example

The "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

The "Orders" table:

O_Id OrderNo P_Id


1 77895 3
2 44678 3
3 22456 1
4 24562 1
5 34764 15

Now we want to list all the persons and their orders, and all the orders with their persons.

We use the following SELECT statement: The result-set will look like this:

SELECT Persons.LastName, LastName FirstName OrderNo


Persons.FirstName, Orders.OrderNo Hansen Ola 22456
Hansen Ola 24562
FROM Persons
Pettersen Kari 77895
FULL JOIN Orders Pettersen Kari 44678
Svendson Tove
ON Persons.P_Id=Orders.P_Id
34764

ORDER BY Persons.LastName

123
The FULL JOIN keyword returns all the rows from the left table (Persons), and all the rows from
the right table (Orders). If there are rows in "Persons" that do not have matches in "Orders” or if
there are rows in "Orders" that do not have matches in "Persons", those rows will be listed as
well.

The SQL UNION Operator

The UNION operator is used to combine the result-set of two or more SELECT statements.

Notice that each SELECT statement within the UNION must have the same number of columns.
The columns must also have similar data types. Also, the columns in each SELECT statement
must be in the same order.

SQL UNION Syntax

SELECT column_name(s) FROM table_name1

UNION

SELECT column_name(s) FROM table_name2

Note: The UNION operator selects only distinct values by default. To allow duplicate values,
use UNION ALL.

SQL UNION ALL Syntax

SELECT column_name(s) FROM table_name1

UNION ALL

SELECT column_name(s) FROM table_name2

PS: The column names in the result-set of a UNION are always equal to the column names in
the first SELECT statement in the UNION.

124
SQL UNION Example

Look at the following tables:

"Employees_Norway":

E_ID E_Name
01 Hansen, Ola
02 Svendson, Tove
03 Svendson, Stephen
04 Pettersen, Kari

"Employees_USA":

E_ID E_Name
01 Turner, Sally
02 Kent, Clark
03 Svendson, Stephen
04 Scott, Stephen

Now we want to list all the different employees in Norway and USA.

We use the following SELECT statement:


The result-set will look like this:
SELECT E_Name FROM
Employees_Norway E_Name
Hansen, Ola
UNION Svendson, Tove
Svendson, Stephen
SELECT E_Name FROM
Employees_USA Pettersen, Kari
Turner, Sally
Kent, Clark
Scott, Stephen

125
Note: This command cannot be used to list all employees in Norway and USA. In the example
above we have two employees with equal names, and only one of them will be listed. The
UNION command selects only distinct values.

SQL UNION ALL Example

Now we want to list all employees in Norway and USA:

SELECT E_Name FROM Employees_Norway

UNION ALL

SELECT E_Name FROM Employees_USA

Result

E_Name
Hansen, Ola
Svendson, Tove
Svendson, Stephen
Pettersen, Kari
Turner, Sally
Kent, Clark
Svendson, Stephen
Scott, Stephen

The SQL SELECT INTO Statement

The SELECT INTO statement selects data from one table and inserts it into a different table.

The SELECT INTO statement is most often used to create backup copies of tables.

SQL SELECT INTO Syntax

126
We can select all columns into the new Or we can select only the columns we want
table: into the new table:

SELECT * SELECT column_name(s)

INTO new_table_name [IN INTO new_table_name [IN


externaldatabase] externaldatabase]

FROM old_tablename FROM old_tablename

SQL SELECT INTO Example

Make a Backup Copy - Now we want to make an exact copy of the data in our "Persons" table.

We use the following SQL statement: We can also use the IN clause to copy the
table into another database:
SELECT *
SELECT *
INTO Persons_Backup
INTO Persons_Backup IN
FROM Persons 'Backup.mdb'

FROM Persons

127
We can also copy only a few fields into the new table:

SELECT LastName,FirstName

INTO Persons_Backup

FROM Persons

SQL SELECT INTO - With a WHERE Clause

We can also add a WHERE clause.

The following SQL statement creates a "Persons_Backup" table with only the persons who lives
in the city "Sandnes":

SELECT LastName,Firstname

INTO Persons_Backup

FROM Persons

WHERE City='Sandnes'

SQL SELECT INTO - Joined Tables

Selecting data from more than one table is also possible.

The following example creates a "Persons_Order_Backup" table contains data from the two
tables "Persons" and "Orders":

128
SELECT Persons.LastName,Orders.OrderNo

INTO Persons_Order_Backup

FROM Persons

INNER JOIN Orders

ON Persons.P_Id=Orders.P_Id

The CREATE DATABASE Statement

The CREATE DATABASE statement is used to create a database.

SQL CREATE DATABASE Syntax

CREATE DATABASE database_name

CREATE DATABASE Example

Now we want to create a database called "my_db".

We use the following CREATE DATABASE statement:

CREATE DATABASE my_db

Database tables can be added with the CREATE TABLE statement.

The CREATE TABLE Statement

The CREATE TABLE statement is used to create a table in a database.

129
SQL CREATE TABLE Syntax

CREATE TABLE table_name

column_name1 data_type,

column_name2 data_type,

column_name3 data_type,)

The data type specifies what type of data the column can hold. For a complete reference of all
the data types available in MS Access, MySQL, and SQL Server, go to our complete

CREATE TABLE Example

Now we want to create a table called "Persons" that contains five columns: P_Id, LastName,
FirstName, Address, and City.

We use the following CREATE TABLE statement:

CREATE TABLE Persons (

P_Id int,

LastName varchar(255),

FirstName varchar(255),

Address varchar(255),

City varchar(255) );

130
The P_Id column is of type int and will hold a number. The LastName, FirstName, Address, and
City columns are of type varchar with a maximum length of 255 characters.

The empty "Persons" table will now look like this:

P_Id LastName FirstName Address City

The empty table can be filled with data with the INSERT INTO statement.

SQL Constraints

Constraints are used to limit the type of data that can go into a table.

Constraints can be specified when a table is created (with the CREATE TABLE statement) or
after the table is created (with the ALTER TABLE statement).

We will focus on the following constraints:

 NOT NULL
 UNIQUE
 PRIMARY KEY
 FOREIGN KEY
 CHECK
 DEFAULT

SQL NOT NULL Constraint

The NOT NULL constraint enforces a column to NOT accept NULL values.

The NOT NULL constraint enforces a field to always contain a value. This means that you
cannot insert a new record, or update a record without adding a value to this field.

The following SQL enforces the "P_Id" column and the "LastName" column to not accept
NULL values:

131
CREATE TABLE Persons (

P_Id int NOT NULL,

LastName varchar(255) NOT NULL,

FirstName varchar(255),

Address varchar(255),

City varchar(255) );

SQL UNIQUE Constraint

The UNIQUE constraint uniquely identifies each record in a database table.

The UNIQUE and PRIMARY KEY constraints both provide a guarantee for uniqueness for a
column or set of columns.

A PRIMARY KEY constraint automatically has a UNIQUE constraint defined on it.

Note that you can have have many UNIQUE constraints per table, but only one PRIMARY KEY
constraint per table.

SQL UNIQUE Constraint on CREATE TABLE

The following SQL creates a UNIQUE constraint on the "P_Id" column when the "Persons"
table is created:

SQL Server:

CREATE TABLE Persons

P_Id int NOT NULL UNIQUE,

132
LastName varchar(255) NOT NULL,

FirstName varchar(255),

Address varchar(255),

City varchar(255)

To allow naming of a UNIQUE constraint, and for defining a UNIQUE constraint on multiple
columns, use the following SQL syntax:

SQL Server:

CREATE TABLE Persons

P_Id int NOT NULL,

LastName varchar(255) NOT NULL,

FirstName varchar(255),

Address varchar(255),

City varchar(255),

CONSTRAINT uc_PersonID UNIQUE (P_Id,LastName)

133
SQL UNIQUE Constraint on ALTER TABLE

To create a UNIQUE constraint on the "P_Id" column when the table is already created, use the
following SQL:

SQL Server

ALTER TABLE Persons

ADD UNIQUE (P_Id)

To allow naming of a UNIQUE constraint, and for defining a UNIQUE constraint on multiple
columns, use the following SQL syntax:

SQL Server

ALTER TABLE Persons

ADD CONSTRAINT uc_PersonID UNIQUE (P_Id,LastName)

To DROP a UNIQUE Constraint

To drop a UNIQUE constraint, use the following SQL:

SQL Server:

ALTER TABLE Persons

DROP CONSTRAINT uc_PersonID

SQL PRIMARY KEY Constraint

The PRIMARY KEY constraint uniquely identifies each record in a database table.

Primary keys must contain unique values.

A primary key column cannot contain NULL values.

134
Each table should have a primary key, and each table can have only one primary key.

SQL PRIMARY KEY Constraint on CREATE TABLE

The following SQL creates a PRIMARY KEY on the "P_Id" column when the "Persons" table is
created:

SQL Server

CREATE TABLE Persons

P_Id int NOT NULL PRIMARY KEY,

LastName varchar(255) NOT NULL,

FirstName varchar(255),

Address varchar(255),

City varchar(255)

To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint
on multiple columns, use the following SQL syntax:

SQL Server

CREATE TABLE Persons

P_Id int NOT NULL,

LastName varchar(255) NOT NULL,

135
FirstName varchar(255),

Address varchar(255),

City varchar(255),

CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName)

SQL PRIMARY KEY Constraint on ALTER TABLE

To create a PRIMARY KEY constraint on the "P_Id" column when the table is already created,
use the following SQL:

SQL Server

ALTER TABLE Persons

ADD PRIMARY KEY (P_Id)

To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint
on multiple columns, use the following SQL syntax:

SQL Server

ALTER TABLE Persons

ADD CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName)

Note: If you use the ALTER TABLE statement to add a primary key, the primary key column(s)
must already have been declared to not contain NULL values (when the table was first created).

To DROP a PRIMARY KEY Constraint

To drop a PRIMARY KEY constraint, use the following SQL:

SQL Server

136
ALTER TABLE Persons

DROP CONSTRAINT pk_PersonID

SQL FOREIGN KEY Constraint

A FOREIGN KEY in one table points to a PRIMARY KEY in another table.

Let's illustrate the foreign key with an example. Look at the following two tables:

The "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

The "Orders" table:

O_Id OrderNo P_Id


1 77895 3
2 44678 3
3 22456 2
4 24562 1

Note that the "P_Id" column in the "Orders" table points to the "P_Id" column in the "Persons"
table.

The "P_Id" column in the "Persons" table is the PRIMARY KEY in the "Persons" table.

The "P_Id" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.

The FOREIGN KEY constraint is used to prevent actions that would destroy link between tables.

The FOREIGN KEY constraint also prevents that invalid data is inserted into the foreign key
column, because it has to be one of the values contained in the table it points to.

137
SQL FOREIGN KEY Constraint on CREATE TABLE

The following SQL creates a FOREIGN KEY on the "P_Id" column when the "Orders" table is
created:

SQL Server

CREATE TABLE Orders

O_Id int NOT NULL PRIMARY KEY,

OrderNo int NOT NULL,

P_Id int FOREIGN KEY REFERENCES Persons(P_Id)

To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint
on multiple columns, use the following SQL syntax:

SQL Server

CREATE TABLE Orders

O_Id int NOT NULL,

OrderNo int NOT NULL,

P_Id int,

PRIMARY KEY (O_Id),

CONSTRAINT fk_PerOrders FOREIGN KEY (P_Id)

138
REFERENCES Persons(P_Id)

SQL FOREIGN KEY Constraint on ALTER TABLE

To create a FOREIGN KEY constraint on the "P_Id" column when the "Orders" table is already
created, use the following SQL:

SQL Server

ALTER TABLE Orders

ADD FOREIGN KEY (P_Id)

REFERENCES Persons(P_Id)

To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint
on multiple columns, use the following SQL syntax:

SQL Server

ALTER TABLE Orders

ADD CONSTRAINT fk_PerOrders

FOREIGN KEY (P_Id)

REFERENCES Persons(P_Id)

To DROP a FOREIGN KEY Constraint

To drop a FOREIGN KEY constraint, use the following SQL:

139
SQL Server

ALTER TABLE Orders

DROP CONSTRAINT fk_PerOrders

SQL CHECK Constraint

The CHECK constraint is used to limit the value range that can be placed in a column.

If you define a CHECK constraint on a single column it allows only certain values for this
column.

If you define a CHECK constraint on a table it can limit the values in certain columns based on
values in other columns in the row.

SQL CHECK Constraint on CREATE TABLE

The following SQL creates a CHECK constraint on the "P_Id" column when the "Persons" table
is created. The CHECK constraint specifies that the column "P_Id" must only include integers
greater than 0.

SQL Server:

CREATE TABLE Persons (

P_Id int NOT NULL CHECK (P_Id>0),

LastName varchar(255) NOT NULL,

FirstName varchar(255),

Address varchar(255),

City varchar(255)

140
To allow naming of a CHECK constraint, and for defining a CHECK constraint on multiple
columns, use the following SQL syntax:

SQL Server:

CREATE TABLE Persons (

P_Id int NOT NULL,

LastName varchar(255) NOT NULL,

FirstName varchar(255),

Address varchar(255),

City varchar(255),

CONSTRAINT chk_Person CHECK (P_Id>0 AND City='Sandnes')

SQL CHECK Constraint on ALTER TABLE

To create a CHECK constraint on the "P_Id" column when the table is already created, use the
following SQL:

SQL Server

ALTER TABLE Persons

ADD CHECK (P_Id>0)

To allow naming of a CHECK constraint, and for defining a CHECK constraint on multiple
columns, use the following SQL syntax:

SQL Server:

141
ALTER TABLE Persons

ADD CONSTRAINT chk_Person CHECK (P_Id>0 AND City='Sandnes')

To DROP a CHECK Constraint

To drop a CHECK constraint, use the following SQL:

SQL Server:

ALTER TABLE Persons

DROP CONSTRAINT chk_Person

SQL DEFAULT Constraint

The DEFAULT constraint is used to insert a default value into a column.

The default value will be added to all new records, if no other value is specified.

SQL DEFAULT Constraint on CREATE TABLE

The following SQL creates a DEFAULT constraint on the "City" column when the "Persons"
table is created:

SQL Server:

CREATE TABLE Persons (

P_Id int NOT NULL,

LastName varchar(255) NOT NULL,

FirstName varchar(255),

Address varchar(255),

142
City varchar(255) DEFAULT 'Sandnes'

The DEFAULT constraint can also be used to insert system values, by using functions like
GETDATE():

CREATE TABLE Orders (

O_Id int NOT NULL,

OrderNo int NOT NULL,

P_Id int,

OrderDate date DEFAULT GETDATE()

SQL DEFAULT Constraint on ALTER TABLE

To create a DEFAULT constraint on the "City" column when the table is already created, use the
following SQL:

SQL Server:

ALTER TABLE Persons

ALTER COLUMN City SET DEFAULT 'SANDNES'

To DROP a DEFAULT Constraint

To drop a DEFAULT constraint, use the following SQL:

SQL Server:

143
ALTER TABLE Persons

ALTER COLUMN City DROP DEFAULT

The CREATE INDEX statement is used to create indexes in tables.

Indexes allow the database application to find data fast; without reading the whole table.

Indexes

An index can be created in a table to find data more quickly and efficiently.

The users cannot see the indexes, they are just used to speed up searches/queries.

Note: Updating a table with indexes takes more time than updating a table without (because the
indexes also need an update). So you should only create indexes on columns (and tables) that
will be frequently searched against.

SQL CREATE INDEX Syntax

Creates an index on a table. Duplicate values are allowed:

CREATE INDEX index_name

ON table_name (column_name)

SQL CREATE UNIQUE INDEX Syntax

Creates a unique index on a table. Duplicate values are not allowed:

CREATE UNIQUE INDEX index_name

ON table_name (column_name)

Note: The syntax for creating indexes varies amongst different databases. Therefore: Check the
syntax for creating indexes in your database.

CREATE INDEX Example

144
The SQL statement below creates an index named "PIndex" on the "LastName" column in the
"Persons" table:

CREATE INDEX PIndex

ON Persons (LastName)

If you want to create an index on a combination of columns, you can list the column names
within the parentheses, separated by commas:

CREATE INDEX PIndex

ON Persons (LastName, FirstName)

The DROP INDEX Statement

The DROP INDEX statement is used to delete an index in a table.

DROP INDEX Syntax for MS Access:

DROP INDEX index_name ON table_name

DROP INDEX Syntax for MS SQL Server:

DROP INDEX table_name.index_name

DROP INDEX Syntax for DB2/Oracle:

DROP INDEX index_name

DROP INDEX Syntax for MySQL:

ALTER TABLE table_name DROP INDEX index_name

The DROP TABLE Statement

145
The DROP TABLE statement is used to delete a table.

DROP TABLE table_name

The DROP DATABASE Statement

The DROP DATABASE statement is used to delete a database.

DROP DATABASE database_name

The TRUNCATE TABLE Statement

What if we only want to delete the data inside the table, and not the table itself?

Then, use the TRUNCATE TABLE statement:

TRUNCATE TABLE table_name

The ALTER TABLE Statement

The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.

SQL ALTER TABLE Syntax

To add a column in a table, use the following syntax:

ALTER TABLE table_name

ADD column_name datatype

To delete a column in a table, use the following syntax (notice that some database systems don't
allow deleting a column):

146
ALTER TABLE table_name

DROP COLUMN column_name

To change the data type of a column in a table, use the following syntax:

ALTER TABLE table_name

ALTER COLUMN column_name datatype

SQL ALTER TABLE Example

Look at the "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Now we want to add a column named "DateOfBirth" in the "Persons" table.

We use the following SQL statement:

ALTER TABLE Persons

ADD DateOfBirth date

Notice that the new column, "DateOfBirth", is of type date and is going to hold a date. The data
type specifies what type of data the column can hold. For a complete reference of all the data
types available in MS Access, MySQL, and SQL Server, go to our complete

The "Persons" table will now like this:

P_Id LastName FirstName Address City DateOfBirth

147
1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Change Data Type Example

Now we want to change the data type of the column named "DateOfBirth" in the "Persons" table.
We use the following SQL statement:

ALTER TABLE Persons

ALTER COLUMN DateOfBirth year

Notice that the "DateOfBirth" column is now of type year and is going to hold a year in a two-
digit or four-digit format.

DROP COLUMN Example

Next, we want to delete the column named "DateOfBirth" in the "Persons" table.

We use the following SQL statement:

ALTER TABLE Persons

DROP COLUMN DateOfBirth

The "Persons" table will now like this:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

148
AUTO INCREMENT a Field

Very often we would like the value of the primary key field to be created automatically every
time a new record is inserted.

We would like to create an auto-increment field in a table.

Syntax for SQL Server

The following SQL statement defines the "P_Id" column to be an auto-increment primary key
field in the "Persons" table:

CREATE TABLE Persons

P_Id int PRIMARY KEY IDENTITY,

LastName varchar(255) NOT NULL,

FirstName varchar(255),

Address varchar(255),

City varchar(255)

The MS SQL Server uses the IDENTITY keyword to perform an auto-increment feature.

By default, the starting value for IDENTITY is 1, and it will increment by 1 for each new record.

To specify that the "P_Id" column should start at value 10 and increment by 5, change the
identity to IDENTITY(10,5).

To insert a new record into the "Persons" table, we will not have to specify a value for the "P_Id"
column (a unique value will be added automatically):

149
INSERT INTO Persons (FirstName,LastName)

VALUES ('Lars','Monsen')

The SQL statement above would insert a new record into the "Persons" table. The "P_Id"
column would be assigned a unique value. The "FirstName" column would be set to "Lars" and
the "LastName" column would be set to "Monsen".

SQL CREATE VIEW Statement

In SQL, a view is a virtual table based on the result-set of an SQL statement.

A view contains rows and columns, just like a real table. The fields in a view are fields from one
or more real tables in the database.

You can add SQL functions, WHERE, and JOIN statements to a view and present the data as if
the data were coming from one single table.

SQL CREATE VIEW Syntax

CREATE VIEW view_name AS

SELECT column_name(s)

FROM table_name

WHERE condition

Note: A view always shows up-to-date data! The database engine recreates the data, using the
view's SQL statement, every time a user queries a view.

SQL CREATE VIEW Examples

If you have the Northwind database you can see that it has several views installed by default.

The view "Current Product List" lists all active products (products that are not discontinued)
from the "Products" table. The view is created with the following SQL:

150
CREATE VIEW [Current Product List] AS

SELECT ProductID,ProductName

FROM Products

WHERE Discontinued=No

We can query the view above as follows:

SELECT * FROM [Current Product List]

Another view in the Northwind sample database selects every product in the "Products" table
with a unit price higher than the average unit price:

CREATE VIEW [Products Above Average Price] AS

SELECT ProductName,UnitPrice

FROM Products

WHERE UnitPrice>(SELECT AVG(UnitPrice) FROM Products)

We can query the view above as follows:

SELECT * FROM [Products Above Average Price]

Another view in the Northwind database calculates the total sale for each category in 1997. Note
that this view selects its data from another view called "Product Sales for 1997":

CREATE VIEW [Category Sales For 1997] AS

SELECT DISTINCT CategoryName,Sum(ProductSales) AS CategorySales

151
FROM [Product Sales for 1997]

GROUP BY CategoryName

We can query the view above as follows:

SELECT * FROM [Category Sales For 1997]

We can also add a condition to the query. Now we want to see the total sale only for the category
"Beverages":

SELECT * FROM [Category Sales For 1997]

WHERE CategoryName='Beverages'

SQL Updating a View

You can update a view by using the following syntax:

SQL CREATE OR REPLACE VIEW Syntax

CREATE OR REPLACE VIEW view_name AS

SELECT column_name(s)

FROM table_name

WHERE condition

Now we want to add the "Category" column to the "Current Product List" view. We will update
the view with the following SQL:

CREATE VIEW [Current Product List] AS

152
SELECT ProductID,ProductName,Category

FROM Products

WHERE Discontinued=No

SQL Dropping a View

You can delete a view with the DROP VIEW command.

SQL DROP VIEW Syntax

DROP VIEW view_name

SQL Dates

The most difficult part when working with dates is to be sure that the format of the date you are
trying to insert, matches the format of the date column in the database.

As long as your data contains only the date portion, your queries will work as expected.
However, if a time portion is involved, it gets complicated.

Before talking about the complications of querying for dates, we will look at the most important
built-in functions for working with dates.

SQL Server Date Functions

The following table lists the most important built-in date functions in SQL Server:

Function Description
GETDATE() Returns the current date and time
DATEPART() Returns a single part of a date/time
DATEADD() Adds or subtracts a specified time interval from a date
DATEDIFF() Returns the time between two dates

153
CONVERT() Displays date/time data in different formats

1.2.5 SQL Date Data Types

SQL Server comes with the following data types for storing a date or a date/time value in the
database:

 DATE - format YYYY-MM-DD


 DATETIME - format: YYYY-MM-DD HH:MM:SS
 SMALLDATETIME - format: YYYY-MM-DD HH:MM:SS
 TIMESTAMP - format: a unique number

Note: The date types are chosen for a column when you create a new table in your database!

SQL Working with Dates

You can compare two dates easily if there is no time component involved!

Assume we have the following "Orders" table:

OrderId ProductName OrderDate


1 Geitost 2008-11-11
2 Camembert Pierrot 2008-11-09
3 Mozzarella di Giovanni 2008-11-11
4 Mascarpone Fabioli 2008-10-29

Now we want to select the records with an OrderDate of "2008-11-11" from the table above.

We use the following SELECT statement:

SELECT * FROM Orders WHERE OrderDate='2008-11-11'

The result-set will look like this:

154
OrderId ProductName OrderDate
1 Geitost 2008-11-11
3 Mozzarella di Giovanni 2008-11-11

Now, assume that the "Orders" table looks like this (notice the time component in the
"OrderDate" column):

OrderId ProductName OrderDate


1 Geitost 2008-11-11 13:23:44
2 Camembert Pierrot 2008-11-09 15:45:21
3 Mozzarella di Giovanni 2008-11-11 11:12:01
4 Mascarpone Fabioli 2008-10-29 14:56:59

If we use the same SELECT statement as above:

SELECT * FROM Orders WHERE OrderDate='2008-11-11'

we will get no result! This is because the query is looking only for dates with no time portion.

Tip: If you want to keep your queries simple and easy to maintain, do not allow time
components in your dates!

SQL NULL Values

If a column in a table is optional, we can insert a new record or update an existing record without
adding a value to this column. This means that the field will be saved with a NULL value.

NULL values are treated differently from other values. NULL is used as a placeholder for
unknown or inapplicable values.

Note: It is not possible to compare NULL and 0; they are not equivalent.

SQL Working with NULL Values

Look at the following "Persons" table:

155
P_Id LastName FirstName Address City
1 Hansen Ola Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Stavanger

Suppose that the "Address" column in the "Persons" table is optional. This means that if we
insert a record with no value for the "Address" column, the "Address" column will be saved with
a NULL value.

How can we test for NULL values?

It is not possible to test for NULL values with comparison operators, such as =, <, or <>.

We will have to use the IS NULL and IS NOT NULL operators instead.

SQL IS NULL

How do we select only the records with NULL values in the "Address" column?

We will have to use the IS NULL operator:

SELECT LastName,FirstName,Address FROM Persons

WHERE Address IS NULL

The result-set will look like this:

LastName FirstName Address


Hansen Ola
Pettersen Kari

Tip: Always use IS NULL to look for NULL values.

SQL IS NOT NULL

How do we select only the records with no NULL values in the "Address" column?

156
We will have to use the IS NOT NULL operator:

SELECT LastName,FirstName,Address FROM Persons

WHERE Address IS NOT NULL

The result-set will look like this:

LastName FirstName Address


Svendson Tove Borgvn 23

In the next portion, we will look at the ISNULL(), NVL(), IFNULL() and COALESCE()
functions.

SQL ISNULL(), NVL(), IFNULL() and COALESCE() Functions

Look at the following "Products" table:

P_Id ProductName UnitPrice UnitsInStock UnitsOnOrder


1 Jarlsberg 10.45 16 15
2 Mascarpone 32.56 23
3 Gorgonzola 15.67 9 20

Suppose that the "UnitsOnOrder" column is optional, and may contain NULL values.

We have the following SELECT statement:

SELECT ProductName,UnitPrice*(UnitsInStock+UnitsOnOrder)

FROM Products

In the example above, if any of the "UnitsOnOrder" values are NULL, the result is NULL.

Microsoft's ISNULL() function is used to specify how we want to treat NULL values.

157
The NVL(), IFNULL(), and COALESCE() functions can also be used to achieve the same result.

In this case we want NULL values to be zero.

Below, if "UnitsOnOrder" is NULL it will not harm the calculation, because ISNULL() returns a
zero if the value is NULL:

E.g.

SELECT ProductName,UnitPrice*(UnitsInStock+ISNULL(UnitsOnOrder,0))

FROM Products

Data types and ranges for Microsoft Access, MySQL and SQL Server.

SQL Server Data Types

Character strings:

Data type Description Storage


char(n) Fixed-length character string. Maximum 8,000 characters n
varchar(n) Variable-length character string. Maximum 8,000 characters
varchar(max) Variable-length character string. Maximum 1,073,741,824
characters
text Variable-length character string. Maximum 2GB of text data

Unicode strings:

Data type Description Storage


nchar(n) Fixed-length Unicode data. Maximum 4,000 characters
nvarchar(n) Variable-length Unicode data. Maximum 4,000 characters
nvarchar(max) Variable-length Unicode data. Maximum 536,870,912 characters
ntext Variable-length Unicode data. Maximum 2GB of text data

Binary types:

158
Data type Description Storage
bit Allows 0, 1, or NULL
binary(n) Fixed-length binary data. Maximum 8,000 bytes
varbinary(n) Variable-length binary data. Maximum 8,000 bytes
varbinary(max) Variable-length binary data. Maximum 2GB
image Variable-length binary data. Maximum 2GB

Number types:

Data type Description Storage


tinyint Allows whole numbers from 0 to 255 1 byte
smallint Allows whole numbers between -32,768 and 32,767 2 bytes
int Allows whole numbers between -2,147,483,648 and 2,147,483,647 4 bytes
bigint Allows whole numbers between -9,223,372,036,854,775,808 and 8 bytes
9,223,372,036,854,775,807
decimal(p,s) Fixed precision and scale numbers. 5-17
bytes
Allows numbers from -10^38 +1 to 10^38 –1.

The p parameter indicates the maximum total number of digits that


can be stored (both to the left and to the right of the decimal point).
p must be a value from 1 to 38. Default is 18.

The s parameter indicates the maximum number of digits stored to


the right of the decimal point. s must be a value from 0 to p.
Default value is 0
numeric(p,s) Fixed precision and scale numbers. 5-17
bytes
Allows numbers from -10^38 +1 to 10^38 –1.

The p parameter indicates the maximum total number of digits that


can be stored (both to the left and to the right of the decimal point).
p must be a value from 1 to 38. Default is 18.

159
The s parameter indicates the maximum number of digits stored to
the right of the decimal point. s must be a value from 0 to p.
Default value is 0
smallmoney Monetary data from -214,748.3648 to 214,748.3647 4 bytes
money Monetary data from -922,337,203,685,477.5808 to 8 bytes
922,337,203,685,477.5807
float(n) Floating precision number data from -1.79E + 308 to 1.79E + 308. 4 or 8
bytes
The n parameter indicates whether the field should hold 4 or 8
bytes. float(24) holds a 4-byte field and float(53) holds an 8-byte
field. Default value of n is 53.
real Floating precision number data from -3.40E + 38 to 3.40E + 38 4 bytes

Date types:

Data type Description Storage


datetime From January 1, 1753 to December 31, 9999 with an accuracy of 8 bytes
3.33 milliseconds
datetime2 From January 1, 0001 and December 31, 9999 with an accuracy of 6-8 bytes
100 nanoseconds
smalldatetime From January 1, 1900 to June 6, 2079 with an accuracy of 1 minute 4 bytes
date Store a date only. From January 1, 0001 to December 31, 9999 3 bytes
time Store a time only to an accuracy of 100 nanoseconds 3-5 bytes
datetimeoffset The same as datetime2 with the addition of a time zone offset 8-10
bytes
timestamp Stores a unique number that gets updated every time a row gets
created or modified. The timestamp value is based upon an internal
clock and does not correspond to real time. Each table may have
only one timestamp variable

Other data types:

Data type Description

160
sql_variant Stores up to 8,000 bytes of data of various data types, except text, ntext, and
timestamp
uniqueidentifier Stores a globally unique identifier (GUID)
xml Stores XML formatted data. Maximum 2GB
cursor Stores a reference to a cursor used for database operations
table Stores a result-set for later processing

1.3. SQL Aggregate Functions

SQL aggregate functions return a single value, calculated from values in a column.

Useful aggregate functions:

 AVG() - Returns the average value


 COUNT() - Returns the number of rows
 FIRST() - Returns the first value
 LAST() - Returns the last value
 MAX() - Returns the largest value
 MIN() - Returns the smallest value
 SUM() - Returns the sum

1.4 SQL Scalar functions

SQL scalar functions return a single value, based on the input value.

Useful scalar functions:

 UCASE() - Converts a field to upper case


 LCASE() - Converts a field to lower case
 MID() - Extract characters from a text field
 LEN() - Returns the length of a text field
 ROUND() - Rounds a numeric field to the number of decimals specified
 NOW() - Returns the current system date and time
 FORMAT() - Formats how a field is to be displayed

161
Tip: The aggregate functions and the scalar functions will be explained in details in the next
section.

The AVG() Function

The AVG() function returns the average value of a numeric column.

SQL AVG() Syntax

SELECT AVG(column_name) FROM table_name

SQL AVG() Example

We have the following "Orders" table:

O_Id OrderDate OrderPrice Customer


1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen

Now we want to find the average value of the "OrderPrice" fields.

We use the following SQL statement:

SELECT AVG(OrderPrice) AS OrderAverage FROM Orders

The result-set will look like this:

OrderAverage
950

162
Now we want to find the customers that have an OrderPrice value higher then the average
OrderPrice value.

We use the following SQL statement:

SELECT Customer FROM Orders

WHERE OrderPrice>(SELECT AVG(OrderPrice) FROM Orders)

The result-set will look like this:

Customer
Hansen
Nilsen
Jensen

SQL COUNT() Function

SQL COUNT(column_name) Syntax

The COUNT(column_name) function returns the number of values (NULL values will not be
counted) of the specified column:

SELECT COUNT(column_name) FROM table_name

SQL COUNT(*) Syntax

The COUNT(*) function returns the number of records in a table:

SELECT COUNT(*) FROM table_name

163
SQL COUNT(DISTINCT column_name) Syntax

The COUNT(DISTINCT column_name) function returns the number of distinct values of the
specified column:

SELECT COUNT(DISTINCT column_name) FROM table_name

Note: COUNT(DISTINCT) works with ORACLE and Microsoft SQL Server, but not with
Microsoft Access.

SQL COUNT(column_name) Example

We have the following "Orders" table:

O_Id OrderDate OrderPrice Customer


1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen

Now we want to count the number of orders from "Customer Nilsen".

We use the following SQL statement:

SELECT COUNT(Customer) AS CustomerNilsen FROM Orders

WHERE Customer='Nilsen'

The result of the SQL statement above will be 2, because the customer Nilsen has made 2 orders
in total:

CustomerNilsen
2

164
SQL COUNT(*) Example

If we omit the WHERE clause, like this:

SELECT COUNT(*) AS NumberOfOrders FROM Orders

The result-set will look like this:

NumberOfOrders
6

which is the total number of rows in the table.

SQL COUNT(DISTINCT column_name) Example

Now we want to count the number of unique customers in the "Orders" table.

We use the following SQL statement:

SELECT COUNT(DISTINCT Customer) AS NumberOfCustomers FROM Orders

The result-set will look like this:

NumberOfCustomers
3

Which is the number of unique customers (Hansen, Nilsen, and Jensen) in the "Orders" table.

The FIRST() Function

The FIRST() function returns the first value of the selected column.

SQL FIRST() Syntax

SELECT FIRST(column_name) FROM table_name

165
SQL FIRST() Example

We have the following "Orders" table:

O_Id OrderDate OrderPrice Customer


1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen

Now we want to find the first value of the "OrderPrice" column.

We use the following SQL statement:

SELECT FIRST(OrderPrice) AS FirstOrderPrice FROM Orders

The result-set will look like this:

FirstOrderPrice
1000

The LAST() Function

The LAST() function returns the last value of the selected column.

SQL LAST() Syntax

SELECT LAST(column_name) FROM table_name

SQL LAST() Example

We have the following "Orders" table:

166
O_Id OrderDate OrderPrice Customer
1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen

Now we want to find the last value of the "OrderPrice" column.

We use the following SQL statement:

SELECT LAST(OrderPrice) AS LastOrderPrice FROM Orders

The result-set will look like this:

LastOrderPrice
100

The MAX() Function

The MAX() function returns the largest value of the selected column.

SQL MAX() Syntax

SELECT MAX(column_name) FROM table_name

SQL MAX() Example

We have the following "Orders" table:

O_Id OrderDate OrderPrice Customer


1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen

167
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen

Now we want to find the largest value of the "OrderPrice" column.

We use the following SQL statement:

SELECT MAX(OrderPrice) AS LargestOrderPrice FROM Orders

The result-set will look like this:

LargestOrderPrice
2000

The MIN() Function

The MIN() function returns the smallest value of the selected column.

SQL MIN() Syntax

SELECT MIN(column_name) FROM table_name

SQL MIN() Example

We have the following "Orders" table:

O_Id OrderDate OrderPrice Customer


1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen

168
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen

Now we want to find the smallest value of the "OrderPrice" column.

We use the following SQL statement:

SELECT MIN(OrderPrice) AS SmallestOrderPrice FROM Orders

The result-set will look like this:

SmallestOrderPrice
100

The SUM() Function

The SUM() function returns the total sum of a numeric column.

SQL SUM() Syntax

SELECT SUM(column_name) FROM table_name

SQL SUM() Example

We have the following "Orders" table:

O_Id OrderDate OrderPrice Customer


1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen

169
Now we want to find the sum of all "OrderPrice" fields".

We use the following SQL statement:

SELECT SUM(OrderPrice) AS OrderTotal FROM Orders

The result-set will look like this:

OrderTotal
5700

The GROUP BY Statement

The GROUP BY statement is used in conjunction with the aggregate functions to group the
result-set by one or more columns.

SQL GROUP BY Syntax

SELECT column_name, aggregate_function(column_name)

FROM table_name

WHERE column_name operator value

GROUP BY column_name

SQL GROUP BY Example

We have the following "Orders" table:

O_Id OrderDate OrderPrice Customer


1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen

170
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen

Now we want to find the total sum (total order) of each customer.We will have to use the
GROUP BY statement to group the customers. We use the following SQL statement:

SELECT Customer,SUM(OrderPrice) FROM Orders

GROUP BY Customer

The result-set will look like this:

Customer SUM(OrderPrice)
Hansen 2000
Nilsen 1700
Jensen 2000

Nice! Isn't it? :)

Let's see what happens if we omit the GROUP BY statement:

SELECT Customer,SUM(OrderPrice) FROM Orders

The result-set will look like this:

Customer SUM(OrderPrice)
Hansen 5700
Nilsen 5700
Hansen 5700
Hansen 5700
Jensen 5700
Nilsen 5700

171
The result-set above is not what we wanted.

Explanation of why the above SELECT statement cannot be used: The SELECT statement
above has two columns specified (Customer and SUM(OrderPrice). The "SUM(OrderPrice)"
returns a single value (that is the total sum of the "OrderPrice" column), while "Customer"
returns 6 values (one value for each row in the "Orders" table). This will therefore not give us the
correct result. However, you have seen that the GROUP BY statement solves this problem.

GROUP BY More Than One Column

We can also use the GROUP BY statement on more than one column, like this:

SELECT Customer,OrderDate,SUM(OrderPrice) FROM Orders

GROUP BY Customer,OrderDate

The HAVING Clause

The HAVING clause was added to SQL because the WHERE keyword could not be used with
aggregate functions.

SQL HAVING Syntax

SELECT column_name, aggregate_function(column_name)

FROM table_name

WHERE column_name operator value

GROUP BY column_name

HAVING aggregate_function(column_name) operator value

SQL HAVING Example

We have the following "Orders" table:

172
O_Id OrderDate OrderPrice Customer
1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen

Now we want to find if any of the customers have a total order of less than 2000.

We use the following SQL statement:

SELECT Customer,SUM(OrderPrice) FROM Orders

GROUP BY Customer

HAVING SUM(OrderPrice)<2000

The result-set will look like this:

Customer SUM(OrderPrice)
Nilsen 1700

Now we want to find if the customers "Hansen" or "Jensen" have a total order of more than
1500.

We add an ordinary WHERE clause to the SQL statement:

SELECT Customer,SUM(OrderPrice) FROM Orders

WHERE Customer='Hansen' OR Customer='Jensen'

GROUP BY Customer

HAVING SUM(OrderPrice)>1500

173
The result-set will look like this:

Customer SUM(OrderPrice)
Hansen 2000
Jensen 2000

The UCASE() Function

The UCASE() function converts the value of a field to uppercase.

SQL UCASE() Syntax

SELECT UCASE(column_name) FROM table_name

SQL UCASE() Example

We have the following "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Now we want to select the content of the "LastName" and "FirstName" columns above, and
convert the "LastName" column to uppercase.

We use the following SELECT statement:

SELECT UCASE(LastName) as LastName,FirstName FROM Persons

The result-set will look like this:

LastName FirstName

174
HANSEN Ola
SVENDSON Tove
PETTERSEN Kari

The LCASE() Function

The LCASE() function converts the value of a field to lowercase.

SQL LCASE() Syntax

SELECT LCASE(column_name) FROM table_name

SQL LCASE() Example

We have the following "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Now we want to select the content of the "LastName" and "FirstName" columns above, and
convert the "LastName" column to lowercase.

We use the following SELECT statement:

SELECT LCASE(LastName) as LastName,FirstName FROM Persons

The result-set will look like this:

LastName FirstName
hansen Ola
svendson Tove
pettersen Kari

175
The MID() Function

The MID() function is used to extract characters from a text field.

SQL MID() Syntax

SELECT MID(column_name,start[,length]) FROM table_name

Parameter Description
column_name Required. The field to extract characters from.
start Required. Specifies the starting position (starts at 1).
length Optional. The number of characters to return. If omitted, the MID() function
returns the rest of the text.

SQL MID() Example

We have the following "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Now we want to extract the first four characters of the "City" column above.

We use the following SELECT statement:

SELECT MID(City,1,4) as SmallCity FROM Persons

The result-set will look like this:

176
SmallCity
Sand
Sand
Stav

SQL LEN() Example

We have the following "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

Now we want to select the length of the values in the "Address" column above.

We use the following SELECT statement:

SELECT LEN(Address) as LengthOfAddress FROM Persons

The result-set will look like this:

LengthOfAddress
12
9
9

The ROUND() Function

The ROUND() function is used to round a numeric field to the number of decimals specified.

SQL ROUND() Syntax

SELECT ROUND(column_name,decimals) FROM table_name

177
Parameter Description
column_name Required. The field to round.
decimals Required. Specifies the number of decimals to be returned.

SQL ROUND() Example

We have the following "Products" table:

Prod_Id ProductName Unit UnitPrice


1 Jarlsberg 1000 g 10.45
2 Mascarpone 1000 g 32.56
3 Gorgonzola 1000 g 15.67

Now we want to display the product name and the price rounded to the nearest integer.

We use the following SELECT statement:

SELECT ProductName, ROUND(UnitPrice,0) as UnitPrice FROM Products

The result-set will look like this:

ProductName UnitPrice
Jarlsberg 10
Mascarpone 33
Gorgonzola 16

The NOW() Function

The NOW() function returns the current system date and time.

SQL NOW() Syntax

SELECT NOW() FROM table_name

178
SQL NOW() Example

We have the following "Products" table:

Prod_Id ProductName Unit UnitPrice


1 Jarlsberg 1000 g 10.45
2 Mascarpone 1000 g 32.56
3 Gorgonzola 1000 g 15.67

Now we want to display the products and prices per today's date.

We use the following SELECT statement:

SELECT ProductName, UnitPrice, Now() as PerDate FROM Products

The result-set will look like this:

ProductName UnitPrice PerDate


Jarlsberg 10.45 10/7/2008 11:25:02 AM
Mascarpone 32.56 10/7/2008 11:25:02 AM
Gorgonzola 15.67 10/7/2008 11:25:02 AM

The FORMAT() Function

The FORMAT() function is used to format how a field is to be displayed.

SQL FORMAT() Syntax

SELECT FORMAT(column_name,format) FROM table_name

Parameter Description
column_name Required. The field to be formatted.
format Required. Specifies the format.

179
SQL FORMAT() Example

We have the following "Products" table:

Prod_Id ProductName Unit UnitPrice


1 Jarlsberg 1000 g 10.45
2 Mascarpone 1000 g 32.56
3 Gorgonzola 1000 g 15.67

Now we want to display the products and prices per today's date (with today's date displayed in
the following format "YYYY-MM-DD").

We use the following SELECT statement:

SELECT ProductName, UnitPrice, FORMAT(Now(),'YYYY-MM-DD') as PerDate

FROM Products

The result-set will look like this:

ProductName UnitPrice PerDate


Jarlsberg 10.45 2008-10-07
Mascarpone 32.56 2008-10-07
Gorgonzola 15.67 2008-10-07
SQL Statement Syntax
AND / OR SELECT column_name(s)
FROM table_name
WHERE condition
AND|OR condition
ALTER TABLE ALTER TABLE table_name
ADD column_name datatype

or

180
ALTER TABLE table_name
DROP COLUMN column_name
AS (alias) SELECT column_name AS column_alias
FROM table_name

or

SELECT column_name
FROM table_name AS table_alias
BETWEEN SELECT column_name(s)
FROM table_name
WHERE column_name
BETWEEN value1 AND value2
CREATE DATABASE CREATE DATABASE database_name
CREATE TABLE CREATE TABLE table_name
(
column_name1 data_type,
column_name2 data_type,
column_name2 data_type,
...
)
CREATE INDEX CREATE INDEX index_name
ON table_name (column_name)

or

CREATE UNIQUE INDEX index_name


ON table_name (column_name)
CREATE VIEW CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition

181
DELETE DELETE FROM table_name
WHERE some_column=some_value

or

DELETE FROM table_name


(Note: Deletes the entire table!!)

DELETE * FROM table_name


(Note: Deletes the entire table!!)
DROP DATABASE DROP DATABASE database_name
DROP INDEX DROP INDEX table_name.index_name (SQL Server)
DROP INDEX index_name ON table_name (MS Access)
DROP INDEX index_name (DB2/Oracle)
ALTER TABLE table_name
DROP INDEX index_name (MySQL)
DROP TABLE DROP TABLE table_name
GROUP BY SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value
IN SELECT column_name(s)
FROM table_name
WHERE column_name
IN (value1,value2,..)

182
INSERT INTO INSERT INTO table_name
VALUES (value1, value2, value3,....)

or

INSERT INTO table_name


(column1, column2, column3,...)
VALUES (value1, value2, value3,....)
INNER JOIN SELECT column_name(s)
FROM table_name1
INNER JOIN table_name2
ON table_name1.column_name=table_name2.column_name
LEFT JOIN SELECT column_name(s)
FROM table_name1
LEFT JOIN table_name2
ON table_name1.column_name=table_name2.column_name
RIGHT JOIN SELECT column_name(s)
FROM table_name1
RIGHT JOIN table_name2
ON table_name1.column_name=table_name2.column_name
FULL JOIN SELECT column_name(s)
FROM table_name1
FULL JOIN table_name2
ON table_name1.column_name=table_name2.column_name
LIKE SELECT column_name(s)
FROM table_name
WHERE column_name LIKE pattern
ORDER BY SELECT column_name(s)
FROM table_name
ORDER BY column_name [ASC|DESC]
SELECT SELECT column_name(s)
FROM table_name
SELECT * SELECT *

183
FROM table_name
SELECT DISTINCT SELECT DISTINCT column_name(s)
FROM table_name
SELECT INTO SELECT *
INTO new_table_name [IN externaldatabase]
FROM old_table_name

or

SELECT column_name(s)
INTO new_table_name [IN externaldatabase]
FROM old_table_name
SELECT TOP SELECT TOP number percent column_name(s)
FROM table_name
TRUNCATE TABLE TRUNCATE TABLE table_name
UNION SELECT column_name(s) FROM table_name1
UNION
SELECT column_name(s) FROM table_name2
UNION ALL SELECT column_name(s) FROM table_name1
UNION ALL
SELECT column_name(s) FROM table_name2
UPDATE UPDATE table_name
SET column1=value, column2=value,...
WHERE some_column=some_value
WHERE SELECT column_name(s)
FROM table_name
WHERE column_name operator value

184
UNIT SIX
PROGRAMMING BLOCKS IN SQL
Structure

PROGRAMMING BLOCKS IN SQL

Procedural and Non procedural language in database

Simple program in PL/SQL

PL/SQL Trigger

Exception handling in PL/SQL

OBJECTIVES

After working through this unit, you should able to:

 Know features of PL/SQL approach


 Construct syntactically correct PL/SQL‟s statements
 Under stand advantages of non-procedural approach(PL/SQL) over procedural approach
(simple SQL)

Introduction

Programming activities demand high concentration and creativity. Cultivating such skill would
help you to design logic for the problem at hand easily.

185
In this unit, you will learn basic difference between PL/SQL and simple SQL statements, how
construct PL/SQL statements such as variables declaration in PL/SQL, conditional statements in
PL/SQL, iterative statements, in PL/SQL, how to create and use cursor and trigger in PL/SQL.
Besides you will learn how to exception handling techniques and functions in PL/SQL.

Short fragmented codes appropriate to the topics are given to help you understand the topics
easily.

Section 1: Procedural and Non procedural language in database

Section objectives:
Upon Completing this section, you would be able to:

 Understand basic difference between Procedural and Non procedural language


 Familiarize with PL/SQL (example of non-procedural approach) language.
 Identify advantage of PL/SQL
 Understand basic structure of PL/SQL

6.1.1. What is procedural language Non-procedural language?

SQL statements are defined in terms of constraints which we wish to fix on the result of the
query. As we have mentioned earlier SQL statements let us to store and retrieve data from the
relation database. SQL is the Non-procedural programming language.

What is Non-procedural language?

Non-procedural programming languages allow users and professional programmers to specify


the results they want without specifying how to solve the problem. To make it clear see the
following example.

Name Course_title Letter_grade


Abebe Operating system A
Hana Multimedia B
Solomon Operating system C
Kebebe Calculus D

186
Taye Civics B

Fig 1 Student_record

Assume if one wants to know name of student that scored “A”, the following SQL statement
retrieve name of student(s) from the database:

SELECT name FROM student_record WHERE letter_grade =”A”

In the above SQL statement the user didn‟t specify step by step algorithm to the database engine
how to retrieve name of student who scored “A” grade in any subject. What is expected from the
programmer to specify is, what to retrieve from the database. Such kind of languages is known
as non-procedural programming language.

Example of non-procedural language is SQL

What is procedural language?

Any programming language that executes instructions based on the given procedure is called
procedural language. In this case the programmer should specify every necessary step that the
machine follows to solve the given problem.

Let us see the following simple example.

Assume a programmer want to add two integer numbers and display the result on the screen. In
this case a programmer should specify all the necessary steps that machine follows to add two
integer numbers. The following is the rudimentary steps that may be written by programmer to
be followed by the machine to solve the problem:

Step 1 Accept the first number

Step 2 Check whether the number is integer, if not repeat step one unit the enter number
integer

Step 3 Accept the second number

187
Step 4 Check whether the number is integer, if not repeat step three unit the enter number
integer

Step 5 Add both numbers

Step 6 Display result

Examples of procedural programming languages are COBOL, C++, and ,PL/SQL,Transact-SQL


FORTRAN … etc.

See the above six steps and identify features that are not appeared in non procedural
programming example above?

See the above six set of instructions. Step 2 and step 4 are executed if and only if the give
numbers are integers (or required condition is satisfied) such kinds of statements are called
conditional statements.

This feature is not available in SQL statements.

On the other hand some block of instructions may be executed repeatedly until some condition is
met.

Look step 2 and step 4, these statements will be executed if the given number is different from
integer number (this is the condition). Such kind of statements is called loop statement. A Loop
statement too, is not available in SQL statement.

These features give power to the programmer to control the execution steps of the program. As
we have said earlier SQL language is power full in manipulating data in the data base, but they
missed some features like loop and conditional statements. So to make writing a program in SQL
language more flexible, SQL statements are embedded within the procedural language to gain
conditional and loop features from the procedural language. As a result of this the PL/SQL
language in oracle database and Transact-SQL language in SQL server database are came into
picture.

In other word the SQL language is extended to PL/SQL and Transact-SQL languages by
inheriting features of procedural language. In this module we will use the PL/SQL language.

188
6.1.2. What is PL/SQL?

PL/SQL stands for Procedural Language extension of SQL.

PL/SQL is a combination of SQL along with the procedural features of programming languages.
It was developed by Oracle Corporation in the early 90‟s to enhance the capabilities of SQL.

PL/SQL stands for "Procedural Language extensions to SQL." PL/SQL is available primarily as
an "enabling technology" within other software products; it does not exist as a standalone
language. You can use PL/SQL in the Oracle relational database, in the Oracle Server, and in
client-side application development tools, such as Oracle Forms. PL/SQL is closely integrated
into the SQL language, yet it adds programming constructs that are not native to this standard
relational database language. As you can see from the following code example, PL/SQL allows
you to combine SQL statements with "standard" procedural constructs. This single program can
either insert a company into or delete a company from the database. It relies on the IF statement
(not a SQL statement) to determine which action to take:

Example

1 PROCEDURE maintain_company

2 (action_in IN VARCHAR2,

id_in IN NUMBER,

name_in IN VARCHAR2 := NULL)

3 BEGIN

4 IF action_in = 'DELETE'

5 THEN

6 DELETE FROM company WHERE company_id = id_in;

189
7

8 ELSIF action_in = 'INSERT'

9 THEN

10 INSERT INTO company (company_id, name)

11 VALUES (id_in, name_in);

12 END IF;

13 END;

6.1.3. Advantages of PL/SQL

These are the advantages of PL/SQL.

 Block Structures: PL SQL consists of blocks of code, which can be nested within each
other. Each block forms a unit of a task or a logical module. PL/SQL Blocks can be
stored in the database and reused.
 Procedural Language Capability: PL SQL consists of procedural language constructs
such as conditional statements (if else statements) and loops like (FOR loops).
 Better Performance: PL SQL engine processes multiple SQL statements
simultaneously as a single block, thereby reducing network traffic.

Error Handling: PL/SQL handles errors or exceptions effectively during the


execution of a PL/SQL program. Once an exception is caught, specific actions
can be taken depending upon the type of the exception or it can be displayed to
the

6.1.4. Structure of PL/SQL

The following sections can be used to build program in PL/SQL language:

190
1- Header Section

2- Declaration Section (optional)

3- Execution Section (mandatory)

4- Exception Section(optional)

6.1.4.1. HEADER SECTION

This is part of the program where you can specify the type and name of the blocks

6.1.4.2 DECLARATION SECTION

The Declaration section of a PL/SQL Block starts with the reserved keyword DECLARE. This
section is optional and is used to declare any placeholders like variables, constants, records and
cursors, which are used to manipulate data in the execution section. Placeholders may be any of
Variables, Constants and Records, which stores data temporarily. Cursors are also declared in
this section.

6.1.4.3. EXCUTION Section

The Execution section of a PL/SQL Block starts with the reserved keyword BEGIN and ends
with END. This is a mandatory section and is the section where the program logic is written to
perform any task. The programmatic constructs like loops, conditional statement and SQL
statements form the part of execution section.

6.1.4.4 EXCEPTIONS (Error handling Section)

The Exception section of a PL/SQL Block starts with the reserved keyword EXCEPTION. This
section is optional. Any errors in the program can be handled in this section, so that the PL/SQL
Blocks terminates gracefully. If the PL/SQL Block contains exceptions that cannot be handled,
the Block terminates abruptly with errors.

Every statement in the above three sections must end with a semicolon (;). PL/SQL blocks can

191
be nested within other PL/SQL blocks. Comments can be used to document code. End section is
last part of the program that indicates the end of the program.

The following template shows how PL/SQL block looks

DECLARE
Variable declaration
BEGIN
Program Execution
EXCEPTION
Exception handling
END;

Section 2: Simple program in PL/SQL

Section objectives:
Upon Completing this section, you would be able to

 Know how to declare variables and constant in PL/SQL


 Declare and manipulate rows in PL/SQL.
 Construct conditional statement in PL/SQL
 Construct iterative statements in PL/SQL

6.2.1. Variable and constant declaration

How to declare PL/SQL variables and constant?

The following syntax should be used to declare PL/SQL variables:

variable_name datatype [NOT NULL := value ];

 variable_name is the name of the variable.


 datatype is a valid PL/SQL datatype.
 NOT NULL is an optional specification on the variable.

192
 value or DEFAULT value is also an optional specification, where you can initialize a
variable.
 Each variable declaration is a separate statement and must be terminated by a semicolon.

EXAMPLE 1

salary number (6);

“salary” is a variable of datatype number and of length 6.

When a variable is specified as NOT NULL, you must initialize the variable when it is declared.

EXAMPLE 2

The below example declares two variables, one of which is a not null.

DECLARE

salary number(4);

dept varchar2(10) NOT NULL := “computer science Dept”;

in the above example “dept” variable is not NULL, in such case we must initialized the variable
with value( in the above example the “dept” value is “computer science”),when the variable is
declared.

Notice: The value of a variable can change in the execution or exception section of the
PL/SQL Block. We can assign values to variables in the following two ways given
below.

1) We can directly assign values to variables.

The Syntax is:

variable_name:= value;

193
2) We can assign values to variables directly from the database columns by using a SELECT..
INTO
statement.

The Syntax is:

SELECT column_name

INTO variable_name

FROM table_name

[WHERE condition];

EXAMPLE 3

The following fragment code shows how to assign value to variable using SELECT.. INTO
statement

The program will get the salary of an employee with id '1116' and display it on the screen

DECLARE

var_salary number(6);

var_emp_id number(6) = 1116;

BEGIN

SELECT salary

INTO var_salary

FROM employee

WHERE emp_id = var_emp_id;

194
6.2.2 Scope of Variables

PL/SQL allows the nesting of Blocks within Blocks i.e, the Execution section of an outer
block can contain inner blocks. Therefore, a variable which is accessible to an outer
Block is also accessible to all nested inner Blocks. The variables declared in the inner
blocks are not accessible to outer blocks. Based on their declaration we can classify
variables into two types.

 Local variables - These are declared in a inner block and cannot be referenced by outside
Blocks.
 Global variables - These are declared in a outer block and can be referenced by its itself
and by its inner blocks.

For Example: In the below example we are creating two variables in the outer block and
assigning thier product to the third variable created in the inner block. The variable 'var_mult' is
declared in the inner block, so cannot be accessed in the outer block i.e. it cannot be accessed
after line 11. The variables 'var_num1' and 'var_num2' can be accessed anywhere in the block.

1> DECLARE

2> var_num1 number;

3> var_num2 number;

4> BEGIN

5> var_num1 := 100;

6> var_num2 := 200;

7> DECLARE

8> var_mult number;

195
9> BEGIN

10> var_mult := var_num1 * var_num2;

11> END;

12> END;

PL/SQL Constants

As the name implies a constant is a value used in a PL/SQL Block that remains
unchanged throughout the program. A constant is a user-defined literal value. You can
declare a constant and use it instead of actual value.

For example: If you want to write a program which will increase the salary of the
employees by 25%, you can declare a constant and use it throughout the program. Next
time when you want to increase the salary again you can change the value of the
constant which will be easier than changing the actual value throughout the program.

The General Syntax to declare a constant is:

constant_name CONSTANT datatype := VALUE;

constant_name is the name of the constant i.e. similar to a variable name

The word CONSTANT is a reserved word and ensures that the value does not change.

VALUE - It is a value which must be assigned to a constant when it is declared. You cannot
assign a value later.

For example: To declare salary_increase, you can write code as follows:

DECLARE

salary_increase CONSTANT number (3) := 10;

196
You must assign a value to a constant at the time you declare it. If you do not assign a
value to a constant while declaring it and try to assign a value in the execution section,
you will get a error.

For Example: If you execute the below Pl/SQL block you will get error.

DECLARE

salary_increase CONSTANT number(3);

BEGIN

salary_increase := 100;

END;

6.2.3 PL/SQL Records

What are records?

Records are another type of data types which oracle allows to be defined as a
placeholder. Records are composite data types, which means it is a combination of
different scalar data types like char, varchar, number etc. Each scalar data types in the
record holds a value. A record can be visualized as a row of data. It can contain all the
contents of a row.

6.2.3.1. Declaring a record:

To declare a record, you must first define a composite datatype; then declare a
record for that type.

The General Syntax to define a composite datatype is:

197
TYPE record_type_name IS RECORD

(first_col_name column_datatype,

second_col_name column_datatype ...);

record_type_name – it is the name of the composite type you want to define.

First_col_name, second_col_name, etc.,- it is the names the fields/columns within the record.

column_datatype defines the scalar datatype of the fields.

There are different ways to declare the data type of the fields

 You can declare the fields in the same way as you declare the fields while
creating the table.

 If a field is based on a column from database table, you can define the field_type
as follows:

col_name table_name.column_name%type;

By declaring the field data type in the above method, the data type of the column
is dynamically applied to the field. This method is useful when you are altering
the column specification of the table, because you do not need to change the code
again.

NOTE: You can use also %type to declare variables and constants

The General Syntax to declare a record of a uer-defined datatype is:

record_name record_type_name;

198
For example: The following code shows how to declare records employee_rec based on user
defined type

DECLARE

TYPE employee_type IS RECORD

(employee_id number(5),

employee_first_name varchar2(25),

employee_last_name employee.last_name%type,

employee_dept employee.dept%type);

employee_salary employee.salary%type;

employee_rec employee_type;

If all the fields of a record are based on the columns of a table, we can declare the record
as follows:

record_name table_name%ROWTYPE;

For example: The above declaration of employee_rec can declared as follows:

DECLARE

employee_rec employee%ROWTYPE

The advantages of declaring the record as a ROWTYPE are:

1) You do not need to explicitly declare variables for all the columns in a table.
2) If you alter the column specification in the database table, you do not need to update
the code.

199
The disadvantage of declaring the record as a ROWTYPE is:

1) When you create a record as a ROWTYPE, fields will be created for all the columns
in the table and memory will be used to create the datatype for all the fields. So use
ROWTYPE only when you are using all the columns of the table in the program.

NOTE: When you are creating a record, you are just creating a datatype, similar to
creating a variable. You need to assign values to the record to use them.

The following table consolidates the different ways in which you can define and declare
a PL/SQL record.

Syntax Usage

TYPE record_type_name IS RECORD Define a composite datatype, where each


(column_name1 datatype, column_name2 field is scalar.
datatype, ...);
col_name table_name.column_name%type; Dynamically define the datatype of a
column based on a database column.
record_name record_type_name; Declare a record based on a user-defined
type.
record_name table_name%ROWTYPE; Dynamically declare a record based on an
entire row of a table. Each column in the table
corresponds to a field in the record.

200
6.2.3.2. Passing Values To and From a Record

When you assign values to a record, you actually assign values to the fields within it.
The General Syntax to assign a value to a column within a record direclty is:

record_name.col_name := value;

If you used %ROWTYPE to declare a record, you can assign values as shown:

record_name.column_name := value;

We can assign values to records using SELECT Statements as shown:

SELECT col1, col2

INTO record_name.col_name1, record_name.col_name2

FROM table_name

[WHERE clause];

If %ROWTYPE is used to declare a record then you can directly assign values to the
whole record instead of each columns separately. In this case, you must SELECT all the
columns from the table into the record as shown:

SELECT * INTO record_name

FROM table_name

[WHERE clause];

Lets see how we can get values from a record.


The General Syntax to retrieve a value from a specific field into another variable is:

var_name := record_name.col_name;

201
The following table consolidates the different ways you can assign values to and from a
record:

Syntax Usage
record_name.col_name := value; To directly assign a value to a specific
column of a record.
record_name.column_name := value; To directly assign a value to a specific
column of a record, if the record is
declared using %ROWTYPE.
SELECT col1, col2 INTO To assign values to each field of a record
record_name.col_name1, from the database table.
record_name.col_name2 FROM table_name
[WHERE clause];

SELECT * INTO record_name FROM To assign a value to all fields in the record
table_name [WHERE clause]; from a database table.
variable_name := record_name.col_name; To get a value from a record column and
assigning it to a variable.

6.2.4. Conditional Statements in PL/SQL

As the name implies, PL/SQL supports programming language features like conditional
statements, iterative statements.

The programming constructs are similar to how you use in programming languages like Java and
C++. In this section I will provide you syntax of how to use conditional statements in PL/SQL
programming.

In almost every program you write, you need to make decisions. For example, if it is the end of
the fiscal year, bonuses must be distributed to the employees based on their salaries. To compute

202
employee bonuses, a program needs a conditional control. In other words, it needs to employ a
selection structure.

Conditional control allows you to control the program's flow of the execution based on a
condition. In programming terms, this means that the statements in the program are not executed
sequentially. Rather, one group of statements or another is executed, depending on how the
condition is evaluated.

PL/SQL has three types of conditional control: IF, ELSIF, and CASE statements. This chapter
explores the first two types and shows you how they can be nested inside one another. CASE
statements are discussed in the next chapter.

An IF statement has two forms: IF-THEN and IF-THEN-ELSE. An IF-THEN statement allows
you to specify only one group of actions to take. In other words, this group of actions is taken
only when a condition evaluates to TRUE. An IF-THEN-ELSE statement allows you to specify
two groups of actions. The second group of actions is taken when a condition evaluates to
FALSE or NULL.

IF-THEN STATEMENTS

An IF-THEN statement is the most basic kind of a conditional control; it has the following structure:

IF CONDITION THEN

STATEMENT 1;

...

STATEMENT N;

END IF;

The reserved word IF marks the beginning of the IF statement. Statements 1 through N are a
sequence of executable statements that consist of one or more standard programming structures.
The word CONDITION between the keywords IF and THEN determines whether these

203
statements are executed. END IF is a reserved phrase that indicates the end of the IF-THEN
construct.

The flowing flowchart illustrate the flowing logic of IF-THEN statement

IF-THEN statement

When an IF-THEN statement is executed, a condition is evaluated to either TRUE or FALSE. If


the condition evaluates to TRUE, control is passed to the first executable statement of the IF-
THEN construct. If the condition evaluates to FALSE, control is passed to the first executable
statement after the END IF statement.

Consider the following example. Two numeric values are stored in the variables v_num1 and
v_num2. You need to arrange their values so that the smaller value is always stored in v_num1
and the larger value is always stored in v_num2.

For Example

DECLARE

v_num1 NUMBER := 5;

v_num2 NUMBER := 3;

v_temp NUMBER;

BEGIN

204
-- if v_num1 is greater than v_num2 rearrange their values

IF v_num1 > v_num2 THEN

v_temp := v_num1;

v_num1 := v_num2;

v_num2 := v_temp;

END IF;

-- display the values of v_num1 and v_num2

DBMS_OUTPUT.PUT_LINE ('v_num1 = '||v_num1);

DBMS_OUTPUT.PUT_LINE ('v_num2 = '||v_num2);

END;

In this example, condition v_num1 > v_num2 evaluates to TRUE because 5 is greater than 3.
Next, the values are rearranged so that 3 is assigned to v_num1 and 5 is assigned to v_num2.
This is done with the help of the third variable, v_temp, which is used for temporary storage.

This example produces the following output:

v_num1 = 3

v_num2 = 5

PL/SQL procedure successfully completed.

IF-THEN-ELSE STATEMENT

An IF-THEN statement specifies the sequence of statements to execute only if the condition
evaluates to TRUE. When this condition evaluates to FALSE, there is no special action to take,
except to proceed with execution of the program.

205
An IF-THEN-ELSE statement enables you to specify two groups of statements. One group of
statements is executed when the condition evaluates to TRUE. Another group of statements is
executed when the condition evaluates to FALSE. This is indicated as follows:

IF CONDITION THEN

STATEMENT 1;

ELSE

STATEMENT 2;

END IF;

STATEMENT 3;

When CONDITION evaluates to TRUE, control is passed to STATEMENT 1; when CONDITION


evaluates to FALSE, control is passed to STATEMENT 2. After the IF-THEN-ELSE construct
has completed, STATEMENT 3 is executed. the following flowchart illustrates this flow of logic.

IF-THEN-ELSE statement

Did You Know?

You should use the IF-THEN-ELSE construct when trying to choose between two mutually
exclusive actions. Consider the following example:

DECLARE

v_num NUMBER := &sv_user_num;

206
BEGIN

-- test if the number provided by the user is even

IF MOD(v_num,2) = 0 THEN

DBMS_OUTPUT.PUT_LINE (v_num||

' is even number');

ELSE

DBMS_OUTPUT.PUT_LINE (v_num||' is odd number');

END IF;

DBMS_OUTPUT.PUT_LINE ('Done');

END;

For any given number, only one of the DBMS_ OUTPUT.PUT_LINE statements is executed.
Hence, the IF-THEN-ELSE construct enables you to specify two and only two mutually
exclusive actions.

When run, this example produces the following output:

Enter value for v_user_num: 24

old 2: v_num NUMBER := &v_user_num;

new 2: v_num NUMBER := 24;

24 is even number

Done

PL/SQL procedure successfully completed.

207
NULL CONDITION

In some cases, a condition used in an IF statement can be evaluated to NULL instead of TRUE
or FALSE. For the IF-THEN construct, the statements are not executed if an associated
condition evaluates to NULL. Next, control is passed to the first executable statement after END
IF. For the IF-THEN-ELSE construct, the statements specified after the keyword ELSE are
executed if an associated condition evaluates to NULL.

For Example

DECLARE

v_num1 NUMBER := 0;

v_num2 NUMBER;

BEGIN

IF v_num1 = v_num2 THEN

DBMS_OUTPUT.PUT_LINE ('v_num1 = v_num2');

ELSE

DBMS_OUTPUT.PUT_LINE ('v_num1 != v_num2');

END IF;

END;

This example produces the following output:

v_num1 != v_num2

PL/SQL procedure successfully completed.

208
The condition

v_num1 = v_num2

is evaluated to NULL because a value is not assigned to the variable v_num2. Therefore,
variable v_num2 is NULL. Notice that the IF-THEN-ELSE construct is behaving as if the
condition evaluated to FALSE, and the second DBMS_ OUTPUT.PUT_LINE statement is
executed.

Lab 4.1 Exercises

This section provides exercises and suggested answers, with discussion related to how those
answers resulted. The most important thing to realize is whether your answer works. You should
figure out the implications of the answers and what the effects are of any different answers you
may come up with.

4.1.1 Use the IF-THEN Statement

In this exercise, you use the IF-THEN statement to test whether the date provided by the user
falls on the weekend (in other words, if the day is Saturday or Sunday).

Create the following PL/SQL script:

-- ch04_1a.sql, version 1.0

SET SERVEROUTPUT ON

DECLARE

v_date DATE := TO_DATE('&sv_user_date', 'DD-MON-YYYY');

v_day VARCHAR2(15);

BEGIN

v_day := RTRIM(TO_CHAR(v_date, 'DAY'));

209
IF v_day IN ('SATURDAY', 'SUNDAY') THEN

DBMS_OUTPUT.PUT_LINE (v_date||' falls on weekend');

END IF;

--- control resumes here

DBMS_OUTPUT.PUT_LINE ('Done...');

END;

To test this script fully, execute it twice. For the first run, enter 09-JAN-2008, and for the second
run, enter 13-JAN-2008. Execute the script, and then answer the following questions:

What output is printed on the screen (for both dates)?

ANSWER: The first output produced for the date is 09-JAN-2008. The second output produced for the
date is 13-JAN-2008.

Enter value for sv_user_date: 09-JAN-2008

old 2: v_date DATE := TO_DATE('&sv_user_date', 'DD-MON-YYYY');

new 2: v_date DATE := TO_DATE('09-JAN-2008', 'DD-MON-YYYY');

Done...

PL/SQL procedure successfully completed.

When the value of 09-JAN-2008 is entered for v_date, the day of the week is determined for the
variable v_day with the help of the functions TO_CHAR and RTRIM. Next, the following
condition is evaluated:

v_day IN ('SATURDAY', 'SUNDAY')

210
Because the value of v_day is 'WEDNESDAY', the condition evaluates to FALSE.
Then, control is passed to the first executable statement after END IF. As a result,
Done... is displayed on the screen:

Enter value for sv_user_date: 13-JAN-2008

old 2: v_date DATE := TO_DATE('&sv_user_date', 'DD-MON-YYYY');

new 2: v_date DATE := TO_DATE('13-JAN-2008', 'DD-MON-YYYY');

13-JAN-08 falls on weekend

Done...

PL/SQL procedure successfully completed.

As in the previous run, the value of v_day is derived from the value of v_date.
Next, the condition of the IF-THEN statement is evaluated. Because it evaluates to
TRUE, the statement after the keyword THEN is executed. Therefore, 13-JAN-
2008 falls on weekend is displayed on the screen. Next, control is passed to the last
DBMS_OUTPUT.PUT_LINE statement, and Done... is displayed on the screen.

 Explain why the output produced for the two dates is different.

ANSWER: The first date, 09-JAN-2008, is a Wednesday. As a result, the condition


v_day IN ('SATURDAY,' 'SUNDAY') does not evaluate to TRUE. Therefore,
control is transferred to the statement after END IF, and Done... is displayed on the
screen.

The second date, 13-JAN-2008, is a Sunday. Because Sunday falls on a weekend,


the condition evaluates to TRUE, and the message 13-JAN-2008 falls on weekend is
displayed on the screen. Next, the last DBMS_OUTPUT.PUT_LINE statement is
executed, and Done... is displayed on the screen.

Remove the RTRIM function from the assignment statement for v_day as follows:

211
v_day := TO_CHAR(v_date, 'DAY');

Run the script again, entering 13-JAN-2008 for v_date.

 What output is printed on the screen? Why?

ANSWER: The script should look similar to the following. Changes are shown in
bold.

-- ch04_1b.sql, version 2.0

SET SERVEROUTPUT ON

DECLARE

v_date DATE := TO_DATE('&sv_user_date', 'DD-MON-YYYY');

v_day VARCHAR2(15);

BEGIN

v_day := TO_CHAR(v_date, 'DAY');

IF v_day IN ('SATURDAY', 'SUNDAY') THEN

DBMS_OUTPUT.PUT_LINE (v_date||' falls on weekend');

END IF;

-- control resumes here

DBMS_OUTPUT.PUT_LINE ('Done...');

END;

This script produces the following output:

212
Enter value for sv_user_date: 13-JAN-2008

old 2: v_date DATE := TO_DATE('&sv_user_date', 'DD-MON-YYYY');

new 2: v_date DATE := TO_DATE('13-JAN-2008', 'DD-MON-YYYY');

Done...

PL/SQL procedure successfully completed.

In the original example, the variable v_day is calculated with the help of the
statement RTRIM(TO_CHAR(v_date, 'DAY')). First, the function TO_CHAR
returns the day of the week, padded with blanks. The size of the value retrieved by
the function TO_CHAR is always 9 bytes. Next, the RTRIM function removes
trailing spaces.

In the statement v_day := TO_CHAR(v_date, 'DAY') the TO_CHAR function is


used without the RTRIM function. Therefore, trailing blanks are not removed after
the day of the week has been derived. As a result, the condition of the IF-THEN
statement evaluates to FALSE even though the given date falls on the weekend, and
control is passed to the last DBMS_ OUTPUT.PUT_LINE statement.

 Rewrite this script using the LIKE operator instead of the IN operator so that it
produces the same results for the dates specified earlier.

ANSWER: The script should look similar to the following. Changes are shown in
bold.

-- ch04_1c.sql, version 3.0

SET SERVEROUTPUT ON

DECLARE

v_date DATE := TO_DATE('&sv_user_date', 'DD-MON-YYYY');

213
v_day VARCHAR2(15);

BEGIN

v_day := RTRIM(TO_CHAR(v_date, 'DAY'));

IF v_day LIKE 'S%' THEN

DBMS_OUTPUT.PUT_LINE (v_date||' falls on weekend');

END IF;

-- control resumes here

DBMS_OUTPUT.PUT_LINE ('Done...');

END;

Saturday and Sunday are the only days of the week that start with S. As a result,
there is no need to spell out the names of the days or specify any additional letters
for the LIKE operator.

 Rewrite this script using the IF-THEN-ELSE construct. If the date specified does
not fall on the weekend, display a message to the user saying so.

ANSWER: The script should look similar to the following. Changes are shown in bold.

-- ch04_1d.sql, version 4.0

SET SERVEROUTPUT ON

DECLARE

v_date DATE := TO_DATE('&sv_user_date', 'DD-MON-YYYY');

v_day VARCHAR2(15);

214
BEGIN

v_day := RTRIM(TO_CHAR(v_date, 'DAY'));

IF v_day IN ('SATURDAY', 'SUNDAY') THEN

DBMS_OUTPUT.PUT_LINE (v_date||' falls on weekend');

ELSE

DBMS_OUTPUT.PUT_LINE (v_date||' does not fall on the weekend');

END IF;

-- control resumes

To modify the script, the ELSE part was added to the IF statement. The rest of the script
has not been changed.

Use the IF-THEN-ELSE Statement

In this exercise, you use the IF-THEN-ELSE statement to check how many students
are enrolled in course number 25, section 1. If 15 or more students are enrolled,
section 1 of course number 25 is full. Otherwise, section 1 of course number 25 is
not full, and more students can register for it. In both cases, a message should be
displayed to the user, indicating whether section 1 is full. Try to answer the
questions before you run the script. After you have answered the questions, run the
script and check your answers. Note that the SELECT INTO statement uses the
ANSI 1999 SQL standard.

Create the following PL/SQL script:

-- ch04_2a.sql, version 1.0

SET SERVEROUTPUT ON

215
DECLARE

v_total NUMBER;

BEGIN

SELECT COUNT(*)

INTO v_total

FROM enrollment e

JOIN section s USING (section_id)

WHERE s.course_no = 25

AND s.section_no = 1;

-- check if section 1 of course 25 is full

IF v_total >= 15 THEN

DBMS_OUTPUT.PUT_LINE

('Section 1 of course 25 is full');

ELSE

DBMS_OUTPUT.PUT_LINE

('Section 1 of course 25 is not full');

END IF;

-- control resumes here

END;

216
Notice that the SELECT INTO statement uses an equijoin. The join condition is
listed in the JOIN clause, indicating columns that are part of the primary key and
foreign key constraints. In this example, column SECTION_ID of the
ENROLLMENT table has a foreign key constraint defined on it. This constraint
references column SECTION_ID of the SECTION table, which, in turn, has a
primary key constraint defined on it.By the Way

You will find detailed explanations and examples of the statements using the new
ANSI 1999 SQL standard in Appendix C and in the Oracle help. Throughout this
book we try to provide examples illustrating both standards; however, our main
focus is PL/SQL features rather than SQL.

Try to answer the following questions, and then execute the script:

 What DBMS_OUTPUT.PUT_LINE statement is displayed if 15 students are enrolled in section 1 of


course number 25?

ANSWER: If 15 or more students are enrolled in section 1 of course number 25, the
first DBMS_OUTPUT.PUT_LINE statement is displayed on the screen.

The condition v_total >= 15 evaluates to TRUE, and as a result, the statement

DBMS_OUTPUT.PUT_LINE ('Section 1 of course 25 is full'); is executed.

 What DBMS_OUTPUT.PUT_LINE statement is displayed if three students are enrolled in section 1


of course number 25?

ANSWER: If three students are enrolled in section 1 of course number 25, the second
DBMS_OUTPUT.PUT_LINE statement is displayed on the screen.

The condition v_total >= 15 evaluates to FALSE, and the ELSE part of the IF-
THEN-ELSE statement is executed. As a result, the statement

DBMS_OUTPUT.PUT_LINE ('Section 1 of course 25 is not full'); is executed.

217
 What DBMS_OUTPUT.PUT_LINE statement is displayed if there is no section 1 for course number
25?

ANSWER: If there is no section 1 for course number 25, the ELSE part of the IF-
THEN-ELSE statement is executed. So the second DBMS_OUTPUT.PUT_LINE
statement is displayed on the screen.

The COUNT function used in the SELECT statement:

SELECT COUNT(*)

INTO v_total

FROM enrollment e

JOIN section s USING (section_id)

WHERE s.course_no = 25

AND s.section_no = 1;

returns 0.

The condition of the IF-THEN-ELSE statement evaluates to FALSE. Therefore, the


ELSE part of the IF-THEN-ELSE statement is executed, and the second
DBMS_OUTPUT.PUT_LINE statement is displayed on the screen.

 How would you change this script so that the user provides both course and section numbers?

ANSWER: Two additional variables must be declared and initialized with the help of
the substitution variables as follows. The script should look similar to the following.
Changes are shown in bold.

-- ch04_2b.sql, version 2.0

SET SERVEROUTPUT ON

DECLARE

218
v_total NUMBER;

v_course_no CHAR(6) := '&sv_course_no';

v_section_no NUMBER := &sv_section_no;

BEGIN

SELECT COUNT(*)

INTO v_total

FROM enrollment e

JOIN section s USING (section_id)

WHERE s.course_no = v_course_no

AND s.section_no = v_section_no;

-- check if a specific section of a course is full

IF v_total >= 15 THEN

DBMS_OUTPUT.PUT_LINE

('Section 1 of course 25 is full');

ELSE

DBMS_OUTPUT.PUT_LINE

('Section 1 of course 25 is not full');

END IF;

-- control resumes here

219
END;

 How would you change this script so that if fewer than 15 students are enrolled
in section 1 of course number 25, a message appears indicating how many
students can still enroll?

ANSWER: The script should look similar to the following. Changes are shown in bold.

-- ch04_2c.sql, version 3.0

SET SERVEROUTPUT ON

DECLARE

v_total NUMBER;

v_students NUMBER;

BEGIN

SELECT COUNT(*)

INTO v_total

FROM enrollment e

JOIN section s USING (section_id)

WHERE s.course_no = 25

AND s.section_no = 1;

-- check if section 1 of course 25 is full

IF v_total >= 15 THEN

DBMS_OUTPUT.PUT_LINE

220
('Section 1 of course 25 is full');

ELSE

v_students := 15 - v_total;

DBMS_OUTPUT.PUT_LINE (v_students||

' students can still enroll into section 1'||

' of course 25');

END IF;

-- control resumes here

END;

Notice that if the IF-THEN-ELSE statement evaluates to FALSE, the statements


associated with the ELSE part are executed. In this case, the value of the variable
v_total is subtracted from 15. The result of this operation indicates how many more
students can enroll in section 1 of course number 25.

6.2.5. Iterative Statements in PL/SQL

An iterative control Statements are used when we want to repeat the execution of one or more
statements for specified number of times. These are similar to those in

There are three types of loops in PL/SQL:

Simple Loop

While Loop

For Loop

221
Simple Loop

A Simple Loop is used when a set of statements is to be executed at least once before the
loop terminates. An EXIT condition must be specified in the loop, otherwise the loop
will get into an infinite number of iterations. When the EXIT condition is satisfied the
process exits from the loop.

The General Syntax to write a Simple Loop is:

LOOP

statements;

EXIT;

{or EXIT WHEN condition;}

END LOOP;

These are the important steps to be followed while using Simple Loop.

1) Initialise a variable before the loop body.


2) Increment the variable in the loop.
3) Use a EXIT WHEN statement to exit from the Loop. If you use a EXIT statement
without WHEN condition, the statements in the loop is executed only once.

While Loop

A WHILE LOOP is used when a set of statements has to be executed as long as a


condition is true. The condition is evaluated at the beginning of each iteration. The
iteration continues until the condition becomes false.

The General Syntax to write a WHILE LOOP is:

WHILE <condition>

222
LOOP statements;

END LOOP;

Important steps to follow when executing a while loop:

1) Initialise a variable before the loop body.


2) Increment the variable in the loop.
3) EXIT WHEN statement and EXIT statements can be used in while loops but it's not
done oftenly.

SQL>
SQL> DECLARE
2 v_Calc NUMBER := 0;
3 BEGIN
4 WHILE v_Calc <= 10 LOOP
5 v_Calc := v_Calc + 1;
6 DBMS_OUTPUT.PUT_LINE('The value of v_Calc is ' || v_Calc);
7 END LOOP;
8 END;
9 /
The value of v_Calc is 1
The value of v_Calc is 2
The value of v_Calc is 3
The value of v_Calc is 4
The value of v_Calc is 5
The value of v_Calc is 6
The value of v_Calc is 7
The value of v_Calc is 8
The value of v_Calc is 9
The value of v_Calc is 10
The value of v_Calc is 11

223
PL/SQL procedure successfully completed.

SQL>
SQL>
SQL>
SQL>
SQL> --

FOR Loop

A FOR LOOP is used to execute a set of statements for a predetermined number of


times. Iteration occurs between the start and end integer values given. The counter is
always incremented by 1. The loop exits when the counter reaches the value of the end
integer.

The General Syntax to write a FOR LOOP is:

FOR counter IN val1...val2

LOOP statements;

END LOOP;

 val1 - Start integer value.


 val2 - End integer value.

Important steps to follow when executing a while loop:

1) The counter variable is implicitly declared in the declaration section, so it's not
necessary to declare it
explicitly.
2) The counter variable is incremented by 1 and does not need to be incremented
explicitly.

224
3) EXIT WHEN statement and EXIT statements can be used in FOR loops but it's not
done oftenly.

SQL>
SQL> set serveroutput on
SQL> BEGIN
2 FOR just_a_num IN 1..10
3 LOOP
4 dbms_output.put_line(just_a_num);
5 END LOOP;
6 END;
7 /
1
2
3
4
5
6
7
8
9
10

PL/SQL procedure successfully completed.

SQL>

NOTE: The above Loops are explained with a example when dealing with Explicit
Cursors.

225
Section 3: PL/SQL Trigger

Section objectives:
Upon Completing this section, you would be able to

 Understand trigger
 Familiarized with types of PL/SQL trigger
 Understand when and how to use trigger

Trigger

A trigger is a pl/sql block structure which is fired when a DML statements like Insert,
Delete, Update is executed on a database table. A trigger is triggered automatically when
an associated DML statement is executed.

Syntax of Triggers

The Syntax for creating a trigger is:

CREATE [OR REPLACE ] TRIGGER trigger_name

{BEFORE | AFTER | INSTEAD OF }

{INSERT [OR] | UPDATE [OR] | DELETE}

[OF col_name]

ON table_name

[REFERENCING OLD AS o NEW AS n]

[FOR EACH ROW]

WHEN (condition)

226
BEGIN

--- sql statements

END;

 CREATE [OR REPLACE ] TRIGGER trigger_name - This clause creates a trigger


with the given name or overwrites an existing trigger with the same name.

 {BEFORE | AFTER | INSTEAD OF } - This clause indicates at what time should the
trigger get fired. i.e for example: before or after updating a table. INSTEAD OF is
used to create a trigger on a view. before and after cannot be used to create a trigger
on a view.

 {INSERT [OR] | UPDATE [OR] | DELETE} - This clause determines the triggering
event. More than one triggering events can be used together separated by OR
keyword. The trigger gets fired at all the specified triggering event.

 [OF col_name] - This clause is used with update triggers. This clause is used when
you want to trigger an event only when a specific column is updated.

 CREATE [OR REPLACE ] TRIGGER trigger_name - This clause creates a trigger


with the given name or overwrites an existing trigger with the same name.

 [ON table_name] - This clause identifies the name of the table or view to which the
trigger is associated.
 [REFERENCING OLD AS o NEW AS n] - This clause is used to reference the old
and new values of the data being changed. By default, you reference the values as
:old.column_name or :new.column_name. The reference names can also be changed
from old (or new) to any other user-defined name. You cannot reference old values
when inserting a record, or new values when deleting a record, because they do not
exist.

227
 [FOR EACH ROW] - This clause is used to determine whether a trigger must fire
when each row gets affected ( i.e. a Row Level Trigger) or just once when the entire
sql statement is executed(i.e.statement level Trigger).

 WHEN (condition) - This clause is valid only for row level triggers. The trigger is
fired only for rows that satisfy the condition specified.

For Example: The price of a product changes constantly. It is important to maintain the
history of the prices of the products.

We can create a trigger to update the 'product_price_history' table when the price of the
product is updated in the 'product' table.

 Create the 'product' table and 'product_price_history' table

CREATE TABLE product_price_history

(product_id number(5),

product_name varchar2(32),

supplier_name varchar2(32),

unit_price number(7,2) );

CREATE TABLE product

(product_id number(5),

product_name varchar2(32),

supplier_name varchar2(32),

unit_price number(7,2) );

228
 Create the price_history_trigger and execute it.

CREATE or REPLACE TRIGGER price_history_trigger

BEFORE UPDATE OF unit_price

ON product

FOR EACH ROW

BEGIN

INSERT INTO product_price_history

VALUES

(:old.product_id,

:old.product_name,

:old.supplier_name,

:old.unit_price);

END;

Lets update the price of a product.

 UPDATE PRODUCT SET unit_price = 800 WHERE product_id = 100

Once the above update query is executed, the trigger fires and updates the
'product_price_history' table.

229
 If you ROLLBACK the transaction before committing to the database, the data
inserted to the table is also rolled back.

Types of PL/SQL Triggers

There are two types of triggers:

1) Row level trigger - An event is triggered for each row upated, inserted or
deleted.
2) Statement level trigger - An event is triggered for each sql statement executed.

PL/SQL Trigger Execution Hierarchy

The following hierarchy is followed when a trigger is fired.


1) BEFORE statement trigger fires first.
2) Next BEFORE row level trigger fires, once for each row affected.
3) Then AFTER row level trigger fires once for each affected row. This event will
alternates between BEFORE and AFTER row level triggers.

4) Finally the AFTER statement level trigger fires.

For Example: Let's create a table 'product_check' which we can use to store messages
when triggers are fired.

CREATE TABLE product

(Message varchar2(50),

Current_Date number(32)

);

Let's create a BEFORE and AFTER statement and row level triggers for the product
table.

230
1) BEFORE UPDATE, Statement Level: This trigger will insert a record into the table
'product_check' before a sql update statement is executed, at the statement level.

CREATE or REPLACE TRIGGER Before_Update_Stat_product

BEFORE

UPDATE ON product

Begin

INSERT INTO product_check

Values('Before update, statement level',sysdate);

END;

2) BEFORE UPDATE, Row Level: This trigger will insert a record into the table
'product_check' before each row is updated.

CREATE or REPLACE TRIGGER Before_Upddate_Row_product

BEFORE

UPDATE ON product

FOR EACH ROW

BEGIN

INSERT INTO product_check

Values('Before update row level',sysdate);

231
END;

3) AFTER UPDATE, Statement Level: This trigger will insert a record into the table
'product_check' after a sql update statement is executed, at the statement level.

CREATE or REPLACE TRIGGER After_Update_Stat_product

AFTER

UPDATE ON product

BEGIN

INSERT INTO product_check

Values('After update, statement level', sysdate);

End;

4) AFTER UPDATE, Row Level: This trigger will insert a record into the table
'product_check' after each row is updated.

CREATE or REPLACE TRIGGER After_Update_Row_product

AFTER

insert On product

FOR EACH ROW

BEGIN

232
INSERT INTO product_check

Values('After update, Row level',sysdate);

END;

Now lets execute a update statement on table product.

UPDATE PRODUCT SET unit_price = 800

WHERE product_id in (100,101);

Lets check the data in 'product_check' table to see the order in which the trigger is fired.

SELECT * FROM product_check;

Output:

Mesage Current_Date

------------------------------------------------------------

Before update, statement level 26-Nov-2008


Before update, row level 26-Nov-2008
After update, Row level 26-Nov-2008
Before update, row level 26-Nov-2008
After update, Row level 26-Nov-2008
After update, statement level 26-Nov-2008

The above result shows 'before update' and 'after update' row level events have occured
twice, since two records were updated. But 'before update' and 'after update' statement
level events are fired only once per sql statement.

233
The above rules apply similarly for INSERT and DELETE statements.

How to know Information about Triggers.

We can use the data dictionary view 'USER_TRIGGERS' to obtain information about
any trigger.

The below statement shows the structure of the view 'USER_TRIGGERS'

DESC USER_TRIGGERS;

NAME Type

--------------------------------------------------------

TERIGGR_NAME VARCHAR2(30)
TRIGGER_TYPE VARCHAR2(16)
TRIGGER_EVENT VARCHAR2(75)
TABLE_OWNER VARCHAR2(30)
BASE_OBJECT_TYPE VARCHAR2(16)
TABLE_NAME VARCHAR2(30)
COLUMN_NAME VARCHAR2(4000)
REFERENCING_NAMES VARCHAR2(128)
WHEN_CLAUSE VARCHAR2(4000)
STATUS VARCHAR2(8)
DESCRIPTION VARCHAR2(4000)
ACTION_TYPE VARCHAR2(11)
TRIGGER_BODY LONG

This view stores information about header and body of the trigger.

SELECT * FROM user_triggers WHERE trigger_name =


'Before_Update_Stat_product';

234
The above sql query provides the header and body of the trigger
'Before_Update_Stat_product'.

You can drop a trigger using the following command.

DROP TRIGGER trigger_name;

Section 4: Stored procedure and Function in PL/SQL

Section objectives:
Upon Completing this section, you would be able to

 Understand stored procedure and function in PL/SQL


 Execute stored procedure and function in PL/SQL.
 Use parameters in stored procedure and function in PL/SQL

Stored Procedure

A stored procedure or in simple a proc is a named PL/SQL block which performs one
or more specific task. This is similar to a procedure in other programming languages. A
procedure has a header and a body. The header consists of the name of the procedure and
the parameters or variables passed to the procedure. The body consists or declaration
section, execution section and exception section similar to a general PL/SQL Block. A
procedure is similar to an anonymous PL/SQL Block but it is named for repeated usage.

In stored procedure we can use parameters in three ways:.

1) IN-parameters
2) OUT-parameters
3) IN OUT-parameters

A procedure may or may not return any value.

General Syntax to create a procedure is:

235
CREATE [OR REPLACE] PROCEDURE proc_name [list of parameters]

IS

Declaration section

BEGIN

Execution section

EXCEPTION

Exception section

END;

IS - marks the beginning of the body of the procedure and is similar to DECLARE in
anonymous PL/SQL Blocks. The code between IS and BEGIN forms the Declaration
section.

The syntax within the brackets [ ] indicate they are optional. By using CREATE OR
REPLACE together the procedure is created if no other procedure with the same name
exists or the existing procedure is replaced with the current code.

The following example illustrate how to creates a procedure „employer_details‟ which


gives the details of the employee.

1> CREATE OR REPLACE PROCEDURE employer_details

2> IS

3> CURSOR emp_cur IS

4> SELECT first_name, last_name, salary FROM emp_tbl;

5> emp_rec emp_cur%rowtype;

236
6> BEGIN

7> FOR emp_rec in sales_cur

8> LOOP

9> dbms_output.put_line(emp_cur.first_name || ' ' ||emp_cur.last_name

10> || ' ' ||emp_cur.salary);

11> END LOOP;

12>END;

13> /

How to execute a Stored Procedure?

Execute a procedure is possible in two different ways:.

1) From the SQL prompt.

EXECUTE [or EXEC] procedure_name;

2) Within another procedure – simply use the procedure name.

procedure_name;

NOTE: In the examples given above, we are using backward slash „/‟ at the end of the
program. This indicates the oracle engine that the PL/SQL program has ended and it can
begin processing the statements.

237
Function in PL/SQL

A function is a named PL/SQL Block which is similar to a procedure. The major


difference between a procedure and a function is, a function must always return a value,
but a procedure may or may not return a value.

The General Syntax to create a function is:

CREATE [OR REPLACE] FUNCTION function_name [parameters]

RETURN return_datatype;

IS

Declaration_section

BEGIN

Execution_section

Return return_variable;

EXCEPTION

exception section

Return return_variable;

END;

Return Type: The header section defines the return type of the function. The return
datatype can be any of the oracle datatype like varchar, number etc.

238
The execution and exception section both should return a value which is of the
datatype defined in the header section.

The following example illustrate ho to create a function called ''employer_details_func'


similar to the one created in stored proc

1> CREATE OR REPLACE FUNCTION employer_details_func

2> RETURN VARCHAR(20);

3> IS

5> emp_name VARCHAR(20);

6> BEGIN

7> SELECT first_name INTO emp_name

8> FROM emp_tbl WHERE empID = '100';

9> RETURN emp_name;

10> END;

11> /

In the example we are retrieving the „first_name‟ of employee with empID 100 to
variable „emp_name‟.
The return type of the function is VARCHAR which is declared in line no 2.
The function returns the 'emp_name' which is of type VARCHAR as the return value in
line no 9.

239
How to execute a PL/SQL Function?

Execute a function is possible in two different ways:.

1) Since a function returns a value we can assign it to a variable.

employee_name := employer_details_func;

If „employee_name‟ is of datatype varchar we can store the name of the employee by


assigning the return type of the function to it.

2) As a part of a SELECT statement

SELECT employer_details_func FROM dual;

3) In a PL/SQL Statements like,

dbms_output.put_line(employer_details_func);

This line displays the value returned by the function.

Passing Parameters in Procedure and Functions

In PL/SQL, we can pass parameters to procedures and functions in three ways.

1) IN type parameter: These types of parameters are used to send values to stored
procedures.
2) OUT type parameter: These types of parameters are used to get values from stored
procedures. This is similar to a return type in functions.
3) IN OUT parameter: These types of parameters are used to send values and get
values from stored procedures.

NOTE: If a parameter is not explicitly defined a parameter type, then by default it is an


IN type parameter.

240
1) IN parameter:

This is similar to passing parameters in programming languages. We can pass values to


the stored procedure through these parameters or variables. This type of parameter is a
read only parameter. We can assign the value of IN type parameter to a variable or use it
in a query, but we cannot change its value inside the procedure.

The General syntax to pass a IN parameter is

CREATE [OR REPLACE] PROCEDURE procedure_name (

param_name1 IN datatype, param_name12 IN datatype ... )

 param_name1, param_name2... are unique parameter names.


 datatype - defines the datatype of the variable.
 IN - is optional, by default it is a IN type parameter.

2) OUT Parameter:

The OUT parameters are used to send the OUTPUT from a procedure or a function. This
is a write-only parameter i.e, we cannot pass values to OUT paramters while executing
the stored procedure, but we can assign values to OUT parameter inside the stored
procedure and the calling program can recieve this output value.

The General syntax to create an OUT parameter is

CREATE [OR REPLACE] PROCEDURE proc2 (param_name OUT datatype)

The parameter should be explicitly declared as OUT parameter.

241
3) IN OUT Parameter:

The IN OUT parameter allows us to pass values into a procedure and get output values
from the procedure. This parameter is used if the value of the IN parameter can be
changed in the calling program.

By using IN OUT parameter we can pass values into a parameter and return a value to
the calling program using the same parameter. But this is possible only if the value
passed to the procedure and output value have a same datatype. This parameter is used if
the value of the parameter will be changed in the procedure.

The General syntax to create an IN OUT parameter is

CREATE [OR REPLACE] PROCEDURE proc3 (param_name IN OUT datatype)

The below examples show how to create stored procedures using the above three types
of parameters.

Example1:

Using IN and OUT parameter:

Let‟s create a procedure which gets the name of the employee when the employee id is
passed.

1> CREATE OR REPLACE PROCEDURE emp_name (id IN NUMBER, emp_name


OUT NUMBER)

2> IS

3> BEGIN

4> SELECT first_name INTO emp_name

5> FROM emp_tbl WHERE empID = id;

242
6> END;

7> /

We can call the procedure „emp_name‟ in this way from a PL/SQL Block.

1> DECLARE

2> empName varchar(20);

3> CURSOR id_cur SELECT id FROM emp_ids;

4> BEGIN

5> FOR emp_rec in id_cur

6> LOOP

7> emp_name(emp_rec.id, empName);

8> dbms_output.putline('The employee ' || empName || ' has id ' || emp-rec.id);

9> END LOOP;

10> END;

11> /

In the above PL/SQL Block in line no 3; we are creating a cursor „id_cur‟ which
contains the employee id.
In line no 7; we are calling the procedure „emp_name‟, we are passing the „id‟ as IN
parameter and „empName‟ as OUT parameter. In line no 8; we are displaying the id and
the employee name which we got from the procedure „emp_name‟.

Example 2:

243
Using IN OUT parameter in procedures:

1> CREATE OR REPLACE PROCEDURE emp_salary_increase

2> (emp_id IN emptbl.empID%type, salary_inc IN OUT emptbl.salary%type)

3> IS

4> tmp_sal number;

5> BEGIN

6> SELECT salary

7> INTO tmp_sal

8> FROM emp_tbl

9> WHERE empID = emp_id;

10> IF tmp_sal between 10000 and 20000 THEN

11> salary_inout := tmp_sal * 1.2;

12> ELSIF tmp_sal between 20000 and 30000 THEN

13> salary_inout := tmp_sal * 1.3;

14> ELSIF tmp_sal > 30000 THEN

15> salary_inout := tmp_sal * 1.4;

16> END IF;

17> END;

18> /

244
The below PL/SQL block shows how to execute the above 'emp_salary_increase'
procedure.

1> DECLARE

2> CURSOR updated_sal is

3> SELECT empID,salary

4> FROM emp_tbl;

5> pre_sal number;

6> BEGIN

7> FOR emp_rec IN updated_sal LOOP

8> pre_sal := emp_rec.salary;

9> emp_salary_increase(emp_rec.empID, emp_rec.salary);

10> dbms_output.put_line('The salary of ' || emp_rec.empID ||

11> ' increased from '|| pre_sal || ' to '||emp_rec.salary);

12> END LOOP;

13> END;

14> /

245
Section 5: Exception handling in PL/SQL

Section objectives:
Upon Completing this section, you would be able to

 Understand what is exception handling


 Different types of exception handling.
 Identify structure of exception handling

Exception Handling

PL/SQL provides a feature to handle the Exceptions which occur in a PL/SQL Block
known as exception Handling. Using Exception Handling we can test the code and avoid
it from exiting abruptly. When an exception is occurs, messages which explain its cause
is received.
PL/SQL Exception message consists of three parts.

 Type of Exception
 An Error Code
 A message

By handling the exceptions we can ensure a PL/SQL block does not exit abruptly.

The General Syntax for coding the exception section

DECLARE

Declaration section

BEGIN

Exception section

EXCEPTION

246
WHEN ex_name1 THEN

-Error handling statements

WHEN ex_name2 THEN

-Error handling statements

WHEN Others THEN

-Error handling statements

END;

General PL/SQL statements can be used in the Exception Block.

When an exception is raised, Oracle searches for an appropriate exception handler in the
exception section. For example in the above example, if the error raised is 'ex_name1 ',
then the error is handled according to the statements under it. Since, it is not possible to
determine all the possible runtime errors during testing for the code, the 'WHEN Others'
exception is used to manage the exceptions that are not explicitly handled. Only one
exception can be raised in a Block and the control does not return to the Execution
Section after the error is handled.

Nested PL/SQL blocks.

DELCARE

Declaration section

BEGIN

DECLARE

Declaration section

247
BEGIN

Execution section

EXCEPTION

Exception section

END;

EXCEPTION

Exception section

END;

In the above case, if the exception is raised in the inner block it should be handled in the
exception block of the inner PL/SQL block else the control moves to the Exception
block of the next upper PL/SQL Block. If none of the blocks handle the exception the
program ends abruptly with an error.

Types of Exception handling.

There are 3 types of Exceptions.

 Named System Exceptions


 Unnamed System Exceptions
 User-defined Exceptions

Named System Exceptions

System exceptions are automatically raised by Oracle, when a program violates a


RDBMS rule. There are some system exceptions which are raised frequently, so they are
pre-defined and given a name in Oracle which is known as Named System Exceptions.

248
For example: NO_DATA_FOUND and ZERO_DIVIDE are called Named System
exceptions.

Named system exceptions are caused by:

 Not Declared explicitly,


 Raised implicitly when a predefined Oracle error occurs
 Caught by referencing the standard name within an exception-handling routine.

Exception Name Reason Error


Number

CURSOR_ALREADY_OPEN When you open a cursor that is already open. ORA-


06511

INVALID_CURSOR When you perform an invalid operation on a ORA-


cursor like closing a cursor, fetch data from a 01001
cursor that is not opened.

NO_DATA_FOUND When a SELECT...INTO clause does not ORA-


return any row from a table. 01403

TOO_MANY_ROWS When you SELECT or fetch more than one ORA-


row into a record or variable. 01422

ZERO_DIVIDE When you attempt to divide a number by ORA-


zero. 01476

For Example: Suppose a NO_DATA_FOUND exception is raised in a proc, we can


write a code to handle the exception as given below.

BEGIN

Execution section

EXCEPTION

249
WHEN NO_DATA_FOUND THEN

dbms_output.put_line ('A SELECT...INTO did not return any row.');

END;

Unnamed System Exceptions

Those system exception for which oracle does not provide a name is known as unnamed
system exception. These exceptions do not occur frequently. These Exceptions have a
code and an associated message.

There are two ways to handle unnamed system exceptions:


1. By using the WHEN OTHERS exception handler, or
2. By associating the exception code to a name and using it as a named exception.

We can assign a name to unnamed system exceptions using EXCEPTION_INIT.


EXCEPTION_INIT will associate a predefined Oracle error number to a programmer
defined exception name.

Steps to be followed to use unnamed system exceptions are:

 They are raised implicitly.


 If they are not handled in WHEN others they must be handled explicitly.
 To handle the exception explicitly, they must be declared using
EXCEPTION_INIT as given above and handled referencing the user-defined
exception name in the exception section.

The general syntax to declare unnamed system exception using EXCEPTION_INIT is:

DECLARE

exception_name EXCEPTION;

PRAGMA

250
EXCEPTION_INIT (exception_name, Err_code);

BEGIN

Execution section

EXCEPTION

WHEN exception_name THEN

handle the exception

END;

For Example: Lets consider the product table and order_items table from sql joins.

Here product_id is a primary key in product table and a foreign key in order_items table.
If we try to delete a product_id from the product table when it has child records in
order_id table an exception will be thrown with oracle code number -2292.
We can provide a name to this exception and handle it in the exception section as given
below.

DECLARE

Child_rec_exception EXCEPTION;

PRAGMA

EXCEPTION_INIT (Child_rec_exception, -2292);

BEGIN

Delete FROM product where product_id= 104;

EXCEPTION

251
WHEN Child_rec_exception

THEN Dbms_output.put_line('Child records are present for this product_id.');

END;

User-defined Exceptions

Apart from system exceptions we can explicitly define exceptions based on business
rules. These are known as user-defined exceptions.

Steps to be followed to use user-defined exceptions:

 They should be explicitly declared in the declaration section.


 They should be explicitly raised in the Execution Section.
 They should be handled by referencing the user-defined exception name in the
exception section.

For Example: Lets consider the product table and order_items table from sql joins to
explain user-defined exception.
Lets create a business rule that if the total no of units of any particular product sold is
more than 20, then it is a huge quantity and a special discount should be provided.

DECLARE

huge_quantity EXCEPTION;

CURSOR product_quantity is

SELECT p.product_name as name, sum(o.total_units) as units

FROM order_tems o, product p

252
WHERE o.product_id = p.product_id;

quantity order_tems.total_units%type;

up_limit CONSTANT order_tems.total_units%type := 20;

message VARCHAR2(50);

BEGIN

FOR product_rec in product_quantity LOOP

quantity := product_rec.units;

IF quantity > up_limit THEN

message := 'The number of units of product ' || product_rec.name ||

' is more than 20. Special discounts should be provided.

Rest of the records are skipped. '

RAISE huge_quantity;

ELSIF quantity < up_limit THEN

v_message:= 'The number of unit is below the discount limit.';

END IF;

dbms_output.put_line (message);

END LOOP;

EXCEPTION

WHEN huge_quantity THEN

253
dbms_output.put_line (message);

END;

RAISE_APPLICATION_ERROR ( )

RAISE_APPLICATION_ERROR is a built-in procedure in oracle which is used to


display the user-defined error messages along with the error number whose range is in
between -20000 and -20999.

Whenever a message is displayed using RAISE_APPLICATION_ERROR, all previous


transactions which are not committed within the PL/SQL Block are rolled back
automatically (i.e. change due to INSERT, UPDATE, or DELETE statements).

RAISE_APPLICATION_ERROR raises an exception but does not handle it.

RAISE_APPLICATION_ERROR is used for the following reasons,


a) to create a unique id for an user-defined exception.
b) to make the user-defined exception look like an Oracle error.

The General Syntax to use this procedure is:

RAISE_APPLICATION_ERROR (error_number, error_message);

 The Error number must be between -20000 and -20999


 The Error_message is the message you want to display when the error occurs.

Steps to be followed to use RAISE_APPLICATION_ERROR procedure:

 Declare a user-defined exception in the declaration section.


 Raise the user-defined exception based on a specific business rule in the
execution section.

254
 Finally, catch the exception and link the exception to a user-defined error number
in
 RAISE_APPLICATION_ERROR. Using the above example we can display an
error message using RAISE_APPLICATION_ERROR.

DECLARE

huge_quantity EXCEPTION;

CURSOR product_quantity is

SELECT p.product_name as name, sum(o.total_units) as units

FROM order_tems o, product p

WHERE o.product_id = p.product_id;

quantity order_tems.total_units%type;

up_limit CONSTANT order_tems.total_units%type := 20;

message VARCHAR2(50);

BEGIN

FOR product_rec in product_quantity LOOP

quantity := product_rec.units;

IF quantity > up_limit THEN

RAISE huge_quantity;

ELSIF quantity < up_limit THEN

v_message:= 'The number of unit is below the discount limit.';

255
END IF;

Dbms_output.put_line (message);

END LOOP;

EXCEPTION

WHEN huge_quantity THEN

raise_application_error(-2100, 'The number of unit is above the discount limit.');

END;

256
UNIT SEVEN
FUNCTIONAL DEPENDENCE AND NORMALIZATION
Objective:

After working through this unit, you should be able to:

 Understand functional dependence


 Identify different types of normalization and its application

Introduction

Database normalization can essentially be defined as the practice of optimizing table structures.
Optimization is accomplished as a result of a thorough investigation of the various pieces of data
that will be stored within the database, in particular concentrating upon how this data is
interrelated. An analysis of this data and its corresponding relationships is advantageous because
it can result both in a substantial improvement in the speed in which the tables are queried, and
in decreasing the chance that the database integrity could be compromised due to tedious
maintenance procedures.

So In this unit, you will learn how to improve the database through normalization

Section 1 Functional Dependency and normalization

Section objectives:

Upon Completing this section, you would be able to:

 Identify the existence of dependence between/among entities


 Familiarized with different types of normalization.
 Know when how to apply a given normalization

257
Functional Dependency and normailzation

Normalization is the process of efficiently organizing data in a database. There are two goals of
the normalization process: eliminating redundant data (for example, storing the same data in
more than one table) and ensuring data dependencies make sense (only storing related data in a
table). Both of these are worthy goals as they reduce the amount of space a database consumes
and ensure that data is logically stored.

In otherworld normalization is a process that serves to reduce, if not eliminate, redundancy,


inaccuracy, consistency and concurrency of data.

Functional Dependency and Determinants

Before we develop the ideas of normalization further, it is important for you to have an
understanding of "functional dependency." The essence of this idea is that if the existence of
something, call it A, implies that B must exist and have a certain value, then we say that "B is
functionally dependent on A." We also often express this idea by saying that "A determines B,"
or that "B is a function of A," or that "A functionally governs B." Often, the notions of
functionality and functional dependency are expressed briefly by the statement, "If A, then B." It
is important to note that the value B must be unique for a given value of A, i.e., any given value
of A must imply just one and only one value of B, in order for the relationship to qualify for the
name "function." (However, this does not necessarily prevent different values of A from
implying the same value of B.)

For the terminology of relational databases, the word "function" was borrowed from
mathematics, where it is common to say things like "y is a function of x" or "y = f(x)". (The
latter expression is read "y equals f of x".) The determining value, x, is called the argument; the
determined value, y or f(x), is called the result.

The expression "y = f(x)" is a very general, and abstract, way of talking about functionality.
Outside of mathematics--and, in particular, ordinarily in relational database management--we
talk not abstractly but in terms of particular examples. (Indeed, the general idea of a "function" is
best understood when one has seen enough examples of specific functions to be able to start
generalizing about the abstract, or general, properties that the specific functions share.)

258
Here are some examples of functions. An easy one is y = x2. This particular function says that if
we are given a particular value for x, say 3, then we must say that y has the value 9. (We could
also write y = f(x) = x2 or just f(x) = x2.) Another easy one is: y = x3. This particular function
says that if we are given a particular value for x, say -2, then we must say that y has the value -8.

A common way of indicating functions is to place the determining and determined values side by
side in a table. Thus we can place sample values of the function, y = x2, in a table like the one
shown here.

Value of x ("argument," or "A") Value of y = x2 ("the function," or "the


result", or "B")
3 9
4 16
-3 9

The above table shows just three of the infinity of possible pairs of values, x and y, for the
function y = x2. It also shows that for some functions, different values of x (here, 3 and -3) imply
the same value (here, 9) of the function.

The functions we have given as examples so far have been functions that are specified by an
algebraic function. But the idea of function is more general; i.e., functions need not be
algebraically defined. The essence of the idea of function is that to a specified determining value
corresponds a unique determined value. This essence can be defined, among other ways, by
placing the determining and determined values in a table that displays and/or defines the
relationship between the argument and the result.
Note that the table above displays, but does not fully define, the relationship, y = x2. This
function, since it has an infinite number of pairs of values, cannot be fully defined in a table. For
functions that involve only a finite number of pairs of values of argument and result, a table is
often a convenient way--and may in fact be the only way--of displaying and, at the same time,
defining the function.

In general, a functional dependency is a relationship among attributes. In relational databases, we


can have a determinant that governs one other attribute or several other attributes. To go back to
our mathematical examples for a moment, we could view the situation of functional dependency
of several attributes on one determinant as being like having several linked functions that share

259
an argument and can be displayed economically in just one table. For example, consider the

Value of x Value of x2 Value of x3 Value of x4


3 9 27 81
4 16 64 256
-3 9 -64 81
2 3
following table that displays sample values of the algebraic functions y = x , y = x , and y = x4.

Looking the below table from the relational-database point of view, we can say that the attributes
x2, x3, and x4 are all functionally dependent on the attribute x.

Now we ready to come to grips with the ideas of normalization. The following table, containing
information about some students at Jimma University, is a table that is in 1st Normal Form,
1NF.

Table 1

ID No First name Second name Major


Reg/234/2001 Abebe Solomon Medical science
Reg/111/2001 kebede Alemu Mechanical eng.
Reg/330/2001 Aster Tesfaye Law
Reg/567/2001 Dejegn Temame Law
Reg/291/2001 Aster Solomon Medical

You can easily verify for yourself that the above table satisfies the definition of 1NF: viz., it has
no duplicated rows; each cell is single-valued (i.e., there are no repeating groups or arrays); and
all the entries in a given column are of the same kind.

In Table 1 we can see that the key, ID No, functionally determines the other attributes; i.e., a
given Id number implies (determines) a particular value for each of the attributes FirstName,
LastName, and Major (assuming, at least for the moment, that a student is allowed to have only
one major). In the arrow notation: ID No ? FirstName, SSN ? LastName, and SSN ? Major.

A key attribute will, by the definition of key, uniquely determine the values of the other
attributes in a table; i.e., all non-key attributes in a table will be functionally dependent on the

260
key. But there may be non-key attributes in a table that determine other attributes in that table.
Consider the following table containing information about some students at Enormous State
University,:

Table 2

First name Second name Major Level


Jack Jones LIS Graduate
Lynn Lee LIS Graduate
Marry Ruize Pre-medicine Undergraduate
Lynn Smith Pre-Law Undergraduate
Jane Jones LIS Graduate

In Table 2 above the Level attribute can be said to be functionally dependent on the Major
attribute. Thus we have an example of an attribute that is functionally dependent on a non-key
attribute. This statement is true in the table above table, and that is all that the definition of
functional dependence requires; but the statement also reflects the real-world fact that LIS is a
major that is open only to graduate students and that Pre-Medicine and Pre-Law are majors that
are open only to undergraduate students.

The 2nd Normal Form (2NF)

Table 2 above has another interesting aspect. Its key is a composite key, consisting of the paired
attributes, FirstName and LastName. The Level attribute is functionally dependent on this
composite key, of course; but, in addition, Level can be seen to be dependent on only the
attribute LastName. (This is true because each value of Level is paired with a distinct value of
LastName. In contrast, there are two occurrences of the value Lynn for the attribute FirstName,
and the two Lynns are paired with different values of Level, so Level is not functionally
dependent on FirstName.) Thus this table fails to qualify as a 2nd Normal Form table, since the
definition of 2NF requires that all non-key attributes be dependent on all of the key. (Admittedly,
this example of a partial dependency is artificially contrived, but nevertheless it illustrates the
problem of partial dependency.)

We can turn Table 2 above into a table in 2NF in an easy way, by adding a column for the Social
Security Number (SSN) which will then be the natural thing to use as the key.

261
Table 3

SSN First name Second name Major Level


123-45-6789 Jack Jones LIS Graduate
222-33-4444 Lynn Lee LIS Graduate
987-65-4321 Marry Ruize Pre-medicine Undergraduate
123-54-3210 Lynn Smith Pre-Law Undergraduate
111-33-5555 Jane Jones LIS Graduate

With the SSN defined as the key, Table 3 is in 2NF, as you can easily verify. This illustrates the
fact that any table that is in 1NF and has a single-attribute (i.e., a non-composite) key is
automatically also in 2NF.

Table 3 above still exhibits some problems, however. For example, it contains some repeated
information about the LIS-Graduate pairing.

Anomalies and Normalization

At this point it is appropriate to note that the main thrust behind the idea of normalizing
databases is the avoidance of insertion and deletion anomalies in databases.

To illustrate the idea of anomalies, consider what would happen to our knowledge (at least, as
explicitly contained in a table) of the level of the major, Pre-Medicine, if Mary Ruiz left
Enormous State University. With the deletion of the row for Ms. Ruiz, we would lose the
information that Pre-Medicine is an Undergraduate major. This is an example of a deletion
anomaly. We may possess the real-world information that Pre-Medicine is an Undergraduate
major, but no such information is explicitly contained in a table in our database.

As an example of an insertion anomaly, we can suppose that a new student wants to enroll in
ESU: e.g., suppose Jane Doe wants to major in Public Affairs. From the information in Table 3
above we cannot tell whether Public Affairs is an Undergraduate or a Graduate major; in fact, we
do not even know whether Public Affairs is an established major at ESU. We do not know
whether it is permissible to insert the value, Public Affairs, as a value of the attribute, Major, or
what to insert for the attribute, Level, if we were to assume that Public Affairs is a valid value
for Major. The point is that while we may possess real-world information about whether Public

262
Affairs is a major at ESU and what its level is, this information is not explicitly contained in any
table that we have thus far mentioned as part of our database.

A database-management system, a DBMS, can work only with the information that we put
explicitly into its tables for a given database and into its rules for working with those tables,
where such rules are appropriate and possible.

How do anomalies relate to normalization? The simple answer is that by arranging that the tables
in a database are sufficiently normalized (in practice, this typically means to at least the 4th level
of normalization), we can ensure that anomalies will not arise in our database. Anomalies are
difficult to avoid directly, because with databases of typical complexity (i.e., several tables) the
database designer can easily overlook possible problems. Normalization offers a rigorous way of
avoiding unrecognized anomalies.

Normalization may look like a difficult process when one views it from the standpoint of the
formal definitions of the various normal forms. But in practice, you can easily attain sufficient
normalization in your database by simply ensuring that the tables in your database are what we
can call "single-theme" tables. This idea will be illustrated as we proceed through the rest of the
discussion in this handout.

Turning a Table with Anomalies (Table 3 above ) into Single-Theme Tables

Although Table 3 is in 2NF, it is still open to the problems of insertion and deletion anomalies,
as the discussion in the preceding section shows. The reason is that Table 3 deals with more than
a single theme. What can we do to turn it into a set of tables that are, or at least come closer to
being, single-theme tables?

A reasonable way to proceed is to note that Table 3 deals with both information about students
(their names and SSNs) and information about majors and levels. This should strike you as two
different themes. Presented below is one possible set of single-theme tables dealing with the
information in Table 3 (To save space, the following tables also contain some information that is
not in Table 3 and the discussion will deal with this added information.)

Table 4

263
SSN First name Second name
123-45-6789 Jack Jones
222-33-4444 Lynn Lee
987-65-4321 Marry Ruize
123-54-3210 Lynn Smith
111-33-5555 Jane Jones

Table 5

Major Level
LIS Graduate
LIS Graduate
Pre-medicine Undergraduate
Pre-Law Undergraduate
LIS Graduate

Table 6

SSN Major
123-45-6789 LIS
222-33-4444 LIS
987-65-4321 Pre-medicine
123-54-3210 Pre-Law
111-33-5555 LIS

The three preceding tables should strike you as providing a better arrangement of the information
in Table 3 above. For one thing, this arrangement puts the information about the students into a
smaller table, Table 4, which happily fails to contain redundant information about the LIS-
Graduate pairing. For another thing, this arrangement permits us to enter information about
students (e.g., Newton Gingpoor) who have not yet identified themselves as pursuing a particular
major. For still another thing, it puts the information about the Major-Level pairings into a
separate table, Table 5 above , which can easily be expanded to include information (e.g., that
the Public Affairs major is at the Graduate level) about majors for which, at the moment, there
may be no students registered. Finally, Table 6 above provides the needed link between

264
individual students and their majors (note that Newton Gingpoor's SSN is not in this Table 6,
which tells us that he has not yet selected a major).

Tables 4 - 6 are single-theme tables and are in 2NF, as you can easily verify. (In fact, they are in
DKNF, but we are not yet ready to discuss the latter level in detail.).

The 3rd Normal Form (3NF)

In order to discuss the 3rd Normal Form, we need to begin by discussing the idea of transitive
dependencies.

In mathematics and logic, a transitive relationship is a relationship of the following form: "If A
implies B, and if also B implies C, then A implies C." An example is: "If John Doe is a human,
and if every human is a primate, then John Doe must be a primate." Another way of putting it is
this: "If A functionally governs B, and if B functionally governs C, then A functionally governs
C." In the arrow notation, we have:

[(A ? B) and (B ? C)] ? (A ? C)]

The following table, Table 7, provides an example of how transitive dependencies can occur in a
table in a relational database.

Author Author First Book Title Subject Collection or Building


Last Name Name Library
Berdahl Robert The Politics of the History PCL General Perry-
Prussian Nobility Stacks CastaLibrary
Yudof Mark Child Abuse and Legal Law Library Townes Hall
Neglect Procedures
Harmon Glynn Human Memory and Cognitive PCL General Perry-
Knowledge Psychology Stacks CastaLibrary
Graves Robert The Golden Fleece Greek Classics Waggener
Literature Library Hall
Miksa Francis Charles Ammi Cutter Library Library and Perry-
Biography Information CastaLibrary
Science

265
Collection
Hunter David Music Publishing and Music Fine Arts Fine Arts
Collecting Literature Library Building
Graves Robert English and Scottish Folksong PCL General Perry-
Ballads Stacks CastaLibrary

By examining Table 7 above we can infer that books dealing with history, cognitive psychology,
and folksong are assigned to the PCL General Stacks collection; that books dealing with legal
procedures are assigned to the Law Library; that books dealing with Greek literature are assigned
to the Classics Library; that books dealing with library biography are assigned to the Library and
Information Science Collection (LISC);and that books dealing with music literature are assigned
to the Fine Arts Library.

Further, we can infer that the PCL General Stacks collection and the LISC are both housed in the
Perry-Casta Library (PCL) building; that the Classics Library is housed in Waggener Hall; and
that the Law Library and Fine Arts Library are housed, respectively, in Townes Hall and the Fine
Arts Building.

Thus we see that there is a transitive dependency in Table 7 above: any book that deals with
history, cognitive psychology, or library biography will be physically housed in the PCL
building (unless it is temporarily checked out to a borrower); any book dealing with legal
procedures will be housed in Townes Hall; and so on. In short, if we know what subject a book
deals with, we also know not only what library or collection it will be assigned to but also what
building it is physically housed in.

What is wrong with having a transitive dependency or dependencies in a table? For one thing,
there is duplicated information: from three different rows we can see that the PCL General
Stacks are in the PCL building. For another thing, we have possible deletion anomalies: if the
Yudof book were lost and its row removed from Table 7 above, we would lose the information
that books on legal procedures are assigned to the Law Library and also the information the Law
Library is in Townes Hall. As a third problem, we have possible insertion anomalies: if we
wanted to add a chemistry book to the table, we would find that Table 7 above nowhere contains
the fact that the Chemistry Library is in Robert A.Welch Hall. As a fourth problem, we have the
chance of making errors in updating: a careless data-entry clerk might add a book to the LISC
but mistakenly enter Townes Hall in the building column.

266
The solution to the problem is, once again, to place the information in Table 7 above into
appropriate single-theme tables. Here is one such possible arrangement:

Table 8

Author Last Author First Name Book Title


Name
Berdahl Robert The Politics of the Prussian
Nobility
Yudof Mark Child Abuse and Neglect
Harmon Glynn Human Memory and
Knowledge
Graves Robert The Golden Fleece
Miksa Francis Charles Ammi Cutter
Hunter David Music Publishing and
Collecting
Graves Robert English and Scottish Ballads

Table 9

Book Title Subject


The Politics of the Prussian History
Nobility
Child Abuse and Neglect Legal
Procedures
Human Memory and Cognitive
Knowledge Psychology
The Golden Fleece Greek
Literature
Charles Ammi Cutter Library
Biography
Music Publishing and Music
Collecting Literature
English and Scottish Ballads Folksong

267
Table 10

Subject Collection or Library

History PCL General Stacks

Legal Procedures Law Library


Cognitive Psychology PCL General Stacks
Greek Literature Classics Library
Library Biography Library and Information Science Collection
Music Literature Fine Arts Library
Folksong PCL General Stacks

Table 11

Collection or Library Building


PCL General Stacks Perry-CastaLibrary

Law Library Townes Hall


PCL General Stacks Perry-CastaLibrary
Classics Library Waggener Hall

Library and Information Science Perry-CastaLibrary


Collection
Fine Arts Library Fine Arts Building
PCL General Stacks Perry-CastaLibrary

Table 12

Collection or Library Building


PCL General Stacks Perry-CastaLibrary

Law Library Townes Hall


PCL General Stacks Perry-CastaLibrary

Classics Library Waggener Hall

Library and Information Perry-CastaLibrary


Science Collection

268
Fine Arts Library Fine Arts Building
PCL General Stacks Perry-CastaLibrary

You can verify for yourself that none of these tables contains a transitive dependency; hence, all
of them are in 3NF (and, in fact, in DKNF).

We can note in passing that the fact that Table 8 contains the first and last names of Robert
Graves in two different rows suggests that it might be worthwhile to replace it with two further
tables, along the lines of:

Table 13

Author Last Name Author First Name Author Identification Number

Berdahl Robert 001


Yudof Mark 002
Harmon Glynn 003
Graves Robert 004
Miksa Francis 005
Hunter David 006

Table 14

Author Identification Number Book Title


001 The Politics of the Prussian
Nobility
002 Child Abuse and Neglect
003 Human Memory and
Knowledge
004 The Golden Fleece
005 Charles Ammi Cutter
006 Music Publishing and
Collecting
004 English and Scottish Ballads

269
Though Tables 13 and 14 together take a little more space than Table 8, it is easy to see that
given a much larger collection, in which there would be many more authors with multiple works
to their credit, Tables 13 and 14 would be more economical of storage space than Table 8.
Furthermore, the structure of Tables 13 and 14 lessens the chance of making updating errors
(e.g., typing Grave instead of Graves, or Miska instead of Miksa).

270
Summary of Normalization

First Normal Form (1st NF)

• The table cells must be of single value.


• Eliminate repeating groups in individual tables.
• Create a separate table for each set of related data.
• Identify each set of related data with a primary key.

Do not use multiple fields in a single table to store similar data. For example, to track an
inventory item that may come from two possible sources, an inventory record may contain fields
for

Vendor Code 1 and Vendor Code 2. But what happens when you a dd a third vendor? Adding a
field is not the answer; it requires program and table modifications and does not smoothly
accommodate a dynamic number of vendors. Instead, place all vendor information in a separate
table called Vendors, then link inventory t o vendors with an item number key, or vendors to
inventory with a vendor code key.

Second Normal Form (2nd NF)

If it's in 1st NF and if the Primary Key is composite (multiple columns) then any fact in that
table must be a fact about the entire composite Primary Key not just part of the Primary Key.

For example, if an inventory table has a primary key made up of two attributes PartId and
WarehouseId. Suppose the inventory table has the warehouse address in it, since warehouse
address is a fact about WarehoseId and not about the PartId the w arehouse address is in the
wrong table. This is in violation of the 2nd Normal Form.

Third Normal Form (3rd NF)

• If it's in the 2nd NF and there are no non-key fields that depend on attributes in the table other
than the Primary Key. Suppose in the Student table you had student birth date as an attribute and
you also had student's age. Students age depends on the student's birth date (a fact about his/her
birth date) so 3rd Normal Form is violated.

271

You might also like