Database Module
Database Module
Structures
1
Introduction
All of us are familiar with the term data. In fact, unknowingly we come across data in our day to
day life every day. The age of a person, price of potato, number of students in a school, pin code
of a city, etc. are some examples of data. In our life we have to remember so much of data. But it
is easier for us to remember all information for a few individuals. For example, you may be in a
position to tell accurately the age, height, complexion, income, educational qualification,
residential address, etc. of your close friends. But it is too difficult for you to memorize all these
information for a large number of individuals. Let us consider the example of National Open
School (NOS). Every year about one lakh students take admission in NOS. If you are asked to
memorize records of date of birth, subjects offered and postal address of all these students, it will
not be possible for you.
To deal with such problems we construct a database. We arrange all information about students
in a tabular form. We keep all the records so that if I am asked, „How many students are there in
Economics?‟ I am in a position to answer.
Section objectives
At the end of this section, you will find yourself being able to:
2
a) File-based Approach:
File system was used before the advent of Database systems. One way to keep the information
on a computer is to store it in the permanent files. The system has a number of application
programs; each of them is defined to manipulate the data files. These application programs have
been written on request of the users in the organization. New application will be added to the
system as the need arises. The system just described is called the file-based system.
Consider a traditional banking system which uses the file-based system in managing the
organization‟s data in the picture below. As we can see, there are different departments in the
Bank, each of them has its own applications which manage and manipulate different data files.
For banking system, the programs can be the one to debit or credit an account, find the balance
of an account, add a new mortgage loan or generate monthly statements etc.
Data Redundancy: Since files and applications are created by different programmer of
various departments over long period of time, it might lead to several problems:
Inconsistency in data format: Updates to files may result in inconsistent data across the
organization. For example, if an accounting application updates a customer name
without notifying other application areas that also maintain customer name, customer
name will be stored differently for different applications. The same information may be
3
kept in several different places (files). Data inconsistency which means various copies of
the same data is conflicting; waste storage space and duplication of effort.
Data Isolation: A file-based data storage approach makes it difficult for other
applications to access data not owned by their application. Data owned by other
applications may be stored in a format not consistent with the retrieval capabilities of
another application. It is difficult for new applications to retrieve the appropriate data
which might be stored in various files.
Integrity problems: Data values must satisfy certain consistency constraints which are
specified in the application programs. File-based approaches to data storage are tied to
applications rather than the entities to which the files refer. File-based approaches do
not recognize relationships between entities until such information is needed by an
application. It is difficult to add or change the programs to enforce new constraints.
Security problems: There are constraints regarding access privileges and application is
added to the system in the ad-hoc manner so it is difficult to enforce those constraints.
Concurrent – access anomalies: Data may be accessed by many applications that have
not been coordinated previously so it is not easy to provide a strategy to support multiple
users to update data simultaneously.
These difficulties have prompted the development of a new approach in managing large amount
of organizational information – database approach. In the following section, we shall see the
concepts that have been introduced to get over the problems mentioned.
b) Database Approach
Database approaches to data storage support the sharing of data across multiple applications with
multiple users. Databases are structured in a way that is meaningful to an organization. For
example, if an organization maintains information on suppliers and the geographic areas they
service, there would be a link in the database between the suppliers and geographic areas.
Databases reduce data redundancy.
A Database Management System (DBMS) is the software that handles all database accesses.
A DBMS presents a logical view of the data to the users. How this data is stored and retrieved is
hidden from the users. A DBMS ensures that the data is consistent across the database and
controls who can access what data.
4
Database and database technology play an important role in most of social areas where
computers are used, including business, education, medicine etc. To understand the fundamental
of database system, we start from introducing the basic concepts in this area.
Database is a shared collection of related data which will be used to support the activities of
particular organization. Database can be viewed as a repository of data that is defined once and
then is accessed by various users. A database has the following properties:
It is a representation of some aspect of the real world; or perhaps, a collection of data elements
(facts) representing real-world information. Database is logical coherent and internally
consistent.
Database is designed, built, and populated with data for a specific purpose.
Database Management System (DBMS) is a collection of programs that enable users to create,
maintain database and control all the access to the database. The primary goal of the DBMS is to
provide an environment that is both convenient and efficient for users to retrieve and store
information.
Application programs accesses the data stored in the database by sending requests to the DBMS.
Some examples of DBMS software are MS-SQL, MS-Access, Oracle, Informix, etc.
5
Figure 1.2: The components of a database system
With the database approach, we can have the traditional banking system as shown in the
following picture.
There are a number of characteristics that distinguish the database approach from the file-based
approach. In this section, we describe in detail some of those important characteristics.
6
1. Self-Describing Nature of a Database System: Database System contains not only the
database itself but also the descriptions of data structure and constraints (meta-data).
This information is used by the DBMS software or database users if needed. This
separation makes database system totally different from traditional file-based system in
which data definition is a part of application programs.
2. Insulation between Program and Data: In the filed-based system, the structure of the
data files is defined in the application programs so if user wants to change the structure
of a file, all the programs access to that files might need to be changed. On the other
hand, in database approach, data structure is stored in the system catalog not in the
programs so such changes might not occur.
3. Support multiple views of data: A view is a subset of the database which is defined
and dedicated for particular users of the system. Multiple users in the system might have
different views of the system. Each view might contain only the interested data of a user
or a group of user.
4. Sharing of data and Multiuser system: A multiuser database system must allow
multiple users access the database at the same time. As the result, the multiuser DBMS
must have concurrency control strategies to ensure that several user try to access the
same data item at a time do so in the manner so that the data always be correct.
Users of traditional file processing systems, each had a “copy” of relevant data, causing
Duplication of effort
Wastage of storage space
Inconsistent data
7
Fig. a file consisting of students‟ list in the registrar office
87 Brown BA UQ Scholar
In the Database approach, ideally each data item is stored in only one place in the database.
However, in some cases, redundancy still exists to improving system performance, but such
redundancy is controlled and kept to minimum.
2. Data Sharing
The integration of the whole data in an organization leads to the ability to produce more
information from a given amount of data
DBMSs should provide capabilities to define and enforce certain constraints such as data type,
data uniqueness and other constraints.
The DBMS has the capability to define and enforce integrity constraints which are restrictions
placed on the data, based on the semantics or meaning of the data. Examples of constraints are
just like the following:
8
4. Restricting Unauthorized Access
Different user groups may have different access privileges (Create/Alter, Update, and Retrieve),
which are controlled through DBMS security sub-system, through the use of Accounts &
Passwords called user accounts. For example,
Casual users may not have access to confidential data, e.g. medical records, salary
packages, police reports
Parametric users may be given update access, but are generally not allowed to change
the structure of data
Database administrators (DBAs) generally have highest privileges, create user accounts
and enforce restrictions
5. Data Independence
The system data descriptions are separated from the application programs. Changes to the data
structure is handled by the DBMS and not embedded in the program.
6. Transaction Processing
The DBMS must include concurrency control subsystem to ensure that several users trying to
update the same data do so in a controlled manner so that the result of the updates is correct.
A view may be a subset of the database. Various users may have different views of the database
itself. Users may not need to be aware of how and where the data they refer to is stored.
DBMS provides facility to recover from hardware and software failures through its backup and
recovery sub-system. If the computer system fails in the middle of a complex update program,
the recovery subsystem is responsible for making sure that the database is restored to the stage it
was in before the program started executing.
9
Section 2: Data Models, Schemas and Instances
Section objectives
Data model is a collection of concepts that can be used to describe the structure of database.
Structure of database means data types, relationships and constraints. In addition, most data
model includes a set of basic operations for specifying retrievals and modifications on the
database. Data Model provides a means to achieve Data Abstraction. Data Abstraction refers to
the hiding of certain details of how the data are stored and maintained. With several levels of
abstraction, the user‟s view of the database is simplified and this leads to the improved
understanding of data.
1. View level: The highest level of abstraction describes only part of the entire database.
Many users will not be concerned with the large database. Instead, they need to access
only a part of it so that view level abstraction is defined. There are many views for the
same database. View – corresponds to the third view of data: What part of the data is
seen by a specific application
2. Logical/conceptual level: This level describes what data are stored in the whole
database. Conceptual – corresponds to the second view of data: What we want the data
to express and what relationships between data we must express, what “ story” data tells,
are all data necessary for the “story‟ are discussed.
3. Physical level: The lowest level of abstraction describes how the data are actually stored.
Physical – corresponds to the first view of data: How data is stored, how is it accessed,
10
how data is modified, is data ordered, how data is allocated to computer memory and/or
peripheral devices, how data items are actually represented (ASCI, EBCDIC, …)
1. High-level Conceptual Data models: Provide concepts that are close to the way people
perceive data to present the data. Typical example of this type is entity – relationship model
which uses main concepts like entities, attributes, and relationships. An entity represents
real-world objects such as an employee, a project. An entity has some attributes which
represents properties of entity such as employee‟s name, address, birth-date. A relationship
represents association among entities for example a works on relationship between employee
and project.
2. Record-based Logical Data models: Provide concepts that can be understood by the user
but not too far from the way data is stored in the computer. Three well-known data models
of this type are relational data model, network data model and hierarchical data model.
Advantages of RDBMS
11
Sound theoretical foundation and use over many years has resulted in stable,
standardized products available.
Standard data access language through SQL.
Costs and risks associated with large development efforts and with large databases are
well understood.
The fundamental structure, i.e., a table, is easily understood and the design and
normalization process is well defined.
Weaknesses of RDBMS
Performance problems associated with re-assembling simple data structures into their
more complicated real-world representations.
Lack of support for complex base types, e.g., drawings
SQL is limited when accessing complex data.
Knowledge of the database structure is required to create ad hoc queries.
Locking mechanisms defined by RDBMSs do not allow design transactions to be
supported, e.g., the "check in" and "check out" type of feature that would allow an
engineer to modify a drawing over the course of several working days.
In a network database structure, data is represented by records and links. There is no limit to the
number of immediate parent segments or the number of immediate child segments a given record
occurrence may have. The network database structure allows the modeling of many-to-many
relationships. Represents data as record types and also represents a limited type of one-to-many
relationships, called set type. The figure below shows a schema in network model notation.
12
Figure 1.4: Sample schema in network model
13
c) The Hierarchical model:
In the hierarchical structure, data is represented by a simple tree structure. The record type at the
top of the tree is usually known as the "root." The simple hierarchical structure consists of a root
and a single dependent record type. In general, the root may have any number of dependent
records, each of which may have any number of lower-level dependents, and so on, to any
number of levels. The hierarchy view contains records of different types connected by links.
Hierarchical relationships of records are explicitly defined in the data structure. A parent record
can have many child records but a child record can have only one parent. There are no many-to-
many relationships between records. No dependent record within a hierarchical data structure
can exist without its parent record. For this reason, records must be seen in context.
14
may be required),
Poor performance for non-hierarchical accesses,
Lack of maintainability (changing relationships may require physical reorganization of
data).
With the increased awareness of the object-oriented paradigm, many DBMSs claim to be object-
oriented databases. The object-oriented model provides the basic DBMS characteristics. In
addition Object-oriented Database Management Systems provide the following features:
Support for Complex Objects: An ODBMS must provide support for data types that allow
storage of complex objects. An RDBMS supports only basic types, (e.g., integer,
character), that are assembled into table rows to define objects.
Identity: An object must be uniquely identified.
Encapsulation: Hiding information and only allowing access through the public interface is
a basic concept in object-orientation. Strict implementations enforce access to data
through programs; however, some ODBMS products allow query access to the data.
Classes and Types: An ODBMS must allow definition of classes and data types by the
user. In addition these must not be restricted by the implementation. User defined types
must be treated the same as the product supplier's base types in the database functionality.
Inheritance: Inheritance is a basic concept in object-orientation that is a major factor in re-
use. Inheritance of data and methods is a primary feature of an object-oriented
programming language (OOPL).
Dynamic Binding and Polymorphism: This characteristic allows different implementations
of semantically similar methods with the same name in different classes.
Computational Completeness: The integrated language must be computationally complete.
For example, SQL is relationally complete since a SELECT statement can be constructed
to retrieve any subset of data within an RDBMS. SQL is not computationally complete (it
is sometimes called a data sublanguage) since there are computations that cannot be done
using SQL constructs. The ODBMS language cannot have this restriction.
Extensibility: The user must be able to add non-restricted types to the database
15
Strengths of ODBMS
Weaknesses of ODBMS
Lack of maturity in the products. This leads to more risk in employing the product. There
is also a lack of product standardization. Database standards are emerging.
There is no equivalent of SQL for object-oriented databases. Work is being done on
establishing standards.
There is much less experience with project development. Hence, there is a lack of
information on costs and schedules. In addition, there is added training required for the
team because of the lack of familiarity with the object paradigm.
There is no consistent, solid theoretical basis to support ODBMS products. This raises
questions about the completeness of the implementations. In addition, there is a plethora of
analysis and design approaches.
3. Physical Data models: Provide concepts that describe how data is actually stored in the
computer.
Database technology may not always be needed, depending on the sophistication of the system's
data structures, complexity of data, and the need for features provided by databases.
16
maintain existing applications using this data. Application engineers and developers do not need
to know about the storage structure or access strategy of data to develop or modify applications.
Support for a standard query language, such as SQL, is also an important criterion. By
developing applications using a standard query language, flexibility is achieved since the
underlying database product can be changed or modified without necessarily affecting the
applications that access the data. Note, however, that some DBMSs have added functionality not
supported by the standard query language. Use the added functionality with caution.
The description of a database is called the DB Schema. It is specified during the DB design. It is
not expected to be changed frequently. It is usually represented by diagrams (Schema diagram)
(see next figure). Each object of the schema (such as student and course) is called a schema
construct. The Schema diagram specifies some aspect (names of records and data items) and
some other aspects as data type and relationships are not specified. The schema is called the
“intension”. The stored description of the schema is called the “meta-data”. The description of
the database which is designed in the early stage and is not expected to change frequently is
called the database schema. Database system has several schemas. The following are examples
of schemas:
Student
Name StudentNo Class Major
Course
CourseName CourseNumber CreditHours Department
The data in the DB at a particular moment in time is called a Database State or Snapshot or the
Current Set of Occurrence or Instances. Its initial state is generated when the DB is first
populated with data and then it moves from valid state to another valid state during data
manipulation. It is called the “extension”. Since information can be inserted to or deleted from
database at anytime, database changes over time. At a particular moment, the collection of
information stored in the database is called an instance of the database.
17
Section 3: Database Architecture and database users
Section objectives
Three – level architecture for database system is proposed to archive the characteristics of the
database approach. The goal of this architecture is to separate the applications programs from the
physical database so the actual details of how data is organized are hidden from the users.
18
As we can see from the picture above, there are three levels of schemas in the database
architecture
a) External level: In this highest level, there exists a number of views which of is
defined a part of the actual database. Each view is provided for a user or a group of
users so that it helps in simplified the interaction between the user and system. Also
high-level conceptual data model or implementation data model can be used at this
level.
b) Conceptual level: Conceptual Schema in this level describes the logical structure of
the whole database. The entire database is described using simple logical concepts
such as objects, their properties or relationships. Thus the complexity of the
implementation detail of the data will be hidden from the users. Conceptual level has a
conceptual schema which describes the structure of the DB. It hides the details of the
physical storage structure. Also it describes the entities, data types, relationships,
constraints and user operations. High-level conceptual data model or implementation
data model can be used at this level.
c) Internal level: Internal Schema in this level describes how the data are actually
stored, how to access the data. Internal schema has a Physical Data Model
It is a convenient tool for the user to understand and visualize the schema levels of the DB
system. Most of the DBMSs do not separate the three levels completely. Some DBMSs may
include physical details in the conceptual schema. Most of the DBMSs include external views in
the conceptual data model. DBMSs perform mappings to transform requests and results between
the levels. Mappings between levels are a time-consuming process, so some DBMSs do not
support the external views.
Data independence is the ability to modify the schema in one level without affecting the schema
in the higher level. There are two levels of data independence:
Logical data independence is the ability to make change in the conceptual schema without
causing a change in the user views or application program. Conceptual schema may be changed
19
to expand the DB as adding new data items or reducing it by removing data items. The
applications involved in the modified data items can only be changed without any effect on the
remaining applications.
Physical data independence is the ability to make change in the internal schema without causing
a change in the conceptual schema or application program. Internal schema may be changed due
to some file reorganization to meet certain system performance.
Physical data independence seem to be easier to achieve since the way the data is organized in
the memory affect only the performance of the system. Meanwhile, the application program
depends much on the logical structure of the data that they access.
Data Definition Language (DDL): This is used to define the conceptual and internal
schemas for a database system. It is not procedural language, rather a language for describing the
types of entities and relationships among them in terms of a particular data model.
These are the commands for creating tables, defining relationships, and controlling other aspects
of the database that are more structural than data related.
Data Manipulation Language (DML): This is used to manipulate the database, which
typically includes retrieval, insertion, deletion, and modification of the data.
Data Control Language (DCL): This is a set of commands that generally control
permissions on the data, such as defining access rights. Many database users will never use these
commands, because they work in larger company environments where one or more database
administrators are employed specifically to manage the database, and usually one of their roles is
to control permissions.
There are different types of users of the Database. These are of the following:
I. End users
20
These are people whose jobs require access to database for querying, updating and generating a
report. End users might be one of the following:
Naïve users: Naïve users are people who use the existing application programs to perform their
daily tasks. They do not know about database too much, invoke application programs that are
prepared already.
Sophisticated users: They are people who use their own way to access to the database. This
means they do not use the application program provided in the system. Instead, they might
define their own application or describe their need directly in a query language. They know
advanced use of the system and can use the system and packages on the top of the system.
Specialized users/ DBMS system users: Maintain the personal database by using ready –made
program packages that provide easy-to-use menu. Write specialized database applications that do
not fit into the traditional data processing framework.
People implement specific application programs to access the stored data. This kind of user
needs to be familiar with the DBMSs to accomplish their task. Know how to interact with the
system but may not know how DBMS is designed.
A person or a group of people in the organization who is responsible for authorizing the access
to the database, monitoring its use and managing all the resource to support the use of the whole
database system. Coordinates all the activities of the database system; the database administrator
has a good understanding of the enterprise‟s information resources and needs.
Schema definition
Storage structure and access method definition
Schema and physical organization modification
Granting user authority to access the database
Specifying integrity constraints
21
Acting as liaison with users
Monitoring performance and responding to changes in requirements
Database designers are responsible for identifying the data to be stored in the database and for
choosing appropriate structures to represent and store this data. These tasks are mostly
undertaken before the database is actually implemented and populated with data. It is the
responsibility of database designers to communicate with all prospective database users, in order
to understand their requirements, and to come up with a design that meets these requirements.
In many cases, the designers are on the staff of the DBA and may be assigned other staff
responsibilities after the database design is completed. Database designers typically interact with
each potential group of users and develop a view of the database that meets the data and
processing requirements of this group. These views are then analyzed and integrated with the
views of other user groups. The final database design must be capable of supporting the
requirements of all user groups.
22
UNIT TWO
Objectives
Structures
Introduction
Conceptual modeling is an important phase in the database design process. In this UNIT, we
will discuss briefly about data modeling and then focus on modeling concepts of the Entity –
Relationship Model, which is popular high-level conceptual data model.
23
Section 1: Database Application Development Process
Section objectives
The figure below shows the main phases of the database application development process.
Figure 2.1: The main phases of the database application development process
As we can see, the first step is Requirement Collection and Analysis. During this step, the
database designers have to interview the customers (database users) to understand the be-built
system, obtain and document the data and functional requirements. The result of this step is a
document including the detail requirements of the users.
24
Data Modeling is the next step in the process. This step is sometimes considered as a high-level
and abstract design phase (conceptual design). The aims of this phase are:
Describes what data is contained in the database (E.g. entities: students, lecturers,
courses, subjects etc.)
Describes the relationships between data items (E.g. Students are supervised by
Lecturers; Lecturers teach Courses )
Describes the constraints on data (E.g. Student Number has exact 8 digits; A subject has
4 or 6 unit of credits only)
The data items, the relationships and constraints all are expressed using the concepts provided by
the high-level data model. Because these concepts do not include the implementation details so
the results of the data modeling process is a (semi) formal representation of the database
structure. This result is quite easy to understand so it is used as reference to make sure that all the
user‟s requirements are met.
The third step is Database Design. During this step, we might have two sub steps called Database
Logical Design which define a database in a data model of a specific DBMS, and Database
Physical Design which define the internal database storage structure, file organization or
indexing techniques. The last two steps shown are Database Implementation and
Operations/Interfaces Building focus on create instance of schema and implementing operations
and user interfaces.
In the database design phases, data is represented using a certain data model. Data Model is a
collection of conceptual concepts or notations for describing data, data relationships, and data
semantics and data constraints. Most data models also include a set of basic operations for
manipulating data in the database.
As mentioned under “the three levels of data models”, data models are either:
1. Conceptual models
o Database is considered as a collection of entities (objects) of various kinds.
o These models provide a flexible data structuring capabilities.
o The typical example of this model is entity-relationship model, object-oriented
model or semantic data model.
25
2. Record based logical models
o Database is considered as a collection of fixed – size record.
o These models are closer to the physical level or file structure so they are easier
to implement.
o The three most well-known models of this kind are relational data model,
network data model or hierarchical data model.
3. Physical models
o Provide concepts that describe the details of how data is stored in the computer‟s
memory
In the next section, we will discuss the Entity-Relationship Data Model in more detail.
Section objectives
Introduction
Entity Relationship Data Model has existed for 30 years (original description: Chen, ACM
Transactions on Database Systems, 1(1), 1976).
The ER model is well suited to data modeling for use with databases because it is fairly abstract
and is easy to discuss and explain. ER models are readily translated to relations.
ER modeling is based on two concepts: Entities, that is, things. E.g. Prof. Ba, Course Database
System and Relationships, that is, associations or interactions between entities. E.g.. Prof. Ba
teaches course Database Systems. ER models (or ER schemas) are represented by ER diagrams.
26
In this section, we will use a sample database called the COMPANY database to illustrate the
concepts of Entity Relationship Model. This database contains the information about employees,
departments and projects:
There are several departments in the company. Each department has a unique identification, a
name, location of the office and a particular employee who manages the department. A
department controls a number of projects, each of which has unique name, a unique number and
the budget. Each employee has name, an identification number, address, salary, birth date. An
employee is assigned to one department but can join in several projects. We need to record the
start date of the employee in each project. We also need to know the direct supervisor of each
employee. We want to keep track of the dependents of the employees. Each dependent has name,
birth-date and relationship with the employee.
Entity
Entity is an object in the real world with an independent existence and can be differentiated from
other objects.
An entity might be an object with physical existence like a lecturer, a student, a car or an object
with conceptual existence like a course, a job, a position, etc.
27
2.2. Attribute
Each attribute has a name, associated with an entity and is associated with a domain of legal
values. However the information about attribute domain is not presented on the ER diagram
Types of Attributes
I) SIMPLE attributes are attributes that are drawn from the atomic value domains
E.g. Address may consist of “Number”, “Street” and “Suburb” → Address = {59 + „Meek
Street‟ + „Kingsford‟}
III) SINGLE VALUED attributes: Attributes that have only one value for each entity
IV) MULTIVALUED attributes: Attributes that have a set of values for each entity
28
E.g. Degrees of a person: „BSc‟, „MIT‟, „PhD‟
V) DERIVED attributes: Attributes Contain values that are calculated from other attributes
E.g. Age can be derived from attribute Date-Of-Birth. In this situation, Date-Of-Birth might be
called Stored Attribute.
29
Figure 2.6: Notation of Derived attribute in ER Diagram
Keys
An important constraint on the entities is the key. Key is an attribute or a group of attributes
whose values can be used to uniquely identify individual entity in an entity set.
For example, for the entity EMPLOYEE = {EID, Name, Address, Age, Salary}
There is a key that is chosen by the database designer used as an identifying mechanism for the
whole entity set: primary key. This key is indicated by underlying attributes in the ER model.
The set of similar associations at a point of time is called the Relationship Set
30
Employee (Sara) joins in Project (mFORM)
Relationship type is a collection of relationships among entities from a certain set of entity
types.
Definition: Consider n ≥ 2 possibly non-distinct entity types E1, E2, En. Then a relationship is a
tuple (e1, e2… en) is a subset of E1 x E2 x … x En.
In the ER diagram, relationship type is represented by a diamond with a name inside and is
connected by straight line to the rectangles representing entity type.
Unary (Recursive) relationship type is the relationship that involves only one entity
type. However, the same entity type participates in a relationship type in different roles.
For example, the figure below shows the Supervise relationship type which relates an
Employee and a Supervisor who is also an employee. So in this relationship, one
employee has the role of supervisor, another has the role of supervisee.
31
Figure 2.9: Supervise Relationship
Binary relationship type: This relationship type has two entity types link together. This
is the most common relationship.
For example the “Joins in” relationship between EMPLOYEE and PROJECT
32
Figure 2.11: "Joins in" Relationship
Ternary relationship type: If there are three entity types link together, the
relationship is called ternary relationship.
For example: The Supply relationship associates a SUPPLIER, a PART and a PROJECT.
33
Figure 2.13: Ternary relationship example
The constraints of relationship types limit the possible combination of entities that participate in
the relationship set. Two main types of constraints is mapping cardinality and participation.
I. Mapping Cardinality
Mapping Cardinality describes the maximum number of entities that a given entity can be
associated with via a relationship. In this section, we consider only the cardinality constraint for
34
the binary relationship. The possible cardinality for binary relationship types are: One - to- One
(1-1), One – to – Many (1 – N) and Many – to – Many (M-N).
I. One to one relationship: Given two entity sets A and B, there is a one-to-one
relationship between A and B if each entity in the set A is associated with at most one
entity in the set B and vice versa each entity in the set B is associated with at most one
entity in the set A.
Example: An employee if is a manager then can manage one department and a department can
have only one manager.
35
Figure 2.17: One-to-many relationship example
Example: There are many employees work in a department however; an employee can work for
only one department.
Example: An employee can join in several projects and a project can have several employees
36
Figure 2.20: Many-to-many relationship notation
II. Participation
The participation constraints specify whether the existence of an entity depends on its being
related to another entity via the relationship type.
The relationship types can have attributes like entity types. For 1-1 and 1-N relationship types,
attributes of the relationship can be migrated to one of the participating entity types. However,
for M-N relationship types, attributes must be specified as the relationship attribute.
Example: The Start_Date attribute records the date on which an employee joins in a project
37
Figure 2.22: Attribute in relationship example
Weak entity types are those whose existence depends on the existence of another entity type.
Entities belonging to weak entity type are identified by being related to specific entities from
another entity type which is called strong entity type.
The weak entity types do not have key attributes of their own. The keys of this type are partially
or totally derived from strong entity types.
In ER diagram, weak entity and strong/weak relationship are denoted by double box/diamond
Example: Entity type DEPENDENT related to EMPLOYEE. This entity type is used to keep
track of the dependents of each employee via 1-N relationship. DEPENDENT entity has Name,
Birthdate and Relationship. Two dependents of two distinct employees might have the same
{Name, Birthdate, Relationship} values but they are totally different. They are identified as
distinct only after considering the association with particular employee entities to which the
dependents relate. So the key of the DEPENDENT might be {EID, Name} in which EID is the
attribute from the strong entity type EMPLOYEE.
38
Figure 2.23: Weak entity type notation
Modeling Entities:
39
Modeling Relationships
There are several problems that may arise when designing a conceptual data model. These are
known as connection traps. There are two main types of connection traps:
A fan trap occurs when a model represents a relationship between entity types, but the pathway
between certain entity occurrences is ambiguous. It occurs when 1:M relationships fan out from
a single entity.
A single site contains many departments and employs many staffs. However, which staff works
in a particular department? The fan trap is resolved by restructuring the original ER model to
represent the correct association.
40
3.2. Chasm traps
A chasm trap occurs when a model suggests the existence of a relationship between entity types,
but the pathway does not exist between certain entity occurrences. It occurs where there is a
relationship with partial participation, which forms part of the pathway between entities that are
related.
is_allocated oversees
Branch Staff Property
n 0 0
A single branch is allocated to many staffs who oversee the management of properties for rent.
Not all staffs oversee property and not all property is managed by a member of staff.
41
Section 3: Enhanced-ER (EER) Model Concepts
Section objectives
Introduction
EER (Enhanced Entity Relationship) data model includes all modeling concepts of basic ER
database model. It also includes additional concepts: subclasses/superclasses,
specialization/generalization, categories, and attribute inheritance. The resulting model is called
the enhanced-ER or Extended ER (E2R or EER) model. It is used to model applications more
completely and accurately if needed and it includes some object-oriented concepts, such as
inheritance
An entity type may have additional meaningful sub groupings of its entities
42
These are also called IS-A relationships (SECRETARY IS-A EMPLOYEE, TECHNICIAN IS-A
EMPLOYEE, …).
Note: An entity that is member of a subclass represents the same real-world entity as some
member of the superclass
An entity that is member of a subclass inherits all attributes of the entity as a member of the
superclass. It also inherits all relationships
a) Specialization
Is the process of defining a set of subclasses of a superclass. The set of subclasses is based upon
some distinguishing characteristics of the entities in the superclass.
43
– Superclass/subclass relationships and specialization can be diagrammatically
represented in EER diagrams
– Attributes of a subclass are called specific attributes. For example, TypingSpeed
of SECRETARY
– The subclass can participate in specific relationship types. For example,
BELONGS_TO of HOURLY_EMPLOYEE
Example of a Specialization
b) Generalization
Generalization is the reverse of the specialization process. Several classes with common features
are generalized into a superclass; original classes become its subclasses.
Example: CAR, TRUCK generalized into VEHICLE; both CAR, TRUCK become subclasses of
the superclass VEHICLE.
44
– We advocate not drawing any arrows in these situations
– A superclass or subclass represents a set of entities
– Shown in rectangles in EER diagrams (as are entity types)
– Sometimes, all entity sets are simply called classes, whether they are entity
types, superclasses, or subclasses
If we can determine exactly those entities that will become members of each subclass by a
condition, the subclasses are called predicate-defined (or condition-defined) subclasses
I. Disjointness Constraint:
Specifies that the subclasses of the specialization must be disjointed (an entity can be a member
of at most one of the subclasses of the specialization). It is specified by d in EER diagram. If not
45
disjointed, overlap; that is the same entity may be a member of more than one subclass of the
specialization. Overlap is specified by o in EER diagram
Disjoint, total
Disjoint, partial
Overlapping, total
Overlapping, partial
Note: Generalization usually is total because the superclass is derived from the subclasses.
A subclass may itself have further subclasses specified on it. This forms a hierarchy or a lattice.
Hierarchy has a constraint that every subclass has only one superclass (called single
inheritance)
46
In a lattice, a subclass can be subclass of more than one superclass (called multiple
inheritances)
In a lattice or hierarchy, a subclass inherits attributes not only of its direct superclass, but also of
all its predecessor superclasses. A subclass with more than one superclass is called a shared
subclass
In specialization, start with an entity type and then define subclasses of the entity type by
successive specialization (top down conceptual refinement process)
In generalization, start with many entity types and generalize those that have common properties
(bottom up conceptual synthesis process). In practice, the combination of two processes is
employed.
47
Categories (UNION TYPES)
All of the superclass/subclass relationships we have seen thus far have a single superclass. A
shared subclass is subclass in more than one distinct superclass/subclass relationships, where
each relationship has a single superclass (multiple inheritance). In some cases, we need to model
a single superclass/subclass relationship with more than one superclass.
Superclasses represent different entity types. Such a subclass is called a category or UNION
TYPE.
Example: Database for vehicle registration, vehicle owner can be a person, a bank (holding a lien
on a vehicle) or a company.
Category (subclass) OWNER is a subset of the union of the three superclasses COMPANY,
BANK, and PERSON.
Note: The difference from shared subclass, which is subset of the intersection of its superclasses
(shared subclass member must exist in all of its superclasses).
48
49
UNIT THREE
Unit objectives
Structures
Introduction
The relational data model is introduced by C.F.Codd in 1970. Currently, it is considered as the
most widely used data model.
50
The relational model has provided basis for:
The relational data model describes the world as “a collection of inter-related relations (or
tables)“
Section objectives
1.1. Domain: A domain D is the original sets of atomic values used to model data. By atomic,
we mean that each value in the domain is indivisible as far as the relational model is
concerned. For example:
The domain of day shift is the set of all possible days: {Mon, Tue, Wed…}. The domain of
salary is the set of all floating-point numbers greater than 0 and less than 200,000 (say). The
domain of name is the set of character strings that represents names of person.
1.2. Relation (Relation state): A relation is a subset of the Cartesian product of a list of
domains characterized by a name.
51
Relation can be viewed as a “table”. In that table, each row represents a tuple of data values and
each column represents an attribute.
4.4. Attribute: A column of a relation designated by name. The name associated should be
meaningful. Each attributes associates with a domain.
The degree of the relation is the number of attributes of its relation schema.
Eid, Name, Birthdate, Salary, Department are called Attributes (or Columns)
52
Ordering of Values within a tuple is not important: The order of attributes and their
values within a relation is not important as long as the correspondence between
attributes and values is maintained. Thus the following is a different representation of
the above EMPLOYEE relation
EMPLOYEE
Department Eid Name Birthdate Salary
D1 20001234 Nguyễn Thành 22/11/1970 30000
D1 20012322 Đỗ Văn Khôi 1/2/1978 27000
D2 20000980 Vũ Minh 30/4/1967 50000
D3 19991323 Nguyễn Thành 10/10/1966 50000
Values and NULL values in the tuple: Each value in a tuple is atomic. That means
each value cannot be divided into smaller components. Hence, the composite and multi-
valued attributes are not allowed in a relation.
In a relation, there can be several values of attributes that may be unknown or currently not
applicable. The following table contains some NULL values in it:
EMPLOYEE
Department Eid Name Birthdate Salary Cell-Phone
D1 20001234 Nguyễn Thành 22/11/1970 30000 0912304101
D1 20012322 Đỗ Văn Khôi 1/2/1978 27000 NULL
D2 20000980 Vũ Minh 30/4/1967 50000 0903132123
D3 19991323 Nguyễn Thành 10/10/1966 50000 NULL
The data shows that there are employees with no cell phone because the number might be
unknown at this point of time.
Constraint is a very important feature in relational model. In fact, relational model support a
well-defined theory of constraint on attributes or tables.
53
Constraint is useful because it allows designer to specify the semantics of data in database and it
is the rules to enforce DBMSs to check that new data satisfies the semantics.
Integrity constraint
Relation allows us to represent data and association. Domain restricts the values of attributes in
the relation and it is a constraint of relational model. However, there are real –world semantics
on data that cannot specify if use only domain. We need more specific way to state what data
values are/are not allowed, what format is suitable for an attributes. For example, employee
number must be unique; employees‟ age is in the range [23, 65]. Such information is provided in
logical statements called integrity constraints.
1. Key constraint
A relation is a set of tuples. By definition, all elements in a set are distinct hence all tuple in a
relation must be distinct.
In relational model, tuples have no identity like object –identification. Tuple identity is totally
value-based. Therefore, we need key constraint that is the way of uniquely identify a tuple.
Given a relation schema R with U is the list of attributes, there are a set K which is a subset of U.
If in a relation r of E with any two distinct tuples t1 and t2 we have the constraint that t1[K] ≠
t2[K] then K is called a superkey of the relation schema R.
A superkey can have redundant attributes. A superkey that has no such attributes is called a
candidate key. Formally, we can define a candidate key as following.
K‟ is a superkey of R
There is no such proper subset of K‟ that is also a superkey of R or K‟ is the minimum
superkey)
54
Since a relation schema may have more than one candidate key thus there is a chosen candidate
key whose values are used to identify tuples in the relation. Such key is primary key. Primary
key is usually the most simple candidate key (i.e. key with single attribute or small number of
attributes)
In relational database, key are often implemented by introducing an attribute specially designed
for being key.
Example: What could be the superkey, candidate key and primary key of the following relation?
EMPLOYEE
Department Eid Name Birthdate Salary Cell-Phone
D1 20001234 Nguyễn Thành 22/11/1970 30000 0912304101
D1 20012322 Đỗ Văn Khôi 1/2/1978 27000 NULL
D2 20000980 Vũ Minh 30/4/1967 50000 0903132123
D3 19991323 Nguyễn Thành 10/10/1966 50000 NULL
2. Entity constraint: No attribute in the primary key can be NULL. This is because, NULL
values for the primary key means we cannot identify some tuples. For example, in the
EMPLOYEE relation showed above, Cell-Phone cannot be a key since we cannot use this
attribute to identify employees 20012322 and employee 19991323.
3. Referential constraint: The constraint that is specified between two relations and
maintains the correspondence between tuples in these relations. It means the reference from a
tuple in one relation to other relation must be valid.
In the Bank Database (From Data Modeling Note) : The ACCOUNT relation need to take note
the BRANCH where each account is held so in implementation, in each tuple of ACCOUNT
55
relation, there is an attribute such as branch name to identify the associate BRANCH . The
referential integrity constraint must state that the branch-name attribute in the ACCOUNT
relation refer to a valid branch (i.e. existing branch)
In the Company Database: In the EMPLOYEE relation, we need to store the information about
the department where each employee works in. The attribute Department in EMPLOYEE
relation is used for that purpose so the values appears in the columns Department must be taken
from set of values identify the Department number in the relation DEPARTMENT if this relation
exists.
The attributes in FK correspond to the attributes in the primary key of another relation schema
R2
The value for FK in each tuple of R1 either occur as values of primary key of a tuple in R2 or is
entirely NULL
In a database of many relations, there are usually many foreign keys. They provide the “glue”
that links individual relations into a cohesive database structure.
The following figure describes an example of referential integrity constraint in a part of the
BANK database
56
Figure 2: Referential integrity constraints in part of BANK database
4, Semantic constraints: This is a special kind of constraints that may have to be enforced
in relational database. Such constraints describe the semantics of data in the database or
sometimes called the rules on data. For example, in the COMPANY database, we have the rule
“An employee cannot take a part in more than 5 projects” or “Salary of an employee cannot
exceed the salary of the employee‟s manager”
Relations, keys, foreign keys and integrity constraints provide a complete toolkit for building
relational databases. A relational database consists of many relations and tuples in relations are
related in various ways. Here, we will define relational database schema and relational database
instance.
A set of relations (relation states) {r1(R1) , r2(R2) , … , rn(Rn) } where all of the
integrity constraints are satisfied.
57
Constraint Checking
Relational database instance is changing over time. At a moment of time, we can have an
instance that satisfied all the constraints but when some update operations performed, we must
re-check the constraints. There are three basic update operations on relations: insert a new
record, delete an existing record and modify an existing record.
ACCOUNT BRANCH
branch-Name balance account-Number branch-Name address assets
Hashanah 20000 C-12894349 Hashanah Hai Ba Trung 900000000
Dingo 20000 C-12894350 DongDo Dong Da 400000000
Dingo 3500 S-141510751 ThangLong Hoan Kiem 500000000
Hashanah 50000 S-520522620
CUSTOMER
customerNumber Name address homeBranch
111111 Anh Hai Ba Trung HaThanh
121314 Van Anh Hai Ba Trung Dong Do
515016 Son Hoan Kiem HaThanh
ACCOUNT-HOLDER
customerNumber accountNumber
111111 C-12894349
121314 C-12894350
121314 S-141510751
515016 S-520522620
111111 C-12894350
58
Domain constraint checking:
For insert operation, we need to check attribute value for type and other domain
restrictions.
For delete operation, it is no need to check any domain constraints
For update operation, it is also needed to check attribute value for type and other domain
restrictions.
For insert operation, it is need to check the key value does not occur in any existing tuple
in the relation.
For delete operation, it is no need to check any domain key constraints
For update operation, if the key value is modified then need the same check as for
insertion.
59
Insert Account-Holder(12334, C-12894350) (ok, but no such customer with number
12334)
Update Account(HaThanh, 50000, S-34252525) to Account(60000, HaThanh, S-
34252525) (key is not modified)
For insert operation, it is need to check the foreign keys occur as primary keys in the
referenced relation.
For delete operation, check all relations that have foreign keys referring to this relation
An update need to treat as delete - then – insert for referential constraints checking.
Deletion can violate referential constraint when the tuple being deleted is referenced by the
foreign keys from others tuples in a different relation. Several approaches are considered to
handle this kind of violation.
60
The first approach simply disallows the deletion.
The second approach: User must find the referring tuple then either delete them
manually or change their foreign key to an acceptable value or NULL value ( not
possible if the foreign key also forms part of the primary key such as in the Account-
Holder relation)
The third approach: attempt to remove all referring tuple automatically (cascade)
When the referential constraint is specified in the database during the creation phase, the DBMSs
will allow user to specify which of the above approach applies when a violation occur.
Section objectives
Introduction
One useful strategy of database design is: Firstly, performing data modeling using conceptual
model (ER model or Object-Oriented Model) then transform this conceptual to relational model.
This section will introduce some “rule” to map an ER diagram in to a relational database
schema.
Mapping Regular Entity Types: An entity type E with atomic attributes A1, A2, ... An
maps to a relation R with attributes (columns) A1, A2, ... An . In addition, one of the
keys of the entity type E will be chosen as the primary key for R. If the chosen keys is
61
Example:
Example:
3.6.1. Multi-valued attributes are mapped by using additional relation schema. For each
multi-valued attribute A, create a new relation R. This relation will include attribute
62
corresponding to A plus the primary key K of the entity type or relationship type that has
A as attribute. Combination of A and K becomes primary key of R.
Example:
Example: Consider the below entity relationship diagram, weak entity type DEPENDENT is
mapped into relation EMP-DEPENTDENT. Primary key of this relation is {EID, Name}
because Name can be considered as partial key of DEPENDENT.
63
Figure 6: Mapping of Week Entities
Mapping 1-1 Binary Relationship Type: For each 1-1 Binary Relationship Type R,
identify relation A and B that correspond to the entity types participating in R. The
relationship itself can be mapped using one of the following approaches:
o Using foreign key: This is the most popular approach. Choose one of the
participating relation, say A ( usually one with total participation) ; include in A
as the foreign key the primary key of B. If R has attributes then put them in the
relation A as well.
o Merged relation: If both participations of two related entity type are total then
we can merge two entity types and the relationship into a relation.
o Defining relationship relation: Define a relation R to represent the
relationship. All attributes of relationship is included in R. In addition, put the
primary keys of two relations A , B into R. The primary key of R is combination
of primary keys of A and B.
Example: Consider the relationship Manages between EMPLOYEE and DEPARTMENT, in this
relationship the participation of DEPARTMENT is total.
64
Figure 7: Mapping of 1-1 Binary Relationship Type
Mapping 1-N Binary Relationship Type: For each 1-N Binary Relationship Type R,
identify two relations A and B correspond to two entity types participating in R. A represents
the entity type at 1-side and B represents the entity type at N-side. Include the primary key
of A as the foreign key in B. This foreign key represents the relationship type R.
Example:
Mapping M-N Binary Relationship Type: For each M-N Binary Relationship Type
65
R, identify two relation A, B represent two entity type participating in R. Create a new
relation S to represent R. Include in S as foreign keys the primary keys of A and B and all
the simple attributes of R. The combination of primary keys of A and B will make the
primary key of S.
Example:
Mapping Unary Relationship Type: For each Unary Relationship Type R, we identify
the relation A represents the entity type involves in this relationship. Include in this
relation one more time as foreign key the primary of itself. The two keys (primary key
and foreign key) are the same but they represents two entities of different roles relate to
this relationship.
Example
66
Figure 10: Mapping of Unary Relationship Type
Mapping Ternary Relationship Type: For each n-ary ( > 2 ) Relationships create a
new relation to represent the relationship. The primary key of the new relation is the
combination of the primary keys of the participating entities that hold the N (many) side.
In most cases of an n-ary relationship all the participating entities hold a many side.
67
Section 4: Mapping EER Model Constructs to Relations
Section objectives
Create a relation L for C with attributes Attrs(L) = {k,a1,…an} and PK(L) = k. Create a relation
Li for each subclass Si, 1 < i < m, with the attributesAttrs(Li) = {k} U {attributes of Si} and
PK(Li)=k. This option works for any specialization (total or partial, disjoint of over-lapping).
Example
68
Option B: Multiple relations-Subclass relations only
Create a relation Li for each subclass Si, 1 < i < m, with the attributes Attr(Li) = {attributes of
Si} U {k,a1…,an} and PK(Li) = k. This option only works for a specialization whose subclasses
are total (every entity in the superclass must belong to (at least) one of the subclasses.
Example
69
Option C: Single relation with one type attribute
Create a single relation L with attributes Attrs(L) = {k,a1,…an} U {attributes of S1} U…U
{attributes of Sm} U {t} and PK(L) = k. The attribute t is called a type (or discriminating)
attribute that indicates the subclass to which each tuple belongs
Example
70
Option D: Single relation with multiple type attributes
Create a single relation schema L with attributes Attrs(L) = {k,a1,…an} U {attributes of S1}
U…U {attributes of Sm} U {t1, t2,…,tm} and PK(L) = k. Each ti, 1 < I < m, is a Boolean type
attribute indicating whether a tuple belongs to the subclass Si.
Example
For mapping a category whose defining superclass have different keys, it is customary to specify
a new key attribute, called a surrogate key, when creating a relation to correspond to the category
71
In the example below we can create a relation OWNER to correspond to the OWNER category
and include any attributes of the category in this relation. The primary key of the OWNER
relation is the surrogate key, which we called OwnerId.
72
UNIT FOUR
RELATIONAL ALGEBRA
Unit objectives
Use relational algebra unary operations to retrieve data from the database
Use relational algebra binary operations to retrieve data from the database
Use relational algebra set operations to retrieve data from the database
Understand the relation between relational algebra operations and SQL operations
Structures
Relational algebra
Introduction
Relational Algebra (RA) can be viewed as a data manipulation language for relational model. It
consists of several basic operations which enable users to specify retrieval requests. RA is called
a procedural language in which users need to specify how to retrieve the expected data.
73
Rules for evaluating those expressions
Union, Intersect, Set Difference, Cartesian Product are operations based on set theory
Select, Project, Join, and Division are operations developed especially for relational
databases.
Section objectives
Definition: Two relations r(A1, A2, …, An) and s(B1, B2, …, Bn) are union compatible if they
have the same degree n and dom(Ai) = dom(Bi) for 1 ≤ i ≤ n.
This means two union compatible relations have the same number of attributes and each
corresponding pair of attributes have the same domain
74
1.1 UNION Operation
The UNION operation combines two union compatible relations into a single relation via set
union of sets of tuples.
Notation: r1∪r2
The INTERSECTION operation combines two union compatible relations into a single relation
via set intersection of sets of tuples.
75
Notation: r1∩r2
∣r1∩r2∣≤min(∣r1∣,∣r2∣)
Result schema: R
Producing the result of INTERSECTION
o Initially, result set is empty
o For each tuple t in relation r1, if t is in the relation r2 then place t in the result
set.
Example
The DIFFERENCE operation finds the set of tuples that exist in one relation but do not occur in
the other union compatible relation
Notation: r1 \ r2
76
Figure 6: Difference Operation Notation
r1={t∣t∈r1∧t∉r2{
Example
The PRODUCT operation combines information from two relations pair wise on tuples.
Notation: r x s
r×s={(t1,t2)∣t1∈r∧t2∈s { where r(R) and s(S)
Each tuple in the result contains all attributes in r and s, possibly with some fields
renamed to avoid ambiguity. The result set contains all possible tuple that can be
constructed from one tuple in r and one tuple in s.
Result schema: If we have R(A1, A2, …, An) and S(B1, B2, …, Bm) then the list of
attributes in Result is (A1, A2, …, An, B1, B2, …, Bm)
77
Result size: ∣r×s∣=∣r∣∗∣s∣
Producing the result of PRODUCT operation:
o For each tuple in r, form new tuples by pair it with each tuple in s
o Place all of these new tuples in the result set
Example
Note: As we can notice, the CARTESIAN PRODUCT operation by itself is not a useful
querying mechanism since the result size is large. However, it is an extremely important
operation of relational algebra since it is the basic mechanism for combining information
across relations. We will discuss about this topic in more detail in Query Processing
lecture.
78
Section 2: Other Relational Algebra Operations
Section objectives
The SELECT operation is a unary operation. It means the input of this operation is only one
relation and its output is also a relation.
The SELECT operation returns a subset of the tuples from a relation that satisfies a selection
condition. The SELECT operation can be viewed as a horizontal filter of the relation. It
partitions the input relation into two sets of tuples: those tuples that satisfy the condition are
selected; those tuples that do not satisfy the condition are discarded.
Notation:
σselection−condition>(r)
79
<attribute name 1> <comparison op> <attribute name 2>
In the clause, the comparison operations could be one of the following: ≤, ≥, ≠, =, >, < . Clauses
are connected by Boolean operators : and, or , not
Example: Retrieve the Id, Name, Suburb of student whose name is Mary or students who
live in Bundoora
80
2.2 PROJECT Operation
The PROJECT operation is another unary operation. This operation returns a set of tuples
containing a subset of the attributes in the original relation. Thus, as we state that the SELECT
operation selects some rows and discards the others. The PROJECT operation, on the other hand,
selects some columns of the relation and discards the other column. The PROJECT operation
can be viewed as the vertical filter of the relation.
Notation: πattribute−list(r)
81
Retrieve the name of the subjects and department which is responsible for the subject
The JOIN operation is used to combine related tuples from two relations into a single tuple. The
Theta-JOIN is a specialized product containing only pairs that match on a supplied condition
called join-condition.
Notation: r⊲⊳join−conditions
where <condition> is a comparison between one attribute in R and one attribute in S, provided
that these two attributes have the same domain.
82
o If the new tuple satisfies the specified condition, then place it in the result set.
Example:
Variations of JOIN
EQUI-JOIN: A JOIN where the only comparison operator used in the join condition is “=” is
called EQUI-JOIN. The result of Equijoin always has one or more pairs of attributes that have
identical values in every tuple.
83
Example:
NATURAL JOIN: The Natural Join operation is a specialized product where the result tuple
contains only pairs of tuples that match on their common attributes with one of each pair of
common attributes is eliminated. The standard definition of Natural Join requires that the two
join attributes have the same name. Therefore, we can see that Natural Join is created to get rid
of the duplicate columns in an Equijoin.
Notation: r * s
Natural Join can be defined using other operation r∗s=πR∪S(σcondition(r×s))
where r(R) and s(S) and condition is Boolean expression (A1 = B1) AND (A2 = B2 ) AND …
AND (Ak = Bk) with Ai is the attribute in r , Bi is the attribute in s and (Ai, Bi) is a pair of
common attributes.
For each tuple in relation r, compare common attributes with those in each tuple of s
If two tuples match in their common attributes then combine tuples, remove duplicate
attributes and add to the result.
Example: From the example of Equijoin, assume that the attribute list in s now is ( E,
F,B) instead of (E, F, G) then we can have the expression r * s .
84
Figure 20
The Division Operation is defined on two relation r(U1) and s(U2) where U2 is the subset of U1
and s is not an empty relation: r÷s={t∣t∈r(U1−U2)∧satisfy { where satisfy=∀ts∈s(∃tr∈r(tr[U2]=ts
∧tr[U1−U2]=t ))
This means that for a tuple t to appear in the result of Division, the values in t must appear in r in
combination with every tuple in s.
The Division is very useful for a special kind of query such as “Retrieve the name of the student
who enrolls in all course teach by Professor Ba”
85
Figure 22: Division Operation Example
Sample Database
In this session, we use the COMPANY database in the examples for illustrating the use of
Relational Algebra for answering several queries.
The relational database schema for the COMPANY database is specified as below
Sample Queries
σsalary>30000(EMPLOYEE)
Query 2: Find the name, address of employees who works for department number 1
πName,Address(σDeptId=1(EMPLOYEE))
Query 3: Find the name of the department that employee John Smith works for.
86
πDname(σName='JohnSmith'(EMPLOYEE∗DEPARTMENT))
Query 4: Find the name, relationship of all the dependents of employees who works for
Department Human Resource
πDependent−Name,Relationship(σDName='HumanResource'
(EMP−DEPENDENT,EMPLOYEE, DEPARTMENT))
87
UNIT FIVE
INTRODUCTION TO SQL
Objectives
Define SQL
Know DML and DDL well
Use DDL statements to create, alter and drop objects in SQL
Use DML statements to manipulate data in the database
Use built-in functions in SQL
Use aggregate functions in SQL
Structures
What is SQL?
SQL statements
88
Section 1: SQL Definition and SQL statements
Section objectives
Define SQL
Know DML and DDL well
Use DDL statements to create, alter and drop objects in SQL
Use DML statements to manipulate data in the database
89
Although SQL is an ANSI (American National Standards Institute) standard, there are many
different versions of the SQL language.
However, to be compliant with the ANSI standard, they all support at least the major commands
(such as SELECT, UPDATE, DELETE, INSERT, WHERE) in a similar manner.
Note: Most of the SQL database programs also have their own proprietary extensions in addition
to the SQL standard!
RDBMS
RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL, and
for all modern database systems like MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft
Access. The data in RDBMS is stored in database objects called tables. A table is a collection of
related data entries and it consists of columns and rows.
Database Tables
A database most often contains one or more tables. Each table is identified by a name (e.g.
"Customers" or "Orders"). Tables contain records (rows) with data. Below is an example of a
table called "Persons":
The table above contains three records (one for each person) and five columns (P_Id, LastName,
FirstName, Address, and City).
Most of the actions you need to perform on a database are done with SQL statements.
The following SQL statement will select all the records in the "Persons" table:
90
SELECT * FROM Persons
Some database systems require a semicolon at the end of each SQL statement.
Semicolon is the standard way to separate each SQL statement in database systems that allow
more than one SQL statement to be executed in the same call to the server.
We are using MS Access and SQL Server 2000 and we do not have to put a semicolon after each
SQL statement, but some database programs force you to use it.
SQL can be divided into two parts: The Data Manipulation Language (DML) and the Data
Definition Language (DDL).
The query and update commands form the DML part of SQL:
The DDL part of SQL permits database tables to be created or deleted. It also define indexes
(keys), specify links between tables, and impose constraints between tables. The most important
DDL statements in SQL are:
91
DROP INDEX - deletes an index
SELECT * FROM
SELECT column_name(s)
And
table_name
FROM table_name
Now we want to select the content of the columns named "LastName" and "FirstName" from the
table above.
LastName FirstName
Hansen Ola
92
Svendson Tove
Pettersen Kari
SELECT * Example
Now we want to select all the columns from the "Persons" table.
In a table, some of the columns may contain duplicate values. This is not a problem, however,
sometimes you will want to list only the different (distinct) values in a table.
The DISTINCT keyword can be used to return only distinct (different) values.
93
1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
Now we want to select only the distinct values from the column named "City" from the table
above.
We use the following SELECT statement: The result-set will look like this:
SELECT City
DISTINCT City Sandnes
FROM Persons Stavanger
The WHERE clause is used to extract only those records that fulfill a specified criterion.
SELECT column_name(s)
FROM table_name
Now we want to select only the persons living in the city "Sandnes" from the table above.
94
We use the following SELECT statement:
WHERE City='Sandnes'
SQL uses single quotes around text values (most database systems will also accept double
quotes).
95
With the WHERE clause, the following operators can be used:
Operator Description
= Equal
<> Not equal
> Greater than
< Less than
>= Greater than or equal
<= Less than or equal
BETWEEN Between an inclusive range
LIKE Search for a pattern
IN If you know the exact value you want to return for at least one of the columns
The AND & OR operators are used to filter records based on more than one condition.
The AND operator displays a record if both the first condition and the second condition is true.
The OR operator displays a record if either the first condition or the second condition is true.
Now we want to select only the persons with the first name equal to "Tove" AND the last name
equal to "Svendso
96
We use the following SELECT statement: The result-set will look like this
OR Operator Example
Now we want to select only the persons with the first name equal to "Tove" OR the first name
equal to "Ola":
We use the following SELECT statement: The result-set will look like this:
You can also combine AND and OR (use parenthesis to form complex expressions).
Now we want to select only the persons with the last name equal to "Svendson" AND the first
name equal to "Tove" OR to "Ola
97
We use the following SELECT statement: The result-set will look like this:
LastName='Svendson'
AND (FirstName='Tove' OR
FirstName='Ola')
The ORDER BY keyword sorts the records in ascending order by default. If you want to sort the
records in a descending order, you can use the DESC keyword.
SELECT column_name(s)
FROM table_name
ORDER BY Example
98
3 Pettersen Kari Storgt 20 Stavanger
4 Nilsen Tom Vingvn 23 Stavanger
Now we want to select all the persons from the table above, however, we want to sort the
persons by their last name.
We use the following SELECT statement: The result-set will look like this:
Now we want to select all the persons from the table above, however, we want to sort the
persons descending by their last name.
99
The result-set will look like this:
The first form doesn't specify the column names where the data will be inserted, only their
values:
The second form specifies both the column names and the values to be inserted:
100
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
The following SQL statement will add a new row, but only add data in the "P_Id", "LastName"
and the "FirstName" column
101
P_Id LastName FirstName Address City
1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
4 Nilsen Johan Bakken 2 Stavanger
5 Tjessem Jakob
UPDATE table_name
WHERE some_column=some_value
Note: Notice the WHERE clause in the UPDATE syntax. The WHERE clause specifies which
record or records that should be updated. If you omit the WHERE clause, all records will be
updated!
102
Now we want to update the person "Tjessem, Jakob" in the "Persons" table.
UPDATE Persons
Be careful when updating records. If we had omitted the WHERE clause in the example above,
like this
UPDATE Persons
103
4 Nilsen Johan Nissestien 67 Sandnes
5 Tjessem Jakob Nissestien 67 Sandnes
WHERE some_column=some_value
Note: Notice the WHERE clause in the DELETE syntax. The WHERE clause specifies which
record or records that should be deleted. If you omit the WHERE clause, all records will be
deleted!
Now we want to delete the person "Tjessem, Jakob" in the "Persons" table.
104
DELETE FROM Persons
It is possible to delete all rows in a table without deleting the table. This means that the table
structure, attributes, and indexes will be intact:
or
Note: Be very careful when deleting records. You cannot undo this statement!
The TOP clause can be very useful on large tables with thousands of records. Returning a large
number of records can impact on performance.
105
SQL Server Syntax
FROM table_name
Now we want to select only the two first records in the table above.
106
P_Id LastName FirstName Address City
1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
4 Nilsen Tom Vingvn 23 Stavanger
Now we want to select only 50% of the records in the table above.
SELECT column_name(s)
FROM table_name
WHERE column_name
LIKE pattern
107
P_Id LastName FirstName Address City
1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
Now we want to select the persons living in a city that starts with "s" from the table above.
The "%" sign can be used to define wildcards (missing letters in the pattern) both before and
after the pattern.
Next, we want to select the persons living in a city that ends with an "s" from the "Persons"
table.
108
We use the following SELECT statement:
Next, we want to select the persons living in a city that contains the pattern "tav" from the
"Persons" table.
We use the following SELECT statement: The result-set will look like this:
It is also possible to select the persons living in a city that NOT contains the pattern "tav" from
the "Persons" table, by using the NOT keyword.
We use the following SELECT statement: The result-set will look like this:
SQL wildcards can substitute for one or more characters when searching for data in a database.
109
SQL wildcards must be used with the SQL LIKE operator.
Wildcard Description
% A substitute for zero or more characters
_ A substitute for exactly one character
[charlist] Any single character in charlist
[^charlist] or Any single character not in charlist
[!charlist]
Now we want to select the persons living in a city that starts with "sa" from the "Persons" table.
We use the following SELECT statement: The result-set will look like this:
Next, we want to select the persons living in a city that contains the pattern "nes" from the
"Persons" table.
110
We use the following SELECT statement:
The result-set will look like this:
SELECT * FROM Persons
P_Id LastName FirstName Address City
WHERE City LIKE '%nes%' 1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
Now we want to select the persons with a first name that starts with any character, followed by
"la" from the "Persons" table.
We use the following SELECT statement: The result-set will look like this:
Next, we want to select the persons with a last name that starts with "S", followed by any
character, followed by "end", followed by any character, followed by "on" from the "Persons"
table.
We use the following SELECT statement: The result-set will look like this:
111
Now we want to select the persons with a last name that starts with "b" or "s" or "p" from the
"Persons" table.
We use the following SELECT statement: P_Id LastName FirstName Address City
2 Svendson Tove Borgvn 23 Sandnes
SELECT * FROM Persons
3 Pettersen Kari Storgt 20 Stavanger
Next, we want to select the persons with a last name that do not start with "b" or "s" or "p" from
the "Persons" table.
We use the following SELECT statement: The result-set will look like this:
The IN Operator
SQL IN Syntax
SELECT column_name(s)
FROM table_name
IN Operator Example
112
P_Id LastName FirstName Address City
1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
Now we want to select the persons with a last name equal to "Hansen" or "Pettersen" from the
table above.
We use the following SELECT statement: The result-set will look like this:
Now we want to select the persons with a last name alphabetically between "Hansen" and
"Pettersen" from the table above.
113
We use the following SELECT statement: The result-set will look like this:
In some databases a person with the LastName of "Hansen" or "Pettersen" will not be listed
(BETWEEN only selects fields that are between and excluding the test values).
In other databases a person with the last name of "Hansen" or "Pettersen" will be listed
(BETWEEN selects fields that are between and including the test values).
And in other databases a person with the last name of "Hansen" will be listed, but "Pettersen"
will not be listed (BETWEEN selects fields between the test values, including the first test value
and excluding the last test value).
Example 2
To display the persons outside the range in the previous example, use NOT BETWEEN:
WHERE LastName
114
P_Id LastName FirstName Address City
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
You can give a table or a column another name by using an alias. This can be a good thing to do
if you have very long or complex table names or column names.
SELECT column_name(s)
FROM table_name
AS alias_name
FROM table_name
Alias Example
Assume we have a table called "Persons" and another table called "Product_Orders". We will
give the table aliases of "p" an "po" respectively.
Now we want to list all the orders that "Ola Hansen" is responsible for.
115
FROM Persons AS p,
Product_Orders AS po
WHERE p.LastName='Hansen'
WHERE p.FirstName='Ola'
FROM Persons,
Product_Orders
WHERE Persons.LastName='Hansen'
WHERE Persons.FirstName='Ola'
As you'll see from the two SELECT statements above; aliases can make queries easier to both
write and to read.
The JOIN keyword is used in an SQL statement to query data from two or more tables, based on
a relationship between certain columns in these tables.
A primary key is a column (or a combination of columns) with a unique value for each row.
Each primary key value must be unique within the table. The purpose is to bind data together,
across tables, without repeating all of the data in every table.
116
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
Note that the "P_Id" column is the primary key in the "Persons" table. This means that no two
rows can have the same P_Id. The P_Id distinguishes two persons even if they have the same
name.
Note that the "O_Id" column is the primary key in the "Orders" table and that the "P_Id" column
refers to the persons in the "Persons" table without using their names.
Notice that the relationship between the two tables above is the "P_Id" column.
Before we continue with examples, we will list the types of JOIN you can use, and the
differences between them.
JOIN: Return rows when there is at least one match in both tables
LEFT JOIN: Return all rows from the left table, even if there are no matches in the
right table
RIGHT JOIN: Return all rows from the right table, even if there are no matches in the
left table
FULL JOIN: Return rows when there is a match in one of the tables
117
SQL INNER JOIN Keyword
The INNER JOIN keyword return rows when there is at least one match in both tables.
SELECT column_name(s)
FROM table_name1
ON table_name1.column_name=table_name2.column_name
118
We use the following SELECT statement:
FROM Persons
ON Persons.P_Id=Orders.P_Id
ORDER BY Persons.LastName
The LEFT JOIN keyword returns all rows from the left table (table_name1), even if there are no
matches in the right table (table_name2).
SELECT column_name(s)
FROM table_name1
ON table_name1.column_name=table_name2.column_name
119
PS: In some databases LEFT JOIN is called LEFT OUTER JOIN.
Now we want to list all the persons and their orders - if any, from the tables above.
ORDER BY Persons.LastName
120
The LEFT JOIN keyword returns all the rows from the left table (Persons), even if there are no
matches in the right table (Orders).
The RIGHT JOIN keyword Return all rows from the right table (table_name2), even if there are
no matches in the left table (table_name1).
SELECT column_name(s)
FROM table_name1
ON table_name1.column_name=table_name2.column_name
121
4 24562 1
5 34764 15
Now we want to list all the orders with containing persons - if any, from the tables above.
We use the following SELECT statement: The result-set will look like this:
ORDER BY Persons.LastName
The RIGHT JOIN keyword returns all the rows from the right table (Orders), even if there are no
matches in the left table (Persons).
The FULL JOIN keyword return rows when there is a match in one of the tables.
SELECT column_name(s)
FROM table_name1
ON table_name1.column_name=table_name2.column_name
122
SQL FULL JOIN Example
Now we want to list all the persons and their orders, and all the orders with their persons.
We use the following SELECT statement: The result-set will look like this:
ORDER BY Persons.LastName
123
The FULL JOIN keyword returns all the rows from the left table (Persons), and all the rows from
the right table (Orders). If there are rows in "Persons" that do not have matches in "Orders” or if
there are rows in "Orders" that do not have matches in "Persons", those rows will be listed as
well.
The UNION operator is used to combine the result-set of two or more SELECT statements.
Notice that each SELECT statement within the UNION must have the same number of columns.
The columns must also have similar data types. Also, the columns in each SELECT statement
must be in the same order.
UNION
Note: The UNION operator selects only distinct values by default. To allow duplicate values,
use UNION ALL.
UNION ALL
PS: The column names in the result-set of a UNION are always equal to the column names in
the first SELECT statement in the UNION.
124
SQL UNION Example
"Employees_Norway":
E_ID E_Name
01 Hansen, Ola
02 Svendson, Tove
03 Svendson, Stephen
04 Pettersen, Kari
"Employees_USA":
E_ID E_Name
01 Turner, Sally
02 Kent, Clark
03 Svendson, Stephen
04 Scott, Stephen
Now we want to list all the different employees in Norway and USA.
125
Note: This command cannot be used to list all employees in Norway and USA. In the example
above we have two employees with equal names, and only one of them will be listed. The
UNION command selects only distinct values.
UNION ALL
Result
E_Name
Hansen, Ola
Svendson, Tove
Svendson, Stephen
Pettersen, Kari
Turner, Sally
Kent, Clark
Svendson, Stephen
Scott, Stephen
The SELECT INTO statement selects data from one table and inserts it into a different table.
The SELECT INTO statement is most often used to create backup copies of tables.
126
We can select all columns into the new Or we can select only the columns we want
table: into the new table:
Make a Backup Copy - Now we want to make an exact copy of the data in our "Persons" table.
We use the following SQL statement: We can also use the IN clause to copy the
table into another database:
SELECT *
SELECT *
INTO Persons_Backup
INTO Persons_Backup IN
FROM Persons 'Backup.mdb'
FROM Persons
127
We can also copy only a few fields into the new table:
SELECT LastName,FirstName
INTO Persons_Backup
FROM Persons
The following SQL statement creates a "Persons_Backup" table with only the persons who lives
in the city "Sandnes":
SELECT LastName,Firstname
INTO Persons_Backup
FROM Persons
WHERE City='Sandnes'
The following example creates a "Persons_Order_Backup" table contains data from the two
tables "Persons" and "Orders":
128
SELECT Persons.LastName,Orders.OrderNo
INTO Persons_Order_Backup
FROM Persons
ON Persons.P_Id=Orders.P_Id
129
SQL CREATE TABLE Syntax
column_name1 data_type,
column_name2 data_type,
column_name3 data_type,)
The data type specifies what type of data the column can hold. For a complete reference of all
the data types available in MS Access, MySQL, and SQL Server, go to our complete
Now we want to create a table called "Persons" that contains five columns: P_Id, LastName,
FirstName, Address, and City.
P_Id int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255) );
130
The P_Id column is of type int and will hold a number. The LastName, FirstName, Address, and
City columns are of type varchar with a maximum length of 255 characters.
The empty table can be filled with data with the INSERT INTO statement.
SQL Constraints
Constraints are used to limit the type of data that can go into a table.
Constraints can be specified when a table is created (with the CREATE TABLE statement) or
after the table is created (with the ALTER TABLE statement).
NOT NULL
UNIQUE
PRIMARY KEY
FOREIGN KEY
CHECK
DEFAULT
The NOT NULL constraint enforces a column to NOT accept NULL values.
The NOT NULL constraint enforces a field to always contain a value. This means that you
cannot insert a new record, or update a record without adding a value to this field.
The following SQL enforces the "P_Id" column and the "LastName" column to not accept
NULL values:
131
CREATE TABLE Persons (
FirstName varchar(255),
Address varchar(255),
City varchar(255) );
The UNIQUE and PRIMARY KEY constraints both provide a guarantee for uniqueness for a
column or set of columns.
Note that you can have have many UNIQUE constraints per table, but only one PRIMARY KEY
constraint per table.
The following SQL creates a UNIQUE constraint on the "P_Id" column when the "Persons"
table is created:
SQL Server:
132
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
To allow naming of a UNIQUE constraint, and for defining a UNIQUE constraint on multiple
columns, use the following SQL syntax:
SQL Server:
FirstName varchar(255),
Address varchar(255),
City varchar(255),
133
SQL UNIQUE Constraint on ALTER TABLE
To create a UNIQUE constraint on the "P_Id" column when the table is already created, use the
following SQL:
SQL Server
To allow naming of a UNIQUE constraint, and for defining a UNIQUE constraint on multiple
columns, use the following SQL syntax:
SQL Server
SQL Server:
The PRIMARY KEY constraint uniquely identifies each record in a database table.
134
Each table should have a primary key, and each table can have only one primary key.
The following SQL creates a PRIMARY KEY on the "P_Id" column when the "Persons" table is
created:
SQL Server
FirstName varchar(255),
Address varchar(255),
City varchar(255)
To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint
on multiple columns, use the following SQL syntax:
SQL Server
135
FirstName varchar(255),
Address varchar(255),
City varchar(255),
To create a PRIMARY KEY constraint on the "P_Id" column when the table is already created,
use the following SQL:
SQL Server
To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint
on multiple columns, use the following SQL syntax:
SQL Server
Note: If you use the ALTER TABLE statement to add a primary key, the primary key column(s)
must already have been declared to not contain NULL values (when the table was first created).
SQL Server
136
ALTER TABLE Persons
Let's illustrate the foreign key with an example. Look at the following two tables:
Note that the "P_Id" column in the "Orders" table points to the "P_Id" column in the "Persons"
table.
The "P_Id" column in the "Persons" table is the PRIMARY KEY in the "Persons" table.
The "P_Id" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.
The FOREIGN KEY constraint is used to prevent actions that would destroy link between tables.
The FOREIGN KEY constraint also prevents that invalid data is inserted into the foreign key
column, because it has to be one of the values contained in the table it points to.
137
SQL FOREIGN KEY Constraint on CREATE TABLE
The following SQL creates a FOREIGN KEY on the "P_Id" column when the "Orders" table is
created:
SQL Server
To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint
on multiple columns, use the following SQL syntax:
SQL Server
P_Id int,
138
REFERENCES Persons(P_Id)
To create a FOREIGN KEY constraint on the "P_Id" column when the "Orders" table is already
created, use the following SQL:
SQL Server
REFERENCES Persons(P_Id)
To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint
on multiple columns, use the following SQL syntax:
SQL Server
REFERENCES Persons(P_Id)
139
SQL Server
The CHECK constraint is used to limit the value range that can be placed in a column.
If you define a CHECK constraint on a single column it allows only certain values for this
column.
If you define a CHECK constraint on a table it can limit the values in certain columns based on
values in other columns in the row.
The following SQL creates a CHECK constraint on the "P_Id" column when the "Persons" table
is created. The CHECK constraint specifies that the column "P_Id" must only include integers
greater than 0.
SQL Server:
FirstName varchar(255),
Address varchar(255),
City varchar(255)
140
To allow naming of a CHECK constraint, and for defining a CHECK constraint on multiple
columns, use the following SQL syntax:
SQL Server:
FirstName varchar(255),
Address varchar(255),
City varchar(255),
To create a CHECK constraint on the "P_Id" column when the table is already created, use the
following SQL:
SQL Server
To allow naming of a CHECK constraint, and for defining a CHECK constraint on multiple
columns, use the following SQL syntax:
SQL Server:
141
ALTER TABLE Persons
SQL Server:
The default value will be added to all new records, if no other value is specified.
The following SQL creates a DEFAULT constraint on the "City" column when the "Persons"
table is created:
SQL Server:
FirstName varchar(255),
Address varchar(255),
142
City varchar(255) DEFAULT 'Sandnes'
The DEFAULT constraint can also be used to insert system values, by using functions like
GETDATE():
P_Id int,
To create a DEFAULT constraint on the "City" column when the table is already created, use the
following SQL:
SQL Server:
SQL Server:
143
ALTER TABLE Persons
Indexes allow the database application to find data fast; without reading the whole table.
Indexes
An index can be created in a table to find data more quickly and efficiently.
The users cannot see the indexes, they are just used to speed up searches/queries.
Note: Updating a table with indexes takes more time than updating a table without (because the
indexes also need an update). So you should only create indexes on columns (and tables) that
will be frequently searched against.
ON table_name (column_name)
ON table_name (column_name)
Note: The syntax for creating indexes varies amongst different databases. Therefore: Check the
syntax for creating indexes in your database.
144
The SQL statement below creates an index named "PIndex" on the "LastName" column in the
"Persons" table:
ON Persons (LastName)
If you want to create an index on a combination of columns, you can list the column names
within the parentheses, separated by commas:
145
The DROP TABLE statement is used to delete a table.
What if we only want to delete the data inside the table, and not the table itself?
The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.
To delete a column in a table, use the following syntax (notice that some database systems don't
allow deleting a column):
146
ALTER TABLE table_name
To change the data type of a column in a table, use the following syntax:
Notice that the new column, "DateOfBirth", is of type date and is going to hold a date. The data
type specifies what type of data the column can hold. For a complete reference of all the data
types available in MS Access, MySQL, and SQL Server, go to our complete
147
1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
Now we want to change the data type of the column named "DateOfBirth" in the "Persons" table.
We use the following SQL statement:
Notice that the "DateOfBirth" column is now of type year and is going to hold a year in a two-
digit or four-digit format.
Next, we want to delete the column named "DateOfBirth" in the "Persons" table.
148
AUTO INCREMENT a Field
Very often we would like the value of the primary key field to be created automatically every
time a new record is inserted.
The following SQL statement defines the "P_Id" column to be an auto-increment primary key
field in the "Persons" table:
FirstName varchar(255),
Address varchar(255),
City varchar(255)
The MS SQL Server uses the IDENTITY keyword to perform an auto-increment feature.
By default, the starting value for IDENTITY is 1, and it will increment by 1 for each new record.
To specify that the "P_Id" column should start at value 10 and increment by 5, change the
identity to IDENTITY(10,5).
To insert a new record into the "Persons" table, we will not have to specify a value for the "P_Id"
column (a unique value will be added automatically):
149
INSERT INTO Persons (FirstName,LastName)
VALUES ('Lars','Monsen')
The SQL statement above would insert a new record into the "Persons" table. The "P_Id"
column would be assigned a unique value. The "FirstName" column would be set to "Lars" and
the "LastName" column would be set to "Monsen".
A view contains rows and columns, just like a real table. The fields in a view are fields from one
or more real tables in the database.
You can add SQL functions, WHERE, and JOIN statements to a view and present the data as if
the data were coming from one single table.
SELECT column_name(s)
FROM table_name
WHERE condition
Note: A view always shows up-to-date data! The database engine recreates the data, using the
view's SQL statement, every time a user queries a view.
If you have the Northwind database you can see that it has several views installed by default.
The view "Current Product List" lists all active products (products that are not discontinued)
from the "Products" table. The view is created with the following SQL:
150
CREATE VIEW [Current Product List] AS
SELECT ProductID,ProductName
FROM Products
WHERE Discontinued=No
Another view in the Northwind sample database selects every product in the "Products" table
with a unit price higher than the average unit price:
SELECT ProductName,UnitPrice
FROM Products
Another view in the Northwind database calculates the total sale for each category in 1997. Note
that this view selects its data from another view called "Product Sales for 1997":
151
FROM [Product Sales for 1997]
GROUP BY CategoryName
We can also add a condition to the query. Now we want to see the total sale only for the category
"Beverages":
WHERE CategoryName='Beverages'
SELECT column_name(s)
FROM table_name
WHERE condition
Now we want to add the "Category" column to the "Current Product List" view. We will update
the view with the following SQL:
152
SELECT ProductID,ProductName,Category
FROM Products
WHERE Discontinued=No
SQL Dates
The most difficult part when working with dates is to be sure that the format of the date you are
trying to insert, matches the format of the date column in the database.
As long as your data contains only the date portion, your queries will work as expected.
However, if a time portion is involved, it gets complicated.
Before talking about the complications of querying for dates, we will look at the most important
built-in functions for working with dates.
The following table lists the most important built-in date functions in SQL Server:
Function Description
GETDATE() Returns the current date and time
DATEPART() Returns a single part of a date/time
DATEADD() Adds or subtracts a specified time interval from a date
DATEDIFF() Returns the time between two dates
153
CONVERT() Displays date/time data in different formats
SQL Server comes with the following data types for storing a date or a date/time value in the
database:
Note: The date types are chosen for a column when you create a new table in your database!
You can compare two dates easily if there is no time component involved!
Now we want to select the records with an OrderDate of "2008-11-11" from the table above.
154
OrderId ProductName OrderDate
1 Geitost 2008-11-11
3 Mozzarella di Giovanni 2008-11-11
Now, assume that the "Orders" table looks like this (notice the time component in the
"OrderDate" column):
we will get no result! This is because the query is looking only for dates with no time portion.
Tip: If you want to keep your queries simple and easy to maintain, do not allow time
components in your dates!
If a column in a table is optional, we can insert a new record or update an existing record without
adding a value to this column. This means that the field will be saved with a NULL value.
NULL values are treated differently from other values. NULL is used as a placeholder for
unknown or inapplicable values.
Note: It is not possible to compare NULL and 0; they are not equivalent.
155
P_Id LastName FirstName Address City
1 Hansen Ola Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Stavanger
Suppose that the "Address" column in the "Persons" table is optional. This means that if we
insert a record with no value for the "Address" column, the "Address" column will be saved with
a NULL value.
It is not possible to test for NULL values with comparison operators, such as =, <, or <>.
We will have to use the IS NULL and IS NOT NULL operators instead.
SQL IS NULL
How do we select only the records with NULL values in the "Address" column?
How do we select only the records with no NULL values in the "Address" column?
156
We will have to use the IS NOT NULL operator:
In the next portion, we will look at the ISNULL(), NVL(), IFNULL() and COALESCE()
functions.
Suppose that the "UnitsOnOrder" column is optional, and may contain NULL values.
SELECT ProductName,UnitPrice*(UnitsInStock+UnitsOnOrder)
FROM Products
In the example above, if any of the "UnitsOnOrder" values are NULL, the result is NULL.
Microsoft's ISNULL() function is used to specify how we want to treat NULL values.
157
The NVL(), IFNULL(), and COALESCE() functions can also be used to achieve the same result.
Below, if "UnitsOnOrder" is NULL it will not harm the calculation, because ISNULL() returns a
zero if the value is NULL:
E.g.
SELECT ProductName,UnitPrice*(UnitsInStock+ISNULL(UnitsOnOrder,0))
FROM Products
Data types and ranges for Microsoft Access, MySQL and SQL Server.
Character strings:
Unicode strings:
Binary types:
158
Data type Description Storage
bit Allows 0, 1, or NULL
binary(n) Fixed-length binary data. Maximum 8,000 bytes
varbinary(n) Variable-length binary data. Maximum 8,000 bytes
varbinary(max) Variable-length binary data. Maximum 2GB
image Variable-length binary data. Maximum 2GB
Number types:
159
The s parameter indicates the maximum number of digits stored to
the right of the decimal point. s must be a value from 0 to p.
Default value is 0
smallmoney Monetary data from -214,748.3648 to 214,748.3647 4 bytes
money Monetary data from -922,337,203,685,477.5808 to 8 bytes
922,337,203,685,477.5807
float(n) Floating precision number data from -1.79E + 308 to 1.79E + 308. 4 or 8
bytes
The n parameter indicates whether the field should hold 4 or 8
bytes. float(24) holds a 4-byte field and float(53) holds an 8-byte
field. Default value of n is 53.
real Floating precision number data from -3.40E + 38 to 3.40E + 38 4 bytes
Date types:
160
sql_variant Stores up to 8,000 bytes of data of various data types, except text, ntext, and
timestamp
uniqueidentifier Stores a globally unique identifier (GUID)
xml Stores XML formatted data. Maximum 2GB
cursor Stores a reference to a cursor used for database operations
table Stores a result-set for later processing
SQL aggregate functions return a single value, calculated from values in a column.
SQL scalar functions return a single value, based on the input value.
161
Tip: The aggregate functions and the scalar functions will be explained in details in the next
section.
OrderAverage
950
162
Now we want to find the customers that have an OrderPrice value higher then the average
OrderPrice value.
Customer
Hansen
Nilsen
Jensen
The COUNT(column_name) function returns the number of values (NULL values will not be
counted) of the specified column:
163
SQL COUNT(DISTINCT column_name) Syntax
The COUNT(DISTINCT column_name) function returns the number of distinct values of the
specified column:
Note: COUNT(DISTINCT) works with ORACLE and Microsoft SQL Server, but not with
Microsoft Access.
WHERE Customer='Nilsen'
The result of the SQL statement above will be 2, because the customer Nilsen has made 2 orders
in total:
CustomerNilsen
2
164
SQL COUNT(*) Example
NumberOfOrders
6
Now we want to count the number of unique customers in the "Orders" table.
NumberOfCustomers
3
Which is the number of unique customers (Hansen, Nilsen, and Jensen) in the "Orders" table.
The FIRST() function returns the first value of the selected column.
165
SQL FIRST() Example
FirstOrderPrice
1000
The LAST() function returns the last value of the selected column.
166
O_Id OrderDate OrderPrice Customer
1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen
LastOrderPrice
100
The MAX() function returns the largest value of the selected column.
167
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen
LargestOrderPrice
2000
The MIN() function returns the smallest value of the selected column.
168
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen
SmallestOrderPrice
100
169
Now we want to find the sum of all "OrderPrice" fields".
OrderTotal
5700
The GROUP BY statement is used in conjunction with the aggregate functions to group the
result-set by one or more columns.
FROM table_name
GROUP BY column_name
170
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen
Now we want to find the total sum (total order) of each customer.We will have to use the
GROUP BY statement to group the customers. We use the following SQL statement:
GROUP BY Customer
Customer SUM(OrderPrice)
Hansen 2000
Nilsen 1700
Jensen 2000
Customer SUM(OrderPrice)
Hansen 5700
Nilsen 5700
Hansen 5700
Hansen 5700
Jensen 5700
Nilsen 5700
171
The result-set above is not what we wanted.
Explanation of why the above SELECT statement cannot be used: The SELECT statement
above has two columns specified (Customer and SUM(OrderPrice). The "SUM(OrderPrice)"
returns a single value (that is the total sum of the "OrderPrice" column), while "Customer"
returns 6 values (one value for each row in the "Orders" table). This will therefore not give us the
correct result. However, you have seen that the GROUP BY statement solves this problem.
We can also use the GROUP BY statement on more than one column, like this:
GROUP BY Customer,OrderDate
The HAVING clause was added to SQL because the WHERE keyword could not be used with
aggregate functions.
FROM table_name
GROUP BY column_name
172
O_Id OrderDate OrderPrice Customer
1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen
Now we want to find if any of the customers have a total order of less than 2000.
GROUP BY Customer
HAVING SUM(OrderPrice)<2000
Customer SUM(OrderPrice)
Nilsen 1700
Now we want to find if the customers "Hansen" or "Jensen" have a total order of more than
1500.
GROUP BY Customer
HAVING SUM(OrderPrice)>1500
173
The result-set will look like this:
Customer SUM(OrderPrice)
Hansen 2000
Jensen 2000
Now we want to select the content of the "LastName" and "FirstName" columns above, and
convert the "LastName" column to uppercase.
LastName FirstName
174
HANSEN Ola
SVENDSON Tove
PETTERSEN Kari
Now we want to select the content of the "LastName" and "FirstName" columns above, and
convert the "LastName" column to lowercase.
LastName FirstName
hansen Ola
svendson Tove
pettersen Kari
175
The MID() Function
Parameter Description
column_name Required. The field to extract characters from.
start Required. Specifies the starting position (starts at 1).
length Optional. The number of characters to return. If omitted, the MID() function
returns the rest of the text.
Now we want to extract the first four characters of the "City" column above.
176
SmallCity
Sand
Sand
Stav
Now we want to select the length of the values in the "Address" column above.
LengthOfAddress
12
9
9
The ROUND() function is used to round a numeric field to the number of decimals specified.
177
Parameter Description
column_name Required. The field to round.
decimals Required. Specifies the number of decimals to be returned.
Now we want to display the product name and the price rounded to the nearest integer.
ProductName UnitPrice
Jarlsberg 10
Mascarpone 33
Gorgonzola 16
The NOW() function returns the current system date and time.
178
SQL NOW() Example
Now we want to display the products and prices per today's date.
Parameter Description
column_name Required. The field to be formatted.
format Required. Specifies the format.
179
SQL FORMAT() Example
Now we want to display the products and prices per today's date (with today's date displayed in
the following format "YYYY-MM-DD").
FROM Products
or
180
ALTER TABLE table_name
DROP COLUMN column_name
AS (alias) SELECT column_name AS column_alias
FROM table_name
or
SELECT column_name
FROM table_name AS table_alias
BETWEEN SELECT column_name(s)
FROM table_name
WHERE column_name
BETWEEN value1 AND value2
CREATE DATABASE CREATE DATABASE database_name
CREATE TABLE CREATE TABLE table_name
(
column_name1 data_type,
column_name2 data_type,
column_name2 data_type,
...
)
CREATE INDEX CREATE INDEX index_name
ON table_name (column_name)
or
181
DELETE DELETE FROM table_name
WHERE some_column=some_value
or
182
INSERT INTO INSERT INTO table_name
VALUES (value1, value2, value3,....)
or
183
FROM table_name
SELECT DISTINCT SELECT DISTINCT column_name(s)
FROM table_name
SELECT INTO SELECT *
INTO new_table_name [IN externaldatabase]
FROM old_table_name
or
SELECT column_name(s)
INTO new_table_name [IN externaldatabase]
FROM old_table_name
SELECT TOP SELECT TOP number percent column_name(s)
FROM table_name
TRUNCATE TABLE TRUNCATE TABLE table_name
UNION SELECT column_name(s) FROM table_name1
UNION
SELECT column_name(s) FROM table_name2
UNION ALL SELECT column_name(s) FROM table_name1
UNION ALL
SELECT column_name(s) FROM table_name2
UPDATE UPDATE table_name
SET column1=value, column2=value,...
WHERE some_column=some_value
WHERE SELECT column_name(s)
FROM table_name
WHERE column_name operator value
184
UNIT SIX
PROGRAMMING BLOCKS IN SQL
Structure
PL/SQL Trigger
OBJECTIVES
Introduction
Programming activities demand high concentration and creativity. Cultivating such skill would
help you to design logic for the problem at hand easily.
185
In this unit, you will learn basic difference between PL/SQL and simple SQL statements, how
construct PL/SQL statements such as variables declaration in PL/SQL, conditional statements in
PL/SQL, iterative statements, in PL/SQL, how to create and use cursor and trigger in PL/SQL.
Besides you will learn how to exception handling techniques and functions in PL/SQL.
Short fragmented codes appropriate to the topics are given to help you understand the topics
easily.
Section objectives:
Upon Completing this section, you would be able to:
SQL statements are defined in terms of constraints which we wish to fix on the result of the
query. As we have mentioned earlier SQL statements let us to store and retrieve data from the
relation database. SQL is the Non-procedural programming language.
186
Taye Civics B
Fig 1 Student_record
Assume if one wants to know name of student that scored “A”, the following SQL statement
retrieve name of student(s) from the database:
In the above SQL statement the user didn‟t specify step by step algorithm to the database engine
how to retrieve name of student who scored “A” grade in any subject. What is expected from the
programmer to specify is, what to retrieve from the database. Such kind of languages is known
as non-procedural programming language.
Any programming language that executes instructions based on the given procedure is called
procedural language. In this case the programmer should specify every necessary step that the
machine follows to solve the given problem.
Assume a programmer want to add two integer numbers and display the result on the screen. In
this case a programmer should specify all the necessary steps that machine follows to add two
integer numbers. The following is the rudimentary steps that may be written by programmer to
be followed by the machine to solve the problem:
Step 2 Check whether the number is integer, if not repeat step one unit the enter number
integer
187
Step 4 Check whether the number is integer, if not repeat step three unit the enter number
integer
See the above six steps and identify features that are not appeared in non procedural
programming example above?
See the above six set of instructions. Step 2 and step 4 are executed if and only if the give
numbers are integers (or required condition is satisfied) such kinds of statements are called
conditional statements.
On the other hand some block of instructions may be executed repeatedly until some condition is
met.
Look step 2 and step 4, these statements will be executed if the given number is different from
integer number (this is the condition). Such kind of statements is called loop statement. A Loop
statement too, is not available in SQL statement.
These features give power to the programmer to control the execution steps of the program. As
we have said earlier SQL language is power full in manipulating data in the data base, but they
missed some features like loop and conditional statements. So to make writing a program in SQL
language more flexible, SQL statements are embedded within the procedural language to gain
conditional and loop features from the procedural language. As a result of this the PL/SQL
language in oracle database and Transact-SQL language in SQL server database are came into
picture.
In other word the SQL language is extended to PL/SQL and Transact-SQL languages by
inheriting features of procedural language. In this module we will use the PL/SQL language.
188
6.1.2. What is PL/SQL?
PL/SQL is a combination of SQL along with the procedural features of programming languages.
It was developed by Oracle Corporation in the early 90‟s to enhance the capabilities of SQL.
PL/SQL stands for "Procedural Language extensions to SQL." PL/SQL is available primarily as
an "enabling technology" within other software products; it does not exist as a standalone
language. You can use PL/SQL in the Oracle relational database, in the Oracle Server, and in
client-side application development tools, such as Oracle Forms. PL/SQL is closely integrated
into the SQL language, yet it adds programming constructs that are not native to this standard
relational database language. As you can see from the following code example, PL/SQL allows
you to combine SQL statements with "standard" procedural constructs. This single program can
either insert a company into or delete a company from the database. It relies on the IF statement
(not a SQL statement) to determine which action to take:
Example
1 PROCEDURE maintain_company
2 (action_in IN VARCHAR2,
id_in IN NUMBER,
3 BEGIN
4 IF action_in = 'DELETE'
5 THEN
189
7
9 THEN
12 END IF;
13 END;
Block Structures: PL SQL consists of blocks of code, which can be nested within each
other. Each block forms a unit of a task or a logical module. PL/SQL Blocks can be
stored in the database and reused.
Procedural Language Capability: PL SQL consists of procedural language constructs
such as conditional statements (if else statements) and loops like (FOR loops).
Better Performance: PL SQL engine processes multiple SQL statements
simultaneously as a single block, thereby reducing network traffic.
190
1- Header Section
4- Exception Section(optional)
This is part of the program where you can specify the type and name of the blocks
The Declaration section of a PL/SQL Block starts with the reserved keyword DECLARE. This
section is optional and is used to declare any placeholders like variables, constants, records and
cursors, which are used to manipulate data in the execution section. Placeholders may be any of
Variables, Constants and Records, which stores data temporarily. Cursors are also declared in
this section.
The Execution section of a PL/SQL Block starts with the reserved keyword BEGIN and ends
with END. This is a mandatory section and is the section where the program logic is written to
perform any task. The programmatic constructs like loops, conditional statement and SQL
statements form the part of execution section.
The Exception section of a PL/SQL Block starts with the reserved keyword EXCEPTION. This
section is optional. Any errors in the program can be handled in this section, so that the PL/SQL
Blocks terminates gracefully. If the PL/SQL Block contains exceptions that cannot be handled,
the Block terminates abruptly with errors.
Every statement in the above three sections must end with a semicolon (;). PL/SQL blocks can
191
be nested within other PL/SQL blocks. Comments can be used to document code. End section is
last part of the program that indicates the end of the program.
DECLARE
Variable declaration
BEGIN
Program Execution
EXCEPTION
Exception handling
END;
Section objectives:
Upon Completing this section, you would be able to
192
value or DEFAULT value is also an optional specification, where you can initialize a
variable.
Each variable declaration is a separate statement and must be terminated by a semicolon.
EXAMPLE 1
When a variable is specified as NOT NULL, you must initialize the variable when it is declared.
EXAMPLE 2
The below example declares two variables, one of which is a not null.
DECLARE
salary number(4);
in the above example “dept” variable is not NULL, in such case we must initialized the variable
with value( in the above example the “dept” value is “computer science”),when the variable is
declared.
Notice: The value of a variable can change in the execution or exception section of the
PL/SQL Block. We can assign values to variables in the following two ways given
below.
variable_name:= value;
193
2) We can assign values to variables directly from the database columns by using a SELECT..
INTO
statement.
SELECT column_name
INTO variable_name
FROM table_name
[WHERE condition];
EXAMPLE 3
The following fragment code shows how to assign value to variable using SELECT.. INTO
statement
The program will get the salary of an employee with id '1116' and display it on the screen
DECLARE
var_salary number(6);
BEGIN
SELECT salary
INTO var_salary
FROM employee
194
6.2.2 Scope of Variables
PL/SQL allows the nesting of Blocks within Blocks i.e, the Execution section of an outer
block can contain inner blocks. Therefore, a variable which is accessible to an outer
Block is also accessible to all nested inner Blocks. The variables declared in the inner
blocks are not accessible to outer blocks. Based on their declaration we can classify
variables into two types.
Local variables - These are declared in a inner block and cannot be referenced by outside
Blocks.
Global variables - These are declared in a outer block and can be referenced by its itself
and by its inner blocks.
For Example: In the below example we are creating two variables in the outer block and
assigning thier product to the third variable created in the inner block. The variable 'var_mult' is
declared in the inner block, so cannot be accessed in the outer block i.e. it cannot be accessed
after line 11. The variables 'var_num1' and 'var_num2' can be accessed anywhere in the block.
1> DECLARE
4> BEGIN
7> DECLARE
195
9> BEGIN
11> END;
12> END;
PL/SQL Constants
As the name implies a constant is a value used in a PL/SQL Block that remains
unchanged throughout the program. A constant is a user-defined literal value. You can
declare a constant and use it instead of actual value.
For example: If you want to write a program which will increase the salary of the
employees by 25%, you can declare a constant and use it throughout the program. Next
time when you want to increase the salary again you can change the value of the
constant which will be easier than changing the actual value throughout the program.
The word CONSTANT is a reserved word and ensures that the value does not change.
VALUE - It is a value which must be assigned to a constant when it is declared. You cannot
assign a value later.
DECLARE
196
You must assign a value to a constant at the time you declare it. If you do not assign a
value to a constant while declaring it and try to assign a value in the execution section,
you will get a error.
For Example: If you execute the below Pl/SQL block you will get error.
DECLARE
BEGIN
salary_increase := 100;
END;
Records are another type of data types which oracle allows to be defined as a
placeholder. Records are composite data types, which means it is a combination of
different scalar data types like char, varchar, number etc. Each scalar data types in the
record holds a value. A record can be visualized as a row of data. It can contain all the
contents of a row.
To declare a record, you must first define a composite datatype; then declare a
record for that type.
197
TYPE record_type_name IS RECORD
(first_col_name column_datatype,
First_col_name, second_col_name, etc.,- it is the names the fields/columns within the record.
There are different ways to declare the data type of the fields
You can declare the fields in the same way as you declare the fields while
creating the table.
If a field is based on a column from database table, you can define the field_type
as follows:
col_name table_name.column_name%type;
By declaring the field data type in the above method, the data type of the column
is dynamically applied to the field. This method is useful when you are altering
the column specification of the table, because you do not need to change the code
again.
NOTE: You can use also %type to declare variables and constants
record_name record_type_name;
198
For example: The following code shows how to declare records employee_rec based on user
defined type
DECLARE
(employee_id number(5),
employee_first_name varchar2(25),
employee_last_name employee.last_name%type,
employee_dept employee.dept%type);
employee_salary employee.salary%type;
employee_rec employee_type;
If all the fields of a record are based on the columns of a table, we can declare the record
as follows:
record_name table_name%ROWTYPE;
DECLARE
employee_rec employee%ROWTYPE
1) You do not need to explicitly declare variables for all the columns in a table.
2) If you alter the column specification in the database table, you do not need to update
the code.
199
The disadvantage of declaring the record as a ROWTYPE is:
1) When you create a record as a ROWTYPE, fields will be created for all the columns
in the table and memory will be used to create the datatype for all the fields. So use
ROWTYPE only when you are using all the columns of the table in the program.
NOTE: When you are creating a record, you are just creating a datatype, similar to
creating a variable. You need to assign values to the record to use them.
The following table consolidates the different ways in which you can define and declare
a PL/SQL record.
Syntax Usage
200
6.2.3.2. Passing Values To and From a Record
When you assign values to a record, you actually assign values to the fields within it.
The General Syntax to assign a value to a column within a record direclty is:
record_name.col_name := value;
If you used %ROWTYPE to declare a record, you can assign values as shown:
record_name.column_name := value;
FROM table_name
[WHERE clause];
If %ROWTYPE is used to declare a record then you can directly assign values to the
whole record instead of each columns separately. In this case, you must SELECT all the
columns from the table into the record as shown:
FROM table_name
[WHERE clause];
var_name := record_name.col_name;
201
The following table consolidates the different ways you can assign values to and from a
record:
Syntax Usage
record_name.col_name := value; To directly assign a value to a specific
column of a record.
record_name.column_name := value; To directly assign a value to a specific
column of a record, if the record is
declared using %ROWTYPE.
SELECT col1, col2 INTO To assign values to each field of a record
record_name.col_name1, from the database table.
record_name.col_name2 FROM table_name
[WHERE clause];
SELECT * INTO record_name FROM To assign a value to all fields in the record
table_name [WHERE clause]; from a database table.
variable_name := record_name.col_name; To get a value from a record column and
assigning it to a variable.
As the name implies, PL/SQL supports programming language features like conditional
statements, iterative statements.
The programming constructs are similar to how you use in programming languages like Java and
C++. In this section I will provide you syntax of how to use conditional statements in PL/SQL
programming.
In almost every program you write, you need to make decisions. For example, if it is the end of
the fiscal year, bonuses must be distributed to the employees based on their salaries. To compute
202
employee bonuses, a program needs a conditional control. In other words, it needs to employ a
selection structure.
Conditional control allows you to control the program's flow of the execution based on a
condition. In programming terms, this means that the statements in the program are not executed
sequentially. Rather, one group of statements or another is executed, depending on how the
condition is evaluated.
PL/SQL has three types of conditional control: IF, ELSIF, and CASE statements. This chapter
explores the first two types and shows you how they can be nested inside one another. CASE
statements are discussed in the next chapter.
An IF statement has two forms: IF-THEN and IF-THEN-ELSE. An IF-THEN statement allows
you to specify only one group of actions to take. In other words, this group of actions is taken
only when a condition evaluates to TRUE. An IF-THEN-ELSE statement allows you to specify
two groups of actions. The second group of actions is taken when a condition evaluates to
FALSE or NULL.
IF-THEN STATEMENTS
An IF-THEN statement is the most basic kind of a conditional control; it has the following structure:
IF CONDITION THEN
STATEMENT 1;
...
STATEMENT N;
END IF;
The reserved word IF marks the beginning of the IF statement. Statements 1 through N are a
sequence of executable statements that consist of one or more standard programming structures.
The word CONDITION between the keywords IF and THEN determines whether these
203
statements are executed. END IF is a reserved phrase that indicates the end of the IF-THEN
construct.
IF-THEN statement
Consider the following example. Two numeric values are stored in the variables v_num1 and
v_num2. You need to arrange their values so that the smaller value is always stored in v_num1
and the larger value is always stored in v_num2.
For Example
DECLARE
v_num1 NUMBER := 5;
v_num2 NUMBER := 3;
v_temp NUMBER;
BEGIN
204
-- if v_num1 is greater than v_num2 rearrange their values
v_temp := v_num1;
v_num1 := v_num2;
v_num2 := v_temp;
END IF;
END;
In this example, condition v_num1 > v_num2 evaluates to TRUE because 5 is greater than 3.
Next, the values are rearranged so that 3 is assigned to v_num1 and 5 is assigned to v_num2.
This is done with the help of the third variable, v_temp, which is used for temporary storage.
v_num1 = 3
v_num2 = 5
IF-THEN-ELSE STATEMENT
An IF-THEN statement specifies the sequence of statements to execute only if the condition
evaluates to TRUE. When this condition evaluates to FALSE, there is no special action to take,
except to proceed with execution of the program.
205
An IF-THEN-ELSE statement enables you to specify two groups of statements. One group of
statements is executed when the condition evaluates to TRUE. Another group of statements is
executed when the condition evaluates to FALSE. This is indicated as follows:
IF CONDITION THEN
STATEMENT 1;
ELSE
STATEMENT 2;
END IF;
STATEMENT 3;
IF-THEN-ELSE statement
You should use the IF-THEN-ELSE construct when trying to choose between two mutually
exclusive actions. Consider the following example:
DECLARE
206
BEGIN
IF MOD(v_num,2) = 0 THEN
DBMS_OUTPUT.PUT_LINE (v_num||
ELSE
END IF;
DBMS_OUTPUT.PUT_LINE ('Done');
END;
For any given number, only one of the DBMS_ OUTPUT.PUT_LINE statements is executed.
Hence, the IF-THEN-ELSE construct enables you to specify two and only two mutually
exclusive actions.
24 is even number
Done
207
NULL CONDITION
In some cases, a condition used in an IF statement can be evaluated to NULL instead of TRUE
or FALSE. For the IF-THEN construct, the statements are not executed if an associated
condition evaluates to NULL. Next, control is passed to the first executable statement after END
IF. For the IF-THEN-ELSE construct, the statements specified after the keyword ELSE are
executed if an associated condition evaluates to NULL.
For Example
DECLARE
v_num1 NUMBER := 0;
v_num2 NUMBER;
BEGIN
ELSE
END IF;
END;
v_num1 != v_num2
208
The condition
v_num1 = v_num2
is evaluated to NULL because a value is not assigned to the variable v_num2. Therefore,
variable v_num2 is NULL. Notice that the IF-THEN-ELSE construct is behaving as if the
condition evaluated to FALSE, and the second DBMS_ OUTPUT.PUT_LINE statement is
executed.
This section provides exercises and suggested answers, with discussion related to how those
answers resulted. The most important thing to realize is whether your answer works. You should
figure out the implications of the answers and what the effects are of any different answers you
may come up with.
In this exercise, you use the IF-THEN statement to test whether the date provided by the user
falls on the weekend (in other words, if the day is Saturday or Sunday).
SET SERVEROUTPUT ON
DECLARE
v_day VARCHAR2(15);
BEGIN
209
IF v_day IN ('SATURDAY', 'SUNDAY') THEN
END IF;
DBMS_OUTPUT.PUT_LINE ('Done...');
END;
To test this script fully, execute it twice. For the first run, enter 09-JAN-2008, and for the second
run, enter 13-JAN-2008. Execute the script, and then answer the following questions:
ANSWER: The first output produced for the date is 09-JAN-2008. The second output produced for the
date is 13-JAN-2008.
Done...
When the value of 09-JAN-2008 is entered for v_date, the day of the week is determined for the
variable v_day with the help of the functions TO_CHAR and RTRIM. Next, the following
condition is evaluated:
210
Because the value of v_day is 'WEDNESDAY', the condition evaluates to FALSE.
Then, control is passed to the first executable statement after END IF. As a result,
Done... is displayed on the screen:
Done...
As in the previous run, the value of v_day is derived from the value of v_date.
Next, the condition of the IF-THEN statement is evaluated. Because it evaluates to
TRUE, the statement after the keyword THEN is executed. Therefore, 13-JAN-
2008 falls on weekend is displayed on the screen. Next, control is passed to the last
DBMS_OUTPUT.PUT_LINE statement, and Done... is displayed on the screen.
Explain why the output produced for the two dates is different.
Remove the RTRIM function from the assignment statement for v_day as follows:
211
v_day := TO_CHAR(v_date, 'DAY');
ANSWER: The script should look similar to the following. Changes are shown in
bold.
SET SERVEROUTPUT ON
DECLARE
v_day VARCHAR2(15);
BEGIN
END IF;
DBMS_OUTPUT.PUT_LINE ('Done...');
END;
212
Enter value for sv_user_date: 13-JAN-2008
Done...
In the original example, the variable v_day is calculated with the help of the
statement RTRIM(TO_CHAR(v_date, 'DAY')). First, the function TO_CHAR
returns the day of the week, padded with blanks. The size of the value retrieved by
the function TO_CHAR is always 9 bytes. Next, the RTRIM function removes
trailing spaces.
Rewrite this script using the LIKE operator instead of the IN operator so that it
produces the same results for the dates specified earlier.
ANSWER: The script should look similar to the following. Changes are shown in
bold.
SET SERVEROUTPUT ON
DECLARE
213
v_day VARCHAR2(15);
BEGIN
END IF;
DBMS_OUTPUT.PUT_LINE ('Done...');
END;
Saturday and Sunday are the only days of the week that start with S. As a result,
there is no need to spell out the names of the days or specify any additional letters
for the LIKE operator.
Rewrite this script using the IF-THEN-ELSE construct. If the date specified does
not fall on the weekend, display a message to the user saying so.
ANSWER: The script should look similar to the following. Changes are shown in bold.
SET SERVEROUTPUT ON
DECLARE
v_day VARCHAR2(15);
214
BEGIN
ELSE
END IF;
-- control resumes
To modify the script, the ELSE part was added to the IF statement. The rest of the script
has not been changed.
In this exercise, you use the IF-THEN-ELSE statement to check how many students
are enrolled in course number 25, section 1. If 15 or more students are enrolled,
section 1 of course number 25 is full. Otherwise, section 1 of course number 25 is
not full, and more students can register for it. In both cases, a message should be
displayed to the user, indicating whether section 1 is full. Try to answer the
questions before you run the script. After you have answered the questions, run the
script and check your answers. Note that the SELECT INTO statement uses the
ANSI 1999 SQL standard.
SET SERVEROUTPUT ON
215
DECLARE
v_total NUMBER;
BEGIN
SELECT COUNT(*)
INTO v_total
FROM enrollment e
WHERE s.course_no = 25
AND s.section_no = 1;
DBMS_OUTPUT.PUT_LINE
ELSE
DBMS_OUTPUT.PUT_LINE
END IF;
END;
216
Notice that the SELECT INTO statement uses an equijoin. The join condition is
listed in the JOIN clause, indicating columns that are part of the primary key and
foreign key constraints. In this example, column SECTION_ID of the
ENROLLMENT table has a foreign key constraint defined on it. This constraint
references column SECTION_ID of the SECTION table, which, in turn, has a
primary key constraint defined on it.By the Way
You will find detailed explanations and examples of the statements using the new
ANSI 1999 SQL standard in Appendix C and in the Oracle help. Throughout this
book we try to provide examples illustrating both standards; however, our main
focus is PL/SQL features rather than SQL.
Try to answer the following questions, and then execute the script:
ANSWER: If 15 or more students are enrolled in section 1 of course number 25, the
first DBMS_OUTPUT.PUT_LINE statement is displayed on the screen.
The condition v_total >= 15 evaluates to TRUE, and as a result, the statement
ANSWER: If three students are enrolled in section 1 of course number 25, the second
DBMS_OUTPUT.PUT_LINE statement is displayed on the screen.
The condition v_total >= 15 evaluates to FALSE, and the ELSE part of the IF-
THEN-ELSE statement is executed. As a result, the statement
217
What DBMS_OUTPUT.PUT_LINE statement is displayed if there is no section 1 for course number
25?
ANSWER: If there is no section 1 for course number 25, the ELSE part of the IF-
THEN-ELSE statement is executed. So the second DBMS_OUTPUT.PUT_LINE
statement is displayed on the screen.
SELECT COUNT(*)
INTO v_total
FROM enrollment e
WHERE s.course_no = 25
AND s.section_no = 1;
returns 0.
How would you change this script so that the user provides both course and section numbers?
ANSWER: Two additional variables must be declared and initialized with the help of
the substitution variables as follows. The script should look similar to the following.
Changes are shown in bold.
SET SERVEROUTPUT ON
DECLARE
218
v_total NUMBER;
BEGIN
SELECT COUNT(*)
INTO v_total
FROM enrollment e
DBMS_OUTPUT.PUT_LINE
ELSE
DBMS_OUTPUT.PUT_LINE
END IF;
219
END;
How would you change this script so that if fewer than 15 students are enrolled
in section 1 of course number 25, a message appears indicating how many
students can still enroll?
ANSWER: The script should look similar to the following. Changes are shown in bold.
SET SERVEROUTPUT ON
DECLARE
v_total NUMBER;
v_students NUMBER;
BEGIN
SELECT COUNT(*)
INTO v_total
FROM enrollment e
WHERE s.course_no = 25
AND s.section_no = 1;
DBMS_OUTPUT.PUT_LINE
220
('Section 1 of course 25 is full');
ELSE
v_students := 15 - v_total;
DBMS_OUTPUT.PUT_LINE (v_students||
END IF;
END;
An iterative control Statements are used when we want to repeat the execution of one or more
statements for specified number of times. These are similar to those in
Simple Loop
While Loop
For Loop
221
Simple Loop
A Simple Loop is used when a set of statements is to be executed at least once before the
loop terminates. An EXIT condition must be specified in the loop, otherwise the loop
will get into an infinite number of iterations. When the EXIT condition is satisfied the
process exits from the loop.
LOOP
statements;
EXIT;
END LOOP;
These are the important steps to be followed while using Simple Loop.
While Loop
WHILE <condition>
222
LOOP statements;
END LOOP;
SQL>
SQL> DECLARE
2 v_Calc NUMBER := 0;
3 BEGIN
4 WHILE v_Calc <= 10 LOOP
5 v_Calc := v_Calc + 1;
6 DBMS_OUTPUT.PUT_LINE('The value of v_Calc is ' || v_Calc);
7 END LOOP;
8 END;
9 /
The value of v_Calc is 1
The value of v_Calc is 2
The value of v_Calc is 3
The value of v_Calc is 4
The value of v_Calc is 5
The value of v_Calc is 6
The value of v_Calc is 7
The value of v_Calc is 8
The value of v_Calc is 9
The value of v_Calc is 10
The value of v_Calc is 11
223
PL/SQL procedure successfully completed.
SQL>
SQL>
SQL>
SQL>
SQL> --
FOR Loop
LOOP statements;
END LOOP;
1) The counter variable is implicitly declared in the declaration section, so it's not
necessary to declare it
explicitly.
2) The counter variable is incremented by 1 and does not need to be incremented
explicitly.
224
3) EXIT WHEN statement and EXIT statements can be used in FOR loops but it's not
done oftenly.
SQL>
SQL> set serveroutput on
SQL> BEGIN
2 FOR just_a_num IN 1..10
3 LOOP
4 dbms_output.put_line(just_a_num);
5 END LOOP;
6 END;
7 /
1
2
3
4
5
6
7
8
9
10
SQL>
NOTE: The above Loops are explained with a example when dealing with Explicit
Cursors.
225
Section 3: PL/SQL Trigger
Section objectives:
Upon Completing this section, you would be able to
Understand trigger
Familiarized with types of PL/SQL trigger
Understand when and how to use trigger
Trigger
A trigger is a pl/sql block structure which is fired when a DML statements like Insert,
Delete, Update is executed on a database table. A trigger is triggered automatically when
an associated DML statement is executed.
Syntax of Triggers
[OF col_name]
ON table_name
WHEN (condition)
226
BEGIN
END;
{BEFORE | AFTER | INSTEAD OF } - This clause indicates at what time should the
trigger get fired. i.e for example: before or after updating a table. INSTEAD OF is
used to create a trigger on a view. before and after cannot be used to create a trigger
on a view.
{INSERT [OR] | UPDATE [OR] | DELETE} - This clause determines the triggering
event. More than one triggering events can be used together separated by OR
keyword. The trigger gets fired at all the specified triggering event.
[OF col_name] - This clause is used with update triggers. This clause is used when
you want to trigger an event only when a specific column is updated.
[ON table_name] - This clause identifies the name of the table or view to which the
trigger is associated.
[REFERENCING OLD AS o NEW AS n] - This clause is used to reference the old
and new values of the data being changed. By default, you reference the values as
:old.column_name or :new.column_name. The reference names can also be changed
from old (or new) to any other user-defined name. You cannot reference old values
when inserting a record, or new values when deleting a record, because they do not
exist.
227
[FOR EACH ROW] - This clause is used to determine whether a trigger must fire
when each row gets affected ( i.e. a Row Level Trigger) or just once when the entire
sql statement is executed(i.e.statement level Trigger).
WHEN (condition) - This clause is valid only for row level triggers. The trigger is
fired only for rows that satisfy the condition specified.
For Example: The price of a product changes constantly. It is important to maintain the
history of the prices of the products.
We can create a trigger to update the 'product_price_history' table when the price of the
product is updated in the 'product' table.
(product_id number(5),
product_name varchar2(32),
supplier_name varchar2(32),
unit_price number(7,2) );
(product_id number(5),
product_name varchar2(32),
supplier_name varchar2(32),
unit_price number(7,2) );
228
Create the price_history_trigger and execute it.
ON product
BEGIN
VALUES
(:old.product_id,
:old.product_name,
:old.supplier_name,
:old.unit_price);
END;
Once the above update query is executed, the trigger fires and updates the
'product_price_history' table.
229
If you ROLLBACK the transaction before committing to the database, the data
inserted to the table is also rolled back.
1) Row level trigger - An event is triggered for each row upated, inserted or
deleted.
2) Statement level trigger - An event is triggered for each sql statement executed.
For Example: Let's create a table 'product_check' which we can use to store messages
when triggers are fired.
(Message varchar2(50),
Current_Date number(32)
);
Let's create a BEFORE and AFTER statement and row level triggers for the product
table.
230
1) BEFORE UPDATE, Statement Level: This trigger will insert a record into the table
'product_check' before a sql update statement is executed, at the statement level.
BEFORE
UPDATE ON product
Begin
END;
2) BEFORE UPDATE, Row Level: This trigger will insert a record into the table
'product_check' before each row is updated.
BEFORE
UPDATE ON product
BEGIN
231
END;
3) AFTER UPDATE, Statement Level: This trigger will insert a record into the table
'product_check' after a sql update statement is executed, at the statement level.
AFTER
UPDATE ON product
BEGIN
End;
4) AFTER UPDATE, Row Level: This trigger will insert a record into the table
'product_check' after each row is updated.
AFTER
insert On product
BEGIN
232
INSERT INTO product_check
END;
Lets check the data in 'product_check' table to see the order in which the trigger is fired.
Output:
Mesage Current_Date
------------------------------------------------------------
The above result shows 'before update' and 'after update' row level events have occured
twice, since two records were updated. But 'before update' and 'after update' statement
level events are fired only once per sql statement.
233
The above rules apply similarly for INSERT and DELETE statements.
We can use the data dictionary view 'USER_TRIGGERS' to obtain information about
any trigger.
DESC USER_TRIGGERS;
NAME Type
--------------------------------------------------------
TERIGGR_NAME VARCHAR2(30)
TRIGGER_TYPE VARCHAR2(16)
TRIGGER_EVENT VARCHAR2(75)
TABLE_OWNER VARCHAR2(30)
BASE_OBJECT_TYPE VARCHAR2(16)
TABLE_NAME VARCHAR2(30)
COLUMN_NAME VARCHAR2(4000)
REFERENCING_NAMES VARCHAR2(128)
WHEN_CLAUSE VARCHAR2(4000)
STATUS VARCHAR2(8)
DESCRIPTION VARCHAR2(4000)
ACTION_TYPE VARCHAR2(11)
TRIGGER_BODY LONG
This view stores information about header and body of the trigger.
234
The above sql query provides the header and body of the trigger
'Before_Update_Stat_product'.
Section objectives:
Upon Completing this section, you would be able to
Stored Procedure
A stored procedure or in simple a proc is a named PL/SQL block which performs one
or more specific task. This is similar to a procedure in other programming languages. A
procedure has a header and a body. The header consists of the name of the procedure and
the parameters or variables passed to the procedure. The body consists or declaration
section, execution section and exception section similar to a general PL/SQL Block. A
procedure is similar to an anonymous PL/SQL Block but it is named for repeated usage.
1) IN-parameters
2) OUT-parameters
3) IN OUT-parameters
235
CREATE [OR REPLACE] PROCEDURE proc_name [list of parameters]
IS
Declaration section
BEGIN
Execution section
EXCEPTION
Exception section
END;
IS - marks the beginning of the body of the procedure and is similar to DECLARE in
anonymous PL/SQL Blocks. The code between IS and BEGIN forms the Declaration
section.
The syntax within the brackets [ ] indicate they are optional. By using CREATE OR
REPLACE together the procedure is created if no other procedure with the same name
exists or the existing procedure is replaced with the current code.
2> IS
236
6> BEGIN
8> LOOP
12>END;
13> /
procedure_name;
NOTE: In the examples given above, we are using backward slash „/‟ at the end of the
program. This indicates the oracle engine that the PL/SQL program has ended and it can
begin processing the statements.
237
Function in PL/SQL
RETURN return_datatype;
IS
Declaration_section
BEGIN
Execution_section
Return return_variable;
EXCEPTION
exception section
Return return_variable;
END;
Return Type: The header section defines the return type of the function. The return
datatype can be any of the oracle datatype like varchar, number etc.
238
The execution and exception section both should return a value which is of the
datatype defined in the header section.
3> IS
6> BEGIN
10> END;
11> /
In the example we are retrieving the „first_name‟ of employee with empID 100 to
variable „emp_name‟.
The return type of the function is VARCHAR which is declared in line no 2.
The function returns the 'emp_name' which is of type VARCHAR as the return value in
line no 9.
239
How to execute a PL/SQL Function?
employee_name := employer_details_func;
dbms_output.put_line(employer_details_func);
1) IN type parameter: These types of parameters are used to send values to stored
procedures.
2) OUT type parameter: These types of parameters are used to get values from stored
procedures. This is similar to a return type in functions.
3) IN OUT parameter: These types of parameters are used to send values and get
values from stored procedures.
240
1) IN parameter:
2) OUT Parameter:
The OUT parameters are used to send the OUTPUT from a procedure or a function. This
is a write-only parameter i.e, we cannot pass values to OUT paramters while executing
the stored procedure, but we can assign values to OUT parameter inside the stored
procedure and the calling program can recieve this output value.
241
3) IN OUT Parameter:
The IN OUT parameter allows us to pass values into a procedure and get output values
from the procedure. This parameter is used if the value of the IN parameter can be
changed in the calling program.
By using IN OUT parameter we can pass values into a parameter and return a value to
the calling program using the same parameter. But this is possible only if the value
passed to the procedure and output value have a same datatype. This parameter is used if
the value of the parameter will be changed in the procedure.
The below examples show how to create stored procedures using the above three types
of parameters.
Example1:
Let‟s create a procedure which gets the name of the employee when the employee id is
passed.
2> IS
3> BEGIN
242
6> END;
7> /
We can call the procedure „emp_name‟ in this way from a PL/SQL Block.
1> DECLARE
4> BEGIN
6> LOOP
10> END;
11> /
In the above PL/SQL Block in line no 3; we are creating a cursor „id_cur‟ which
contains the employee id.
In line no 7; we are calling the procedure „emp_name‟, we are passing the „id‟ as IN
parameter and „empName‟ as OUT parameter. In line no 8; we are displaying the id and
the employee name which we got from the procedure „emp_name‟.
Example 2:
243
Using IN OUT parameter in procedures:
3> IS
5> BEGIN
17> END;
18> /
244
The below PL/SQL block shows how to execute the above 'emp_salary_increase'
procedure.
1> DECLARE
6> BEGIN
13> END;
14> /
245
Section 5: Exception handling in PL/SQL
Section objectives:
Upon Completing this section, you would be able to
Exception Handling
PL/SQL provides a feature to handle the Exceptions which occur in a PL/SQL Block
known as exception Handling. Using Exception Handling we can test the code and avoid
it from exiting abruptly. When an exception is occurs, messages which explain its cause
is received.
PL/SQL Exception message consists of three parts.
Type of Exception
An Error Code
A message
By handling the exceptions we can ensure a PL/SQL block does not exit abruptly.
DECLARE
Declaration section
BEGIN
Exception section
EXCEPTION
246
WHEN ex_name1 THEN
END;
When an exception is raised, Oracle searches for an appropriate exception handler in the
exception section. For example in the above example, if the error raised is 'ex_name1 ',
then the error is handled according to the statements under it. Since, it is not possible to
determine all the possible runtime errors during testing for the code, the 'WHEN Others'
exception is used to manage the exceptions that are not explicitly handled. Only one
exception can be raised in a Block and the control does not return to the Execution
Section after the error is handled.
DELCARE
Declaration section
BEGIN
DECLARE
Declaration section
247
BEGIN
Execution section
EXCEPTION
Exception section
END;
EXCEPTION
Exception section
END;
In the above case, if the exception is raised in the inner block it should be handled in the
exception block of the inner PL/SQL block else the control moves to the Exception
block of the next upper PL/SQL Block. If none of the blocks handle the exception the
program ends abruptly with an error.
248
For example: NO_DATA_FOUND and ZERO_DIVIDE are called Named System
exceptions.
BEGIN
Execution section
EXCEPTION
249
WHEN NO_DATA_FOUND THEN
END;
Those system exception for which oracle does not provide a name is known as unnamed
system exception. These exceptions do not occur frequently. These Exceptions have a
code and an associated message.
The general syntax to declare unnamed system exception using EXCEPTION_INIT is:
DECLARE
exception_name EXCEPTION;
PRAGMA
250
EXCEPTION_INIT (exception_name, Err_code);
BEGIN
Execution section
EXCEPTION
END;
For Example: Lets consider the product table and order_items table from sql joins.
Here product_id is a primary key in product table and a foreign key in order_items table.
If we try to delete a product_id from the product table when it has child records in
order_id table an exception will be thrown with oracle code number -2292.
We can provide a name to this exception and handle it in the exception section as given
below.
DECLARE
Child_rec_exception EXCEPTION;
PRAGMA
BEGIN
EXCEPTION
251
WHEN Child_rec_exception
END;
User-defined Exceptions
Apart from system exceptions we can explicitly define exceptions based on business
rules. These are known as user-defined exceptions.
For Example: Lets consider the product table and order_items table from sql joins to
explain user-defined exception.
Lets create a business rule that if the total no of units of any particular product sold is
more than 20, then it is a huge quantity and a special discount should be provided.
DECLARE
huge_quantity EXCEPTION;
CURSOR product_quantity is
252
WHERE o.product_id = p.product_id;
quantity order_tems.total_units%type;
message VARCHAR2(50);
BEGIN
quantity := product_rec.units;
RAISE huge_quantity;
END IF;
dbms_output.put_line (message);
END LOOP;
EXCEPTION
253
dbms_output.put_line (message);
END;
RAISE_APPLICATION_ERROR ( )
254
Finally, catch the exception and link the exception to a user-defined error number
in
RAISE_APPLICATION_ERROR. Using the above example we can display an
error message using RAISE_APPLICATION_ERROR.
DECLARE
huge_quantity EXCEPTION;
CURSOR product_quantity is
quantity order_tems.total_units%type;
message VARCHAR2(50);
BEGIN
quantity := product_rec.units;
RAISE huge_quantity;
255
END IF;
Dbms_output.put_line (message);
END LOOP;
EXCEPTION
END;
256
UNIT SEVEN
FUNCTIONAL DEPENDENCE AND NORMALIZATION
Objective:
Introduction
Database normalization can essentially be defined as the practice of optimizing table structures.
Optimization is accomplished as a result of a thorough investigation of the various pieces of data
that will be stored within the database, in particular concentrating upon how this data is
interrelated. An analysis of this data and its corresponding relationships is advantageous because
it can result both in a substantial improvement in the speed in which the tables are queried, and
in decreasing the chance that the database integrity could be compromised due to tedious
maintenance procedures.
So In this unit, you will learn how to improve the database through normalization
Section objectives:
257
Functional Dependency and normailzation
Normalization is the process of efficiently organizing data in a database. There are two goals of
the normalization process: eliminating redundant data (for example, storing the same data in
more than one table) and ensuring data dependencies make sense (only storing related data in a
table). Both of these are worthy goals as they reduce the amount of space a database consumes
and ensure that data is logically stored.
Before we develop the ideas of normalization further, it is important for you to have an
understanding of "functional dependency." The essence of this idea is that if the existence of
something, call it A, implies that B must exist and have a certain value, then we say that "B is
functionally dependent on A." We also often express this idea by saying that "A determines B,"
or that "B is a function of A," or that "A functionally governs B." Often, the notions of
functionality and functional dependency are expressed briefly by the statement, "If A, then B." It
is important to note that the value B must be unique for a given value of A, i.e., any given value
of A must imply just one and only one value of B, in order for the relationship to qualify for the
name "function." (However, this does not necessarily prevent different values of A from
implying the same value of B.)
For the terminology of relational databases, the word "function" was borrowed from
mathematics, where it is common to say things like "y is a function of x" or "y = f(x)". (The
latter expression is read "y equals f of x".) The determining value, x, is called the argument; the
determined value, y or f(x), is called the result.
The expression "y = f(x)" is a very general, and abstract, way of talking about functionality.
Outside of mathematics--and, in particular, ordinarily in relational database management--we
talk not abstractly but in terms of particular examples. (Indeed, the general idea of a "function" is
best understood when one has seen enough examples of specific functions to be able to start
generalizing about the abstract, or general, properties that the specific functions share.)
258
Here are some examples of functions. An easy one is y = x2. This particular function says that if
we are given a particular value for x, say 3, then we must say that y has the value 9. (We could
also write y = f(x) = x2 or just f(x) = x2.) Another easy one is: y = x3. This particular function
says that if we are given a particular value for x, say -2, then we must say that y has the value -8.
A common way of indicating functions is to place the determining and determined values side by
side in a table. Thus we can place sample values of the function, y = x2, in a table like the one
shown here.
The above table shows just three of the infinity of possible pairs of values, x and y, for the
function y = x2. It also shows that for some functions, different values of x (here, 3 and -3) imply
the same value (here, 9) of the function.
The functions we have given as examples so far have been functions that are specified by an
algebraic function. But the idea of function is more general; i.e., functions need not be
algebraically defined. The essence of the idea of function is that to a specified determining value
corresponds a unique determined value. This essence can be defined, among other ways, by
placing the determining and determined values in a table that displays and/or defines the
relationship between the argument and the result.
Note that the table above displays, but does not fully define, the relationship, y = x2. This
function, since it has an infinite number of pairs of values, cannot be fully defined in a table. For
functions that involve only a finite number of pairs of values of argument and result, a table is
often a convenient way--and may in fact be the only way--of displaying and, at the same time,
defining the function.
259
an argument and can be displayed economically in just one table. For example, consider the
Looking the below table from the relational-database point of view, we can say that the attributes
x2, x3, and x4 are all functionally dependent on the attribute x.
Now we ready to come to grips with the ideas of normalization. The following table, containing
information about some students at Jimma University, is a table that is in 1st Normal Form,
1NF.
Table 1
You can easily verify for yourself that the above table satisfies the definition of 1NF: viz., it has
no duplicated rows; each cell is single-valued (i.e., there are no repeating groups or arrays); and
all the entries in a given column are of the same kind.
In Table 1 we can see that the key, ID No, functionally determines the other attributes; i.e., a
given Id number implies (determines) a particular value for each of the attributes FirstName,
LastName, and Major (assuming, at least for the moment, that a student is allowed to have only
one major). In the arrow notation: ID No ? FirstName, SSN ? LastName, and SSN ? Major.
A key attribute will, by the definition of key, uniquely determine the values of the other
attributes in a table; i.e., all non-key attributes in a table will be functionally dependent on the
260
key. But there may be non-key attributes in a table that determine other attributes in that table.
Consider the following table containing information about some students at Enormous State
University,:
Table 2
In Table 2 above the Level attribute can be said to be functionally dependent on the Major
attribute. Thus we have an example of an attribute that is functionally dependent on a non-key
attribute. This statement is true in the table above table, and that is all that the definition of
functional dependence requires; but the statement also reflects the real-world fact that LIS is a
major that is open only to graduate students and that Pre-Medicine and Pre-Law are majors that
are open only to undergraduate students.
Table 2 above has another interesting aspect. Its key is a composite key, consisting of the paired
attributes, FirstName and LastName. The Level attribute is functionally dependent on this
composite key, of course; but, in addition, Level can be seen to be dependent on only the
attribute LastName. (This is true because each value of Level is paired with a distinct value of
LastName. In contrast, there are two occurrences of the value Lynn for the attribute FirstName,
and the two Lynns are paired with different values of Level, so Level is not functionally
dependent on FirstName.) Thus this table fails to qualify as a 2nd Normal Form table, since the
definition of 2NF requires that all non-key attributes be dependent on all of the key. (Admittedly,
this example of a partial dependency is artificially contrived, but nevertheless it illustrates the
problem of partial dependency.)
We can turn Table 2 above into a table in 2NF in an easy way, by adding a column for the Social
Security Number (SSN) which will then be the natural thing to use as the key.
261
Table 3
With the SSN defined as the key, Table 3 is in 2NF, as you can easily verify. This illustrates the
fact that any table that is in 1NF and has a single-attribute (i.e., a non-composite) key is
automatically also in 2NF.
Table 3 above still exhibits some problems, however. For example, it contains some repeated
information about the LIS-Graduate pairing.
At this point it is appropriate to note that the main thrust behind the idea of normalizing
databases is the avoidance of insertion and deletion anomalies in databases.
To illustrate the idea of anomalies, consider what would happen to our knowledge (at least, as
explicitly contained in a table) of the level of the major, Pre-Medicine, if Mary Ruiz left
Enormous State University. With the deletion of the row for Ms. Ruiz, we would lose the
information that Pre-Medicine is an Undergraduate major. This is an example of a deletion
anomaly. We may possess the real-world information that Pre-Medicine is an Undergraduate
major, but no such information is explicitly contained in a table in our database.
As an example of an insertion anomaly, we can suppose that a new student wants to enroll in
ESU: e.g., suppose Jane Doe wants to major in Public Affairs. From the information in Table 3
above we cannot tell whether Public Affairs is an Undergraduate or a Graduate major; in fact, we
do not even know whether Public Affairs is an established major at ESU. We do not know
whether it is permissible to insert the value, Public Affairs, as a value of the attribute, Major, or
what to insert for the attribute, Level, if we were to assume that Public Affairs is a valid value
for Major. The point is that while we may possess real-world information about whether Public
262
Affairs is a major at ESU and what its level is, this information is not explicitly contained in any
table that we have thus far mentioned as part of our database.
A database-management system, a DBMS, can work only with the information that we put
explicitly into its tables for a given database and into its rules for working with those tables,
where such rules are appropriate and possible.
How do anomalies relate to normalization? The simple answer is that by arranging that the tables
in a database are sufficiently normalized (in practice, this typically means to at least the 4th level
of normalization), we can ensure that anomalies will not arise in our database. Anomalies are
difficult to avoid directly, because with databases of typical complexity (i.e., several tables) the
database designer can easily overlook possible problems. Normalization offers a rigorous way of
avoiding unrecognized anomalies.
Normalization may look like a difficult process when one views it from the standpoint of the
formal definitions of the various normal forms. But in practice, you can easily attain sufficient
normalization in your database by simply ensuring that the tables in your database are what we
can call "single-theme" tables. This idea will be illustrated as we proceed through the rest of the
discussion in this handout.
Although Table 3 is in 2NF, it is still open to the problems of insertion and deletion anomalies,
as the discussion in the preceding section shows. The reason is that Table 3 deals with more than
a single theme. What can we do to turn it into a set of tables that are, or at least come closer to
being, single-theme tables?
A reasonable way to proceed is to note that Table 3 deals with both information about students
(their names and SSNs) and information about majors and levels. This should strike you as two
different themes. Presented below is one possible set of single-theme tables dealing with the
information in Table 3 (To save space, the following tables also contain some information that is
not in Table 3 and the discussion will deal with this added information.)
Table 4
263
SSN First name Second name
123-45-6789 Jack Jones
222-33-4444 Lynn Lee
987-65-4321 Marry Ruize
123-54-3210 Lynn Smith
111-33-5555 Jane Jones
Table 5
Major Level
LIS Graduate
LIS Graduate
Pre-medicine Undergraduate
Pre-Law Undergraduate
LIS Graduate
Table 6
SSN Major
123-45-6789 LIS
222-33-4444 LIS
987-65-4321 Pre-medicine
123-54-3210 Pre-Law
111-33-5555 LIS
The three preceding tables should strike you as providing a better arrangement of the information
in Table 3 above. For one thing, this arrangement puts the information about the students into a
smaller table, Table 4, which happily fails to contain redundant information about the LIS-
Graduate pairing. For another thing, this arrangement permits us to enter information about
students (e.g., Newton Gingpoor) who have not yet identified themselves as pursuing a particular
major. For still another thing, it puts the information about the Major-Level pairings into a
separate table, Table 5 above , which can easily be expanded to include information (e.g., that
the Public Affairs major is at the Graduate level) about majors for which, at the moment, there
may be no students registered. Finally, Table 6 above provides the needed link between
264
individual students and their majors (note that Newton Gingpoor's SSN is not in this Table 6,
which tells us that he has not yet selected a major).
Tables 4 - 6 are single-theme tables and are in 2NF, as you can easily verify. (In fact, they are in
DKNF, but we are not yet ready to discuss the latter level in detail.).
In order to discuss the 3rd Normal Form, we need to begin by discussing the idea of transitive
dependencies.
In mathematics and logic, a transitive relationship is a relationship of the following form: "If A
implies B, and if also B implies C, then A implies C." An example is: "If John Doe is a human,
and if every human is a primate, then John Doe must be a primate." Another way of putting it is
this: "If A functionally governs B, and if B functionally governs C, then A functionally governs
C." In the arrow notation, we have:
The following table, Table 7, provides an example of how transitive dependencies can occur in a
table in a relational database.
265
Collection
Hunter David Music Publishing and Music Fine Arts Fine Arts
Collecting Literature Library Building
Graves Robert English and Scottish Folksong PCL General Perry-
Ballads Stacks CastaLibrary
By examining Table 7 above we can infer that books dealing with history, cognitive psychology,
and folksong are assigned to the PCL General Stacks collection; that books dealing with legal
procedures are assigned to the Law Library; that books dealing with Greek literature are assigned
to the Classics Library; that books dealing with library biography are assigned to the Library and
Information Science Collection (LISC);and that books dealing with music literature are assigned
to the Fine Arts Library.
Further, we can infer that the PCL General Stacks collection and the LISC are both housed in the
Perry-Casta Library (PCL) building; that the Classics Library is housed in Waggener Hall; and
that the Law Library and Fine Arts Library are housed, respectively, in Townes Hall and the Fine
Arts Building.
Thus we see that there is a transitive dependency in Table 7 above: any book that deals with
history, cognitive psychology, or library biography will be physically housed in the PCL
building (unless it is temporarily checked out to a borrower); any book dealing with legal
procedures will be housed in Townes Hall; and so on. In short, if we know what subject a book
deals with, we also know not only what library or collection it will be assigned to but also what
building it is physically housed in.
What is wrong with having a transitive dependency or dependencies in a table? For one thing,
there is duplicated information: from three different rows we can see that the PCL General
Stacks are in the PCL building. For another thing, we have possible deletion anomalies: if the
Yudof book were lost and its row removed from Table 7 above, we would lose the information
that books on legal procedures are assigned to the Law Library and also the information the Law
Library is in Townes Hall. As a third problem, we have possible insertion anomalies: if we
wanted to add a chemistry book to the table, we would find that Table 7 above nowhere contains
the fact that the Chemistry Library is in Robert A.Welch Hall. As a fourth problem, we have the
chance of making errors in updating: a careless data-entry clerk might add a book to the LISC
but mistakenly enter Townes Hall in the building column.
266
The solution to the problem is, once again, to place the information in Table 7 above into
appropriate single-theme tables. Here is one such possible arrangement:
Table 8
Table 9
267
Table 10
Table 11
Table 12
268
Fine Arts Library Fine Arts Building
PCL General Stacks Perry-CastaLibrary
You can verify for yourself that none of these tables contains a transitive dependency; hence, all
of them are in 3NF (and, in fact, in DKNF).
We can note in passing that the fact that Table 8 contains the first and last names of Robert
Graves in two different rows suggests that it might be worthwhile to replace it with two further
tables, along the lines of:
Table 13
Table 14
269
Though Tables 13 and 14 together take a little more space than Table 8, it is easy to see that
given a much larger collection, in which there would be many more authors with multiple works
to their credit, Tables 13 and 14 would be more economical of storage space than Table 8.
Furthermore, the structure of Tables 13 and 14 lessens the chance of making updating errors
(e.g., typing Grave instead of Graves, or Miska instead of Miksa).
270
Summary of Normalization
Do not use multiple fields in a single table to store similar data. For example, to track an
inventory item that may come from two possible sources, an inventory record may contain fields
for
Vendor Code 1 and Vendor Code 2. But what happens when you a dd a third vendor? Adding a
field is not the answer; it requires program and table modifications and does not smoothly
accommodate a dynamic number of vendors. Instead, place all vendor information in a separate
table called Vendors, then link inventory t o vendors with an item number key, or vendors to
inventory with a vendor code key.
If it's in 1st NF and if the Primary Key is composite (multiple columns) then any fact in that
table must be a fact about the entire composite Primary Key not just part of the Primary Key.
For example, if an inventory table has a primary key made up of two attributes PartId and
WarehouseId. Suppose the inventory table has the warehouse address in it, since warehouse
address is a fact about WarehoseId and not about the PartId the w arehouse address is in the
wrong table. This is in violation of the 2nd Normal Form.
• If it's in the 2nd NF and there are no non-key fields that depend on attributes in the table other
than the Primary Key. Suppose in the Student table you had student birth date as an attribute and
you also had student's age. Students age depends on the student's birth date (a fact about his/her
birth date) so 3rd Normal Form is violated.
271