Intro to Database Systems
1 – Database System Architecture
amount to be transferred, say, Rs. 500, is debited from one account but is not credited
to the other account.
This leaves the database in an inconsistent state. It is difficult to ensure atomicity in a
file processing system.
Concurrent Access Anomalies
Multiple users are allowed to access data simultaneously (concurrently). This is for the
sake of better performance and faster response.
Consider an operation to debit (withdraw from) an account. The program reads the old
balance, calculates the new balance, and writes the new balance back to the database.
Suppose an account has a balance of Rs. 5000. Concurrent withdrawals of Rs. 1000 and
Rs. 2000 may leave the balance at Rs. 4000 or Rs. 3000, depending on their completion
order, rather than the correct value of Rs. 2000.
Here, concurrent data access should be allowed under some supervision.
But, due to lack of co-ordination among different application programs, this is not
possible in file processing systems.
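The lost-update interleaving described above can be reproduced in a short sketch (a minimal illustration, not from the notes; the variable names are assumptions):

```python
# Simulate two interleaved withdrawals that each read the old balance,
# compute a new balance, and write it back, as in the anomaly above.

balance = 5000  # Rs. 5000 in the account

# Step 1: both withdrawals read the same old balance.
read_a = balance          # withdrawal of Rs. 1000 reads 5000
read_b = balance          # withdrawal of Rs. 2000 reads 5000

# Step 2: each computes its new balance from its stale read.
new_a = read_a - 1000     # 4000
new_b = read_b - 2000     # 3000

# Step 3: both write back; the later write overwrites the earlier one.
balance = new_a           # balance = 4000
balance = new_b           # balance = 3000; the Rs. 1000 debit is lost

print(balance)            # 3000 instead of the correct 2000
```

A DBMS prevents this by supervising concurrent access (concurrency control), which a file processing system cannot coordinate across independent programs.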
Security Problems
The database should be accessible to users only in a limited way.
Each user should be allowed to access only the data concerning his or her application.
For example, a customer can check the balance of his/her own account only. He/She
should not have access to information about other accounts.
But, in file processing system, application programs are added in an ad hoc manner by
different programmers. So, it is difficult to enforce such kind of security constraints.
Explain advantages (benefits) of DBMS over file management
system. OR
Explain purpose of database system.
Minimal Data Redundancy (Duplication)
Due to centralized database, it is possible to avoid unnecessary duplication of
information.
This reduces data redundancy.
It prevents memory wastage and reduces the extra processing time needed to retrieve the required data.
Shared Data
All authorized users and application programs can share the database easily.
Data Consistency
Data inconsistency occurs due to data redundancy.
With reduced data redundancy such type of data inconsistency can be eliminated.
This results in improved data consistency.
Data Access
DBMS utilizes a variety of techniques to retrieve data.
Required data can be retrieved by providing appropriate query to the DBMS.
Thus, data can be accessed in convenient and efficient manner.
Data Integrity
Data in database must be correct and consistent.
So, data stored in database must satisfy certain types of constraints (rules).
DBMS provides different ways to implement such type of constraints (rules).
This improves data integrity in a database.
Data Security
The database should be accessible to users in a limited way.
DBMS provides ways to control access to data for different users according to their
requirements.
It prevents unauthorized access to data.
Thus, security can be improved.
Concurrent Access
Multiple users are allowed to access data simultaneously.
Concurrent access to centralized data can be allowed under some supervision.
This results in better performance of system and faster response.
Guaranteed Atomicity
Any operation on database must be atomic. This means, operation must be executed
either 100% or 0%.
This type of atomicity is guaranteed in DBMS.
List and explain the applications of DBMS.
Airlines and railways
Airlines and railways use online databases for reservation, and for displaying the schedule
information.
Banking
Banks use databases for customer inquiry, accounts, loans, and other transactions.
Education
Schools and colleges use databases for course registration, result, and other information.
Telecommunications
Telecommunication departments use databases to store information about the
communication network, telephone numbers, record of calls, for generating monthly
bills, etc.
Credit card transactions
Databases are used for keeping track of purchases on credit cards in order to generate
monthly statements.
E-commerce
Integration of heterogeneous information sources (for example, catalogs) for business
activity such as online shopping, booking of holiday package, consulting a doctor, etc.
Health care information systems and electronic patient record
Databases are used for maintaining the patient health care details in hospitals.
Digital libraries and digital publishing
Databases are used for management and delivery of large bodies of textual and
multimedia data.
Finance
Databases are used for storing information such as sales, purchases of stocks and bonds
or data useful for online trading.
Sales
Databases are used to store product, customer and transaction details.
Human resources
Organizations use databases for storing information about their employees, salaries,
benefits, taxes, and for generating salary checks.
Describe functions (responsibility, roles, and duties) of DBA to
handle DBMS.
DBA
The full name of DBA is Database Administrator.
Database Administrator is a person in the organization who controls the design and the
use of database.
Functions or Responsibilities of DBA are as under:
Schema Definition
DBA defines the logical schema of the database.
A schema refers to the overall logical structure of the database.
According to this schema, database will be designed to store required data for an
organization.
Storage Structure and Access Method Definition
DBA decides how the data is to be represented in the database.
Based on this, storage structure of the database and access methods of data is defined.
Defining Security and Integrity Constraints
DBA decides various security and integrity constraints.
DDL (Data Definition Language) provides facilities for specifying such constraints.
Granting of Authorization for Data Access
The DBA determines which user needs access to which part of the database.
According to this, various types of authorizations (permissions) are granted to different
users.
This is required to prevent unauthorized access of a database.
Liaison with Users
DBA is responsible for providing the data needed by users.
Users should be able to define their external schema using the DDL (Data Definition Language).
Assisting Application Programmers
DBA provides assistance to application programmers to develop application programs.
Monitoring Performance
The DBA monitors performance of the system.
The DBA ensures that better performance is maintained by making change in physical or
logical schema if required.
Backup and Recovery
Database should not be lost or damaged.
The task of the DBA is to back up the database periodically onto storage devices such as
DVD, CD, magnetic tape, or remote servers.
In case of failures, such as a flood or a virus attack, the database is recovered from this backup.
Explain three levels ANSI SPARC Database System. OR
Explain three level Data abstraction.
The ANSI SPARC architecture is divided into three levels:
1) External level
2) Conceptual level
3) Internal level
[Figure: Three-level architecture — External Level (View A, View B, View C), Conceptual Level (Conceptual View), Internal Level (Internal View).]
Internal Level
This is the lowest level of the data abstraction.
It describes how the data are actually stored on storage devices.
It is also known as a physical level.
The internal view is described by internal schema.
Internal schema consists of the definition of stored records, the method of representing
the data fields, and the access methods used.
Conceptual Level
This is the next higher level of the data abstraction.
It describes what data are stored in the database and what relationships exist among
those data.
It is also known as a logical level.
Conceptual view is defined by conceptual schema. It describes all records and
relationship.
External Level
This is the highest level of data abstraction.
It is also known as view level.
It describes only part of the entire database that a particular end user requires.
External view is described by an external schema.
External schema consists of the definitions of logical records, the relationships in the
external view, and the methods of deriving objects from the conceptual view.
These objects include entities, attributes and relationships.
Explain Mapping. OR
Explain external and internal mapping. OR
What is mapping? Describe type of mapping.
Mapping
The process of transforming requests and results between the three levels is called
mapping.
Types of Mapping
Conceptual/Internal Mapping
External/Conceptual Mapping
Conceptual/Internal Mapping
It relates conceptual schema with internal schema.
It defines correspondence between the conceptual schema and the database stored in
physical devices.
It specifies how conceptual records and fields are represented at the internal level.
If the structure of the stored database is changed, then the conceptual/internal mapping
must be changed accordingly, so that the conceptual schema can remain invariant.
There can be only one mapping between the conceptual and internal levels.
External/Conceptual Mapping
It relates each external schema with conceptual schema.
It defines correspondence between a particular external view and conceptual schema.
If the structure of the conceptual schema is changed, then the external/conceptual mapping
must be changed accordingly, so that the external schema can remain invariant.
There could be several mappings between external and conceptual levels.
These users interact with the system by using one of the application programs that have
been written previously.
Example: a clerk in a bank.
Differentiate the DA and DBA.
DA (Data Administrator) vs. DBA (Database Administrator)

DA: The data administrator is a person in the organization who controls the data of the database.
DBA: The database administrator is a person in the organization who controls the design and the use of the database.

DA: Determines what data is to be stored in the database, based on the requirements of the organization.
DBA: Provides the necessary technical support for implementing the database.

DA: Involved more in the requirements gathering, analysis, and design phases.
DBA: Involved more in the design, development, testing, and operational phases.

DA: A manager or senior-level person in the organization who understands organizational requirements with respect to data.
DBA: A technical person having knowledge of database technology.

DA: Does not need to be a technical person, but any knowledge of database technology is beneficial.
DBA: Does not need to be a business person, but any knowledge of the functionality of the organization is beneficial.

DA: A business-focused person, but he/she should understand more about database technology.
DBA: A technically focused person, but he/she should understand more about the business to administer the databases effectively.
Explain Database System Architecture.
Components of a DBMS
These functional units of a database system can be divided into two parts:
1. Query Processor Units (Components)
2. Storage Manager Units
Query Processor Units:
Query processor units deal with the execution of DDL (Data Definition Language) and DML (Data
Manipulation Language) statements.
DDL Interpreter — Interprets DDL statements into a set of tables containing metadata.
DML Compiler — Translates DML statements into low level instructions that the query
evaluation engine understands.
Embedded DML Pre-compiler — Converts DML statements embedded in an
application program into normal procedure calls in the host language.
Query Evaluation Engine — Executes low level instructions generated by DML
compiler.
Storage Manager Units:
Storage manager units provide the interface between the low-level data stored in the database
and the application programs and queries submitted to the system.
Authorization Manager — Checks the authority of users to access data.
Integrity Manager — Checks for the satisfaction of the integrity constraints.
Transaction Manager — Preserves atomicity and controls concurrency.
File Manager — Manages allocation of space on disk storage.
Buffer Manager — Fetches data from disk storage to memory for being used.
In addition to these functional units, several data structures are required to implement physical
storage system. These are described below:
Data Files — To store user data.
Data Dictionary and System Catalog — To store metadata. It is used heavily, almost
for each and every data manipulation operation. So, it should be accessed efficiently.
Indices — To provide faster access to data items.
Statistical Data — To store statistical information about the data in the database. This
information is used by the query processor to select efficient ways to execute a query.
[Figure: Overall database system structure — application programs and queries pass through the query processor (DML compiler and organizer producing object code, and the query evaluation engine), which sits above the storage manager; the storage manager accesses disk storage.]
[Figure: Application architecture tiers — application client, application server, and database system (Database Tier).]
2 – Data Models
Degree of relationship
The degree of a relationship is the number of entity types that participate in the
relationship.
The three most common relationships in ER models are Unary, Binary and Ternary.
A unary relationship is one in which both participants in the relationship are the same
entity type.
Example: Subjects may be prerequisites for other subjects.
[Figure: Unary relationship — subject "is prerequisite for" subject.]
One-to-one relationship
[Figure: one-to-one mapping — each entity A1..A4 in A is linked to exactly one entity B1..B4 in B.]
An entity in A is associated with at most (only) one entity in B and an entity in B is
associated with at most (only) one entity in A.
A customer is connected with only one loan using the relationship borrower and a loan
is connected with only one customer using borrower.
One-to-many relationship
[Figure: one-to-many mapping between entity sets A and B.]
An entity in A is associated with any number (zero or more) of entities in B and an entity
in B is associated with at most (only) one entity in A.
In the one-to-many relationship a loan is connected with only one customer using
borrower and a customer is connected with more than one loans using borrower.
Many-to-one relationship
[Figure: many-to-one mapping between entity sets A and B.]
An entity in A is associated with at most (only) one entity in B and an entity in B is
associated with any number (zero or more) of entities in A.
In a many-to-one relationship a loan is connected with more than one customer using
borrower and a customer is connected with only one loan using borrower.
Many-to-many relationship
[Figure: many-to-many mapping between entity sets A and B.]
An entity in A is associated with any number (zero or more) of entities in B and an entity
in B is associated with any number (zero or more) of entities in A.
A customer is connected with more than one loan using borrower and a loan is
connected with more than one customer using borrower.
[Figure: a strong entity connected to a weak entity (attributes include payment-date) through an identifying relationship.]
The database designer may have to first identify a customer entity set with the
attributes name, street, city, and customer-id, and an employee entity set with the
attributes name, street, city, employee-id, and salary.
But the customer entity set and the employee entity set have some attributes in common. This
commonality can be expressed by generalization, which is a containment relationship
that exists between a higher level entity set and one or more lower level entity sets.
In our example, person is the higher level entity set and customer and employee are
lower level entity sets.
Higher level entity set is called superclass and lower level entity set is called subclass. So
the person entity set is the superclass of two subclasses customer and employee.
Differences in the two approaches may be characterized by their starting point and
overall goal.
[Figure: ISA hierarchy — person as the higher-level entity set with employee and customer as lower-level entity sets; attributes such as officer-number, station-number and hours-worked belong to further employee subclasses.]
It may be disjoint or non-disjoint (overlapping).
1. Disjoint Constraint
It specifies that the subclasses of the specialization must be disjoint (an entity can be
a member of only one of the subclasses of the specialization).
Specified by ‘d’ in an EER diagram or by writing disjoint.
2. Non-disjoint (Overlapping)
It specifies that the same entity may be a member of more than one subclass of the
specialization.
Specified by ‘o’ in an EER diagram or by writing overlapping.
Participation Constraint
Determines whether every member in super class must participate as a member of a
subclass or not.
It may be total (mandatory) or partial (optional).
1. Total (Mandatory)
Total specifies that every entity in the superclass must be a member of some subclass in
the specialization.
Specified by a double line in EER diagram.
2. Partial (Optional)
Partial specifies that an entity in the superclass need not belong to any of the subclasses
of the specialization.
Specified by a single line in an EER diagram.
Based on these two different kinds of constraints, a specialization or generalization can
be one of four types
Total, Disjoint
Total, Overlapping
Partial, Disjoint
Partial, Overlapping.
[Fig. A and Fig. B: employee and project entity sets (attributes name, id, number, hours) connected by the relationship work; machinery (attribute id) connected by the relationship uses.]
Relationship sets work and uses could be combined into a single set. We can combine
them by using aggregation.
Aggregation is an abstraction through which relationships are treated as higher-level
entities.
For our example, we treat the relationship set work and the entity sets employee and
project as a higher-level entity set called work.
Transforming an E-R diagram with aggregation into tabular form is easy. We create a
table for each entity and relationship set as before.
The table for relationship set uses contains a column for each attribute in the primary
key of machinery and work.
It is highly recommended that every table start with its primary key attribute,
conventionally named TablenameID.
Consider the following simple ER diagram:
[Figure: Person entity with attributes PersonID, Name, Address, Email and multi-valued attribute Phone.]
The initial relational schema is expressed in the following format, writing the table
names with the attribute list inside parentheses, as shown below:
Persons( personid, name, address, email )
Person
personid name address Email
Persons and Phones are Tables and personid, name, address and email are Columns
(Attributes).
personid is the primary key for the table : Person
Step 2: Multi-valued Attributes
[Figure: Person entity with multi-valued attribute Phone.]
If you have a multi-valued attribute, take that multi-valued attribute and turn it into a
new entity or table of its own.
Then make a 1:N relationship between the new entity and the existing one.
In simple words.
1. Create a table for that multi-valued attribute.
2. Add the primary (id) column of the parent entity as a foreign key within
the new table as shown below:
First table is Persons ( personid, name, address, email )
Second table is Phones ( phoneid , personid, phone )
personid within the table Phones is a foreign key referring to the personid of Persons
Phone
phoneid personid phone
Step 3: 1:1 Relationship
[Figure: Person entity (PersonID, Name, Address, Email, Phone) connected by a 'Have' relationship.]
Let us consider the case where the Person has one wife. You can place the primary key
of the Wife table, wifeid, in the table Persons; in this case it is called a foreign key, as
shown below.
Persons( personid, name, address, email , wifeid )
Wife ( wifeid , name )
Or vice versa to put the personid as a foreign key within the wife table as shown below:
Persons( personid, name, address, email )
Wife ( wifeid , name , personid)
For cases when the Person is not married, i.e. has no wifeid, the attribute can be set to
NULL.
Persons Wife
personid name address email wifeid wifeid name
OR
Persons Wife
personid name address email wifeid name personid
Step 4: 1:N Relationships
[Figure: Person entity (PersonID, Name, Address, Email, Phone) connected by a 'Has' relationship to House (with an Address attribute).]
For instance, a Person can have from zero to many Houses, but a House can belong to
only one Person.
In such a relationship, place the primary key attribute of the table on the 1 side into the
table on the many side as a foreign key.
To represent this relationship, the personid of the parent table must be placed within
the child table as a foreign key.
It should convert to :
Persons( personid, name, address, email )
House ( houseid, name , address, personid)
Persons House
personid name address email houseid name address personid
Step 5: M:N Relationships
[Figure: Person entity connected to Country by a many-to-many 'Has' relationship.]
For instance, the Person can live or work in many countries. Also, a country can have
many people. To express this relationship within a relational schema we use a separate
table as shown below:
It should convert into :
Persons( personid, name, address, email )
Countries ( countryid, name)
HasRelat ( hasrelatid, personid , countryid)
Persons Countries
personid name address email countryid name
HasRelat
hasrelatid personid countryid
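The M:N conversion above can be sketched as a runnable example using Python's built-in sqlite3 module (the table and column names come from the steps above; the sample rows 'Raj' and 'India' are assumptions for illustration):

```python
import sqlite3

# In-memory database; HasRelat links Persons and Countries with two foreign keys.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE Persons (personid INTEGER PRIMARY KEY, "
            "name TEXT, address TEXT, email TEXT)")
con.execute("CREATE TABLE Countries (countryid INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE HasRelat (
    hasrelatid INTEGER PRIMARY KEY,
    personid   INTEGER REFERENCES Persons(personid),
    countryid  INTEGER REFERENCES Countries(countryid))""")

con.execute("INSERT INTO Persons VALUES (1, 'Raj', 'Rajkot', 'raj@example.com')")
con.execute("INSERT INTO Countries VALUES (1, 'India')")
con.execute("INSERT INTO HasRelat VALUES (1, 1, 1)")

# The join reassembles the M:N relationship from the linking table.
row = con.execute("""SELECT p.name, c.name FROM Persons p
                     JOIN HasRelat h ON h.personid = p.personid
                     JOIN Countries c ON h.countryid = c.countryid""").fetchone()
print(row)   # ('Raj', 'India')
```

The separate HasRelat table is what allows one person to appear with many countries and one country with many persons.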
1. Hierarchical Model
The hierarchical model organizes data into a tree-like structure, where each record has a
single parent or root.
[Figure: hierarchical tree — department as the root, with student and professor as child nodes.]
The hierarchy starts from the Root data, and expands like a tree, adding child nodes to
the parent nodes.
In hierarchical model, data is organized into tree-like structure with one-to-many
relationship between two different types of data, for example, one department can
have many professors and many students.
2. Network Model
This is an extension of the hierarchical model, allowing many-to-many relationships in a
tree-like structure that allows multiple parents.
[Figure: network model — a graph structure in which a node can have multiple parents.]
3. Entity-relationship Model
In this database model, relationships are created by dividing the object of interest into
entities and its characteristics into attributes.
4. Relational Model
In this model, data is organized in two-dimensional tables and the relationship is
maintained by storing a common attribute.
Rno StudentName Age SubjectID SubjectName Teacher
1 Raj 21 1 DBMS Patel
2 Meet 22 2 DS Shah
3. Unique
This constraint ensures that a column or a group of columns in each row
have a distinct (unique) value.
A column(s) can have a null value but the values cannot be duplicated.
E.g. the enrollmentno column should have unique values.
4. Primary key
This constraint defines a column or combination of columns which uniquely
identifies each row in the table.
Primary key = Unique key + Not null
E.g. the enrollmentno column should have unique values and cannot be null.
5. Foreign key
A referential integrity constraint (foreign key) is specified between two
tables.
In the referential integrity constraints, if a foreign key column in table 1
refers to the primary key column of table 2, then every value of the foreign
key column in table 1 must be null or be available in primary key column of
table 2.
3 – Relational Query Language
Explain keys.
Super key
A super key is a set of one or more attributes (columns) that allow us to identify each
tuple (records) uniquely in a relation (table).
For example, the enrollment_no, roll_no, semester with department_name of a student
is sufficient to distinguish one student tuple from another. So {enrollment_no} and
{roll_no, semester, department_name} both are super key.
Candidate key
Candidate key is a super key for which no proper subset is a super key.
For example, combination of roll_no, semester and department_name is sufficient to
distinguish one student tuple from another. But either roll_no or semester or
department_name alone or combination of any two columns is not sufficient to
distinguish one student tuple from another. So {roll_no, semester, department_name} is
candidate key.
Every candidate key is a super key, but every super key may not be a candidate key.
Primary key
A Primary key is a candidate key that is chosen by database designer to identify tuples
uniquely in a relation.
Alternate key
An Alternate key is a candidate key that is not chosen by database designer to identify
tuples uniquely in a relation.
Foreign key
A foreign key is a set of one or more attributes whose values are derived from the
primary key attribute of another relation.
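The super key test above (a set of attributes identifies each tuple uniquely) can be sketched in a few lines (the sample rows are assumptions, not from the notes):

```python
# An attribute set is a super key if no two tuples agree on all of its attributes.

students = [
    {"enrollment_no": "E1", "roll_no": 1, "semester": 3, "department_name": "CE"},
    {"enrollment_no": "E2", "roll_no": 2, "semester": 3, "department_name": "CE"},
    {"enrollment_no": "E3", "roll_no": 1, "semester": 3, "department_name": "EC"},
]

def is_super_key(rows, attrs):
    """True if the projection of rows on attrs has no duplicate tuples."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

print(is_super_key(students, ["enrollment_no"]))                           # True
print(is_super_key(students, ["roll_no", "semester", "department_name"]))  # True
print(is_super_key(students, ["roll_no"]))                                 # False: roll_no 1 repeats
```

{roll_no, semester, department_name} passes but no proper subset of it does, which is what makes it a candidate key.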
What is relational algebra? Explain relational algebraic
operation.
Relational algebra is a language for expressing relational database queries.
Relational algebra is a procedural query language.
Relational algebraic operations are as follows:
Selection:-
Operation: Selects tuples from a relation that satisfy a given condition.
It is used to select particular tuples from a relation.
It selects particular tuples but all attribute from a relation.
Symbol: σ (Sigma)
Notation: σ(condition) <Relation>
Operators: The following operators can be used in a condition.
=, !=, <, >, <=,>=, Λ(AND), ∨(OR)
Consider following table
Student
Rno Name Dept CPI
101 Ramesh CE 8
108 Mahesh EC 6
109 Amit CE 7
125 Chetan CI 8
138 Mukesh ME 7
128 Reeta EC 6
133 Anita CE 9
Example: List out all students of CE department.
σDept=“CE” (Student)
Output of above query is as follows
Student
Rno Name Dept CPI
101 Ramesh CE 8
109 Amit CE 7
133 Anita CE 9
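The selection operator can be sketched over rows-as-dicts (this representation is an assumption for illustration, not the notes' own notation):

```python
# sigma(condition)(relation): keep the tuples that satisfy the condition,
# with all attributes retained.

student = [
    {"Rno": 101, "Name": "Ramesh", "Dept": "CE", "CPI": 8},
    {"Rno": 108, "Name": "Mahesh", "Dept": "EC", "CPI": 6},
    {"Rno": 109, "Name": "Amit",   "Dept": "CE", "CPI": 7},
    {"Rno": 125, "Name": "Chetan", "Dept": "CI", "CPI": 8},
    {"Rno": 138, "Name": "Mukesh", "Dept": "ME", "CPI": 7},
    {"Rno": 128, "Name": "Reeta",  "Dept": "EC", "CPI": 6},
    {"Rno": 133, "Name": "Anita",  "Dept": "CE", "CPI": 9},
]

def select(condition, relation):
    return [row for row in relation if condition(row)]

ce_students = select(lambda r: r["Dept"] == "CE", student)
print([r["Rno"] for r in ce_students])  # [101, 109, 133]
```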
Projection:-
Operation: Selects specified attributes of a relation.
It selects particular attributes but all unique tuples from a relation.
Symbol: ∏ (Pi)
Notation: ∏ (attribute set) <Relation>
Consider following table
Student
Rno Name Dept CPI
101 Ramesh CE 8
108 Mahesh EC 6
109 Amit CE 7
125 Chetan CI 8
138 Mukesh ME 7
128 Reeta EC 6
133 Anita CE 9
Example: List out all students with their roll no, name and department name.
∏Rno, Name, Dept (Student)
Output: The above query returns all tuples with three attributes roll no, name and
department name.
Output of above query is as follows
Student
Rno Name Dept
101 Ramesh CE
108 Mahesh EC
109 Amit CE
125 Chetan CI
138 Mukesh ME
128 Reeta EC
133 Anita CE
Example: List out students of CE department with their roll no, name and department.
∏Rno, Name, Dept (σDept=“CE” (Student))
Output: The above query returns tuples which contain CE as department with three
attributes roll no, name and department name.
Output of above query is as follows
Student
Rno Name Dept
101 Ramesh CE
109 Amit CE
133 Anita CE
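The projection operator can be sketched in the same rows-as-dicts style (an assumed representation, not from the notes): keep only the named attributes and drop duplicate tuples.

```python
def project(attrs, relation):
    """pi(attrs)(relation): named attributes only, duplicates removed."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result

# Projecting on Dept keeps one tuple per department (duplicates removed).
student = [
    {"Rno": 101, "Dept": "CE"}, {"Rno": 108, "Dept": "EC"},
    {"Rno": 109, "Dept": "CE"}, {"Rno": 125, "Dept": "CI"},
]
print(project(["Dept"], student))  # [{'Dept': 'CE'}, {'Dept': 'EC'}, {'Dept': 'CI'}]
```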
Division:-
Operation: The division is a binary operation that is written as R1 ÷ R2.
Condition to perform the operation: the attributes of R2 must be a proper subset of the attributes of R1.
The output of the division operator will have attributes =
All attributes of R1 – All attributes of R2
The output of the division operator will have tuples =
Tuples in R1 which are associated with all tuples of R2
Symbol: ÷
Notation: R1 ÷ R2
Consider following table
Project
Task
Database1
Database2

Work
Student Task
Shah Database1
Shah Database2
Shah Compiler1
Vyas Database1
Vyas Compiler1
Patel Database1
Patel Database2
Example: Find out all students having both tasks Database1 as well as Database2.
∏(Student, Task)(Work) ÷ ∏(Task)(Project)
Output: It gives name of all students whose task is both Database1 as well as Database2.
Output of above query is as follows
Student
Shah
Patel
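The division can be sketched directly (the pairs-of-tuples representation is an assumption for illustration): keep the students associated with every tuple of the divisor.

```python
work = [("Shah", "Database1"), ("Shah", "Database2"), ("Shah", "Compiler1"),
        ("Vyas", "Database1"), ("Vyas", "Compiler1"),
        ("Patel", "Database1"), ("Patel", "Database2")]
project_tasks = ["Database1", "Database2"]

def divide(pairs, divisor):
    """Values from the first column paired with ALL tuples of the divisor."""
    tasks_of = {}
    for student, task in pairs:
        tasks_of.setdefault(student, set()).add(task)
    return [s for s, tasks in tasks_of.items() if set(divisor) <= tasks]

print(divide(work, project_tasks))  # ['Shah', 'Patel']
```

Vyas is excluded because Database2 is missing from Vyas's set of tasks.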
Cartesian product:-
Operation: Combines information of two relations.
It multiplies each tuple of the first relation with each tuple of the second relation.
It is also known as the Cross product operation and is similar to the mathematical
Cartesian product operation.
Symbol: X (Cross)
Notation: Relation1 X Relation2
Resultant Relation :
If relation1 and relation2 have n1 and n2 attributes respectively, then the resultant
relation will have n1 + n2 attributes, taken from both input relations.
If both relations have an attribute with the same name, the two can be distinguished
by writing relation-name.attribute-name.
If relation1 and relation2 have n1 and n2 tuples respectively, then the resultant
relation will have n1 * n2 tuples, combining each possible pair of tuples from the two
input relations.
R
A 1
B 2
D 3

S
A 1
D 2
E 3

R × S
A 1 A 1
A 1 D 2
A 1 E 3
B 2 A 1
B 2 D 2
B 2 E 3
D 3 A 1
D 3 D 2
D 3 E 3
Consider following table
Emp Dept
Empid Empname Deptname Deptname Manager
S01 Manisha Finance Finance Arun
S02 Anisha Sales Sales Rohit
S03 Nisha Finance Production Kishan
Example:
Emp × Dept
Empid Empname Emp.Deptname Dept.Deptname Manager
S01 Manisha Finance Finance Arun
S01 Manisha Finance Sales Rohit
S01 Manisha Finance Production Kishan
S02 Anisha Sales Finance Arun
S02 Anisha Sales Sales Rohit
S02 Anisha Sales Production Kishan
S03 Nisha Finance Finance Arun
S03 Nisha Finance Sales Rohit
S03 Nisha Finance Production Kishan
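The Emp × Dept example above can be sketched with tuples (an assumed representation, not from the notes): every tuple of Emp paired with every tuple of Dept, giving n1*n2 tuples.

```python
emp = [("S01", "Manisha", "Finance"),
       ("S02", "Anisha",  "Sales"),
       ("S03", "Nisha",   "Finance")]
dept = [("Finance", "Arun"), ("Sales", "Rohit"), ("Production", "Kishan")]

# Cartesian product: concatenate each emp tuple with each dept tuple.
product = [e + d for e in emp for d in dept]

print(len(product))   # 3 * 3 = 9 tuples
print(product[0])     # ('S01', 'Manisha', 'Finance', 'Finance', 'Arun')
```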
Join:-
Natural Join Operation (⋈)
Operation: Natural join will retrieve information from multiple relations. It works in
three steps.
1. It performs Cartesian product
2. Then it finds consistent tuples and inconsistent tuples are deleted
3. Then it deletes duplicate attributes
Symbol: ⋈
Notation: Relation1 ⋈ Relation2
To perform a natural join there must be at least one common attribute (column)
between the two relations.
Consider following table
Emp Dept
Empid Empname Deptname Deptname Manager
S01 Manisha Finance Finance Arun
S02 Anisha Sales Sales Rohit
S03 Nisha Finance Production Kishan
Example:
Emp ⋈ Dept
Empid Empname Deptname Manager
S01 Manisha Finance Arun
S02 Anisha Sales Rohit
S03 Nisha Finance Arun

∏Empname, Manager (Emp ⋈ Dept)
Empname Manager
Manisha Arun
Anisha Rohit
Nisha Arun
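The three steps of the natural join can be sketched as one comprehension (the tuple representation is an assumption for illustration):

```python
emp = [("S01", "Manisha", "Finance"),
       ("S02", "Anisha",  "Sales"),
       ("S03", "Nisha",   "Finance")]
dept = [("Finance", "Arun"), ("Sales", "Rohit"), ("Production", "Kishan")]

joined = [(eid, ename, dname, mgr)
          for (eid, ename, dname) in emp       # step 1: all pairs of tuples ...
          for (dname2, mgr) in dept
          if dname == dname2]                  # step 2: keep only consistent pairs
# step 3: the duplicate Deptname column is emitted only once.

for row in joined:
    print(row)
# ('S01', 'Manisha', 'Finance', 'Arun')
# ('S02', 'Anisha', 'Sales', 'Rohit')
# ('S03', 'Nisha', 'Finance', 'Arun')
```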
Right outer join ( ⟖ )
The right outer join returns all the tuples of the right relation. It pads null values on
the left side wherever required.
Example: College ⟖ Hostel
Name Id Department Hostel_name Room_no
Anisha S02 Computer Kaveri hostel K01
Nisha S03 I.T. Godavari hostel G07
Isha Null Null Kaveri hostel K02
Full outer join ( ⟗ )
The full outer join returns all the tuples of both of the relations. It also pads null values
wherever required.
Example: College ⟗ Hostel
Name Id Department Hostel_name Room_no
Manisha S01 Computer Null Null
Anisha S02 Computer Kaveri hostel K01
Nisha S03 I.T. Godavari hostel G07
Isha Null Null Kaveri Hostel K02
Set Operators
Set operators combine the results of two or more queries into a single result.
Condition to perform set operation:
Both relations (queries) must be union compatible :
Relations R and S are union compatible, if
Both queries should have same (equal) number of columns, and
Corresponding attributes should have the same data type.
Types of set operators:
1. Union
2. Intersect (Intersection)
3. Minus (Set Difference)
Union
Operation: Selects tuples that are in either or both of the relations.
Symbol : U (Union)
Notation : Relation1 U Relation2
Example :
R
A 1
B 2
D 3
F 4
E 5

S
A 1
C 2
D 3
E 4

R U S
A 1
B 2
C 2
D 3
F 4
E 5
E 4
Consider following tables
Emp Cst
Id Name Id Name
1 Manisha 1 Manisha
2 Anisha 2 Anisha
3 Nisha 4 Isha
Example:
∏Name (Emp) U ∏Name (Cst)
Name
Manisha
Anisha
Nisha
Isha
Intersection
Operation: Selects tuples that are common to both relations.
Symbol : ∩ (Intersection)
Notation : Relation1 ∩ Relation2
Example
R
A 1
B 2
D 3
F 4
E 5

S
A 1
C 2
D 3
E 4

R ∩ S
A 1
D 3
Difference:-
Operation: Selects tuples that are in the first (left) relation but not in the second (right)
relation.
Symbol : — (Minus)
Notation : Relation1 — Relation2
Example :
R
A 1
B 2
D 3
F 4
E 5

S
A 1
C 2
D 3
E 4

R — S
B 2
F 4
E 5
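All three set operators can be sketched at once using Python sets over the R and S tuples above (a sketch for illustration; sets lose tuple order, so the results are sorted for display):

```python
r = {("A", 1), ("B", 2), ("D", 3), ("F", 4), ("E", 5)}
s = {("A", 1), ("C", 2), ("D", 3), ("E", 4)}

print(sorted(r | s))   # union: tuples in either or both relations (7 tuples)
print(sorted(r & s))   # intersection: [('A', 1), ('D', 3)]
print(sorted(r - s))   # difference: [('B', 2), ('E', 5), ('F', 4)]
```

Union compatibility shows up here as a requirement that every tuple in both sets has the same arity and comparable column types.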
Rename:-
Operation: It is used to rename a relation or attributes.
Symbol: ρ (Rho)
Notation: ρA(B) Rename relation B to A.
ρA(X1,X2….Xn)(B) Rename relation B to A and its attributes to X1, X2, …., Xn.
Consider following table
Student
Rno Name Dept CPI
101 Ramesh CE 8
108 Mahesh EC 6
109 Amit CE 7
125 Chetan CI 8
138 Mukesh ME 7
128 Reeta EC 6
133 Anita CE 9
Aggregate Function:-
Operation: It takes more than one value as input and returns a single value as output
(result).
Symbol: G
Notation: G function (attribute) (relation)
Aggregate functions: Sum, Count, Max, Min, Avg.
Consider following table
Student
Rno Name Dept CPI
101 Ramesh CE 8
108 Mahesh EC 6
109 Amit CE 7
125 Chetan CI 8
138 Mukesh ME 7
128 Reeta EC 6
133 Anita CE 9
G sum(CPI) (Student) gives:
sum
51
G max(CPI), min(CPI) (Student) gives:
max  min
9    6
(ii) Find out all customers who have an account in ‘Ahmedabad’ city and whose balance
is greater than 10,000.
∏customer_name (σ Branch.branch_city=“Ahmedabad” Λ Account.balance>10000 (Branch ⋈ Account ⋈ Depositor))
(iii) Find out a list of all branch names with their maximum balance.
branch_name G max(balance) (Account)
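Grouping with an aggregate, as in query (iii), can be sketched in Python; the Account tuples and attribute order below are assumptions for illustration:

```python
from collections import defaultdict

# Hypothetical Account tuples: (ano, branch_name, balance)
account = [
    ("A01", "Vvn", 5000),
    ("A02", "Ksad", 6000),
    ("A03", "Anand", 7000),
    ("A04", "Ksad", 8000),
    ("A05", "Vvn", 6000),
]

# branch_name G max(balance) (Account): group by branch, keep the max balance
max_balance = defaultdict(int)
for ano, bname, balance in account:
    max_balance[bname] = max(max_balance[bname], balance)

print(dict(max_balance))  # {'Vvn': 6000, 'Ksad': 8000, 'Anand': 7000}
```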
4 – Relational Database Design
Example
Consider the relation Account(ano, balance, bname).
In this relation, ano determines balance and bname. So, there is a functional
dependency from ano to balance and bname.
This can be denoted by ano → {balance, bname}.
Account:
ano balance bname
Eg. {Enrollment_No, Department_Name} → SPI
Transitive Dependency
In a relation, if attributes A → B and B → C hold, then C is transitively dependent on A via B
(provided that B → A does not hold).
Eg. Staff_No → Branch_No and Branch_No → Branch_Address
Trivial FD:
X → Y is a trivial FD if Y is a subset of X.
Eg. {Roll_No, Department_Name} → Roll_No
Nontrivial FD:
X → Y is a nontrivial FD if Y is not a subset of X.
Eg. {Roll_No, Department_Name} → Student_Name
Example-1
Suppose a relation R is given with attributes A, B, C, G, H and I.
Also, a set of functional dependencies F is given with following FDs.
F = {A → B, A → C, CG → H, CG → I, B → H}
Find the closure of F.
Some of the additional FDs in F+ are: { A → H, CG → HI, AG → I, AG → H }
Example-2
Compute the closure of the following set F of functional dependencies for relational schema
R = (A, B, C, D, E, F ):
F = (A → B, A → C, CD → E, CD → F, B → E).
Some of the additional FDs in F+ are: { A → BC, CD → EF, A → E, AD → E, AD → F }
Example-3
Compute the closure of the following set F of functional dependencies for relational schema
R = (A, B, C, D, E ):
F = (AB → C, D → AC, D → E).
Some of the additional FDs in F+ are: { D → A, D → C, D → ACE }
Step-2: First loop
result = ABC   # for A → BC, A ⊆ result so result = result ∪ BC
result = ABC   # for E → CF, E ⊄ result so result is unchanged
result = ABCE  # for B → E, B ⊆ result so result = result ∪ E
result = ABCE  # for CD → EF, CD ⊄ result so result is unchanged
result before step 2 is AB and after step 2 is ABCE, which is different, so repeat the
same as step 2.
Step-3: Second loop
result = ABCE   # for A → BC, A ⊆ result so result = result ∪ BC
result = ABCEF  # for E → CF, E ⊆ result so result = result ∪ CF
result = ABCEF  # for B → E, B ⊆ result so result = result ∪ E
result = ABCEF  # for CD → EF, CD ⊄ result so result is unchanged
result before step 3 is ABCE and after step 3 is ABCEF, which is different, so repeat the
same as step 3.
Step-4: Third loop
result = ABCEF  # for A → BC, A ⊆ result so result = result ∪ BC
result = ABCEF  # for E → CF, E ⊆ result so result = result ∪ CF
result = ABCEF  # for B → E, B ⊆ result so result = result ∪ E
result = ABCEF  # for CD → EF, CD ⊄ result so result is unchanged
result before step 4 is ABCEF and after step 4 is ABCEF, which is the same, so stop.
So the closure of {A, B}+ is {A, B, C, E, F}.
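The closure computation traced above can be sketched as a small Python function; the FD set mirrors the worked example (A → BC, E → CF, B → E, CD → EF):

```python
def attribute_closure(attrs, fds):
    """Compute the closure of a set of attributes under a set of FDs.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            # If the left side is contained in result, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# FDs from the worked example: A -> BC, E -> CF, B -> E, CD -> EF
fds = [({"A"}, {"B", "C"}), ({"E"}, {"C", "F"}),
       ({"B"}, {"E"}), ({"C", "D"}, {"E", "F"})]
print(sorted(attribute_closure({"A", "B"}, fds)))  # ['A', 'B', 'C', 'E', 'F']
```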
Algorithm to find canonical cover:
To compute a canonical cover for F:
repeat
Step 1: Use the union rule to replace any dependencies in F of the form α → β1 and α → β2 with α → β1β2.
Step 2: Find a functional dependency α → β with an extraneous attribute either in α or in β; if one is found, delete the extraneous attribute from α → β.
until F does not change.
Using functional dependencies, this universal relation schema is decomposed into a set
of relation schemas D = {R1, R2, R3,…,Rm}.
Now, D becomes the relational database schema and D is referred to as a decomposition of
R.
Generally, decomposition is used to eliminate the pitfalls of the poor database design
during normalization process.
For example, consider the relation Account_Branch given in figure:
Account_Branch
Ano Balance Bname Baddress
A01 5000 Vvn Mota bazaar, VVNagar
A02 6000 Ksad Chhota bazaar, Karamsad
A03 7000 Anand Nana bazaar, Anand
A04 8000 Ksad Chhota bazaar, Karamsad
A05 6000 Vvn Mota bazaar, VVNagar
This relation can be divided into two different relations:
1. Account (Ano, Balance, Bname)
2. Branch (Bname, Baddress)
These two relations are shown in below figure
Account                      Branch
Ano  Balance  Bname          Bname  Baddress
A01  5000     Vvn            Vvn    Mota bazaar, VVNagar
A02  6000     Ksad           Ksad   Chhota bazaar, Karamsad
A03  7000     Anand          Anand  Nana bazaar, Anand
A04  8000     Ksad
A05  6000     Vvn
Example
A figure shows a relation Account. This relation is decomposed into two relations
Acc_Bal and Bal_Branch.
Now, when these two relations are joined on the common attribute Balance, the
resultant relation will look like Acct_Joined. This Acct_Joined relation contains rows in
addition to those in the original relation Account.
Here, it is not possible to specify in which branch account A01 or A02 belongs.
So, information has been lost by this decomposition and the subsequent join operation.
Account                Acct_Bal          Bal_Branch
Ano  Balance  Bname    Ano  Balance      Balance  Bname
A01  5000     Vvn      A01  5000         5000     Vvn
A02  5000     Ksad     A02  5000         5000     Ksad

Acct_Joined
Ano  Balance  Bname
A01  5000     Vvn
A01  5000     Ksad
A02  5000     Vvn
A02  5000     Ksad

Not same
In other words, decomposition is lossy if, after decomposing R into R1 and R2 and joining
R1 and R2 again over X, we cannot get the original table R back, where R is the original
relation, R1 and R2 are the decomposed relations, and X is a common attribute between
these two relations.
In other words, decomposition is lossless if R = join of R1 and R2 over X, where R is the
original relation, R1 and R2 are the decomposed relations, and X is a common attribute
between these two relations.
Account                Acct_Bal          Acct_Branch
Ano  Balance  Bname    Ano  Balance      Ano  Bname
A01  5000     Vvn      A01  5000         A01  Vvn
A02  5000     Ksad     A02  5000         A02  Ksad

Acct_Joined
Ano  Balance  Bname
A01  5000     Vvn
A02  5000     Ksad
Same
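The lossy and lossless cases above can be sketched in Python with a naive natural join over lists of dicts; this is a toy encoding for illustration, not a DBMS implementation:

```python
def natural_join(r1, r2, common):
    """Join two relations (lists of dicts) on the given common attributes."""
    out = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[a] == t2[a] for a in common):
                out.append({**t1, **t2})
    return out

account = [{"ano": "A01", "balance": 5000, "bname": "Vvn"},
           {"ano": "A02", "balance": 5000, "bname": "Ksad"}]

# Lossy decomposition: the shared attribute Balance is not a key.
acct_bal = [{"ano": t["ano"], "balance": t["balance"]} for t in account]
bal_branch = [{"balance": t["balance"], "bname": t["bname"]} for t in account]
lossy = natural_join(acct_bal, bal_branch, ["balance"])
print(len(lossy))  # 4 -- spurious tuples appear, information is lost

# Lossless decomposition: the shared attribute Ano is the key.
acct_branch = [{"ano": t["ano"], "bname": t["bname"]} for t in account]
lossless = natural_join(acct_bal, acct_branch, ["ano"])
print(len(lossless))  # 2 -- the original relation is recovered
```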
Update / Modification anomaly:
Suppose the manager of a department has changed, this requires that the Dmgr# in all
the tuples corresponding to that department must be changed to reflect the new status.
If we fail to update all the tuples of the given department, then two different records of
employees working in the same department might show different Dmgr#, leading to
inconsistency in the database.
This kind of problem is known as update or modification anomaly.
How anomalies in database design can be solved:
Such type of anomalies in database design can be solved by using normalization.
The above relation has four attributes: Cid, Name, Address, Contact_no. Here Address is a
composite attribute which is further divided into the sub-attributes Society and City.
Another attribute, TypeofAccountHold, is a multi-valued attribute which can store more
than one value. So the above relation is not in 1NF.
Problem
Suppose we want to find all customers of some particular city; then it is difficult to
retrieve them. The reason is that the city name is combined with the society name and
stored as a whole in Address.
Solution
Divide composite attributes into a number of sub-attributes and insert the values into the
proper sub-attributes. AND
Split the table into two tables in such a way that
o first table contains all attributes except multi-valued attribute and
o other table contains multi-valued attribute and
o insert primary key of first table in second table as a foreign key.
So above table can be created as follows.
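The two-step 1NF fix can be sketched in Python over hypothetical customer rows; the sample values and account types below are made up for illustration:

```python
# Hypothetical un-normalized rows: composite Address and
# multi-valued TypeofAccountHold.
customers = [
    {"cid": "C01", "name": "Mita", "address": "Mota bazaar, VVNagar",
     "accounts": ["Saving", "Current"]},
    {"cid": "C02", "name": "Nita", "address": "Nana bazaar, Anand",
     "accounts": ["Saving"]},
]

# Step 1: split the composite attribute into Society and City ...
customer_1nf = []
for c in customers:
    society, city = [part.strip() for part in c["address"].split(",")]
    customer_1nf.append({"cid": c["cid"], "name": c["name"],
                         "society": society, "city": city})

# Step 2: ... and move the multi-valued attribute to a second table,
# carrying cid along as a foreign key.
customer_account = [{"cid": c["cid"], "type": t}
                    for c in customers for t in c["accounts"]]

print(customer_1nf[0]["city"])  # VVNagar
print(len(customer_account))    # 3
```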
2NF
A relation R is in second normal form (2NF) if and only if it is in 1NF and every non-key
attribute is fully dependent on the primary key. OR
A relation R is in second normal form (2NF) if and only if it is in 1NF and no any non-key
attribute is partially dependent on the primary key.
Example
cid ano access_date balance bname
The above relation has five attributes cid, ano, access_date, balance, bname and two FDs:
FD1: {cid, ano} → {access_date, balance, bname} and
FD2: ano → {balance, bname}
We have {cid, ano} as the primary key. As per FD2, balance and bname depend only on
ano, not on cid. So in the above table, balance and bname are not fully dependent on the
primary key; these attributes are partially dependent on the primary key. So the above
relation is not in 2NF.
Problem
For example, in case of a joint account, multiple customers have a common account. If
some account, say ‘A02’, is held jointly by two customers, say ‘C02’ and ‘C04’, then the
data values for the attributes balance and bname will be duplicated in the two different
tuples of customers ‘C02’ and ‘C04’.
Solution
Decompose the relation in such a way that the resultant relations do not have any partial FD.
For this purpose, remove the partially dependent attributes that violate 2NF from the relation.
Place them in a separate new relation along with the prime attribute on which they are
fully dependent.
The primary key of the new relation will be the attribute on which they are fully dependent.
Keep the other attributes in the same table with the same primary key.
So above table can be decomposed as per following.
3NF
A relation R is in third normal form (3NF) if and only if it is in 2NF and every non-key
attribute is non-transitively dependent on the primary key.
An attribute C is transitively dependent on attribute A if there exists an attribute B such
that: A → B and B → C.
Example
ano balance bname baddress
The above relation has four attributes ano, balance, bname, baddress and two FDs:
FD1: ano → {balance, bname, baddress} and
FD2: bname → baddress
So from FD1 and FD2, using the transitivity rule, we get ano → baddress.
So there is a transitive dependency from ano to baddress via bname, in which
baddress is a non-prime attribute.
So there is a non-prime attribute, baddress, which is transitively dependent on the primary
key ano.
So the above relation is not in 3NF.
Problem
Transitive dependency results in data redundancy.
In this relation, the branch address will be stored repeatedly for each account of the same
branch, which occupies more space.
Solution
Decompose the relation in such a way that the resultant relations do not have any non-prime
attribute that is transitively dependent on the primary key.
For this purpose, remove the transitively dependent attributes that violate 3NF from the relation.
Place them in a separate new relation along with the non-prime attribute due to which the
transitive dependency occurred. The primary key of the new relation will be this non-prime
attribute.
Keep the other attributes in the same table with the same primary key.
So above table can be decomposed as per following.
bname baddress
BCNF
A relation R is in BCNF if and only if it is in 3NF and no prime attribute is transitively
dependent on the primary key. OR
A relation R is in BCNF if and only if it is in 3NF and, for every functional dependency X →
Y, X is a super key of the table.
An attribute C is transitively dependent on attribute A if there exists an attribute B such
that A → B and B → C.
Example
Student_Project
Student Language Guide
Mita JAVA Patel
Nita VB Shah
Sita JAVA Jadeja
Gita VB Dave
Rita VB Shah
Nita JAVA Patel
Mita VB Dave
Rita JAVA Jadeja
student language guide
The above relation has three attributes student, language, guide and two FDs:
FD1: {student, language} → guide and
FD2: guide → language
Here guide determines language, but guide is not a super key of the relation.
So there is a prime attribute, language, which is transitively dependent on the primary key
{student, language} via guide.
So the above relation is not in BCNF.
Problem
Transitive dependency results in data redundancy.
In this relation, if one student has more than one project with different guides, then
records will be stored repeatedly for each student, language and guide combination,
which occupies more space.
Solution
Decompose the relation in such a way that the resultant relations do not have any prime
attribute transitively dependent on the primary key.
For this purpose, remove the transitively dependent prime attribute that violates BCNF from
the relation. Place it in a separate new relation along with the non-prime attribute due to
which the transitive dependency occurred. The primary key of the new relation will be this
non-prime attribute.
So above table can be decomposed as per following.
4NF
A table is in 4NF if it is in BCNF and has no non-trivial multi-valued dependencies.
Example
The multi-valued dependency X →→ Y holds in a relation R if, for a single value of X,
multiple (more than one) values of Y exist.
Suppose a student can have more than one subject and more than one activity.
Student_Info
Student_Id Subject Activity
100 Music Swimming
100 Accounting Swimming
100 Music Tennis
100 Accounting Tennis
150 Math Jogging
Note that all three attributes make up the Primary Key.
Note that Student_Id can be associated with many subjects as well as many activities
(multi-valued dependency).
Suppose student 100 signs up for skiing. Then we would insert (100, Music, Skiing). This
row implies that student 100 skis as part of the Music subject but not as part of the
Accounting subject, so in order to keep the data consistent we must add one more row
(100, Accounting, Skiing). This is an insertion anomaly.
Suppose we have a relation R with a multi-valued dependency X →→ Y. The MVD can
be removed by decomposing R into R1(R − Y) and R2(X ∪ Y).
Here are the tables Normalized
RollNo StudentName
101 Raj
102 Meet
103 Suresh
SubjectID SubjectName
1 DBMS
2 DS
3 DE
We cannot decompose any of the above three tables into sub-tables, so the above three
tables are in 5NF.
Third Normal Form or BCNF (3NF or BCNF)
Table-1
PLANT_ID PLANT_NAME
P1 Plant-A
P2 Plant-B
Table-2
MANAGER_ID MANAGER_NAME
E1 Ravi
E2 Meena
Table-3
PLANT_ID MANAGER_ID
P1 E1
P2 E2
Table-4
MACHINE_ID MACHINE_NAME
M1 Lathe
M2 Boiler
M3 Cutter
M4 CNC
Table-5
PLANT_ID MACHINE_ID
P1 M1
P1 M2
P2 M3
P2 M2
P2 M4
Table-6
SUPPLIER_ID SUPPLIER_NAME SUPPLIER_CITY
S1 Jay industry Ahmedabad
S2 Abb appliance Surat
S3 Raj machinery Vadodara
S4 Daksh industry Rajkot
Table-7
MANAGER_ID SUPPLIER_ID
E1 S1
E1 S2
E2 S3
E2 S4
E2 S1
Fourth Normal Form (4NF)
Table-1
PLANT_ID PLANT_NAME
P1 Plant-A
P2 Plant-B
Table-2
MANAGER_ID MANAGER_NAME
E1 Ravi
E2 Meena
Table-3
MACHINE_ID MACHINE_NAME
M1 Lathe
M2 Boiler
M3 Cutter
M4 CNC
Table-4
PLANT_MACHINE_ID PLANT_ID MACHINE_ID
PM1 P1 M1
PM2 P1 M2
PM3 P2 M3
PM4 P2 M2
PM5 P2 M4
Table-5
PLANT_MACHINE_ID MANAGER_ID
PM1 E1
PM2 E1
PM3 E2
PM4 E2
PM5 E2
Table-6
SUPPLIER_ID SUPPLIER_NAME SUPPLIER_CITY
S1 Jay industry Ahmedabad
S2 Abb appliance Surat
S3 Raj machinery Vadodara
S4 Daksh industry Rajkot
Table-7
MANAGER_ID SUPPLIER_ID
E1 S1
E1 S2
E2 S3
E2 S4
E2 S1
The above relation R is in 3NF because there are no non-prime attributes; that is, every
column (attribute) is a part of some candidate key, so the right-hand side of every FD is a
part of a super key. But it is not in BCNF (or 4NF or 5NF).
1NF
Employee Number, Employee Name, Date of Birth, Department Code, Department
Name
Employee Number, Project Code, Project Description, Project Supervisor
2NF
Employee Number, Employee Name, Date of Birth, Department Code, Department
Name
Employee Number, Project Code
Project Code, Project Description, Project Supervisor
3NF
Employee Number, Employee Name, Date of Birth, Department Code
Department Code, Department Name
Employee Number, Project Code
Project Code, Project Description, Project Supervisor
5 – Query Processing & Optimization
[Figure: query processing steps: query → scanner, parser, validator → query optimizer → execution plan → query code generator → runtime database processor → result of query]
A query expressed in a high-level query language such as SQL must be
Scanned
Parsed
Validated
The scanner identifies the language tokens, such as SQL keywords, attribute names and
relation names, in the text of the query.
The parser checks the query syntax to determine whether it is formulated according to the
syntax rules of the query language.
The query must also be validated by checking that all attributes and relation names are
valid and semantically meaningful names in the schema of the particular database being
queried.
A query typically has many possible execution strategies and the process of choosing a
suitable one for processing a query is known as query optimization.
The query optimizer module has the task of producing an execution plan, and the code
generator generates the code to execute that plan.
The runtime database processor has the task of running the query code whether in
compiled or interpreted mode, to produce the query result.
If a runtime error results, an error message is generated by the runtime database
processor.
The query code generator generates code for the query.
The runtime database processor executes the query code and gives the result.
Explain different search algorithm for selection operation. OR
Explain linear search and binary search algorithm for selection
operation.
There are two scan algorithms to implement the selection operation:
1. Linear search
2. Binary search
Linear search
In a linear search, the system scans each file block and tests all records to see whether
they satisfy the selection condition.
For a selection on a key attribute, the system can terminate the scan if the required
record is found, without looking at the other records of the relation.
The cost of linear search in terms of the number of I/O operations is br, where br is the
number of blocks in the file.
Selection on a key attribute has an average cost of br/2.
It may be slower than other algorithms.
This algorithm can be applied to any file regardless of the ordering of the file, the
availability of indices, or the nature of the selection operation.
Binary search
If the file is ordered on an attribute and the selection condition is an equality comparison
on that attribute, we can use a binary search to locate the records that satisfy the
condition.
The number of blocks that need to be examined to find a block containing the required
record is ⌈log2(br)⌉.
If the selection is on a non-key attribute, more than one block may contain required
records, and the cost of reading the extra blocks has to be added to the cost estimate.
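The block-access costs of the two algorithms can be sketched as follows; the block count br used here is a made-up example value:

```python
import math

def linear_search_cost(num_blocks, key_attribute=True):
    """Block reads for a linear scan: br/2 on average for a key
    attribute (scan stops at the match), br otherwise."""
    return num_blocks / 2 if key_attribute else num_blocks

def binary_search_cost(num_blocks):
    """Block reads to locate the first matching block in a sorted file."""
    return math.ceil(math.log2(num_blocks))

br = 400  # hypothetical file of 400 blocks
print(linear_search_cost(br))   # 200.0
print(binary_search_cost(br))   # 9
```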
Explain various steps involved in query evaluation. OR
Explain query evaluation process. OR
Explain evaluation expression process in query optimization.
There are two methods for the evaluation of expression
1. Materialization
2. Pipelining
Materialization
In this method we start from the bottom of the tree and each expression is evaluated one
by one in bottom-to-top order. The result of each expression is materialized (stored) in a
temporary relation (table) for later use.
[Figure: query tree with ∏customer-name at the root, over customer ⋈ σbalance<2500(account)]
In our example, there is only one such operation, selection operation on account.
The inputs to the lowest level operation are relations in the database.
We execute these operations and we store the results in temporary relations.
We can use these temporary relations to execute the operation at the next level up in
the tree, where the inputs now are either temporary relations or relations stored in the
database.
In our example the inputs to join are the customer relation and the temporary relation
created by the selection on account.
The join can now be evaluated, creating another temporary relation.
By repeating the process, we will finally evaluate the operation at the root of the tree,
giving the final result of the expression.
In our example, we get the final result by executing the projection operation at the root
of the tree, using as input the temporary relation created by the join. Evaluation just
described is called materialized evaluation, since the results of each intermediate
operation are created and then are used for evaluation of the next level operations.
The cost of a materialized evaluation is not simply the sum of the costs of the operations
involved. To compute the cost of evaluating an expression, we add the costs of all the
operations as well as the cost of writing intermediate results to disk.
The disadvantage of this method is that it creates temporary relations (tables), and these
relations are stored on disk, which consumes disk space.
It evaluates one operation at a time, starting at the lowest level.
Pipelining
We can reduce the number of temporary files that are produced by combining several
relational operations into a pipeline of operations, in which the result of one operation is
passed along to the next operation in the pipeline. Combining operations into a pipeline
eliminates the cost of reading and writing temporary relations.
In this method, several expressions are evaluated simultaneously in a pipeline, with the
result of one operation passed on to the next without storing it in a temporary relation.
[Figure: pipelined query tree with σbranch-city=’pune’ evaluated over account ⋈ depositor]
To choose from the different query evaluation plans, the optimizer has to estimate the
cost of each evaluation plan.
The optimizer uses statistical information about the relations, such as relation sizes and
index depths, to make a good estimate of the cost of a plan.
Explain transformation of relational expression to equivalent
relational expression.
Two relational algebra expressions are said to be equivalent (the same) if, on every legal
database instance, the two expressions give the same set of tuples (records).
The sequence of records may be different, but the number of records must be the same.
Equivalence rules
This rule says that expressions of two forms are same.
We can replace an expression of first form by an expression of the second form.
The optimizer uses equivalence rule to transform expression into other logically same
expression.
We use
θ1, θ2, θ3 and so on to denote conditions,
L1, L2, L3 and so on to denote lists of attributes (columns), and
E1, E2, E3 and so on to denote relational algebra expressions.
Rule 1
A combined (conjunctive) selection operation can be divided into a sequence of individual
selections. This transformation is called a cascade of σ.
σθ1∧θ2 (E) = σθ1 (σθ2 (E))
Rule 2
Selection operations are commutative.
σθ1 (σθ2 (E)) = σθ2 (σθ1 (E))
Rule 3
If more than one projection operation is used in an expression, then only the outermost
projection operation is required, so all the inner projection operations can be skipped.
∏L1 (∏L2 (… (∏Ln (E))…)) = ∏L1 (E)
Rule 4
Selection operations can be combined with Cartesian products and theta joins.
σθ (E1 × E2) = E1 ⋈θ E2
σθ1 (E1 ⋈θ2 E2) = E1 ⋈θ1∧θ2 E2
Rule 5
Theta join operations are commutative.
E1 ⋈θ E2 = E2 ⋈θ E1
Rule 6
Natural join operations are associative.
(E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
Theta join operations are associative.
(E1 ⋈θ1 E2) ⋈θ2∧θ3 E3 = E1 ⋈θ1∧θ3 (E2 ⋈θ2 E3)
Rule 7
The selection operation distributes over the theta join operation under the following
conditions:
It distributes when all the attributes in the selection condition θ0 involve only
the attributes of one of the expressions (say E1) being joined.
σθ0 (E1 ⋈θ E2) = (σθ0 (E1)) ⋈θ E2
It distributes when the selection condition θ1 involves only the attributes of E1
and θ2 involves only the attributes of E2.
σθ1∧θ2 (E1 ⋈θ E2) = (σθ1 (E1)) ⋈θ (σθ2 (E2))
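Rule 7 (pushing a selection below a join) can be checked on toy data; the relations, attribute layout and predicates below are assumptions for illustration:

```python
def theta_join(r1, r2, pred):
    """Theta join of two relations (lists of tuples) under a predicate."""
    return [(t1, t2) for t1 in r1 for t2 in r2 if pred(t1, t2)]

def select(r, pred):
    """Selection sigma_pred(r)."""
    return [t for t in r if pred(t)]

# Hypothetical relations: E1 = (id, city), E2 = (id, balance)
E1 = [(1, "Pune"), (2, "Surat"), (3, "Pune")]
E2 = [(1, 4000), (2, 9000), (3, 1500)]

join_pred = lambda t1, t2: t1[0] == t2[0]
city_pred = lambda t1: t1[1] == "Pune"  # involves only E1's attributes

# sigma_theta0(E1 join E2) ...
lhs = [p for p in theta_join(E1, E2, join_pred) if city_pred(p[0])]
# ... equals (sigma_theta0(E1)) join E2: the selection is "pushed down",
# so the join sees fewer tuples.
rhs = theta_join(select(E1, city_pred), E2, join_pred)

print(lhs == rhs)  # True
```

Pushing the selection down is usually cheaper because the join then operates on a smaller input relation.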
Explain the purpose of sorting with example with reference to
query optimization.
Purpose of Sorting
Several of the relational operations such as joins can be implemented efficiently if the
input relations are first sorted.
We can sort a relation by building an index on the sort key and then using that index to
read the relation in sorted order.
Such a process orders the relation only logically rather than physically.
Hence reading of tuples in the sorted order may lead to disk access for each record.
So it is desirable to order the records physically.
Sorting could be applied to both relations that fit entirely in main memory and for
relations bigger than main memory.
For sorting a relation that fits into main memory, standard sorting techniques such as
quicksort can be used.
Sorting of relations that do not fit in main memory is called external sorting.
Most commonly used algorithm for this type of sorting is external sort merge algorithm.
It consists mainly two steps:
Step 1 (Create sorted runs)
Let M denotes memory size (in pages).
Create sorted runs. Let i be 0 initially.
Repeatedly do the following till the end of the relation:
(a) Read M blocks of relation into memory
(b) Sort the in-memory blocks
(c) Write sorted data to run Ri; increment i.
Let the final value of i be N
Step 2 (Merge the runs)
Merge the runs (N-way merge). We assume (for now) that N < M.
1) Use N blocks of memory to buffer input runs, and 1 block to
buffer output. Read the first block of each run into its buffer page
2) repeat
I. Select the first record (in sort order) among all buffer
pages
II. Write the record to the output buffer. If the output buffer
is full write it to disk.
III. Delete the record from its input buffer page.
If the buffer page becomes empty then read the next block
(if any) of the run into the buffer.
3) until all input buffer pages are empty
If N ≥ M, several merge passes are required.
In each pass, contiguous groups of M - 1 runs are merged.
A pass reduces the number of runs by a factor of M -1, and creates runs longer
by the same factor.
E.g. If M=11, and there are 90 runs, one pass reduces the number of runs
to 9, each 10 times the size of the initial runs
Repeated passes are performed till all runs have been merged into one.
The output of the merge stage is the sorted relation.
The output file is buffered to reduce the number of disk operations.
If the relation is much larger than memory there may be M or more runs generated in
the first stage and it is not possible to allocate a page frame for each run during the
merge stage.
In this case merge operation proceed in multiple passes.
Since there is enough memory for M-1 input buffer pages each merge can take M-1 runs
as input.
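The two steps of external sort-merge can be sketched in Python, with one "page" simplified to one record and `heapq.merge` standing in for the N-way merge of runs:

```python
import heapq

def external_sort_merge(relation, M):
    """Sketch of external sort-merge; M (>= 2) is the number of
    in-memory pages, with one 'page' holding one record here."""
    # Step 1: create sorted runs of at most M records each.
    runs = [sorted(relation[i:i + M]) for i in range(0, len(relation), M)]
    # Step 2: merge at most M - 1 runs at a time (one page is reserved
    # for output), repeating passes until a single run remains.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + M - 1]))
                for i in range(0, len(runs), M - 1)]
    return runs[0] if runs else []

data = [27, 3, 18, 9, 41, 5, 12, 30, 7]
print(external_sort_merge(data, 3))  # [3, 5, 7, 9, 12, 18, 27, 30, 41]
```

With M = 3 the nine records form three runs, and since only M − 1 = 2 runs merge per pass, two merge passes are needed, mirroring the multi-pass case described above.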
Explain the measures of query cost, selection operation and
join. OR
Explain the measures of finding out the cost of a query in query
processing.
Measures of query cost
The cost of query evaluation can be measured in terms of a number of different
resources including disk access, CPU time to execute a query and in a distributed or
parallel database system the cost of communication.
The response time for a query evaluation plan, i.e., the time required to execute the plan
(assuming no other activity is going on) on the computer, would account for all these
activities.
In a large database system, however, disk accesses are usually the most important cost,
since disk accesses are slow compared to in-memory operations.
Moreover, CPU speeds have been improving much faster than disk speeds.
Therefore, it is likely that the time spent in disk activity will continue to dominate the
total time to execute a query.
Estimating the CPU time is relatively hard, compared to estimating the disk access cost.
Therefore, disk access cost is a reasonable measure of the cost of a query evaluation plan.
Disk access is the predominant cost (in terms of time) and is relatively easy to estimate;
therefore, the number of block transfers from/to disk is typically used as the measure.
We use the number of block transfers from disk as a measure of actual cost.
To simplify our computation, we assume that all transfer of blocks have same cost.
To get more precise numbers we need to distinguish between sequential I/O where
blocks read are contiguous on disk and random I/O where blocks are non-contiguous
and an extra seek cost must be paid for each disk I/O operations.
We also need to distinguish between read and write of blocks since it takes more time
to write a block on disk than to read a block from disk.
Selection Operation
There are two scan algorithms to implement the selection operation:
1. Linear search
2. Binary search
Linear search
In a linear search, the system scans each file block and tests all records to see whether
they satisfy the selection condition.
For a selection on a key attribute, the system can terminate the scan if the required
record is found, without looking at the other records of the relation.
The cost of linear search in terms of the number of I/O operations is br, where br is the
number of blocks in the file.
Selection on a key attribute has an average cost of br/2.
It may be slower than other algorithms.
This algorithm can be applied to any file regardless of the ordering of the file, the
availability of indices, or the nature of the selection operation.
Binary search
If the file is ordered on an attribute and the selection condition is an equality comparison
on that attribute, we can use a binary search to locate the records that satisfy the
condition.
The number of blocks that need to be examined to find a block containing the required
record is ⌈log2(br)⌉.
If the selection is on a non-key attribute, more than one block may contain required
records, and the cost of reading the extra blocks has to be added to the cost estimate.
Join
Like selection, the join operation can be implemented in a variety of ways.
In terms of disk access, the join operation can be very expensive, so implementing and
utilizing efficient join algorithms is critical in minimizing a query’s execution time.
For example, consider
depositor ⋈ customer
We assume the following information about the two relations:
I. Number of records of customer: ncustomer = 10,000
II. Number of blocks of customer: bcustomer = 400
III. Number of records of depositor: ndepositor = 5,000
IV. Number of blocks of depositor: bdepositor = 100
There are four types of algorithms for join operations.
1. Nested loop join
2. Indexed nested loop join
3. Merge join
4. Hash join
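Of these, the simplest is the nested loop join, which can be sketched as follows; the depositor and account rows are hypothetical:

```python
def nested_loop_join(outer, inner, join_attr):
    """Tuple-at-a-time nested loop join of two relations (lists of dicts)."""
    result = []
    for o in outer:        # one pass over the outer relation
        for i in inner:    # full scan of the inner relation per outer tuple
            if o[join_attr] == i[join_attr]:
                result.append({**o, **i})
    return result

depositor = [{"cname": "Raj", "ano": "A01"}, {"cname": "Meet", "ano": "A02"}]
account = [{"ano": "A01", "balance": 5000}, {"ano": "A02", "balance": 7000}]

joined = nested_loop_join(depositor, account, "ano")
print(joined)
```

Because the inner relation is scanned once per outer tuple, the smaller relation (here depositor, 100 blocks vs. 400) is usually chosen as the outer relation to reduce block transfers; block-nested and indexed variants reduce the cost further.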
6 – Storage Strategies
[Index record layout: Search-key | Pointer]
The first column is the Search key that contains a copy of the primary key or candidate
key of the table. These values are stored in sorted order so that the corresponding data
can be accessed quickly.
The second column is the Data Reference or Pointer which contains a set of pointers
holding the address of the disk block where that particular key value can be found.
Explain different attributes of Indexing.
The indexing has various attributes:
Access Types: This refers to the type of access such as value based search, range access,
etc.
Access Time: It refers to the time needed to find particular data element or set of
elements.
Insertion Time: It refers to the time taken to find the appropriate space and insert new
data.
Deletion Time: Time taken to find an item and delete it as well as update the index
structure.
Space Overhead: It refers to the additional space required by the index.
Explain different Indexing Methods (Types).
Different indexing methods are:
Primary Indexing
Dense Indexing
Sparse Indexing
Secondary Indexing
Clustering Indexing
Primary Indexing
If the index is created on the primary key of the table, then it is known as primary index.
These primary keys are unique to each record.
As primary keys are stored in sorted order, the performance of the searching operation is
quite efficient.
Student (RollNo, Name, Address, City, MobileNo) [RollNo is primary key]
CREATE INDEX idx_StudentRno
ON Student (RollNo);
The primary index can be classified into two types:
Dense index
Sparse index
Dense Index
In dense index, there is an index record for every search key value in the database.
This makes searching faster but requires more space to store index records.
In this, the number of records in the index table is same as the number of records in the
main table.
Index records contain search key value and a pointer to the actual record on the disk.
Sparse Index
In sparse index, index records are not created for every search key.
The index record appears only for a few items in the data file.
It requires less space, less maintenance overhead for insertion, and deletions but is slower
compared to the dense index for locating records.
To search a record using a sparse index, we search for the largest index entry whose value
is less than or equal to the value we are looking for.
After getting to that entry's block, a linear search is performed to retrieve the desired record.
In the sparse indexing, as the size of the main table grows, the size of index table also
grows.
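The sparse-index lookup described above can be sketched in Python. The index and block layout here are illustrative assumptions: one sorted list of (first key, block number) index entries, and blocks holding sorted (key, record) pairs.

```python
from bisect import bisect_right

def sparse_lookup(index, blocks, key):
    """Find a record via a sparse index.

    index  -- sorted list of (first_key_in_block, block_no) entries
    blocks -- dict mapping block_no -> sorted list of (key, record) pairs
    key    -- search key
    """
    # Find the last index entry whose key is <= the search key.
    keys = [k for k, _ in index]
    pos = bisect_right(keys, key) - 1
    if pos < 0:
        return None                      # key smaller than every indexed key
    _, block_no = index[pos]
    # Linear search inside the selected block.
    for k, record in blocks[block_no]:
        if k == key:
            return record
    return None

index = [(1, 0), (100, 1), (200, 2)]
blocks = {0: [(1, "A"), (50, "B")],
          1: [(100, "C"), (150, "D")],
          2: [(200, "E")]}
print(sparse_lookup(index, blocks, 150))   # D
```

The binary search over the small index narrows the search to one block; the linear scan then touches only that block, which is the space/time trade-off the text describes.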
Secondary Index
Clustering Index
Sometimes the index is created on non-primary key columns, which may not be unique
for each record.
In this case, to identify the records faster, we group two or more columns to get a unique
value and create an index on them. This method is called a clustering index.
The records which have similar characteristics are grouped, and indexes are created for
these groups.
Explain B-tree.
B-tree is a data structure that stores data in its nodes in sorted order.
We can represent a sample B-tree as follows (a root/intermediary node holding 11, with
leaf nodes holding 3, 6 and 16, 20 below it):

        [11]        <- intermediary node
       /    \
  [3, 6]  [16, 20]   <- leaf nodes
B-tree stores data in such a way that each node contains keys in ascending order.
Each of these keys has two references to another two child nodes.
The left side child node keys are less than the current keys and the right side child node
keys are greater than the current keys.
Searching a record in B-tree
Suppose we want to search for 18 in the above B-tree structure.
First, we will fetch the intermediary node, which will direct us to the leaf node that can
contain the record for 18.
So, in the intermediary node, we will take the branch between keys 16 and 20.
At the end, we are redirected to a leaf node, where the DBMS performs a sequential
search to find 18.
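The search described above can be sketched as a small Python routine. The BTreeNode class and the sample tree are illustrative assumptions, not a full B-tree implementation (there is no insertion or rebalancing):

```python
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # keys stored in ascending order
        self.children = children or []    # internal nodes: len(children) == len(keys) + 1

def btree_search(node, key):
    """Return True if key is present in the B-tree rooted at node."""
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1                            # find the branch to follow
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if not node.children:                 # leaf reached, key not found
        return False
    return btree_search(node.children[i], key)

# The sample tree from the text: node 11, children [3, 6] and [16, 20].
root = BTreeNode([11], [BTreeNode([3, 6]), BTreeNode([16, 20])])
print(btree_search(root, 18))   # False
print(btree_search(root, 16))   # True
```

Searching 18 follows the right branch (18 > 11) into the leaf [16, 20], where a sequential scan fails to find it, matching the walk-through above.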
Dynamic hashing
The drawback of static hashing is that it does not expand or shrink dynamically as the
size of the database grows or shrinks.
In dynamic hashing, data buckets grow or shrink (are added or removed dynamically) as
the number of records increases or decreases.
Dynamic hashing is also known as extended hashing.
In dynamic hashing, the hash function is made to produce a large number of values.
For example, there are three data records D1, D2 and D3.
The hash function generates three addresses 1001, 0101 and 1010 respectively.
This method of storing considers only part of this address – initially only the first bit – to
store the data.
So it tries to load all three of them at addresses 0 and 1.
0 → D2
1 → D1 (D3 also hashes to 1, but the bucket is full)
But the problem is that no bucket address remains for D3.
The bucket directory has to grow dynamically to accommodate D3.
So it changes the addresses to have 2 bits rather than 1 bit, and then it updates the
existing data to have 2-bit addresses.
Then it accommodates D3.
After doubling, the bucket addresses are 00, 01, 10 and 11, and D1, D2 and D3 are placed
according to the first two bits of their hash addresses.
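A minimal Python sketch of this directory-doubling idea follows. It is a deliberately simplified extendible hash — one record per bucket, fixed 4-bit hash addresses, and a directory that keeps doubling until every record fits — not a production structure, and the class and method names are illustrative:

```python
class ExtendibleHash:
    """Simplified extendible hashing: 1 record per bucket; the directory is
    indexed by the first `depth` bits of a fixed-width hash address."""

    def __init__(self, bits=4):
        self.bits = bits                  # width of the full hash address
        self.depth = 1                    # how many leading bits are used
        self.dir = {0: [], 1: []}         # directory index -> bucket contents

    def _prefix(self, h):
        return h >> (self.bits - self.depth)

    def insert(self, h, rec):
        # While the target bucket is full, double the directory and rehash.
        while len(self.dir[self._prefix(h)]) >= 1:
            self.depth += 1
            old = self.dir
            self.dir = {i: [] for i in range(2 ** self.depth)}
            for bucket in old.values():
                for h2, r2 in bucket:
                    self.dir[self._prefix(h2)].append((h2, r2))
        self.dir[self._prefix(h)].append((h, rec))

# The three records from the text, with hash addresses 1001, 0101, 1010.
eh = ExtendibleHash(bits=4)
for h, rec in [(0b1001, "D1"), (0b0101, "D2"), (0b1010, "D3")]:
    eh.insert(h, rec)
```

With these three addresses the directory doubles twice (1 bit → 2 bits → 3 bits), because 1001 and 1010 share the same first two bits; this is exactly the "grow until the colliding records separate" behaviour the text describes.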
7 – Transaction Processing
Durability
After a transaction completes successfully, the changes it has made to the database
persist, even if there are system failures.
Once the transaction has completed up to step 6, its result must be stored permanently.
It should not be lost if the system fails.
Explain different states in transaction processing in database.
OR
Explain State Transition Diagram (Transaction State Diagram).
Because a transaction may fail, its execution is broken up into states to handle various
situations.
Following are the different states in transaction processing in database
Active
Partially committed
Failed
Aborted
Committed
Active → Partially committed → Committed
Active → Failed → Aborted
Failed
A transaction enters this state after the discovery that normal execution can no longer
proceed.
Once a transaction cannot be completed, any changes that it made must be undone by
rolling it back.
Aborted
The state after the transaction has been rolled back and the database has been restored
to its state prior to the start of the transaction.
Committed
The transaction enters this state after the successful completion of the transaction.
We cannot abort or rollback a committed transaction.
Explain following terms.
Schedule
A schedule is the chronological (sequential) order in which instructions are executed in a
system.
A schedule for a set of transactions must consist of all the instructions of those
transactions and must preserve the order in which the instructions appear in each
individual transaction.
Example of schedule (Schedule 1)

T1                  T2
read(A)
A := A - 50
write(A)
read(B)
B := B + 50
write(B)
                    read(A)
                    temp := A * 0.1
                    A := A - temp
                    write(A)
                    read(B)
                    B := B + temp
                    write(B)
Serial schedule
A serial schedule does not interleave the actions of different transactions.
In Schedule 1, all the instructions of T1 are grouped and run together, then all the
instructions of T2 are grouped and run together.
That is, T2 will not start until all the instructions of T1 are complete. This type of
schedule is called a serial schedule.
Interleaved schedule
An interleaved schedule interleaves the actions of different transactions.
That is, T2 will start before all the instructions of T1 are completed. This type of
schedule is called an interleaved schedule.
T1                  T2
read(A)
A := A - 50
write(A)
                    read(A)
                    temp := A * 0.1
                    A := A - temp
                    write(A)
read(B)
B := B + 50
write(B)
                    read(B)
                    B := B + temp
                    write(B)
Equivalent schedules
Two schedules are equivalent if the effect of executing the first schedule is identical to
the effect of executing the second schedule.
We can also say that two schedules are equivalent if the output of executing the first
schedule is identical to the output of executing the second schedule.
Serializable schedule
A schedule that is equivalent (in its outcome) to a serial schedule has the serializability
property.
Example of serializable schedule
Schedule 1 Schedule 2
T1 T2 T3 T1 T2 T3
read(X) read(X)
write(X) read(Y)
read(Y) read(Z)
write(Y) write(X)
read(Z) write(Y)
write(Z) write(Z)
In above example there are two schedules as schedule 1 and schedule 2.
In schedule 1 and schedule 2 the order in which the instructions of the transactions are
executed is not the same, but the result we get is the same. This is known as
serializability of transactions.
So, in the above example, the two instructions read(A) and write(A) of transaction T2
and the two instructions read(B) and write(B) of transaction T1 in schedule S are
interchanged, and we get schedule S’.
Therefore schedule S is conflict serializable.
Schedule S’’
T3 T4
read(Q)
write(Q)
write(Q)
We are unable to swap instructions in the above schedule S’’ to obtain either the serial
schedule < T3, T4 >, or the serial schedule < T4, T3 >.
So above schedule S’’ is not conflict serializable.
View serializability
Let S and S´ be two schedules with the same set of transactions. S and S´ are view
equivalent if the following three conditions are met, for each data item Q,
1. If in schedule S, transaction Ti reads the initial value of Q, then in schedule S’ also
transaction Ti must read the initial value of Q.
2. If in schedule S transaction Ti executes read(Q), and that value was produced by
transaction Tj (if any), then in schedule S’ also transaction Ti must read the value
of Q that was produced by the same write(Q) operation of transaction Tj.
3. If transaction Ti (if any) performs the final write(Q) operation in schedule S, then
in schedule S’ also the final write(Q) operation must be performed by Ti.
A schedule S is view serializable if it is view equivalent to a serial schedule.
Every conflict serializable schedule is also view serializable, but not every view
serializable schedule is conflict serializable.
Below is a schedule which is view serializable but not conflict serializable.
Schedule S
T3 T4 T6
read(Q)
write(Q)
write(Q)
write(Q)
The above schedule is view serializable but not conflict serializable because all three
transactions write the same data item Q; every pair of operations on Q conflicts (at least
one operation of each pair is a write), and that is why we cannot interchange any
operations of the transactions.
To recover a restored backup, the changes made after the backup was taken are
reapplied using redo data.
Database servers such as SQL Server or Oracle perform crash recovery and instance
recovery automatically after an instance failure.
In case of media failure, a database administrator (DBA) must initiate a recovery
operation.
Recovering a backup involves two distinct operations: rolling the backup forward to a
more recent time by applying redo data and rolling back all changes made in
uncommitted transactions to their original state.
In general, recovery refers to the various operations involved in restoring, rolling
forward and rolling back a backup.
Backup and recovery refers to the various strategies and operations involved in
protecting the database against data loss and reconstructing the database.
Explain Log based recovery method.
Log based recovery
The most widely used structure for recording database modification is the log.
The log is a sequence of log records, recording all the update activities in the database.
In short, the transaction log is a journal or simply a data file which contains the history
of all transactions performed, and it is maintained on stable storage.
Since the log contains a complete record of all database activity, the volume of data
stored in the log may become unreasonably large.
For log records to be useful for recovery from system and disk failures, the log must
reside on stable storage.
Log contains
1. Start of transaction
2. Transaction-id
3. Record-id
4. Type of operation (insert, update, delete)
5. Old value, new value
6. End of transaction that is committed or aborted.
All such files are maintained by the DBMS itself. Normally these are sequential files.
Recovery has two factors Rollback (Undo) and Roll forward (Redo).
When transaction Ti starts, it registers itself by writing a <Ti start> log record.
Before Ti executes write(X), a log record <Ti, X, V1, V2> is written, where V1 is the value
of X before the write, and V2 is the value to be written to X.
The log record notes that Ti has performed a write on data item X; X had value V1 before
the write, and will have value V2 after the write.
When Ti finishes its last statement, the log record <Ti commit> is written.
Two approaches are used in log based recovery
1. Deferred database modification
2. Immediate database modification
Log based Recovery Techniques
Once a failure occurs, DBMS retrieves the database using the back-up of database and
transaction log. Various log based recovery techniques used by DBMS are as per below:
1. Deferred Database Modification
2. Immediate Database Modification
Both of the techniques use transaction logs. These techniques are explained in following
sub-sections.
Explain Deferred Database Modification log based recovery
method.
Concept
Updates (changes) to the database are deferred (or postponed) until the transaction
commits.
During the execution of transaction, updates are recorded only in the transaction log and
in buffers. After the transaction commits, these updates are recorded in the database.
When failure occurs
If the transaction has not committed, then it has not affected the database, and so no
undo operations are needed; just restart the transaction.
If the transaction has committed, then it may still not have modified the database, and
so the updates of the transaction are redone.
Transaction Log
In this technique, transaction log is used in following ways:
Transaction T starts by writing <T start> to the log.
Any update is recorded as <T, X, V>, where V indicates the new value for data item X.
Here, there is no need to preserve the old value of the changed data item. Also, V is not
written to X in the database immediately; the write is deferred.
Transaction T commits by writing <T commit> to the log. Once this is entered in log,
actual updates are recorded to the database.
If a transaction T aborts, the transaction log records are ignored, and no updates are
recorded to the database.
Example
Consider the following two transactions, T0 and T1 given in figure, where T0 executes
before T1. Also consider that initial values for A, B and C are 500, 600 and 700
respectively.
Transaction – T0 Transaction – T1
Read (A) Read (C)
A =A -100 C=C-200
Write (A) Write (C)
Read (B)
B =B+ 100
Write (B)
The following figure shows the transaction log for above two transactions at three
different instances of time.
Time Instance (a) Time Instance (b) Time Instance (c)
<T0 start> < T0 start> < T0 start>
< T0, A, 400> < T0, A, 400> < T0, A, 400>
< T0, B, 700> < T0, B, 700> < T0, B, 700>
< T0 commit> < T0 commit>
<T1 start> < T1 start>
<T1, C, 500> < T1, C, 500>
< T1 commit>
If failure occurs in case of -
1. Time instance (a): No REDO actions are required, as no transaction has committed.
2. Time instance (b): As transaction T0 has already committed, it must be redone.
3. Time instance (c): As transactions T0 and T1 have already committed, they must be
redone.
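The redo-only recovery rule for deferred modification can be sketched in Python. The tuple format of the log records and the transaction names are illustrative assumptions:

```python
def recover_deferred(log, db):
    """Redo-only recovery for deferred database modification.

    log -- list of tuples: ("start", T), ("write", T, X, new_value), ("commit", T)
    db  -- dict mapping data item -> value, updated in place
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, new = rec
            db[item] = new                # redo: reapply the new value

# Log at time instance (b) from the text: T0 committed, T1 not.
db = {"A": 500, "B": 600, "C": 700}
log = [("start", "T0"), ("write", "T0", "A", 400), ("write", "T0", "B", 700),
       ("commit", "T0"), ("start", "T1"), ("write", "T1", "C", 500)]
recover_deferred(log, db)
print(db)   # {'A': 400, 'B': 700, 'C': 700}
```

Because T1 never committed, its write of C is simply ignored — no undo is ever needed in this scheme, which is why the log records carry only the new value.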
Explain Immediate Database Modification log based recovery
method.
Concept
Updates (changes) to the database are applied immediately as they occur, without
waiting to reach the commit point.
Also, these updates are recorded in the transaction log.
It is possible here that updates of the uncommitted transaction are also written to the
database. And, other transactions can access these updated values.
When failure occurs
If the transaction has not committed, then it may have modified the database, and so
the updates of the transaction are undone.
If the transaction has committed, then it may still not have modified the database, and
so the updates of the transaction are redone.
Transaction Log
In this technique, transaction log is used in following ways:
Transaction T starts by writing <T start> to the log.
Any update is recorded as <T, X, Vold, Vnew > where Vold indicates the original value of
data item X and Vnew indicates new value for X. Here, as undo operation is required, it
requires preserving old value of the changed data item.
Transaction T commits by writing <T commit> to the log.
If a transaction T aborts, the transaction log record is consulted, and required undo
operations are performed.
Example
Again, consider the two transactions, T0 and T1, given in figure, where T0 executes
before T1.
Also consider that initial values for A, B and C are 500, 600 and 700 respectively.
The following figure shows the transaction log for above two transactions at three
different instances of time. Note that, here, transaction log contains original values also
along with new updated values for data items.
If failure occurs in case of -
1. Time instance (a): Undo transaction T0 as it has not committed, and restore A and B
to 500 and 600 respectively.
2. Time instance (b): Undo transaction T1, restoring C to 700; and redo transaction T0,
setting A and B to 400 and 700 respectively.
3. Time instance (c): Redo transactions T0 and T1; set A and B to 400 and 700
respectively, and set C to 500.
Time Instance (a) Time Instance (b) Time Instance (c)
<T0 start> < T0 start> < T0 start>
< T0, A, 500, 400> < T0, A, 500, 400> < T0, A, 500, 400>
< T0, B, 600, 700> < T0, B, 600, 700> < T0, B, 600, 700>
< T0 commit> < T0 commit>
<T1 start> < T1 start>
<T1, C, 700, 500> < T1, C, 700, 500>
< T1 commit>
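The undo/redo rule for immediate modification can also be sketched in Python. Here log records carry both the old and the new value, matching the <T, X, Vold, Vnew> form above; the tuple encoding itself is an illustrative assumption:

```python
def recover_immediate(log, db):
    """Undo/redo recovery for immediate database modification.

    log -- list of ("start", T), ("write", T, X, old, new), ("commit", T)
    db  -- dict mapping data item -> value, updated in place
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Redo committed transactions in log (forward) order.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            db[rec[2]] = rec[4]
    # Undo uncommitted transactions in reverse order, restoring old values.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            db[rec[2]] = rec[3]

# Log at time instance (b): T0 committed, T1 still active at failure.
db = {"A": 500, "B": 600, "C": 700}
log = [("start", "T0"), ("write", "T0", "A", 500, 400),
       ("write", "T0", "B", 600, 700), ("commit", "T0"),
       ("start", "T1"), ("write", "T1", "C", 700, 500)]
recover_immediate(log, db)
print(db)   # {'A': 400, 'B': 700, 'C': 700}
```

The contrast with deferred modification is visible in the record format: because uncommitted writes may already be on disk, the old value must be preserved so the undo pass can restore it.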
Explain system recovery procedure with Checkpoint record
concept.
Problems with Deferred & Immediate Updates
Searching the entire log is time-consuming.
It is possible to redo transactions that have already written their updates to the
database.
Checkpoint
A point of synchronization between database and transaction log file.
Specifies that any operations executed before this point are done correctly and stored
safely.
At this point, all the buffers are forcefully written to secondary storage.
Checkpoints are scheduled at predetermined time intervals
Used to limit -
1. The size of transaction log file
2. Amount of searching, and
3. Subsequent processing that is required to carry out on the transaction log file.
When failure occurs
Find out the nearest checkpoint.
If transaction has already committed before this checkpoint, ignore it.
If transaction is active at this point or after this point and has committed before failure,
redo that transaction.
If transaction is active at this point or after this point and has not committed, undo that
transaction.
Example
Consider the transactions given in the following figure. Here, Tc indicates the
checkpoint time, while Tf indicates the failure time.
Here, at failure time -
1. Ignore the transaction T1 as it has already been committed before checkpoint.
2. Redo transaction T2 and T3 as they are active at/after checkpoint, but have
committed before failure.
3. Undo transaction T4 as it is active after checkpoint and has not committed.
(Timeline figure: T1 starts and commits before Tc; T2 and T3 are active at Tc and commit
before Tf; T4 starts after Tc and is still active at Tf.)
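The classification rules above can be sketched in Python. Modelling each transaction as a (start time, commit time) pair is an illustrative assumption, as are the function and variable names:

```python
def classify(transactions, checkpoint, failure):
    """Decide the recovery action per transaction relative to a checkpoint.

    transactions -- dict: name -> (start_time, commit_time or None)
    Returns dict: name -> "ignore" | "redo" | "undo".
    """
    actions = {}
    for name, (start, commit) in transactions.items():
        if commit is not None and commit <= checkpoint:
            actions[name] = "ignore"     # finished before the checkpoint
        elif commit is not None and commit <= failure:
            actions[name] = "redo"       # active at/after checkpoint, committed before failure
        else:
            actions[name] = "undo"       # not committed at failure time
    return actions

# The example from the text, with illustrative times (Tc = 5, Tf = 9).
txns = {"T1": (0, 4), "T2": (2, 6), "T3": (5, 8), "T4": (6, None)}
print(classify(txns, checkpoint=5, failure=9))
# {'T1': 'ignore', 'T2': 'redo', 'T3': 'redo', 'T4': 'undo'}
```

This mirrors the three outcomes in the example: T1 is ignored, T2 and T3 are redone, T4 is undone.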
At the start of the transaction, both tables are the same and point to the same pages.
The shadow page table is never changed, and is used to restore the database in case of
any failure occurs. However, current page table entries may change during transaction
execution, as it is used to record all updates made to the database.
When the transaction completes, the current page table becomes the shadow page
table. At this time, the transaction is considered to have committed.
The following figure explains working of this technique.
As shown in this figure, two pages - page 2 & 5 - are affected by a transaction and
copied to new physical pages. The current page table points to these pages.
The shadow page table continues to point to old pages which are not changed by the
transaction. So, this table and pages are used for undoing the transaction.
Before the transaction, the page table entries 1–6 point to Page 5, Page 1, Page 4,
Page 2, Page 3 and Page 6 respectively.
After the transaction modifies pages 2 and 5:

Entry   Shadow Page Table   Current Page Table
1       Page 5 (old)        Page 5 (new)
2       Page 1              Page 1
3       Page 4              Page 4
4       Page 2 (old)        Page 2 (new)
5       Page 3              Page 3
6       Page 6              Page 6
Advantages
No overhead of maintaining transaction log.
Recovery is quite fast, as no redo or undo operations are required.
Disadvantages
Copying the entire page table is very expensive.
Data are scattered or fragmented.
After each transaction, free pages need to be collected by a garbage collector.
It is difficult to extend this technique to allow concurrent transactions.
T1 Time T2
--- T0 ---
Read X T1 ---
--- T2 Read X
Update X T3 ---
--- T4 Update X
--- T5 ---
How to avoid: In the above example, transaction T2 must not update data item X until
transaction T1 commits.
2. The dirty read problem: The dirty read arises when one transaction updates some
item and then fails due to some reason. The updated item is retrieved by another
transaction before it is changed back to its original value.
T1 Time T2
--- T0 ---
--- T1 Update X
Read X T2 ---
--- T3 Rollback
--- T5 ---
How to avoid: In the above example, transaction T1 must not read data item X until
transaction T2 commits.
3. The incorrect retrieval problem: The inconsistent retrieval problem arises when one
transaction retrieves data to use in some operation, but before it can use this data
another transaction updates that data and commits. Thus this change will be hidden
from the first transaction, and it will continue to use the previously retrieved data. This
problem is also known as the inconsistent analysis problem.
Balance (A = 200, B = 250, C = 150)

T1                          Time    T2
---                         T0      ---
Read (A)                    T1      ---
Sum = 200
Read (B)                    T2      ---
Sum = Sum + 250 = 450
---                         T3      Read (C)
---                         T4      Update (C)
                                    150 - 50 = 100
---                         T5      Read (A)
---                         T6      Update (A)
                                    200 + 50 = 250
---                         T7      COMMIT
Read (C)                    T8      ---
Sum = Sum + 100 = 550
How to avoid: In the above example, transaction T2 must not update the data items
read by T1 until transaction T1 commits.
What is concurrency control? Why Concurrency control is
needed?
Concurrency control
The technique used to protect data when multiple users are accessing (using) the same
data concurrently (at the same time) is called concurrency control.
Concurrency control needed
If transactions are executed serially, i.e., sequentially with no overlap in time, no
transaction concurrency exists. However, if concurrent transactions with interleaving
operations are allowed in an uncontrolled manner, some unexpected, undesirable result
may occur. Here are some typical examples:
1. The lost update problem: This problem arises when two transactions T1 and T2 both
read the same data and update it; the effect of the first update is then overwritten by
the second update.
2. The dirty read problem: The dirty read arises when one transaction updates some
item and then fails due to some reason. The updated item is retrieved by another
transaction before it is changed back to its original value.
3. The incorrect retrieval problem: The inconsistent retrieval problem arises when one
transaction retrieves data to use in some operation, but before it can use this data
another transaction updates that data and commits. Thus this change will be hidden
from the first transaction, and it will continue to use the previously retrieved data. This
problem is also known as the inconsistent analysis problem.
Most high-performance transactional systems need to run transactions concurrently to
meet their performance requirements. Thus, without concurrency control, such systems
can neither provide correct results nor keep their databases consistent.
Define lock. Define locking. Explain lock based protocol.
Lock
A lock is a variable associated with data item to control concurrent access to that data
item.
Lock requests are made to concurrency-control manager.
Transaction can proceed only after request is granted.
Locking
One major problem in databases is concurrency.
Concurrency problems arise when multiple users try to update or insert data into a
database table at the same time. Such concurrent updates can cause data to become
corrupt or inconsistent.
Locking is a strategy that is used to prevent such concurrent updates to data.
Lock based protocol
A lock is a mechanism to control concurrent access to a data item
Data items can be locked in two modes :
1. Exclusive (X) mode. Data item can be both read as well as written. X-lock is
requested using lock-X instruction.
2. Shared (S) mode. Data item can only be read. S-lock is requested using lock-S
instruction.
Lock requests are made to concurrency-control manager.
Transaction can proceed only after request is granted.
Lock-compatibility matrix
S X
S TRUE FALSE
X FALSE FALSE
A transaction may be granted a lock on an item if the requested lock is compatible with
locks already held on the item by other transactions
Any number of transactions can hold shared locks on an item, but if any transaction
holds an exclusive lock on the item, no other transaction may hold any lock on the item.
If a lock cannot be granted, the requesting transaction is made to wait till all
incompatible locks held by other transactions have been released. The lock is then
granted.
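The grant decision described above can be sketched directly from the lock-compatibility matrix; the function and data layout here are illustrative:

```python
# Lock-compatibility matrix from the text: shared (S) and exclusive (X).
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(requested_mode, held_modes):
    """Grant a lock only if the requested mode is compatible with
    every lock already held on the item by other transactions."""
    return all(COMPATIBLE[(held, requested_mode)] for held in held_modes)

print(can_grant("S", ["S", "S"]))   # True  : any number of shared locks
print(can_grant("X", ["S"]))        # False : exclusive conflicts with shared
print(can_grant("S", []))           # True  : no locks held on the item
```

When `can_grant` returns False, a real concurrency-control manager would make the requesting transaction wait until the incompatible locks are released, as the text states.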
T1 T2 Concurrency Control
Manager
Lock-X(B)
Grant-X(B,T1)
Read(B)
B=B-50
Write(B)
Unlock(B)
Lock-S(A)
Grant-S(A,T2)
Read (A)
Unlock(A)
Lock-S(B)
Grant-S(B,T2)
Read (B)
Unlock(B)
Display(A+B)
Lock-X(A)
Grant-X(A,T1)
Read(A)
A=A+50
Write(A)
Unlock(A)
This locking protocol divides transaction execution phase into three parts.
1. In the first part, when transaction starts executing, transaction seeks grant for locks
it needs as it executes.
2. Second part is where the transaction acquires all locks and no other lock is required.
Transaction keeps executing its operation.
3. As soon as the transaction releases its first lock, the third phase starts. In this phase
a transaction cannot demand for any lock but only releases the acquired locks.
Lock acquisition (growing) phase → Lock point → Lock releasing (shrinking) phase
In this phase the transaction can only release locks, but cannot acquire any new lock.
The transaction enters the shrinking phase as soon as it releases the first lock, after
crossing the lock point.
From then on it has no option but to keep releasing all the acquired locks.
Initially the transaction is in the growing phase; that is, the transaction acquires locks as
needed.
Once the transaction releases a lock, it enters the shrinking phase, and no more lock
requests may be issued.
Upgrading of lock is not possible in shrinking phase, but it is possible in growing phase.
The two phase locking protocol ensures serializability.
There are two different versions of the Two Phase Locking Protocol.
1) Strict Two Phase Locking Protocol
2) Rigorous Two Phase Locking Protocol.
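The two-phase property itself can be checked mechanically. Below is a minimal Python sketch that verifies a transaction's lock/unlock sequence never acquires a lock after its first release; the operation-tuple format is an illustrative assumption:

```python
def is_two_phase(ops):
    """Check that a lock/unlock sequence obeys two-phase locking:
    once the first unlock happens (the lock point is crossed),
    no further lock may be acquired.

    ops -- list of ("lock", item) / ("unlock", item) in execution order
    """
    shrinking = False
    for action, _ in ops:
        if action == "unlock":
            shrinking = True              # shrinking phase begins
        elif shrinking:                   # a lock after an unlock violates 2PL
            return False
    return True

print(is_two_phase([("lock", "A"), ("lock", "B"),
                    ("unlock", "A"), ("unlock", "B")]))   # True
print(is_two_phase([("lock", "A"), ("unlock", "A"),
                    ("lock", "B")]))                      # False
```

The second sequence fails because locking B after unlocking A mixes the growing and shrinking phases, which is exactly what the protocol forbids.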
Explain strict two phase locking. Explain its advantage and
disadvantage.
Strict Two Phase Locking Protocol
In this protocol, a transaction may release all the shared locks after the Lock Point has
been reached, but it cannot release any of the exclusive locks until the transaction
commits or aborts.
This ensures that any data written by an uncommitted transaction is locked in exclusive
mode until the transaction commits, preventing other transactions from reading that
data.
This protocol solves dirty read problem.
Rigorous Two Phase Locking Protocol
In the Rigorous Two Phase Locking Protocol, a transaction is not allowed to release any
lock (either shared or exclusive) until it commits. This means that until the transaction
commits, other transactions may acquire a shared lock on a data item on which the
uncommitted transaction has a shared lock, but cannot acquire any lock on a data item
on which the uncommitted transaction has an exclusive lock.
Advantages
Recovery is very easy.
Disadvantages
Concurrency is reduced
Explain time stamp based protocol.
Time stamp based protocol
This protocol uses either the system time or a logical counter as a timestamp.
Every transaction has a time-stamp associated with it and the ordering is determined by
the age of the transaction.
A transaction created at clock time 0002 would be older than all transactions that come
after it.
For example, any transaction 'y' entering the system at 0004 is two seconds younger,
and priority may be given to the older one.
In addition, every data item is given the latest read and write timestamps. This lets the
system know when the last read and write operations were made on the data item.
Time stamp ordering protocol
The timestamp-ordering protocol ensures serializability among transactions in their
conflicting read and write operations.
It is the responsibility of the protocol that conflicting pairs of operations are executed
according to the timestamp values of the transactions.
Time-stamp of Transaction Ti is denoted as TS(Ti).
Read time-stamp of data-item X is denoted by R-timestamp(X).
Write time-stamp of data-item X is denoted by W-timestamp(X).
Timestamp ordering protocol works as follows:
If a transaction Ti issues read(X) operation:
If TS(Ti) < W-timestamp(X)
Operation rejected.
If TS(Ti) >= W-timestamp(X)
Operation executed.
R-timestamp(X) is updated to max(R-timestamp(X), TS(Ti)).
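The read(X) rule above can be sketched in Python; the item dictionary with r_ts/w_ts fields and the function name are illustrative assumptions:

```python
def ts_read(ts, item):
    """Timestamp-ordering check for read(X).

    ts   -- timestamp of the reading transaction Ti
    item -- dict with keys "r_ts" and "w_ts" (read/write timestamps of X)
    Returns True if the read is executed, False if rejected (Ti rolls back).
    """
    if ts < item["w_ts"]:
        return False                      # Ti would read a value already overwritten
    item["r_ts"] = max(item["r_ts"], ts)  # record the latest reader
    return True

x = {"r_ts": 0, "w_ts": 5}
print(ts_read(3, x))   # False: a younger transaction already wrote X
print(ts_read(7, x))   # True : the read is allowed, r_ts becomes 7
```

A rejected read means the transaction is too old to see the current value consistently, so the protocol rolls it back and restarts it with a new timestamp.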
Wait-for-graph
Transaction 1 → holds Table 1, waits for Table 2
Transaction 2 → holds Table 2, waits for Table 1
In the above figure there are two transactions (1 and 2) and two tables (table 1 and
table 2).
Transaction 1 holds table 1 and waits for table 2. Transaction 2 holds table 2 and waits
for table 1.
Now table 1 is wanted by transaction 2 but is held by transaction 1, and in the same way
table 2 is wanted by transaction 1 but is held by transaction 2. Neither transaction can
proceed until it gets the table held by the other; this situation is represented by a
wait-for graph, because both of these transactions have to wait for some resource.
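Deadlock detection on a wait-for graph is a cycle search. Below is a minimal Python sketch; the adjacency-list encoding of the graph is an illustrative assumption:

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph with depth-first search.

    wait_for -- dict: transaction -> list of transactions it waits for
    """
    visiting, done = set(), set()

    def dfs(t):
        if t in visiting:
            return True                   # back edge: a cycle exists
        if t in done:
            return False
        visiting.add(t)
        if any(dfs(u) for u in wait_for.get(t, [])):
            return True
        visiting.remove(t)
        done.add(t)
        return False

    return any(dfs(t) for t in wait_for)

# The situation in the text: each transaction waits for the other.
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))   # True
print(has_deadlock({"T1": ["T2"], "T2": []}))       # False
```

A cycle in the wait-for graph is exactly the circular wait condition discussed below, so this check is the core of a deadlock detector.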
When dead lock occurs
A deadlock occurs when two separate processes compete for resources that are held by
one another.
Deadlocks can occur in any concurrent system where processes wait for each other and
a cyclic chain can arise with each process waiting for the next one in the chain.
Deadlock can occur in any system that satisfies the four conditions:
1. Mutual Exclusion Condition: only one process at a time can use a resource; each
resource is either assigned to one process or is available.
2. Hold and Wait Condition: processes already holding resources may request new
resources.
3. No Preemption Condition: only a process holding a resource can release it, voluntarily,
after that process has completed its task; previously granted resources cannot be
forcibly taken away from a process.
4. Circular Wait Condition: two or more processes form a circular chain where each
process requests a resource that the next process in the chain holds.
Explain deadlock detection and recovery.
Resource-Allocation Graph
A set of vertices V and a set of edges E.
V is partitioned into two types:
1. P = {P1, P2, …, Pn}, the set consisting of all the processes in the
system.
2. R = {R1, R2, …, Rm}, the set consisting of all resource types in the
system.
(The figure shows two resource-allocation graphs over processes P1, P2, P3 and
resources R1, R2, R3, R4: one without a cycle, and one containing a cycle, which
indicates deadlock.)
8 – Database Security
In the above figure, the sender has data that he/she wants to send; this data is known
as plaintext.
In first step sender will encrypt data (plain text) using encryption algorithm and some key.
After encryption the plaintext becomes ciphertext.
This ciphertext cannot be read by any unauthorized person.
This ciphertext is sent to the receiver.
The sender will send the key separately to the receiver.
Once the receiver receives this ciphertext, he/she will decrypt it using the key sent by
the sender and the decryption algorithm.
After applying the decryption algorithm and key, the receiver will get the original data
(plaintext) that was sent by the sender.
This technique is used to protect data when there is a chance of data theft.
In such a situation, if the encrypted data is stolen, it cannot be used (read) directly
without knowing the encryption technique and the key.
There are two different methods of data encryption
1. Symmetric key encryption / Private key encryption
In symmetric key schemes, the encryption and decryption keys are the same.
This same key is used by the sender to encrypt the data, and again by the
receiver to decrypt the data.
Symmetric encryption is fast in execution.
2. Asymmetric key encryption / Public key encryption
In asymmetric key schemes, the encryption and decryption keys are different (a
Public Key and a Private Key).
Messages are encrypted by sender with one key (Public Key) and can be
decrypted by receiver only by the other key (Private Key).
Asymmetric Encryption is slow in execution due to the high computational
burden.
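A toy Python illustration of the symmetric idea — the same key both encrypts and decrypts. XOR with a repeating key is used here only to make that point; it is emphatically not a secure cipher, and the function name is illustrative:

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric scheme: XOR each byte with the repeating key.
    Applying it twice with the same key returns the original data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"secret"
ciphertext = xor_cipher(b"Transfer Rs. 500", key)   # encrypt
plaintext = xor_cipher(ciphertext, key)             # decrypt with the same key
print(plaintext)   # b'Transfer Rs. 500'
```

Real symmetric ciphers (e.g. AES) are vastly stronger, but share this shape: one secret key, fast operation, and the key must be delivered to the receiver separately, as the text describes.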
Object is table or view or any data item.
User is the name of user.
The grant option is used when a user wants to pass rights on to another user. That is, if
a user gets rights with the grant option, then he/she can give these rights to another
user.
When a user executes a REVOKE command with the cascade option, it takes back the
given rights from all the users who got those rights from this user.
Mandatory access control
In this method, individual users are not granted rights directly. Instead, all users as well
as all objects are classified into different categories. Each user is assigned a clearance
level and each object is assigned a security level.
A user can access an object of a particular security level only if he/she has the proper
clearance level.
The DBMS determines whether a given user can read or write a given object based on
some rules.
This rule contains security level of object and clearance level of the user.
This rule makes sure that sensitive data can never be passed to a user without necessary
clearance.
Components
The commonly used mandatory access control technique for multi-level security uses four
components as given below
1. Subjects:-Such as users, accounts, programs etc.
2. Objects:-Such as relation (table), tuples (records), attribute (column), view etc.
3. Clearance level:-Such as top secret (TS), secret (S), confidential (C), Unclassified
(U). Each subject is classified into one of these four classes.
4. Security level: -Such as top secret (TS), secret (S), confidential (C), Unclassified (U).
Each object is classified into one of these four classes.
In the above system TS>S>C>U, where TS>S means class TS data is more sensitive than
class S data.
Rules:-
A user can access data by following two rules
1. Security property:-
Security property states that a subject at a given security level may not read
an object at a higher security level.
2. Star (*) security property:-
Star (*) property states that a subject at a given security level may not write to
any object at a lower security level.
Role based access control (RBAC) rules
It restricts database access based on a person's role within an organization. The roles in
RBAC refer to the levels of access that employees have to the network.
Employees are only allowed to access the information necessary to effectively perform
their job duties.
Access can be based on several factors, such as authority, responsibility, and job
competency.
In addition, access to computer resources can be limited to specific tasks such as the
ability to view, create, or modify a file.
Lower-level employees usually do not have access to sensitive data if they do not need it
to fulfil their responsibilities.
Using RBAC will help in securing your company’s sensitive data and important
applications.
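In SQL Server style syntax, for example, role based access control can be sketched as below; the role, table, and user names are hypothetical:

```sql
-- A role groups the rights needed for one job duty.
CREATE ROLE clerk;
GRANT SELECT, INSERT ON Account TO clerk;

-- Employees get access through role membership, not individual grants.
ALTER ROLE clerk ADD MEMBER Rita;
```

Granting or revoking on the role then changes the access of every member at once, which is what makes RBAC easier to administer than per-user grants.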
What is Intrusion Detection System?
An Intrusion Detection System (IDS) is a system or software application that monitors
network traffic or system for suspicious activity or policy violations and issues alerts when
such activity is discovered.
It is a software application that scans a network or a system for harmful activity or policy
breaching.
Any malicious venture or violation is normally reported either to an administrator or
collected centrally using a security information and event management (SIEM) system.
A SIEM system integrates outputs from multiple sources and uses alarm filtering
techniques to differentiate malicious activity from false alarms.
Types of Intrusion Detection Systems (IDS)
1. Network Intrusion Detection Systems (NIDS)
2. Host Intrusion Detection Systems (HIDS)
3. Knowledge-based (Signature-based) IDS
4. Behavior-based (Anomaly-based) IDS
Features of an Intrusion Detection System:
It monitors and analyzes the user and system activities.
It performs auditing of the system files and other configurations and the operating
system.
It assesses the integrity of system and data files
It conducts analysis of patterns based on known attacks.
It detects errors in system configuration.
It detects and cautions if the system is in danger.
Advantages of Intrusion Detection Systems
The network or computer is constantly monitored for any invasion or attack.
The system can be modified and changed according to the needs of a specific client and
can help detect outside as well as inside threats to the system and network.
It helps prevent damage to the network.
It provides a user-friendly interface which allows easy management of security settings.
Any alterations to files and directories on the system can be easily detected and reported.
9 – SQL Concepts
Check constraint
The check constraint is used to implement business rules, so it is also called the business
rule constraint.
Example of a business rule: A balance in any account should not be negative.
A business rule defines a domain for a particular column.
The check constraint is bound to a particular column.
Once a check constraint is implemented, any insert or update operation on that table
must satisfy this constraint.
If any operation violates the condition, it will be rejected.
Syntax : ColumnName datatype (size) check(condition)
Example :
create table Account (ano int,
Balance decimal(8,2) CHECK (balance >= 0),
Branch varchar(10));
Any business rule validations can be applied using this constraint.
A condition must be some valid logical expression.
A check constraint takes extra time to execute, because the condition is evaluated on
every insert and update.
On violation of this constraint, Oracle displays an error message like "check constraint
violated".
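Continuing the Account example above, an insert that satisfies the rule succeeds, while one that violates it is rejected (the exact error text varies by DBMS):

```sql
-- Accepted: the balance satisfies CHECK (balance >= 0).
INSERT INTO Account VALUES (1, 500.00, 'Rajkot');

-- Rejected: -100 violates the check constraint, so the DBMS
-- raises a "check constraint violated" style error.
INSERT INTO Account VALUES (2, -100.00, 'Rajkot');
```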
Unique Constraint
Sometimes there may be a requirement that a column cannot contain duplicate values.
A column defined as unique cannot have duplicate values across all records.
Syntax : ColumnName datatype (size) UNIQUE
Example :
create table Account (ano int UNIQUE,
Balance decimal(8,2),
Branch varchar(10));
Though a unique constraint does not allow duplicate values, NULL values can be
duplicated in a column defined as a UNIQUE column.
A table can have more than one column defined as a unique column.
If multiple columns need to be defined as composite unique column, then only table
level definition is applicable.
Maximum 16 columns can be combined as a composite unique key in a table.
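Such a table level definition for a composite unique key can be sketched as below (the table and column choice is only illustrative):

```sql
-- ano and Branch together must be unique across all rows,
-- declared at table level because two columns are involved.
create table Account (
    ano int,
    Branch varchar(10),
    Balance decimal(8,2),
    CONSTRAINT uq_ano_branch UNIQUE (ano, Branch)
);
```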
Primary Key Constraint
Example :
create table Account (ano int NOT NULL PRIMARY KEY,
Balance decimal(8,2),
Branch varchar(10));
A primary key constraint is combination of UNIQUE constraint and NOT NULL constraint.
A table cannot have more than one primary key.
If multiple columns need to be defined as primary key column, then only table level
definition is applicable.
Maximum 16 columns can be combined as a composite primary key in a table.
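Likewise, a composite primary key must use the table level definition; a sketch with illustrative names:

```sql
-- The pair (ano, cid) identifies each row, declared at table level.
create table AccountHolder (
    ano int,
    cid int,
    CONSTRAINT pk_acc_holder PRIMARY KEY (ano, cid)
);
```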
DDL (Data Definition Language)
It provides commands like:
CREATE: to create objects in a database.
ALTER: to alter the schema, or logical structure of the database.
DROP: to delete objects from the database.
TRUNCATE: to remove all records from the table.
CREATE: Create is used to create the database or its objects like table, view, index etc.
Create Table
The CREATE TABLE statement is used to create a new table in a database.
Syntax:
CREATE TABLE table_name
(
Column1 Datatype(Size) [ NULL | NOT NULL ],
Column2 Datatype(Size) [ NULL | NOT NULL ],
...
);
Example:
CREATE TABLE Students
(
Roll_No int NOT NULL,
Name varchar(20),
Subject varchar(20)
);
Explanation:
The column should either be defined as NULL or NOT NULL. By default, a column can
hold NULL values.
The NOT NULL constraint enforces a column to NOT accept NULL values. This enforces a
field to always contain a value, which means that you cannot insert a new record, or
update a record without adding a value to this field.
ALTER: ALTER TABLE statement is used to add, modify, or drop columns in a table.
Add Column
The ALTER TABLE statement is used in SQL to add new columns to a table.
Syntax:
ALTER TABLE table_name
ADD Column1 Datatype(Size), Column2 Datatype(Size), … ;
Example:
ALTER TABLE Students
ADD Marks int;
Drop Column
The ALTER TABLE statement is used in SQL to drop a column from a table.
Syntax:
ALTER TABLE table_name
DROP COLUMN column_name;
Example:
ALTER TABLE Students
DROP COLUMN Subject;
Modify Column
The ALTER TABLE statement is used in SQL to change the data type/size of a column in a table.
Syntax:
ALTER TABLE table_name
ALTER COLUMN column_name datatype(size);
Example:
ALTER TABLE Students
ALTER COLUMN Roll_No float;
DROP: Drop is used to drop the database or its objects like table, view, index etc.
Drop Table
The DROP TABLE statement is used to drop an existing table in a database.
Syntax:
DROP TABLE table_name;
Example:
DROP TABLE Students;
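TRUNCATE, listed above, removes every row but keeps the empty table definition in place, unlike DROP TABLE, which removes the table itself. A sketch using the same Students table:

```sql
-- Deletes all records; the Students table still exists afterwards
-- and can be used for new inserts.
TRUNCATE TABLE Students;
```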
INSERT: The INSERT INTO statement is used to insert new records into a table.
Syntax:
INSERT INTO table_name (column1, column2, column3, ...)
VALUES (value1, value2, value3, ...);
OR
INSERT INTO table_name
VALUES (value1, value2, value3, ...);
Example:
INSERT INTO Students (Roll_No,Name,Subject)
VALUES (1, 'anil', 'Maths');
SELECT: The SELECT statement is used to select data from a database. The data returned is
stored in a result table, called the result-set.
Syntax:
SELECT column1, column2, ...
FROM table_name
WHERE condition;
OR
SELECT *
FROM table_name
WHERE condition;
Example:
SELECT *
FROM Students
WHERE Roll_No>=1;
1. COMMIT:
The COMMIT command is used to permanently save the changes made by the current
transaction to the database.
BEGIN TRANSACTION t1
DELETE FROM STUDENT WHERE SPI <8;
COMMIT TRANSACTION t1;
Output:
Student
Rollno Name SPI
1 Raju 8
2 Hari 9
Note:
A transaction is a set of operations performed so that all operations are
guaranteed to succeed or fail as one unit.
If you place the BEGIN TRANSACTION before your SQL statement, the
transaction will automatically turn into the explicit transaction and it will
lock the table until the transaction is committed or rolled back.
2. ROLLBACK:
The ROLLBACK command is the transactional control command used to undo
transactions that have not already been saved to the database.
It rolls back a transaction to the beginning of the transaction.
It is also used with SAVEPOINT command to jump to a savepoint in an ongoing
transaction.
You can use ROLLBACK TRANSACTION to erase all data modifications made from
the start of the transaction or to a savepoint.
Student
Rollno Name SPI
1 Raju 8
2 Hari 9
3 Mahesh 7
BEGIN TRANSACTION t1
DELETE FROM STUDENT WHERE SPI <8;
ROLLBACK TRANSACTION t1;
Output:
Student
Rollno Name SPI
1 Raju 8
2 Hari 9
3 Mahesh 7
3. SAVEPOINT:
A SAVEPOINT is a point in a transaction to which you can roll the transaction back
without rolling back the entire transaction.
The ROLLBACK command with a savepoint name undoes the group of operations
performed after that savepoint.
Student
Rollno Name SPI
1 Raju 8
2 Hari 9
3 Mahesh 7
BEGIN TRANSACTION t1
SAVE TRANSACTION s1
INSERT INTO Student VALUES (4, 'Anil', 6);
SAVE TRANSACTION s2
INSERT INTO Student VALUES (5, 'Gita', 9);
ROLLBACK TRANSACTION s2;
Explanation:
privilege_name is the access right or privilege to take back from the
user. Some of the access rights are ALL, EXECUTE, and SELECT.
object_name is the name of a database object like TABLE, VIEW,
STORED PROCEDURE, or SEQUENCE.
user_name is the name of the user from whom the access right is being
taken back.
PUBLIC is used to take back access rights from all users.
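Putting these terms together, the REVOKE statement they describe can be sketched as below (the table and user names are hypothetical):

```sql
-- General form: REVOKE privilege_name ON object_name FROM user_name;
REVOKE SELECT ON Student FROM Rahul;

-- PUBLIC takes the right back from every user at once.
REVOKE SELECT ON Student FROM PUBLIC;
```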
CONSTRAINT fk_column
FOREIGN KEY (column1, column2, ... column_n)
REFERENCES parent_table (column1, column2, ... column_n)
ON DELETE CASCADE
);
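A complete pair of tables using this clause might look as follows (the table and column names are illustrative). Deleting a Branch row then automatically deletes its dependent Account rows:

```sql
CREATE TABLE Branch (
    bid int PRIMARY KEY,
    bname varchar(20)
);

CREATE TABLE Account (
    ano int PRIMARY KEY,
    bid int,
    CONSTRAINT fk_branch
        FOREIGN KEY (bid)
        REFERENCES Branch (bid)
        ON DELETE CASCADE
);

-- Removing a branch also removes all accounts that reference it.
DELETE FROM Branch WHERE bid = 1;
```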
Ceiling(n) : Returns the smallest integer value that is greater than or equal to n.
    Select Ceiling(24.8);  O/P : 25
Floor(n) : Returns the largest integer value that is less than or equal to n.
    Select Floor(24.8);  O/P : 24
Power(m,n) : Returns m raised to the nth power.
    Select Power(3,2);  O/P : 9
Round(n,m) : Returns n rounded to m places to the right of the decimal point.
    Select Round(15.91,1);  O/P : 15.9
Square(n) : Returns the square of n.
    Select Square(4);  O/P : 16
Sqrt(n) : Returns the square root of n.
    Select Sqrt(25);  O/P : 5
Exp(n) : Returns e raised to the nth power, where e = 2.71828183.
    Select Exp(1);  O/P : 2.71828183
X%Y : Returns the remainder of X modulus Y.
    Select 5%3;  O/P : 2
Pi() : Returns the value of pi.
    Select Pi();  O/P : 3.14159265358979
Sin(n) : Returns the sine value of n.
    Select Sin(0);  O/P : 0
Cos(n) : Returns the cosine value of n.
    Select Cos(0);  O/P : 1
Tan(n) : Returns the tangent value of n.
    Select Tan(0);  O/P : 0
Rand() : Returns a random decimal number between 0 and 1.
    Select Rand();  O/P : 0.845610816728278
Log(n) : Returns the log value of n having base e.
    Select Log(1);  O/P : 0
Log10(n) : Returns the log value of n having base 10.
    Select Log10(1000);  O/P : 3
String function
ASCII(x) : Returns the ASCII value of a character.
    Select ASCII('A');  O/P : 65
Char(x) : Returns the character for an int ASCII code.
    Select Char(65);  O/P : A
Concat() : Concatenates two strings.
    Select Concat('great','DIET');  O/P : greatDIET
Len(x) : Returns the number of characters in x.
    Select Len('DIET');  O/P : 4
Lower(x) : Converts the string to lower case.
    Select Lower('DIET');  O/P : diet
Upper(x) : Converts the string to upper case.
    Select Upper('diet');  O/P : DIET
Ltrim(x) : Trims blanks from the left of x.
    Select Ltrim('  diet  ');  O/P : diet
Rtrim(x) : Trims blanks from the right of x.
    Select Rtrim('  diet  ');  O/P : diet
Replace() : Looks for a string and replaces it every time it occurs.
    Select Replace('this is college','is','may be');  O/P : thmay be may be college
Substring() : Returns a part of a string.
    Select Substring('this is college',6,7);  O/P : is coll
Left() : Returns the specified number of characters from the left.
    Select Left('this is college',7);  O/P : this is
Right() : Returns the specified number of characters from the right.
    Select Right('this is college',7);  O/P : college
Reverse() : Returns the reversed string.
    Select Reverse('DIET');  O/P : TEID
Space(n) : Returns n number of spaces.
    Select Space(10);  O/P : (a string of 10 spaces)
Replicate() : Returns a string repeated n number of times.
    Select Replicate('DIET',2);  O/P : DIETDIET
Date function
Getdate() : Returns the current date and time.
    Select Getdate();  O/P : 2018-09-08 10:42:02.113
Day() : Returns the day of a given date.
    Select Day('23/JAN/2018');  O/P : 23
Month() : Returns the month of a given date.
    Select Month('23/JAN/2018');  O/P : 1
Year() : Returns the year of a given date.
    Select Year('23/JAN/2018');  O/P : 2018
Isdate() : Returns 1 if the expression is a valid date, otherwise 0.
    Select Isdate('31/FEB/2018');  O/P : 0
Datename() : Returns the specified part of a given date as a varchar value.
    Select Datename(month,'1-23-2018');  O/P : January
Datepart() : Returns the specified part of a given date as an int value.
    Select Datepart(month,'1-23-2018');  O/P : 1
Dateadd() : Returns the datetime after adding n datepart units to a given date.
    Select Dateadd(day,5,'23/JAN/2018');  O/P : 2018-01-28 00:00:00.000
Datediff() : Returns the difference between two date values, based on the interval specified.
    Select Datediff(day,'23/JAN/2018','23/FEB/2018');  O/P : 31
Eomonth() : Returns the last day of the month.
    Select Eomonth('23/FEB/2018');  O/P : 2018-02-28
Example: Consider following table
Student
Rollno Name SPI
1 Raju 8
2 Hari 9
3 Mahesh 7
4 NULL 9
5 Anil 5
1. Avg() : It returns the average of the data values.
Select Avg(SPI) FROM Student;
Output: 7 (the integer average of 38/5, since SPI is an int column)
2. Sum() : It returns the addition of the data values.
Select Sum(SPI) FROM Student;
Output: 38
3. Max() : It returns maximum value for a column.
Select Max(SPI) FROM Student;
Output: 9
4. Min() : It returns Minimum value for a column.
Select Min(SPI) FROM Student;
Output: 5
5. Count() : It returns total number of values in a given column.
Select Count(Name) FROM Student;
Output: 4
6. Count(*) : It returns the number of rows in a table.
Select Count(*) FROM Student;
Output: 5
INNER JOIN
It returns records that have matching values in both tables.
Syntax:
SELECT columns
FROM table1 INNER JOIN table2
ON table1.column = table2.column;
Example:
Consider the following tables:
Student Result
RNO Name Branch RNO SPI
101 Raju CE 101 8.8
102 Amit CE 102 9.2
103 Sanjay ME 104 8.2
104 Neha EC 105 7
105 Meera EE 107 8.9
106 Mahesh ME
SELECT Student.RNO, Student.Name, Student.Branch, Result.SPI
FROM Student INNER JOIN Result
ON Student.RNO = Result.RNO;
Output:
Inner Join
RNO Name Branch SPI
101 Raju CE 8.8
102 Amit CE 9.2
104 Neha EC 8.2
105 Meera EE 7
LEFT OUTER JOIN
It returns all records from the left table, and the matched records from the right table.
For unmatched rows of the left table, the right-side columns contain NULL.
Example:
Consider the Student and Result tables:
SELECT Student.RNO, Student.Name, Student.Branch, Result.SPI
FROM Student LEFT OUTER JOIN Result
ON Student.RNO = Result.RNO;
Output:
Left Join
RNO Name Branch SPI
101 Raju CE 8.8
102 Amit CE 9.2
103 Sanjay ME NULL
104 Neha EC 8.2
105 Meera EE 7
106 Mahesh ME NULL
FULL OUTER JOIN
It returns all records when there is a match in either the left table or the right table.
Unmatched columns are filled with NULL.
Syntax:
SELECT columns
FROM table1 FULL OUTER JOIN table2
ON table1.column = table2.column;
Example:
Consider the Student and result tables:
SELECT Student.RNO, Student.Name, Student.Branch, Result.SPI
FROM Student FULL OUTER JOIN Result
ON Student.RNO = Result.RNO;
Output:
Full Join
RNO Name Branch SPI
101 Raju CE 8.8
102 Amit CE 9.2
103 Sanjay ME NULL
104 Neha EC 8.2
105 Meera EE 7
106 Mahesh ME NULL
NULL NULL NULL 8.9
CROSS JOIN
When each row of first table is combined with each row from the second table, known
as Cartesian join or cross join.
SQL CROSS JOIN returns the number of rows in first table multiplied by the number of
rows in second table.
Syntax:
SELECT columns
FROM table1 CROSS JOIN table2;
Example:
Consider the following tables:
Color            Size
Code  Name       Amount
1     Red        Small
2     Blue       Large
SELECT Color.Code, Color.Name, Size.Amount
FROM Color CROSS JOIN Size;
Output:
Cross Join
Code Name Amount
1 Red Small
2 Blue Small
1 Red Large
2 Blue Large
SELF JOIN
A self join is a regular join, but the table is joined with itself.
Here, we need to use aliases for the same table to set up a self join on a single table.
Syntax:
SELECT a.column, b.column
FROM tablename a INNER JOIN tablename b
ON a.column = b.column;
Example:
Consider the following table:
Employee
EmpNo Name MngrNo
E01 Tarun E02
E02 Rohan E05
E03 Priya E04
E04 Milan NULL
E05 Jay NULL
E06 Anjana E03
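Using the Employee table above, a self join can list each employee together with his or her manager's name; both aliases refer to the same table:

```sql
-- e is the employee row, m is the matching manager row.
SELECT e.Name AS Employee, m.Name AS Manager
FROM Employee e INNER JOIN Employee m
ON e.MngrNo = m.EmpNo;

-- Output:
-- Employee   Manager
-- Tarun      Rohan
-- Rohan      Jay
-- Priya      Milan
-- Anjana     Priya
```

Milan and Jay do not appear as employees in the output because their MngrNo is NULL, so the inner join finds no matching manager row for them.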
Define view. What are the types of view? Write syntax to create
view of each type. Give an example of view.
Views are virtual tables that are compiled at runtime.
The data associated with a view is not physically stored in the view; it is stored in
the base tables of the view.
A view can be made over one or more database tables.
Generally, we put those columns in view that we need to retrieve/query again and
again.
Once you have created the view, you can query the view just like a table.
TYPES OF VIEW
1. Simple View
2. Complex View
Syntax:
CREATE VIEW view_name
AS
SELECT column1, column2...
FROM table_name
[WHERE condition];
Simple View
When we create a view on a single table, it is called simple view.
In a simple view we can delete, update and insert data, and those changes are applied
to the base table.
Insert operations can be performed on a simple view only if the view contains the
primary key and all NOT NULL fields of the base table.
Example:
Consider following table:
Employee
Eid Ename Salary Department
101 Raju 5000 Admin
102 Amit 8000 HR
103 Sanjay 3000 IT
104 Neha 7000 Sales
--Create View
CREATE VIEW EmpSelect
AS
SELECT Eid, Ename, Department
FROM Employee;
--Display View
Select * from EmpSelect;
Output
Eid Ename Department
101 Raju Admin
102 Amit HR
103 Sanjay IT
104 Neha Sales
Complex View
When we create a view on more than one table, it is called complex view.
We can only update data in a complex view.
We can't insert data into a complex view.
In particular, complex views can contain: join conditions, a group by clause, an order by
clause etc.
Example:
Consider following table:
Employee ContactDetails
Eid Ename Salary Department Eid City Mobile
101 Raju 5000 Admin 101 Rajkot 1234567890
102 Amit 8000 HR 102 Ahmedabad 2345678901
103 Sanjay 3000 IT 103 Baroda 3456789120
104 Neha 7000 Sales 104 Rajkot 4567891230
--Create View
CREATE VIEW Empview
AS
SELECT Employee.Eid, Employee.Ename, ContactDetails.City
FROM Employee INNER JOIN ContactDetails
ON Employee.Eid = ContactDetails.Eid;
--Display View
Select * from Empview;
Output
Eid Ename City
101 Raju Rajkot
102 Amit Ahmedabad
103 Sanjay Baroda
104 Neha Rajkot
10 – PL/SQL Concepts
Execution section
Explanation
Create:-It will create a procedure.
Alter:- It will re-create a procedure if it already exists.
We can pass parameters to the procedures in three ways.
1. IN-parameters: - These types of parameters are used to send values to stored procedures.
2. OUT-parameters: - These types of parameters are used to get values from stored
procedures. This is similar to a return type in functions, but a procedure can return
values for more than one parameter.
3. IN OUT-parameters: - This type of parameter allows us to pass values into a procedure
and get output values from the procedure.
AS indicates the beginning of the body of the procedure.
The syntax within the brackets [ ] indicates that they are optional.
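A small sketch of a procedure using an IN parameter and an OUT parameter, in SQL Server style syntax (the procedure, table, and column names are only illustrative):

```sql
CREATE PROCEDURE GetBalance
    @ano int,                       -- IN parameter: account number sent in
    @balance decimal(8,2) OUTPUT    -- OUT parameter: balance sent back
AS
BEGIN
    SELECT @balance = Balance FROM Account WHERE ano = @ano;
END
```

It would be called with a variable to receive the output value, e.g. DECLARE @b decimal(8,2); EXEC GetBalance 101, @b OUTPUT; PRINT @b;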
By using CREATE OR ALTER together the procedure is created if it does not exist and if it exists
then it is replaced with the current code (The only disadvantage of CREATE OR ALTER is that it
does not work in SQL Server versions prior to SQL Server 2016).
Advantages of procedure
Security:- We can improve security by giving rights to selected persons only.
Faster Execution:- It is precompiled so compilation of procedure is not required every
time you call it.
Sharing of code:- Once procedure is created and stored, it can be used by more than one
user.
Productivity:- Code written in procedure is shared by all programmers. This eliminates
redundant coding by multiple programmers so overall improvement in productivity.
Syntax of Trigger
CREATE [OR ALTER] TRIGGER trigger_name
ON table_name
{ FOR | AFTER | INSTEAD OF }
{ [ INSERT ] [ , ] [ UPDATE ] [ , ] [ DELETE ] }
AS
BEGIN
Executable statements
END;
CREATE [OR ALTER] TRIGGER trigger_name:- This clause creates a trigger with the given name
or overwrites an existing trigger.
[ON table_name]:- This clause identifies the name of the table to which the trigger is related.
[FOR | AFTER | INSTEAD OF]:- This clause indicates at what time the trigger should be fired. FOR
and AFTER are similar.
[INSERT / UPDATE / DELETE]:- This clause determines on which kind of statement the trigger
should be fired: either on insert or update or delete, or a combination of any or all of them.
More than one statement type can be used together, separated by commas. The trigger gets
fired on all the specified triggering events.
Example 1
Trigger to display a message when we perform insert operation on student table.
CREATE TRIGGER tgr_student_message
ON Student
FOR INSERT
AS
BEGIN
print 'Record inserted successfully'
END
OUTPUT:- Trigger is created.
Now when you perform insert operation on student table.
Insert into student values (1, 'Raj', 'CE');
It displays following message after executing insert statement.
Output:- (1 row(s) affected)
Record inserted successfully
Example 2
Trigger to insert history into Audit table when we perform insert operation on student table.
CREATE TRIGGER tgr_student_forinsert
ON Student
FOR INSERT
AS
BEGIN
DECLARE @rno int
SELECT @rno = rno FROM INSERTED
INSERT INTO Audit VALUES
('New student with rno = ' + cast(@rno as varchar(10)) +
' is added in student table')
END
3) Fetching data:-
We cannot process a selected row directly. We have to fetch the column values of a row
into memory variables. This is done by the FETCH statement.
Syntax:-
FETCH NEXT FROM cursorname INTO variable1, variable2, ... ;
4) Processing data:-
This step involves actual processing of current row.
5) Closing cursor:-
A cursor should be closed after the processing of data completes. Once you close the
cursor it will release memory allocated for that cursor.
Syntax:-
CLOSE cursorname;
6) Deallocating cursor:-
It is used to delete a cursor and releases all resources used by cursor.
Syntax:-
DEALLOCATE cursorname;
Example 1:- Cursor to insert record from student table to student1 table if branch is CE.
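A possible implementation of Example 1, assuming the student table has columns (rno, name, branch) and student1 has the same structure; it follows the declare/open/fetch/process/close/deallocate steps described above:

```sql
DECLARE @rno int, @name varchar(20), @branch varchar(10);

DECLARE cursor_ce CURSOR
FOR SELECT rno, name, branch FROM student;

OPEN cursor_ce;
FETCH NEXT FROM cursor_ce INTO @rno, @name, @branch;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Copy only CE students into student1.
    IF (@branch = 'CE')
        INSERT INTO student1 VALUES (@rno, @name, @branch);
    FETCH NEXT FROM cursor_ce INTO @rno, @name, @branch;
END;
CLOSE cursor_ce;
DEALLOCATE cursor_ce;
```

The filter could equally be placed in the cursor's SELECT with a WHERE clause; it is kept in the loop here to show row-by-row processing.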
Example 2:- Cursor to update SPI (SPI=SPI-7) if SPI remains greater than or equal to ZERO after
update.
DECLARE
@rno int, @spi decimal(8,2);
DECLARE cursor_student CURSOR
FOR SELECT rno, spi
FROM student;
OPEN cursor_student;
FETCH NEXT FROM cursor_student INTO
@rno, @spi;
WHILE @@FETCH_STATUS = 0
BEGIN
set @spi=@spi-7
if (@spi<0)
print 'SPI must be greater than 0'
else
update student
set spi=@spi
where rno=@rno
FETCH NEXT FROM cursor_student INTO
@rno, @spi;
END;
CLOSE cursor_student;
DEALLOCATE cursor_student;