
DATABASE MANAGEMENT SYSTEMS

INTRODUCTION:
Data: Facts that can be recorded.
Database: Collection of related data.
Information: Data processed meaningfully.
Database Management System (DBMS): Software through which users access and manage the database.
User → DBMS → Database
A DBMS handles storing and retrieving large amounts of data.
Applications:
- Airlines
- Business
- E-commerce
- Universities
- Banking
- Customer Relationship Management
- Human Resources Management
- Telecommunications
- Supply Chain Management
- Content Management Systems
A relational database is a collection of tables (each a collection of rows and columns).
Rows: Records, each holding the information of one entity (e.g., one person)
Columns: Attributes of the entity

File Systems
A file system is a method used by operating systems and software to organize
and store data on storage devices, such as hard drives, solid-state drives, and
removable storage media. It provides a structured way to manage files,
directories (or folders), and their relationships.

File Systems vs Database:


- File systems are unstructured, whereas databases are structured.
- Managing data is much easier in a database.
- The location of data has to be specified in file systems, but not in databases.
- Storage management is more efficient in a database.
- Databases offer flexible data access: in a file system every query needs a
dedicated program, whereas a database answers it with a single query.
- Data redundancy (duplication) is possible in both file systems and databases,
but in databases it can be minimised using normalisation techniques.
- File systems are more prone to data inconsistency.
- Concurrent access to files is not possible, but databases support it through
concurrency-control protocols.
- Integrity constraints can be added to databases.
- Databases are more secure than file systems.
- Backing up data is easy in databases.

Types of databases
1. Centralised database: One database used by multiple users
2. Distributed database: Multiple databases used by multiple users
3. Personal database: Single database used by single user
4. End user database: Used in applications
5. Commercial database: Accessed after paying
6. Relational database: Data is stored in the form of tables. Rows are also
called tuples or records, and columns are also called attributes.
7. Cloud database: Database is accessed from cloud.
8. Object oriented database: Object oriented programming concepts
applied on relational databases
9. Graph database: Databases represented as nodes (entities) connected by
edges (relationships)

DBMS Architecture
DBMS Architecture can be classified into 3 types:
1. 1 – Tier Architecture – Like a personal database: the database is created
and accessed by the same user, or a single file server is accessed by
multiple users.
2. 2 – Tier Architecture – Uses the client – server method and is similar to a
centralised database: a single database server is accessed by multiple
clients. It is faster than the 1 – Tier structure.
3. 3 – Tier Architecture – Follows the 3-level schema structure (internal,
conceptual and external levels) and is typical of web-based applications.
Application servers, called the business logic tier, sit between the DB
server and the clients. The DB tier is independent of the business logic
tier and the user (presentation) tier.
User Interface (UI)/ Database users: The user interface is the component
through which users interact with the database system. It can include command-
line interfaces, graphical user interfaces (GUIs), or web-based interfaces. Users
interact with the DBMS to submit queries, update records, and perform other
database operations.
Query Processor: The query processor is responsible for interpreting and
optimizing SQL queries submitted by users. It includes a parser to analyze the
syntax of the query, an optimizer to generate an efficient query execution plan,
and an execution engine to execute the plan and retrieve the requested data.
Storage Manager: The storage manager is responsible for managing the physical
storage of data on disk or other storage media. It interacts with the file system
to allocate space for data, manages data storage structures (e.g., pages, blocks),
and handles tasks such as buffering and caching to optimize data retrieval.
Database Communication Interfaces: These interfaces allow applications and
users to communicate with the database system. Common communication
interfaces include ODBC (Open Database Connectivity), JDBC (Java Database
Connectivity), and various APIs (Application Programming Interfaces).

Database Administrator (DBA)


A Database Administrator (DBA) is a professional responsible for managing and
maintaining an organization's database systems. The role of a Database
Administrator is critical in ensuring the availability, security, and performance of
databases. Here are some key responsibilities and tasks associated with the role
of a Database Administrator:
Database Design and Planning: Collaborate with developers and system
architects to design and plan the structure of databases based on the
requirements of applications and business processes.
Installation and Configuration: Install and configure database management
systems (DBMS) software. This involves setting up the necessary parameters,
tuning configurations, and ensuring compatibility with the underlying
infrastructure.
Security Management: Implement security measures to safeguard the
confidentiality, integrity, and availability of data. This includes managing user
access, roles, permissions, and encryption.
Backup and Recovery: Develop and implement backup and recovery strategies
to protect against data loss and facilitate the restoration of databases in the
event of hardware failures, system crashes, or other disasters.
Performance Monitoring and Tuning: Monitor the performance of database
systems, identifying and resolving issues that may impact performance.
Implement tuning strategies to optimize queries, indexing, and overall system
efficiency.
Database Maintenance: Conduct routine maintenance tasks, such as updating
statistics, rebuilding indexes, and optimizing storage allocation to ensure the
ongoing health and efficiency of the database.
Database Upgrades and Patching: Plan and execute upgrades to the database
management system, applying patches and updates to enhance security,
stability, and functionality.
Data Modelling: Work with data modelers and application developers to create
and maintain database schemas and data models. Ensure that databases align
with organizational standards and best practices.
Capacity Planning: Evaluate and plan for the future growth of databases by
estimating storage requirements, monitoring usage trends, and making
recommendations for hardware and infrastructure upgrades.
Documentation: Maintain comprehensive documentation related to database
configurations, schema designs, security policies, and procedures. This
documentation serves as a reference for troubleshooting and future planning.

History of DBMS
1950s – Beginning of efforts to structure, store and organise data.
1960s – Development of IMS (Information Management Systems) by IBM and
CODASYL Model by CODASYL (Conference on Data Systems Languages)
1970s – Introduction to Relational models by Edgar F. Codd
Mid 1970s and Early 1980s – IBM developed System R, the first RDBMS prototype,
and Oracle Corporation was formed, releasing Oracle Database, the first
commercial RDBMS.
Mid 1980s and Early 1990s – This era saw the need for standardisation and
commercialisation of databases, so SQL was standardised. DB2, Microsoft SQL
Server and Sybase were developed.
1990s – Object oriented databases were formed.
2000s – Development of open source DBs
2010s – Big Data technologies and Cloud technologies were introduced.

Relational database management systems (RDBMS)


Represents data and the relations between entities in the form of tables.
Rows are called tuples. Note that no 2 tuples can be identical. Columns are called
attributes, where each attribute should be given a unique name. A data item or
cell is the intersection of a tuple and an attribute. Data items must have
atomic (single) values.
Cardinality: Number of tuples in a relation.
Degree: Number of attributes in a relation.
Domain: Set of possible values of an attribute.
Null value: An empty data item.

DBMS VS RDBMS:
- Data is stored in hierarchical or navigational form in a DBMS and in tabular
form in an RDBMS.
- No relationship is established between data items in a DBMS.
- Normalisation is not supported by a DBMS.
- A DBMS doesn't support distributed databases.
- A DBMS cannot handle relatively large data.
- An RDBMS supports multiple users.
- Examples of DBMS: File systems, XML, etc.
- Examples of RDBMS: Oracle, SQL Server, MySQL, SQLite, etc.
SQL: SQL stands for Structured Query Language. This helps in writing queries to
access and manage the data in the database.

Schema and Instances


Schema: Structure of any database is called a schema.
There are 3 levels of schema. They are:
1. Physical Level or Internal Level
2. Logical Level or Conceptual Level
3. View Level or External Level
Physical schema: Describes how the data is physically stored.
Logical schema: Describes the logical structure of the data, built on top of the physical level.
View schema: Describes how end users interact with the database.
Instance: The state of the data in the database at a particular instant of time.

Data Models
Data models are the underlying structures of a DBMS that describe how data is
organised, in line with the 3 – level structure of schemas.
Types of data models:
1. Relational model
2. Entity Relationship model
3. Object based data model
4. Semi structured data model
5. Network Data Models
6. Hierarchical Data Models
Relational model
Collection of tables.
Tabular form of storing data.
Multiple columns with unique names.
Most present-day databases follow the relational model.
It is a record – based model.
Entity – Relationship model: E – R Model
Object based data model
Object – Relational Model: Relational model combined with Object oriented
features or programming.
Object – based Model: E-R Model combined with OOP concepts.
Semi Structured data model
Individual data items having different set of attributes.
Extensible Markup Language (XML) supports semi structured data model.

DBMS Languages
1. Data Definition Language (DDL)
2. Data Manipulation Language (DML)
3. Data Control Language (DCL)
4. Transaction Control Language (TCL)

DDL
Structure of data is defined here.
1. CREATE – Creates a new table.
Syntax: CREATE TABLE TABLE_NAME (ATTRIBUTE1 DATATYPE1,….);
2. ALTER – Alters the structure of database (includes add column, delete
column, modify datatype of existing column, set or remove constraints)
a. Add column:
ALTER TABLE TABLE_NAME
ADD COL_NAME DATATYPE;
b. Drop column:
ALTER TABLE TABLE_NAME
DROP COLUMN COL_NAME;
c. Modify datatype of existing column:
ALTER TABLE TABLE_NAME
MODIFY COLUMN COL_NAME DATATYPE;
d. Set constraints
NOT NULL:
ALTER TABLE TABLE_NAME
MODIFY COL_NAME DATATYPE NOT NULL;
UNIQUE:
ALTER TABLE TABLE_NAME
ADD CONSTRAINT CONSTRAINT_NAME UNIQUE(COL_NAME);
PRIMARY KEY:
ALTER TABLE TABLE_NAME
ADD CONSTRAINT CONSTRAINT_NAME PRIMARY KEY(COL_NAME);
CHECK:
ALTER TABLE TABLE_NAME
ADD CONSTRAINT CONSTRAINT_NAME CHECK(CONDITION);
e. Remove constraints
UNIQUE:
ALTER TABLE TABLE_NAME
DROP INDEX COL_NAME;
REMAINING CONSTRAINTS:
ALTER TABLE TABLE_NAME
DROP CONSTRAINT CONSTRAINT_NAME;
3. DROP – Deletes table
Syntax: DROP TABLE TABLE_NAME;
4. TRUNCATE – Removes all the records (rows) of the table
Syntax: TRUNCATE TABLE TABLE_NAME;
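A quick worked example tying these DDL commands together (the STUDENTS table
and its columns are assumptions made for illustration):
Example:
-- Create a table with two columns
CREATE TABLE STUDENTS (ROLL_NO INT, NAME VARCHAR(50));
-- Add a column to the existing table
ALTER TABLE STUDENTS ADD MARKS INT;
-- Remove all rows but keep the table structure
TRUNCATE TABLE STUDENTS;
-- Remove the table itself
DROP TABLE STUDENTS;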
DML
Helps in manipulating the data.
1. SELECT – Retrieves data from the database
Syntax:
SELECT [DISTINCT] COL1, COL2, … FROM TABLE_NAME
WHERE [NOT] CONDITION
ORDER BY COL_NAME ASC/DESC LIMIT NUMBER;
ORDER BY – orders in ascending or descending order
DISTINCT – selects uniquely
ASC/DESC – Ascending/descending order
LIMIT – Limits to the NUMBER of rows
CONDITIONS can include:
Relational operators (<,<=,>,>=, =, <> or !=)
LIKE – search for pattern – ‘%AA%’ (searches for AA anywhere in the
column)
IN – search in multiple values
BETWEEN VAL1 AND VAL2 – search within a range
2. INSERT – Inserts new data into the database
Syntax: INSERT INTO TABLE_NAME VALUES (VAL_A1,VAL_A2,…),
(VAL_B1,VAL_B2,..);
3. DELETE – Deletes a single row, multiple rows, or all the rows from a table
Syntax:
DELETE FROM TABLE_NAME
WHERE CONDITION;
WHERE - Optional
4. UPDATE – Updates data without disturbing the actual structure
Syntax:
UPDATE TABLE_NAME
SET COL1 = VAL1, COL2 = VAL2
WHERE CONDITION;
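A short worked example of the four DML commands on the same hypothetical
STUDENTS(ROLL_NO, NAME, MARKS) table used above:
Example:
-- Insert two rows in one statement
INSERT INTO STUDENTS VALUES (1, 'ANU', 85), (2, 'RAVI', 72);
-- Raise one student's marks
UPDATE STUDENTS SET MARKS = 90 WHERE ROLL_NO = 2;
-- Retrieve students scoring in a range, highest first
SELECT NAME, MARKS FROM STUDENTS
WHERE MARKS BETWEEN 80 AND 100
ORDER BY MARKS DESC;
-- Delete one row
DELETE FROM STUDENTS WHERE ROLL_NO = 1;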

DCL
Giving access to the users for accessing the data.
1. GRANT – Permissions given to the user to access data.
2. REVOKE – Taking back the permissions given to the user.
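The usual syntax for both commands (USER_NAME and TABLE_NAME are placeholders):
Syntax:
GRANT SELECT, INSERT ON TABLE_NAME TO USER_NAME;
REVOKE INSERT ON TABLE_NAME FROM USER_NAME;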

TCL
1. COMMIT – Saves the changes to database
Syntax: COMMIT;
2. ROLLBACK – Undo the recent transaction in the database
Syntax: ROLLBACK;

Entities, Attributes and Relationships


Entity: Real world object having some specific characteristics.
Entities are of two types:
1. Tangible entity – one that exists physically (Ex: Persons)
2. Intangible entity – one that exists logically (Ex: Bank Accounts)
Entity set: A group of entities having same attributes.
Entity type: Representing entity set with a name.
Entity type has 2 categories:
1. Strong entity – An entity with key attribute, that helps to identify the
records uniquely. (Ex: Roll number of a student)
2. Weak entity – An entity without key attribute is called a weak entity. (Ex:
Student Class)
Attributes: Properties/ characteristics of an entity.
Types of attributes:
1. Composite Attributes – An attribute that can be divided into further
simpler attributes. (Ex: Name – First_Name, Middle_Name, Last_Name)
2. Simple / Atomic Attributes – An attribute that cannot be divided into
further simpler attributes. (Ex: Phone number)
3. Key Attribute – An attribute that allows to identify records uniquely.
4. Single Valued Attribute – An attribute that will always hold a single value.
(Ex: Age, Gender, Roll number, etc.)
5. Multi Valued Attribute – An attribute that can hold multiple values. (Ex:
Phone number, Mail IDs, Skill set, etc.)
6. Derived Attribute – An attribute that can be derived from another
attribute. (Ex: Age from DOB, Percentage from Marks, etc.)
Relationships: Relations between two or more tables or entities.
Types of relationships:
1. One – One relationships: One record in table 1 is related to exactly one
record in table 2 and vice versa.
2. One – Many relationships: One record in table 1 is related to multiple
records in table 2.
3. Many – Many relationships: Multiple records in table 1 are related to
multiple records in table 2 and vice versa.

ER Models
ER model is the pictorial representation of a database showing entities and
relationships between them.
Components of an ER Model:
1. Entities
2. Attributes
3. Relationship
Entities (drawn as rectangles):
Strong entity – a single rectangle
Weak entity – a double rectangle
Attributes (drawn as ellipses):
Key Attribute – an ellipse with the attribute name underlined
Single Valued Attribute – a plain ellipse
Multi Valued Attribute – a double ellipse
Derived Attribute – a dashed ellipse
Relationships (drawn as diamonds):
Between strong entities – a single diamond
Between weak entities – a double diamond (identifying relationship)
Keys
Keys: Key is an attribute or set of attributes that identifies a record uniquely.
Types of keys:
1. Super key
2. Candidate key
3. Primary key
4. Foreign key
5. Alternate key
6. Unique key
7. Composite key
Super key: Identifies each row uniquely. Employee_ID, Employee_Aadhar, etc.
It is the superset of all the keys from which we can derive the records.
Maximum number of super keys with n attributes = 2^n − 1 (for example, with
n = 3 attributes there are at most 7 super keys).
Combinations of above attributes also act as super keys.
Candidate key: A minimal super key, i.e., a super key from which no attribute
can be removed without losing uniqueness. A candidate key can never be NULL and
must always be unique.
Primary key: Any one of the candidate keys can be chosen as the primary key.
Alternate key: Any candidate key other than the primary key.
Unique key: Like a primary key, but a unique key can hold NULL values, which a
primary key cannot. Ex: Employee_Phone, Employee_Mail, etc.
Foreign key: An attribute in one table that refers to the primary key of another
table, relating their records.

Relational Algebra
Relational algebra is a formal system used to manipulate and query relations
(tables) in the context of a relational database management system (RDBMS). It
provides a set of operations that can be performed on relations to retrieve or
transform data. These operations form the basis for constructing queries in
relational databases. There are 4 types of operations in Relational algebra:
1. Unary operations
2. Set theory operations
3. Binary operations
4. Extended or derived operations
Unary operations:
1. Selection (σ): Retrieves data that satisfies the given condition.
Syntax:
SELECT column1, column2, ...
FROM table
WHERE condition;
2. Projection (π): Retrieves specific columns
Syntax:
SELECT column1, column2, ...
FROM table;
3. Rename(ρ): Renames attributes or relations
Syntax:
SELECT column1 AS new_name1, column2 AS new_name2, ...
FROM table;
Set theory operations:
1. Union:
Syntax:
SELECT column1, column2, ...
FROM table1
UNION
SELECT column1, column2, ...
FROM table2;
2. Intersection
Syntax:
SELECT column1, column2, ...
FROM table1
INTERSECT
SELECT column1, column2, ...
FROM table2;
3. Minus or Set Difference
Syntax:
SELECT column1, column2, ...
FROM table1
EXCEPT/MINUS
SELECT column1, column2, ...
FROM table2;
4. Cross Product - The Cartesian product operation combines every row from
one relation with every row from another relation, resulting in a new
relation. It is denoted by the symbol ×.
Syntax:
SELECT table1.column1, table1.column2,... , table2.column1,
table2.column2, ...
FROM table1
CROSS JOIN table2;
Binary Operations:
1. Join (⨝): The join operation combines rows from two or more relations
based on a join condition. There are 2 types of joins: equi join and non –
equi join. Joins are further divided into inner and outer joins.
a. Self Join - A self join is a specific case of a JOIN operation where a table
is joined with itself. In other words, a self join involves combining rows
from a single table based on a related column within that same table.
Syntax:
SELECT t1.column1, t1.column2, ..., t2.column1, t2.column2, ...
FROM table t1
JOIN table t2 ON t1.related_column = t2.related_column;
b. Inner Join - An INNER JOIN is a type of JOIN operation in relational
databases that returns only the rows where there is a match in both
tables. The result set includes only the rows that satisfy the specified
join condition.
Syntax:
SELECT columns
FROM table1
INNER JOIN table2 ON table1.column = table2.column;
c. Outer Join
i. Left outer join – Contains all the tuples of the left table and the matched
tuples from the right table. Empty data items are filled with NULL.
Syntax:
SELECT columns
FROM table1
LEFT JOIN table2 ON table1.column = table2.column;
ii. Right outer join – Contains all the tuples of the right table and the matched
tuples from the left table. Empty data items are filled with NULL.
Syntax:
SELECT columns
FROM table1
RIGHT JOIN table2 ON table1.column = table2.column;
iii. Full outer join – Contains all the tuples of both tables; empty data
items are filled with NULL.
Syntax:
SELECT columns
FROM table1
FULL JOIN table2 ON table1.column = table2.column;
2. Division(/): The division operation finds the tuples in one relation that
are related to every tuple of another relation. For example, R(A, B) ÷ S(B)
returns the values of A that appear in R together with every value of B in
S (a SQL sketch follows below).
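SQL has no direct division operator; a common way to express division is a
double NOT EXISTS. A sketch, assuming hypothetical tables
ENROLLED(STUDENT_ID, COURSE_ID) and COURSES(COURSE_ID), that finds the
students enrolled in every course:
SELECT DISTINCT E1.STUDENT_ID
FROM ENROLLED E1
WHERE NOT EXISTS (
    -- courses for which this student has no enrolment
    SELECT C.COURSE_ID FROM COURSES C
    WHERE NOT EXISTS (
        SELECT 1 FROM ENROLLED E2
        WHERE E2.STUDENT_ID = E1.STUDENT_ID
          AND E2.COURSE_ID = C.COURSE_ID));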
Extended or Derived operations:
These are the aggregate operations used in querying:
1. COUNT() - counts number of values in a column.
Syntax: SELECT COUNT(COL_NAME) FROM TABLE_NAME;
SELECT COUNT(DISTINCT COL_NAME) FROM TABLE_NAME;
2. SUM() - operates on numerical data. Gives the sum of values in a column.
Syntax: SELECT SUM(COL_NAME) FROM TABLE_NAME;
3. MIN() - Gives minimum of values in a column.
Syntax: SELECT MIN(COL_NAME) FROM TABLE_NAME;
4. MAX() - Gives maximum of values in a column
Syntax: SELECT MAX(COL_NAME) FROM TABLE_NAME;
5. AVG() - Gives average of values in a column
Syntax: SELECT AVG(COL_NAME) FROM TABLE_NAME;
6. GROUP BY – Groups rows based on one or more columns
Syntax: SELECT COL1, AGG_FUNC(COL2) FROM TABLE_NAME GROUP BY COL1;
7. HAVING – applies conditions to the groups formed by GROUP BY
Syntax: SELECT COL1, AGG_FUNC(COL2) FROM TABLE_NAME GROUP BY COL1 HAVING
CONDITION;
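For instance, assuming a hypothetical EMPLOYEES(DEPT, SALARY) table, the
following lists each department's head count and average salary, keeping only
departments whose average salary exceeds 50000:
SELECT DEPT, COUNT(*), AVG(SALARY)
FROM EMPLOYEES
GROUP BY DEPT
HAVING AVG(SALARY) > 50000;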

Datatypes in SQL
Specifies the type of data.
1. Numeric
a. integers/number
b. float
2. Strings
a. char
b. varchar
c. varchar2
3. Date and time
a. date
b. time
c. datetime
Integrity Constraints
Integrity constraints in a relational database are rules or conditions that are
applied to maintain the accuracy, consistency, and reliability of the data. These
constraints ensure that the data in the database remains valid and adheres to
specified rules, preventing the occurrence of errors or inconsistencies. There
are several types of integrity constraints:
1. Entity Integrity/ Primary Key Constraint
Syntax: CREATE TABLE TABLE_NAME(COL_NAME DATATYPE PRIMARY KEY, COL2 DATATYPE,..);
2. Referential Integrity/ Foreign Key Constraint
Syntax: CREATE TABLE orders (…, FOREIGN KEY (COL_NAME) REFERENCES
TABLE_NAME(PRIMARY_ID));
3. Domain Integrity
Syntax: CREATE TABLE employees (
employee_id INT PRIMARY KEY,
salary DECIMAL(10, 2) CHECK (salary > 0));
4. Referential Actions: ON DELETE and ON UPDATE Actions: Define the
actions to be taken when a referenced row in the parent table is deleted
or updated.
Syntax: CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id) ON
DELETE CASCADE);
5. Unique Constraints
Syntax: CREATE TABLE students (
student_id INT PRIMARY KEY,
email VARCHAR(255) UNIQUE);
6. Check Constraints
Syntax: CREATE TABLE employees (
employee_id INT PRIMARY KEY,
salary DECIMAL(10, 2) CHECK (salary > 0));
7. Default Constraints
Syntax: CREATE TABLE example (column1 datatype DEFAULT
default_value, column2 datatype DEFAULT default_value,...);
8. NOT NULL Constraint: Will not allow NULL values.
9. Triggers: Triggers are pieces of procedural code (often in the form of
stored procedures) that are automatically executed (triggered) in
response to specific events, such as INSERT, UPDATE, or DELETE
operations. Triggers can be used to enforce custom business rules and
additional integrity constraints.
Domain Integrity vs Check Constraint: Domain integrity is typically enforced by
defining the data type of a column. Check constraints are specified using the
CHECK clause during table creation or alteration.

Logical database design


Designing a logical database includes the following steps:
1. Defining Entities and Attributes
2. Defining Relationships
3. Normalising the data model
4. Define primary and foreign key
5. Apply Integrity Constraints
6. Refine data model

Views
Views are virtual tables created from one or more existing tables. Changes made
in the underlying tables are reflected in the view and, for updatable views,
vice versa.
Syntax:
CREATE VIEW VIEW_NAME AS
SELECT COL1, COL2,… FROM TABLE_NAME1, TABLE_NAME2
WHERE CONDITION;
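As a sketch, assuming a hypothetical EMPLOYEES(EMP_ID, NAME, SALARY) table:
CREATE VIEW HIGH_EARNERS AS
SELECT EMP_ID, NAME, SALARY FROM EMPLOYEES
WHERE SALARY > 50000;
-- A view is queried like an ordinary table
SELECT * FROM HIGH_EARNERS;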

Anomalies
Anomalies are faults that occur while modifying data, caused by poor storage or
poor design of the database.
Types of Anomalies:
1. Insertion Anomaly – When part of the data is unavailable, a complete row
cannot be inserted. This is called an insertion anomaly.
2. Deletion Anomaly – When one piece of data has to be deleted, unrelated
data is deleted along with it, leading to loss of information. This is
called a deletion anomaly.
3. Updation Anomaly – When the same data is stored in multiple records,
updating one copy leaves different values in different records, causing
inconsistency. This is called an updation anomaly.
To overcome anomalies, normalisation is introduced, achieved through lossless
decomposition. Normalisation also reduces redundancy (duplicates) in the
database.

Nested Queries
Nested queries have an inner query and an outer query; the result of the outer
query depends on the result of the inner query. The logical order in which the
clauses of a query are evaluated is: FROM, WHERE, GROUP BY, HAVING, SELECT.
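A typical nested query, assuming a hypothetical EMPLOYEES(NAME, SALARY) table:
the inner query computes the average salary, and the outer query keeps only the
rows above it:
SELECT NAME, SALARY
FROM EMPLOYEES
WHERE SALARY > (SELECT AVG(SALARY) FROM EMPLOYEES);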

Functional Dependencies
FD: x -> y : y is dependent on x. y is called the dependent and x is called the
determinant. For the same values of x, y should be the same but vice versa need
not be true. Considering 2 tuples t1 and t2, x -> y is said to be a functional
dependency when the following condition holds: if t1.x = t2.x, then t1.y = t2.y.
Types of Functional dependencies:
1. Trivial
2. Non – trivial
3. Multi valued
4. Transitive
Trivial Functional Dependency: If Y is a subset of X, then X -> Y is a trivial
functional dependency. Trivial functional dependencies are always valid.
For example, X -> X always holds, and XY -> X is trivial because X is a subset
of XY.
Non – Trivial Functional Dependency: X -> Y is non-trivial if Y is not a subset
of X. When X intersection Y is the empty set, the dependency is called
completely non-trivial.
Multivalued Functional Dependency: If two or more attributes can be
determined by a single attribute, such functional dependencies are called multi
valued functional dependencies. X -> YZ
Transitive dependency: If A determines B and B determines C, then A can also
determine C, this is called transitive dependency.
Armstrong’s axioms or Inference Rules
Reflexivity: If B is a subset of A, then A -> B (in particular, A -> A)
Transitivity: A -> B and B -> C, then A -> C
Augmentation: If A -> B, then AX -> BX (the same attributes may be appended to
both sides)
Union: If A -> B and A -> C, then A -> BC
Decomposition/splitting: A->BC, then A->B and A -> C
Pseudotransitivity: If X->Y and YZ -> A, then XZ -> A
Composition: If X-> Y and A -> B, then XA -> YB
Attribute closure or closure set: The closure of an attribute (or set of
attributes) A, written A+, is the set of all attributes that can be derived from
A using the given functional dependencies of the table.
Prime Attributes: Every attribute that appears in some candidate key is called a
prime attribute, and the remaining attributes are called non – prime attributes.
For example, If A,B,C are attributes in a relation, {A,B} is the candidate key set,
then A, B are prime attributes and C is a non – prime attribute.
The attribute closure of a super key contains all the attributes of a relation.
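A small worked example (attributes and dependencies chosen for illustration):
let R(A, B, C, D) have the FDs A -> B and B -> C. The closure A+ starts as {A};
A -> B adds B, and B -> C then adds C, giving A+ = {A, B, C}. Since D is not in
A+, A alone is not a super key, but {A, D}+ = {A, B, C, D}, so AD is a candidate
key and A and D are the prime attributes.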

Decomposition
Decomposition in the context of databases refers to the process of breaking
down a relation (table) into smaller, more manageable relations. This is
typically done during the normalization process to achieve a higher level of
data integrity, reduce redundancy, and ensure that the database schema
adheres to certain normal forms. There are two types of decomposition. They
are:
1. Lossless decomposition
2. Dependency – preserving decomposition
Lossless decomposition
Tables are decomposed into simpler tables which, when joined back together, give
the same table that was decomposed. Such decomposition is called lossless
decomposition: no information is lost.
Example:
Consider a relation R(A, B, C) with the functional dependency A → B. A
lossless-join decomposition is:
R1(A, B)
R2(A, C)
These two relations share the attribute A, which is a key of R1, so a natural
join on A reconstructs exactly the original relation R(A, B, C). (Had the
relations shared no common attribute, the join would be a Cartesian product and
the decomposition would be lossy.)
Properties:
1. Attribute preservation - The union of the attributes of the decomposed
relations must include all the attributes of the original relation.
2. A natural join of the decomposed relations results in the original relation.
3. No spurious (extra) tuples are formed during the join.
4. No loss of information.
5. Reconstruction yields the original tuples.
Dependency-Preserving Decomposition:
Dependency-preserving decomposition ensures that the functional
dependencies that held in the original relation also hold in the decomposed
relations. This is crucial for maintaining the semantics of the data and the
relationships between attributes.
Example:
Consider a relation R(A, B, C, D) with a set of functional dependencies {A → B, C
→ D}. A dependency-preserving decomposition might result in two relations:
R1(A, B)
R2(C, D)
In this case, both R1 and R2 must inherit the functional dependencies from the
original relation. Therefore, R1 must have the dependency A → B, and R2 must
have the dependency C → D.
Properties:
1. Functional dependencies are preserved
2. Reconstruction does not affect dependencies
3. Additional (spurious) dependencies are avoided
4. A natural join results in the same functional dependencies.
Problems with decomposition:
1. Loss of information
2. Redundancy
3. Complex queries
4. Functional dependencies may be violated
5. Difficulty in maintaining consistency
6. Join overhead – joining decomposed tables may impact the performance
of queries.
7. Complex Schema structure
8. Complexity in working with DBs
9. Aggregation complexity

Normalisation
Normalization is a database design technique used to organize data in a
relational database efficiently.
Need of Normalisation:
1. Elimination of data redundancy
2. Minimisation of Anomalies
3. Simplifying queries
4. Optimising storage
5. Maintenance and update flexibility
All of the above address problems caused by redundancy.
There are 5 types of normal forms:
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce – Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
First Normal Form (1NF)
1. Convert all multivalued attributes to atomic attributes. If there is a tuple
in which there are 2 mobile numbers in the same cell, then it is written as
2 tuples.
2. A column should have values of same type.
3. Every attribute is named unique.
4. There is no ordering for rows and columns.
5. No duplicate rows or tuples.
6. Every relation must have a primary key.
If all the above rules are obeyed, it is said to be in the First Normal Form (1NF).
An ER model converted to Relational model, is by default in 1NF.

Second Normal Form (2NF):


1. To convert into 2NF, the relation must be in 1NF.
2. There is no partial dependency in the relation.
Partial dependency: A non – prime attribute depends on a proper subset of a
candidate key. If XY -> Z and either X -> Z or Y -> Z also holds, then the
dependency is said to be partial.
Third Normal Form (3NF):
1. It must be in 2NF
2. No transitive dependencies for non – prime attributes, i.e., for every FD
X -> Y, either X is a super key or Y is a prime attribute.

Boyce Codd Normal Form (BCNF):


1. Relation should be in 3NF
2. For all FDs, X-> Y, X must be either a super key or a candidate key.
Fourth Normal Form (4NF):
A relation R is in 4NF if, for every multivalued dependency X →→ Y, either
1. the dependency is trivial (Y is a subset of X, or X ∪ Y = R), or
2. X is a super key.
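A small illustration of the normal forms (relation and FDs assumed for the
example): consider STUDENT(RollNo, Name, Dept, DeptHead) with RollNo -> Name,
RollNo -> Dept and Dept -> DeptHead. RollNo is the only candidate key, so
DeptHead depends transitively on RollNo through Dept and the relation is not in
3NF. Decomposing into STUDENT(RollNo, Name, Dept) and DEPARTMENT(Dept, DeptHead)
removes the transitive dependency; in each resulting relation the left side of
every FD is a key, so both are in 3NF (and in fact BCNF).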

PL/SQL Statements
PL/SQL is a procedural extension of SQL for the Oracle relational database. It is
used in Oracle Database, Oracle TimesTen In-Memory Database and IBM DB2. It offers:
1. Extensive error checking
2. Numerous datatypes
3. Variety of programming structures
It supports structured programming through functions and procedures
(procedure-oriented programming) as well as OOP, and is used in developing web
pages and server pages.
Block structure:
BLOCK HEADER: Defines type of block (procedure/function)
DECLARE
DECLARATION SECTION: Declares variables
BEGIN
EXECUTION SECTION: Actual program
EXCEPTION
EXCEPTION SECTION: Handles runtime errors
END;
There are 2 types of blocks in PL/SQL: Anonymous block and named block
Both types have declaration, execution and exception sections.
The execution section must start with BEGIN and end with END; it is the only
mandatory section of a PL/SQL block. PL/SQL supports all DML commands, and
supports DDL commands through native dynamic SQL (NDS) or the DBMS_SQL
built-in package.
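A minimal anonymous block showing the skeleton in action (the variable and
message are arbitrary):
DECLARE
    v_msg VARCHAR2(30) := 'Hello from PL/SQL';
BEGIN
    dbms_output.put_line(v_msg);
EXCEPTION
    WHEN OTHERS THEN
        dbms_output.put_line('Something went wrong');
END;
/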
Features:
1. PL/SQL is case insensitive.
2. Uses datatypes same as in SQL.
3. Comments:
/*………*/ - multiline
-- - single line
4. Assignment - :=
5. End of line - ;
6. dbms_output.put_line() – prints the data inside the parentheses.
7. / after END; represents end of block.
Variables: Variables in PL/SQL must have a valid name, a datatype and a valid
size, and must be declared prior to use.
SELECT INTO: SELECT salary INTO e_sal FROM Employee
WHERE id = 5098;
Loads value of salary where id is 5098 into e_sal from Employee table
Anchored datatype (%type): Anchors a variable's datatype to that of another
object, so the datatype does not have to be stated independently.
Syntax: variable_name typed_attribute%type;
typed_attribute can be an already declared PL/SQL variable or a column of a
table. %type is a direct reference to the object, so the variable automatically
keeps the same datatype as the column or variable it is anchored to, removing
the need to keep the two declarations in sync manually.
Declaring constants: Constants shall be declared in the declaration section.
Syntax: variable CONSTANT datatype (datawidth/size) := value;
Ex: pi CONSTANT NUMBER(7,6) := 3.141593;
In NUMBER(7,6), 7 is the precision (total number of digits) and 6 is the scale
(digits after the decimal point).
Instead of the assignment operator, the keyword DEFAULT can also be used.
Bind variable: Variable declared in SQL, used in PL/SQL.
Syntax: variable var_name datatype(datawidth);
Initialisation: exec :var_name := value; (or)
BEGIN
:var_name := value;
END;
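Putting these pieces together, a sketch assuming a hypothetical
Employee(id, salary) table:
DECLARE
    e_sal Employee.salary%type;           -- anchored to the column's datatype
    bonus CONSTANT NUMBER(7,2) := 500.00;
BEGIN
    SELECT salary INTO e_sal FROM Employee WHERE id = 5098;
    dbms_output.put_line('Salary with bonus: ' || (e_sal + bonus));
END;
/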
CONTROL STATEMENTS
2 types:
1. IF statements
2. CASE statements
IF statements:
1. IF THEN
Syntax:
IF condition THEN
Statements;
END IF;
2. IF THEN ELSE
Syntax:
IF condition THEN
Statements;
ELSE
Statements;
END IF;
3. IF THEN ELSIF
Syntax:
IF condition1 THEN
Statements;
ELSIF condition2 THEN
Statements;
…………..
ELSIF condition_n THEN
Statements;
ELSE
Statements;
END IF;
CASE statements:
1. Simple case
Syntax:
CASE expression
WHEN value1 THEN result1
WHEN value2 THEN result2
ELSE result_else
END
2. Searched case
Syntax:
CASE WHEN condition1 THEN result1
WHEN condition2 THEN result2
WHEN condition3 THEN result3
...
ELSE resultN
END
LOOPS
4 types:
1. Simple loop
Syntax:
LOOP
Statements;
IF condition THEN
EXIT;
END IF;
-- or equivalently: EXIT WHEN condition;
END LOOP;
2. While loop
Syntax:
WHILE condition LOOP
Statements;
END LOOP;
3. Numeric for loop
Syntax:
FOR counter_var IN [REVERSE] lower_limit.. upper_limit LOOP
Statements;
END LOOP;
4. Cursor for loop
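The cursor FOR loop iterates over the rows returned by a query, opening,
fetching from and closing the cursor implicitly. A sketch (table and column
names are placeholders):
Syntax:
FOR record_var IN (SELECT col1, col2 FROM table_name) LOOP
Statements;   -- fields are accessed as record_var.col1, record_var.col2
END LOOP;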

Triggers
Triggers are programs or procedures that are automatically executed when
some event occurs. Triggers are written for:
1. Gain Strong Control over Security
2. Enforcing Referential Integrity
3. Event logging and storing information on table access
4. Auditing
5. Synchronous replication of tables
6. Preventing invalid transactions
Syntax:
CREATE [OR REPLACE] TRIGGER TRIGGER_NAME
BEFORE/AFTER
INSERT/UPDATE/DELETE
ON TABLE_NAME
FOR EACH ROW
DECLARE
Declaration statements
BEGIN
Executable statements
END;
BEFORE/AFTER – specifies whether the trigger fires before or after the operation.
INSERT/UPDATE/DELETE – specifies the operation on which the trigger fires.
FOR EACH ROW – makes it a row-level trigger that fires once for every affected
row; without it, the trigger is statement-level and fires only once per
statement.
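As a concrete sketch, a row-level trigger that logs every salary update on a
hypothetical Employee table into an assumed Salary_Audit table:
CREATE OR REPLACE TRIGGER trg_salary_audit
AFTER UPDATE ON Employee
FOR EACH ROW
BEGIN
    -- :OLD and :NEW hold the row's values before and after the update
    INSERT INTO Salary_Audit (emp_id, old_salary, new_salary, changed_on)
    VALUES (:OLD.id, :OLD.salary, :NEW.salary, SYSDATE);
END;
/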

TRANSACTIONS
Transaction: Transaction is a set of operations used to perform a logical unit of
work. It generally represents the changes in the database. Operations in
transactions:
1. Read
2. Write
Consider a transaction where A has 1000 in the account and B has 2000. Now, A
wants to transfer 500 from his account to B’s. After transfer, account balances
should be 500 and 2500 respectively. The set of operations would be as follows:
Reducing amount from A:
1. Read (A) : A = 1000
2. A = A – 500 : A = 500
3. Write (A) : A = 500
Transferring to B i.e., adding amount from A to B:
4. Read (B) : B = 2000
5. B = B + 500 : B = 2500
6. Write (B) : B = 2500
All the above operations happened in RAM. To save these changes permanently:
7. COMMIT;
Finally, transfer of amount of 500 from A to B is successful.
PROPERTIES OF A TRANSACTION
A transaction must follow 4 properties, simply called ACID properties.
1. Atomicity
2. Consistency
3. Isolation
4. Durability
Atomicity: Either all operations of a transaction are executed or none of them
is; the whole transaction is treated as a single unit. If any instruction fails
to execute, the ROLLBACK command is automatically executed to undo the changes.
The transaction is successful only when its changes are committed in the
database.
Consistency: In case of amount transfers, the sum of amounts before and after
the transactions should be the same. Like, in the above case, before and after
transaction, sum of amounts of A and B is 3000.
Isolation: Multiple transactions occurring without interfering with each other.
Durability: A transaction made is permanent after committed.
STATES OF A TRANSACTION

CPU is much faster than the hard disk. Accessing data from the hard disk may be
a slow process, which makes the CPU and the hard disk mismatched in speed.
Hence, RAM operates between the CPU and the hard disk.
Active state: When a transaction begins, it is said to be in active state.
Partially committed state: When all the instructions of a transaction executed
successfully but are not committed yet, the state is said to be partially
committed. Changes are made in RAM only but not in the database yet.
Committed state: The changes are made in the database.
Failed state: If there is any error in the transaction, it is said to be in failed state.
Abort state: In this state, a transaction is killed or restarted.
Termination state: Resources are deallocated.
When a transaction is started, it will be in active state and resources are allocated
to it. If all its instructions execute without error, the transaction is said to
be in the partially committed state, and if there is an error either at the
beginning or in the middle of the transaction, it goes into the failed state. If
the partially
committed state is successful, then it is committed or if there is an error in
committing, it goes to failed state. If the transaction is in failed state, it is aborted,
i.e., killed or restarted. In both the cases, the transaction is terminated, i.e., the
resources are deallocated.

SCHEDULE
Schedule: The chronological order of execution of multiple transactions is called
a schedule. There are two types of schedules. They are:
1. Serial schedule
2. Parallel Schedule or Non – serial schedule
a. Serializable
i. Conflict Serializable
ii. View Serializable
b. Non – Serializable
i. Recoverable
1. Cascading
2. Cascadeless
3. Strict
ii. Non – Recoverable

Serial schedule: After one transaction is completed, the other is started. The
advantage of serial schedules is that there will be no data inconsistency and is
more secure, but the waiting period of other transactions will be higher.
Parallel schedule: Multiple transactions are scheduled simultaneously. The
advantage is that there is higher throughput but there is a risk of data
inconsistency. Throughput is the number of transactions executed per unit time.
Non – Recoverable Schedule: Schedules that cannot be recovered from the failed
state are called non – recoverable schedules. For example, if one transaction
reads a value written by another and commits, but the writing transaction then
fails and rolls back, the committed read cannot be undone, so the schedule is
non – recoverable.
Recoverable Schedule: The schedules that can be recovered from failed state
are called recoverable schedules.
Cascading Schedule:
When a failure in one transaction leads to the rolling back or aborting of other
dependent transactions, the schedule is said to exhibit cascading rollback or
cascading abort.
Cascadeless Schedule:
Schedules in which transactions read values only after the transactions that
wrote them have committed are called cascadeless schedules, also known as
"avoids cascading aborts/rollbacks" (ACA). This prevents a single transaction
abort from causing a series of transaction rollbacks.
Strict Schedule:
A schedule is strict if for any two transactions Ti, Tj, if a write operation of Ti
precedes a conflicting operation of Tj (either read or write), then the commit or
abort event of Ti also precedes that conflicting operation of Tj.
In other words, Tj can read or write updated or written value of Ti only after Ti
commits/aborts.
Conflict Serializable:
Two or more operations from different transactions that cannot be reordered
without changing the result are said to be in conflict. Considering 2
transactions, there are 4 possible combinations of actions on the same data
item:
1. Read – Read
2. Read – Write
3. Write – Read
4. Write – Write
In the 1st case there is no conflict: since both transactions are only reading,
both will get the same data.
Read – Write Problem or Unrepeatable Read Problem:
When one transaction reads a value and another transaction then writes (updates)
it, the first transaction sees a different value when it reads again. The same
transaction reading two different values is an unrepeatable read; the
transaction goes to the failed state and is aborted/rolled back.
Write – Read Problem or Dirty Read Problem:
If one transaction writes a value and another transaction reads it before the
first commits, and the later instructions of the first transaction then fail,
the value read by the second transaction is invalid and the data becomes
inconsistent.
If the same operations happen on two different data items, they are non – conflicting.
Definition: A schedule is called conflict serializable if it can be transformed into
a serial schedule by swapping non-conflicting operations. Two operations are
said to be conflicting if all of the following conditions hold:
1. They belong to different transactions
2. They operate on the same data item
3. At Least one of them is a write operation
To make a schedule serializable, non – conflicting operations are swapped in
position while each remains within its own transaction. For example, if R(A) of
T1 is the 7th instruction and W(B) of T2 is the 8th instruction, then R(A) can
be moved to the 8th position and W(B) to the 7th, since the two operations do
not conflict.
View Serializable:
A schedule is called view serializable if it is view-equal to a serial schedule
(one with no overlapping transactions). Every conflict-serializable schedule is
view serializable, but a view-serializable schedule containing blind writes may
not be conflict serializable.
Problems with concurrency:
1. Dirty read: While one transaction is still in progress, another reads its
uncommitted data; if the first transaction fails, the data read by the
other transaction is invalid and becomes inconsistent.
2. Incorrect summary: An aggregate computed while another transaction is
updating the data can be inconsistent.
3. Lost update: If a value is updated in one transaction, the update is not
reflected in (and may be overwritten by) the other transaction.
4. Unrepeatable read: Within the same transaction, reads before and after
another transaction's update return different values, which is
inconsistent.
5. Phantom read: Data deleted in one transaction, when read in the other
transaction, gives an error.

CONCURRENCY CONTROL PROTOCOLS


Concurrency control protocols, as the name suggests, control the flow of
transactions that occur concurrently. There are four types of concurrency control
protocols:
1. Lock based protocols
2. 2 – phase locking protocol
3. Timestamp – based protocols
4. Validation – based protocols
LOCK BASED PROTOCOLS
A lock is a data variable associated with a data item. The lock signifies which
operations can be performed on the data item. Locks help synchronise access to
database items by concurrent transactions.
Binary Locks: A binary lock on a data item can be in either the locked or the unlocked state.
Shared lock (S): A transaction using a lock in shared mode is allowed only to read
the data.
Exclusive lock (X): A transaction using a lock in exclusive mode is allowed to read
or write the data.
Locks are requested by the transactions; if granted, the lock is applied and the
transaction may proceed. Whether a lock can be granted is decided by the lock
compatibility table: two transactions reading at the same time still see
consistent data, so a shared lock can be granted to multiple transactions,
whereas any combination involving an exclusive lock can lead to inconsistency,
so those two locks cannot be granted together.
Limitations of Lock – based protocol:
1. The schedule is not guaranteed to be serializable.
2. May not be free from irrecoverability, deadlocks and starvation.
2 – PHASE LOCKING PROTOCOL
As the name says, there are 2 phases in this.
1. Growing phase
2. Shrinking phase
The growing phase is where locks are only acquired and not released; the
shrinking phase is where locks are only released and no new locks can be
acquired. A transaction first acquires all the locks it needs; after a certain
point, it only releases them. Shared and exclusive locks are used here as well,
the difference being that conflict serializability is guaranteed under the
2 – phase locking protocol. The point where the growing phase ends and the
shrinking phase begins is called the lock point.
Limitations of 2 – phase locking protocol: May not be free from irrecoverability,
deadlock, starvation and cascading rollback.
Strict 2PL: A 2PL system, where exclusive locks are held until commit or abort.
Rigorous 2PL: A 2PL system, where all shared and exclusive locks are held until
commit or abort.
It prevents cascading roll back and irrecoverability.
Deadlock problem still exists.
TIMESTAMP – BASED PROTOCOL
Every transaction is assigned a unique value (its timestamp) that tells the
order of its entry. A lower value indicates that the transaction entered the
system earlier than a transaction with a higher value.
Read Timestamp RTS(A): The timestamp of the latest transaction that successfully
read the data item A.
Write Timestamp WTS(A): The timestamp of the latest transaction that successfully
wrote the data item A.
Rules of timestamp-based protocol:
1. When read operation is triggered:
If WTS(A) > TS(Ti), rollback Ti.
Else perform read (A) and set RTS(A) = max(RTS(A), TS(Ti)).
2. When write operation is triggered:
If RTS(A) > TS(Ti), rollback Ti.
If WTS(A) > TS(Ti), rollback Ti.
Else perform write(A) and set WTS(A) = TS(Ti).
VALIDATION – BASED PROTOCOL
The validation-based protocol is also known as the optimistic concurrency
control technique. Under it, a transaction is executed in the following three
phases:
1. Read phase
2. Validation phase
3. Write phase
Validation phase: In this phase, the temporary variable value will be validated
against the actual data to see if it violates the serializability.
Here each phase has the following different timestamps:
Start (Ti): It contains the time when Ti started its execution.
Validation (Ti): It contains the time when Ti finishes its read phase and starts its
validation phase.
Finish (Ti): It contains the time when Ti finishes its write phase.
Serializability is determined during the validation phase and cannot be decided
in advance. Because transactions execute without locks until validation, the
protocol allows a greater degree of concurrency and causes fewer conflicts.
