UNIT-I
Introduction to DBMS (Database Management System)
A Database Management System (DBMS) is a software system designed to
efficiently manage, store, retrieve, and manipulate data in databases. It
provides an interface between the user and the database, ensuring data
integrity, security, and consistency. DBMS allows for the creation,
maintenance, and administration of databases, and it simplifies complex data
management tasks.
Key features of a DBMS include:
• Data Redundancy Control: Avoids duplication of data.
• Data Integrity: Ensures accuracy and consistency.
• Data Security: Protects sensitive data through access controls.
• Concurrent Access: Manages multiple users accessing data
simultaneously.
Data Modeling for a Database
Data modeling is the process of defining and organizing the structure of a
database, focusing on how data elements relate to each other. This conceptual
framework helps in visualizing the data flow and supports the design of a well-
structured database.
The primary goal of data modeling is to create a blueprint for the database that
clearly defines:
• Entities: Objects or concepts that represent real-world things (e.g.,
Employee, Customer).
• Attributes: Properties that describe entities (e.g., Employee Name,
Customer Address).
• Relationships: How entities are related to each other (e.g., An Employee
works for a Department).
Three-Level Architecture of DBMS
The three-level architecture of DBMS defines the way data is viewed and
manipulated in a database. It comprises three levels:
1. Internal Level (Physical Level):
o Describes how data is physically stored in the system (e.g., storage
devices, indexing).
o It focuses on the optimization of data storage and retrieval.
2. Conceptual Level (Logical Level):
o Describes what data is stored and the relationships between those
data.
o It hides the complexities of the internal level from users and
provides a higher level of abstraction.
3. External Level (View Level):
o Describes how users interact with the data (e.g., through user
interfaces or specific queries).
o Users only see what is relevant to them without needing to know
the details of the internal or conceptual levels.
This architecture ensures data independence, meaning that changes at one
level (e.g., physical storage) do not affect other levels (e.g., user access).
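As an illustration of the external level, a database view can expose only the columns a particular group of users needs, while the conceptual and internal levels stay hidden. The sketch below assumes a hypothetical Employee table with EmployeeID, Name, and Salary columns (these names are illustrative, not from the text):

    -- External (view) level: show only what a payroll clerk needs,
    -- hiding the rest of the Employee table (conceptual level) and
    -- all storage details (internal level).
    CREATE VIEW PayrollView AS
    SELECT EmployeeID, Name, Salary
    FROM Employee;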
Components of a DBMS
A DBMS consists of several components that work together to provide efficient
data management:
1. Database Engine: The core service for accessing and processing the
database.
2. Database Schema: The logical structure that defines the organization of
data.
3. Query Processor: Translates user queries into executable commands.
4. Transaction Manager: Ensures database consistency by handling
transactions (group of operations).
5. Security Manager: Manages authentication, access control, and
encryption.
6. Data Dictionary: Stores metadata about the database, such as schema,
constraints, etc.
7. Backup and Recovery: Ensures the database can be restored to a
previous state in case of failure.
Introduction to Data Models
A data model is a conceptual framework that defines the structure,
organization, and relationships of data within a database. It provides a way to
describe the data and its relationships in a formalized manner. There are
several types of data models used in DBMS:
Hierarchical Model
The Hierarchical Model organizes data in a tree-like structure, where each
record has a single parent and potentially many children. It is simple and fast
for operations that follow the tree structure but is not flexible for representing
complex relationships.
Example:
• Parent → Child
• Department → Employee
Network Model
The Network Model extends the hierarchical model by allowing multiple
relationships between records, thus enabling more complex data relationships.
It uses a graph structure with records connected by pointers, which allows a
record to have multiple parents.
Example:
• Department ↔ Employee (with multiple connections)
Relational Model
The Relational Model organizes data into tables (relations), where each table
contains rows (records) and columns (attributes). This model uses keys
(primary and foreign) to establish relationships between tables. It is the most
popular model due to its simplicity and flexibility.
Example:
• Employee (ID, Name, DepartmentID)
• Department (DepartmentID, DepartmentName)
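In SQL terms, these two relations could be declared as follows. This is a minimal sketch; the data types and constraint layout are assumptions for illustration:

    -- Department must exist before Employee can reference it.
    CREATE TABLE Department (
        DepartmentID   INT PRIMARY KEY,
        DepartmentName VARCHAR(100)
    );

    CREATE TABLE Employee (
        ID           INT PRIMARY KEY,
        Name         VARCHAR(100),
        DepartmentID INT,
        -- The foreign key establishes the relationship between the tables.
        FOREIGN KEY (DepartmentID) REFERENCES Department(DepartmentID)
    );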
Comparison of Network, Hierarchical, and Relational Models
Feature              Hierarchical Model    Network Model                     Relational Model
Structure            Tree-like             Graph-like                        Table-based
Data Relationships   One-to-many           Many-to-many                      Many-to-many, flexible
Flexibility          Limited               More flexible than hierarchical   Highly flexible
Ease of Use          Complex to manage     Complex to manage                 Easy to use, intuitive
Data Redundancy      High                  Moderate                          Low
Entity Relationship Model (ERM)
The Entity Relationship Model (ERM) is a high-level conceptual data model
that describes the entities within a domain and the relationships between
them. It uses:
• Entities: Real-world objects or concepts (e.g., Employee).
• Attributes: Properties of entities (e.g., Name, Age).
• Relationships: Describes how entities are related (e.g., An Employee
works in a Department).
The ER model is often represented through an ER diagram that visually depicts
entities, their attributes, and relationships.
UNIT-II
Relational Database
A relational database is a type of database that stores data in tables (also
called relations). Each table consists of rows (records) and columns
(attributes). Tables in a relational database are linked using keys, and
relationships between different tables are established through foreign keys.
This model is based on the relational model of data, introduced by Edgar
Codd.
Key characteristics of relational databases:
• Data Integrity: Ensures data consistency and accuracy.
• Normalization: Reduces redundancy by organizing data into multiple
related tables.
• Flexibility: Supports complex queries and dynamic updates.
Common examples of relational database management systems (RDBMS)
include MySQL, Oracle, Microsoft SQL Server, and PostgreSQL.
Relational Algebra and Calculus
Both relational algebra and relational calculus are formal languages used to
query and manipulate relational databases. They are foundational to SQL
operations and represent mathematical approaches to database queries.
1. Relational Algebra:
o Relational algebra is a procedural query language that uses
operators to perform operations on tables (relations).
o Key operators in relational algebra:
▪ Select (σ): Filters rows based on a condition (similar to the
SQL WHERE clause).
▪ Project (π): Selects specific columns from a table (similar to
the SQL SELECT clause).
▪ Union (∪): Combines the results of two queries with the
same attributes.
▪ Difference (−): Retrieves rows from one table that are not in
another.
▪ Cartesian Product (×): Combines two tables by creating all
possible pairs of rows.
▪ Join (⨝): Combines related rows from two tables based on
a common attribute.
2. Relational Calculus:
o Relational calculus is a declarative query language, focusing on
what to retrieve rather than how to retrieve it.
o There are two types:
▪ Tuple Relational Calculus (TRC): Uses variables that
represent tuples (rows).
▪ Domain Relational Calculus (DRC): Uses variables that
represent individual fields (columns).
o Queries are written in the form of logical expressions rather than
operations.
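To make the notation concrete, here is one query written in relational algebra, in both forms of relational calculus, and as the equivalent SQL. It assumes a hypothetical relation Employee(ID, Name, DepartmentID); the names and the value 10 are illustrative only:

    Relational algebra:    π Name ( σ DepartmentID = 10 (Employee) )
    Tuple calculus (TRC):  { t.Name | Employee(t) ∧ t.DepartmentID = 10 }
    Domain calculus (DRC): { <n> | ∃ i ∃ d ( Employee(i, n, d) ∧ d = 10 ) }
    Equivalent SQL:        SELECT Name FROM Employee WHERE DepartmentID = 10;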
SQL Fundamentals
SQL (Structured Query Language) is the standard language used for managing
and manipulating relational databases. SQL is used to create, retrieve, update,
and delete data in a database. It is divided into several sublanguages:
1. DDL (Data Definition Language): Defines and manages the structure of
database objects.
o CREATE: Used to create database objects (tables, indexes, etc.).
o ALTER: Modifies the structure of an existing database object.
o DROP: Deletes an existing database object.
o TRUNCATE: Removes all rows from a table but does not remove
the table itself.
2. DML (Data Manipulation Language): Deals with data manipulation and
retrieval.
o SELECT: Retrieves data from a table.
o INSERT: Adds new records to a table.
o UPDATE: Modifies existing records.
o DELETE: Removes records from a table.
3. DCL (Data Control Language): Manages permissions and access control.
o GRANT: Grants specific privileges to users.
o REVOKE: Removes specific privileges from users.
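A short sketch combining the three sublanguages; the table name, column names, and the user account report_user are assumptions made for illustration:

    -- DDL: define the structure.
    CREATE TABLE Student (
        Student_ID INT PRIMARY KEY,
        Name       VARCHAR(50)
    );

    -- DML: manipulate and retrieve the data.
    INSERT INTO Student (Student_ID, Name) VALUES (1, 'Asha');
    UPDATE Student SET Name = 'Asha R' WHERE Student_ID = 1;
    SELECT * FROM Student;
    DELETE FROM Student WHERE Student_ID = 1;

    -- DCL: control access (assumes a user account named report_user exists).
    GRANT SELECT ON Student TO report_user;
    REVOKE SELECT ON Student FROM report_user;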
PL/SQL Concepts
PL/SQL (Procedural Language/SQL) is Oracle's extension to SQL that allows for
procedural programming. PL/SQL enables users to write complex logic such as
loops, conditionals, and exception handling, which is not possible in standard
SQL.
Key concepts of PL/SQL:
• Blocks: PL/SQL code is written in anonymous or named blocks. A block
consists of declarations, executable commands, and exception handlers.
• Variables: PL/SQL allows for the use of variables to store temporary
data.
• Control Structures: PL/SQL supports loops, conditional statements (IF-
THEN-ELSE), and other control mechanisms.
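A minimal anonymous PL/SQL block showing the three parts of a block (declarations, executable commands, exception handler); the variable name and values are illustrative:

    DECLARE
        v_salary NUMBER := 45000;          -- variable declaration
    BEGIN
        -- control structure: IF-THEN-ELSE
        IF v_salary > 40000 THEN
            DBMS_OUTPUT.PUT_LINE('High salary');
        ELSE
            DBMS_OUTPUT.PUT_LINE('Standard salary');
        END IF;
    EXCEPTION
        WHEN OTHERS THEN
            DBMS_OUTPUT.PUT_LINE('Unexpected error: ' || SQLERRM);
    END;
    /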
Cursors
A cursor is a pointer to a result set of a query. It allows you to process
individual rows of a query result one at a time. Cursors are essential when
dealing with queries that return multiple rows and need to be processed
sequentially.
There are two types of cursors:
1. Implicit Cursor: Automatically created by the DBMS when a query is
executed (e.g., a SELECT INTO statement).
2. Explicit Cursor: Defined by the programmer to handle more complex
operations, giving control over the result set.
Cursors support the following operations:
• OPEN: Executes the query and allocates memory for the cursor.
• FETCH: Retrieves the next row from the cursor's result set.
• CLOSE: Releases the resources allocated to the cursor.
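An explicit cursor used with OPEN, FETCH, and CLOSE; this sketch assumes an Employee table with Name and DepartmentID columns:

    DECLARE
        CURSOR emp_cur IS
            SELECT Name FROM Employee WHERE DepartmentID = 10;
        v_name Employee.Name%TYPE;
    BEGIN
        OPEN emp_cur;                       -- execute the query
        LOOP
            FETCH emp_cur INTO v_name;      -- retrieve the next row
            EXIT WHEN emp_cur%NOTFOUND;
            DBMS_OUTPUT.PUT_LINE(v_name);
        END LOOP;
        CLOSE emp_cur;                      -- release the cursor's resources
    END;
    /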
Stored Procedures
A stored procedure is a precompiled collection of SQL statements and logic
that is stored in the database. Stored procedures can be invoked by users or
applications to perform specific tasks, such as inserting data, updating records,
or handling business logic.
Benefits of stored procedures:
• Performance: Since they are precompiled, stored procedures execute
faster than individual SQL queries.
• Security: Stored procedures can limit direct access to underlying tables
and data.
• Reusability: Stored procedures can be reused across applications and
users.
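A small stored procedure in PL/SQL that inserts a row; the table, column, and parameter names are assumptions for illustration:

    CREATE OR REPLACE PROCEDURE add_employee (
        p_id   IN NUMBER,
        p_name IN VARCHAR2
    ) AS
    BEGIN
        INSERT INTO Employee (ID, Name) VALUES (p_id, p_name);
        COMMIT;
    END;
    /

    -- Invoked by a user or application, e.g. from SQL*Plus:
    -- EXEC add_employee(101, 'Ravi');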
Stored Functions
A stored function is similar to a stored procedure, but it is specifically designed
to return a single value. It is often used for calculations or querying values.
Key differences between stored procedures and stored functions:
• A stored procedure does not return a value (though it can use output
parameters).
• A stored function always returns a value (typically used in expressions,
SELECT queries, etc.).
Stored functions are commonly used to perform operations such as
aggregations or complex calculations, returning the result directly to the calling
program.
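A stored function that returns a single value, sketched under the assumption of an Employee table with ID and Salary columns:

    CREATE OR REPLACE FUNCTION get_salary (p_id IN NUMBER)
    RETURN NUMBER AS
        v_salary NUMBER;
    BEGIN
        SELECT Salary INTO v_salary FROM Employee WHERE ID = p_id;
        RETURN v_salary;              -- a function always returns a value
    END;
    /

    -- Because it returns a value, it can be used directly in a query:
    -- SELECT Name, get_salary(ID) FROM Employee;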
Database Triggers
A trigger is a set of SQL statements that are automatically executed or "fired"
when certain events occur in the database. Triggers are associated with a
particular table and can be set to activate before or after INSERT, UPDATE, or
DELETE operations.
Types of triggers:
1. BEFORE Trigger: Fires before the operation (e.g., before inserting a new
record).
2. AFTER Trigger: Fires after the operation (e.g., after deleting a record).
3. INSTEAD OF Trigger: Replaces the operation with a new action
(commonly used for views).
Triggers are useful for enforcing business rules, auditing database changes, and
maintaining data consistency automatically.
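A BEFORE INSERT trigger that enforces a simple business rule; the table name, column name, and error message are illustrative assumptions:

    CREATE OR REPLACE TRIGGER check_salary
    BEFORE INSERT ON Employee
    FOR EACH ROW
    BEGIN
        -- Reject a row that violates the business rule before it is stored.
        IF :NEW.Salary < 0 THEN
            RAISE_APPLICATION_ERROR(-20001, 'Salary cannot be negative');
        END IF;
    END;
    /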
UNIT-III
Introduction to Normalization
Normalization is the process of organizing data in a relational database to
reduce redundancy and improve data integrity. The goal of normalization is to
ensure that data is stored in such a way that it can be easily updated,
retrieved, and maintained without anomalies. This process involves dividing
large tables into smaller, more manageable ones and defining relationships
between them. By doing so, normalization eliminates unwanted dependencies
and ensures data consistency.
Normalization typically involves dividing a database into multiple tables based
on certain rules, which are defined as normal forms.
First Normal Form (1NF)
A table is in First Normal Form (1NF) if:
• All columns contain atomic (indivisible) values.
• Each column contains only one value per row (i.e., no repeating groups
or arrays).
• The order in which data is stored does not matter.
Example:
Student_ID   Courses
1            Math, Science
2            English, History
This table is not in 1NF because the "Courses" column contains multiple values.
To convert it to 1NF, each course should be stored in a separate row:
Student_ID Course
1 Math
1 Science
2 English
2 History
Second Normal Form (2NF)
A table is in Second Normal Form (2NF) if:
• It is in 1NF.
• There is no partial dependency, i.e., all non-key attributes are fully
functionally dependent on the entire primary key (for tables with
composite keys).
In simple terms, 2NF eliminates partial dependencies, where a non-key column
depends on only part of a composite primary key.
Example: Consider the table:
Student_ID Course Instructor Instructor_Phone
1 Math Mr. Smith 1234567890
1 Science Mrs. Lee 0987654321
2 English Mr. Brown 1122334455
Here, Instructor_Phone depends only on Instructor, not the entire primary key
(Student_ID and Course). To convert it to 2NF, you split the table:
Students Table:
Student_ID Course Instructor
1 Math Mr. Smith
1 Science Mrs. Lee
2 English Mr. Brown
Instructors Table:
Instructor Instructor_Phone
Mr. Smith 1234567890
Mrs. Lee 0987654321
Mr. Brown 1122334455
Third Normal Form (3NF)
A table is in Third Normal Form (3NF) if:
• It is in 2NF.
• There is no transitive dependency, i.e., non-key attributes do not
depend on other non-key attributes.
In other words, each non-key attribute should depend only on the primary key
and not on any other non-key attribute.
Example: Consider the table:
Employee_ID Department Department_Manager Manager_Phone
1 HR Mr. Green 2345678901
2 IT Mrs. Brown 3456789012
Here, Manager_Phone depends on Department_Manager, not directly on the
primary key (Employee_ID). To convert this to 3NF, we remove the transitive
dependency:
Employees Table:
Employee_ID Department Department_Manager
1 HR Mr. Green
2 IT Mrs. Brown
Managers Table:
Department_Manager Manager_Phone
Mr. Green 2345678901
Mrs. Brown 3456789012
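The 3NF decomposition above can be expressed in SQL as two tables linked by a foreign key. This is a sketch; the data types are assumed:

    CREATE TABLE Managers (
        Department_Manager VARCHAR(50) PRIMARY KEY,
        Manager_Phone      VARCHAR(15)
    );

    CREATE TABLE Employees (
        Employee_ID        INT PRIMARY KEY,
        Department         VARCHAR(50),
        Department_Manager VARCHAR(50),
        -- Manager_Phone is no longer stored here; it is reached via this key.
        FOREIGN KEY (Department_Manager)
            REFERENCES Managers(Department_Manager)
    );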
Dependency Preservation
Dependency Preservation refers to the property of a database schema where
all functional dependencies can still be enforced in the decomposed tables
without requiring joins. It ensures that after normalization, you can still
enforce the original dependencies directly within individual tables.
Boyce-Codd Normal Form (BCNF)
A table is in Boyce-Codd Normal Form (BCNF) if:
• It is in 3NF.
• For every functional dependency A → B, A must be a superkey.
In simpler terms, BCNF eliminates any situation where a non-superkey
attribute determines another attribute.
Example: Consider the table:
Student_ID Course Instructor
1 Math Mr. Smith
2 History Mrs. Lee
If we assume that each instructor teaches exactly one course (i.e., Instructor
determines Course), then the table is not in BCNF, because Instructor is not a
superkey for the dependency Instructor → Course. To bring this table into
BCNF, you would need to decompose it into two tables:
Students Table:
Student_ID Instructor
1 Mr. Smith
2 Mrs. Lee
Courses Table:
Course Instructor
Math Mr. Smith
History Mrs. Lee
Multi-valued Dependencies and Fourth Normal Form (4NF)
A table is in Fourth Normal Form (4NF) if:
• It is in BCNF.
• It has no multi-valued dependencies.
A multi-valued dependency occurs when one attribute determines a set of
values for another attribute, and these values do not depend on each other.
Example: Consider the table:
Student_ID Course Language
1 Math English
1 Science French
Here, the Student_ID determines both Course and Language, but these two are
independent of each other. To bring this into 4NF, we split the table into two:
Courses Table:
Student_ID Course
1 Math
1 Science
Languages Table:
Student_ID Language
1 English
1 French
Join Dependencies and Fifth Normal Form (5NF)
A table is in Fifth Normal Form (5NF) if:
• It is in 4NF.
• It does not contain any join dependencies.
A join dependency occurs when a table can be reconstructed by joining
multiple smaller tables, but doing so would cause redundancy in the data. In
5NF, all such join dependencies must be eliminated.
Example: Consider the table:
Supplier_ID Part_ID Project_ID
1 A X
1 B X
2 A Y
In this case, the table can be split into three smaller tables to eliminate
redundancy:
Suppliers Table:
Supplier_ID Part_ID
1 A
1 B
2 A
Parts Table:
Part_ID Project_ID
A X
B X
A Y
Projects Table:
Project_ID Supplier_ID
X 1
Y 2
Domain-Key Normal Form (DKNF)
Domain-Key Normal Form (DKNF) is the highest normal form, where:
• The schema is free from all types of anomalies (functional, multivalued,
and join dependencies).
• All constraints are based on the domain of the data and the key
relationships.
In DKNF, the design ensures that there are no exceptions to the rules, and data
integrity is fully guaranteed based on domain and key constraints.
Summary of Normal Forms:
1. 1NF: Eliminates repeating groups; ensures atomicity.
2. 2NF: Removes partial dependencies (in tables with composite keys).
3. 3NF: Removes transitive dependencies.
4. BCNF: Ensures that every functional dependency has a superkey as its
determinant.
5. 4NF: Eliminates multi-valued dependencies.
6. 5NF: Eliminates join dependencies.
7. DKNF: Fully normalized without any exceptions or anomalies.
These normalization forms aim to organize data efficiently, reduce
redundancy, and maintain data integrity. Each successive form builds upon the
previous one to address more complex issues in database design.
UNIT-IV
Database Recovery
Database Recovery refers to the process of restoring a database to a correct
state after a failure, such as a system crash, power failure, or data corruption.
Recovery ensures that no data is lost, and the database remains consistent.
Key methods of database recovery:
• Backup and Restore: Regular backups of the database are made and
stored separately. In case of a failure, the backup is used to restore the
data.
• Transaction Logs: Databases maintain logs of all changes made to the
data. If a crash occurs, the system can use the transaction logs to roll
back or reapply changes to bring the database back to a consistent state.
• Checkpointing: A process that creates a consistent snapshot of the
database. This helps to quickly restore to a known good state.
Concurrency Management
Concurrency Management refers to handling multiple users or applications
accessing the database at the same time, without causing inconsistencies.
When several users modify the database simultaneously, concurrency control
ensures that data remains consistent and avoids conflicts.
Key concepts in concurrency management:
• Locking: The database uses locks to prevent two transactions from
modifying the same data simultaneously. There are different types of
locks (shared, exclusive) to control access to the data.
• Transaction Isolation: This ensures that transactions are executed in
isolation from each other. It avoids problems like dirty reads (reading
uncommitted data) and ensures the database remains consistent.
• Deadlock Handling: A deadlock occurs when two transactions each wait for
the other to release a resource. The system must detect such deadlocks and
resolve them, typically by rolling back one of the transactions.
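A common way locking appears in SQL is shown below: one transaction locks the rows it is about to change so a concurrent transaction cannot modify them at the same time. The table and column names are assumptions, and exact transaction-control syntax varies slightly between DBMSs:

    -- Session 1: take an exclusive row lock before updating, then commit.
    -- (In many DBMSs the transaction starts implicitly with the first statement.)
    SELECT Balance FROM Accounts WHERE Account_ID = 1 FOR UPDATE;
    UPDATE Accounts SET Balance = Balance - 500 WHERE Account_ID = 1;
    COMMIT;    -- releases the lock

    -- Session 2, trying to update the same row at the same time, is made to
    -- wait until session 1 commits, which prevents a lost update.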
Database Security
Database Security involves protecting data from unauthorized access and
ensuring its confidentiality, integrity, and availability. Security measures are
applied to prevent unauthorized users from accessing or tampering with the
data.
Key aspects of database security:
• Authentication: Ensures only authorized users can access the database,
often via usernames and passwords.
• Authorization: Grants users specific rights or privileges, such as read,
write, or modify access to data.
• Encryption: Data is encrypted to ensure it is protected from
unauthorized access when stored or transmitted.
• Auditing: Tracks database activities to detect any suspicious or
unauthorized actions.
Database Integrity
Database Integrity ensures the accuracy and consistency of data in the
database. It involves enforcing rules that maintain the correctness of data over
time.
Types of database integrity:
• Entity Integrity: Ensures that each row in a table is uniquely identifiable.
This is usually done by assigning a primary key to each record.
• Referential Integrity: Ensures that relationships between tables are
consistent. For example, if a record in one table refers to a record in
another table (using foreign keys), the referenced record must exist.
• Domain Integrity: Ensures that the values in a column conform to a
specific data type, range, or format.
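The three kinds of integrity map directly onto SQL constraints, as in this sketch (table names, column names, and the age range are assumptions):

    CREATE TABLE Department (
        Dept_ID   INT PRIMARY KEY,               -- entity integrity: unique, non-null key
        Dept_Name VARCHAR(50) NOT NULL
    );

    CREATE TABLE Employee (
        Emp_ID  INT PRIMARY KEY,                 -- entity integrity
        Age     INT CHECK (Age BETWEEN 18 AND 65),   -- domain integrity
        Dept_ID INT,
        FOREIGN KEY (Dept_ID) REFERENCES Department(Dept_ID)  -- referential integrity
    );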
Database Control
Database Control refers to the management of database operations to ensure
they run efficiently, securely, and reliably. This includes transaction
management, integrity checks, and user access control.
Key components:
• Transaction Management: Ensures that transactions (a set of database
operations) are executed in a way that maintains data integrity, and
either fully complete (commit) or do not alter the data (rollback) in case
of failure.
• Locking and Isolation: Ensures that multiple transactions can run
simultaneously without conflicting with each other.
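A classic transfer transaction illustrates commit/rollback behaviour. The table and column names are assumed, and the statement that begins a transaction differs between DBMSs (it is implicit in some):

    -- Both updates must succeed together or not at all.
    UPDATE Accounts SET Balance = Balance - 1000 WHERE Account_ID = 1;
    UPDATE Accounts SET Balance = Balance + 1000 WHERE Account_ID = 2;
    COMMIT;        -- make both changes permanent

    -- If anything fails partway through, undo the whole transaction instead:
    -- ROLLBACK;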
Structure of a Distributed Database
A Distributed Database is a collection of data stored across multiple physical
locations but managed as a single database. The data is distributed over
several sites, each of which can access and modify the data independently.
Key features of a distributed database:
• Data Distribution: Data is divided and stored across multiple servers or
locations. It could be partitioned horizontally (rows) or vertically
(columns).
• Replication: Copies of data are maintained across different locations for
availability and fault tolerance.
• Autonomy: Each site in the distributed database system may operate
independently, with its own local database.
Design of Distributed Databases
Designing a Distributed Database involves deciding how to distribute and
manage data across multiple sites. The goal is to ensure that the system is
efficient, fault-tolerant, and able to provide high availability.
Key aspects of distributed database design:
• Fragmentation: Deciding how to divide data into smaller pieces
(fragments) and how to distribute them across different sites.
Fragmentation can be horizontal (data rows) or vertical (data columns).
• Replication: Determining how data should be copied across multiple
sites to improve availability and fault tolerance.
• Location Transparency: Users should not need to know the exact
location of data. The system should handle data retrieval automatically.
• Consistency and Synchronization: Ensuring that all copies of the data
are updated consistently and that users always see the most current
version of the data.
• Concurrency Control: Managing simultaneous access to data by multiple
users, ensuring there are no conflicts when data is modified across
different sites.
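Horizontal fragmentation and location transparency can be sketched with plain SQL: each site holds the rows for its own region, and a view reassembles them so users never need to know where the data lives. All names below are assumptions for illustration:

    -- Each site stores only its own region's rows (horizontal fragmentation).
    -- Site 1:
    CREATE TABLE Customers_East AS
        SELECT * FROM Customers WHERE Region = 'East';
    -- Site 2:
    CREATE TABLE Customers_West AS
        SELECT * FROM Customers WHERE Region = 'West';

    -- Location transparency: users query a single view and the system
    -- combines the fragments for them.
    CREATE VIEW Customers_All AS
        SELECT * FROM Customers_East
        UNION ALL
        SELECT * FROM Customers_West;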
Summary
1. Database Recovery: Ensures the database can be restored after failures.
2. Concurrency Management: Manages simultaneous database access by
multiple users.
3. Database Security: Protects data from unauthorized access.
4. Database Integrity: Maintains the accuracy and consistency of data.
5. Database Control: Manages database operations to ensure reliability
and security.
6. Distributed Database: A database distributed across multiple locations.
7. Design of Distributed Databases: Planning how to distribute, replicate,
and manage data efficiently across multiple sites.
These concepts ensure that databases are reliable, secure, and efficient,
whether it's a centralized system or a distributed one across different
locations.