Unit 1 - Basic Concepts

The document outlines the course structure for a Relational Database Management System (RDBMS) course, detailing course outcomes, a comprehensive syllabus, assessment methods, and the history and applications of databases. It covers key concepts such as database architecture, data models, SQL, normalization, transactions, and recovery systems. Additionally, it discusses the advantages and disadvantages of DBMS, as well as various applications across different industries.

Department of Computer Science and Engineering

Course Name- Relational Database Management System


Course Code – COM-402
Topic- Introduction
Faculty- Ms. Parul Sharma

Model Institute of Engineering & Technology (Autonomous)


Course Outcomes

CO1: Identify the basic concepts, architecture and various data models used in Database Management Systems (DBMS).
CO2: Identify basic database storage structures and access techniques such as file organizations and indexing methods.
CO3: Design queries using Structured Query Language (SQL) for database definition and database manipulation.
CO4: Recognize the use of normalization and functional dependencies in DBMS.
CO5: Implement the concepts of transaction, concurrency control and recovery in DBMS.
Detailed Syllabus

Section-A

Unit 1: Basic Concepts: Introduction, characteristics, history and applications, Components of DBMS,
Advantages and Disadvantages of DBMS. Language and Architecture: Data Modeling, Records and Files,
Abstraction and data Integration, Views, Data Independence, Data Associations, Data Models Classification.
(8hrs)
Unit 2: Entity Relationship Model: Basic concepts, constraints, design issues, Entity Relationship
diagram, Weak Entity sets, Extended ER features, Design of ER database schema, Reduction of ER
schema to tables. Relational Model: Attributes and domains, Tuples, Relations and Schemas, Relation
representation, keys, Integrity Rules, Relational algebra, Relational Calculus, Data Manipulation using
SQL. (12 hrs)
Unit 3: Relational Database Design: Normalization using Functional Dependency, Normalization using Join
dependencies, Domain key normal form. (8hrs)
Section-B
Unit 4: Transactions: Introduction to transaction and system concept, transaction state, desirable properties of
transactions (ACID properties), Concurrent executions, Serializability, Recoverability, implementation of
isolation, transaction definition in SQL. Concurrency Control: Lock based protocols, Timestamp-based
protocols, Validation based protocols, Multiple Granularity, Multiversion Schemes, Deadlock Handling, Insert
and Delete operations. (12hrs)
Unit 5: Recovery Systems: Failure classification, Storage Structure, Recovery and Atomicity, Log based
recovery, Shadow Paging, Recovery with Concurrent Transactions, Buffer Management. (8 hrs)
Assessment and Evaluation Plan

Assessment Tools Evaluation


 Assignment  20 marks

 Sessionals  20 marks

 Attendance  10 marks

 External Exam  100 marks


Database

What is a Database?
 A database is an organized collection of data that is electronically stored and accessed. In a database, data is typically structured in a way that makes it easy to manage, access, update, and analyze. Databases can store data about people, products, processes, or any other type of information.

Unit 1: Basic Concepts


DBMS

What is a Database Management System (DBMS)?

 A Database Management System (DBMS) is a software system that enables users to create, manage, and interact with one or more databases. It provides a systematic and efficient way of storing, managing, and retrieving data. A DBMS serves as an interface between users, applications, and databases, managing data structure, storage, security, and retrieval operations.



Contd..

The Database Management System (DBMS) acts as the intermediary between the end-user and the application (App), both of which access the database through APIs (Application Programming Interfaces).

•User and App: These are the two entities that need to interact with the database. The
user could be a person using a web or mobile interface, while the App represents any
application software that requires database access.
•API (Application Programming Interface): This component acts as a mediator that
allows the User and the App to communicate with the DBMS. APIs define the methods
and data formats that the User and the App can use to query and manipulate data.
•DBMS (Database Management System): This is the software system designed to
manage databases. It provides the interface through which databases can be accessed and
manipulated. The DBMS ensures data integrity, security, and consistency.
•Database: This is the storage component where data is actually kept. It could consist of
multiple databases managed by the DBMS, and it stores all the data that the User and the
App interact with.



History of DBMS



Contd..

 1960s: Hierarchical and Network Databases


• Hierarchical Databases: This type of database organizes data in a tree-like
structure, where each record has a single parent. IBM's Information Management
System (IMS) is one of the most notable examples. Hierarchical databases are
particularly useful for applications with complex one-to-many relationships, like
assembly lines or organizational charts.

• Network Databases: These extend the hierarchical model by allowing many-to-many relationships between entities through the use of a graph structure, where entities can have multiple parent and child records. This flexibility made network databases very popular for complex applications, such as telecommunications and banking systems.



Contd..

Hierarchical Model

Network Model
Contd..

 1970s: Relational Database Management Systems (RDBMS) and SQL


• Relational Databases: Introduced by Edgar F. Codd at IBM, relational databases
use a table-based structure where data relationships are defined by keys and indexes
rather than physical links. This model is simpler and more flexible than the
hierarchical and network models. It allows for the use of SQL (Structured Query
Language), a powerful tool for managing and manipulating data.

• SQL Development: SQL became the standard language used to interact with
relational databases. It allows users to perform various tasks such as querying,
updating, and managing databases, making it integral to database management.
Contd..

 1980s: Dominance of RDBMS


• During the 1980s, the use of relational databases became dominant in the industry.
Major systems like Oracle, IBM’s DB2, and Microsoft SQL Server were
developed and commercialized, providing robust, reliable, and efficient ways to
store, retrieve, and manage data across various applications and industries.
 1990s: Object-Oriented and Object-Relational Databases
• Object-Oriented Databases: These databases integrate object-oriented
programming language capabilities, allowing stored data to be handled as objects.
This approach is useful for applications requiring complex data representations,
such as CAD systems, and is designed to handle large-scale multimedia objects,
including audio, video, and text.
• Object-Relational Databases: These combine relational and object-oriented
database capabilities, offering a more sophisticated data modeling framework while
still supporting a high-level query language. They allow data to be manipulated
using methods defined by its class and can handle new data types, such as images
and custom data structures.
Contd..

 2000s: Rise of NoSQL


• NoSQL Databases: As web-scale applications emerged, NoSQL databases like MongoDB,
Cassandra, and CouchDB gained popularity for their ability to scale horizontally and handle
large volumes of unstructured and semi-structured data. NoSQL systems are characterized by
their flexibility, scalable performance, and ease of use in distributed environments, making
them suitable for big data and real-time web applications.



Contd..

 2010s: Low-Code Platforms


• Low-Code Development: Low-code platforms allowed businesses to quickly build and deploy database-driven applications without needing extensive coding knowledge. These platforms provide tools and interfaces that simplify processes like application development, allowing users to focus on design and functionality rather than backend coding.
 Key Features of Low-Code Platforms
1. Drag-and-Drop Interfaces: Simplify app development with user-friendly, visual component
assembly.
2. Pre-built Templates and Components: Offer libraries of customizable templates and
components to streamline app creation.
3. Automation Tools: Automate tasks like data validation and business logic without manual
coding.
4. Integration Capabilities: Enable seamless connections with existing databases and external
APIs.
5. Scalability: Support business growth with robust architectures that adapt to increasing
demands.
Advantages of DBMS:

1. Improved Data Sharing and Management: DBMS allows users to access and manage data
efficiently. Multiple users can access the data concurrently in a controlled manner, improving
data sharing across the organization.
2. Data Integrity: DBMS enforces rules to ensure the integrity of data. It ensures that only valid
and consistent data is stored through constraints, relationships, and rules.
3. Data Security: With a DBMS, it's possible to enforce user permissions and policies. This
ensures that only authorized users can access or manipulate the data, thus enhancing security.
4. Data Independence: DBMS provides an abstraction layer between the physical data storage
and the application layer, enabling data independence. Changes in the structure of a database
do not affect the applications that use this data.
5. Backup and Recovery: DBMS automatically takes care of backup and recovery. The database
administrator doesn’t need to worry about backing up data manually; the DBMS handles
issues of recovery after a crash.
6. Consistency: Multiple users can access the data concurrently without affecting its consistency.
DBMS ensures that any transaction brings the database from one valid state to another,
maintaining database consistency.



Advantages of DBMS:

7. Reduced Data Redundancy: DBMS minimizes data redundancy by ensuring that a piece of data is stored only once. This reduces data anomalies and storage costs, while improving data consistency.
8. Improved Decision Making: Enhanced data management translates into better reports, better understanding of data, and improved analytics, which support better decision-making throughout the organization.



Disadvantages of DBMS:

1. Complexity: DBMS systems can be complex to understand and operate. Effective


implementation requires in-depth knowledge of the underlying principles and technologies.
2. Costs: High initial investment is required for hardware, software, and trained personnel. The
cost of maintaining a DBMS can be substantial.
3. Performance: While DBMS can enhance performance in terms of data access and
management, the overhead of maintaining a DBMS (such as ensuring ACID properties,
security checks, and transaction management) can lead to performance degradation as
compared to simpler file-based systems.
4. Size: DBMS systems generally require substantial memory and processing power, which can
be a constraint for small businesses or low-resource environments.
5. Higher Impact of Failure: As the central system handling all data operations, a failure can be
catastrophic, potentially leading to loss of data or major downtimes.
6. Maintenance Requirements: A DBMS requires continuous maintenance to keep performing efficiently. This includes database tuning, updates, and upgrades, which necessitate skilled personnel.
7. Vendor Lock-in: Once a specific DBMS is implemented, switching to another can be difficult
and costly due to the differences in technologies and data migration challenges.
Applications of DBMS

1. Banking and Finance


• Transaction Management: DBMS are used for managing customer information,

accounts, loans, and transactions.


• Fraud Detection and Risk Management: Analyzing trends and patterns to detect

fraud and assess risks.

 2. Telecommunications
• Service Management: Managing customer data, billing information, and call records.
• Network Management: Handling configurations, access rights, and resource allocation.



Applications of DBMS
 3. Retail and E-Commerce
• Inventory Management: Tracking products, orders, and supplies.
• Customer Relationship Management (CRM): Managing customer data, purchase histories, and preferences to enhance customer service and targeted marketing.

 4. Healthcare
• Patient Records Management: Storing patient data, treatment records, and
medical histories.
• Appointment Scheduling: Managing schedules for doctors, patient
appointments, and room availability.



Applications of DBMS
 5. Education
• Student Information Systems: Managing student registrations, grades, and
academic records.
• Library Management: Cataloging books, managing loans, and member records.

 6. Manufacturing
• Supply Chain Management: Monitoring production processes, inventory, supplier
relationships, and logistics.
• Product Lifecycle Management: Managing data through all stages of a product
from inception through manufacturing to service and disposal.



Applications of DBMS
 7. Airlines
• Reservation Systems: Managing flight schedules, bookings, and passenger details.
• Crew Management: Scheduling and managing crew assignments.

 8. Human Resources
• Employee Management: Handling employee records, payroll, benefits, and
compliance documentation.
• Recruitment Processes: Managing job postings, applicant tracking, and hiring
processes.



Applications of DBMS
 9. Government
• Public Records Management: Maintaining records of citizens, such as births,
marriages, deaths, and property ownership.
• Resource Management: Managing allocation and usage of governmental resources
and services.

 10. Insurance
• Policy Management: Managing customer policies, claims data, and payment
tracking.
• Risk Assessment: Analyzing data to assess insurance risks and calculate premiums.



Components of a Database System

A Database Management System (DBMS) is a sophisticated software tool designed to store, manage, and manipulate both data and the metadata that describes this data. It serves as the backbone for handling vast amounts of data efficiently and securely, ensuring data integrity, consistency, and accessibility.

Components:
1. Classification of DBMS users
2. DBMS facilities
3. Structure of DBMS
4. Database Access
1. Classification of DBMS Users

DBMS users vary widely in their level of expertise and the way they interact with the system:
• Naive Users: These users interact with the database through highly simplified interfaces such as ATMs or menu-driven applications, which guide them through each transaction step.
• Online Users: These users interact with databases either directly via terminals or indirectly through user interfaces. They range from sophisticated users who employ data manipulation languages to naive users who require guided interfaces like menus. Their interaction levels vary based on their familiarity and expertise with the database system.
Contd..

• Casual User: Unlike the naive user, the casual user has some
knowledge about how to query the database but does not
regularly need to manipulate database structures or manage the
system. They access the DBMS through a telecom system that
interfaces with a query processor.
• Batch User: This user submits jobs that are processed in batches
by the system. They interact through compiled application
programs that communicate directly with the DBMS and its data
manager.
• Application Programmers: They develop the applications and
interfaces that naive and other online users utilize. These
programs are often written in languages like Assembler, C, COBOL,
and others, incorporating commands to manipulate the database.
• Database Administrator (DBA): The most knowledgeable about
the database, the DBA controls the database's structure and
operation, defines user access levels, and is responsible for
maintenance, recovery, and optimizing database performance.
2. DBMS Facilities

 The DBMS provides two primary facilities:


• Data Definition Facility: Utilizes Data Definition
Language (DDL) to define the database schema,
including entities, relationships, and constraints. This
schema is compiled and stored in the data dictionary,
which contains metadata about the data.
• Data Manipulation Facility: Uses Data Manipulation
Language (DML) for data retrieval, insertion, updating,
and deletion. DML can be embedded in application
programs or used interactively, and it may be
procedural or non-procedural.
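The two facilities can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the course material; the employee table and its columns are assumed for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data Definition Facility (DDL): define the schema and its constraints.
conn.execute("""
    CREATE TABLE employee (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dept TEXT
    )
""")

# Data Manipulation Facility (DML): insert, update, and retrieve data.
conn.execute("INSERT INTO employee (id, name, dept) VALUES (?, ?, ?)",
             (1, "Asha", "Cardiology"))
conn.execute("UPDATE employee SET dept = 'Radiology' WHERE id = 1")
row = conn.execute("SELECT name, dept FROM employee WHERE id = 1").fetchone()
print(row)  # ('Asha', 'Radiology')
conn.close()
```

Here the DML is used interactively; the same statements could equally be embedded in an application program, as the slide above describes.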
3. DBMS Structure
Contd..

Data Definition Language Compiler (DDL Compiler)


The DDL Compiler is essential for setting up and modifying the database schema. It
processes DDL statements, such as CREATE, ALTER, and DROP, which define the
structure of the database tables, relationships, and other constraints. The output is a set
of tables that store metadata about these definitions. This metadata, stored in the data
dictionary, is crucial for the DBMS as it uses this information to maintain the structure
of the database and to ensure that the data adheres to specified constraints and
relationships.
Data Manager
The Data Manager is the central component of a DBMS, serving as the interface
between the user's high-level requests and the database's physical storage. It translates
the logical requests from users (through queries or application programs) into the
specific physical operations needed to retrieve or manipulate the data stored on disk.
This component is responsible for maintaining the security, integrity, and consistency
of the data, even when multiple users access the database concurrently. It also
manages the complex tasks of database backup and recovery, ensuring data durability
and robustness against system failures or data corruption.
Contd..

File Manager
The File Manager is responsible for the organization and management of the database
files on disk. It works closely with the Disk Manager to handle the physical location
of the data blocks. When the Data Manager needs to read or write data, the File
Manager determines which data blocks are needed and instructs the Disk Manager to
retrieve them. This component optimizes how data is stored, arranged, and accessed
on disks to improve performance and reduce access times.

Disk Manager
Operating within the host computer's operating system, the Disk Manager handles the
low-level operations required for data storage and retrieval. This includes reading
and writing data blocks or pages from the storage media. It performs these operations
based on requests from the File Manager, which abstracts away the details of the disk's
physical characteristics from the DBMS, allowing the File Manager to focus on
higher-level data management tasks.
Contd..

Query Processor
The Query Processor is critical for the efficient retrieval and manipulation of data. It
takes user queries, typically written in a Data Manipulation Language (DML), and
interprets them. Utilizing the metadata stored in the data dictionary, the Query
Processor optimizes these queries to minimize access times and computational
overhead.

Data Files
Data files contain the data portion of the database.
Contd..

The Data Dictionary

The data dictionary is a specialized database that stores metadata: detailed information about the structure and usage of data within the main database. This metadata includes descriptions of data fields, relationships, data types, and access controls, as well as usage statistics and audit trails that document when and by whom data was updated.
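As a concrete illustration (not from the slides), SQLite exposes part of its own data dictionary through the sqlite_master catalogue table, which records each object's name, type, and defining DDL; the patient table here is assumed for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id INTEGER PRIMARY KEY, name TEXT)")

# The catalogue (data dictionary) describes the table we just defined.
meta = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE name = 'patient'"
).fetchone()
print(meta[0], meta[1])  # table patient
conn.close()
```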
Contd..

Telecommunication System
In DBMS environments, especially those serving multiple or remote users, the
Telecommunication System manages the communication between the users and the
database. It routes requests from users to the database and sends responses back to the
users. This system can handle direct interactions with the database via command-line
interfaces or more complex interactions through graphical, form-based, or menu-driven interfaces, e.g., ATMs, the internet, or networked computers.
4. Database Access

When a user requests data from a database, the request is handled


by the data manager, which first determines the specific physical
record needed, often consulting the database and data dictionary to
make this determination. The data manager then forwards this
request to the file manager, which identifies the exact physical block
on secondary storage where the record is located and passes this
request to the disk manager. The disk manager retrieves the
designated block and sends it back to the file manager, which then
extracts the required record and returns it to the data manager,
completing the process of fetching the requested data.
DBMS Architectures

DBMS architecture refers to the design and layout of the


database management system, detailing how data is
structured, accessed, and managed. It defines the
framework for the storage, retrieval, and updating of
data in a database.
 Types of DBMS Architecture

• 3-tier Architecture (ANSI/SPARC model)


• 2-tier Architecture
• 1-tier Architecture
3 Tier Architecture

The ANSI/SPARC model is a three-tiered architecture


used widely in commercial and research database
systems, defined by three distinct levels of abstraction:
external, conceptual, and internal.
Contd..

Schema:
The view at each of these levels is described by a scheme. The term "scheme" refers to an outline or a plan that details the structure of data within a database. It describes the records, their attributes, and the relationships between them at a specific level of abstraction within the database architecture. This includes how these elements are organized and how data is accessed and managed.

External Schema or User View:


This is the highest level of database abstraction, focusing only on
the portions of the database relevant to specific users or
applications. Each user view is defined by an external schema, which
includes logical records and relationships as they appear to the user.
It also specifies how these elements are derived from the conceptual
view.
Contd..

Conceptual schema or Global View:


The middle layer includes all database entities and their
relationships, representing the entire database through a single
conceptual schema. This schema describes the records and
relationships universally present in the database and details how
to derive these elements from the internal view. It is format-
independent and includes integrity and consistency checks.

Internal schema or View:


The lowest level of abstraction, the internal view details the
physical storage of data. It is expressed through an internal
schema that describes the storage records, data field
representations, and access methods used by the database
system.
 This model allows for a systematic and organized approach to

managing database structures, ensuring data consistency and


integrity while providing various user-specific views.
 Benefits:
Improved scalability, maintainability, security, data
integrity, and flexibility by separating the presentation,
application logic, and data layers, allowing each to be
developed and updated independently, making it easier
to manage complex systems and adapt to changing
requirements.
2-Tier Architecture

It involves two distinct layers: the client and the server.


The client application communicates directly with the
database server, where the database management tasks
are processed.
External Level: The client side of the architecture, where the user
interfaces are located.
Conceptual Level: Resides on the server where the business logic
and data management are handled. This includes the database
schema and the management rules that define how data is organized,
related, and manipulated.
Internal Level: Also located on the server, this level concerns the
physical storage of data. It includes how data is stored on server
hardware, the type of database files used, indexing methods, and
other storage optimization techniques.

Use Cases:
• Small to Medium Business Applications: Effective for

environments where multiple clients need to access a central


database.
• Departmental Applications: Used in department-specific applications within larger organizations where data is managed at the department level.


Advantages and disadvantages of 2-tier Architecture

Advantages:
Performance: Improved performance over 1-tier, as processing is distributed between client and server.
Scalability: More scalable than 1-tier since multiple clients can connect to the server.
Maintenance: Easier to manage and update the application without affecting the database server.

Disadvantages:
Complexity: Higher complexity in setting up and maintaining the network connections
between client and server.
Security: Increased security risks due to potential exposure of the server over a
network.
Dependency: Client machines depend on the availability of the server to access the
database, which can lead to bottlenecks.
Data Modelling

Data modeling is a fundamental process in designing a


database system, aimed at structurally representing data
elements and the relationships between them.
This representation serves as a blueprint for how data is
stored, organized, and accessed within a database,
ensuring alignment with business requirements and
objectives. The main goal of data modeling is to create a
detailed framework that supports data integrity, efficiency,
and usability.
Steps in data modelling
Step 1: Identifying business entities

Entities
• Entities: These are the principal data objects about which information is stored. In a relational database, each entity typically corresponds to a table.
Let's design a simplified data model for a clinic's database system. Its entities are:

Employee
Department
Appointment
Medical Record
Entity- Laptop
Step 2: Identifying key properties for each entity

Properties/Attributes: These are the properties, qualities, or characteristics of an entity. Each attribute represents a data field in a table in the database.
For each entity identified in Step 1, determine the necessary details that need to be stored. For example, for the entity "Student," attributes could include roll_no, name, age, and DOB.
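As a sketch of this step, the "Student" attributes above map directly to typed columns (shown here with Python's sqlite3; the column types are assumptions, since the slides do not specify them):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each attribute of the "Student" entity becomes a typed column.
conn.execute("""
    CREATE TABLE student (
        roll_no INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        age     INTEGER,
        dob     TEXT            -- ISO date string, e.g. '2005-03-14'
    )
""")
cols = [row[1] for row in conn.execute("PRAGMA table_info(student)")]
print(cols)  # ['roll_no', 'name', 'age', 'dob']
conn.close()
```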
Step 3: Creating relationships among entities

Relationships
• Relationships define how entities interact with each other. For example, a 'Customer' entity might relate to an 'Order' entity, indicating that customers place orders.
• The nature of a relationship is also described by its cardinality.
• Depending on the nature of the interaction, relationships can be classified as:

• One-to-one (1:1)
• One-to-many (1:M)
• Many-to-many (M:N)
One-to-one

A 1-to-1 relationship occurs when one record in a table is associated with one and only one record in another table, and vice versa.

Examples:
Person and his passport
User and UserProfile
Product and ProductDetail
CEO and Company
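One common way to enforce a 1:1 relationship in SQL is a UNIQUE constraint on the foreign key. A minimal sketch with sqlite3, using the person/passport example (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
# UNIQUE on the foreign key limits each person to at most one passport row.
conn.execute("""
    CREATE TABLE passport (
        passport_no TEXT PRIMARY KEY,
        person_id   INTEGER UNIQUE NOT NULL REFERENCES person(id)
    )
""")
conn.execute("INSERT INTO person VALUES (1, 'Asha')")
conn.execute("INSERT INTO passport VALUES ('P100', 1)")
try:
    conn.execute("INSERT INTO passport VALUES ('P200', 1)")  # a second passport
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
conn.close()
```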
One-to-Many

A one-to-many relationship in a database context is a type of association where a single record in one table (the "one" side) can relate to multiple records in another table (the "many" side), and each record on the "many" side can relate to only one record on the "one" side.

Examples:
Customer and Order
Teacher and Student
Publisher and Books
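In relational terms, the foreign key lives on the "many" side, so several orders can share one customer id. A minimal sqlite3 sketch of the Customer/Order example (schemas are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
# The foreign key sits on the "many" side; ORDER is quoted as it is an SQL keyword.
conn.execute("""
    CREATE TABLE "order" (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        total       REAL
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.executemany('INSERT INTO "order" VALUES (?, ?, ?)',
                 [(10, 1, 250.0), (11, 1, 90.0)])
n = conn.execute('SELECT COUNT(*) FROM "order" WHERE customer_id = 1').fetchone()[0]
print(n)  # 2
conn.close()
```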
Many-to-Many

Many-to-many relationships occur when multiple records in one table are associated with multiple records in another table. This relationship type is common in scenarios where complex interconnections need to be represented in a database.

Examples:
Student and Project
Student and Courses
Doctors and Patients
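Relational databases usually implement M:N with a junction (associative) table that resolves the relationship into two 1:M relationships. A sketch of the Student/Courses example with sqlite3 (names and sample data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE course  (id INTEGER PRIMARY KEY, title TEXT)")
# The junction table holds one row per (student, course) pairing.
conn.execute("""
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(id),
        course_id  INTEGER REFERENCES course(id),
        PRIMARY KEY (student_id, course_id)
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [(101, 'DBMS'), (102, 'Networks')])
conn.executemany("INSERT INTO enrollment VALUES (?, ?)", [(1, 101), (1, 102)])
titles = [r[0] for r in conn.execute("""
    SELECT c.title FROM course c
    JOIN enrollment e ON e.course_id = c.id
    WHERE e.student_id = 1 ORDER BY c.id
""")]
print(titles)  # ['DBMS', 'Networks']
conn.close()
```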
Step 4: Mapping data attributes to entities

With entities and relationships defined, it's time to map data


attributes to the corresponding entities. Data attributes are specific
pieces of information related to each entity's key properties.
'Customer Name' and 'Order Date' would be data attributes mapped
to the 'Customers' and 'Orders' entities, respectively. This mapping
ensures that each piece of data finds its place in the model.

This step involves organizing and structuring the previously


identified attributes within the actual database schema. It's
not just about knowing what attributes exist, but also how
they fit into the larger database structure in relation to their
respective entities.
Step 5: Assigning keys, deciding on the degree of normalization,
reducing redundancy

There are two main types of keys: primary keys and foreign
keys.
• A primary key is a column or set of columns that uniquely

identifies each row in a table. For example, the customer ID


column might be the primary key for the customer table.
A primary key in a database has two essential properties:
1. Uniqueness: This property ensures that each value in the primary
key field is unique across all records in the table. No two rows can
have the same primary key value, which helps to uniquely identify
each record.
2. Not Null: A primary key cannot have a NULL value. This property is
crucial because a NULL value represents an unknown or undefined
value, which would fail to serve the primary key's purpose of
uniquely identifying a record in the table.
 A foreign key is a column or set of columns in one table
that references the primary key of another table. For
example, the customer ID column in the order table might
be a foreign key that references the customer ID column in
the customer table.
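Both key properties can be demonstrated with sqlite3; the customer/order tables are illustrative, and note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE "order" (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")

# Primary key: a duplicate id violates uniqueness.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Ravi')")
    pk_rejected = False
except sqlite3.IntegrityError:
    pk_rejected = True

# Foreign key: an order must reference an existing customer id.
try:
    conn.execute('INSERT INTO "order" VALUES (10, 99)')
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(pk_rejected, fk_rejected)  # True True
conn.close()
```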
Contd..

Foreign Key: another example (diagram)

Step 5: Assigning keys, deciding on the degree of normalization,
reducing redundancy

Data Redundancy
Data redundancy in a database context refers to the unnecessary
duplication of data within a database. This situation occurs when the
same piece of data is stored in more than one place, or when
multiple fields store similar data in slightly different forms. Data
redundancy can happen due to various reasons, including poor
database design, lack of coordination between different sections of
an application, or during data integration from disparate sources.
 Implications of Data Redundancy
1. Increased Storage Costs
2. Data Integrity Issues
3. Complicated Queries
To minimize or avoid data redundancy, database designers often employ normalization, a process that organizes data in a database efficiently.
Step 5: Assigning keys, deciding on the degree of normalization,
reducing redundancy

Normalization is the process of organizing data in a way that reduces redundancy and improves data integrity. There are six levels of normalization, but most databases are normalized to the third normal form (3NF).

Step 6: Finalizing the data model and validating its accuracy

Once keys are assigned, normalization is achieved, and redundancy is minimized, it's time to finalize the data model. This step involves reviewing the entire model, ensuring it accurately represents the business entities, their relationships, and the associated data attributes. Validation is key; using sample data to test the model helps identify discrepancies or inconsistencies. Iterative refinement might be necessary to validate the model's accuracy fully.
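The redundancy-reduction idea behind Step 5 can be sketched in a few lines of Python. The flat table, its column names, and its values are invented for illustration: customer details repeated on every order row are split out into a separate customers table keyed by customer id.

```python
# Flat (unnormalized) table: the customer's city is repeated on every order.
flat = [
    {"order_id": 100, "cust_id": 1, "cust_city": "Jammu", "amount": 250},
    {"order_id": 101, "cust_id": 1, "cust_city": "Jammu", "amount": 400},
    {"order_id": 102, "cust_id": 2, "cust_city": "Delhi", "amount": 150},
]

# Normalized: customer attributes are stored once, keyed by cust_id.
customers = {row["cust_id"]: {"cust_city": row["cust_city"]} for row in flat}
orders = [{"order_id": r["order_id"], "cust_id": r["cust_id"], "amount": r["amount"]}
          for r in flat]

# The city for customer 1 now lives in exactly one place.
print(customers[1]["cust_city"])   # Jammu
print(len(customers), len(orders)) # 2 customers, 3 orders
```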
Records and Files

A record in a database is a collection of related data items stored in fields, each with a specific purpose and data type. Records represent a single entity or object within a dataset. A column, by contrast, signifies an attribute of the data being stored in the table; each column is designed to hold a specific type of data about the items represented in the rows.
Purpose of Records
Records are used to organize and store data efficiently,
allowing for easy retrieval, modification, and management
within a database system.
Tuples vs Records
 Tuple: Originates from relational algebra and represents an ordered list of elements. Each element (or attribute) in a tuple corresponds to a column in a database table. Tuples emphasize the theoretical and formal structure of data in relational databases.
 Record: Comes from a more general computer science background and refers to a basic unit of data storage, consisting of fields that describe different attributes of a single entity, like a person or an item. Records are commonly used in practical, application-oriented settings such as software development and database management.
Files

A file is a named collection of related information or records stored on secondary storage such as magnetic disks, optical disks, hard disks and solid-state drives.
Files in a DBMS are containers that store records in an organized manner on disk. These files are specifically structured to facilitate efficient data retrieval, manipulation, and management. A DBMS file typically consists of one or more records.
File Organization

File organization refers to the logical relationships among the various records that constitute the file, particularly with respect to the means of identification and access to any specific record.
Understanding Data Blocks

Definition: Data blocks are the basic units of storage in a DBMS, used to efficiently manage data on secondary storage devices like HDDs and SSDs.
Role of Data Blocks in DBMS:
• Minimizing Disk Accesses: Disk access is relatively slow compared to other operations in a computer system, such as accessing RAM. By storing data in blocks, the DBMS minimizes the number of disk I/O (Input/Output) operations required. When a query is executed, the DBMS loads the relevant blocks into memory rather than accessing each record individually on the disk. This bulk transfer significantly reduces the number of disk accesses, enhancing performance.
• Organized Retrieval and Storage: Blocks help organize data physically on the storage medium in a way that aligns with how data is logically accessed by queries, so records that are frequently used together can be stored together.
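A rough back-of-the-envelope sketch of the I/O saving from blocking; the record count and the blocking factor of 50 records per block are assumed figures for illustration:

```python
RECORDS = 1000
RECORDS_PER_BLOCK = 50   # assumed blocking factor

# Full-scan I/O cost: one disk access per record vs one per block.
per_record_ios = RECORDS
per_block_ios = -(-RECORDS // RECORDS_PER_BLOCK)   # ceiling division
print(per_record_ios, per_block_ios)   # 1000 vs 20 disk accesses
```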
Types of file organizations

• Sequential file organization
• Heap file organization
• Hash file organization
• B+ tree file organization
• Indexed sequential access method (ISAM)
• Cluster file organization
Sequential file organization

Description: In sequential file organization, records are stored one after the other as they are inserted into the database. Records can be accessed sequentially from the beginning to the end of the file.
Two ways:
• Pile method: records are stored in sequence, one after another, in the order in which they are inserted into the table.
• Sorted method: the new record is first inserted at the end of the file, and then the records are sorted into ascending or descending order. Sorting is based on the primary key or any other key.
 Pros
• Simple and easy to implement.
• Efficient for batch processing where most or all records need to be processed.
• Suited to workloads where most records must be accessed, such as calculating students' grades or generating salary slips.
• Useful for report generation and statistical calculations.
 Cons
• Time is wasted because we cannot jump directly to a required record; we must scan the file sequentially.
• The sorted file method takes more time and space, since the file must be re-sorted after insertions.
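The pile and sorted methods can be sketched as follows (the keys are arbitrary example values). The last lines illustrate the main drawback above: locating a record in a pile means scanning from the start.

```python
import bisect

pile = []          # pile method: append in arrival order
sorted_file = []   # sorted method: keep records ordered by key

for key in [40, 10, 30, 20]:
    pile.append(key)                 # O(1) insert, but unordered
    bisect.insort(sorted_file, key)  # insert keeps the file sorted by key

print(pile)         # [40, 10, 30, 20]
print(sorted_file)  # [10, 20, 30, 40]

# Sequential access: finding key 30 in the pile means examining
# records one by one until it appears.
steps = next(i for i, k in enumerate(pile, start=1) if k == 30)
print(steps)        # 3 records examined
```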
Heap file organization

• In heap file organization, records are stored in blocks within the file, but not in any sorted order. As new records are inserted into the database, the DBMS looks for the next available space in the blocks that make up the file.
• If a block has enough space to accommodate a new record, the record is placed there. If not, the DBMS may allocate a new block (if one is available) and place the record in this new block.
 Dynamic Allocation:
• First Fit: The DBMS might use a 'first fit' approach, inserting the new record into the first block it finds with sufficient space.
• Next Available Block: Alternatively, it might append a new block at the end of the file if no existing block has enough space.
Random Placement: In heap file organization, records are stored in no particular order. When a new record is inserted, it is placed in the first available space found in the data blocks. This may not be the next sequential block or any sorted position, reflecting a flexible, unstructured approach to storage.
Suppose we have five records R1, R3, R6, R4 and R5 in a heap, and we want to insert a new record R2. If data block 3 is full, R2 will be inserted into any block with free space selected by the DBMS, say data block 1.
 Pros of heap file organization
• It is a very good method of file organization for bulk insertion. If a large amount of data needs to be loaded into the database at one time, this method is best suited.
• In the case of a small database, fetching and retrieving records is faster than in sequential file organization.
 Cons of heap file organization
• This method is inefficient for large databases because it takes time to search for or modify a record.
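A minimal sketch of the 'first fit' placement described above, assuming a fixed capacity of three records per block (the capacity and record names are invented for the example):

```python
BLOCK_CAPACITY = 3  # assumed number of records per block

def insert_first_fit(blocks, record):
    """Place the record in the first block with free space; else open a new block."""
    for block in blocks:
        if len(block) < BLOCK_CAPACITY:
            block.append(record)
            return
    blocks.append([record])  # no block had room: allocate a new one

blocks = [["R1", "R3", "R6"], ["R4", "R5"]]  # block 1 is already full
insert_first_fit(blocks, "R2")               # lands in block 2, not block 1
print(blocks)
```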
Hash file organization

 When records must be fetched directly by key, hashing is the best technique: we can access the data bucket of any record in O(1) time using its primary key.
 In hash file organization, hashing is used to generate the addresses of the memory blocks where actual records are stored. A hash function generates the address of a data block using the primary key as input; the memory locations generated by the hash function are called data buckets or data blocks.
 When a record has to be retrieved using the hash key columns, the address is generated and the whole record is fetched from that address. In the same way, when a new record has to be inserted, the address is generated using the hash key and the record is inserted directly. The same process applies to delete and update.
 In this method there is no effort spent searching and sorting the entire file; each record is stored randomly in memory.
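A toy sketch of hash-based addressing as described above; the bucket count, record fields, and sample data are assumptions for illustration:

```python
NUM_BUCKETS = 8  # assumed number of data buckets

def bucket_address(primary_key):
    """Hash function mapping a primary key to a data-bucket number."""
    return hash(primary_key) % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(record):
    buckets[bucket_address(record["id"])].append(record)

def fetch(key):
    # O(1) expected: compute the address, then scan only that one bucket.
    return next(r for r in buckets[bucket_address(key)] if r["id"] == key)

insert({"id": 1203, "name": "Vijay"})
insert({"id": 1204, "name": "Meera"})
print(fetch(1203)["name"])   # Vijay
```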
Indexed sequential access method
(ISAM)
ISAM method is an advanced sequential file organization. In
this method, records are stored in the file using the primary
key. An index value is generated for each primary key and
mapped with the record. This index contains the address of
the record in the file.

If any record has to be retrieved based on its index value, the address of its data block is fetched from the index and the record is retrieved from memory.
 Pros of ISAM:
• Since each record's index entry holds the address of its data block, searching for a record in a huge database is quick and easy.
• This method supports range retrieval and partial retrieval of
records. Since the index is based on the primary key values, we
can retrieve the data for the given range of value. In the same
way, the partial value can also be easily searched, i.e., the student
name starting with 'JA' can be easily searched.
 Cons of ISAM
• This method requires extra space in the disk to store the index
value.
• When the new records are inserted, then these files have to be
reconstructed to maintain the sequence.
• When the record is deleted, then the space used by it needs to be
released. Otherwise, the performance of the database will slow
down.
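A simplified sketch of the ISAM idea: a sorted key-to-block-address index searched with binary search. The keys and block addresses are invented; it also shows the range retrieval and prefix-style lookup that the pros above mention.

```python
import bisect

# Sorted index kept separately from the data file:
# primary keys and, in parallel, the block address of each record.
index_keys = [101, 105, 112, 120]
index_addr = ["blk0", "blk1", "blk2", "blk3"]

def lookup(key):
    """Exact-match lookup: binary-search the sorted index."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return index_addr[i]
    return None

def range_lookup(lo, hi):
    """Range retrieval: binary-search both ends, slice the sorted index."""
    i = bisect.bisect_left(index_keys, lo)
    j = bisect.bisect_right(index_keys, hi)
    return index_addr[i:j]

print(lookup(112))            # blk2
print(range_lookup(104, 115)) # ['blk1', 'blk2']
```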
B+ tree file organization

• It is an advanced version of the indexed sequential access method. It uses a tree-like structure to store records in a file.
• It uses the same key-index concept, where the primary key is used to sort the records. For each primary key, an index value is generated and mapped to the record.
• The B+ tree is similar to a binary search tree (BST), but a node can have more than two children. All records are stored only at the leaf nodes; intermediate nodes act as pointers to the leaf nodes and do not contain any records.
 The example B+ tree (shown as a figure in the original slide) illustrates:
• There is one root node of the tree, i.e., 25.
• There is an intermediary layer of nodes that do not store actual records; they hold only pointers to the leaf nodes.
• Nodes to the left of the root contain values prior to the root, and nodes to the right contain values after it, i.e., 15 and 30 respectively.
• The leaf nodes hold the actual values, i.e., 10, 12, 17, 20, 24, 27 and 29.
• Searching for any record is easy because all the leaf nodes are at the same, balanced depth.
• Any record can be reached by traversing a single root-to-leaf path.
 Pros of B+ tree file organization
• Searching becomes very easy, as all records are stored only in the leaf nodes and sorted in a sequential linked list.
• Traversing the tree structure is easy and fast.
• The size of the B+ tree is not restricted, so the number of records can grow or shrink and the tree structure grows or shrinks with it.
• It is a balanced tree structure, and any insert/update/delete does not affect its performance.
 Cons of B+ tree file organization
• This method is inefficient for static tables.
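A minimal two-level sketch loosely following the example values above (the separator keys are illustrative, not the exact figure): internal nodes hold only routing separators, all values live in sorted leaves, and the leaf-level ordering is what makes range queries cheap.

```python
# Leaves hold the actual values, kept sorted and linked left-to-right.
leaves = [[10, 12], [17, 20, 24], [27, 29]]
separators = [15, 25]   # internal-node keys routing a search to a leaf

def search(key):
    # Descend from the root: pick the child by comparing against separators.
    child = sum(key >= s for s in separators)
    return key in leaves[child]

def range_scan(lo, hi):
    # Leaves form one sorted sequence, so a range query just walks them.
    return [v for leaf in leaves for v in leaf if lo <= v <= hi]

print(search(17))          # True
print(search(18))          # False
print(range_scan(12, 27))  # [12, 17, 20, 24, 27]
```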
Cluster file organization

 In cluster file organization, two or more records/tables are combined into a single file based on a cluster key or hash clusters. These files contain two or more tables in the same memory block, all joined through a single cluster key/hash key.
• This method reduces the cost of searching for various
records in different files.
• The cluster file organization is used when there is a frequent need to join tables on the same condition, where the joins return only a few records from both tables. In the slide's example (a figure), records are retrieved for particular departments only; this method is not suited to retrieving records for every department.
Data Abstraction

 The database system contains intricate data structures and relations. Developers keep this complexity away from the user so that the user can comfortably access data in the database, seeing only the data they want; this is achieved through data abstraction.
 The main purpose of data abstraction is to hide irrelevant
data and provide an abstract view of the data. With the help
of data abstraction, developers hide irrelevant data from
the user and provide them the relevant data. By doing this,
users can access the data without any hassle, and the
system will also work efficiently.
Levels of Data Abstractions in DBMS
 1. Physical or Internal Level:
 The physical or internal layer is the lowest level of data
abstraction in the database management system. It is the
layer that defines how data is actually stored in the
database. It defines methods to access the data in the
database. It defines complex data structures in detail, so it
is very complex to understand, which is why it is kept
hidden from the end user.
 Database Administrators (DBAs) decide how to arrange data and where to store it. The DBA is the person whose role is to manage the data in the database at the physical or internal level, where the raw data is stored in detail and secured, e.g., on hard drives in a data center.
2. Logical or Conceptual Level:
 The logical or conceptual level is the intermediate or next

level of data abstraction. It explains what data is going to


be stored in the database and what the relationship is
between them.
 It describes the structure of the entire data in the form of tables. The logical or conceptual level is less complex than the physical level. With its help, Database Administrators (DBAs) abstract the data from the raw data present at the physical level.
 3. View or External Level:

 View or External Level is the highest level of data


abstraction. There are different views at this level that
define the parts of the overall data of the database. This
level is for the end-user interaction; at this level, end users
can access the data based on their queries.
Advantages of data abstraction in
DBMS
• Users can easily access the data based on their
queries.
• It provides security to the data stored in the database.
• Database systems work efficiently because of data
abstraction.
Data Independence

It refers to the system's ability to change the database


schema at one level of a database system without having
to change the schema at the next higher level. It is
designed to manage changes in database structure without
affecting the data access or associated applications. Data
independence is typically divided into two types:
1. Logical Data Independence
2. Physical data Independence
1. Logical Data Independence

Logical data independence is the capacity to alter the


logical schema without altering the external schema or
application programs.
 The logical schema involves the design and structure

of the database, including the tables, fields, and


constraints used to define how data is stored and
accessed.
 Changes could include adding new fields, tables, or

relationships without affecting how applications view


the data.
 This type of independence is crucial for evolving the data model without disrupting the user interface or application layer.
2. Physical Data Independence

 Physical data independence refers to the ability to


modify the physical schema without changing the
logical schema or applications.
 The physical schema deals with the physical storage of
the data, including how it is organized on disk,
indexing mechanisms, and other storage details.
 For instance, changes might involve switching to new
storage devices, changing indexing strategies,
or modifying the file organization techniques.
 The main benefit is that these adjustments can be
made for performance optimization or cost efficiency
without affecting how data is queried or managed at
the application level.
Importance of Data
Independence
Data independence offers several significant
advantages:
• Flexibility: It allows database administrators to adjust

database storage and structure as technological and


organizational requirements change without
necessitating changes to the applications that use the
database.
• Reduced Maintenance Costs: Changes in the

database at one level do not require alterations to


other levels, thus minimizing the maintenance costs
associated with adapting applications to new database
structures.
• Enhanced Security and Abstraction: It provides an additional layer of abstraction, which helps enhance security and simplify complex database structures for users.
Data Integration in DBMS

Data Integration in DBMS is the process of


combining data from multiple sources into a single,
unified view to provide consistent and meaningful
information. It ensures that data from different
databases, applications, or systems can be accessed and
used efficiently without inconsistencies.
Importance of Data Integration
• Combines heterogeneous data sources into a

single database.
• Ensures data consistency and accuracy.

• Eliminates data redundancy and improves data

quality.
• Facilitates business intelligence and decision-

making.
Components of Data Integration
Views in DBMS

Definition:
 A view in DBMS is a virtual table that provides a customized representation of data from one or more tables. It does not store data physically but dynamically fetches data when queried.
 Key Characteristics of Views:
1. Virtual Table → Does not store data but retrieves it from base tables.
2. Simplifies Queries → Provides a predefined structure for complex SQL queries.
3. Ensures Security → Restricts access to sensitive columns or rows.
4. Logical Independence → Schema changes in the base table do not affect the view.
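The virtual-table behavior can be demonstrated with sqlite3; the employee table, its columns, and the view name emp_public are invented for this sketch (the view hides the sensitive salary column):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", 90000), (2, "Ravi", 75000)])

# The view stores no data; it re-runs its defining query every time it is read.
conn.execute("CREATE VIEW emp_public AS SELECT id, name FROM employee")

print(conn.execute("SELECT * FROM emp_public").fetchall())

# Rows added to the base table appear in the view automatically.
conn.execute("INSERT INTO employee VALUES (3, 'Meera', 80000)")
print(conn.execute("SELECT COUNT(*) FROM emp_public").fetchone()[0])  # 3
```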
Data Associations in DBMS

Data associations in DBMS refer to the relationships between different data entities in a database. These associations define how tables (or records) are linked and how they interact with each other.
Importance of Data Associations
• Ensures data consistency and integrity.
• Avoids data redundancy by normalizing relationships.
• Enables efficient querying by defining how data is related.
• Supports complex operations like joins, constraints, and relationships.
Types of Data Associations in DBMS

1. One-to-One (1:1) Relationship
2. One-to-Many (1:M) Relationship
3. Many-to-Many (M:N) Relationship
Types of Data Associations in DBMS
1. One-to-One (1:1) Relationship
• Each record in Table A is associated with one record in Table B.
• Used for storing additional details of an entity.
 Example:
 A Passport is associated with only one Person.
2. One-to-Many (1:M) Relationship
• One record in Table A is associated with multiple records in Table B.
• Most common relationship in databases.
 Example:
 A Professor can teach multiple Courses, but each Course is taught by only one Professor.
Types of Data Associations in DBMS
3. Many-to-Many (M:N) Relationship
• A record in Table A can be related to multiple records in Table B, and vice versa.
• Implemented using a junction (bridge) table.
 Example:
 A Student can enroll in multiple Courses, and a Course can have multiple Students.
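The M:N student-course example can be sketched with a junction table in sqlite3; the table names, columns, and sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (cid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE enrollment (              -- the junction (bridge) table
    sid INTEGER REFERENCES student(sid),
    cid INTEGER REFERENCES course(cid),
    PRIMARY KEY (sid, cid));
INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO course  VALUES (10, 'DBMS'), (11, 'Networks');
INSERT INTO enrollment VALUES (1, 10), (1, 11), (2, 10);
""")

# Courses taken by student 1: join through the junction table.
rows = conn.execute("""
    SELECT c.title FROM course c
    JOIN enrollment e ON c.cid = e.cid
    WHERE e.sid = 1 ORDER BY c.title""").fetchall()
print([r[0] for r in rows])   # ['DBMS', 'Networks']
```

The composite primary key (sid, cid) also prevents the same student from enrolling in the same course twice.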
Classification of Data Models

Data models are frameworks that define how data is structured, stored, and retrieved in a database system. They are broadly classified into four categories:
1. Hierarchical Data Model
2. Network Data Model
3. Relational Data Model
4. Object-Oriented Data Model
Hierarchical Data Model

The Hierarchical Data Model is one of the oldest data


models that organizes data in a tree-like structure, which
closely resembles a family tree or an organization chart. This
model was popularized by IBM's Information Management
System (IMS) in the 1960s and is known for its
straightforward, simple structure, which is based on parent-
child relationships where each parent can have multiple
children, but each child has only one parent.
 Key Characteristics
1. Structure: Data is organized hierarchically: there is a single "root" or top level, and multiple levels of related records beneath it. The relationships between records are one-to-many, with each parent record having one or more child records linked to it.
2. Navigation: Access to data starts at the root and moves down through the structure, which allows efficient traversal along the predefined hierarchical paths.
Hierarchical Data Model

 Advantages
• Performance: For certain types of queries, especially those
that can follow the predefined hierarchical paths, the
performance can be very good because the relationships
are fixed and there is no need for complex joins.
• Simplicity: It is relatively simple and straightforward to
implement and understand, particularly in scenarios where
relationships are naturally hierarchical (e.g., organizational
structures, product categories).
 Disadvantages
• Flexibility: The hierarchical model is less flexible compared
to other models like the relational model. The structure is
rigid, so making changes to the database schema (like
adding new fields or changing relationships) can be difficult
and may require significant restructuring.
• Limited Relationships: The model supports only one-to-many relationships; many-to-many relationships cannot be represented directly.
Hierarchical Data Model

• Data Redundancy: There is potential for data redundancy


if the hierarchical structure is not well designed. For
instance, data common to several child records needs to be
duplicated for each child.
 Use Cases

• Organizational charts: Representing company hierarchies

where each entity (employee or department) has a clear,


singular upward link to a supervisory or higher-level
department.
• File systems: Organizing files and folders where each

folder can contain multiple files or subfolders but each file


or subfolder is only in one folder.
Some popular systems based on this model are still in use today:
IBM Information Management System (IMS): still used, mainly in industries that require high-throughput transaction processing, such as banking and manufacturing.
Network Data Model

The Network Data Model is a database model


conceived as a flexible approach to representing objects
and their relationships. Its development in the 1970s was
primarily driven by the limitations of the hierarchical
data model, particularly its rigidity and the difficulty in
representing many-to-many relationships.
Key Characteristics of the Network Data Model:
1. Complex Relationships: Unlike the hierarchical model,

which strictly allows one-to-many relationships, the network


model can accommodate many-to-many relationships. This
capability makes it more versatile in representing real-world
relationships where a single entity can be connected to
many other entities, and vice versa.
2. Graph Structure: The model is conceptualized as a graph

where both entities (records) and relationships (links) are


directly represented. Each record can be connected to
multiple other records through links, which are not
restricted to a hierarchy. This structure is akin to a network
of nodes (entities) interconnected by edges (relationships).
3. Record Ownership: In the network model, a record can have multiple owners, meaning it can be linked to multiple parent records. This is a significant shift from the hierarchical model's single-parent restriction.
Advantages:
• Flexibility: More flexible in modeling complex relationships

and organizational structures than the hierarchical model.


• Concurrency and Navigation: Efficient at managing large

databases and handling complex queries and relationships


due to its ability to quickly traverse networks.
• Data Integrity: Supports robust data integrity rules,

allowing for the representation of complex


interdependencies among data.
Disadvantages:
• Complexity: More complex to design and manage

compared to the hierarchical and even some aspects of the


relational models. Understanding and navigating the
network of relationships can be challenging.
• Database Management: Requires sophisticated database management software and skilled administration.
Use cases:
1. Telecommunications: Useful for managing complex and

interlinked network systems, such as telephone networks or


internet infrastructure.
2. Transportation Networks: Ideal for managing data about

systems where many paths might exist between nodes,


such as road networks, airline flights, and logistics
networks.
3. Resource Management: Effective in environments where

resources are interconnected in complex ways, such as


power grid management or water distribution networks.
Software Examples:
 Integrated Data Store (IDS): the first database system based on the network data model.
 CA-IDMS (Integrated Database Management System)
Relational Data Model

 A Relational Database Management System (RDBMS) is a


type of database management system (DBMS) that stores
data in a structured format, using rows and columns.
RDBMSs use the relational model as their foundation, which
was introduced by Edgar F. Codd in 1970. This model
organizes data into one or more tables (or "relations") of
columns and rows, with a unique key identifying each row.
Relational Data Model

 Key Features of RDBMS:


1. Tables: The fundamental building blocks of RDBMS are

tables. Each table is organized into rows and columns, and


each row in a table represents a record. Each column in a
table represents a data field. All rows can have similar or
related data entries.
2. Primary Keys: Each table in an RDBMS is required to have

a unique primary key that identifies its rows. This is crucial


for relating tables to each other.
3. Foreign Keys: These are identifiers that create a link

between two tables. By using foreign keys, tables in a


relational database can be related to each other, enabling
cross-referencing of records across tables.
4. SQL: Structured Query Language (SQL) is the standard language for managing and manipulating databases in an RDBMS. SQL can be used to perform all types of data operations, including querying, inserting, updating, and deleting records.
Relational Data Model

Normalization: This is a process used to minimize data


redundancy and dependency by organizing fields and table
relationships. Normalization typically involves dividing large
tables into smaller (and less redundant) tables and defining
relationships between them.
ACID Properties: Relational databases are designed to
guarantee ACID properties (Atomicity, Consistency, Isolation,
Durability), which ensure reliable processing of database
transactions.
 Advantages of RDBMS:

• Data Integrity: Various data integrity enforcing methods

like entity integrity, referential integrity, and user-defined


integrity ensure high data accuracy.
• Ease of Use: SQL makes it relatively easy to manage and

query data, even for complex and large datasets.


• Flexibility: Data can be accessed in many different ways without reorganizing the underlying tables.
Relational Data Model

• Security: RDBMSs provide robust data security features,


including user authentication and access permissions to
secure data against unauthorized access.
 Disadvantages of RDBMS:
• Scalability: Traditional RDBMSs can struggle with very
large databases or hugely scalable environments.
Horizontal scaling is particularly challenging.
• Complexity: Managing the schema of relational databases
can become complex as the number of tables and
relationships grows.
• Cost: Advanced RDBMS software can be expensive due to
licensing, integration, and maintenance costs.
 Common Examples of RDBMS:
• Oracle Database, MySQL, Microsoft SQL Server,
PostgreSQL, IBM DB2
Object-Oriented Data Model

This model integrates object-oriented programming and


database technology to store objects directly in the
database rather than converting them into rows and
columns, as is done in relational databases.
Examples where an OODBMS is particularly well suited:
Complex Engineering and Design Applications (CAD/CAE):


• OODBMS effectively manages complex entities like parts and

assemblies, facilitating natural modeling of intricate relationships


and configurations.
Multimedia Databases:
• Supports encapsulation of multimedia objects and operations (like

resize or compress), enhancing performance and simplifying


application development.
Real-Time Systems and Simulations (Flight Simulators,
Gaming Engines):
• Offers direct storage and manipulation of simulation objects,

crucial for maintaining high performance and responsiveness in


real-time applications.
Scientific Research and Complex Data Analysis (Bioinformatics, Ecological Research):
• Ideal for managing complex hierarchical data, supporting advanced features like polymorphism and inheritance to model intricate domain entities.
How is data stored in OODBMS

Data in an Object-Oriented Database Management System


(OODBMS) is not stored in a tabular format like it is in a
Relational Database Management System (RDBMS). Instead,
data in OODBMS is stored in a format that mirrors the object
structures used in object-oriented programming. Here's how
this works:
Object Storage
• Object Representation: In OODBMS, data is stored as

objects, not as rows in a table or records. Each object


retains its identity and state which corresponds to the
attributes defined in its class.
• Object Identity: Each object has a unique object identifier

(OID) that is used to reference it, similar to how an object


reference or a pointer works in programming languages.
How is data stored in OODBMS

• Attributes and Methods: Objects store both fields


(attributes) and behaviors (methods) as defined in their
class in the programming environment. This allows objects
in the database to be used directly in applications without
needing to reconstruct them from relational data.
 Relationships
• References: Relationships between
objects are maintained as direct
references from one object to
another using their OIDs. This is
fundamentally different from
RDBMS, where relationships are
usually represented using foreign
keys and require joins to navigate.
Properties of Object oriented
programming
• Inheritance: OODBMS directly supports inheritance,
meaning objects can inherit attributes and methods from
superclass objects. This hierarchical relationship is
maintained directly within the database.
• Encapsulation: Objects encapsulate their data and
behavior, which is in contrast to the relational model where
data and its manipulations (queries and transactions) are
separate.
 Polymorphism: This concept is the ability to use a single
interface for different types of data.
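The three properties above can be sketched with plain Python classes, the style of object an OODBMS stores directly; the class names and area values are invented for this sketch:

```python
class Shape:                       # superclass
    def __init__(self, name):
        self._name = name          # encapsulated state
    def area(self):                # a single interface ...
        raise NotImplementedError

class Square(Shape):               # inheritance: reuses Shape's structure
    def __init__(self, side):
        super().__init__("square")
        self._side = side
    def area(self):                # ... with type-specific behavior
        return self._side ** 2

class Circle(Shape):
    def __init__(self, r):
        super().__init__("circle")
        self._r = r
    def area(self):
        return 3.14159 * self._r ** 2

# Polymorphism: one call site works for every stored object type.
objects = [Square(3), Circle(1)]
print([round(o.area(), 2) for o in objects])   # [9, 3.14]
```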
 Object-Oriented Database Management Systems (OODBMS) offer
several advantages over traditional Relational Database
Management Systems (RDBMS), particularly when it comes to
handling complex data and relationships. Here are some key
advantages:
1. Complex Data Handling: OODBMS can store complex data
objects directly, which is particularly useful for applications that
deal with extensive data modeling, such as CAD/CAM, software
engineering, and multimedia databases. This direct handling can
simplify the design and development of applications by
maintaining consistency between the data model and the
programming objects.
2. Performance: OODBMS often provides better performance for
certain workloads because they can access data directly through
pointers, avoiding the overhead of joining tables as in RDBMS. This
can lead to significant performance gains, especially in
applications where the relationships between objects are fixed and
can be navigated quickly.
1. Inheritance: OODBMS supports the concepts of inheritance and
polymorphism, allowing classes to be defined with inherited
properties from other classes. This can reduce redundancy and
promote data integrity by ensuring that all inherited objects
maintain the characteristics of their parent objects, which is not
inherently supported in RDBMS.
2. Encapsulation: Objects in an OODBMS encapsulate both data and
behavior, which can help secure the data by restricting access to it
through the object’s methods. This feature supports a higher level
of data integrity and security compared to RDBMS, where data and
behavior are typically separate.
3. Version Control: OODBMS often includes built-in versioning
capabilities, allowing objects to be versioned and managed
effectively. This is especially useful in applications like software
engineering, where different versions of objects need to be
managed and maintained simultaneously.
1. Development Productivity: Since OODBMS concepts are aligned
with object-oriented programming principles, there is a natural
integration between the database and the application
development languages (like Java, C++, or Python). This can lead
to increased developer productivity as there is less impedance
mismatch between the database and the programming language.
2. Scalability: OODBMS can offer more flexibility in scaling
applications, especially when dealing with distributed systems.
They naturally support distributed architectures, making it easier
to distribute data across multiple servers while maintaining object
integrity.
3. Flexibility: Due to their schema-less nature, OODBMS can be
more flexible in adapting to changes. Modifying the database
schema does not require significant changes to the queries
accessing the data, unlike RDBMS, where even small changes in
the schema could require significant changes to all associated SQL
queries.
Thank You