Unit 1 Dbms
Unit 1 Dbms
Database Management System (DBMS) is a software used to manage data from a database.
A relational database stores data in the form of tables and a NoSQL database in the
form of key-value pairs.
Examples of relational DBMS are MySQL, Oracle, Microsoft SQL Server, Postgre
SQL and Snowflake.
Data and Information are important concepts in the world of computing and decision-making.
Data is defined as unstructured information such as text, observations, images, symbols, and
descriptions on the other hand, Information refers to processed, organized, and structured
data. It gives context to the facts and facilitates decision-making.
Data Vs Information
What is Data?
Data is a raw and unorganized fact that is required to be processed to make it meaningful. It
can be considered as facts and statistics collected together for reference or analysis.
Data are individual units of information. In analytical processes, data are represented by
variables. Data is always interpreted, by a human or machine, to derive meaning. So, data is
meaningless. Data contains numbers, statements, and characters in a raw form.
Types of Data
There are two types of Data:
1. Quantitative: Quantitative data refers to numerical information like weight, height, etc.
2. Qualitative: Qualitative data refers to non-numeric information like opinions,
perceptions, etc.
What is Information?
Information is measured in
Bits and Bytes are the measuring
meaningful units like time,
unit of data.
Measurement quantity, etc.
Knowledge
It is low-level knowledge. It is the second level of knowledge.
Level
Data Example
Temperature Readings: Numbers representing temperature throughout the day,
such as "72°F", "68°F", "75°F".
Student Grades: A list of numerical scores obtained by students on a test, like "85",
"92", "78".
Stock Prices: Daily closing prices of a company's stock, such as "$50.25", "$48.90",
"$52.10".
Information Example
Weather Report: Based on the temperature reading a weather report can be generated.
Grade Average: Based on the student grades, the average grade of class can be derived.
Market Analysis: The stock market showed a slight increase today derived from stock
prices.
In each case, the raw data ( temperature readings, student grades, stock prices) becomes
meaningful information after being processed, analyzed, and presented in a relevant context.
Data
Data is the raw material that can be processed for any computing machine.
For example − Employee name, Product name, Name of the student, Marks of the student,
Mobile number, Image etc.
Information
Information is the data that has been converted into more useful or intelligent form.
For example: Report card sheet.
The information is needed for the following reasons −
To gain knowledge about the surroundings.
To keep the system up to date.
To know about the rules and regulations of the society.
Knowledge
The human mind purposefully organizes the information and evaluates it to produce
knowledge.
Example of data, information and knowledge
A student secures 450 marks. Here 450 is data, marks of the student is the information and
hard work required to get the marks is knowledge.
Differences
The major differences between Data and Information are as follows −
Data Information
Diagram
The diagram given below depicts the use of data and information in a database −
FILE SYSTEM:
Computer Systems or before their heavy usage, the data were used to be stored in files
manually. This means that the same data could have been present multiple times in a single
institution. Although it was easy to find those data as one piece of data was present in
multiple places (Data Redundancy) but accessing those data was not an easy task to do,
every time someone who wanted to access those files should have to write a new computer
program (in languages like C/C++) this technique was only suitable for small organizations
where the amount of data to be stored was relatively lesser. Each unit of information stored
in files was known as a "flat file".
What is the Traditionally used "File System"?
The traditionally used "File Systems" were nothing but a manual way of storing data as
"Files". Considering a scenario of a bank before the introduction of DBMS, for example,
say someone went to the bank to deposit a certain amount in their account. So as the DBMS
is not available so the bank employee has to manually register their account number, name,
and amount in either a written manner or type and store them locally in the computer as
a file. The problem which might arise that while writing if the employee mistakenly writes
any digit of their account number or amount wrong then there would be a major issue and
as there is no Database so it would be really hard to know what was the last state of that
person's account before this misshaped deposit.
Banking
system that uses File Based Approach
In the diagram above we can see how the details used to be stored by bank employees
before the introduction of DBMS. Each department would handle some specific tasks and
store the data locally in their computers or registers without knowing what is happening in
the other department.
Characteristics of Traditional File Systems
The files stored in different departments were independent of each other, which caused
severe data redundancy.
Those files were developed using programming languages like COBOL, C, and C++.
Each file includes information for a particular department or region, such as the library,
tuition, and students' exams.
The traditional file system is way less flexible than DBMS and has many disadvantages.
The maintenance of those files was also of high cost.
Each of the units of "Files" used to be known as "Flat Files".
Problem in File System
Disadvantages of the Traditional File Systems
The following issues are related to the file-based approach:
Distinguished and Isolated Data: Imagine a user needs information that is not possible
to be provided using a single file, multiple different files were required, which are
situated in different departments. So all the employees first need to manually and
carefully check each of the files in each department and find the relationship between
them to decipher the information that the user wants.
Data Duplication / Data Redundancy - Due to the manual storing of data, the same
data used to be present in multiple locations using the space in each of the Hard Disk,
below are some following reasons for which Data Redundancy can be a major issue
o Storing the same data multiple times not only wastes resources in every
machine but also is costly to maintain and wastes time.
o Loss of data integrity is another major issue of Data Redundancy, imagine
someone's address is present in multiple systems and he has applied to
update the address, in one system the address gets updated but in the rest of
them it remains the same, so the if someone from any different department
where the data is not updated tries to send them any letter or something then
it would go to a wrong address.
Dependence on Data - Files and information were stored in a certain specific format in
files which is hard coded by programmers in languages like C/C++, COBOL, etc. So if
any of the file's format changes then the programmers need to update the code every
time and the format of every piece of data stored in that file will be changed, which is a
rigorous task for programmers.
Data representation is challenging from the user's perspective - The data stored in
files must be represented in such a manner that every user can understand that data
clearly and easily, often it would become hectic as the programmers need to go through
different systems and find the proper connection between data and merge them together
and represent them in a more human-readable format
Different file types - The file structure would vary based on the programming language
that was used to store them, for example, if a system used COBOL to store certain data
then the structure of that data would differ from the one written in C/C++.
Data Protection - Data protection was very less due to different reasons like Data
Redundancy, manual storing of data, easy access of confidential data by unauthorized
parties, etc.
Issues with Transactions - It didn't follow the ACID (Atomicity, Consistency,
Isolation, and Durability) properties, for that if in the middle of any transaction the
system crashed then it would leave the system in an inconsistent state.
Concurrent issues - Two or more users can view the same file simultaneously, but the
problem arises when they try to update the same file simultaneously.
Database-Based Approach
Keeping in mind the disadvantages of the Traditionally used File Systems, the DBMS was
born. The Database approach nearly solved all the problems faced by traditional file
systems and it also has its own certain advantages which are beneficial for all sizes of
organizations in data storing and data handling.
Importance and Meaning of Data
Data are just structured facts and measurements which are True and construct meaning.
Data is used for many different purposes like calculations, different discussions, proofs, etc.
It doesn't only include some textual information, data can be numbers, events, certain
actions, etc. Data can be stored in many different places which include -
Spreadsheets
Folders
Lists
Digital drives like Hard Disk, SSD, Floppy Disk, CD etc
A database is like a bag that can hold different types of information, no matter in which
format they come.
Advantages of using Database System in place of Traditional File Systems
Redundancy and inconsistency in the data - The data redundancy problem of the
Traditional File System has been minimized by the DBMS, here one piece of data can
be present in a single place only without scattering throughout multiple places
Data Exchange - Exchange of Data between two or more devices have become so
hassle-free with the introduction of the Database system. As a single database is
accessible by every employee anyone can make any changes to the database or
exchange data with each other.
Concurrent Data - DBMS provides a locking system using which one or more users
can only read the same data but can't make any changes simultaneously.
Searching Data - In the case of the Traditional file systems, the programmer needs to
write lengthy programs every time to fetch certain information, with DBMS some 2-3
line query is enough to fetch as many data as we want. Also, one language is supported
by many databases (with a slight variety of syntax).
Data Reliability - Data reliability is also high in DBMS as it supports user-defined data
types also apart from the traditional in-built data types.
System Failure - As DBMS follows ACID properties, even if a system failure happens
in between a transaction, nothing will be lost, and we can restart that transaction from
its previous stable state.
Data Protection - DBMS comes up with lots of methods to protect the data stored
inside it rather than just Passwords
Backup of Data - Data Backup is possible in DBMS, which was not present in the
Traditional File systems.
Variation of Interfaces - DBMS comes with different kinds of interfaces like
Graphicals or Tabular.
Maintenance of the Database - As DBMS is a centralized structure it is easier to
maintain it rather than the Traditional File systems.
Database System
What is a DBMS?
A DBMS is a system that allows users to create, modify, and query databases while ensuring
data integrity, security, and efficient data access. Unlike traditional file systems, DBMS
minimizes data redundancy, prevents inconsistencies, and simplifies data management with
features like concurrent access and backup mechanisms. It organizes data into tables, views,
schemas, and reports, providing a structured approach to data management.
Example:
A university database can store and manage student information, faculty records, and
administrative data, allowing seamless retrieval, insertion, and deletion of information as
required.
1. Data Modeling: Tools to create and modify data models, defining the structure and
relationships within the database.
2. Data Storage and Retrieval: Efficient mechanisms for storing data and executing
queries to retrieve it quickly.
3. Concurrency Control: Ensures multiple users can access the database
simultaneously without conflicts.
4. Data Integrity and Security: Enforces rules to maintain accurate and secure data,
including access controls and encryption.
5. Backup and Recovery: Protects data with regular backups and enables recovery in
case of system failures.
Types of DBMS
There are several types of Database Management Systems (DBMS), each tailored to different
data structures, scalability requirements, and application needs. The most common types are
as follows:
RDBMS organizes data into tables (relations) composed of rows and columns. It uses
primary keys to uniquely identify rows and foreign keys to establish relationships between
tables. Queries are written in SQL (Structured Query Language), which allows for efficient
data manipulation and retrieval.
2. NoSQL DBMS
NoSQL systems are designed to handle large-scale data and provide high performance for
scenarios where relational models might be restrictive. They store data in various non-
relational formats, such as key-value pairs, documents, graphs, or columns. These flexible
data models enable rapid scaling and are well-suited for unstructured or semi-structured data.
Database Languages
Database languages are specialized sets of commands and instructions used to define,
manipulate, and control data within a database. Each language type plays a distinct role in
database management, ensuring efficient storage, retrieval, and security of data. The primary
database languages include:
DDL is the short name for Data Definition Language, which deals with database schemas and
descriptions, of how the data should reside in the database.
CREATE: to create a database and its objects like (table, index, views, store
procedure, function, and triggers)
TRUNCATE: remove all records from a table, including all spaces allocated for the
records are removed
DML focuses on manipulating the data stored in the database, enabling users to retrieve, add,
update, and delete data.
TCL commands oversee transactional data to maintain consistency, reliability, and atomicity.
SAVEPOINT: Sets a point within a transaction to which one can later roll back.
SELECT: The primary DQL command, used to query data from the database without
altering its structure or contents.
A Data Model in Database Management System (DBMS) is the concept of tools that are
developed to summarize the description of the database. Data Models provide us with a
transparent picture of data which helps us in creating an actual database. It shows us from the
design of the data to its proper implementation of data.
The conceptual data model describes the database at a very high level and is useful to
understand the needs or requirements of the database. It is this model, that is used in the
requirement-gathering process i.e. before the Database Designers start making a particular
database. One such popular model is the entity/relationship model (ER model). The E/R
model specializes in entities, relationships, and even attributes that are used by database
designers. In terms of this concept, a discussion can be made even with non-computer
science(non-technical) users and stakeholders, and their requirements can be understood.
Entity-Relationship Model( ER Model): It is a high-level data model which is used to
define the data and the relationships between them. It is basically a conceptual design of any
database which is easy to design the view of data.
Components of ER Model:
Conceptual data models known as Domain models create a common vocabulary for all
stakeholders by establishing basic concepts and scope
This type of data model is used to represent only the logical part of the database and does not
represent the physical structure of the database. The representational data model allows us to
focus primarily, on the design part of the database. A popular representational model is
a Relational model. The relational Model consists of Relational Algebra and Relational
Calculus. In the Relational Model, we basically use tables to represent our data and the
relationships between them. It is a theoretical concept whose practical implementation is
done in Physical Data Model.
The advantage of using a Representational data model is to provide a foundation to form the
base for the Physical model.
The physical Data Model is used to practically implement Relational Data Model.
Ultimately, all data in a database is stored physically on a secondary storage device such as
discs and tapes. This is stored in the form of files, records, and certain other data structures. It
has all the information on the format in which the files are present and the structure of the
databases, the presence of external data structures, and their relation to each other. Here, we
basically save tables in memory so they can be accessed efficiently. In order to come up with
a good physical model, we have to work on the relational model in a better way. Structured
Query Language (SQL) is used to practically implement Relational Algebra.
This Data Model describes HOW the system will be implemented using a specific DBMS
system. This model is typically created by DBA and developers. The purpose is actual
implementation of the database.
The physical data model describes data need for a single project or application though it
maybe integrated with other physical data models based on project scope.
Data Model contains relationships between tables that which addresses cardinality and
nullability of the relationships.
Developed for a specific version of a DBMS, location, data storage or technology to be
used in the project.
Columns should have exact datatypes, lengths assigned and default values.
Primary and Foreign keys, views, indexes, access profiles, and authorizations, etc. are
defined
1. Hierarchical Model
The hierarchical Model is one of the oldest models in the data model which was developed by
IBM, in the 1950s. In a hierarchical model, data are viewed as a collection of tables, or we
can say segments that form a hierarchical relation. In this, the data is organized into a tree-
like structure where each record consists of one parent record and many children. Even if the
segments are connected as a chain-like structure by logical associations, then the instant
structure can be a fan structure with multiple branches. We call the illogical associations as
directional associations.
2. Network Model
The Network Model was formalized by the Database Task group in the 1960s. This model is
the generalization of the hierarchical model. This model can consist of multiple parent
segments and these segments are grouped as levels but there exists a logical association
between the segments belonging to any level. Mostly, there exists a many-to-many logical
association between any of the two segments.
In the Object-Oriented Data Model, data and their relationships are contained in a single
structure which is referred to as an object in this data model. In this, real-world problems are
represented as objects with different attributes. All objects have multiple relationships
between them. Basically, it is a combination of Object Oriented programming and a
Relational Database Model.
The float data model basically consists of a two-dimensional array of data models that do not
contain any duplicate elements in the array. This data model has one drawback it cannot store
a large amount of data that is the tables can not be of large size.
The Context data model is simply a data model which consists of more than one data model.
For example, the Context data model consists of ER Model, Object-Oriented Data Model,
etc. This model allows users to do more than one thing which each individual data model can
do.
Semi-Structured data models deal with the data in a flexible way. Some entities may have
extra attributes and some entities may have some missing attributes. Basically, you can
represent data here in a flexible way.
1. In the case of a vast database, sometimes it becomes difficult to understand the data
model.
2. You must have the proper knowledge of SQL to use physical models.
3. Even smaller change made in structure require modification in the entire application.
4. There is no set data manipulation language in DBMS.
5. To develop Data model one should know physical data stored characteristics.
Database Management System (DBMS) is a software used to manage data from a database.
A database is a structured collection of data that is stored in an electronic device. The
data can be text, video, image or any other format.
A relational database stores data in the form of tables and a NoSQL database in the
form of key-value pairs.
SQL (Structured Query Language) is a standard language for accessing and
manipulating data in a relational database.
A DBMS is a software that allows to create, update and retrieval of data in an
organized way. It also provides security to the database.
Examples of relational DBMS are MySQL, Oracle, Microsoft SQL Server, Postgre
SQL and Snowflake.
Examples of NoSQL DBMS are MongoDB, Cassandra, DynamoDB and Redis.
Importance of database
What is a database management system (DBMS)?
A database management system (DBMS) is a software system for creating and managing
databases. A DBMS enables end users to create, protect, read, update and delete data in a
database. It also manages security, data integrity and concurrency for databases.
The most prevalent type of data management platform, the DBMS essentially serves as an
interface between databases and users or application programs, ensuring that data is
consistently organized and remains easily accessible.
Key characteristics of Database Systems:
Ease of Use
Less Duplication
Multiple Layouts and Presentations
Limited Redundancy
Data Recovery and Backup Plan
Data Security
Reduces Storage Space
Maintaining Integrity
Advantages of DBMS
Redundancy:
Database systems help to eliminate the need for redundant data, which can take up
valuable storage space and increase processing time.
Consistency:
Database systems help ensure data consistency by storing data in a central location
and providing controlled access.
Security:
It helps to protect data from unauthorized access and provides controlled access to
authorized users.
Flexibility:
It offers great flexibility regarding the data types that can be stored and accessed.
Data Sharing:
A Database Management System (DBMS) enables users to quickly and securely
access data, whether stored on an On-premise Database or with remote users. It
provides well-managed data to get faster query responses.
Scalability:
` Database systems are scalable, which means they can grow to accommodate larger
amounts of data and more users.
Backup and Recovery: It provides backup and recovery features to protect data from loss.
Better Decision Making: Database systems help businesses store and manage data
effectively, allowing them to make better decisions.
1. Data organization: A DBMS allows for the organization and storage of data in a
structured manner, making it easy to retrieve and query the data as needed.
4. Data security: A DBMS provides tools for managing the security of the data, such as
controlling access to the data and encrypting sensitive data.
5. Backup and recovery: A DBMS provides mechanisms for backing up and recovering
the data in the event of a system failure.
6. Data sharing: A DBMS allows multiple users to access and share the same data,
which can be useful in a collaborative work environment.
Disadvantages of DBMS
2. Performance overhead: The use of a DBMS can add overhead to the performance of
an application, especially in cases where high levels of concurrency are required.
3. Scalability: The use of a DBMS can limit the scalability of an application, since it
requires the use of locking and other synchronization mechanisms to ensure data
consistency.
4. Cost: The cost of purchasing, maintaining and upgrading a DBMS can be high,
especially for large or complex systems.
5. Limited Use Cases: Not all use cases are suitable for a DBMS, some solutions don't
need high reliability, consistency or security and may be better served by other types
of data storage.
Applications of DBMS
2. Banking and Finance Sector: Banks maintaining the customer details, accounts,
loans, banking transactions, credit card transactions. Finance: Storing the
information about sales and holdings, purchasing of financial stocks and bonds.
1. Entities
2. Attributes
3. Relationships
4. Constraints
Business Rules:
Business rules in DBMS (Database Management System) are the constraints and logic that
govern how data is stored, accessed, and managed within a database. They reflect the real-
world rules of an organization and ensure data accuracy, consistency, and compliance. These
rules are enforced at the database level, regardless of how users access or interact with the
data.
Definition:
Business rules are the underlying principles and constraints that define the behavior of
a database and its data.
Purpose:
They ensure data integrity, consistency, and compliance with organizational policies and
regulations.
Enforcement:
They are enforced at the database level, meaning they are independent of the user interface or
application accessing the database.
Examples:
A new employee's onboarding date must be after their contract start date.
A customer's credit limit cannot be exceeded.
A specific product category can only be sold to certain customer segments.
A student's GPA must be between 0.0 and 4.0.
Constraints:
Business rules are often implemented as constraints within the database schema (e.g.,
primary keys, foreign keys, check constraints).
Triggers:
They can also be implemented using triggers, which are stored procedures that automatically
execute when specific database events (like insert, update, or delete) occur.
Stored Procedures:
Business rules can be encapsulated within stored procedures that are invoked when specific
operations are performed on the database.
Data Integrity:
They ensure the accuracy, consistency, and reliability of data within the database.
Data Quality:
They help maintain high-quality data by preventing errors and inconsistencies.
Compliance:
They ensure that the database adheres to organizational policies, regulations, and legal
requirements.
Application Consistency:
They ensure that different applications accessing the database behave consistently.
Reduced Errors:
By automating data validation and enforcing constraints, they reduce the likelihood of errors
and data inconsistencies.
Improved Business Processes:
By reflecting real-world business rules, they streamline business operations and decision-
making.
The evolution of data models in Database Management Systems (DBMS) reflects a progression from
simple hierarchical structures to complex relational and object-oriented models, and more recently,
NoSQL and graph databases. This evolution was driven by the need for greater flexibility, efficiency,
and the ability to handle diverse data types and volumes
Hierarchical Model
In Hierarchical Model, a hierarchical relation is formed by collection of relations and forms a
tree-like structure.
One of the first and most popular Hierarchical Model is Information Management System
(IMS), developed by IBM.
Example
The hierarchy shows an Employee can be an Intern, on Contract or Full- Time. Sub-levels
show that Full-Time Employee can be hired as a Writer, Senior Writer or Editor:
Advantages
Disadvantages
Implementation is complex.
This model has to deal with anomalies like Insert, Update and Delete.
Maintenance is difficult since changes done in the database may want you to do
changes in the entire database structure.
Network Model
The Hierarchical Model creates hierarchical tree with parent/ child relationship, whereas the
Network Model has graph and links.
The relationship can be defined in the form of links and it handles many-to-many relations.
This itself states that a record can have more than one parent.
Example
Advantages
Disadvantages
Pointers bring complexity since the records are based on pointers and graphs.
Changes in the database isn’t easy that makes it hard to achieve structural
independence.
Relational Model
A relational model groups data into one or more tables. These tables are related to each other
using common records.
The data is represented in the form of rows and columns i.e. tables:
Example
Let us see an example of two relations <Employee> and <Department> linked to each
other, with DepartmentID, which is Foreign Key of <Employee> table and Primary key of
<Department> table.
Advantages
The Relational Model does not have any issues that we saw in the previous two
models i.e. update, insert and delete anomalies have nothing to do in this model.
Changes in the database do not require you to affect the complete database.
Implementation of a Relational Model is easy.
To maintain a Relational Model is not a tiresome task.
Disadvantages
Database inefficiencies hide and arise when the model has large volumes of data.
The overheads of using relational data model come with the cost of using powerful
hardware and devices.
Level of Abstraction in a DBMS
There are mainly 3 levels of data abstraction:
This is the lowest level of data abstraction. It tells us how the data is actually stored in
memory. Access methods like sequential or random access and file organization methods like
B+ trees and hashing are used for the same. Usability, size of memory, and the number of
times the records are factors that we need to know while designing the database.
Suppose we need to store the details of an employee. Blocks of storage and the amount of
memory used for these purposes are kept hidden from the user.
This level comprises the information that is actually stored in the database in the form of
tables. It also stores the relationship among the data entities in relatively simple structures. At
this level, the information available to the user at the view level is unknown.
We can store the various attributes of an employee and relationships, e.g. with the manager
can also be stored.
The logical level thus describes the entire database in terms of a small number of relatively
simple structures. Although implementation of the simple structures at the logical level may
involve complex physical-level structures, the user of the logical level does not need to be
aware of this complexity. This is referred to as physical data independence. Database
administrators, who must decide what information to keep in the database, use the logical
level of abstraction.
This is the highest level of abstraction. Only a part of the actual database is viewed by the
users. This level exists to ease the accessibility of the database by an individual user. Users
view data in the form of rows and columns. Tables and relations are used to store data.
Multiple views of the same database may exist. Users can just view the data and interact with
the database, storage and implementation details are hidden from them. Even though the
logical level uses simpler structures, complexity remains because of the variety of
information stored in a large database. Many users of the database system do not need all this
information; instead, they need to access only a part of the database. The view level of
abstraction exists to simplify their interaction with the system
Data Abstraction
The main purpose of data abstraction is to achieve data independence in order to save the
time and cost required when the database is modified or altered.
Data Independence
Data Independence is mainly defined as a property of DBMS that helps you to change the
database schema at one level of a system without requiring to change the schema at the next
level. it helps to keep the data separated from all program that makes use of it.
We have namely two levels of data independence arising from these levels of abstraction:
It refers to the characteristic of being able to modify the physical schema without any
alterations to the conceptual or logical schema, done for optimization purposes, e.g., the
Conceptual structure of the database would not be affected by any change in storage size of
the database system server. Changing from sequential to random access files is one such
example. These alterations or modifications to the physical structure may include:
It refers characteristic of being able to modify the logical schema without affecting the
external schema or application program. The user view of the data would not be affected by
any changes to the conceptual view of the data. These changes may include insertion or
deletion of attributes, altering table structures entities or relationships to the logical schema,
etc.