Database Management Systems

DBMS BOOK FOR MBA

UNIT-1

1. Introduction to DBMS (Database Management System)


A Database Management System (DBMS) is a software system that is designed to
manage and organize data in a structured manner. It allows users to create, modify, and
query a database, as well as manage the security and access controls for that database.

Key Features of DBMS

• Data modeling: A DBMS provides tools for creating and modifying data models,
which define the structure and relationships of the data in a database.
• Data storage and retrieval: A DBMS is responsible for storing and retrieving data
from the database, and can provide various methods for searching and querying the
data.
• Concurrency control: A DBMS provides mechanisms for controlling concurrent
access to the database, to ensure that multiple users can access the data without
conflicting with each other.
• Data integrity and security: A DBMS provides tools for enforcing data integrity and
security constraints, such as constraints on the values of data and access controls
that restrict who can access the data.
• Backup and recovery: A DBMS provides mechanisms for backing up and
recovering the data in the event of a system failure.
• DBMS can be classified into two types: Relational Database Management
System (RDBMS) and Non-Relational Database Management System (NoSQL or
Non-SQL)
• RDBMS: Data is organized in the form of tables and each table has a set of rows
and columns. The data are related to each other through primary and foreign keys.
• NoSQL: Data is organized in the form of key-value pairs, documents, graphs, or
column-based structures. These systems are designed to handle large-scale,
high-performance scenarios.
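To make the RDBMS point concrete, here is a minimal SQL sketch; the department and student tables are hypothetical and only illustrate how rows in two tables are related through primary and foreign keys:

-- Hypothetical tables: two relations linked by a primary key / foreign key pair
CREATE TABLE department (
    dept_id   INT PRIMARY KEY,        -- primary key: uniquely identifies each row
    dept_name VARCHAR(50) NOT NULL
);

CREATE TABLE student (
    roll_no   INT PRIMARY KEY,
    name      VARCHAR(100) NOT NULL,
    dept_id   INT REFERENCES department(dept_id)  -- foreign key: relates the two tables
);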

A database is a collection of interrelated data which helps in the efficient retrieval,
insertion, and deletion of data from the database and organizes the data in the form of
tables, views, schemas, reports, etc. For example, a university database organizes the
data about students, faculty, admin staff, etc., which helps in the efficient retrieval,
insertion, and deletion of data from it.

Database Languages
• Data Definition Language
• Data Manipulation Language
• Data Control Language
• Transactional Control Language
Data Definition Language
DDL is the short name for Data Definition Language, which deals with database schemas
and descriptions of how the data should reside in the database.
• CREATE: to create a database and its objects like (table, index, views, store
procedure, function, and triggers)
• ALTER: alters the structure of the existing database
• DROP: delete objects from the database
• TRUNCATE: remove all records from a table; the space allocated for the records is
also released
• COMMENT: add comments to the data dictionary
• RENAME: rename an object
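The following is a short, hedged sketch of these DDL commands; the employee table and its columns are illustrative only, and the exact RENAME and COMMENT syntax varies by vendor:

-- CREATE: define a new table
CREATE TABLE employee (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(100),
    salary   DECIMAL(10, 2)
);

-- COMMENT: add a comment to the data dictionary (Oracle/PostgreSQL syntax)
COMMENT ON TABLE employee IS 'Employee master data';

-- ALTER: change the structure of the existing table
ALTER TABLE employee ADD dept_name VARCHAR(50);

-- RENAME: rename an object (here via ALTER TABLE ... RENAME TO)
ALTER TABLE employee RENAME TO staff;

-- TRUNCATE: remove all rows and release the space allocated for them
TRUNCATE TABLE staff;

-- DROP: delete the object itself from the database
DROP TABLE staff;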

Data Manipulation Language


DML is the short name for Data Manipulation Language, which deals with data
manipulation and includes the most common SQL statements such as SELECT, INSERT,
UPDATE, DELETE, etc. It is used to store, modify, retrieve, delete, and update data
in a database.
• SELECT: retrieve data from a database
• INSERT: insert data into a table
• UPDATE: updates existing data within a table
• DELETE: delete records from a table (all rows, or only those matching a condition)
• MERGE: UPSERT operation (insert or update)
• CALL: call a PL/SQL or Java subprogram
• EXPLAIN PLAN: interpretation of the data access path
• LOCK TABLE: lock a table for concurrency control
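A brief, hedged sketch of the core DML statements, assuming the hypothetical employee table from the DDL sketch above:

-- INSERT: add a new row
INSERT INTO employee (emp_id, emp_name, salary)
VALUES (1, 'Asha', 55000.00);

-- SELECT: retrieve data that matches a condition
SELECT emp_name, salary
FROM employee
WHERE salary > 50000;

-- UPDATE: modify existing rows
UPDATE employee
SET salary = salary * 1.10
WHERE emp_id = 1;

-- DELETE: remove the rows that match the condition
DELETE FROM employee
WHERE emp_id = 1;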

Data Control Language


DCL is short for Data Control Language, which acts as an access specifier for the
database (basically, to grant and revoke permissions to users in the database).
• GRANT: grant permissions to a user for running DML (SELECT, INSERT,
DELETE, …) commands on a table
• REVOKE: revoke permissions from a user for running DML (SELECT, INSERT,
DELETE, …) commands on the specified table
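A minimal sketch of DCL, assuming a hypothetical user account named report_user and the employee table used above:

-- GRANT: allow the user to query and insert into the table
GRANT SELECT, INSERT ON employee TO report_user;

-- REVOKE: take the INSERT permission back, leaving SELECT in place
REVOKE INSERT ON employee FROM report_user;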

Transactional Control Language


TCL is short for Transactional Control Language, which acts as a manager for all types
of transactional data and all transactions. Some of the TCL commands are:
• ROLLBACK: used to cancel or undo changes made in the database
• COMMIT: used to apply or save changes in the database
• SAVEPOINT: used to mark a point within a transaction to which a later ROLLBACK
can return
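A hedged sketch of these commands in a simple funds-transfer transaction; the account table is hypothetical, and the statement that starts a transaction (BEGIN, START TRANSACTION, etc.) varies by vendor:

BEGIN;                                  -- start a transaction (vendor-specific syntax)

UPDATE account SET balance = balance - 500 WHERE acc_no = 101;
SAVEPOINT after_debit;                  -- SAVEPOINT: a point we can roll back to

UPDATE account SET balance = balance + 500 WHERE acc_no = 202;
-- If the credit step had failed, we could undo only that step:
-- ROLLBACK TO SAVEPOINT after_debit;

COMMIT;                                 -- make the surviving changes permanent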

Database Management System


The software which is used to manage databases is called a Database Management
System (DBMS). For example, MySQL, Oracle, etc. are popular DBMSs used in
different applications. A DBMS allows users to perform the following tasks:
• Data Definition: It helps in the creation, modification, and removal of definitions that
define the organization of data in the database.
• Data Updation: It helps in the insertion, modification, and deletion of the actual data
in the database.
• Data Retrieval: It helps in the retrieval of data from the database which can be used
by applications for various purposes.
• User Administration: It helps in registering and monitoring users, enforcing data
security, monitoring performance, maintaining data integrity, dealing with
concurrency control, and recovering information corrupted by unexpected failure.

Paradigm Shift from File System to DBMS


File System manages data using files on a hard disk. Users are allowed to create, delete,
and update the files according to their requirements. Let us consider the example of a
file-based University Management System. Data of students is available to their respective
Departments, Academics Section, Result Section, Accounts Section, Hostel Office, etc.
Some of the data is common for all sections like Roll No, Name, Father Name, Address,
and Phone number of students but some data is available to a particular section only like
Hostel allotment number which is a part of the hostel office. Let us discuss the issues with
this system:

• Redundancy of data: Data is said to be redundant if the same data is copied in
many places. If a student wants to change their phone number, it has to be updated
in every section. Similarly, when a student leaves, the old records must be deleted
from every section that holds them.
• Inconsistency of Data: Data is said to be inconsistent if multiple copies of the same
data do not match each other. If the Phone number is different in Accounts Section
and Academics Section, it will be inconsistent. Inconsistency may be because of
typing errors or not updating all copies of the same data.
• Difficult Data Access: A user has to know the exact location of a file to access its
data, so the process is cumbersome and tedious. Imagine how difficult it is to find
one student's hostel allotment number among 10,000 unsorted student records.
• Unauthorized Access: File Systems may lead to unauthorized access to data. If a
student gets access to a file having his marks, he can change it in an unauthorized
way.
• No Concurrent Access: The access of the same data by multiple users at the
same time is known as concurrency. The file system does not allow concurrency as
data can be accessed by only one user at a time.
• No Backup and Recovery: The file system does not incorporate any backup and
recovery of data if a file is lost or corrupted.

Advantages of DBMS
• Data organization: A DBMS allows for the organization and storage of data in a
structured manner, making it easy to retrieve and query the data as needed.
• Data integrity: A DBMS provides mechanisms for enforcing data integrity
constraints, such as constraints on the values of data and access controls that
restrict who can access the data.
• Concurrent access: A DBMS provides mechanisms for controlling concurrent
access to the database, to ensure that multiple users can access the data without
conflicting with each other.
• Data security: A DBMS provides tools for managing the security of the data, such
as controlling access to the data and encrypting sensitive data.
• Backup and recovery: A DBMS provides mechanisms for backing up and
recovering the data in the event of a system failure.
• Data sharing: A DBMS allows multiple users to access and share the same data,
which can be useful in a collaborative work environment.

Disadvantages of DBMS
• Complexity: DBMS can be complex to set up and maintain, requiring specialized
knowledge and skills.
• Performance overhead: The use of a DBMS can add overhead to the performance
of an application, especially in cases where high levels of concurrency are required.
• Scalability: The use of a DBMS can limit the scalability of an application, since it
requires the use of locking and other synchronization mechanisms to ensure data
consistency.
• Cost: The cost of purchasing, maintaining and upgrading a DBMS can be high,
especially for large or complex systems.
• Limited Use Cases: Not all use cases are suitable for a DBMS, some solutions
don’t need high reliability, consistency or security and may be better served by other
types of data storage.

These are the main reasons behind the shift from file systems to DBMS.
A Database Management System (DBMS) is a software system that allows users to
create, maintain, and manage databases. It is a collection of programs that enables users
to access and manipulate data in a database. A DBMS is used to store, retrieve, and
manipulate data in a way that provides security, privacy, and reliability.

Several Types of DBMS


• Relational DBMS (RDBMS): An RDBMS stores data in tables with rows and
columns, and uses SQL (Structured Query Language) to manipulate the data.
• Object-Oriented DBMS (OODBMS): An OODBMS stores data as objects, which
can be manipulated using object-oriented programming languages.
• NoSQL DBMS: A NoSQL DBMS stores data in non-relational data structures, such
as key-value pairs, document-based models, or graph models.
Overall, a DBMS is a powerful tool for managing and manipulating data, and is used in
many industries and applications, such as finance, healthcare, retail, and more.


2. Database (DB)
A database is information that is set up for easy access, management and updating.
Computer databases typically store aggregations of data records or files that contain
information, such as sales transactions, customer data, financials and product
information.

Databases are used for storing, maintaining and accessing any sort of data. They
collect information on people, places or things. That information is gathered in one place
so that it can be observed and analyzed. Databases can be thought of as an organized
collection of information.

What are databases used for?

Businesses use data stored in databases to make informed business decisions. Some
of the ways organizations use databases include the following:

• Improve business processes. Companies collect data about business processes,
such as sales, order processing and customer service. They analyze that data to
improve these processes, expand their business and grow revenue.

• Keep track of customers. Databases often store information about people, such as
customers or users. For example, social media platforms use databases to store
user information, such as names, email addresses and user behavior. The data is
used to recommend content to users and improve the user experience.

• Secure personal health information. Healthcare providers use databases to
securely store personal health data to inform and improve patient care.

• Store personal data. Databases can also be used to store personal information.
For example, personal cloud storage is available for individual users to store media,
such as photos, in a managed cloud.


Evolution of databases

Databases were first created in the 1960s. These early databases were network models
where each record is related to many primary and secondary records. Hierarchical
databases were also among the early models. They have tree schemas with a root
directory of records linked to several subdirectories.

Relational databases were developed in the 1970s. Object-oriented databases came
next in the 1980s. Today, we use Structured Query Language (SQL), NoSQL and cloud
databases.

E.F. Codd created the relational database while at IBM. It became the standard for
database systems because of its logical schema, or the way it is organized. The use of
a logical schema separates the relational database from physical storage.

The relational database, combined with the growth of the internet beginning in the mid-
1990s, led to a proliferation of databases. Many business and consumer applications
rely on databases.

Types of databases

There are many types of databases. They may be classified according to content type:
bibliographic, full text, numeric and images. In computing, databases are often classified
based on the organizational approach they use.

Some of the main organizational databases include the following:

Relational. This tabular approach defines data so it can be reorganized and accessed
in many ways. Relational databases are made up of tables. Data is placed into
predefined categories in those tables. Each table has columns with at least one data
category, and rows that have a certain data instance for the categories which are
defined in the columns. Information in a relational database about a specific customer is
organized into rows, columns and tables. These are indexed to make it easier to search
using SQL or NoSQL queries.
Relational databases use SQL in their user and application program interfaces. A new
data category can easily be added to a relational database without having to change the
existing applications. A relational database management system (RDBMS) is used to
store, manage, query and retrieve data in a relational database.

Typically, the RDBMS gives users the ability to control read/write access, specify report
generation and analyze use. Some databases offer atomicity, consistency, isolation and
durability, or ACID, compliance to guarantee that data is consistent and that
transactions are complete.

Distributed. This database stores records or files in several physical locations. Data
processing is also spread out and replicated across different parts of the network.

Distributed databases can be homogeneous, where all physical locations have the
same underlying hardware and run the same operating systems and database
applications. They can also be heterogeneous. In those cases, the hardware, OS and
database applications can be different in the various locations.

Cloud. These databases are built in a public, private or hybrid cloud for a virtualized
environment. Users are charged based on how much storage and bandwidth they use.
They also get scalability on demand and high availability. These databases can work
with applications deployed as software as a service.

NoSQL. NoSQL databases are good when dealing with large collections of distributed
data. They can address big data performance issues better than relational databases.
They also do well analyzing large unstructured data sets and data on virtual servers in
the cloud. These databases can also be called non-relational databases.

Object-oriented. These databases hold data created using object-oriented
programming languages. They focus on organizing objects rather than actions, and data
rather than logic. For instance, an image data record would be a data object, rather than
an alphanumeric value.
Graph. These databases are a type of NoSQL database. They store, map and query
relationships using concepts from graph theory. Graph databases are made up of nodes
and edges: nodes are entities, and edges connect the nodes.

These databases are often used to analyze interconnections. Graph databases are
often used to analyze data about customers as they interact with a business on
webpages and in social media.

Graph databases use SPARQL, a declarative programming language and protocol, for
analytics. SPARQL can perform all the analytics that SQL can perform, and can also be
used for semantic analysis, or the examination of relationships. This makes it useful for
performing analytics on data sets that have both structured and unstructured data.
SPARQL lets users perform analytics on information stored in a relational database, as
well as friend-of-a-friend relationships, PageRank and shortest path.



Components of a database

While the different types of databases vary in schema, data structure and data types
most suited to them, they are all made up of the same five basic components.

1. Hardware. This is the physical device that database software runs on. Database
hardware includes computers, servers and hard drives.

2. Software. Database software or applications give users control of the database.
Database management system (DBMS) software is used to manage and control
databases.

3. Data. This is the raw information that the database stores. Database
administrators organize the data to make it more meaningful.

4. Data access language. This is the programming language that controls the
database. The programming language and the DBMS must work together. One of
the most common database languages is SQL.

5. Procedures. These rules determine how the database works and how it handles the
data.

Database challenges

Setting up, operating and maintaining a database has some common challenges, such
as the following:

• Data security is required because data is a valuable business asset. Protecting
data stores requires skilled cybersecurity staff, which can be costly.

• Data integrity ensures data is trustworthy. It is not always easy to achieve data
integrity because it means restricting access to databases to only those qualified to
handle it.

• Database performance requires regular database updates and maintenance.
Without the proper support, database functionality can decline as the technology
supporting the database changes or as the data it contains changes.
• Database integration can also be difficult. It can involve integrating data
sources from varying types of databases and structures into a single database or
into data lakes and data warehouses.

APIs connect the user or application to the database management system, which lets
them interact with the database.
Database management system

A DBMS enables users to create and manage a database. It also helps users create,
read, update and delete data in a database, and it assists with logging and auditing
functions.

The DBMS provides physical and logical independence from data. Users and
applications do not need to know either the physical or logical locations of data. A
DBMS can also limit and control access to the database and provide different views of
the same database schema to multiple users.
3. Advantages of DBMS over file processing systems

Database Management Systems (DBMS) offer several advantages over traditional file
processing systems. Here are some key benefits:

1. Data Integrity:
- DBMS enforces data integrity constraints, ensuring that the data remains accurate
and consistent.
- Redundancy is minimized, and relationships between different pieces of data are
maintained.

2. Data Independence:
- Changes in the structure or organization of the database (schema) do not affect the
application programs. This is known as data independence.
- In file processing systems, changes in data structures often require modifying
application programs.

3. Efficient Data Retrieval:


- DBMS uses powerful query languages (e.g., SQL) for retrieving and manipulating
data, allowing for complex and efficient data retrieval operations.
- File processing systems may involve writing custom code for data retrieval, which
can be less efficient.

4. Concurrent Access and Transaction Management:


- DBMS provides mechanisms for managing concurrent access to the database by
multiple users.
- Transaction management ensures the consistency and integrity of the database
even in the presence of concurrent transactions.
- File processing systems may lack these features, leading to data inconsistency and
integrity issues.
5. Data Security:
- DBMS offers robust security features, including authentication and authorization
mechanisms.
- Access to sensitive data can be restricted based on user roles and privileges.
- File systems may rely on file-level security, which can be less granular and more
challenging to manage.

6. Centralized Data Management:


- DBMS centralizes data management, providing a single point of control and
administration.
- In file processing systems, data may be dispersed across different files and formats,
making it harder to manage and control.

7. Data Redundancy and Normalization:


- DBMS minimizes data redundancy through normalization techniques, reducing the
risk of inconsistencies.
- File processing systems may suffer from data redundancy, as the same data may be
stored in multiple files.

8. Scalability:
- DBMS is designed to handle large volumes of data and users, providing scalability.
- File processing systems may face challenges scaling up as data and user
requirements grow.

9. Backup and Recovery:


- DBMS typically offers automated backup and recovery mechanisms, helping to
prevent data loss.
- File processing systems may rely on manual backup procedures, increasing the risk
of data loss.

10. Data Relationships:


- DBMS allows for the establishment of relationships between different tables,
enabling efficient handling of complex data structures.
- File processing systems may lack support for relationships, making it more
challenging to represent and manage interconnected data.

Overall, DBMS provides a more organized, secure, and efficient way to manage and
retrieve data compared to traditional file processing systems.
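As a brief, hedged illustration of points 1, 7 and 10 above (integrity, reduced redundancy, and relationships), here is a small SQL sketch; the student and hostel_allotment tables are hypothetical:

-- The student's phone number is stored exactly once; other sections reference
-- the student by roll_no instead of keeping their own copy.
CREATE TABLE student (
    roll_no INT PRIMARY KEY,
    name    VARCHAR(100) NOT NULL,
    phone   VARCHAR(15)  NOT NULL
);

CREATE TABLE hostel_allotment (
    roll_no   INT PRIMARY KEY REFERENCES student(roll_no),  -- relationship enforced by the DBMS
    hostel_no VARCHAR(10) NOT NULL
);

-- One UPDATE corrects the phone number for every section that uses it:
UPDATE student SET phone = '9876543210' WHERE roll_no = 1;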

4. Database Languages

In the context of databases, there are several types of languages used for different
purposes. Here are some of the key database languages:

1. Data Definition Language (DDL):


- Purpose: DDL is used to define and manage the structure of the database.
- Examples: SQL (CREATE, ALTER, DROP statements).

2. Data Manipulation Language (DML):


- Purpose: DML is used for managing data within the database.
- Examples: SQL (SELECT, INSERT, UPDATE, DELETE statements).

3. Query Language:
- Purpose: A type of DML specifically used for querying databases to retrieve
information.
- Examples: SQL is a widely used query language.

4. Data Control Language (DCL):


- Purpose: DCL is used to control access to data within the database.
- Examples: SQL (GRANT, REVOKE statements).
5. Transaction Control Language (TCL):
- Purpose: TCL is used to manage transactions within a database.
- Examples: SQL (COMMIT, ROLLBACK statements).

6. Procedural Language:
- Purpose: Allows the creation of stored procedures and functions within the database.
- Examples: PL/SQL (used in Oracle Database), T-SQL (used in Microsoft SQL
Server), and PL/pgSQL (used in PostgreSQL).

7. Object-Relational Database Language:


- Purpose: Extends a relational database system with object-oriented features.
- Examples: SQL/OLB, SQL3.

8. NoSQL Query Languages:


- Purpose: Used for querying NoSQL databases that may have different data models
than traditional relational databases.
- Examples: MongoDB Query Language (MQL), Couchbase N1QL (pronounced
"nickel").

9. Markup Languages for Database Interaction:


- Purpose: Used for representing and interacting with data in a structured format.
- Examples: XML (eXtensible Markup Language), JSON (JavaScript Object Notation).

10. Host Language Interface (HLI):


- Purpose: Allows a programming language to interact with a database.
- Examples: Embedded SQL, JDBC (Java Database Connectivity), ODBC (Open
Database Connectivity).

11. Graph Query Languages:


- Purpose: Designed for querying graph databases.
- Examples: Cypher (used in Neo4j), GraphQL (not specific to databases but can be
used to query graph databases).

These languages play a crucial role in interacting with databases, defining and
manipulating data, controlling access, and ensuring the integrity and security of the
information stored in the database. The choice of language often depends on the type
of database system and the specific requirements of the application.

5. Database Users and Administrators

Types of Database Users in DBMS


There are two broad types of database users: users and administrators.

Database Users
Database users are the ones who really use and take the benefits of the database.
There will be different types of users depending on their needs and way of accessing
the database.

1. Application Programmers – They are the developers who interact with the
database by means of DML queries. These DML queries are written in the
application programs like C, C++, JAVA, Pascal, etc. These queries are
converted into object code to communicate with the database. For example,
writing a C program to generate the report of employees who are working in a
particular department will involve a query to fetch the data from the database. It
will include an embedded SQL query in the C Program.
2. Sophisticated Users – They are database developers, who write SQL queries to
select/insert/delete/update data. They do not use any application or programs to
request the database. They directly interact with the database by means of
a query language like SQL. These users will be scientists, engineers, analysts
who thoroughly study SQL and DBMS to apply the concepts in their
requirements. In short, we can say this category includes designers and
developers of DBMS and SQL.
3. Specialized Users – These are also sophisticated users, but they write special
database application programs. They are the developers who develop complex
programs to meet specific requirements.
4. Stand-alone Users – These users have a stand-alone database for their
personal use. Such databases usually come as ready-made packages with
menus and graphical interfaces.
5. Naive Users – These are the users who use existing applications to interact
with the database. For example, online library systems, ticket booking systems,
ATMs, etc. have existing applications, and users simply use them to interact with
the database to fulfil their requests.

Database Administrators
The life cycle of a database starts from designing, implementing to the administration of
it. A database for any kind of requirement needs to be designed perfectly so that it
should work without any issues. Once all the design is complete, it needs to be
installed. Once this step is complete, users start using the database. The database
grows as the data grows in the database. When the database becomes huge, its
performance comes down. Also accessing the data from the database becomes a
challenge. There will be unused memory in the database, making the memory inevitably
huge. This administration and maintenance of the database are taken care of by the
database Administrator – DBA.

A DBA has many responsibilities; a well-performing database is in the hands of the DBA.


• Installing and upgrading the DBMS servers: – The DBA is responsible for
installing a new DBMS server for new projects. He is also responsible for
upgrading these servers when new versions come to market or when requirements
change. If an upgrade of an existing server fails, he should be able to revert the
changes back to the older version, keeping the DBMS working. He is also
responsible for applying service packs, hotfixes, and patches to the DBMS servers.
• Design and implementation: – Designing and implementing the database is
also the DBA's responsibility. He should be able to decide on proper memory
management, file organization, error handling, log maintenance, etc. for the
database.
• Performance tuning: – Since the database is huge and it will have lots of tables,
data, constraints, and indices, there will be variations in the performance from
time to time. Also, because of some designing issues or data growth, the
database will not work as expected. It is the responsibility of the DBA to tune the
database performance. He is responsible for making sure all queries and
programs run in a fraction of a second.
• Migrate database servers: – Sometimes an organization using Oracle may want
to shift to SQL Server or Netezza. It is the responsibility of the DBA to make sure
that the migration happens without any failure and without data loss.
• Backup and Recovery: – Proper backup and recovery procedures need to be
developed and maintained by the DBA. This is one of the main responsibilities of
the DBA. Data and objects should be backed up regularly so that if there is a
crash, they can be recovered without much effort or data loss.
• Security: – DBA is responsible for creating various database users and roles,
and giving them different levels of access rights.
• Documentation: – The DBA should properly document all his activities so that
if he leaves or a new DBA comes in, the new person can understand the database
without much effort. He should maintain records of installation, backup, recovery,
and security procedures, and keep regular reports on database performance.
In order to perform all of these tasks, a DBA needs a very good command of the DBMS.
Types of DBA
There are different kinds of DBAs depending on the responsibilities they own.

• Administrative DBA – This DBA is mainly concerned with installing and
maintaining DBMS servers. His prime tasks are installation, backups, recovery,
security, replication, memory management, configuration, and tuning. He is
mainly responsible for all administrative tasks of a database.
• Development DBA – He is responsible for creating queries and procedures for
the requirement. Basically, his task is similar to any database developer.
• Database Architect – Database architect is responsible for creating and
maintaining the users, roles, access rights, tables, views, constraints, and
indexes. He is mainly responsible for designing the structure of the database
depending on the requirement. These structures will be used by developers and
development DBA to code.
• Data Warehouse DBA – This DBA maintains the data and procedures coming into
the data warehouse from various sources. These sources can be files, COBOL
programs, or other programs. A good DBA should keep the performance and
functional levels of these sources at the same pace to make the data warehouse
work.
• Application DBA – He acts as a bridge between the application programs and
the database. He makes sure all the application programs are optimized to interact
with the database, and ensures that all activities, from installing, upgrading, and
patching to maintenance, backup, and recovery, work without any issues.
• OLAP DBA – He is responsible for installing and maintaining the database in
OLAP systems. He maintains only OLAP databases.

6. Structure of DBMS

DBMS means Database Management System, which is a tool or software used to create,
delete, or manipulate databases. A software program created to store, retrieve, query,
and manage data is known as a Database Management System (DBMS). Data can be
created, read, updated, and deleted by authorized entities through user interfaces (UIs).

Database Management Systems are crucial because they give programmers, database
administrators, and end users a consolidated view of the data, relieving applications and
end users of the need to understand where the data is physically located. Application
Programming Interfaces (APIs) manage internet requests and responses for particular
sorts of data.
In marketing materials, the phrase "database as a service" (DBaaS) may be used to refer
to both relational and non-relational DBMS components that are given via the internet.

Users of DBMSs include application programmers, Database Administrators (DBAs), and
end users.

Database Administrators are typically the only people who work directly with a DBMS.
Today, end users read and write to databases using front-end interfaces made by
programmers, while programmers use cloud APIs to connect with DBMSs.

Three Parts that make up the Database System are:

o Query Processor
o Storage Manager
o Disk Storage

The explanations for these are provided below:

1. Query Processor

The query processing is handled by the query processor, as the name implies. It executes
the user's query, to put it simply. In this way, the query processor aids the database
system in making data access simple and easy. The query processor's primary duty is to
successfully execute the query. The Query Processor transforms (or interprets) the user's
application program-provided requests into instructions that a computer can understand.

Components of the Query Processor


o DDL Interpreter:

DDL stands for Data Definition Language. As the name implies, the DDL Interpreter
interprets DDL statements such as those used in schema definitions (CREATE, DROP,
etc.). This interpretation yields a set of tables containing the metadata (data about data),
which is kept in the data dictionary. The data dictionary is, in essence, a part of the disk
storage that is covered in a later section of this article.

o DML Compiler:

DML stands for Data Manipulation Language. In keeping with its name, the DML Compiler
converts DML statements such as SELECT, UPDATE, and DELETE into low-level
instructions, or simply machine-readable object code, to enable execution. The
optimization of queries is another function of the DML compiler: since a single query can
typically be translated into a number of evaluation plans, some optimization is needed to
select the evaluation plan with the lowest cost out of all the options. This process, known
as query optimization, is carried out by the DML compiler. Simply put, query optimization
determines the most effective way to carry out a query.

o Embedded DML Pre-compiler:

Before the query evaluation, the embedded DML commands in the application program
(such as SELECT, FROM, etc., in SQL) must be pre-compiled into standard procedural
calls (program instructions that the host language can understand). Therefore, the DML
statements which are embedded in an application program must be converted into routine
calls by the Embedded DML Pre-compiler.

o Query Evaluation Engine:
It takes the evaluation plan for the query, runs it, and then returns the result. Simply put,
the query evaluation engine evaluates the SQL commands used to access the database's
contents and returns the result of the query. In a nutshell, it is in charge of analyzing the
queries and running the object code that the DML Compiler produces. Apache Drill and
Presto are examples of query evaluation engines.

2. Storage Manager:

The Storage Manager is a program that acts as a conduit between the queries made and
the data kept in the database. It is also known as the Database Control System. By
applying the constraints and running the DCL instructions, it keeps the database
consistent and maintains its integrity. It is in charge of retrieving, storing, updating, and
removing data from the database.

Components of Storage Manager

Following are the components of Storage Manager:

o Integrity Manager:

Whenever there is any change in the database, the Integrity manager will manage the
integrity constraints.

o Authorization Manager:

Authorization manager verifies the user that he is valid and authenticated for the specific
query or request.

o File Manager:

All the files and data structure of the database are managed by this component.

o Transaction Manager:

It is responsible for making the database consistent before and after the transactions.
Concurrent processes are generally controlled by this component.

o Buffer Manager:

The buffer manager handles the transfer of data between secondary storage (disk) and
main memory, and manages the cache memory.

3. Disk Storage
A DBMS can use various kinds of Data Structures as a part of physical system
implementation in the form of disk storage.

Components of Disk Storage

Following are the components of Disk Manager:

o Data Dictionary:

It contains the metadata (data of data), which means each object of the database has
some information about its structure. So, it creates a repository which contains the details
about the structure of the database object.

o Data Files:

This component stores the data in the files.

o Indices:

These indices are used to access and retrieve the data in a very fast and efficient way.
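As a small, hedged illustration, an index is typically created with a DDL statement like the following; the student table and column names are hypothetical:

-- Create an index on the column that is frequently used in search conditions
CREATE INDEX idx_student_dept ON student (dept_id);

-- With the index in place, the DBMS can locate matching rows
-- without scanning the whole table:
SELECT roll_no, name FROM student WHERE dept_id = 10;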

7. Storage Manager in a Database

A storage manager is a crucial component of a Database Management System (DBMS)
responsible for the efficient and effective storage of data on a physical storage medium,
such as a hard disk. Its primary functions include:

1. Data Storage Organization:


- The storage manager organizes and structures data on the physical storage
medium, translating the logical data organization (as defined by the database schema)
into physical storage structures.

2. File Organization:
- Manages the placement of data in files on the disk. It determines how data is stored,
whether in a single file or distributed across multiple files.
3. Data Buffer Management:
- Manages a memory buffer known as the data buffer or cache, which is used to store
a subset of the database. This buffer helps improve data retrieval performance by
caching frequently accessed data in memory.

4. Data Caching:
- The storage manager uses caching techniques to keep frequently accessed data in
the buffer, reducing the need to fetch data from the slower disk storage.

5. Data Retrieval and Storage:


- Retrieves data from the storage medium and stores new or updated data back to the
storage. It ensures that these operations are performed efficiently and with data
integrity.

6. Record Management:
- Handles the storage and retrieval of individual records within a file. It must support
operations like inserting, updating, and deleting records.

7. Indexing and Hashing:


- Implements indexing and hashing structures to improve the speed of data retrieval.
Index structures, such as B-trees, are used to locate data quickly based on specified
search criteria.

8. Space Management:
- Manages the allocation and deallocation of space on the storage medium. It ensures
efficient utilization of storage space and may involve tasks like compaction to reclaim
unused space.

9. Concurrency Control:
- Coordinates access to the database by multiple users or transactions to ensure data
consistency. This involves handling situations where multiple users might attempt to
access or modify the same data simultaneously.
10. Transaction Management:
- Ensures the atomicity, consistency, isolation, and durability properties of
transactions. It manages the commit and rollback operations to maintain the integrity of
the database.

11. Backup and Recovery:


- Implements mechanisms for data backup and recovery to protect against data loss
due to system failures, human errors, or other unforeseen events.

12. Security and Access Control:


- Enforces security measures to control access to the stored data. This involves
authentication and authorization mechanisms to ensure that only authorized users can
perform specific operations on the data.

In summary, the storage manager is a critical component of a DBMS that handles the
physical aspects of data storage, retrieval, and management. Its efficient functioning is
essential for the overall performance, reliability, and integrity of the database system.

8. Query Processor

The query processor is a fundamental component of a Database Management System
(DBMS) responsible for translating high-level queries written in a query language (such
as SQL) into a series of low-level instructions that can be executed by the database
engine. Its primary functions include:

1. Query Parsing:
- The query processor first parses the query to ensure that it adheres to the syntax
and grammar rules of the query language (e.g., SQL). It checks for errors and verifies
the query's structure.
2. Query Optimization:
- Once the query is parsed, the query processor performs optimization to enhance the
efficiency of query execution. This involves choosing the most efficient execution plan
from several possible plans. Optimization aims to minimize the response time and
resource usage.

3. Query Rewrite:
- In some cases, the query processor may rewrite the query to improve its execution
efficiency. This can involve transforming the query into an equivalent but more efficient
form.

4. Access Path Determination:


- The query processor determines how to access the data based on the query. It
decides which indexes to use, which tables to access, and in what order. This step is
crucial for optimizing data retrieval.

5. Data Retrieval:
- The query processor generates the instructions necessary for retrieving the data
from the storage manager. This includes specifying the conditions for data retrieval and
the columns to be returned.

6. Query Execution Plan:


- The output of the optimization process is an execution plan, which is a step-by-step
guide on how to execute the query efficiently. This plan is then passed to the database
engine for execution.

7. Interfacing with the Storage Manager:


- The query processor communicates with the storage manager to access and retrieve
the necessary data from the physical storage medium. It coordinates with the storage
manager to ensure that the required data is efficiently retrieved.

8. Handling Transactions:
- If the query involves transactional operations, the query processor coordinates with
the transaction manager to ensure that the transaction follows the principles of
Atomicity, Consistency, Isolation, and Durability (ACID).

9. Error Handling:
- The query processor is responsible for detecting and handling errors during query
processing. This includes syntax errors, semantic errors, and runtime errors.

10. Result Presentation:


- Once the data is retrieved, the query processor may format and present the results
in the desired format, which can then be returned to the user or application that issued
the query.

The query processor acts as an intermediary between the user/application and the
database engine, translating high-level queries into a series of instructions that the
engine can execute to retrieve, manipulate, or store data. The efficiency and
effectiveness of the query processor significantly impact the overall performance of a
database system.
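Most SQL databases let you inspect the execution plan the query processor has chosen. The following is a hedged sketch using the hypothetical student and department tables from earlier examples; the exact statement varies by product (Oracle uses EXPLAIN PLAN FOR, while MySQL and PostgreSQL use a bare EXPLAIN):

-- Ask the optimizer to record the plan it would use for this query (Oracle-style syntax)
EXPLAIN PLAN FOR
SELECT s.name, d.dept_name
FROM student s
JOIN department d ON d.dept_id = s.dept_id
WHERE d.dept_name = 'Commerce';

-- In Oracle, the chosen plan can then be displayed with:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);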

9. Introduction to Client/Server Architecture

More people than ever use computers today and rely on these devices and the
networks they're attached to for many tasks, from the critical to the frivolous.
Unsurprisingly, the sheer volume of users and the even more significant number of
requests strain servers and networks.

As if these challenges aren’t enough, IT professionals must deal with a constant stream
of new technology that must be implemented into the network. Evolving in this fast-paced
landscape is critical for keeping an organization up to date and competitive. Simply put,
there is no alternative.

IT professionals handle this strain by implementing client and server architecture, also
called client-server architecture. But you may be wondering, "What is a client-server
network?" Well, you've come to the right place. This article will explain client-server
architecture, show you a client-server model, and illustrate the advantages of
client-server architecture.
Let’s look at some fundamentals first, then get to the heart of the matter.

Terminology Basics

A Client is either a person or an organization that uses a service. In the IT context, the
client is a computer/device, also called a Host, that actually uses the service or accepts
the information. Client devices include laptops, workstations, IoT devices, and similar
network-friendly devices.

A Server in the IT world is a remote computer that provides access to data and
services. Servers are usually physical devices such as rack servers, though the rise
of cloud computing has brought virtual servers into the equation. The server handles
processes like e-mail, application hosting, Internet connections, printing, and more.

Client-Server Architecture Explained

The client-server architecture refers to a system that hosts, delivers, and manages most
of the resources and services that the client requests. In this model, all requests and
services are delivered over a network, and it is also referred to as the network
computing model or client-server network.

Client-server architecture, alternatively called a client-server model, is a network
application that breaks down tasks and workloads between clients and servers that
reside on the same system or are linked by a computer network.

Client-server architecture typically features multiple users' workstations, PCs, or other
devices connected to a central server via an Internet connection or other network. The
client sends a request for data, and the server accepts and accommodates the request,
sending the data packets back to the user who needs them.

This model is also called a client-server network or a network computing model.

To sum it up briefly:

• First, the client sends their request via a network-enabled device

• Then, the network server accepts and processes the user request

• Finally, the server delivers the reply to the client


What’s the Purpose of Client-Server Architecture?

We have already established that technology is constantly evolving and improving, often
at a very rapid pace. As a result, today's businesses increasingly rely on technology,
especially IT, to flourish and stay competitive in an "evolve or die" environment.

Therefore, today's organizations need a system that makes it easy to collect, process,
and act on corporate data, thus increasing business procedure efficiency and ensuring
survivability in the modern world's markets.

The client-server network model brings a higher level of processing that increases the
effectiveness of workstation power, workgroup empowerment, remote network
management, market-driven business, and the preservation of existing investments.

In summary, client-server architecture provides the exact framework that today's
organizations need to meet the challenges of a rapidly evolving IT world.

The Characteristics of Client-Server Architecture

Client-server architecture typically features the following characteristics:

• Client and server machines typically require different hardware and software
resources and may come from different vendors.

• The network can scale horizontally by increasing the number of client machines, and
vertically by moving the entire process to more powerful servers or a multi-server
configuration.

• One computer server can provide multiple services simultaneously, although each
service requires a separate server program.

• Both client and server applications interact directly with a transport layer protocol.
This process establishes communication and enables the entities to send and
receive information.

• Both the client and server computers need a complete stack of protocols. The
transport protocol employs lower-layer protocols to send and receive individual
messages.

The following client-server diagram shows the basics of the architecture:


What Are Some Examples of Client-Server Architecture?

You may wonder what constitutes a real-life application of client-server architecture.


Here are three examples of how you use client-server architecture, probably more often
than you think!

• Email servers: Thanks to ease and speed, e-mail has supplanted traditional postal
mail as the primary form of corporate communication. E-mail servers, aided by
various brands of dedicated software, send and receive e-mails between parties.

• File servers: If you store files on cloud-based services such as Google Docs or
Microsoft Office, you're using a file server. File servers are centralized locations for
file storage and are accessed by many clients.

• Web servers: These high-performance servers host many different websites, and
clients access them through the Internet. Here’s a step-by-step breakdown:

• The client/user uses their web browser to enter the URL they want

• The browser asks the Domain Name System (DNS) for an IP address

• The DNS server finds the desired server’s IP address and sends it to the web
browser

• The browser creates either an HTTPS or HTTP request

• The server/producer sends the user the correct files

• The client/user receives the files sent by the server, and the process is repeated
as needed

The Advantages and Disadvantages of Client-Server Architecture

Client-server architecture brings its share of positives and negatives to today’s digital
consumers. We start with a list of advantages:

• It’s a centralized system that keeps all the data and its controls in one place

• It brings a high level of scalability, organization, and efficiency

• It allows the IT staff to change the Client and Server capacities separately

• It’s cost-efficient, especially in terms of maintenance

• It allows data recovery


• It allows load-balancing, which optimizes performance

• It allows different platforms to share resources

• Users don’t need to log into a terminal or another processor to access corporate
information or desktop tools like PowerPoint presenters or spreadsheet utilities

• The setup reduces the incidence of data replication

Naturally, client-server architecture isn’t all sunshine and lollipops. Here are the
downsides:

• If the server has a worm, virus, or Trojan, the users will likely catch it since the
network consists of linked clients and servers

• The server is vulnerable to Denial of Service (DoS) attacks

• Data packets can be spoofed or modified during transmission

• It’s expensive to start up and initially implement

• If a critical server goes down, the clients are dead in the water

• The setup is prone to phishing and Man in the Middle (MITM) attacks

10. Various Views of Data


In the context of databases, various views of data refer to different perspectives or
representations of the data stored in a database. These views help users and
applications to interact with the database in a way that is meaningful and relevant to
their specific needs. Here are some common types of views in a database:

1. User Views:
- Personalized Views: Users may have personalized views tailored to their specific
preferences or roles. For example, an employee in the sales department might have a
view that emphasizes sales-related data.
- Customizable Views: Some systems allow users to customize their views by
selecting specific columns, applying filters, or arranging data in a preferred manner.

2. Virtual Views:
- Virtual Tables: Virtual views are created through the use of queries and do not
physically store data. They are essentially queries that present the results as if they
were tables. Views can be used to simplify complex queries or to present a simplified
version of the data to users.
- Materialized Views: Unlike virtual views, materialized views store the results of a
query physically. They are precomputed and updated periodically to reflect changes in
the underlying data.

3. Application Views:
- Application-specific Views: Views designed for specific applications, where the data
is presented in a way that aligns with the application's requirements. For example, a
customer relationship management (CRM) application might have views tailored for
managing customer interactions.

4. DBMS Views:
- System Catalog Views: These views provide information about the database itself,
including details about tables, columns, indexes, and other database objects. They are
often used by database administrators for maintenance tasks.
- Security Views: Views related to database security, such as information about users,
roles, and permissions.

5. End-User Views:
- Graphical Views: Views that are presented graphically, such as charts, graphs, or
diagrams, to help end-users understand complex relationships in the data.
- Tabular Views: Standard views that present data in a tabular format, similar to a
spreadsheet. These are common in data analysis and reporting.

6. Read-Only Views:
- Read-Only Access Views: Views that restrict users to read-only access, preventing
them from modifying the underlying data. This can be useful for providing information to
users without allowing them to make changes.

7. Join Views:
- Join Views: Views that combine data from multiple tables using SQL JOIN
operations. These views are useful for simplifying complex queries involving data from
different tables.

8. Temporal Views:
- Temporal Views: Views that represent the data at a specific point in time. This is
particularly important in scenarios where historical data is crucial, such as financial
records or versioned data.

The ability to create and use different views of data enhances the flexibility and usability
of a database system, allowing various users and applications to interact with the data
in a way that suits their specific requirements. It also helps in maintaining data security
by controlling the level of access granted to different users.
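As a hedged illustration of virtual, join, and read-only views, here is a small SQL sketch; the customers and orders tables and the sales_user account are hypothetical:

-- A virtual (non-materialized) join view that exposes only the columns a sales user needs
CREATE VIEW sales_summary AS
SELECT c.customer_name,
       o.order_date,
       o.amount
FROM   customers c
JOIN   orders    o ON o.customer_id = c.customer_id;

-- Users query the view like a table; the underlying query runs each time
SELECT customer_name, SUM(amount) AS total_spent
FROM   sales_summary
GROUP  BY customer_name;

-- Granting only SELECT on the view gives effectively read-only access
GRANT SELECT ON sales_summary TO sales_user;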

11. Three-Level Architecture of a Database

The ANSI-SPARC database architecture is the basis of most of the modern databases.

The three levels present in this architecture are Physical level, Conceptual level and
External level.
The details of these levels are as follows −

Physical Level

This is the lowest level in the three-level architecture. It is also known as the internal level.
The physical level describes how data is actually stored in the database. At the lowest
level, the data is stored on hard drives in the form of bits; at a slightly higher level, it can
be said that the data is stored in files and folders. The physical level also deals with
compression and encryption techniques.

Conceptual Level

The conceptual level is at a higher level than the physical level. It is also known as the
logical level. It describes how the database appears to the users conceptually and the
relationships between various data tables. The conceptual level does not care for how
the data in the database is actually stored.

External Level

This is the highest level in the three level architecture and closest to the user. It is also
known as the view level. The external level only shows the relevant database content to
the users in the form of views and hides the rest of the data. So different users can see
the database as a different view as per their individual requirements.

12. Types of Database Models

Database Model: It determines the logical structure of a database and fundamentally
determines in which manner data can be stored, organized and manipulated.

There are four common types of database model that are useful for different types of
data or information. Depending upon your specific needs, one of these models can be
used.

1. Hierarchical databases.
2. Network databases.
3. Relational databases.
4. Object-oriented databases.
1. Hierarchical databases

It is one of the oldest database models, developed by IBM for its Information Management
System (IMS). In a hierarchical database model, the data is organized into a tree-like
structure. In simple language, we can say that it is a set of data organized in a tree
structure.

This type of database model is rarely used nowadays. Its structure is like a tree, with
nodes representing records and branches representing fields. The Windows Registry
used in Windows XP is an example of a hierarchical database: configuration settings are
stored as tree structures with nodes.

The following figure shows the generalized structure of the hierarchical database model,
in which data is stored in the form of a tree-like structure (data represented or stored in root
node, parent node and child node).

The following figure shows an example of a hierarchical database model for a university
management system. This type of database employs the “parent-child” relationship for
storing data.



Advantages

• The model allows easy addition and deletion of information.


• Data at the top of the Hierarchy is very fast to access.
• It worked well with linear data storage mediums such as tapes.
• It relates well to anything that works through a one-to-many relationship. For
example, there is a president with many managers below them, and those managers
have many employees below them, but each employee has only one manager.

Disadvantages

• It requires data to be repetitively stored in many different entities.


• Nowadays, linear data storage media such as tapes are no longer in use.
• Searching for data requires the DBMS to run through the entire model from top to
bottom until the required information is found, making queries very slow.
• This model supports only one-to-many relationships; many-to-many relationships are
not supported.

2. Network databases

This model looks like the hierarchical database model, which is why it is often called a
modified version of the hierarchical database. The network database model organizes data
more like a graph, and a record can have more than one parent node. The network model is a
database model conceived as a flexible way of representing objects and their
relationships.
Advantage

• The network model is conceptually simple and easy to design.


• The network model can represent redundancy in data more effectively than in the
hierarchical model.
• The network model can handle one-to-many and many-to-many relationships, which
is a real help in modelling real-life situations.
• Data access is easier and more flexible than in the hierarchical model.
• The network model is better than the hierarchical model in isolating the programs from
the complex physical storage details.

Disadvantage:

• All the records are maintained using pointers and hence the whole database structure
becomes very complex.
• The insertion, deletion and updating operations of any record require a large number
of pointer adjustments.
• Structural changes to the database are very difficult.

3. Relational Database

The relational database model was developed by E. F. Codd in 1970. The various software
systems used to maintain relational databases are known as relational database management
systems (RDBMS). In this model, data is organised in a row-and-column structure, i.e., two-
dimensional tables, and the relationship between tables is maintained by storing a common
field. The model is built around three major components, described below.

In the relational model, three key terms are heavily used: relations, attributes, and
domains. A relation is nothing but a table with rows and columns. The named columns of
the relation are called attributes, and the domain is the set of values
the attributes can take. The following figure gives an overview of the relational database
model.

Terminology used in Relational Model

• Tuple: Each row in a table is known as a tuple.


• Cardinality of a relation: The number of tuples in a relation determines its cardinality. For
example, a relation with 4 tuples has a cardinality of 4.
• Degree of a relation: Each column in the tuple is called an attribute. The number of
attributes in a relation determines its degree; for example, a relation with 3 attributes has a degree of 3.

Keys of a relation-

• Primary key - it is the key that uniquely identifies each row of a table. It does not have null values.
• Foreign key - it refers to the primary key of some other table. It permits only those values
which appear in the primary key of the table to which it refers (see the sketch below).
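
A minimal SQL sketch of these two kinds of key, using hypothetical Department and Employee tables:

```
-- Department is the referenced (parent) table.
CREATE TABLE Department (
    DeptID   INT PRIMARY KEY,      -- primary key: unique, no null values
    DeptName VARCHAR(50)
);

-- Employee references Department through a foreign key.
CREATE TABLE Employee (
    EmpID   INT PRIMARY KEY,
    EmpName VARCHAR(50),
    DeptID  INT,
    FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
    -- only DeptID values that exist in Department are permitted here
);
```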

Some examples of relational database systems are as follows.


Oracle: Oracle Database is commonly referred to as Oracle RDBMS or simply as Oracle.
It is a multi-model database management system produced and marketed by Oracle
Corporation.

MySQL: MySQL is an open-source relational database management system (RDBMS)


based on Structured Query Language (SQL). MySQL runs on virtually all platforms,
including Linux, UNIX, and Windows.

Microsoft SQL Server: Microsoft SQL Server is an RDBMS that supports a wide variety
of transaction processing, business intelligence, and analytics applications in corporate
IT environments.

PostgreSQL: PostgreSQL, often simply Postgres, is an object-relational database


management system (ORDBMS) with an emphasis on extensibility and standards
compliance.

DB2: DB2 is an RDBMS designed to store, analyze, and retrieve data efficiently.
A sample relational database model for a bank environment stores the data associated with
the bank in the form of two-dimensional tables.
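
A minimal sketch of what such a bank schema could look like is shown below; every table and column name here is an assumption chosen for illustration.

```
-- Illustrative two-dimensional tables for a bank environment.
CREATE TABLE Customer (
    CustomerID INT PRIMARY KEY,
    Name       VARCHAR(50),
    City       VARCHAR(30)
);

CREATE TABLE Account (
    AccountNo  INT PRIMARY KEY,
    CustomerID INT,                -- common field linking the two tables
    Balance    DECIMAL(12, 2),
    FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)
);
```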

Advantage

• The relational model is one of the most popular and widely used database models.


• In the relational model, changes in the database structure do not affect data access.
• Viewing information as tables consisting of rows and columns is much
easier to understand.
• The relational database supports both the data independence and structure independence
concepts, which makes database design, maintenance, administration and usage
much easier than in the other models.
• Complex queries can be written to access or modify the data in the database.
• It is easier to maintain security as compared to other models.

Disadvantages

• Mapping of objects into a relational database is very difficult.


• The object-oriented paradigm is missing in the relational model.
• Data integrity is difficult to ensure with a relational database.
• The relational model is better suited to small databases than to very large ones.
• Hardware overheads are incurred, which makes it costly.
• Ease of design can lead to bad design.
• The relational database system hides the implementation complexities and the physical
data storage details from the users.

4. Object-oriented databases

An object database is a system in which information is represented in the form of objects,
as used in object-oriented programming. Object-oriented databases are different from
relational databases, which are table-oriented. The object-oriented data model is based
on the object-oriented programming language concept, which is now in wide use.
Inheritance, polymorphism, overloading, object identity, encapsulation and information
hiding with methods to provide an interface to objects are among the key concepts of
object-oriented programming that have found applications in data modelling. The object-
oriented data model also supports a rich type system, including structured and collection
types.

The following figure shows the difference between the relational and object-oriented
database models.
The next figure shows an example of an object-oriented model.

Advantages

• An object database can handle different types of data, whereas a relational database handles
only one kind of structured data. Unlike traditional databases such as hierarchical, network or
relational databases, object-oriented databases can handle different types of data, for example,
pictures, voice and video, in addition to text and numbers.
• Object-oriented databases provide code reusability, real-world modelling, and
improved reliability and flexibility.
• The object-oriented database has lower maintenance costs compared to other
models because most of the tasks within the system are encapsulated; they may be
reused and incorporated into new tasks.
Disadvantages

• There is no universally defined data model for an OODBMS, and most models lack a
theoretical foundation.
• In comparison to RDBMSs, the use of OODBMSs is still relatively limited.
• There is a lack of support for security, as many OODBMSs do not provide adequate
security mechanisms.
• The system is more complex than traditional DBMSs.

11. Attributes and Entity sets

In the context of database design and the Entity-Relationship (ER) model, attributes and
entity sets are fundamental concepts used to model and represent information about the
real-world domain.

1. Entity Set:
- An entity set is a collection or set of entities that share common characteristics or
properties.
- An entity is an object or concept in the real world that is distinguishable from other
objects. Entities can have attributes that describe various aspects of the entity.
- For example, in a university database, the "Student" entity set may include individual
students, and each student has attributes like "StudentID," "Name," and "Date of Birth."

2. Attributes:
- An attribute is a property or characteristic of an entity set that describes some aspect
of the entity.
- Attributes provide more detailed information about the entity within an entity set.
- Each attribute has a domain, which is the set of permissible values for that attribute.
For example, the "Color" attribute of a "Car" entity might have a domain of {Red, Blue,
Green}.
- Attributes can be categorized into different types:
- Simple Attribute: An attribute that cannot be divided further. For example, "Age" or
"Name."
- Composite Attribute: An attribute that can be divided into sub-parts that represent
more basic attributes with independent meanings. For example, "Address" may be a
composite attribute composed of "Street," "City," and "Zip Code."
- Derived Attribute: An attribute whose values can be derived from other attributes in
the database. For example, "Age" might be a derived attribute if the birthdate is known.
- Multi-valued Attribute: An attribute that can hold multiple values. For example,
"Phone Numbers" for a person, which may have multiple phone numbers.

In ER modeling, entities are typically depicted using rectangles, and attributes are
represented within ellipses connected to the respective entity. Relationships between
entities are shown using diamond shapes.

For example:
```
Student (Entity)
- StudentID (Attribute)
- Name (Attribute)
- Date of Birth (Attribute)
```

In this example, "Student" is the entity set, and "StudentID," "Name," and "Date of Birth"
are attributes associated with the "Student" entity set.

Understanding the distinctions between entity sets and attributes is crucial when
designing a database schema, as it helps in accurately representing the structure and
characteristics of the data in the real-world domain being modeled.
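
When such an ER fragment is translated into a relational schema, the entity set typically becomes a table and each attribute becomes a column. A minimal sketch (column types are assumptions):

```
-- One possible relational mapping of the Student entity set.
CREATE TABLE Student (
    StudentID   INT PRIMARY KEY,   -- key attribute of the entity set
    Name        VARCHAR(50),
    DateOfBirth DATE
);
```
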
12. Relationship and Relationship sets

Relationship Set

A set of relationships of similar type is called a relationship set. Like entities, a relationship
too can have attributes. These attributes are called descriptive attributes.

Mapping Cardinalities:

Mapping cardinalities express the number of entities to which another entity can be associated
via a relationship. For binary relationship sets between entity sets A and B, the mapping
cardinality must be one of:
One-to-one: An entity in A is associated with at most one entity in B, and an entity in B
is associated with at most one entity in A.

One-to-many: An entity in A is associated with any number of entities in B. An entity in B is
associated with at most one entity in A.

Many-to-one: An entity in A is associated with at most one entity in B. An entity in B is
associated with any number of entities in A.

Many-to-many: An entity in A is associated with any number of entities in B, and vice versa.

The appropriate mapping cardinality for a particular relationship set depends on the real
world being modeled.
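
In a relational schema, a one-to-many mapping is usually realized by placing a foreign key on the "many" side, while a many-to-many mapping needs a separate linking table. The sketch below uses hypothetical Student and Course tables to illustrate the many-to-many case:

```
CREATE TABLE Student (
    StudentID INT PRIMARY KEY,
    Name      VARCHAR(50)
);

CREATE TABLE Course (
    CourseID INT PRIMARY KEY,
    Title    VARCHAR(50)
);

-- Many-to-many: a student enrolls in many courses and a course has
-- many students, so a separate relationship (linking) table is used.
CREATE TABLE Enrolls (
    StudentID INT REFERENCES Student(StudentID),
    CourseID  INT REFERENCES Course(CourseID),
    Grade     CHAR(2),                     -- a descriptive attribute
    PRIMARY KEY (StudentID, CourseID)
);
```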

Degree of a Relationship Set

The number of entity sets that participate in a relationship set is termed the degree of
that relationship set. Thus,

Degree of a relationship set = Number of entity sets participating in the relationship set

Types of Relationship Sets

On the basis of degree of a relationship set, a relationship set can be classified into the
following types-

• Unary relationship set


• Binary relationship set
• Ternary relationship set

Unary Relationship Set


Unary relationship set is a relationship set where only one entity set participates in a
relationship set.

Binary Relationship Set

Binary relationship set is a relationship set where two entity sets participate in a
relationship set.

Ternary Relationship Set

Ternary relationship set is a relationship set where three entity sets participate in a
relationship set.
Participation Constraints

Total Participation of an Entity set

Total participation of an entity set represents that each entity in entity set must have at
least one relationship in a relationship set. It is also called mandatory participation. Total
participation is represented using a double line between the entity set and relationship
set.

Partial participation of an Entity Set

Partial participation of an entity set represents that each entity in the entity set may or
may not participate in a relationship instance in that relationship set. It is also called
optional participation. Partial participation is represented using a single line between the
entity set and relationship set.

13. Mapping and Constraints in DBMS


In the context of databases, mapping refers to the process of translating data and
relationships from the conceptual or logical level to the physical level. This involves
defining how the data is stored, organized, and accessed in the underlying database
management system (DBMS). Mapping is a crucial step in the database design
process, where the logical data model is transformed into a physical data model.
Constraints in DBMS, on the other hand, are rules or conditions applied to the data to
ensure the accuracy, integrity, and consistency of the database. Constraints are defined
during the mapping process to enforce specific rules at the database level. Here are
some common types of constraints:

1. Entity Integrity Constraint:


- Ensures that each row (or entity) in a table has a unique identifier, typically
implemented using primary keys.
- Example: A "StudentID" column in a "Students" table is defined as the primary key to
ensure each student has a unique identifier.

2. Referential Integrity Constraint:


- Ensures the consistency of relationships between tables by enforcing referential
relationships.
- Example: A foreign key in an "Orders" table references the primary key in a
"Customers" table, ensuring that an order is associated with a valid customer.

3. Domain Integrity Constraint:


- Defines the permissible values for a column or attribute.
- Example: A "Gender" column may be constrained to only allow values 'Male' or
'Female.'

4. Check Constraint:
- Defines a condition that must be true for data to be inserted or updated in a table.
- Example: A "Salary" column may have a check constraint to ensure that the salary is
greater than or equal to zero.

5. Unique Constraint:
- Ensures that values in a particular column or a combination of columns are unique
across rows in a table.
- Example: A "Username" column may have a unique constraint to ensure that each
user has a distinct username.
6. Default Constraint:
- Provides a default value for a column when no value is specified during insertion.
- Example: A "DateCreated" column may have a default constraint to set the current
date and time if no date is explicitly provided.

7. Not Null Constraint:


- Ensures that a column cannot contain a null (undefined) value.
- Example: A "PhoneNumber" column may have a not null constraint to ensure that
each record has a valid phone number.

8. Key Constraint:
- Enforces uniqueness, either as a primary key or a unique constraint on one or more
columns.
- Example: A combination of "FirstName" and "LastName" columns may have a key
constraint to ensure unique combinations.

These constraints collectively contribute to the accuracy, consistency, and reliability of
the database by preventing the insertion of invalid or inconsistent data. During the
mapping process, the designer specifies these constraints to guide the DBMS in
enforcing the rules defined at the logical level.
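
A single table definition can combine several of the constraints listed above. The sketch below is illustrative; the table, its columns, and the referenced Departments table are assumptions, and the exact syntax for defaults varies slightly between DBMSs.

```
CREATE TABLE Employees (
    EmployeeID  INT PRIMARY KEY,                      -- entity integrity
    FirstName   VARCHAR(40) NOT NULL,                 -- not null constraint
    LastName    VARCHAR(40) NOT NULL,
    Gender      VARCHAR(6)
                CHECK (Gender IN ('Male', 'Female')), -- domain/check constraint
    Salary      DECIMAL(10, 2)
                CHECK (Salary >= 0),                  -- check constraint
    Username    VARCHAR(30) UNIQUE,                   -- unique constraint
    DateCreated DATE DEFAULT CURRENT_DATE,            -- default constraint
    DeptID      INT,
    FOREIGN KEY (DeptID) REFERENCES Departments(DeptID) -- referential integrity
    -- assumes a Departments(DeptID, ...) table already exists
);
```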

14. Entity Relationship (ER) Diagram

It is a type of flowchart that illustrates how “entities” such as people, objects or concepts
relate to each other within a system. ER Diagrams are most often used to design or
debug relational databases in the fields of software engineering, business information
systems, education and research. Also known as ERDs or ER Models, they use a
defined set of symbols such as rectangles, diamonds, ovals and connecting lines to
depict the interconnectedness of entities, relationships and their attributes. They mirror
grammatical structure, with entities as nouns and relationships as verbs.

ER diagrams are related to data structure diagrams (DSDs), which focus on the
relationships of elements within entities instead of relationships between entities
themselves. ER diagrams also are often used in conjunction with data flow diagrams
(DFDs), which map out the flow of information for processes or systems.

➢ History of ER models

Peter Chen (a.k.a. Peter Pin-Shan Chen), currently a faculty member at Carnegie-
Mellon University in Pittsburgh, is credited with developing ER modeling for database
design in the 1970s. While serving as an assistant professor at MIT’s Sloan School of
Management, he published a seminal paper in 1976 titled “The Entity-Relationship
Model: Toward a Unified View of Data.”

In a broader sense, the depiction of the interconnectedness of things dates back at least to
ancient Greece, with the works of Aristotle, Socrates and Plato. It’s seen more recently
in the 19th and 20th Century works of philosopher-logicians like Charles Sanders
Peirce and Gottlob Frege.

By the 1960s and 1970s, Charles Bachman and A.P.G. Brown were working
with close predecessors of Chen’s approach. Bachman developed a type of Data
Structure Diagram, named after him as the Bachman Diagram. Brown published works
on real-world systems modeling. James Martin added ERD refinements. The work of
Chen, Bachman, Brown, Martin and others also contributed to the development of
Unified Modeling Language (UML), widely used in software design.

Uses of entity relationship diagrams

• Database design: ER diagrams are used to model and design relational databases, in
terms of logic and business rules (in a logical data model) and in terms of the specific
technology to be implemented (in a physical data model.) In software engineering, an
ER diagram is often an initial step in determining requirements for an information
systems project. It’s also later used to model a particular database or databases. A
relational database has an equivalent relational table and can potentially be
expressed that way as needed.

• Database troubleshooting: ER diagrams are used to analyze existing databases to
find and resolve problems in logic or deployment. Drawing the diagram should reveal
where it’s going wrong.

• Business information systems: The diagrams are used to design or analyze relational
databases used in business processes. Any business process that uses fielded data
involving entities, actions and interplay can potentially benefit from a relational
database. It can streamline processes, uncover information more easily and improve
results.
• Business process re-engineering (BPR): ER diagrams help in analyzing databases
used in business process re-engineering and in modeling a new database setup.

• Education: Databases are today’s method of storing relational information for
educational purposes and later retrieval, so ER Diagrams can be valuable in planning
those data structures.

• Research: Since so much research focuses on structured data, ER diagrams can play
a key role in setting up useful databases to analyze the data.

The components and features of an ER diagram

ER Diagrams are composed of entities, relationships and attributes. They also depict
cardinality, which defines relationships in terms of numbers. Here’s a glossary:

Entity

A definable thing—such as a person, object, concept or event—that can have data
stored about it. Think of entities as nouns. Examples: a customer, student, car or
product. Typically shown as a rectangle.

Entity type: A group of definable things, such as students or athletes, whereas the entity
would be the specific student or athlete. Other examples: customers, cars or products.

Entity set: Same as an entity type, but defined at a particular point in time, such as
students enrolled in a class on the first day. Other examples: Customers who purchased
last month, cars currently registered in Florida. A related term is instance, in which the
specific person or car would be an instance of the entity set.

Entity categories: Entities are categorized as strong, weak or associative. A strong
entity can be defined solely by its own attributes, while a weak entity cannot. An
associative entity associates entities (or elements) within an entity set.

Entity keys: Refers to an attribute that uniquely defines an entity in an entity set. Entity keys
can be super, candidate or primary.
Super key: A set of attributes (one or more) that together define an entity in an entity set.
Candidate key: A minimal super key, meaning it has the least possible number of attributes
to still be a super key. An entity set may have more than one candidate key.
Primary key: A candidate key chosen by the database designer to uniquely identify the entity set.
Foreign key: Identifies the relationship between entities.

Relationship

How entities act upon each other or are associated with each other. Think of
relationships as verbs. For example, the named student might register for a course. The
two entities would be the student and the course, and the relationship depicted is the
act of enrolling, connecting the two entities in that way. Relationships are typically
shown as diamonds or labels directly on the connecting lines.

Recursive relationship: The same entity participates more than once in the relationship.

Attribute

A property or characteristic of an entity. Often shown as an oval or circle.

Descriptive attribute: A property or characteristic of a relationship (versus of an entity.)

Attribute categories: Attributes are categorized as simple, composite, derived, as well as
single-value or multi-value.
Simple: Means the attribute value is atomic and can’t be further divided, such as a phone number.
Composite: Sub-attributes spring from an attribute.
Derived: The attribute is calculated or otherwise derived from another attribute,
such as age from a birthdate.
Multi-value: More than one attribute value is denoted, such as multiple phone numbers
for a person.

Single-value: Just one attribute value. The types can be combined, such as: simple
single-value attributes or composite multi-value attributes.

Cardinality

Defines the numerical attributes of the relationship between two entities or entity sets.
The three main cardinal relationships are one-to-one, one-to-many, and many-to-many.
A one-to-one example would be one student associated with one mailing address.
A one-to-many example (or many-to-one, depending on the relationship direction): One
student registers for multiple courses, but all those courses have a single line back to
that one student. Many-to-many example: Students as a group are associated with
multiple faculty members, and faculty members in turn are associated with multiple
students.

Cardinality views: Cardinality can be shown as look-across or same-side, depending on
where the symbols are shown.
Cardinality constraints: The minimum or maximum numbers that apply to a relationship.

Mapping natural language

ER components can be equated to parts of speech, as Peter Chen did. This shows how
an ER Diagram compares to a grammar diagram:

• Common noun: Entity type. Example: student.

• Proper noun: Entity. Example: Sally Smith.

• Verb: Relationship type. Example: Enrolls. (Such as in a course, which would be
another entity type.)

• Adjective: Attribute for entity. Example: sophomore.

• Adverb: Attribute for relationship. Example: digitally.

The database query language ERROL actually mimics natural language constructs.
ERROL is based on reshaped relational algebra (RRA) and works with ER models,
capturing their linguistic aspects.

UNIT-2

1. Relational Model in DBMS

The relational model represents data as tables with columns and rows. Each row is known as
a tuple. Each column of a table has a name, called an attribute.

Domain: It contains a set of atomic values that an attribute can take.

Attribute: It contains the name of a column in a particular table. Each attribute Ai must
have a domain, dom(Ai)

Relational instance: In the relational database system, the relational instance is
represented by a finite set of tuples. Relation instances do not have duplicate tuples.

Relational schema: A relational schema contains the name of the relation and name of
all columns or attributes.
Relational key: In the relational key, each row has one or more attributes that can identify
the row in the relation uniquely.

Example: STUDENT Relation

NAME     ROLL_NO   PHONE_NO      ADDRESS     AGE
Ram      14795     7305758992    Noida       24
Shyam    12839     9026288936    Delhi       35
Laxman   33289     8583287182    Gurugram    20
Mahesh   27857     7086819134    Ghaziabad   27
Ganesh   17282     9028 9i3988   Delhi       40

o In the given table, NAME, ROLL_NO, PHONE_NO, ADDRESS, and AGE are the
attributes.
o The instance of schema STUDENT has 5 tuples.
o t3 = <Laxman, 33289, 8583287182, Gurugram, 20>

Properties of Relations

o The name of the relation is distinct from all other relations.
o Each cell of the relation contains exactly one atomic (single) value.
o Each attribute has a distinct name.
o The order of attributes has no significance.
o No two tuples are identical (there are no duplicate tuples).
o The order of tuples has no significance; tuples may be stored in any sequence.
Introduction to the Relational Model:

The relational model is a conceptual framework for organizing and representing data in a
relational database. Proposed by Edgar Codd in 1970, it is based on the principles of
mathematical set theory and predicate logic. The relational model uses tables (relations)
to represent data, with each table consisting of rows (tuples) and columns (attributes).
The relationships between tables are established through keys.

Candidate Keys:

1. Key:

- A key is a set of one or more attributes that can uniquely identify a tuple (row) in a
relation (table).

- Keys play a vital role in establishing relationships between tables and enforcing data
integrity.

2. Super Key:

- A super key is a set of attributes that, taken collectively, can uniquely identify a tuple.

- It may contain more attributes than necessary for uniqueness.

3. Candidate Key:

- A candidate key is a minimal super key, meaning it is a super key without any
unnecessary attributes.
- Each candidate key uniquely identifies tuples, and any subset of a candidate key
should not have the uniqueness property.

4. Primary Key:

- A primary key is a selected candidate key designated to uniquely identify tuples in a relation.

- It is used to establish relationships between tables and must be unique for each tuple,
with no null values.

5. Alternate Key:

- An alternate key is a candidate key that is not chosen as the primary key.

- A relation may have multiple alternate keys.

6. Composite Key:

- A composite key is a key that consists of two or more attributes.

- The combination of these attributes is required to uniquely identify a tuple.

7. Unique Key:

- A unique key is a key that ensures the uniqueness of values in a column or combination of columns.

- Unlike a primary key, unique keys may allow null values.


Integrity Constraints:

Integrity constraints are rules defined on a relational database to maintain accuracy,
consistency, and reliability of data. They ensure that data entered into the database meets
certain criteria and adheres to the specified rules.

1. Entity Integrity Constraint:

- Ensures that the primary key attribute in a relation cannot have a null value.

2. Referential Integrity Constraint:

- Ensures the consistency of relationships between tables by enforcing referential relationships.

- The foreign key in one table must match a primary key in another table.

3. Domain Integrity Constraint:

- Defines the permissible values for a column or attribute.

- Ensures that data entered into a column falls within a specified range or set of values.

4. Check Constraint:

- Defines a condition that must be true for data to be inserted or updated in a table.
- Provides a way to enforce business rules on the data.

5. Default Constraint:

- Provides a default value for a column when no value is specified during insertion.

- Ensures that columns have a value even if not explicitly provided.

6. Not Null Constraint:

- Ensures that a column cannot contain a null (undefined) value.

- Requires that every row must have a value in the specified column.

Enforcing Integrity Constraints Over Relations:

Enforcing integrity constraints involves implementing mechanisms within the database
management system (DBMS) to automatically validate and enforce these rules. The
DBMS checks constraints during data manipulation operations (insert, update, delete)
and prevents actions that violate the defined constraints.

For example, when inserting a new tuple into a table, the DBMS checks whether the
primary key constraint is satisfied, ensuring that the primary key value is unique and not
null. If a constraint is violated, the DBMS will reject the operation and maintain the
consistency and integrity of the data.
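
For example, assuming the hypothetical Students table sketched below, most DBMSs would accept the first insert and reject the remaining ones because they violate the declared constraints:

```
CREATE TABLE Students (
    StudentID INT PRIMARY KEY,
    Name      VARCHAR(50) NOT NULL,
    Age       INT CHECK (Age >= 16)
);

INSERT INTO Students VALUES (1, 'Asha', 21);   -- accepted
INSERT INTO Students VALUES (1, 'Ravi', 22);   -- rejected: duplicate primary key
INSERT INTO Students VALUES (2, NULL,  20);    -- rejected: NOT NULL violated
INSERT INTO Students VALUES (3, 'Meena', 12);  -- rejected: CHECK violated
```
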
In summary, the relational model relies on keys and integrity constraints to ensure data
accuracy, consistency, and reliability. Candidate keys play a crucial role in uniquely
identifying tuples, and integrity constraints provide a set of rules that guide the insertion,
updating, and deletion of data in a relational database. The DBMS enforces these
constraints to maintain the integrity of the data throughout its lifecycle.

Relational algebra is a theoretical query language that forms the basis of the SQL
language used in relational databases. It consists of a set of operations that can be
performed on relations (tables) to retrieve or manipulate data. Here are some
fundamental operations in relational algebra:

1. Selection (σ):

- Purpose: Selects rows from a relation that satisfy a specified condition.

- Notation: σ_{condition}(Relation)

- Example: σ_{Age > 25}(Employees)

2. Projection (π):

- Purpose: Selects specific columns (attributes) from a relation.

- Notation: π_{attributes}(Relation)

- Example: π_{Name, Salary}(Employees)

3. Set Operations:
- Union (∪): Combines two relations and removes duplicate tuples.

- Intersection (∩): Retrieves common tuples from two relations.

- Difference (−): Retrieves tuples from the first relation that are not present in the
second.

4. Renaming (ρ):

- Purpose: Renames attributes or relations.

- Notation: ρ_{newName(attribute)}(Relation)

- Example: ρ_{NewSalary(Salary)}(Employees)

5. Joins (⨝):

- Purpose: Combines tuples from two relations based on a common attribute.

- Notation: Relation1 ⨝_{Condition} Relation2

- Example: Employees ⨝_{DepartmentID} Departments

6. Division (÷):

- Purpose: Retrieves tuples from one relation that, when combined with another relation,
cover the entire set of attributes.

- Notation: Relation1 ÷ Relation2

- Example: Instructors ÷ Courses


7. Aggregate Operations:

- Count (COUNT): Counts the number of tuples in a relation or a group.

- Sum (SUM): Adds up the values in a specified column.

- Average (AVG): Calculates the average of the values in a specified column.

- Maximum (MAX): Retrieves the maximum value in a specified column.

- Minimum (MIN): Retrieves the minimum value in a specified column.

- Notation:

- COUNT_{attribute}(Relation)
- SUM_{attribute}(Relation)
- AVG_{attribute}(Relation)
- MAX_{attribute}(Relation)
- MIN_{attribute}(Relation)

Example Queries:

- Retrieve names of employees earning more than $50,000:
  σ_{Salary > 50000}(Employees)

- Retrieve the total number of employees in each department:
  π_{DepartmentID, COUNT(EmployeeID)}(Employees ⨝_{DepartmentID} Departments)

- Calculate the average salary for each department:
  π_{DepartmentID, AVG(Salary)}(Employees ⨝_{DepartmentID} Departments)
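
For comparison, the same example queries could be written in SQL roughly as follows (assuming Employees and Departments tables with the columns used above):

```
-- Names of employees earning more than $50,000
SELECT Name
FROM   Employees
WHERE  Salary > 50000;

-- Total number of employees in each department
SELECT d.DepartmentID, COUNT(e.EmployeeID) AS EmployeeCount
FROM   Employees e
JOIN   Departments d ON d.DepartmentID = e.DepartmentID
GROUP BY d.DepartmentID;

-- Average salary for each department
SELECT d.DepartmentID, AVG(e.Salary) AS AverageSalary
FROM   Employees e
JOIN   Departments d ON d.DepartmentID = e.DepartmentID
GROUP BY d.DepartmentID;
```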

Relational algebra provides a theoretical foundation for query languages used in relational
databases, and understanding these operations is crucial for designing and querying
databases effectively.
UNIT-3

1. Normalisation and Functional Dependency

Normalization and functional dependency are key concepts in database design,
specifically in the context of relational databases. They are aimed at organizing data to
minimize redundancy and improve data integrity.

Functional Dependency:

Functional dependency is a relationship between attributes in a relation (table) that
describes how the values of one attribute uniquely determine the values of another
attribute. It is denoted by an arrow (→).

For example, if we have attributes A and B in a relation, and every value of A uniquely
determines the value of B, we write it as A → B.

Key Points:

1. Full Functional Dependency: A functional dependency A → B is called a full functional
dependency if it is not possible to remove any attribute from A and still have the
dependency hold true.

2. Transitive Dependency: If A → B and B → C, then A → C is a transitive dependency.


Normalization:

Normalization is the process of organizing data in a relational database to reduce
redundancy and dependency. The goal is to minimize data anomalies (insertion, update,
and deletion anomalies) and ensure data integrity.

Normal Forms:

1. First Normal Form (1NF):

- Eliminate duplicate columns from the same table.

- Create a separate table for each set of related data.

2. Second Normal Form (2NF):

- Meet the requirements of 1NF.

- Remove partial dependencies by putting fields that depend on only part of a composite
primary key into a separate table.

3. Third Normal Form (3NF):

- Meet the requirements of 2NF.

- Remove transitive dependencies by putting fields that depend on other non-key fields
(rather than directly on the primary key) into a separate table.
4. Boyce-Codd Normal Form (BCNF):

- Meet the requirements of 3NF.

- A stronger form of 3NF in which every determinant is a candidate key, i.e., for every
non-trivial dependency X → Y, X is a super key.

5. Fourth Normal Form (4NF):

- Addresses multi-valued dependencies.

6. Fifth Normal Form (5NF):

- Addresses cases where a relation has a join dependency.

Example:

Consider a relation "Employee" with attributes {EmployeeID, EmployeeName,


Department, DepartmentLocation}.

- The functional dependencies might include:

- EmployeeID → EmployeeName

- EmployeeID → Department

- Department → DepartmentLocation
In this case, the relation is not in 3NF because of the transitive dependency. To normalize
it, you might create separate tables for Employee and Department, eliminating the
transitive dependency.

Normalization helps ensure that the data is organized efficiently and that updates, inserts,
and deletes can be performed without introducing inconsistencies.

In summary, functional dependency and normalization are crucial concepts in database
design. Functional dependency helps identify relationships between attributes, and
normalization provides a systematic approach to organizing data to achieve data integrity
and minimize redundancy.

2. Need of normalisation, decompositions

The need for normalization and decompositions in the context of database design arises
from the desire to organize data efficiently, avoid data redundancy, and ensure data
integrity. Here are the primary reasons why normalization and decompositions are
essential:

1. Minimizing Data Redundancy:


- Issue: Redundant data can lead to inconsistencies and wastes storage space.
- Solution: Normalization breaks down tables into smaller, well-organized structures to
eliminate redundant data. This reduces the chances of inconsistencies and ensures that
updates, inserts, and deletes do not create anomalies.
2. Avoiding Insertion, Update, and Deletion Anomalies:
- Issue: Anomalies can occur when data is not organized properly, leading to
difficulties in maintaining data consistency.
- Solution: Normalization helps in organizing data to eliminate or minimize anomalies.
Each relation is structured to meet specific normal form criteria, reducing the risk of
anomalies during database operations.

3. Improving Data Integrity:


- Issue: Inconsistent data can lead to unreliable results and decision-making.
- Solution: By organizing data through normalization, integrity constraints can be
applied more effectively. Data is stored in a way that reflects its inherent dependencies,
ensuring that the information remains accurate and reliable.

4. Facilitating Querying and Reporting:


- Issue: Unnormalized or partially normalized data can lead to complex queries and
reporting structures.
- Solution: Normalized databases simplify query formulation. Tables are designed to
store data in a way that reflects relationships and dependencies, making it easier to
retrieve relevant information.

5. Scalability:
- Issue: Poorly organized data structures can hinder the scalability of a database.
- Solution: Normalization enables the creation of modular and scalable database
structures. Well-organized tables make it easier to extend the database schema without
introducing complexities.

6. Supporting Maintainability:
- Issue: Poorly designed databases can be difficult to maintain and modify.
- Solution: Decomposing tables through normalization simplifies the structure, making
it easier to understand and maintain. Changes to the database schema become more
manageable.
7. Enabling Flexibility and Adaptability:
- Issue: Rigidity in database structures can impede the adaptability of a system to
changing requirements.
- Solution: Normalization allows for a more flexible database design. Changes and
adaptations can be made to individual components without affecting the entire system.

8. Meeting Specific Normal Form Criteria:


- Issue: Non-adherence to specific normal form criteria can lead to data
inconsistencies.
- Solution: Each normal form represents a specific level of data organization, and
adherence to these normal forms ensures a systematic and standardized approach to
database design.

Decompositions:

Decompositions involve breaking down larger tables into smaller, well-organized tables.
This process is crucial for achieving normalization. Each decomposition brings a table to
a higher normal form, addressing specific issues related to data redundancy and
anomalies.

In summary, normalization and decompositions are critical in database design to ensure
efficiency, maintainability, and data integrity. They provide a systematic approach to
organizing data, making databases more robust and adaptable to changing
requirements.

Decomposition in the context of Database Management Systems (DBMS) refers to the
process of breaking down a relational schema into smaller, well-structured relations or
tables. This process is driven by the need to achieve specific objectives related to data
organization, normalization, and performance optimization. Here are several reasons
why decomposition is necessary in DBMS:
1. Normalization:
- One of the primary reasons for decomposition is normalization. Normalization is a
process that helps reduce data redundancy and dependency by organizing tables in a
way that conforms to specific normal forms (e.g., First Normal Form, Second Normal
Form, etc.).
- Decomposing a relation helps in eliminating various types of anomalies (insertion,
update, and deletion anomalies) and ensures that the database is in a normalized state.

2. Minimizing Redundancy:
- Decomposition aims to minimize redundancy in the database by breaking down
larger relations into smaller ones. Redundancy can lead to data inconsistencies and
increased storage requirements.

3. Facilitating Maintenance:
- Smaller, well-structured relations are often easier to maintain. Changes to data or
schema modifications can be more straightforward when relations are appropriately
decomposed.

4. Improving Read and Write Performance:


- Decomposing relations can lead to improvements in query performance. Smaller
relations may result in more efficient retrieval of data, especially when dealing with
specific subsets of information.

5. Avoiding Data Anomalies:


- Decomposition helps in avoiding various types of anomalies that may arise due to
the presence of certain dependencies between attributes. Normalizing the schema
through decomposition reduces the likelihood of these anomalies.

6. Supporting Complex Queries:


- Well-decomposed relations can make it easier to write complex queries.
Decomposition allows for more flexible querying and retrieval of information from the
database.
7. Enabling Parallelism:
- In a distributed or parallel computing environment, decomposed relations can be
processed independently, allowing for parallel execution of queries. This can improve
overall system performance.

8. Enhancing Data Integrity:


- Proper decomposition contributes to maintaining data integrity by eliminating
inconsistencies and dependencies that may exist in larger, less organized relations.

9. Adapting to Changes in Requirements:


- As the requirements of an application evolve over time, the need for changes in the
database schema arises. Decomposed relations provide a more adaptable structure
that can be modified to accommodate changing business needs.

10. Simplifying Database Design:


- Decomposition simplifies the overall database design by breaking down complex
relations into smaller, more manageable components. This makes it easier to
understand and maintain the database structure.

In summary, decomposition is a crucial aspect of database design, driven by the need
to achieve normalization, minimize redundancy, enhance performance, and maintain
data integrity. It ensures that the database schema is well-organized and adaptable to
changing requirements.

3. Decomposition

Decomposition in the context of databases refers to the process of breaking down a
relational schema or table into smaller, more manageable tables. This process is often
driven by the goal of achieving normalization and minimizing redundancy in the
database. Here are the key steps and considerations involved in decomposition:
1. Identify Functional Dependencies:
- Before decomposition, it's essential to identify the functional dependencies within the
original relation. Understand how attributes depend on each other.

2. Normalize the Relation:


- Normalize the relation to a desired normal form (e.g., First Normal Form (1NF),
Second Normal Form (2NF), etc.). Each normal form addresses specific types of
dependencies and anomalies.

3. Determine Candidate Keys:


- Identify candidate keys for the relation. Candidate keys are minimal sets of attributes
that uniquely identify each tuple in the relation.

4. Choose a Decomposition Strategy:


- Decomposition can be achieved using different strategies:
- Lossless-Join Decomposition: Ensures that the original relation can be
reconstructed from its decomposed parts without loss of information.
- Dependency-Preserving Decomposition: Preserves all the functional dependencies
of the original relation in the decomposed relations.

5. Perform Decomposition:
- Based on the chosen strategy, decompose the relation into smaller tables. The
resulting tables should be in a normalized form, and each table should have a clear
purpose.

6. Create New Tables:


- For each decomposed part, create a new table. Define the primary key for each
table based on the candidate keys identified earlier.

7. Establish Relationships:
- Establish relationships between the decomposed tables. Foreign keys in some
tables should reference primary keys in other tables to maintain referential integrity.
8. Verify and Optimize:
- Verify that the decomposition meets the desired normal form and that it doesn't
introduce anomalies. Optimize the structure if necessary.

Example:

Consider an original relation "Employee" with attributes {EmployeeID, EmployeeName,
Department, DepartmentLocation}.

1. Identify Functional Dependencies:


- EmployeeID → EmployeeName
- EmployeeID → Department
- Department → DepartmentLocation

2. Normalize the Relation (Assume 3NF):


- Address partial and transitive dependencies.

3. Determine Candidate Keys:


- Candidate key: {EmployeeID}

4. Choose a Decomposition Strategy:


- Choose a strategy that ensures lossless-join decomposition and preserves
dependencies.

5. Perform Decomposition:
- Decompose the "Employee" relation into smaller tables, such as "EmployeeInfo"
(EmployeeID, EmployeeName) and "DepartmentInfo" (Department,
DepartmentLocation).
6. Create New Tables:
- Create the tables "EmployeeInfo" and "DepartmentInfo" with appropriate primary
keys.

7. Establish Relationships:
- Establish a relationship between "EmployeeInfo" and "DepartmentInfo" using the
common attribute "Department."

8. Verify and Optimize:


- Ensure that the decomposition meets the requirements of 3NF. Optimize the
structure if needed.
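
A minimal SQL sketch of the decomposed schema (column types are assumptions):

```
-- Decomposed tables after normalization to 3NF.
CREATE TABLE DepartmentInfo (
    Department         VARCHAR(40) PRIMARY KEY,
    DepartmentLocation VARCHAR(40)
);

CREATE TABLE EmployeeInfo (
    EmployeeID   INT PRIMARY KEY,
    EmployeeName VARCHAR(50),
    Department   VARCHAR(40),
    -- the common attribute links the two tables and preserves referential integrity
    FOREIGN KEY (Department) REFERENCES DepartmentInfo(Department)
);
```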

Decomposition helps in achieving a well-organized and normalized database structure,
reducing redundancy and improving data integrity. It's a fundamental step in the
database design process.

4.Minimal cover IN DBMS

In the context of database design and normalization, a minimal cover, often referred to
as a minimal cover of functional dependencies, is a set of functional dependencies that
is both irreducible and still implies all the dependencies of the original set. The goal is to
represent the dependencies in a minimal and concise way without losing any
information.

Here are the steps to find the minimal cover of a set of functional dependencies:

Given a set of functional dependencies:

F = {A → B, AC → D, E → AB, D → E}

Steps to Find Minimal Cover:

1. Split Right-Hand Sides:
- Split dependencies with multiple attributes on the right-hand side into separate
dependencies, each with a single attribute on the right. This helps in achieving a minimal
representation.

2. Remove Extraneous Attributes:
- Remove any extraneous attributes from the left-hand side of the dependencies. An
attribute is extraneous if it can be removed without changing the closure of the set.

3. Remove Redundant Dependencies:
- Remove any redundant dependencies. Redundancy occurs when a functional
dependency can be inferred from the other dependencies in the set.

4. Remove Duplicate Dependencies:
- Remove any duplicate dependencies that might have been introduced during the
process.

Example:

Given F = {A → B, AC → D, E → AB, D → E}

1. Split Right-Hand Sides:

- E → AB is split into E → A and E → B, giving {A → B, AC → D, E → A, E → B, D → E}.

2. Remove Extraneous Attributes:

- In AC → D, neither A nor C is extraneous (the closure of C is {C} and the closure of A is
{A, B}, and neither contains D), so AC → D is kept as it is.

3. Remove Redundant Dependencies:

- E → B is redundant, because E → A and A → B already place B in the closure of E. All
other dependencies are needed.

4. Remove Duplicate Dependencies:

- No duplicates remain.

Resulting Minimal Cover:

F_min = {A → B, AC → D, E → A, D → E}

The resulting minimal cover is a reduced set of dependencies that still implies the same
closure as the original set. It is a more concise and minimal representation of the
functional dependencies, making it easier to work with during database design and
normalization processes.

5. Attribute closure
The attribute closure is a concept in database theory, specifically in the context of
functional dependencies. The closure of a set of attributes with respect to a set of
functional dependencies represents the complete set of attributes that can be
functionally determined by that original set.

Given a set of attributes X and a set of functional dependencies F, the closure of X,
denoted as X+, is the set of all attributes that are functionally determined by X according
to the dependencies in F.

The process of finding the attribute closure involves repeatedly applying the given
functional dependencies to the set of attributes until no new attributes can be added.
The closure is determined by the rules of functional dependency closure.

Rules for Attribute Closure:

1. Reflexivity:
- Y ⊆ X+ for any Y ⊆ X.

2. Augmentation:
- If X+ → Y and Z ⊆ Y, then X+ → Z.

3. Transitivity:
- If X+ → Y and Y → Z, then X+ → Z.

Example:

Consider the set of functional dependencies F = {AB → C, BC → D, D → A}.

Given X = {A, B}, let's find X+:

1. Reflexivity:
- X+ = {A, B}.

2. Augmentation:
- AB → C implies X+ = {A, B, C}.

3. Transitivity:
- BC → D implies X+ = {A, B, C, D}.

No further application of the dependencies can add more attributes to X+.

Result:

For X = {A, B} and F = {AB → C, BC → D, D → A}, the
attribute closure X+ is {A, B, C, D}.
Attribute closure is a fundamental concept used in database normalization, particularly
in determining the candidate keys, super keys, and the normalization process. It helps
ensure that a relational database is organized to minimize redundancy and maintain
data integrity.

Attribute Closure and Normal Forms:

Attribute closure is also central to database normalization, particularly in the
context of functional dependencies. It is closely related to the process of determining
the closure of a set of attributes under a set of functional dependencies. The closure of
a set of attributes is the complete set of attributes that can be derived from the original
set using the given functional dependencies.

Let's go through the attribute closure process and the normalization levels (First Normal
Form, Second Normal Form, and Third Normal Form) with examples:

Attribute Closure:

Given a set of attributes X and a set of functional dependencies F, the closure of X,
denoted as X+, is the set of all attributes that can be functionally determined by X based
on the dependencies in F.

Example:

Consider the set of functional dependencies F = {A → B, BC → D, D → A}.

Given X = {A}, let's find X+:

1. Reflexivity:
- X+ = {A}.

2. Applying A → B:
- A → B implies X+ = {A, B}.

3. No other dependency applies:
- BC → D requires C and D → A requires D, neither of which is in X+, so X+ remains {A, B}.

No further application of the dependencies can add more attributes to X+.

First Normal Form (1NF):

A relation is in 1NF if it has a primary key and all attributes are atomic.

Second Normal Form (2NF):

A relation is in 2NF if it is in 1NF and all non-prime attributes are fully functionally
dependent on the primary key.

Third Normal Form (3NF):

A relation is in 3NF if it is in 2NF and there are no transitive dependencies.

Example:

Consider a relation "Employee" with attributes {EmployeeID, EmployeeName,


Department, DepartmentLocation}.

1. First Normal Form (1NF):


- Ensure that each attribute contains atomic values.

2. Second Normal Form (2NF):


- No partial dependencies; each non-prime attribute is fully functionally dependent on
the primary key.

3. Third Normal Form (3NF):


- No transitive dependencies.

Conclusion:

Attribute closure is a crucial concept in determining the functional dependencies and
normalization levels of a relational database. It helps ensure data integrity, minimize
redundancy, and organize the data efficiently. The normalization process aims to bring
a database into a state where it satisfies certain conditions (1NF, 2NF, 3NF) that
prevent data anomalies.

6. Multi-valued Dependencies

Multi-valued dependencies (MVDs) are a concept in database theory that extends the
idea of functional dependencies. While functional dependencies describe relationships
between attributes in a table, multi-valued dependencies involve relationships between
sets of attributes. Multi-valued dependencies are used to ensure that certain types of data
anomalies are avoided in a relational database.

Multi-Valued Dependency Definition:

Let's consider a relation R with attributes A, B, and C. A multi-valued
dependency from A to B in R, denoted as A →→ B,
specifies that for every value of A there is a set of values for B that is independent
of the other attributes in R.
In simpler terms, if t1 and t2 are tuples in R with the same value for A, then there must also
exist tuples t3 and t4 in R (with that same A value) such that t3[B] = t1[B] and t3[C] = t2[C],
while t4[B] = t2[B] and t4[C] = t1[C].

Example:

Consider a relation R(A, B, C) with the following multi-valued dependency: A →→ B.

This dependency implies that for every value of A, there is a set of values for B that
is independent of C.

```

| A | B | C |
|---|---|---|
| a | 1 | x |
| a | 1 | z |
| a | 3 | x |
| a | 3 | z |
| b | 2 | y |
| b | 4 | y |

```
In this example, for A = a the set of values for B is {1, 3}, and it appears with every value of
C ({x, z}); for A = b the set of values for B is {2, 4}. The set of B values associated with a
given A value is independent of C.

Usage in Database Design:

Multi-valued dependencies are particularly relevant in situations where a relation can be
decomposed to eliminate redundancy and data anomalies related to multi-valued facts.
These dependencies are considered during the normalization process to ensure that a
database schema is well-designed and free from certain types of anomalies.

Fourth Normal Form (4NF):

A relation is in Fourth Normal Form (4NF) if it is in Third Normal Form (3NF) and has no
non-trivial multi-valued dependencies.

In summary, multi-valued dependencies are a crucial concept in relational database
theory, especially in the normalization process, as they help in designing databases that
are free from certain types of anomalies related to multi-valued facts.
7. Fourth and Fifth Normal Form
The fourth normal form is the next level of normalization after BCNF, and it deals with a
more general type of dependency known as multi-valued dependency. The fourth
normal form, just like the other normal forms, brings in features that organize the data
and eliminate anomalies and redundancy. Each normal form has a set of rules that
should be followed in creating the database, and based on the normal form being
targeted, the larger tables are divided into smaller ones.
That is referred to as the decomposition of tables. A relation should satisfy the below
two conditions to be in 4NF. The conditions are:

1. It should be in BCNF.

2. There should be no Multi-valued Dependency.

A table which is in 4NF will by default satisfy all the conditions of the previous normal

forms. The different forms of normalization before 4NF are:

• 1NF

• 2NF

• 3NF

• BCNF

How does Fourth Normal Form Work?


To understand how 4NF works, it is necessary to understand multi-valued dependency.
A multi-valued dependency requires a minimum of three columns, in which there should be
at least two attributes that depend on the third one, and those two attributes are
independent of each other. Conditions for multi-valued dependency:


1. There should be at least 3 columns in a table.

2. For a dependency A -> B, if for every value of A multiple values of B exist, then
the dependency is referred to as a multi-valued dependency.

3. In the relation of 3 columns R(XYZ), if there exists a multi-valued dependency

between X and Y then Y and Z should be independent of each other.

All the above conditions should be satisfied to establish the fact that multi-valued

dependency exists in relation.

Example 1
Here we have a Students table (Student_Roll_No, Subject_Enrolled, Activity_Enrolled) which
has information about the subjects and the activities the students have enrolled in.
Subject_Enrolled and Activity_Enrolled are the two
attributes that are independent of each other. The given table is in BCNF, which satisfies
the first condition of 4NF.

Let’s check further the Multi-valued Dependency.

1. The dependencies in this relation are:

Student_Roll_No —> Subject_Enrolled

Student_Roll_No —> Activity Enrolled

2. Based on the conditions for Multi-Valued dependency, checking the existing

relation.

3. There should be at least 3 columns in a table. – – Satisfied


4. For a dependency A -> B, if for every value of A multiple values of B exist, then
the dependency is referred to as a multi-valued dependency. – – Roll no 45 has
enrolled in Economics and History in terms of academics and Painting and
Hockey as activities. Thus, for a value of Student_Roll_No, different values of
Activity_Enrolled exist.

5. In the relation of 3 columns R(XYZ), if there exists a multi-valued dependency

between X and Y then Y and Z should be independent of each other. – –

Subject_Enrolled and Activity Enrolled are independent of each other.

As we checked the above conditions it is clear that the relation consists of multi-valued

dependency. In order to normalize the table into 4NF, we need to decompose it into

smaller tables.

The Student relation has now been decomposed into two smaller relations, S1 and S2, which are in 4NF.
Example 2
Here Course is a relation with the below dependencies:

Course -> Instructor

Course -> TextBook_Author

For a given value of Course, we have a set of values for Instructor and a set of values for TextBook_Author, but instructors and textbook authors are not related to each other. Thus the table has a multi-valued dependency and does not satisfy 4NF. To solve this, the table must be decomposed: it is split into two tables, where one table preserves Course -> Instructor and the other preserves Course -> TextBook_Author. This decomposition results in tables that are in 4NF.
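
A minimal sketch of this decomposition in SQL is shown below; the table and column names (Course_Instructor, Course_TextBook, and their columns) are illustrative choices rather than names taken from the original text:

```sql
-- Each table preserves exactly one of the two multi-valued dependencies
CREATE TABLE Course_Instructor (
    course     VARCHAR(50),
    instructor VARCHAR(50),
    PRIMARY KEY (course, instructor)
);

CREATE TABLE Course_TextBook (
    course          VARCHAR(50),
    textbook_author VARCHAR(50),
    PRIMARY KEY (course, textbook_author)
);
```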

Advantages of 4NF:

• Helps in removing redundancy and anomalies in the database.

• Data integrity and consistency can be maintained through normalization and the constraints it enforces.

Conclusion
By default, a relation in 4NF will satisfy all the properties of 1NF, 2NF, 3NF, and BCNF. A table needs to be in BCNF and have no multi-valued dependency to be in 4NF, which helps ensure data integrity and consistency. This set of rules, which forms part of the basis of normalization, ensures that update, delete, and insert anomalies do not occur in the database.

8. Transaction Management and Concurrency Control


Transaction management and concurrency control are critical aspects of database
systems to ensure the consistency, isolation, and durability of transactions executed by
multiple users concurrently. Let's explore these concepts in detail:

Transaction Management:

A transaction is a sequence of one or more operations (such as read or write) on a database that forms a single logical unit of work. Transactions are designed to maintain the integrity of the database by ensuring that either all the operations within the transaction are executed successfully (commit), or none of them are (rollback).

Key Properties of Transactions:

1. Atomicity:
- Ensures that a transaction is treated as a single, indivisible unit of work. Either all
operations within the transaction are committed, or none of them are.

2. Consistency:
- Ensures that a transaction brings the database from one consistent state to another.
The integrity constraints of the database should be preserved.

3. Isolation:
- Ensures that the execution of one transaction is isolated from the execution of other
transactions. Transactions appear to be executed in isolation, even though they may be
executed concurrently.

4. Durability:
- Ensures that once a transaction is committed, its effects on the database are
permanent and survive subsequent failures.
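
As a rough illustration of these four properties, consider a funds transfer written as a single transaction. The table and column names (accounts, balance, account_id) are hypothetical, and the exact transaction-control syntax (BEGIN, START TRANSACTION, etc.) varies between database systems:

```sql
-- Transfer 100 from account 1 to account 2 as one indivisible unit of work
BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

-- If both updates succeed, make the changes permanent (durability)
COMMIT;

-- If anything fails before COMMIT, undo every change made so far (atomicity):
-- ROLLBACK;
```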

Concurrency Control:

Concurrency control is the process of managing the simultaneous execution of transactions to ensure data consistency while allowing for concurrent access to the database. The goal is to balance the benefits of parallel execution with the need to maintain the integrity of the data.

Concurrency Control Mechanisms:


1. Locking:
- Uses locks to control access to data. Transactions must acquire a lock on a resource
before accessing it. Different types of locks (read, write, exclusive, shared) are used to
control the level of access.

2. Timestamp Ordering:
- Assigns a unique timestamp to each transaction and uses these timestamps to order
the execution of transactions. Transactions with higher timestamps are given priority.

3. Two-Phase Locking (2PL):
- Requires that a transaction acquires all the locks it needs during a growing phase and releases them only during a shrinking phase; once a lock has been released, no new locks may be acquired.

4. Optimistic Concurrency Control:


- Allows transactions to proceed without acquiring locks initially. Checks for conflicts at
the end of the transaction and rolls back if conflicts are detected.

5. Multi-Version Concurrency Control (MVCC):


- Maintains multiple versions of data to allow transactions to read a snapshot of the
database at a specific point in time. Each transaction sees a consistent snapshot of the
database.
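
As a small sketch of the first mechanism (locking), many systems let a transaction explicitly lock the rows it is about to modify. The accounts table below is hypothetical, and SELECT ... FOR UPDATE is available in systems such as Oracle, MySQL, and PostgreSQL:

```sql
BEGIN TRANSACTION;

-- Acquire an exclusive (write) lock on the selected row;
-- other transactions trying to update it must wait
SELECT balance
FROM accounts
WHERE account_id = 1
FOR UPDATE;

UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;

-- Locks held by this transaction are released at COMMIT
COMMIT;
```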

Challenges in Concurrency Control:

1. Deadlocks:
- Situations where two or more transactions are unable to proceed because each is
waiting for the other to release a lock.

2. Starvation:
- A transaction is unable to proceed because other transactions continually acquire
the required locks.
3. Concurrency Anomalies:
- Undesirable phenomena such as lost updates, uncommitted data, and inconsistent
retrievals that may occur due to concurrent execution.

9. Transaction States

In a database system, a transaction goes through various states during its lifecycle. The
management of these states is essential to ensure the reliability, consistency, and
integrity of the database. Here are the common states of a transaction:

1. Active State:
- The transaction is in the process of being executed. It starts in the active state when
the first operation of the transaction is executed.

2. Partially Committed State:


- All the operations of the transaction have been executed, and the system is about to
commit the changes to the database. However, the changes are not yet permanent.

3. Committed State:
- The transaction has been successfully completed, and all its changes have been
permanently saved to the database. It is now in a committed state.

4. Failed State:
- The transaction has encountered an error or exception during its execution, making
it unable to proceed further. It may have partially executed, but it cannot be completed.

5. Aborted State:
- The transaction has encountered an error or has been intentionally rolled back to its
previous state. Any changes made by the transaction are undone, and the database is
restored to its state before the transaction began.
6. Terminated State:
- The transaction has completed its execution, either by committing or aborting. It is
no longer active, and its outcome (commit or abort) is final.

Transaction State Transitions:

The typical sequence of state transitions for a transaction is as follows:

1. Start:
- The transaction moves from a non-executing state (not started) to the active state
when its first operation is executed.

2. Execution:
- The transaction performs a sequence of read and write operations on the database
while in the active state.

3. Commit or Abort Decision:


- The transaction reaches a point where it decides whether to commit its changes or
abort due to an error or other reasons.

4. Partially Committed:
- If the decision is to commit, the transaction enters the partially committed state. It is
here that the changes are about to be made permanent.

5. Committed:
- If the commit operation is successful, the transaction enters the committed state, and
its changes become permanent in the database.
6. Aborted:
- If the decision is to abort or if an error occurs, the transaction enters the aborted
state. Any changes made by the transaction are rolled back, and the database is
restored.

7. Terminated:
- After committing or aborting, the transaction enters the terminated state. It is no
longer active and has completed its lifecycle.

Transaction Management:

Transaction management systems (TMS) within a database system are responsible for
coordinating these state transitions, ensuring the integrity of the database, and handling
exceptional situations. The mechanisms involved in transaction management include
logging, recovery, and concurrency control. These mechanisms collectively contribute to
the ACID properties (Atomicity, Consistency, Isolation, Durability) of transactions in a
database system.

10. Transactions and Schedules

In the context of database systems, transactions and schedules play crucial roles in
ensuring the consistency and integrity of data. Let's explore these concepts:

Transactions:

A transaction is a sequence of one or more operations that are executed as a single unit
of work. Transactions ensure the integrity of the database by following the ACID
properties:

1. Atomicity:
- A transaction is atomic, meaning that either all its operations are executed
successfully, or none of them are. There is no partial completion.

2. Consistency:
- A transaction brings the database from one consistent state to another. The integrity
constraints of the database are preserved.

3. Isolation:
- The execution of one transaction is isolated from the execution of other transactions.
Each transaction appears to be executed in isolation, even though they may be
executed concurrently.

4. Durability:
- Once a transaction is committed, its effects on the database are permanent and
survive subsequent failures.

Schedules:

A schedule is a chronological order of operations (reads and writes) from a set of transactions in a database system. Schedules can be categorized into two types:

1. Serial Schedule:
- Transactions are executed one after the other, with no overlapping in time. While
simple, this approach may not fully utilize the capabilities of a multi-user system.

2. Concurrent Schedule:
- Multiple transactions are executed concurrently, with their operations interleaved in
time. Concurrency allows for better utilization of system resources but introduces
challenges related to data consistency and isolation.

Serializable Schedules:
Serializable schedules are schedules that produce the same results as if the
transactions were executed in some serial order. Ensuring serializability is crucial for
maintaining consistency and avoiding data anomalies.

Two common methods for achieving serializability in concurrent schedules are:

1. Locking:
- Transactions acquire locks on data items before accessing them. Locks prevent
other transactions from accessing the same data simultaneously, ensuring serialized
access.

2. Timestamp Ordering:
- Transactions are assigned unique timestamps, and the system ensures that
operations are executed in timestamp order. Older transactions are given priority.

Example:

Consider two transactions:

Transaction 1 (T1):
```
1. Read(A)
2. Write(B)
3. Commit
```

Transaction 2 (T2):
```
1. Read(B)
2. Write(A)
3. Commit
```

A schedule that interleaves the operations of T1 and T2 could look like this:
```
Read(A) (T1)
Read(B) (T2)
Write(B) (T1)
Write(A) (T2)
Commit (T1)
Commit (T2)
```

Ensuring that this concurrent schedule is equivalent to some serial order is important for
maintaining consistency.

Serializable Schedules in Locking:

```
(T1)              (T2)
Read(A)
Write(B)
Commit
                  Read(B)
                  Write(A)
                  Commit
```
Here, T1 completes its operations before T2 begins, ensuring serializability.

In summary, transactions and schedules are fundamental concepts in database systems. Transactions ensure the reliability and integrity of data, while schedules define the chronological order of operations performed by transactions. Achieving serializability in concurrent schedules is essential for maintaining consistency and avoiding data anomalies. Locking and timestamp ordering are common methods to ensure serializability.

11. Concurrent Execution of transaction

Concurrent execution of transactions is a fundamental aspect of database systems that allows multiple transactions to run simultaneously. This concurrency improves the overall throughput and responsiveness of the system. However, it introduces challenges related to data consistency, isolation, and potential conflicts. To manage concurrent execution, databases use various techniques, including locking mechanisms, timestamp-based protocols, and isolation levels. Let's explore these concepts:

Techniques for Concurrent Execution:

1. Lock-Based Concurrency Control:


- Transactions acquire locks on data items to control access by other transactions.
Locks can be at various levels (e.g., row-level, page-level, table-level) and can be
shared or exclusive.
- Two-Phase Locking (2PL): Transactions follow a protocol where they acquire locks
in a growing phase and release locks in a shrinking phase. This ensures that conflicts
are avoided.

2. Timestamp-Based Concurrency Control:


- Transactions are assigned unique timestamps based on their start times. The
system uses timestamps to order transactions and resolve conflicts.
- Timestamp Ordering Protocol: Ensures that transactions are executed in timestamp
order. Older transactions are given priority, and conflicts are resolved based on
timestamps.
3. Optimistic Concurrency Control:
- Transactions are allowed to proceed without acquiring locks initially. Conflicts are
checked at the end of the transaction, and if conflicts are detected, the transaction is
rolled back.
- Validation Protocol: Validates the changes made by a transaction at the end against
other concurrently executing transactions. If no conflicts are found, the transaction is
committed.

Isolation Levels:

Isolation levels define the degree to which the operations of one transaction are isolated
from the operations of other concurrently executing transactions. The common isolation
levels include:

1. Read Uncommitted:
- Allows transactions to read uncommitted changes made by other transactions.

2. Read Committed:
- Ensures that transactions can only read committed changes made by other
transactions.

3. Repeatable Read:
- Ensures that once a transaction reads a value, it will always see that same value,
even if it is updated by other transactions.

4. Serializable:
- Ensures the highest level of isolation by preventing all types of anomalies, providing
a serializable execution.
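
Most systems let an application choose one of these isolation levels per transaction or per session. A minimal sketch using the SQL-standard syntax (keyword placement and default levels differ between database systems, and the accounts table is hypothetical):

```sql
-- Request the strictest isolation level for the following transaction
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN TRANSACTION;
SELECT SUM(balance) FROM accounts;  -- sees a stable, serializable view of the data
COMMIT;
```
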
Challenges and Concerns:

1. Deadlocks:
- Situations where two or more transactions are blocked, each waiting for the other to
release a lock.

2. Starvation:
- Occurs when a transaction is unable to proceed because other transactions
continually acquire the required locks.

3. Concurrency Anomalies:
- Undesirable phenomena such as lost updates, uncommitted data, and inconsistent
retrievals that may occur due to concurrent execution.

Example:

Consider two transactions:

Transaction 1 (T1):
```
1. Read(A)
2. Write(B)
3. Commit
```

Transaction 2 (T2):
```
1. Read(B)
2. Write(A)
3. Commit
```

In a concurrent execution scenario, the interleaved schedule could be:

```
Read(A) (T1)
Read(B) (T2)
Write(B) (T1)
Write(A) (T2)
Commit (T1)
Commit (T2)
```

Managing conflicts and ensuring that the final result is equivalent to some serial order is
crucial.

In summary, concurrent execution of transactions is a key feature in database systems to enhance performance. Techniques such as lock-based control, timestamp-based control, and different isolation levels help manage concurrency while ensuring data consistency and integrity. However, careful consideration and proper protocols are required to address challenges and avoid anomalies that may arise due to concurrent execution.

12. Serializability and recoverability

Serializability and recoverability are two important concepts in database management systems that are crucial for ensuring data consistency and reliability. Let's explore these concepts in detail:
Serializability:

Serializability is a property of a schedule of transactions in a database system. A schedule is considered serializable if it produces the same result as some serial execution of the transactions. In other words, the interleaved execution of transactions in the schedule must be equivalent to some sequential order of those transactions.

Conflict Serializability:

To determine whether a schedule is conflict serializable, we can use the concept of conflicts. There are two types of conflicts:

1. Read-Write Conflict (RW):


- Occurs when one transaction reads a data item that another transaction writes.

2. Write-Write Conflict (WW):


- Occurs when two transactions write to the same data item.

A schedule is conflict serializable if there is no cycle in the precedence graph, where the
nodes represent transactions, and directed edges represent conflicts. If a schedule is
conflict serializable, it is guaranteed to be serializable.

Recoverability:

Recoverability is a property that ensures the ability to recover a database to a consistent state after a failure, such as a system crash or an error during transaction execution. The recoverability of a schedule is assessed based on the ability to commit or abort transactions and maintain consistency.

Recoverability Criteria:
1. Cascadeless (Strict) Recoverability:
- A transaction cannot be committed if it has uncommitted dependencies (reads or
writes) on another transaction that was subsequently aborted.
- This prevents the propagation of uncommitted changes.

2. Recoverable Schedule:
- A schedule is recoverable if, for every pair of transactions T1 and T2, whenever T2 reads a data item written by T1, T1 commits before T2 commits.

Example:

Consider two transactions:

Transaction 1 (T1):
```
1. Read(A)
2. Write(B)
3. Commit
```

Transaction 2 (T2):
```
1. Read(B)
2. Write(A)
3. Commit
```

The schedule:
```
Read(A) (T1)
Read(B) (T2)
Write(B) (T1)
Write(A) (T2)
Commit (T1)
Commit (T2)
```

This schedule is not conflict serializable because it contains a cycle in the precedence
graph (T1 -> T2 -> T1).

For recoverability, we need to ensure that if T2 reads a data item written by T1, then T1 commits before T2 commits. In this schedule, T2 reads B before T1 writes it, so T2 never reads a value written by T1 (and likewise T1 never reads a value written by T2); the schedule is therefore recoverable.

Summary:

- Serializability:
- Ensures that the execution of transactions in a schedule is equivalent to some serial
order.
- Conflict serializability ensures that there are no conflicts between transactions that
form cycles in the precedence graph.

- Recoverability:
- Ensures the ability to recover a database to a consistent state after a failure.
- Cascadeless recoverability prevents the propagation of uncommitted changes.
- Recoverable schedules ensure that transactions reading data written by other
transactions see committed changes.
Ensuring both serializability and recoverability is crucial for maintaining the integrity and
reliability of a database system, especially in the presence of concurrent transactions
and potential failures.

13. Introduction to Lock Management


Lock management is a critical aspect of concurrency control in database systems.
Concurrency control ensures that multiple transactions can execute simultaneously
without compromising the consistency and integrity of the database. Locks are
mechanisms used to control access to shared resources, preventing conflicts and
enforcing a specific order of execution. Let's delve into the introduction of lock
management:

Key Concepts:

1. Lock:
- A lock is a mechanism that restricts access to a resource, such as a data item, to a
single transaction at a time.
- Locks are used to prevent conflicts and maintain the consistency of the database.

2. Lock Types:
- Shared Lock (Read Lock):
- Allows multiple transactions to read a resource concurrently but prevents any
transaction from writing to it.
- Exclusive Lock (Write Lock):
- Grants exclusive access to a single transaction, preventing any other transactions
from reading or writing to the resource.

3. Granularity of Locks:
- Row-Level Locks:
- Lock specific rows in a table.
- Page-Level Locks:
- Lock entire pages of data.
- Table-Level Locks:
- Lock entire tables.

4. Two-Phase Locking (2PL):


- A protocol for ensuring serializability by requiring transactions to acquire all the locks
they need before releasing any locks.
- Consists of a growing phase (acquiring locks) and a shrinking phase (releasing
locks).

5. Deadlocks:
- A deadlock occurs when two or more transactions are blocked, each waiting for a
lock held by the other.
- Detection and resolution mechanisms are needed to address deadlocks.
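
A classic deadlock arises when two transactions update the same two rows in opposite order. The sketch below uses a hypothetical accounts table; each UPDATE acquires an exclusive lock that is held until commit:

```sql
-- Session 1
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 10 WHERE account_id = 1;  -- locks row 1

-- Session 2 (running at the same time)
BEGIN TRANSACTION;
UPDATE accounts SET balance = balance - 10 WHERE account_id = 2;  -- locks row 2

-- Session 1 now blocks, waiting for the lock on row 2
UPDATE accounts SET balance = balance + 10 WHERE account_id = 2;

-- Session 2 now blocks, waiting for the lock on row 1: a deadlock
UPDATE accounts SET balance = balance + 10 WHERE account_id = 1;

-- The DBMS detects the cycle and rolls back one of the two transactions
```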

Lock Management in Action:

1. Lock Request:
- A transaction requests a lock before accessing a resource. The request can be for a
shared or exclusive lock.

2. Lock Grant:
- If the requested lock is available, it is granted to the transaction. The transaction can
then access the resource.

3. Lock Release:
- After completing its operations on the resource, the transaction releases the lock,
making the resource available for other transactions.

4. Lock Escalation:
- Some systems support lock escalation, where a low-level lock (e.g., row-level) is
escalated to a higher level (e.g., table-level) to reduce the overhead of managing
numerous locks.

Challenges and Considerations:

1. Concurrency vs. Overhead:


- Balancing the benefits of concurrency with the overhead of managing locks is a key
consideration.

2. Deadlock Prevention and Detection:


- Implementing mechanisms to prevent and detect deadlocks is crucial for system
stability.

3. Isolation Levels:
- Different isolation levels (e.g., Read Committed, Serializable) impact the types of
locks acquired and the duration of their retention.

4. Lock Timeout:
- Systems may implement lock timeouts to prevent transactions from being blocked
indefinitely.

Summary:

Lock management is a fundamental component of concurrency control in database systems. It ensures that transactions can execute concurrently while maintaining the consistency and integrity of the database. Understanding the types of locks, their granularities, and the protocols for acquiring and releasing them is essential for efficient and reliable database operations in a multi-user environment.

14. Timestamp-based concurrency control


Timestamp-based concurrency control is a technique used in database management
systems to ensure the consistency and isolation of transactions in a concurrent
environment. It relies on assigning unique timestamps to transactions and using these
timestamps to determine the order of their execution. This approach helps prevent
conflicts and maintain the integrity of the database. Let's explore how timestamp-based
concurrency control works:

Key Concepts:

1. Timestamp Assignment:
- Each transaction is assigned a unique timestamp based on the order of its arrival or
initiation.
2. Timestamp Ordering:
- Transactions are ordered based on their timestamps. Older transactions are given
priority over newer ones.

3. Concurrency Control:
- The system uses timestamps to control access to data items and determine whether
a transaction's operation is allowed.

4. Read and Write Operations:


- Read and write operations are associated with timestamps. A transaction with a
higher timestamp can override the changes made by a transaction with a lower
timestamp.

Two Common Timestamp-Based Protocols:

1. Timestamp Ordering Protocol:

- Read Operation:
- A transaction \(T_i\) with timestamp \(TS_i\) can read a data item \(X\) only if the
timestamp of the last transaction that wrote \(X\) is less than \(TS_i\).

- Write Operation:
- A transaction \(T_i\) with timestamp \(TS_i\) can write to a data item \(X\) only if no
transaction with a higher timestamp has read or written to \(X\).

- Example:
- If \(T_1\) with \(TS_1\) writes to \(X\), and later \(T_2\) with \(TS_2 > TS_1\) tries to
read \(X\), it can do so. However, if \(T_2\) tries to write to \(X\), it will be blocked since
\(T_1\) wrote to \(X\) first.

2. Thomas Write Rule:

- A refinement of the basic timestamp ordering protocol that relaxes the rule for write operations.

- Obsolete Writes:
- If a transaction \(T_i\) with timestamp \(TS_i\) attempts to write a data item \(X\) whose write timestamp is already greater than \(TS_i\), the write is simply ignored as obsolete instead of aborting \(T_i\).

- Reads:
- Read operations are still checked against the write timestamp of \(X\), exactly as in the basic timestamp ordering protocol.

- Example:
- If \(T_2\) with \(TS_2 > TS_1\) has already written \(X\), a later write of \(X\) by \(T_1\) is discarded rather than causing \(T_1\) to abort, because its effect would have been overwritten anyway in an equivalent serial order.

Advantages of Timestamp-Based Concurrency Control:

1. High Concurrency:
- Timestamp-based protocols allow for high concurrency as transactions with non-
conflicting timestamps can proceed simultaneously.

2. Automated Conflict Resolution:


- The protocol automatically resolves conflicts based on timestamps without the need
for explicit locks.

3. Support for Multiversion Concurrency Control (MVCC):


- Timestamp-based approaches naturally support MVCC, allowing transactions to
read a consistent snapshot of the database.

Challenges and Considerations:


1. Clock Synchronization:
- Timestamp-based protocols assume synchronized clocks across the system to
ensure accurate ordering.

2. Overhead:
- The overhead associated with maintaining timestamps and enforcing the protocol
should be considered.

3. Resolution of Write Conflicts:


- Careful handling is needed to resolve conflicts when two transactions attempt to
write to the same data item.

Summary:

Timestamp-based concurrency control is a widely used technique in database systems to manage the concurrency of transactions. By assigning unique timestamps and enforcing specific rules based on these timestamps, the system can ensure a consistent and isolated execution of transactions in a concurrent environment. The two common protocols, Timestamp Ordering and the Thomas Write Rule, provide different approaches to managing concurrency while maintaining data integrity.
UNIT-4
1. SQL
Structured query language (SQL) is a programming language for storing and processing
information in a relational database. A relational database stores information in tabular
form, with rows and columns representing different data attributes and the various
relationships between the data values. You can use SQL statements to store, update,
remove, search, and retrieve information from the database. You can also use SQL to
maintain and optimize database performance.

Why is SQL important?

Structured query language (SQL) is a popular query language that is frequently used in
all types of applications. Data analysts and developers learn and use SQL because it
integrates well with different programming languages. For example, they can embed
SQL queries with the Java programming language to build high-performing data
processing applications with major SQL database systems such as Oracle or MS SQL
Server. SQL is also fairly easy to learn as it uses common English keywords in its
statements.

History of SQL

SQL was invented in the 1970s based on the relational data model. It was initially
known as the structured English query language (SEQUEL). The term was later
shortened to SQL. Oracle, formerly known as Relational Software, became the first
vendor to offer a commercial SQL relational database management system.

What are the components of a SQL system?

Relational database management systems use structured query language (SQL) to store and manage data. The system stores multiple database tables that relate to each other. MS SQL Server, MySQL, and MS Access are examples of relational database management systems. The following are the components of such a system.

SQL table

A SQL table is the basic element of a relational database. The SQL database table
consists of rows and columns. Database engineers create relationships between
multiple database tables to optimize data storage space.

For example, the database engineer creates a SQL table for products in a store:

|Product ID|Product Name|Color ID|
|---|---|---|
|0001|Mattress|Color 1|
|0002|Pillow|Color 2|

Then the database engineer links the product table to the color table with the Color ID:

|Color ID|Color Name|
|---|---|
|Color 1|Blue|
|Color 2|Red|
SQL statements

SQL statements, or SQL queries, are valid instructions that relational database
management systems understand. Software developers build SQL statements by using
different SQL language elements. SQL language elements are components such as
identifiers, variables, and search conditions that form a correct SQL statement.

For example, the following SQL statement uses a SQL INSERT command to
store Mattress Brand A, priced $499, into a table named Mattress_table, with column
names brand_name and cost:

INSERT INTO Mattress_table (brand_name, cost)

VALUES ('A', 499);

Stored procedures

Stored procedures are a collection of one or more SQL statements stored in the
relational database. Software developers use stored procedures to improve efficiency
and performance. For example, they can create a stored procedure for updating sales
tables instead of writing the same SQL statement in different applications.
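
For example, a sketch of such a procedure might look like the following. The procedure, table, and column names are hypothetical, and the exact syntax for creating procedures differs between database systems (this sketch is closest to MySQL):

```sql
-- Record a sale and keep a per-product running total up to date
CREATE PROCEDURE record_sale(IN p_product_id INT, IN p_amount DECIMAL(10, 2))
BEGIN
    INSERT INTO sales (product_id, sale_amount, sale_date)
    VALUES (p_product_id, p_amount, CURRENT_DATE);

    UPDATE sales_totals
    SET total_amount = total_amount + p_amount
    WHERE product_id = p_product_id;
END;
```

Applications can then reuse the same logic with a single call such as CALL record_sale(42, 199.00); instead of repeating the statements.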

How does SQL work?

Structured query language (SQL) implementation involves a server machine that processes the database queries and returns the results. The SQL process goes through several software components, including the following.

Parser

The parser starts by tokenizing, or replacing, some of the words in the SQL statement
with special symbols. It then checks the statement for the following:

Correctness

The parser verifies that the SQL statement conforms to SQL semantics, or rules, that
ensure the correctness of the query statement. For example, the parser checks if the
SQL command ends with a semi-colon. If the semi-colon is missing, the parser returns
an error.
Authorization

The parser also validates that the user running the query has the necessary
authorization to manipulate the respective data. For example, only admin users might
have the right to delete data.

Relational engine

The relational engine, or query processor, creates a plan for retrieving, writing, or
updating the corresponding data in the most effective manner. For example, it checks
for similar queries, reuses previous data manipulation methods, or creates a new one. It
writes the plan in an intermediate-level representation of the SQL statement called byte
code. Relational databases use byte code to efficiently perform database searches and
modifications.

Storage engine

The storage engine, or database engine, is the software component that processes the
byte code and runs the intended SQL statement. It reads and stores the data in the
database files on physical disk storage. Upon completion, the storage engine returns
the result to the requesting application.

What are SQL commands?

Structured query language (SQL) commands are specific keywords or SQL statements
that developers use to manipulate the data stored in a relational database. You can
categorize SQL commands as follows.

Data definition language

Data definition language (DDL) refers to SQL commands that design the database
structure. Database engineers use DDL to create and modify database objects based
on the business requirements. For example, the database engineer uses the CREATE
command to create database objects such as tables, views, and indexes.

Data query language

Data query language (DQL) consists of instructions for retrieving data stored in
relational databases. Software applications use the SELECT command to filter and
return specific results from a SQL table.

Data manipulation language

Data manipulation language (DML) statements write new information or modify existing
records in a relational database. For example, an application uses the INSERT
command to store a new record in the database.
Data control language

Database administrators use data control language (DCL) to manage or authorize database access for other users. For example, they can use the GRANT command to permit certain applications to manipulate one or more tables.

2. Introduction to Interactive SQL


Interactive SQL, also known as SQL (Structured Query Language) in an interactive
mode, is a way for users to communicate with and manipulate relational databases
using a command-line interface or a graphical user interface (GUI). It allows users to
execute SQL commands, queries, and statements in real-time, facilitating direct
interaction with the database. Interactive SQL is commonly used by database
administrators, developers, and analysts to perform various tasks related to database
management and querying.

Key Features:

1. Query Execution:
- Users can write and execute SQL queries interactively to retrieve, update, or
manipulate data in the database.

2. Data Modification:
- Interactive SQL enables users to perform data modification operations such as
inserting new records, updating existing records, and deleting records from database
tables.

3. Schema Definition and Modification:


- Users can define and modify the structure of the database using Data Definition
Language (DDL) statements. This includes creating, altering, and dropping database
objects like tables, indexes, and views.

4. Transaction Control:
- Interactive SQL provides commands for managing transactions, allowing users to
commit changes or roll back to a previous state.
5. User and Privilege Management:
- Database administrators can use interactive SQL to create and manage user
accounts, assign privileges, and control access to database objects.

6. Error Handling:
- The system provides feedback and error messages to users, helping them identify
and address issues in their SQL commands.

Examples:

Basic Query:
```sql
SELECT column1, column2 FROM table WHERE condition;
```

Data Modification:
```sql
INSERT INTO table (column1, column2) VALUES (value1, value2);
UPDATE table SET column1 = new_value WHERE condition;
DELETE FROM table WHERE condition;
```

Schema Definition and Modification:


```sql
CREATE TABLE new_table (
id INT PRIMARY KEY,
name VARCHAR(50)
);
ALTER TABLE existing_table ADD COLUMN new_column INT;

DROP TABLE table_to_drop;


```

Transaction Control:
```sql
-- Begin a transaction
BEGIN TRANSACTION;

-- Commit the transaction


COMMIT;

-- Rollback the transaction


ROLLBACK;
```

User and Privilege Management:


```sql
CREATE USER new_user IDENTIFIED BY 'password';
GRANT SELECT, INSERT, UPDATE ON table TO new_user;
REVOKE DELETE ON table FROM new_user;
```

Popular Interactive SQL Tools:

1. SQL Command-Line Interface (CLI):


- Many database systems provide a command-line interface where users can interact
with the database by entering SQL commands directly.

2. Graphical User Interface (GUI) Tools:


- GUI tools like SQL Server Management Studio (SSMS), Oracle SQL Developer, and
MySQL Workbench offer interactive environments with graphical interfaces for
executing SQL queries and managing databases.

3. Web-Based Interfaces:
- Some databases provide web-based interfaces that allow users to interact with the
database through a web browser.

Considerations:

1. Security:
- Access to interactive SQL interfaces should be secured, and users should be
granted appropriate privileges to ensure data security and integrity.

2. Performance:
- Users should consider the performance impact of their queries, especially in
production environments. Techniques like indexing and query optimization can be
applied to enhance performance.

3. Backup and Recovery:


- Regular backups of the database should be performed to facilitate recovery in case
of data loss or corruption.

Interactive SQL is a fundamental tool for managing and querying relational databases.
Whether using a command-line interface or a graphical tool, users can leverage SQL to
interact with the database, retrieve information, and perform various database
operations. It serves as a versatile and powerful means of communication with relational
databases.
3. Computations on table data
Computations on table data in the context of databases involve performing calculations,
aggregations, and transformations on the data stored in tables. This is commonly done
using SQL (Structured Query Language) and various functions to analyze, manipulate,
and derive new information from the existing data. Here are some common types of
computations on table data:

1. Aggregations:
- SUM:
```sql
SELECT SUM(column) FROM table;
```
- AVG (Average):
```sql
SELECT AVG(column) FROM table;
```
- COUNT:
```sql
SELECT COUNT(column) FROM table;
```
- MIN and MAX:
```sql
SELECT MIN(column), MAX(column) FROM table;
```

2. Grouping and Aggregating:


- Grouping data and applying aggregations on groups.
```sql
SELECT category, AVG(price) FROM products GROUP BY category;
```

3. Mathematical Expressions:
- Performing mathematical calculations on columns.
```sql
SELECT column1 + column2 AS sum_result FROM table;
```

4. String Manipulations:
- Performing operations on string data.
```sql
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM employees;
```

5. Date and Time Computations:


- Working with date and time data.
```sql
SELECT DATE_ADD(start_date, INTERVAL 7 DAY) AS new_date FROM events;
```

6. CASE Statements:
- Conditional computations.
```sql
SELECT
column1,
CASE
WHEN column2 > 10 THEN 'High'
WHEN column2 > 5 THEN 'Medium'
ELSE 'Low'
END AS priority
FROM table;
```

7. Window Functions:
- Performing computations across a "window" of rows related to the current row.
```sql
SELECT
product_name,
sales,
AVG(sales) OVER (PARTITION BY category) AS avg_category_sales
FROM products;
```

8. Joins and Computations:


- Combining data from multiple tables for computations.
```sql
SELECT
orders.order_id,
customers.customer_name,
orders.order_date
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id;
```

9. Subqueries:
- Using subqueries to perform computations.
```sql
SELECT
employee_id,
salary,
(SELECT AVG(salary) FROM employees) AS avg_salary
FROM employees;
```

10. Stored Procedures and Functions:


- Complex computations can be encapsulated in stored procedures or functions for
reusability.
```sql
CREATE FUNCTION calculate_tax(price DECIMAL) RETURNS DECIMAL
BEGIN
RETURN price * 0.1; -- Assuming 10% tax
END;
```

Considerations:

- Performance:
- Efficiently compute and aggregate data, especially in large datasets, by optimizing
queries and indexing columns.

- Data Integrity:
- Ensure that computations align with the data types and constraints of the columns to
maintain data integrity.
- Testing:
- Test computations and aggregations on sample data before applying them to the
entire dataset, especially in production environments.

- Documentation:
- Document the logic of complex computations and transformations for future reference
and collaboration.

Computations on table data are fundamental for deriving insights, generating reports,
and supporting decision-making processes in various applications. SQL provides a
robust set of features to perform a wide range of computations on table data efficiently
and effectively.

4. Oracle functions, manipulating Dates in Oracle


Oracle provides a variety of functions for manipulating dates and working with date-related data. These functions allow you to perform operations such as addition, subtraction, formatting, and extraction on date and timestamp data types. Here are some commonly used Oracle date functions:

1. SYSDATE:
- Returns the current date and time of the database server.

```sql
SELECT SYSDATE FROM dual;
```

2. TO_DATE:
- Converts a character string to a date format.

```sql
SELECT TO_DATE('2023-11-05', 'YYYY-MM-DD') FROM dual;
```

3. TO_CHAR:
- Converts a date or timestamp to a character string with a specified format.

```sql
SELECT TO_CHAR(SYSDATE, 'DD-MON-YYYY HH24:MI:SS') FROM dual;
```

4. ADD_MONTHS:
- Adds a specified number of months to a date.

```sql
SELECT ADD_MONTHS(SYSDATE, 3) FROM dual;
```

5. MONTHS_BETWEEN:
- Returns the number of months between two dates.

```sql
SELECT MONTHS_BETWEEN(TO_DATE('2023-12-01', 'YYYY-MM-DD'), SYSDATE)
FROM dual;
```

6. LAST_DAY:
- Returns the last day of the month for a given date.

```sql
SELECT LAST_DAY(SYSDATE) FROM dual;
```

7. TRUNC:
- Truncates a date or timestamp to a specified unit (e.g., day, month, year).

```sql
SELECT TRUNC(SYSDATE, 'MONTH') FROM dual;
```

8. ROUND:
- Rounds a date or timestamp to the nearest unit.

```sql
SELECT ROUND(SYSDATE, 'MONTH') FROM dual;
```

9. EXTRACT:
- Extracts a specific component (e.g., year, month, day) from a date or timestamp.

```sql
SELECT EXTRACT(YEAR FROM SYSDATE) FROM dual;
```

10. NEXT_DAY:
- Returns the date of the first weekday following a specified date.

```sql
SELECT NEXT_DAY(SYSDATE, 'MONDAY') FROM dual;
```

11. TZ_OFFSET:
- Returns the time zone offset of a given time zone name or of the session time zone.

```sql
SELECT TZ_OFFSET(SESSIONTIMEZONE) FROM dual;
```

Example Usage:

```sql
-- Calculate the approximate age of employees (year difference only; ignores month and day)
SELECT
employee_name,
EXTRACT(YEAR FROM SYSDATE) - EXTRACT(YEAR FROM birth_date) AS age
FROM employees;

-- Display the current date and time in a specific format


SELECT TO_CHAR(SYSDATE, 'DD-MON-YYYY HH24:MI:SS') AS current_date_time
FROM dual;

-- Find the number of days between two dates


SELECT
start_date,
end_date,
end_date - start_date AS days_difference
FROM events;
```
These are just a few examples of the many date functions available in Oracle. Oracle's
date functions are powerful tools for handling and manipulating date and time
information in a wide range of scenarios, including reporting, analytics, and application
development.

5. Subqueries and Joins


Subqueries and joins are two fundamental concepts in SQL that allow you to retrieve
and manipulate data from multiple tables in a relational database.

Subqueries:

A subquery, also known as a nested query or inner query, is a query embedded within
another query. Subqueries can be used in various parts of a SQL statement, including
the SELECT, FROM, WHERE, and HAVING clauses. The result of a subquery can be
used as a value in the outer query.

Example of Subquery in WHERE Clause:

```sql
-- Find employees whose salary is above the average salary in the department
SELECT employee_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees WHERE department_id =
1);
```

Example of Subquery in FROM Clause:

```sql
-- Retrieve each salesperson's total sales and their percentage contribution to overall sales
SELECT salesperson_id,
       total_sales,
       total_sales / SUM(total_sales) OVER () * 100 AS percentage_contribution
FROM (
    SELECT salesperson_id, SUM(sales_amount) AS total_sales
    FROM sales
    GROUP BY salesperson_id
) sales_summary;
```

Joins:

Joins are used to combine rows from two or more tables based on a related column
between them. The common columns used for joining tables are typically primary and
foreign keys. There are several types of joins, including INNER JOIN, LEFT JOIN (or
LEFT OUTER JOIN), RIGHT JOIN (or RIGHT OUTER JOIN), and FULL JOIN (or FULL
OUTER JOIN).

Example of INNER JOIN:

```sql
-- Retrieve information about employees and their departments
SELECT employees.employee_id, employee_name, department_name
FROM employees
INNER JOIN departments ON employees.department_id = departments.department_id;
```

Example of LEFT JOIN:


```sql
-- Retrieve all customers and their orders (including customers with no orders)
SELECT customers.customer_id, customer_name, order_id, order_date
FROM customers
LEFT JOIN orders ON customers.customer_id = orders.customer_id;
```

Combining Subqueries and Joins:

You can also use subqueries and joins together to create more complex queries,
combining the strengths of both.

Example of a Subquery Used Instead of a Join:

```sql
-- Retrieve employees who have sales orders
SELECT employee_name
FROM employees
WHERE employee_id IN (SELECT DISTINCT employee_id FROM sales);
```

Example of JOIN with Aggregation:

```sql
-- Retrieve departments with the total number of employees and their average salary
SELECT departments.department_id, department_name, COUNT(employee_id) AS
num_employees, AVG(salary) AS avg_salary
FROM departments
LEFT JOIN employees ON departments.department_id = employees.department_id
GROUP BY departments.department_id, department_name;
```

Understanding how to use subqueries and joins effectively is essential for crafting
powerful and efficient SQL queries when working with relational databases. These
techniques enable you to retrieve and analyze data from multiple tables, providing
flexibility in handling complex data relationships.

6. SQL performance tuning


SQL performance tuning is a crucial aspect of database management, ensuring that
database queries and operations are executed efficiently and in a timely manner.
Optimizing SQL performance involves various strategies and techniques to enhance
query execution, reduce resource usage, and improve overall database
responsiveness. Here are key tips for SQL performance tuning:

1. Use Indexing:
- Indexes speed up data retrieval operations by providing a quick path to relevant
data.
- Identify columns used in WHERE clauses, JOIN conditions, and ORDER BY clauses
for indexing.
- Regularly analyze and rebuild indexes to maintain optimal performance.

2. Optimize Queries:
- Write efficient queries by avoiding unnecessary joins, reducing the number of
columns retrieved, and optimizing WHERE clauses.
- Use appropriate aggregate functions and grouping to minimize data processing.

3. Limit the Use of SELECT *:


- Only retrieve the columns needed for the query to reduce the amount of data
transferred and processed.
- Unnecessary columns contribute to increased disk I/O and network overhead.
4. Avoid SELECT DISTINCT:
- Use SELECT DISTINCT sparingly, as it can be resource-intensive.
- Consider alternative approaches like GROUP BY if distinct values are needed.

5. Use Proper Data Types:


- Choose the most appropriate data types for columns to minimize storage
requirements and improve query performance.
- Avoid using VARCHAR(MAX) when a more specific length is known.

6. Partitioning Tables:
- Partition large tables to distribute data across multiple filegroups, making queries on
subsets of data more efficient.
- Partitioning is particularly useful for historical or time-series data.

7. Update Statistics:
- Regularly update database statistics to help the query optimizer generate efficient
execution plans.
- Outdated statistics can lead to suboptimal query performance.

8. Use Stored Procedures:


- Use stored procedures instead of ad-hoc queries for frequently executed operations.
- Stored procedures are precompiled and cached, reducing processing time.

9. Avoid Cursors:
- Cursors can be resource-intensive and slow. Where possible, use set-based
operations instead of row-by-row processing.
- Consider using alternatives like temp tables or table variables.

10. Review and Optimize Joins:


- Use appropriate join types (INNER JOIN, LEFT JOIN, etc.) based on the
relationships between tables.
- Avoid unnecessary or redundant joins.

11. Memory Configuration:


- Adjust memory settings for the database server to ensure sufficient memory is
allocated for caching and query processing.
- Proper memory configuration can significantly improve query performance.

12. Use Query Execution Plan:


- Analyze query execution plans to identify performance bottlenecks and areas for improvement (a short sketch follows this list).
- Tools like SQL Server Management Studio (SSMS) provide visual representation of
query plans.

13. Monitor and Analyze:


- Regularly monitor database performance using tools like SQL Server Profiler, Oracle
Enterprise Manager, or other database-specific monitoring tools.
- Identify and analyze slow-performing queries for optimization.

14. Consider Denormalization:


- In some cases, denormalizing tables (e.g., creating redundant columns) can improve
query performance by reducing the need for complex joins.
- However, denormalization should be approached cautiously to maintain data
integrity.

15. Use Connection Pooling:


- Implement connection pooling to reuse existing database connections and reduce
the overhead of opening and closing connections for each query.

16. Upgrade Database Software:


- Regularly update to the latest version of your database management system, as
newer versions often include performance improvements and optimizations.

17. Concurrency Control:


- Optimize concurrency control mechanisms, such as isolation levels and locking
strategies, to balance data consistency and performance.

18. Consider Caching:


- Implement caching mechanisms for frequently accessed and read-only data to
reduce the load on the database.

19. Review Disk I/O:


- Optimize disk I/O by distributing data across multiple disks, using RAID
configurations, and regularly defragmenting disks.

20. Load Balancing:


- If applicable, implement load balancing for distributed databases to distribute query
loads across multiple servers.
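
Relating to tip 12 above, most systems can display the optimizer's execution plan for a query without running it. A minimal Oracle-style sketch (the table and column names are hypothetical; other systems use commands such as EXPLAIN):

```sql
-- Ask the optimizer for its plan
EXPLAIN PLAN FOR
SELECT order_id, order_date
FROM orders
WHERE customer_id = 1001;

-- Display the plan, which shows whether an index on customer_id is used
-- or a full table scan occurs
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```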

SQL performance tuning is an ongoing process that requires a combination of database design best practices, query optimization, and system configuration adjustments. Regular monitoring, analysis, and optimization efforts are essential for maintaining optimal performance as database usage evolves over time.

7. Indexes
Indexes play a crucial role in database management systems (DBMS) for improving the
speed of data retrieval operations. An index is a data structure associated with a table
that provides a quick reference to the rows in the table, based on the values of one or
more columns. Here are key points about indexes:
1. Purpose of Indexes:
- Faster Data Retrieval: Indexes speed up the retrieval of rows from a table by
allowing the DBMS to locate the desired rows more quickly.
- Efficient Query Processing: Indexes significantly improve the performance of
SELECT, JOIN, and WHERE clause operations.

2. Types of Indexes:
- Single-Column Index: Created on a single column.
- Composite Index: Created on multiple columns.
- Unique Index: Ensures that the indexed columns have unique values.
- Clustered Index: Determines the physical order of data rows in a table.
- Non-clustered Index: Stores a separate data structure from the actual table.

3. Creating Indexes:
- Syntax:
```sql
CREATE INDEX index_name
ON table_name (column1, column2, ...);
```
- Indexes can be created when defining a table or added later using the `CREATE
INDEX` statement.

4. Choosing Columns for Indexing:


- Identify columns used frequently in WHERE clauses, JOIN conditions, and ORDER
BY clauses.
- Avoid indexing columns with low selectivity (i.e., columns with a large number of
duplicate values).

5. Effect on INSERT, UPDATE, and DELETE Operations:


- While indexes speed up SELECT operations, they can slightly slow down INSERT,
UPDATE, and DELETE operations.
- Maintaining indexes during these operations can impact overall performance.

6. Index Scan vs. Table Scan:


- Index Scan: The DBMS uses the index to locate rows in the table based on the
indexed columns.
- Table Scan: The DBMS scans the entire table to find the required rows.

7. Dropping Indexes:
- Unused or redundant indexes should be dropped to save storage space and reduce
the overhead of maintenance operations.
- Syntax:
```sql
DROP INDEX index_name;
```

8. Clustered vs. Non-Clustered Index:


- Clustered Index: Determines the physical order of rows in the table. There can be
only one clustered index per table.
- Non-Clustered Index: Creates a separate data structure, and the order of rows in the
table is not affected.

9. Indexing Best Practices:


- Regularly analyze and optimize indexes to ensure they align with query patterns.
- Use tools and query execution plans to identify poorly performing indexes.
- Monitor fragmentation and rebuild/reorganize indexes as needed.

10. Covering Index:


- A covering index includes all the columns required for a query, eliminating the need
to access the actual table.
- Improves query performance by minimizing data retrieval from the table.
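
A minimal sketch of a covering index, assuming a hypothetical orders table that is frequently queried by customer_id for only the order_id and order_date columns:

```sql
-- All columns needed by the query are part of the index,
-- so the query can be answered from the index alone
CREATE INDEX idx_orders_customer_cover
ON orders (customer_id, order_id, order_date);

SELECT order_id, order_date
FROM orders
WHERE customer_id = 1001;
```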

11. Full-Text Index:


- Full-text indexes are used for searching within the text content of columns.
- Especially useful for large text fields.

12. Spatial Index:


- Used for optimizing queries involving spatial data, such as geographic information
system (GIS) data.

13. Indexing in JOIN Operations:


- Indexing columns involved in JOIN conditions can significantly improve the
performance of JOIN operations.

14. Indexing and Query Optimization:


- Regularly review and optimize queries to ensure they take advantage of existing
indexes.
- Understand how the query optimizer uses indexes to generate execution plans.

Indexes are a vital component of database optimization, and their effective use can lead
to significant improvements in query performance. However, it's crucial to strike a
balance between creating indexes for performance gains and considering the impact on
data modification operations and storage requirements. Regular monitoring, analysis,
and maintenance are key to successful index management.
8. Views, Sequences

A view in a database is a virtual table that is based on the result of a SELECT query. It
does not store the data itself but provides a way to represent the data from one or more
tables in a structured manner. Views offer several advantages, including simplified
query complexity, enhanced security, and abstraction of underlying table structures.
Here are key points about views:

1. Creating a View:
```sql
CREATE VIEW view_name AS
SELECT column1, column2
FROM table_name
WHERE condition;
```

2. Updating a View:
Views can be updated using the `CREATE OR REPLACE VIEW` statement.
```sql
CREATE OR REPLACE VIEW view_name AS
SELECT new_columns
FROM new_table
WHERE new_condition;
```

3. Dropping a View:
```sql
DROP VIEW view_name;
```
4. Complex Views:
Views can involve multiple tables and include joins, aggregations, and other complex
queries.

5. Security:
- Views can be used to restrict access to specific columns or rows, providing a
security layer.
- Users can be granted permissions on views without giving direct access to the
underlying tables.

6. Materialized Views:
- A materialized view is a type of view that stores the result set physically, allowing for
faster retrieval.
- Materialized views need to be refreshed periodically to synchronize with the
underlying data.
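
A minimal sketch in Oracle syntax (the sales table and view name are placeholders); the refresh options control when the stored result set is brought up to date:
```sql
CREATE MATERIALIZED VIEW mv_product_sales
REFRESH COMPLETE ON DEMAND
AS
SELECT product_id, SUM(amount) AS total_amount
FROM sales
GROUP BY product_id;

-- Refresh manually when needed; 'C' requests a complete refresh
-- (EXEC is SQL*Plus shorthand for a PL/SQL call).
EXEC DBMS_MVIEW.REFRESH('mv_product_sales', 'C');
```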

Sequences:

A sequence is a database object in relational database management systems (RDBMS)
that generates a series of unique numeric values. Sequences are often used to generate
primary key values in tables. Unlike auto-increment or identity columns, sequences
provide more flexibility and can be used independently of a specific table. Here are key
points about sequences:

1. Creating a Sequence:
```sql
CREATE SEQUENCE sequence_name
START WITH 1
INCREMENT BY 1
MAXVALUE 1000
CYCLE;
```

2. Generating Values:
- To get the next value from a sequence:
```sql
SELECT sequence_name.NEXTVAL FROM dual;
```

- To get the current value without incrementing:


```sql
SELECT sequence_name.CURRVAL FROM dual;
```

3. Setting the Initial Value:


- The `START WITH` clause defines the initial value of the sequence.

4. Increment By:
- The `INCREMENT BY` clause defines the step size for the sequence.

5. Maxvalue and Cycling:


- The `MAXVALUE` clause sets an upper limit for the sequence.
- `CYCLE` makes the sequence restart from its minimum value after reaching the
maximum; with the default `NOCYCLE`, attempting to go past MAXVALUE raises an error instead.

6. Altering a Sequence:
- Sequences can be altered to modify parameters like increment, minvalue, maxvalue,
etc.
```sql
ALTER SEQUENCE sequence_name
INCREMENT BY 2
MAXVALUE 2000;
```

7. Dropping a Sequence:
```sql
DROP SEQUENCE sequence_name;
```

8. Usage in Tables:
- Sequences are commonly used to generate unique values for primary key columns
in tables.

```sql
CREATE TABLE example_table (
id NUMBER DEFAULT sequence_name.NEXTVAL PRIMARY KEY,
name VARCHAR2(50)
);
```
Note: using a sequence as a column default requires Oracle 12c or later; in earlier versions the value is typically supplied as `sequence_name.NEXTVAL` in the INSERT statement or assigned by a trigger.

Sequences and views are powerful features in a relational database that enhance data
modeling, security, and data generation capabilities. Sequences are particularly useful
for generating unique identifiers, while views provide a flexible way to present and
manipulate data without directly altering underlying tables.
7.Data control permissions
Data control permissions, also known as data access control or data authorization, refer
to the mechanisms and policies put in place to regulate and restrict access to data
within a database system. These controls are crucial for maintaining data security,
ensuring that only authorized users can view, modify, or delete specific pieces of
information. Here are key concepts and mechanisms related to data control
permissions:

1. Role-Based Access Control (RBAC):


- Roles: Users are assigned to specific roles, and roles are granted permissions on
tables, views, or stored procedures.
- Grant and Revoke: Permissions are granted to roles, and users are assigned to
roles. Granting and revoking permissions at the role level simplifies user management.
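
A minimal sketch of role-based access control; the role, table, and user names are hypothetical:
```sql
-- Create a role and grant it object privileges.
CREATE ROLE sales_reader;
GRANT SELECT ON orders TO sales_reader;
GRANT SELECT ON customers TO sales_reader;

-- Assign the role to a user; revoking the role later removes all of these
-- privileges from the user in a single step.
GRANT sales_reader TO report_user;
```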

2. Privileges:
- SELECT, INSERT, UPDATE, DELETE: Common privileges that regulate basic
CRUD operations.
- EXECUTE: Permission to execute stored procedures or functions.
- ALL PRIVILEGES: Grants all available permissions on a specific object.

3. GRANT and REVOKE Statements:


- GRANT: Assigns specific privileges to a user or a role.
```sql
GRANT SELECT, INSERT ON table_name TO user_name;
```

- REVOKE: Removes specific privileges from a user or a role.


```sql
REVOKE INSERT ON table_name FROM user_name;
```
4. Data Ownership:
- Assigning data ownership to specific users or roles allows them to control access to
their own data.
- Owners may have broader access rights to the data they own.

5. Row-Level Security (RLS):


- RLS allows fine-grained control over data access at the row level.
- Users may only see or modify data rows that meet specific criteria defined by
policies.
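
Syntax differs widely between products; as one illustration, PostgreSQL expresses row-level security with policies (the table, column, and role names are hypothetical):
```sql
-- Enable row-level security on the table (PostgreSQL).
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- Each sales representative may only see their own rows.
CREATE POLICY orders_by_rep ON orders
    FOR SELECT
    USING (sales_rep = current_user);
```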

6. Column-Level Security:
- Restricts access to specific columns within a table.
- Users are granted permission to access certain columns while being restricted from
others.
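
In systems that support column-level grants (for example, PostgreSQL) this can be expressed directly; otherwise a view over the permitted columns achieves the same effect. A sketch with hypothetical names:
```sql
-- Allow hr_clerk to read only the non-sensitive columns of employees.
GRANT SELECT (employee_id, first_name, last_name) ON employees TO hr_clerk;
```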

7. Views for Access Control:


- Views can be used to restrict the columns or rows that users can see, providing an
additional layer of access control.
- Complex queries or aggregations can be hidden behind views.

8. Stored Procedures and Functions:


- Data access can be controlled by encapsulating business logic within stored
procedures or functions.
- Users may only execute these procedures, maintaining control over data
manipulation.
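
A sketch of channelling all modifications through a procedure and granting only EXECUTE on it, written in Oracle PL/SQL style with hypothetical names and a hypothetical business rule:
```sql
CREATE OR REPLACE PROCEDURE give_raise (
    p_employee_id IN NUMBER,
    p_amount      IN NUMBER
) AS
BEGIN
    -- The business rule lives inside the procedure, not with the caller.
    IF p_amount > 1000 THEN
        RAISE_APPLICATION_ERROR(-20001, 'Raise exceeds the allowed limit');
    END IF;

    UPDATE employees
    SET salary = salary + p_amount
    WHERE employee_id = p_employee_id;
END;
/

-- Callers may execute the procedure but have no direct UPDATE right on the table.
GRANT EXECUTE ON give_raise TO hr_clerk;
```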

9. Triggers:
- Triggers can be used to automatically enforce data access policies when certain
events occur (e.g., before or after INSERT, UPDATE, DELETE).
- Custom logic can be executed to check and enforce access conditions.
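
For example, a trigger might reject changes made outside business hours; the rule and table below are hypothetical, written in Oracle PL/SQL style:
```sql
CREATE OR REPLACE TRIGGER trg_orders_business_hours
BEFORE INSERT OR UPDATE OR DELETE ON orders
BEGIN
    -- Reject any modification made outside 08:00-17:59.
    IF TO_NUMBER(TO_CHAR(SYSDATE, 'HH24')) NOT BETWEEN 8 AND 17 THEN
        RAISE_APPLICATION_ERROR(-20002,
            'Changes to orders are allowed only during business hours');
    END IF;
END;
/
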
10. Auditing:
- Auditing tools and mechanisms can be employed to track and log data access
activities.
- Audit logs can be reviewed to identify unauthorized access attempts or data
breaches.

11. Data Encryption:


- Encrypting sensitive data adds an extra layer of protection, even if unauthorized
access is gained.
- Encryption ensures that even if data is accessed, it cannot be easily read without the
proper decryption keys.

12. Database Views for Different User Roles:


- Create different views based on user roles to present a customized subset of data to
different user groups.
- Views can filter data based on the user's role or specific criteria.

13. Database Security Policies:


- Define and enforce security policies at the database level to ensure consistent
access control across multiple objects.

14. Dynamic Data Masking:


- Dynamically masks sensitive data based on user roles or permissions.
- Users with lower privileges see masked or truncated versions of sensitive data.
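
As one concrete illustration, SQL Server offers built-in dynamic data masking; the table, column, and user names below are hypothetical:
```sql
-- Mask the email column for users without the UNMASK permission (SQL Server).
ALTER TABLE customers
ALTER COLUMN email ADD MASKED WITH (FUNCTION = 'email()');

-- Privileged users can still be allowed to see the real values.
GRANT UNMASK TO data_steward;
```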

Data control permissions are a critical aspect of database security and are essential for
protecting sensitive information, ensuring data integrity, and complying with regulatory
requirements. Implementing a well-designed data control strategy involves a
combination of role-based access controls, fine-grained access policies, and auditing
mechanisms to provide a robust defense against unauthorized access and data
breaches.
