Database Management Systems
A Database Management System (DBMS) is a software system that is designed to
manage and organize data in a structured manner. It allows users to create, modify, and
query a database, as well as manage the security and access controls for that database.
• Data modeling: A DBMS provides tools for creating and modifying data models,
which define the structure and relationships of the data in a database.
• Data storage and retrieval: A DBMS is responsible for storing and retrieving data
from the database, and can provide various methods for searching and querying the
data.
• Concurrency control: A DBMS provides mechanisms for controlling concurrent
access to the database, to ensure that multiple users can access the data without
conflicting with each other.
• Data integrity and security: A DBMS provides tools for enforcing data integrity and
security constraints, such as constraints on the values of data and access controls
that restrict who can access the data.
• Backup and recovery: A DBMS provides mechanisms for backing up and
recovering the data in the event of a system failure.
• DBMS can be classified into two types: Relational Database Management
System (RDBMS) and Non-Relational Database Management System (NoSQL or
Non-SQL)
• RDBMS: Data is organized in the form of tables and each table has a set of rows
and columns. The data are related to each other through primary and foreign keys.
• NoSQL: Data is organized in the form of key-value pairs, documents, graphs, or
column-based. These are designed to handle large-scale, high-performance
scenarios.
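As a small illustration of how an RDBMS relates tables through keys, consider this hedged SQL sketch; the table and column names are hypothetical:
```
-- Each order row references a customer row through a foreign key,
-- which is how relational tables are linked to each other.
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,       -- primary key: uniquely identifies each customer
    name        VARCHAR(100) NOT NULL
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL,
    amount      DECIMAL(10, 2),
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
);
```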
Database Languages
• Data Definition Language
• Data Manipulation Language
• Data Control Language
• Transactional Control Language
Data Definition Language
DDL is short for Data Definition Language. It deals with database schemas and descriptions of how the data should reside in the database.
• CREATE: create a database and its objects (tables, indexes, views, stored procedures, functions, and triggers)
• ALTER: alter the structure of an existing database object
• DROP: delete objects from the database
• TRUNCATE: remove all records from a table and release the space allocated for them
• COMMENT: add comments to the data dictionary
• RENAME: rename an object
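A hedged sketch of these commands in SQL (COMMENT and RENAME syntax varies by vendor; the form below follows Oracle/PostgreSQL conventions, and the table name is illustrative):
```
CREATE TABLE employees (                            -- CREATE: define a new object
    emp_id INT PRIMARY KEY,
    name   VARCHAR(100)
);

ALTER TABLE employees ADD salary DECIMAL(10, 2);    -- ALTER: change the structure

COMMENT ON TABLE employees IS 'Staff master data';  -- COMMENT: annotate the data dictionary

TRUNCATE TABLE employees;                           -- TRUNCATE: remove all rows and their space

ALTER TABLE employees RENAME TO staff;              -- RENAME: rename an object

DROP TABLE staff;                                   -- DROP: delete the object itself
```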
Advantages of DBMS
• Data organization: A DBMS allows for the organization and storage of data in a
structured manner, making it easy to retrieve and query the data as needed.
• Data integrity: A DBMS provides mechanisms for enforcing data integrity
constraints, such as constraints on the values of data and access controls that
restrict who can access the data.
• Concurrent access: A DBMS provides mechanisms for controlling concurrent
access to the database, to ensure that multiple users can access the data without
conflicting with each other.
• Data security: A DBMS provides tools for managing the security of the data, such
as controlling access to the data and encrypting sensitive data.
• Backup and recovery: A DBMS provides mechanisms for backing up and
recovering the data in the event of a system failure.
• Data sharing: A DBMS allows multiple users to access and share the same data,
which can be useful in a collaborative work environment.
Disadvantages of DBMS
• Complexity: DBMS can be complex to set up and maintain, requiring specialized
knowledge and skills.
• Performance overhead: The use of a DBMS can add overhead to the performance
of an application, especially in cases where high levels of concurrency are required.
• Scalability: The use of a DBMS can limit the scalability of an application, since it
requires the use of locking and other synchronization mechanisms to ensure data
consistency.
• Cost: The cost of purchasing, maintaining and upgrading a DBMS can be high,
especially for large or complex systems.
• Limited Use Cases: Not every use case suits a DBMS; some solutions don't need high reliability, consistency, or security and may be better served by other types of data storage.
These are the main reasons for the shift from file systems to DBMSs.
A Database Management System (DBMS) is a software system that allows users to
create, maintain, and manage databases. It is a collection of programs that enables users
to access and manipulate data in a database. A DBMS is used to store, retrieve, and
manipulate data in a way that provides security, privacy, and reliability.
2. Database (DB)
A database is information that is set up for easy access, management and updating.
Computer databases typically store aggregations of data records or files that contain
information, such as sales transactions, customer data, financials and product
information.
Databases are used for storing, maintaining and accessing any sort of data. They
collect information on people, places or things. That information is gathered in one place
so that it can be observed and analyzed. Databases can be thought of as an organized
collection of information.
Businesses use data stored in databases to make informed business decisions. Some
of the ways organizations use databases include the following:
• Keep track of customers. Databases often store information about people, such as
customers or users. For example, social media platforms use databases to store
user information, such as names, email addresses and user behavior. The data is
used to recommend content to users and improve the user experience.
• Store personal data. Databases can also be used to store personal information.
For example, personal cloud storage is available for individual users to store media,
such as photos, in a managed cloud.
Evolution of databases
Databases were first created in the 1960s. These early databases were network models
where each record is related to many primary and secondary records. Hierarchical
databases were also among the early models. They have tree schemas with a root
directory of records linked to several subdirectories.
E.F. Codd created the relational database while at IBM. It became the standard for
database systems because of its logical schema, or the way it is organized. The use of
a logical schema separates the relational database from physical storage.
The relational database, combined with the growth of the internet beginning in the mid-
1990s, led to a proliferation of databases. Many business and consumer applications
rely on databases.
Types of databases
There are many types of databases. They may be classified according to content type:
bibliographic, full text, numeric and images. In computing, databases are often classified
based on the organizational approach they use.
Relational. This tabular approach defines data so it can be reorganized and accessed
in many ways. Relational databases are composed of tables. Data is placed into
predefined categories in those tables. Each table has columns with at least one data
category, and rows that have a certain data instance for the categories which are
defined in the columns. Information in a relational database about a specific customer is
organized into rows, columns and tables. These are indexed to make it easier to search
using SQL or NoSQL queries.
Relational databases use SQL in their user and application program interfaces. A new
data category can easily be added to a relational database without having to change the
existing applications. A relational database management system (RDBMS) is used to
store, manage, query and retrieve data in a relational database.
Typically, the RDBMS gives users the ability to control read/write access, specify report
generation and analyze use. Some databases offer atomicity, consistency, isolation and
durability, or ACID, compliance to guarantee that data is consistent and that
transactions are complete.
Distributed. This database stores records or files in several physical locations. Data
processing is also spread out and replicated across different parts of the network.
Distributed databases can be homogeneous, where all physical locations have the
same underlying hardware and run the same operating systems and database
applications. They can also be heterogeneous. In those cases, the hardware, OS and
database applications can be different in the various locations.
Cloud. These databases are built in a public, private or hybrid cloud for a virtualized
environment. Users are charged based on how much storage and bandwidth they use.
They also get scalability on demand and high availability. These databases can work
with applications deployed as software as a service.
NoSQL. NoSQL databases are good when dealing with large collections of distributed
data. They can address big data performance issues better than relational databases.
They also do well analyzing large unstructured data sets and data on virtual servers in
the cloud. These databases can also be called non-relational databases.
Graph. These databases are often used to analyze interconnections. Graph databases are often used to analyze data about customers as they interact with a business on webpages and in social media.
Graph databases use SPARQL, a declarative programming language and protocol, for
analytics. SPARQL can perform all the analytics that SQL can perform, and can also be
used for semantic analysis, or the examination of relationships. This makes it useful for
performing analytics on data sets that have both structured and unstructured data.
SPARQL lets users perform analytics on information stored in a relational database, as
well as friend-of-a-friend relationships, PageRank and shortest path.
While the different types of databases vary in schema, data structure and data types
most suited to them, they are all composed of the same five basic components.
1. Hardware. This is the physical device that database software runs on. Database hardware includes computers, servers and hard drives.
2. Software. This is the DBMS itself, along with the operating system and any applications used to access and manage the data.
3. Data. This is the raw information that the database stores. Database administrators organize the data to make it more meaningful.
4. Data access language. This is the programming language that controls the
database. The programming language and the DBMS must work together. One of
the most common database languages is SQL.
5. Procedures. These rules determine how the database works and how it handles the
data.
Database challenges
Setting up, operating and maintaining a database has some common challenges, such
as the following:
• Data integrity ensures data is trustworthy. It is not always easy to achieve data
integrity because it means restricting access to databases to only those qualified to
handle it.
APIs connect the user or application to the database management system, which lets
them interact with the database.
Database management system
A DBMS enables users to create and manage a database. It also helps users create,
read, update and delete data in a database, and it assists with logging and auditing
functions.
The DBMS provides physical and logical independence from data. Users and
applications do not need to know either the physical or logical locations of data. A
DBMS can also limit and control access to the database and provide different views of
the same database schema to multiple users.
3. Advantages of DBMS over file processing systems
Database Management Systems (DBMS) offer several advantages over traditional file
processing systems. Here are some key benefits:
1. Data Integrity:
- DBMS enforces data integrity constraints, ensuring that the data remains accurate
and consistent.
- Redundancy is minimized, and relationships between different pieces of data are
maintained.
2. Data Independence:
- Changes in the structure or organization of the database (schema) do not affect the
application programs. This is known as data independence.
- In file processing systems, changes in data structures often require modifying
application programs.
8. Scalability:
- DBMS is designed to handle large volumes of data and users, providing scalability.
- File processing systems may face challenges scaling up as data and user
requirements grow.
Overall, DBMS provides a more organized, secure, and efficient way to manage and
retrieve data compared to traditional file processing systems.
4. Database Languages
In the context of databases, there are several types of languages used for different
purposes. Here are some of the key database languages:
3. Query Language:
- Purpose: A type of DML specifically used for querying databases to retrieve information.
- Examples: SQL is a widely used query language (sample sketches of a query and a stored function appear after this list).
6. Procedural Language:
- Purpose: Allows the creation of stored procedures and functions within the database.
- Examples: PL/SQL (used in Oracle Database), T-SQL (used in Microsoft SQL
Server), and PL/pgSQL (used in PostgreSQL).
These languages play a crucial role in interacting with databases, defining and
manipulating data, controlling access, and ensuring the integrity and security of the
information stored in the database. The choice of language often depends on the type
of database system and the specific requirements of the application.
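For instance, a typical query and a small procedural function might look like the following hedged sketches (the table, column, and function names are assumed for illustration; the procedural example uses PL/pgSQL, so its syntax is PostgreSQL-specific):
```
-- Query language: retrieve employees in department 10, highest paid first.
SELECT name, salary
FROM employees
WHERE dept_id = 10
ORDER BY salary DESC;

-- Procedural language (PL/pgSQL): a stored function that returns
-- the number of employees in a given department.
CREATE FUNCTION employee_count(p_dept_id INT)
RETURNS BIGINT AS $$
BEGIN
    RETURN (SELECT COUNT(*) FROM employees WHERE dept_id = p_dept_id);
END;
$$ LANGUAGE plpgsql;

-- Usage:
SELECT employee_count(10);
```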
Database Users
Database users are the ones who really use and take the benefits of the database.
There will be different types of users depending on their needs and way of accessing
the database.
1. Application Programmers – They are developers who interact with the database by means of DML queries. These DML queries are written in application programs in languages like C, C++, Java, or Pascal, and are converted into object code to communicate with the database. For example, a C program that generates a report of employees working in a particular department will include an embedded SQL query to fetch the data from the database (a sample of such a query appears after this list).
2. Sophisticated Users – They are database developers, who write SQL queries to
select/insert/delete/update data. They do not use any application or programs to
request the database. They directly interact with the database by means of
a query language like SQL. These users will be scientists, engineers, analysts
who thoroughly study SQL and DBMS to apply the concepts in their
requirements. In short, we can say this category includes designers and
developers of DBMS and SQL.
3. Specialized Users – These are also sophisticated users, but they write special database application programs; they develop complex programs tailored to their requirements.
4. Stand-alone Users – These users have a stand-alone database for their personal use. Such databases ship as ready-made packages with menus and graphical interfaces.
5. Naïve Users – These are users who interact with the database through an existing application. For example, online library systems, ticket booking systems, and ATMs all have existing applications that users operate to fulfill their requests.
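The report described under application programmers might embed a query along these lines (a minimal sketch; the table and column names are hypothetical):
```
-- Query an application program could embed to report the employees
-- working in a particular department.
SELECT e.emp_id, e.name
FROM employees e
WHERE e.dept_name = 'Accounts';
```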
Database Administrators
The life cycle of a database runs from design through implementation to administration. A database must be designed carefully for its requirements so that it works without issues, and once the design is complete it is installed and users begin working with it. The database grows as its data grows, and when it becomes huge its performance drops and accessing data becomes a challenge; unused space also accumulates, inflating the database's size. This administration and maintenance of the database is the job of the Database Administrator (DBA).
6. Structure of DBMS
DBMS means Database Management System: a tool or software used to create, delete, or manipulate a database. A software program created to store, retrieve, query, and manage data is known as a Database Management System (DBMS). Through user interfaces (UIs), authorized entities can generate, read, update, and destroy data.
Database Management Systems are crucial because they give programmers, database managers, and end users a consolidated view of the data, relieving applications and end users of the need to understand the physical location of the data. Application Programming Interfaces (APIs) manage internet requests and responses for particular sorts of data.
In marketing materials, the phrase "database as a service" (DBaaS) may be used to refer
to both relational and non-relational DBMS components that are given via the internet.
Database Administrators are typically the only people who work directly with a DBMS.
Today, end users read and write to databases using front-end interfaces made by
programmers, while programmers use cloud APIs to connect with DBMSs.
The structure of a DBMS has three main components:
o Query Processor
o Storage Manager
o Disk Storage
1. Query Processor
The query processing is handled by the query processor, as the name implies. It executes
the user's query, to put it simply. In this way, the query processor aids the database
system in making data access simple and easy. The query processor's primary duty is to
successfully execute the query. The Query Processor transforms (or interprets) the user's
application program-provided requests into instructions that a computer can understand.
o DDL Interpreter:
DDL stands for Data Definition Language. As implied by the name, the DDL Interpreter interprets DDL statements such as those used in schema definitions (create, drop, etc.). This interpretation yields a set of tables holding the meta-data (data about data) that is kept in the data dictionary. In essence, the data dictionary is a part of the disk storage, covered in a later section of this article.
o DML Compiler:
DML stands for Data Manipulation Language. In keeping with its name, the DML Compiler converts DML statements such as select, update, and delete into low-level instructions (machine-readable object code) so they can be executed. The DML compiler also optimizes queries: since a single query can typically be translated into a number of evaluation plans, some optimization is needed to select the plan with the lowest cost. This process, known as query optimization, determines the most effective way to carry out a query.
o Embedded DML Pre-compiler:
Before query evaluation, DML commands embedded in an application program (such as SELECT and FROM in SQL) must be pre-compiled into standard procedural calls (program instructions the host language can understand). The Embedded DML Pre-compiler converts these embedded DML statements into routine calls.
o Query Evaluation Engine:
It takes the evaluation plan for the query, runs it, and returns the result. Simply put, the query evaluation engine evaluates the SQL commands used to access the database's contents and returns the result of the query. In a nutshell, it is in charge of analyzing the queries and running the object code that the DML Compiler produces. Apache Drill and Presto are examples of query evaluation engines.
2. Storage Manager:
An application called Storage Manager acts as a conduit between the queries made and
the data kept in the database. Another name for it is Database Control System. By
applying the restrictions and running the DCL instructions, it keeps the database's
consistency and integrity. It is in charge of retrieving, storing, updating, and removing data
from the database.
o Integrity Manager:
Whenever there is any change in the database, the Integrity manager will manage the
integrity constraints.
o Authorization Manager:
Authorization manager verifies the user that he is valid and authenticated for the specific
query or request.
o File Manager:
All the files and data structure of the database are managed by this component.
o Transaction Manager:
It is responsible for making the database consistent before and after the transactions.
Concurrent processes are generally controlled by this component.
o Buffer Manager:
The buffer manager handles the transfer of data between main memory and disk storage and manages the cache memory.
3. Disk Storage
A DBMS can use various kinds of Data Structures as a part of physical system
implementation in the form of disk storage.
o Data Dictionary:
It contains the metadata (data of data), which means each object of the database has
some information about its structure. So, it creates a repository which contains the details
about the structure of the database object.
o Data Files:
The data files store the database itself.
o Indices:
Indices are used to access and retrieve the data quickly and efficiently.
2. File Organization:
- Manages the placement of data in files on the disk. It determines how data is stored,
whether in a single file or distributed across multiple files.
3. Data Buffer Management:
- Manages a memory buffer known as the data buffer or cache, which is used to store
a subset of the database. This buffer helps improve data retrieval performance by
caching frequently accessed data in memory.
4. Data Caching:
- The storage manager uses caching techniques to keep frequently accessed data in
the buffer, reducing the need to fetch data from the slower disk storage.
6. Record Management:
- Handles the storage and retrieval of individual records within a file. It must support
operations like inserting, updating, and deleting records.
8. Space Management:
- Manages the allocation and deallocation of space on the storage medium. It ensures
efficient utilization of storage space and may involve tasks like compaction to reclaim
unused space.
9. Concurrency Control:
- Coordinates access to the database by multiple users or transactions to ensure data
consistency. This involves handling situations where multiple users might attempt to
access or modify the same data simultaneously.
10. Transaction Management:
- Ensures the atomicity, consistency, isolation, and durability properties of
transactions. It manages the commit and rollback operations to maintain the integrity of
the database.
In summary, the storage manager is a critical component of a DBMS that handles the
physical aspects of data storage, retrieval, and management. Its efficient functioning is
essential for the overall performance, reliability, and integrity of the database system.
7. Query Processor
1. Query Parsing:
- The query processor first parses the query to ensure that it adheres to the syntax
and grammar rules of the query language (e.g., SQL). It checks for errors and verifies
the query's structure.
2. Query Optimization:
- Once the query is parsed, the query processor performs optimization to enhance the
efficiency of query execution. This involves choosing the most efficient execution plan
from several possible plans. Optimization aims to minimize the response time and
resource usage.
3. Query Rewrite:
- In some cases, the query processor may rewrite the query to improve its execution
efficiency. This can involve transforming the query into an equivalent but more efficient
form.
5. Data Retrieval:
- The query processor generates the instructions necessary for retrieving the data
from the storage manager. This includes specifying the conditions for data retrieval and
the columns to be returned.
8. Handling Transactions:
- If the query involves transactional operations, the query processor coordinates with
the transaction manager to ensure that the transaction follows the principles of
Atomicity, Consistency, Isolation, and Durability (ACID).
9. Error Handling:
- The query processor is responsible for detecting and handling errors during query
processing. This includes syntax errors, semantic errors, and runtime errors.
The query processor acts as an intermediary between the user/application and the
database engine, translating high-level queries into a series of instructions that the
engine can execute to retrieve, manipulate, or store data. The efficiency and
effectiveness of the query processor significantly impact the overall performance of a
database system.
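Most systems let you inspect the plan the query processor chooses. A sketch using PostgreSQL's EXPLAIN command (the table is assumed to exist):
```
-- Ask the query processor to show its chosen execution plan
-- without actually running the query.
EXPLAIN
SELECT name
FROM employees
WHERE dept_id = 10;
-- The output describes the plan, e.g. a sequential scan or an index
-- scan on employees, along with estimated cost and row counts.
```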
More people than ever use computers today and rely on these devices and the
networks they're attached to for many tasks, from the critical to the frivolous.
Unsurprisingly, the sheer volume of users and the even more significant number of
requests strain servers and networks.
As if these challenges aren’t enough, IT professionals must deal with a constant stream
of new technology that must be implemented into the network. Evolving in this fast-paced landscape is critical for keeping an organization up to date and competitive. Simply put, there is no alternative.
Terminology Basics
A Server in the IT world is a remote computer that provides access to data and
services. Servers are usually physical devices such as rack servers, though the rise
of cloud computing has brought virtual servers into the equation. The server handles
processes like e-mail, application hosting, Internet connections, printing, and more.
The client-server architecture refers to a system that hosts, delivers, and manages most
of the resources and services that the client requests. In this model, all requests and
services are delivered over a network, and it is also referred to as the networking
computing model or client server network.
To sum it up briefly:
• First, the client/user sends a request over the network
• Then, the network server accepts and processes the user request
• Finally, the server returns the reply to the client
We have already established that technology is constantly evolving and improving, often
at a very rapid pace. As a result, today's businesses increasingly rely on technology,
especially IT, to flourish and stay competitive in an "evolve or die" environment.
Therefore, today's organizations need a system that makes it easy to collect, process,
and act on corporate data, thus increasing business procedure efficiency and ensuring
survivability in the modern world's markets.
The client-server network model brings a higher level of processing that increases the
effectiveness of workstation power, workgroup empowerment, remote network
management, market-driven business, and the preservation of existing investments.
• Client and server machines typically require different hardware and software resources and may come from different vendors.
• The network scales horizontally by adding more client machines, and vertically by moving the entire process to more powerful servers or a multi-server configuration.
• One computer server can provide multiple services simultaneously, although each
service requires a separate server program.
• Both client and server applications interact directly with a transport layer protocol.
This process establishes communication and enables the entities to send and
receive information.
• Both the client and server computers need a complete stack of protocols. The
transport protocol employs lower-layer protocols to send and receive individual
messages.
• Email servers: Thanks to ease and speed, e-mail has supplanted traditional postal
mail as the primary form of corporate communication. E-mail servers, aided by
various brands of dedicated software, send and receive e-mails between parties.
• File servers: If you store files on cloud-based services such as Google Docs or
Microsoft Office, you're using a file server. File servers are centralized locations for
file storage and are accessed by many clients.
• Web servers: These high-performance servers host many different websites, and
clients access them through the Internet. Here’s a step-by-step breakdown:
• The client/user uses their web browser to enter the URL they want
• The browser asks the Domain Name System (DNS) for an IP address
• The DNS server finds the desired server's IP address and sends it to the web browser
• The browser sends the request to the web server at that IP address, and the server responds with the requested files
• The client/user receives the files sent by the server, and the process is repeated as needed
Client-server architecture brings its share of positives and negatives to today’s digital
consumers. We start with a list of advantages:
• It’s a centralized system that keeps all the data and its controls in one place
• It allows the IT staff to change the Client and Server capacities separately
• Users don’t need to log into a terminal or another processor to access corporate
information or desktop tools like PowerPoint presenters or spreadsheet utilities
Naturally, client-server architecture isn’t all sunshine and lollipops. Here are the
downsides:
• If the server has a worm, virus, or Trojan, the users will likely catch it since the
network consists of linked clients and servers
• If a critical server goes down, the clients are dead in the water
• The setup is prone to phishing and Man in the Middle (MITM) attacks
1. User Views:
- Personalized Views: Users may have personalized views tailored to their specific
preferences or roles. For example, an employee in the sales department might have a
view that emphasizes sales-related data.
- Customizable Views: Some systems allow users to customize their views by
selecting specific columns, applying filters, or arranging data in a preferred manner.
2. Virtual Views:
- Virtual Tables: Virtual views are created through the use of queries and do not
physically store data. They are essentially queries that present the results as if they
were tables. Views can be used to simplify complex queries or to present a simplified
version of the data to users.
- Materialized Views: Unlike virtual views, materialized views store the results of a query physically. They are precomputed and updated periodically to reflect changes in the underlying data (a SQL sketch of both kinds appears at the end of this section).
3. Application Views:
- Application-specific Views: Views designed for specific applications, where the data
is presented in a way that aligns with the application's requirements. For example, a
customer relationship management (CRM) application might have views tailored for
managing customer interactions.
4. DBMS Views:
- System Catalog Views: These views provide information about the database itself,
including details about tables, columns, indexes, and other database objects. They are
often used by database administrators for maintenance tasks.
- Security Views: Views related to database security, such as information about users,
roles, and permissions.
5. End-User Views:
- Graphical Views: Views that are presented graphically, such as charts, graphs, or
diagrams, to help end-users understand complex relationships in the data.
- Tabular Views: Standard views that present data in a tabular format, similar to a
spreadsheet. These are common in data analysis and reporting.
6. Read-Only Views:
- Read-Only Access Views: Views that restrict users to read-only access, preventing
them from modifying the underlying data. This can be useful for providing information to
users without allowing them to make changes.
7. Joint Views:
- Join Views: Views that combine data from multiple tables using SQL JOIN
operations. These views are useful for simplifying complex queries involving data from
different tables.
8. Temporal Views:
- Temporal Views: Views that represent the data at a specific point in time. This is
particularly important in scenarios where historical data is crucial, such as financial
records or versioned data.
The ability to create and use different views of data enhances the flexibility and usability
of a database system, allowing various users and applications to interact with the data
in a way that suits their specific requirements. It also helps in maintaining data security
by controlling the level of access granted to different users.
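A hedged SQL sketch contrasting a virtual view with a materialized view (the materialized-view syntax shown is PostgreSQL/Oracle style; the schema is illustrative):
```
-- Virtual view: stores only the query; re-evaluated on each access.
CREATE VIEW sales_summary AS
SELECT dept_id, SUM(amount) AS total_sales
FROM sales
GROUP BY dept_id;

-- Materialized view: stores the query result physically.
CREATE MATERIALIZED VIEW sales_summary_mv AS
SELECT dept_id, SUM(amount) AS total_sales
FROM sales
GROUP BY dept_id;

-- Periodically bring the stored result up to date (PostgreSQL).
REFRESH MATERIALIZED VIEW sales_summary_mv;
```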
The ANSI-SPARC database architecture is the basis of most modern databases. The three levels present in this architecture are the physical level, the conceptual level, and the external level.
The details of these levels are as follows −
Physical Level
This is the lowest level in the three level architecture. It is also known as the internal level.
The physical level describes how data is actually stored in the database. At the lowest level, data is stored on disk drives in the form of bits; at a slightly higher level, it can be said that the data is stored in files and folders. The physical level also covers compression and encryption techniques.
Conceptual Level
The conceptual level is at a higher level than the physical level. It is also known as the
logical level. It describes how the database appears to the users conceptually and the
relationships between various data tables. The conceptual level does not care for how
the data in the database is actually stored.
External Level
This is the highest level in the three level architecture and closest to the user. It is also
known as the view level. The external level only shows the relevant database content to
the users in the form of views and hides the rest of the data. So different users can see
the database as a different view as per their individual requirements.
There are four common types of database models, each useful for different types of data or information. Depending upon your specific needs, one of these models can be used.
1. Hierarchical databases.
2. Network databases.
3. Relational databases.
4. Object-oriented databases.
1. Hierarchical databases
This is one of the oldest database models, developed by IBM for its Information Management System. In a hierarchical database model, the data is organized into a tree-like structure; in simple language, it is a set of data organized as a tree.
This type of database model is rarely used nowadays. Its structure is like a tree, with nodes representing records and branches representing fields. The Windows Registry used in Windows XP is an example of a hierarchical database: configuration settings are stored as tree structures with nodes.
In the generalized structure of the hierarchical model, data is stored in a tree of root, parent, and child nodes. A university management system, for example, can be modeled this way, employing the "parent-child" relationship for storing data.
Disadvantages
2. Network databases
This model looks like the hierarchical database model, which is why it is often called a modified version of the hierarchical database. The network database model organizes data more like a graph, where a record can have more than one parent node. The network model is a database model conceived as a flexible way of representing objects and their relationships.
Advantage
Disadvantage:
• All the records are maintained using pointers, so the whole database structure becomes very complex.
• Insertion, deletion, and update operations on any record require a large number of pointer adjustments.
• Structural changes to the database are very difficult.
3. Relational Database
In the relational model, three key terms are heavily used: relations, attributes, and domains. A relation is nothing but a table with rows and columns. The named columns of the relation are called attributes, and the domain is the set of values the attributes can take.
Keys of a relation-
• Primary key – the key that uniquely identifies each row of a table. It cannot have null values.
• Foreign key – refers to the primary key of some other table. It permits only those values which appear in the primary key of the table to which it refers.
Popular RDBMS products include the following:
Microsoft SQL Server: Microsoft SQL Server is an RDBMS that supports a wide variety of transaction processing, business intelligence, and analytics applications in corporate IT environments.
DB2: DB2 is an RDBMS designed to store, analyze, and retrieve data efficiently.
As a sample, a relational model for a bank environment stores the data associated with the bank in the form of two-dimensional tables.
Advantage
Disadvantages
4. Object-oriented databases
Unlike the relational model, an object-oriented database stores data as objects rather than in rows and columns of tables.
Advantages
• An object database can handle many different types of data, while a relational database handles a single type. Unlike traditional hierarchical, network, or relational databases, object-oriented databases can handle different kinds of data such as pictures, voice, and video, in addition to text and numbers.
• Object-oriented databases provide code reusability, real-world modelling, and improved reliability and flexibility.
• Object-oriented databases have lower maintenance costs than other models: because most tasks within the system are encapsulated, they may be reused and incorporated into new tasks.
Disadvantages
• There is no universally defined data model for an OODBMS, and most models lack a
theoretical foundation.
• In comparison to RDBMSs the use of OODBMS is still relatively limited.
• There is a lack of support for security: OODBMSs do not provide adequate security mechanisms.
• The system is more complex than traditional DBMSs.
In the context of database design and the Entity-Relationship (ER) model, attributes and
entity sets are fundamental concepts used to model and represent information about the
real-world domain.
1. Entity Set:
- An entity set is a collection or set of entities that share common characteristics or
properties.
- An entity is an object or concept in the real world that is distinguishable from other
objects. Entities can have attributes that describe various aspects of the entity.
- For example, in a university database, the "Student" entity set may include individual
students, and each student has attributes like "StudentID," "Name," and "Date of Birth."
2. Attributes:
- An attribute is a property or characteristic of an entity set that describes some aspect
of the entity.
- Attributes provide more detailed information about the entity within an entity set.
- Each attribute has a domain, which is the set of permissible values for that attribute.
For example, the "Color" attribute of a "Car" entity might have a domain of {Red, Blue,
Green}.
- Attributes can be categorized into different types:
- Simple Attribute: An attribute that cannot be divided further. For example, "Age" or
"Name."
- Composite Attribute: An attribute that can be divided into sub-parts that represent
more basic attributes with independent meanings. For example, "Address" may be a
composite attribute composed of "Street," "City," and "Zip Code."
- Derived Attribute: An attribute whose values can be derived from other attributes in
the database. For example, "Age" might be a derived attribute if the birthdate is known.
- Multi-valued Attribute: An attribute that can hold multiple values. For example,
"Phone Numbers" for a person, which may have multiple phone numbers.
In ER modeling, entities are typically depicted using rectangles, and attributes are
represented within ellipses connected to the respective entity. Relationships between
entities are shown using diamond shapes.
For example:
```
Student (Entity)
- StudentID (Attribute)
- Name (Attribute)
- Date of Birth (Attribute)
```
In this example, "Student" is the entity set, and "StudentID," "Name," and "Date of Birth"
are attributes associated with the "Student" entity set.
Understanding the distinctions between entity sets and attributes is crucial when
designing a database schema, as it helps in accurately representing the structure and
characteristics of the data in the real-world domain being modeled.
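When such an entity set is implemented in a relational database, each attribute typically becomes a column; a minimal sketch (SQL types assumed):
```
-- The Student entity set from the example, realized as a table.
CREATE TABLE student (
    student_id    INT PRIMARY KEY,  -- key attribute
    name          VARCHAR(100),     -- simple attribute
    date_of_birth DATE
);
```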
12. Relationship and Relationship sets
A relationship is an association among two or more entities.
Relationship Set
A set of relationships of similar type is called a relationship set. Like entities, a relationship too can have attributes. These attributes are called descriptive attributes.
Mapping Cardinalities:
Mapping cardinalities express the number of entities to which another entity can be associated via a relationship. For a binary relationship set between entity sets A and B, the mapping cardinality must be one of:
One-to-one: An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A.
One-to-many: An entity in A is associated with any number of entities in B, but an entity in B is associated with at most one entity in A.
Many-to-one: An entity in A is associated with at most one entity in B, but an entity in B is associated with any number of entities in A.
Many-to-many: An entity in A is associated with any number of entities in B, and an entity in B is associated with any number of entities in A.
The appropriate mapping cardinality for a particular relationship set depends on the real world being modeled.
Degree
The number of entity sets that participate in a relationship set is termed the degree of that relationship set. On the basis of its degree, a relationship set can be classified into the following types-
Binary relationship set is a relationship set where two entity sets participate in a
relationship set.
Ternary relationship set is a relationship set where three entity sets participate in a
relationship set.
Participation Constraints
Total participation of an entity set represents that each entity in entity set must have at
least one relationship in a relationship set. It is also called mandatory participation. Total
participation is represented using a double line between the entity set and relationship
set.
Partial participation of an entity set represents that each entity in the entity set may or
may not participate in the relationship instance in that relationship set. It is also called as
optional participation. Partial participation is represented using a single line between the
entity set and relationship set.
4. Check Constraint:
- Defines a condition that must be true for data to be inserted or updated in a table.
- Example: A "Salary" column may have a check constraint to ensure that the salary is
greater than or equal to zero.
5. Unique Constraint:
- Ensures that values in a particular column or a combination of columns are unique
across rows in a table.
- Example: A "Username" column may have a unique constraint to ensure that each
user has a distinct username.
6. Default Constraint:
- Provides a default value for a column when no value is specified during insertion.
- Example: A "DateCreated" column may have a default constraint to set the current
date and time if no date is explicitly provided.
8. Key Constraint:
- Enforces uniqueness, either as a primary key or a unique constraint on one or more
columns.
- Example: A combination of "FirstName" and "LastName" columns may have a key
constraint to ensure unique combinations.
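These constraints might appear together in a table definition like this hedged sketch (table and column names are illustrative):
```
CREATE TABLE accounts (
    username     VARCHAR(50) UNIQUE,                  -- unique constraint
    salary       DECIMAL(10, 2) CHECK (salary >= 0),  -- check constraint
    date_created TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- default constraint
    first_name   VARCHAR(50),
    last_name    VARCHAR(50),
    UNIQUE (first_name, last_name)                    -- key constraint on two columns
);
```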
ER Diagrams
An ER diagram is a type of flowchart that illustrates how "entities" such as people, objects or concepts
relate to each other within a system. ER Diagrams are most often used to design or
debug relational databases in the fields of software engineering, business information
systems, education and research. Also known as ERDs or ER Models, they use a
defined set of symbols such as rectangles, diamonds, ovals and connecting lines to
depict the interconnectedness of entities, relationships and their attributes. They mirror
grammatical structure, with entities as nouns and relationships as verbs.
ER diagrams are related to data structure diagrams (DSDs), which focus on the
relationships of elements within entities instead of relationships between entities
themselves. ER diagrams also are often used in conjunction with data flow diagrams
(DFDs), which map out the flow of information for processes or systems.
➢ History of ER models
Peter Chen (a.k.a. Peter Pin-Shan Chen), currently a faculty member at Carnegie-
Mellon University in Pittsburgh, is credited with developing ER modeling for database
design in the 1970s. While serving as an assistant professor at MIT’s Sloan School of
Management, he published a seminal paper in 1976 titled “The Entity-Relationship
Model: Toward a Unified View of Data.”
In a broader sense, the depiction of the interconnectedness of things dates back at least to ancient Greece, with the works of Aristotle, Socrates and Plato. It's seen more recently
in the 19th and 20th Century works of philosopher-logicians like Charles Sanders
Peirce and Gottlob Frege.
By the 1960s and 1970s, Charles Bachman and A.P.G. Brown were working with close predecessors of Chen's approach. Bachman developed a type of Data Structure Diagram, named after him as the Bachman Diagram. Brown published works
on real-world systems modeling. James Martin added ERD refinements. The work of
Chen, Bachman, Brown, Martin and others also contributed to the development of
Unified Modeling Language (UML), widely used in software design.
• Database design: ER diagrams are used to model and design relational databases, in
terms of logic and business rules (in a logical data model) and in terms of the specific
technology to be implemented (in a physical data model.) In software engineering, an
ER diagram is often an initial step in determining requirements for an information
systems project. It’s also later used to model a particular database or databases. A
relational database has an equivalent relational table and can potentially be
expressed that way as needed.
• Business information systems: The diagrams are used to design or analyze relational
databases used in business processes. Any business process that uses fielded data
involving entities, actions and interplay can potentially benefit from a relational
database. It can streamline processes, uncover information more easily and improve
results.
• Business process re-engineering (BPR): ER diagrams help in analyzing databases
used in business process re-engineering and in modeling a new database setup.
• Research: Since so much research focuses on structured data, ER diagrams can play
a key role in setting up useful databases to analyze the data.
ER Diagrams are composed of entities, relationships and attributes. They also depict
cardinality, which defines relationships in terms of numbers. Here’s a glossary:
Entity
Entity type: A group of definable things, such as students or athletes, whereas the entity
would be the specific student or athlete. Other examples: customers, cars or products.
Entity set: Same as an entity type, but defined at a particular point in time, such as
students enrolled in a class on the first day. Other examples: Customers who purchased
last month, cars currently registered in Florida. A related term is instance, in which the
specific person or car would be an instance of the entity set.
Entity keys: Refers to an attribute that uniquely defines an entity in an entity set. Entity keys can be super, candidate or primary.
• Super key: A set of attributes (one or more) that together define an entity in an entity set.
• Candidate key: A minimal super key, meaning it has the least possible number of attributes to still be a super key. An entity set may have more than one candidate key.
• Primary key: A candidate key chosen by the database designer to uniquely identify the entity set.
• Foreign key: Identifies the relationship between entities.
Relationship
How entities act upon each other or are associated with each other. Think of
relationships as verbs. For example, the named student might register for a course. The
two entities would be the student and the course, and the relationship depicted is the
act of enrolling, connecting the two entities in that way. Relationships are typically
shown as diamonds or labels directly on the connecting lines.
Recursive relationship: The same entity participates more than once in the relationship.
Attribute
Single-value: Just one attribute value. The types can be combined, such as: simple
single-value attributes or composite multi-value attributes.
Cardinality
Defines the numerical attributes of the relationship between two entities or entity sets.
The three main cardinal relationships are one-to-one, one-to-many, and many-to-many.
A one-to-one example would be one student associated with one mailing address.
A one-to-many example (or many-to-one, depending on the relationship direction): One
student registers for multiple courses, but all those courses have a single line back to
that one student. Many-to-many example: Students as a group are associated with
multiple faculty members, and faculty members in turn are associated with multiple
students.
ER components can be equated to parts of speech, as Peter Chen did. This shows how
an ER Diagram compares to a grammar diagram:
The database query language ERROL actually mimics natural language constructs.
ERROL is based on reshaped relational algebra (RRA) and works with ER models,
capturing their linguistic aspects.
UNIT-2
The relational model represents data as tables with columns and rows. Each row is known as a tuple. Each column of a table has a name, or attribute.
Attribute: The name of a column in a particular table. Each attribute Ai must have a domain, dom(Ai).
Relational schema: A relational schema contains the name of the relation and the names of all its columns or attributes.
Relational key: Each row has one or more attributes, known as the relational key, which can identify the row in the relation uniquely.
For an example relation STUDENT:
o NAME, ROLL_NO, PHONE_NO, ADDRESS, and AGE are the attributes.
o The instance of schema STUDENT has 5 tuples.
o One tuple is t3 = <Laxman, 33289, 8583287182, Gurugram, 20>
Properties of Relations
The relational model is a conceptual framework for organizing and representing data in a
relational database. Proposed by Edgar Codd in 1970, it is based on the principles of
mathematical set theory and predicate logic. The relational model uses tables (relations)
to represent data, with each table consisting of rows (tuples) and columns (attributes).
The relationships between tables are established through keys.
Candidate Keys:
1. Key:
- A key is a set of one or more attributes that can uniquely identify a tuple (row) in a
relation (table).
- Keys play a vital role in establishing relationships between tables and enforcing data
integrity.
2. Super Key:
- A super key is a set of attributes that, taken collectively, can uniquely identify a tuple.
3. Candidate Key:
- A candidate key is a minimal super key, meaning it is a super key without any
unnecessary attributes.
- Each candidate key uniquely identifies tuples, and any subset of a candidate key
should not have the uniqueness property.
4. Primary Key:
- A primary key is the candidate key chosen by the database designer as the principal means of identifying tuples within a relation.
- It is used to establish relationships between tables and must be unique for each tuple, with no null values.
5. Alternate Key:
- An alternate key is a candidate key that is not chosen as the primary key.
6. Composite Key:
- A key that consists of two or more attributes that together uniquely identify a tuple.
7. Unique Key:
- A key that ensures all values in a column, or combination of columns, are distinct across the rows of a relation.
Integrity Constraints:
1. Entity Integrity Constraint:
- Ensures that the primary key attribute in a relation cannot have a null value.
2. Referential Integrity Constraint:
- The foreign key in one table must match a primary key in another table.
3. Domain Constraint:
- Ensures that data entered into a column falls within a specified range or set of values.
4. Check Constraint:
- Defines a condition that must be true for data to be inserted or updated in a table.
- Provides a way to enforce business rules on the data.
5. Default Constraint:
- Provides a default value for a column when no value is specified during insertion.
6. Not Null Constraint:
- Requires that every row must have a value in the specified column.
For example, when inserting a new tuple into a table, the DBMS checks whether the
primary key constraint is satisfied, ensuring that the primary key value is unique and not
null. If a constraint is violated, the DBMS will reject the operation and maintain the
consistency and integrity of the data.
In summary, the relational model relies on keys and integrity constraints to ensure data
accuracy, consistency, and reliability. Candidate keys play a crucial role in uniquely
identifying tuples, and integrity constraints provide a set of rules that guide the insertion,
updating, and deletion of data in a relational database. The DBMS enforces these
constraints to maintain the integrity of the data throughout its lifecycle.
Relational algebra is a theoretical query language that forms the basis of the SQL
language used in relational databases. It consists of a set of operations that can be
performed on relations (tables) to retrieve or manipulate data. Here are some
fundamental operations in relational algebra:
1. Selection (σ):
- Purpose: Retrieves the tuples (rows) of a relation that satisfy a given condition.
- Notation: σ_condition(Relation)
2. Projection (π):
- Purpose: Retrieves only the specified attributes (columns) of a relation.
- Notation: π_attributes(Relation)
3. Set Operations:
- Union (∪): Combines two relations and removes duplicate tuples.
- Difference (−): Retrieves tuples from the first relation that are not present in the second.
4. Renaming (ρ):
- Purpose: Renames a relation or its attributes.
- Notation: ρ_newName(attribute)(Relation)
- Example: ρ_NewSalary(Salary)(Employees)
5. Joins (⨝):
- Purpose: Combines tuples from two relations based on a common attribute or a join condition.
6. Division (÷):
- Purpose: Retrieves tuples from one relation that, when combined with another relation, cover the entire set of attributes.
- Notation: Relation1 ÷ Relation2
7. Aggregate Functions:
- COUNT_attribute(Relation)
- SUM_attribute(Relation)
- AVG_attribute(Relation)
- MAX_attribute(Relation)
- MIN_attribute(Relation)
Example Queries:
- Count the employees in each department:
π_DepartmentID, COUNT(EmployeeID)(Employees) ⨝_DepartmentID Departments
- Calculate the average salary for each department:
π_DepartmentID, AVG(Salary)(Employees) ⨝_DepartmentID Departments
Relational algebra provides a theoretical foundation for query languages used in relational
databases, and understanding these operations is crucial for designing and querying
databases effectively.
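Since SQL grew out of these operations, each one has a direct SQL counterpart; a hedged sketch against an assumed Employees/Departments schema:
```
-- Selection σ_salary>50000(Employees):
SELECT * FROM employees WHERE salary > 50000;

-- Projection π_name,salary(Employees):
SELECT name, salary FROM employees;

-- Join Employees ⨝_dept_id Departments:
SELECT *
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id;

-- Aggregation (average salary per department):
SELECT dept_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept_id;
```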
UNIT-3
Functional Dependency:
A functional dependency is a constraint between two sets of attributes in a relation: attribute set A functionally determines attribute set B (written A → B) if each value of A is associated with exactly one value of B.
For example, if we have attributes A and B in a relation, and every value of A uniquely determines the value of B, we write it as A → B.
Key Points:
Normal Forms:
2. Second Normal Form (2NF):
- Remove partial dependencies by putting the fields that depend on only part of the primary key into a separate table.
3. Third Normal Form (3NF):
- Remove transitive dependencies by putting the fields that depend on non-key attributes, rather than directly on the primary key, into a separate table.
4. Boyce-Codd Normal Form (BCNF):
- A relation is in BCNF if, for every non-trivial functional dependency X → Y that holds, X is a super key.
Example:
- EmployeeID → EmployeeName
- EmployeeID → Department
- Department → DepartmentLocation
In this case, the relation is not in 3NF because of the transitive dependency. To normalize
it, you might create separate tables for Employee and Department, eliminating the
transitive dependency.
Normalization helps ensure that the data is organized efficiently and that updates, inserts,
and deletes can be performed without introducing inconsistencies.
The need for normalization and decompositions in the context of database design arises
from the desire to organize data efficiently, avoid data redundancy, and ensure data
integrity. Here are the primary reasons why normalization and decompositions are
essential:
5. Scalability:
- Issue: Poorly organized data structures can hinder the scalability of a database.
- Solution: Normalization enables the creation of modular and scalable database
structures. Well-organized tables make it easier to extend the database schema without
introducing complexities.
6. Supporting Maintainability:
- Issue: Poorly designed databases can be difficult to maintain and modify.
- Solution: Decomposing tables through normalization simplifies the structure, making
it easier to understand and maintain. Changes to the database schema become more
manageable.
7. Enabling Flexibility and Adaptability:
- Issue: Rigidity in database structures can impede the adaptability of a system to
changing requirements.
- Solution: Normalization allows for a more flexible database design. Changes and
adaptations can be made to individual components without affecting the entire system.
Decompositions:
Decompositions involve breaking down larger tables into smaller, well-organized tables.
This process is crucial for achieving normalization. Each decomposition brings a table to
a higher normal form, addressing specific issues related to data redundancy and
anomalies.
2. Minimizing Redundancy:
- Decomposition aims to minimize redundancy in the database by breaking down
larger relations into smaller ones. Redundancy can lead to data inconsistencies and
increased storage requirements.
3. Facilitating Maintenance:
- Smaller, well-structured relations are often easier to maintain. Changes to data or
schema modifications can be more straightforward when relations are appropriately
decomposed.
3. Decomposition
5. Perform Decomposition:
- Based on the chosen strategy, decompose the relation into smaller tables. The
resulting tables should be in a normalized form, and each table should have a clear
purpose.
7. Establish Relationships:
- Establish relationships between the decomposed tables. Foreign keys in some
tables should reference primary keys in other tables to maintain referential integrity.
8. Verify and Optimize:
- Verify that the decomposition meets the desired normal form and that it doesn't
introduce anomalies. Optimize the structure if necessary.
Example:
5. Perform Decomposition:
- Decompose the "Employee" relation into smaller tables, such as "EmployeeInfo"
(EmployeeID, EmployeeName) and "DepartmentInfo" (Department,
DepartmentLocation).
6. Create New Tables:
- Create the tables "EmployeeInfo" and "DepartmentInfo" with appropriate primary
keys.
7. Establish Relationships:
- Establish a relationship between "EmployeeInfo" and "DepartmentInfo" using the
common attribute "Department."
In the context of database design and normalization, a minimal cover, often referred to
as a minimal cover of functional dependencies, is a set of functional dependencies that
is both irreducible and still implies all the dependencies of the original set. The goal is to
represent the dependencies in a minimal and concise way without losing any
information.
Here are the steps to find the minimal cover of a set of functional dependencies:
1. Split right-hand sides: rewrite the dependencies so that each has a single attribute on the right (for example, replace A → BC with A → B and A → C).
2. Remove extraneous left-hand attributes: if an attribute can be dropped from a determinant without changing the closure of the set, drop it.
3. Remove redundant dependencies: if a dependency can be derived from the remaining ones, delete it.
The resulting minimal cover is a reduced set of dependencies that still implies the same
closure as the original set. It is a more concise and minimal representation of the
functional dependencies, making it easier to work with during database design and
normalization processes.
5. Attribute closure
The attribute closure is a concept in database theory, specifically in the context of
functional dependencies. The closure of a set of attributes with respect to a set of
functional dependencies represents the complete set of attributes that can be
functionally determined by that original set.
Given a set of attributes \(X\) and a set of functional dependencies \(F\), the closure of
\(X\), denoted as \(X^+\), is the set of all attributes that are functionally determined by
\(X\) according to the dependencies in \(F\).
The process of finding the attribute closure involves repeatedly applying the given
functional dependencies to the set of attributes until no new attributes can be added.
The closure is determined by the rules of functional dependency closure.
1. Reflexivity:
   - If \(Y \subseteq X\), then \(X \rightarrow Y\); in particular, \(X \subseteq X^+\).
2. Augmentation:
   - If \(X \rightarrow Y\), then \(XZ \rightarrow YZ\) for any set of attributes \(Z\).
3. Transitivity:
   - If \(X \rightarrow Y\) and \(Y \rightarrow Z\), then \(X \rightarrow Z\).
Example:
For \(X = \{A, B\}\) and \(F = \{AB \rightarrow C, BC \rightarrow D, D \rightarrow A\}\):
1. Start with \(X^+ = \{A, B\}\) (reflexivity).
2. \(AB \rightarrow C\) applies, so \(X^+ = \{A, B, C\}\).
3. \(BC \rightarrow D\) now applies, so \(X^+ = \{A, B, C, D\}\).
Result:
The attribute closure \(X^+\) of \(X = \{A, B\}\) is \(\{A, B, C, D\}\). (\(D \rightarrow A\) adds nothing new, since \(A\) is already in the closure.)
Attribute closure is a fundamental concept used in database normalization, particularly
in determining the candidate keys, super keys, and the normalization process. It helps
ensure that a relational database is organized to minimize redundancy and maintain
data integrity.
Let's go through the attribute closure process and the normalization levels (First Normal
Form, Second Normal Form, and Third Normal Form) with examples:
Attribute Closure:
Given a set of attributes \(X\) and a set of functional dependencies \(F\), the closure of
\(X\), denoted as \(X^+\), is the set of all attributes that can be functionally determined
by \(X\) based on the dependencies in \(F\).
Example:
For \(X = \{A\}\) and \(F = \{A \rightarrow B, B \rightarrow D\}\):
1. Start with \(X^+ = \{A\}\) (reflexivity).
2. \(A \rightarrow B\) applies, so \(X^+ = \{A, B\}\).
3. \(B \rightarrow D\) now applies (transitively through \(B\)), so \(X^+ = \{A, B, D\}\).
A relation is in 1NF if it has a primary key and all attribute values are atomic.
A relation is in 2NF if it is in 1NF and all non-prime attributes are fully functionally dependent on the primary key.
A relation is in 3NF if it is in 2NF and no non-prime attribute depends transitively on the primary key.
Multi-valued dependencies (MVDs) are a concept in database theory that extends the
idea of functional dependencies. While functional dependencies describe relationships
between attributes in a table, multi-valued dependencies involve relationships between
sets of attributes. Multi-valued dependencies are used to ensure that certain types of data
anomalies are avoided in a relational database.
Let's consider a relation \(R\) with attributes \(A\), \(B\), and \(C\). A multi-valued dependency from \(A\) to \(B\) in \(R\), denoted \(A \twoheadrightarrow B\), specifies that for every value of \(A\) there is a set of values for \(B\) that is independent of the other attributes in \(R\).
In simpler terms, if \(t_1\) and \(t_2\) are tuples in \(R\) with the same value for \(A\), then there must also exist tuples \(t_3\) and \(t_4\) in \(R\) (with that same \(A\) value) such that \(t_3\) takes its \(B\) value from \(t_1\) and its \(C\) value from \(t_2\), while \(t_4\) takes its \(B\) value from \(t_2\) and its \(C\) value from \(t_1\).
Example:
Consider a relation \(R(A, B, C)\) with the multi-valued dependency \(A \twoheadrightarrow B\).
This dependency implies that for every value of \(A\), the set of values for \(B\) is independent of \(C\): every \(B\) value for a given \(A\) must appear in combination with every \(C\) value for that \(A\).
```
| A | B | C |
|---|---|---|
| a | 1 | x |
| a | 1 | z |
| a | 3 | x |
| a | 3 | z |
```
In this example, for \(A = a\), the set of values for \(B\) is {1, 3}, and it appears with every value of \(C\) for \(a\), namely {x, z}.
A relation is in Fourth Normal Form (4NF) if it is in Boyce-Codd Normal Form (BCNF) and has no non-trivial multi-valued dependencies.
4NF, just like the other normal forms, brings in features that organize the data and eliminate anomalies and redundancy. Each normal form has a set of rules that should be followed when creating the database, and depending on the normal form being targeted, larger tables are divided into smaller ones; this is referred to as the decomposition of tables. A relation should satisfy the conditions below to be in 4NF:
1. It should be in BCNF.
2. It should have no non-trivial multi-valued dependency, i.e., no dependency A ->> B where, for a single value of A, multiple independent values of B exist.
A table which is in 4NF will by default satisfy all the conditions of the previous normal forms:
• 1NF
• 2NF
• 3NF
• BCNF
A multi-valued dependency typically arises when a relation has at least three attributes, two of which depend on the third while being independent of each other. All of the above conditions must hold to establish that a multi-valued dependency exists.
Example 1
Here we have a Students table which holds information about the subjects and the activities the students have enrolled in. Subjects_Enrolled and Activity_Enrolled are two attributes that are independent of each other. The given table is in BCNF, but for every value of Student, multiple values of Subjects_Enrolled and multiple values of Activity_Enrolled exist.
As the above conditions show, the relation contains multi-valued dependencies (Student ->> Subjects_Enrolled and Student ->> Activity_Enrolled). In order to normalize the table into 4NF, we need to decompose it into smaller tables.
The Students relation is now decomposed into smaller tables S1 and S2, which are in 4NF:
S1: (Student, Subjects_Enrolled)
S2: (Student, Activity_Enrolled)
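As a hedged sketch (the column types are assumptions), the decomposition can be expressed in SQL, and the original relation can be recovered by joining S1 and S2 on Student:
```sql
-- Illustrative schemas for the 4NF decomposition.
CREATE TABLE S1 (
    Student           VARCHAR(50),
    Subjects_Enrolled VARCHAR(50),
    PRIMARY KEY (Student, Subjects_Enrolled)
);

CREATE TABLE S2 (
    Student           VARCHAR(50),
    Activity_Enrolled VARCHAR(50),
    PRIMARY KEY (Student, Activity_Enrolled)
);

-- The original Students relation is the join of S1 and S2 on Student:
SELECT s1.Student, s1.Subjects_Enrolled, s2.Activity_Enrolled
FROM S1 s1
JOIN S2 s2 ON s1.Student = s2.Student;
```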
Example 2
Here Course is a relation with the dependencies Course ->> Instructor and Course ->> TextBook_Author: a course can have several instructors and several textbook authors. But instructors and textbooks are not related to each other, so the table has multi-valued dependencies and does not satisfy 4NF. To solve this, decomposition of the table is required. The table is split into two tables, where one table preserves Course ->> Instructor and the second table preserves Course ->> TextBook_Author.
Conclusion
By default, a relation in 4NF satisfies all the properties of 1NF, 2NF, 3NF, and BCNF. A table must be in BCNF and have no non-trivial multi-valued dependency to be in 4NF, which ensures data integrity and consistency. This set of rules, which constitutes the basis of normalization, guarantees that update, delete, and insert anomalies do not occur in databases.
Transaction Management:
A transaction is a logical unit of work that must satisfy the ACID properties:
1. Atomicity:
- Ensures that a transaction is treated as a single, indivisible unit of work. Either all
operations within the transaction are committed, or none of them are.
2. Consistency:
- Ensures that a transaction brings the database from one consistent state to another.
The integrity constraints of the database should be preserved.
3. Isolation:
- Ensures that the execution of one transaction is isolated from the execution of other
transactions. Transactions appear to be executed in isolation, even though they may be
executed concurrently.
4. Durability:
- Ensures that once a transaction is committed, its effects on the database are
permanent and survive subsequent failures.
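As a brief sketch of atomicity and durability in practice, consider a hypothetical funds transfer. The accounts table is an assumption, and the transaction syntax varies by DBMS (the `BEGIN TRANSACTION` form used elsewhere in these notes is shown):
```sql
-- Either both updates take effect, or neither does (atomicity).
BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

-- COMMIT makes the changes permanent (durability); on an error,
-- ROLLBACK would undo both updates instead.
COMMIT;
```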
Concurrency Control:
1. Locking:
   - Transactions acquire shared or exclusive locks on data items before accessing them, which prevents conflicting concurrent access.
2. Timestamp Ordering:
   - Assigns a unique timestamp to each transaction and uses these timestamps to order the execution of transactions. Older transactions (those with smaller timestamps) are given priority.
Challenges:
1. Deadlocks:
- Situations where two or more transactions are unable to proceed because each is
waiting for the other to release a lock.
2. Starvation:
- A transaction is unable to proceed because other transactions continually acquire
the required locks.
3. Concurrency Anomalies:
- Undesirable phenomena such as lost updates, uncommitted data, and inconsistent
retrievals that may occur due to concurrent execution.
9. Transaction States
In a database system, a transaction goes through various states during its lifecycle. The
management of these states is essential to ensure the reliability, consistency, and
integrity of the database. Here are the common states of a transaction:
1. Active State:
- The transaction is in the process of being executed. It starts in the active state when
the first operation of the transaction is executed.
2. Partially Committed State:
   - The transaction has executed its final operation, but its changes have not yet been made permanent in the database.
3. Committed State:
   - The transaction has been successfully completed, and all its changes have been permanently saved to the database. It is now in a committed state.
4. Failed State:
- The transaction has encountered an error or exception during its execution, making
it unable to proceed further. It may have partially executed, but it cannot be completed.
5. Aborted State:
- The transaction has encountered an error or has been intentionally rolled back to its
previous state. Any changes made by the transaction are undone, and the database is
restored to its state before the transaction began.
6. Terminated State:
- The transaction has completed its execution, either by committing or aborting. It is
no longer active, and its outcome (commit or abort) is final.
Typical State Transitions:
1. Start:
- The transaction moves from a non-executing state (not started) to the active state
when its first operation is executed.
2. Execution:
- The transaction performs a sequence of read and write operations on the database
while in the active state.
3. End of Execution:
   - After its last operation, the transaction reaches a decision point: it will either be committed or aborted.
4. Partially Committed:
   - If the decision is to commit, the transaction enters the partially committed state. It is here that the changes are about to be made permanent.
5. Committed:
- If the commit operation is successful, the transaction enters the committed state, and
its changes become permanent in the database.
6. Aborted:
- If the decision is to abort or if an error occurs, the transaction enters the aborted
state. Any changes made by the transaction are rolled back, and the database is
restored.
7. Terminated:
- After committing or aborting, the transaction enters the terminated state. It is no
longer active and has completed its lifecycle.
Transaction Management:
Transaction management systems (TMS) within a database system are responsible for
coordinating these state transitions, ensuring the integrity of the database, and handling
exceptional situations. The mechanisms involved in transaction management include
logging, recovery, and concurrency control. These mechanisms collectively contribute to
the ACID properties (Atomicity, Consistency, Isolation, Durability) of transactions in a
database system.
In the context of database systems, transactions and schedules play crucial roles in
ensuring the consistency and integrity of data. Let's explore these concepts:
Transactions:
A transaction is a sequence of one or more operations that are executed as a single unit
of work. Transactions ensure the integrity of the database by following the ACID
properties:
1. Atomicity:
- A transaction is atomic, meaning that either all its operations are executed
successfully, or none of them are. There is no partial completion.
2. Consistency:
- A transaction brings the database from one consistent state to another. The integrity
constraints of the database are preserved.
3. Isolation:
- The execution of one transaction is isolated from the execution of other transactions.
Each transaction appears to be executed in isolation, even though they may be
executed concurrently.
4. Durability:
- Once a transaction is committed, its effects on the database are permanent and
survive subsequent failures.
Schedules:
A schedule is an ordering of the operations of a set of transactions, describing how their reads, writes, and commits are interleaved in time. Schedules are commonly classified as:
1. Serial Schedule:
- Transactions are executed one after the other, with no overlapping in time. While
simple, this approach may not fully utilize the capabilities of a multi-user system.
2. Concurrent Schedule:
- Multiple transactions are executed concurrently, with their operations interleaved in
time. Concurrency allows for better utilization of system resources but introduces
challenges related to data consistency and isolation.
Serializable Schedules:
Serializable schedules are schedules that produce the same results as if the
transactions were executed in some serial order. Ensuring serializability is crucial for
maintaining consistency and avoiding data anomalies. Common techniques for enforcing serializability include:
1. Locking:
- Transactions acquire locks on data items before accessing them. Locks prevent
other transactions from accessing the same data simultaneously, ensuring serialized
access.
2. Timestamp Ordering:
- Transactions are assigned unique timestamps, and the system ensures that
operations are executed in timestamp order. Older transactions are given priority.
Example:
Transaction 1 (T1):
```
1. Read(A)
2. Write(B)
3. Commit
```
Transaction 2 (T2):
```
1. Read(B)
2. Write(A)
3. Commit
```
A schedule that interleaves the operations of T1 and T2 could look like this:
```
Read(A) (T1)
Read(B) (T2)
Write(B) (T1)
Write(A) (T2)
Commit (T1)
Commit (T2)
```
Ensuring that this concurrent schedule is equivalent to some serial order is important for maintaining consistency. For comparison, the corresponding serial schedule looks like this:
```
(T1)        (T2)
Read(A)
Write(B)
Commit
            Read(B)
            Write(A)
            Commit
```
Here, T1 completes its operations before T2 begins, ensuring serializability.
Isolation Levels:
Isolation levels define the degree to which the operations of one transaction are isolated
from the operations of other concurrently executing transactions. The common isolation
levels include:
1. Read Uncommitted:
- Allows transactions to read uncommitted changes made by other transactions.
2. Read Committed:
- Ensures that transactions can only read committed changes made by other
transactions.
3. Repeatable Read:
- Ensures that once a transaction reads a value, it will always see that same value,
even if it is updated by other transactions.
4. Serializable:
- Ensures the highest level of isolation by preventing all types of anomalies, providing
a serializable execution.
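A short sketch of selecting an isolation level follows. The SQL-standard form is shown; support and exact syntax vary by DBMS, and the table name is an assumption:
```sql
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;
-- Under REPEATABLE READ, both reads return the same value even if
-- another transaction updates the row in between.
SELECT balance FROM accounts WHERE account_id = 1;
SELECT balance FROM accounts WHERE account_id = 1;
COMMIT;
```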
Challenges and Concerns:
1. Deadlocks:
- Situations where two or more transactions are blocked, each waiting for the other to
release a lock.
2. Starvation:
- Occurs when a transaction is unable to proceed because other transactions
continually acquire the required locks.
3. Concurrency Anomalies:
- Undesirable phenomena such as lost updates, uncommitted data, and inconsistent
retrievals that may occur due to concurrent execution.
Example:
Consider again transactions T1 and T2 and the interleaved schedule shown earlier, in which the operations of the two transactions overlap.
Managing conflicts and ensuring that the final result is equivalent to some serial order is
crucial.
Conflict Serializability:
A schedule is conflict serializable if there is no cycle in the precedence graph, where the
nodes represent transactions, and directed edges represent conflicts. If a schedule is
conflict serializable, it is guaranteed to be serializable.
Recoverability:
Recoverability concerns whether the database can be restored to a consistent state when transactions fail: no transaction should commit on the basis of data written by a transaction that has not itself committed.
Recoverability Criteria:
1. Cascadeless (Strict) Recoverability:
- A transaction cannot be committed if it has uncommitted dependencies (reads or
writes) on another transaction that was subsequently aborted.
- This prevents the propagation of uncommitted changes.
2. Recoverable Schedule:
   - A schedule is recoverable if, for every pair of transactions T1 and T2, whenever T2 reads a data item written by T1, T1 commits before T2 commits.
Example:
Using the same transactions T1 and T2 as above, consider the schedule:
```
Read(A) (T1)
Read(B) (T2)
Write(B) (T1)
Write(A) (T2)
Commit (T1)
Commit (T2)
```
This schedule is not conflict serializable because it contains a cycle in the precedence
graph (T1 -> T2 -> T1).
For recoverability, we need to ensure that if T2 reads a data item written by T1, then T1 commits before T2 commits. In this schedule, T2 reads B before T1 writes B, so T2 never reads a value written by T1; the schedule is therefore trivially recoverable.
Summary:
- Serializability:
- Ensures that the execution of transactions in a schedule is equivalent to some serial
order.
- Conflict serializability ensures that there are no conflicts between transactions that
form cycles in the precedence graph.
- Recoverability:
- Ensures the ability to recover a database to a consistent state after a failure.
- Cascadeless recoverability prevents the propagation of uncommitted changes.
- Recoverable schedules ensure that transactions reading data written by other
transactions see committed changes.
Ensuring both serializability and recoverability is crucial for maintaining the integrity and
reliability of a database system, especially in the presence of concurrent transactions
and potential failures.
Lock-Based Protocols:
Key Concepts:
1. Lock:
- A lock is a mechanism that restricts access to a resource, such as a data item, to a
single transaction at a time.
- Locks are used to prevent conflicts and maintain the consistency of the database.
2. Lock Types:
- Shared Lock (Read Lock):
- Allows multiple transactions to read a resource concurrently but prevents any
transaction from writing to it.
- Exclusive Lock (Write Lock):
- Grants exclusive access to a single transaction, preventing any other transactions
from reading or writing to the resource.
3. Granularity of Locks:
- Row-Level Locks:
- Lock specific rows in a table.
- Page-Level Locks:
- Lock entire pages of data.
- Table-Level Locks:
- Lock entire tables.
4. Two-Phase Locking (2PL):
- A protocol for ensuring serializability by requiring transactions to acquire all the locks
they need before releasing any locks.
- Consists of a growing phase (acquiring locks) and a shrinking phase (releasing
locks).
5. Deadlocks:
- A deadlock occurs when two or more transactions are blocked, each waiting for a
lock held by the other.
- Detection and resolution mechanisms are needed to address deadlocks.
1. Lock Request:
- A transaction requests a lock before accessing a resource. The request can be for a
shared or exclusive lock.
2. Lock Grant:
- If the requested lock is available, it is granted to the transaction. The transaction can
then access the resource.
3. Lock Release:
- After completing its operations on the resource, the transaction releases the lock,
making the resource available for other transactions.
4. Lock Escalation:
- Some systems support lock escalation, where a low-level lock (e.g., row-level) is
escalated to a higher level (e.g., table-level) to reduce the overhead of managing
numerous locks.
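Steps 1–3 above can be sketched with explicit row-level locking. `SELECT ... FOR UPDATE` is supported by systems such as Oracle, PostgreSQL, and MySQL; the table and column names are assumed for illustration:
```sql
BEGIN TRANSACTION;

-- Lock request and grant: the matching row is locked exclusively,
-- blocking conflicting writers until this transaction ends.
SELECT balance FROM accounts WHERE account_id = 1 FOR UPDATE;

UPDATE accounts SET balance = balance - 50 WHERE account_id = 1;

-- Lock release: committing ends the transaction and frees its locks.
COMMIT;
```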
Challenges and Considerations:
3. Isolation Levels:
- Different isolation levels (e.g., Read Committed, Serializable) impact the types of
locks acquired and the duration of their retention.
4. Lock Timeout:
- Systems may implement lock timeouts to prevent transactions from being blocked
indefinitely.
Timestamp-Based Protocols:
Key Concepts:
1. Timestamp Assignment:
- Each transaction is assigned a unique timestamp based on the order of its arrival or
initiation.
2. Timestamp Ordering:
- Transactions are ordered based on their timestamps. Older transactions are given
priority over newer ones.
3. Concurrency Control:
- The system uses timestamps to control access to data items and determine whether
a transaction's operation is allowed.
- Read Operation:
- A transaction \(T_i\) with timestamp \(TS_i\) can read a data item \(X\) only if the
timestamp of the last transaction that wrote \(X\) is less than \(TS_i\).
- Write Operation:
- A transaction \(T_i\) with timestamp \(TS_i\) can write to a data item \(X\) only if no
transaction with a higher timestamp has read or written to \(X\).
- Example:
- If \(T_1\) with \(TS_1\) writes to \(X\) and a later transaction \(T_2\) with \(TS_2 > TS_1\) reads or writes \(X\), the operations are allowed, because they occur in timestamp order. If, however, \(T_1\) attempted to write \(X\) after \(T_2\) (a younger transaction) had already read or written it, \(T_1\)'s operation would violate timestamp order and \(T_1\) would be rolled back.
- Read Locks:
- A transaction \(T_i\) with timestamp \(TS_i\) can acquire a read lock on a data item
\(X\) if no transaction with a higher timestamp has a write lock on \(X\).
- Write Locks:
- A transaction \(T_i\) with timestamp \(TS_i\) can acquire a write lock on a data item
\(X\) only if no transaction has a read or write lock on \(X\).
- Example:
- If \(T_1\) with \(TS_1\) acquires a write lock on \(X\), \(T_2\) with \(TS_2 > TS_1\)
cannot acquire a read or write lock on \(X\) until \(T_1\) releases its lock.
1. High Concurrency:
- Timestamp-based protocols allow for high concurrency as transactions with non-
conflicting timestamps can proceed simultaneously.
2. Overhead:
- The overhead associated with maintaining timestamps and enforcing the protocol
should be considered.
Structured query language (SQL) is a popular query language that is frequently used in
all types of applications. Data analysts and developers learn and use SQL because it
integrates well with different programming languages. For example, they can embed
SQL queries with the Java programming language to build high-performing data
processing applications with major SQL database systems such as Oracle or MS SQL
Server. SQL is also fairly easy to learn as it uses common English keywords in its
statements.
History of SQL
SQL was invented in the 1970s based on the relational data model. It was initially
known as the structured English query language (SEQUEL). The term was later
shortened to SQL. Oracle, formerly known as Relational Software, became the first
vendor to offer a commercial SQL relational database management system.
SQL table
A SQL table is the basic element of a relational database. The SQL database table
consists of rows and columns. Database engineers create relationships between
multiple database tables to optimize data storage space.
For example, a database engineer might create a SQL table for products in a store, with columns for attributes such as the product name and price.
SQL statements
SQL statements, or SQL queries, are valid instructions that relational database
management systems understand. Software developers build SQL statements by using
different SQL language elements. SQL language elements are components such as
identifiers, variables, and search conditions that form a correct SQL statement.
For example, the following SQL statement uses a SQL INSERT command to
store Mattress Brand A, priced $499, into a table named Mattress_table, with column
names brand_name and cost:
INSERT INTO Mattress_table (brand_name, cost)
VALUES ('A', '499');
Stored procedures
Stored procedures are a collection of one or more SQL statements stored in the
relational database. Software developers use stored procedures to improve efficiency
and performance. For example, they can create a stored procedure for updating sales
tables instead of writing the same SQL statement in different applications.
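A minimal stored-procedure sketch follows, written here in SQL Server (T-SQL) syntax since procedure dialects differ between systems; the table, column, and parameter names are assumptions:
```sql
CREATE PROCEDURE update_sales
    @product_id INT,
    @amount     DECIMAL(10, 2)
AS
BEGIN
    -- Centralizes the update logic so applications call the procedure
    -- instead of repeating the same SQL statement.
    UPDATE sales
    SET total = total + @amount
    WHERE product_id = @product_id;
END;
```
An application would then invoke it with a call such as `EXEC update_sales @product_id = 1, @amount = 499.00;`.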
Parser
The parser starts by tokenizing the SQL statement, replacing some of its words with special symbols. It then checks the statement for the following:
Correctness
The parser verifies that the SQL statement conforms to SQL semantics, or rules, that
ensure the correctness of the query statement. For example, the parser checks if the
SQL command ends with a semi-colon. If the semi-colon is missing, the parser returns
an error.
Authorization
The parser also validates that the user running the query has the necessary
authorization to manipulate the respective data. For example, only admin users might
have the right to delete data.
Relational engine
The relational engine, or query processor, creates a plan for retrieving, writing, or
updating the corresponding data in the most effective manner. For example, it checks
for similar queries, reuses previous data manipulation methods, or creates a new one. It
writes the plan in an intermediate-level representation of the SQL statement called byte
code. Relational databases use byte code to efficiently perform database searches and
modifications.
Storage engine
The storage engine, or database engine, is the software component that processes the
byte code and runs the intended SQL statement. It reads and stores the data in the
database files on physical disk storage. Upon completion, the storage engine returns
the result to the requesting application.
Structured query language (SQL) commands are specific keywords or SQL statements
that developers use to manipulate the data stored in a relational database. You can
categorize SQL commands as follows.
Data definition language
Data definition language (DDL) refers to SQL commands that design the database
structure. Database engineers use DDL to create and modify database objects based
on the business requirements. For example, the database engineer uses the CREATE
command to create database objects such as tables, views, and indexes.
Data query language
Data query language (DQL) consists of instructions for retrieving data stored in
relational databases. Software applications use the SELECT command to filter and
return specific results from a SQL table.
Data manipulation language
Data manipulation language (DML) statements write new information or modify existing
records in a relational database. For example, an application uses the INSERT
command to store a new record in the database.
Data control language
Data control language (DCL) refers to commands, such as GRANT and REVOKE, that database administrators use to control access to the data in the database.
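One illustrative command from each category (the table, column, and user names are assumed for the example):
```sql
CREATE TABLE products (id INT PRIMARY KEY, name VARCHAR(50)); -- DDL
SELECT name FROM products WHERE id = 1;                       -- DQL
INSERT INTO products (id, name) VALUES (1, 'Mattress A');     -- DML
GRANT SELECT ON products TO analyst;                          -- DCL
```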
Interactive SQL
Interactive SQL refers to issuing SQL statements directly against a database, typically through a command-line or graphical interface, and receiving the results immediately.
Key Features:
1. Query Execution:
- Users can write and execute SQL queries interactively to retrieve, update, or
manipulate data in the database.
2. Data Modification:
- Interactive SQL enables users to perform data modification operations such as
inserting new records, updating existing records, and deleting records from database
tables.
4. Transaction Control:
- Interactive SQL provides commands for managing transactions, allowing users to
commit changes or roll back to a previous state.
5. User and Privilege Management:
- Database administrators can use interactive SQL to create and manage user
accounts, assign privileges, and control access to database objects.
6. Error Handling:
- The system provides feedback and error messages to users, helping them identify
and address issues in their SQL commands.
Examples:
Basic Query:
```sql
SELECT column1, column2 FROM table WHERE condition;
```
Data Modification:
```sql
INSERT INTO table (column1, column2) VALUES (value1, value2);
UPDATE table SET column1 = new_value WHERE condition;
DELETE FROM table WHERE condition;
```
Transaction Control:
```sql
-- Begin a transaction, make changes, then commit (or roll back)
BEGIN TRANSACTION;
UPDATE table SET column1 = new_value WHERE condition;
COMMIT;
```
Access Methods:
1. Command-Line Interfaces:
   - Most databases ship with a command-line tool (for example, SQL*Plus for Oracle or psql for PostgreSQL) where users type SQL statements at a prompt.
2. Graphical Tools:
   - Desktop database tools provide visual editors for composing and running SQL queries.
3. Web-Based Interfaces:
   - Some databases provide web-based interfaces that allow users to interact with the database through a web browser.
Considerations:
1. Security:
- Access to interactive SQL interfaces should be secured, and users should be
granted appropriate privileges to ensure data security and integrity.
2. Performance:
- Users should consider the performance impact of their queries, especially in
production environments. Techniques like indexing and query optimization can be
applied to enhance performance.
Interactive SQL is a fundamental tool for managing and querying relational databases.
Whether using a command-line interface or a graphical tool, users can leverage SQL to
interact with the database, retrieve information, and perform various database
operations. It serves as a versatile and powerful means of communication with relational
databases.
3. Computations on table data
Computations on table data in the context of databases involve performing calculations,
aggregations, and transformations on the data stored in tables. This is commonly done
using SQL (Structured Query Language) and various functions to analyze, manipulate,
and derive new information from the existing data. Here are some common types of
computations on table data:
1. Aggregations:
- SUM:
```sql
SELECT SUM(column) FROM table;
```
- AVG (Average):
```sql
SELECT AVG(column) FROM table;
```
- COUNT:
```sql
SELECT COUNT(column) FROM table;
```
- MIN and MAX:
```sql
SELECT MIN(column), MAX(column) FROM table;
```
3. Mathematical Expressions:
- Performing mathematical calculations on columns.
```sql
SELECT column1 + column2 AS sum_result FROM table;
```
4. String Manipulations:
- Performing operations on string data.
```sql
SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM employees;
```
6. CASE Statements:
- Conditional computations.
```sql
SELECT
column1,
CASE
WHEN column2 > 10 THEN 'High'
WHEN column2 > 5 THEN 'Medium'
ELSE 'Low'
END AS priority
FROM table;
```
7. Window Functions:
- Performing computations across a "window" of rows related to the current row.
```sql
SELECT
product_name,
sales,
AVG(sales) OVER (PARTITION BY category) AS avg_category_sales
FROM products;
```
9. Subqueries:
- Using subqueries to perform computations.
```sql
SELECT
employee_id,
salary,
(SELECT AVG(salary) FROM employees) AS avg_salary
FROM employees;
```
Considerations:
- Performance:
- Efficiently compute and aggregate data, especially in large datasets, by optimizing
queries and indexing columns.
- Data Integrity:
- Ensure that computations align with the data types and constraints of the columns to
maintain data integrity.
- Testing:
- Test computations and aggregations on sample data before applying them to the
entire dataset, especially in production environments.
- Documentation:
- Document the logic of complex computations and transformations for future reference
and collaboration.
Computations on table data are fundamental for deriving insights, generating reports,
and supporting decision-making processes in various applications. SQL provides a
robust set of features to perform a wide range of computations on table data efficiently
and effectively.
Date Functions:
Oracle SQL provides a set of built-in functions for working with dates and timestamps:
1. SYSDATE:
- Returns the current date and time of the database server.
```sql
SELECT SYSDATE FROM dual;
```
2. TO_DATE:
- Converts a character string to a date format.
```sql
SELECT TO_DATE('2023-11-05', 'YYYY-MM-DD') FROM dual;
```
3. TO_CHAR:
- Converts a date or timestamp to a character string with a specified format.
```sql
SELECT TO_CHAR(SYSDATE, 'DD-MON-YYYY HH24:MI:SS') FROM dual;
```
4. ADD_MONTHS:
- Adds a specified number of months to a date.
```sql
SELECT ADD_MONTHS(SYSDATE, 3) FROM dual;
```
5. MONTHS_BETWEEN:
- Returns the number of months between two dates.
```sql
SELECT MONTHS_BETWEEN(TO_DATE('2023-12-01', 'YYYY-MM-DD'), SYSDATE)
FROM dual;
```
6. LAST_DAY:
- Returns the last day of the month for a given date.
```sql
SELECT LAST_DAY(SYSDATE) FROM dual;
```
7. TRUNC:
- Truncates a date or timestamp to a specified unit (e.g., day, month, year).
```sql
SELECT TRUNC(SYSDATE, 'MONTH') FROM dual;
```
8. ROUND:
- Rounds a date or timestamp to the nearest unit.
```sql
SELECT ROUND(SYSDATE, 'MONTH') FROM dual;
```
9. EXTRACT:
- Extracts a specific component (e.g., year, month, day) from a date or timestamp.
```sql
SELECT EXTRACT(YEAR FROM SYSDATE) FROM dual;
```
10. NEXT_DAY:
- Returns the date of the first occurrence of the specified weekday after a given date.
```sql
SELECT NEXT_DAY(SYSDATE, 'MONDAY') FROM dual;
```
11. TZ_OFFSET:
- Returns the time zone offset (from UTC) for a given time zone name.
```sql
SELECT TZ_OFFSET('US/Eastern') FROM dual;
```
Example Usage:
```sql
-- Calculate the approximate age of employees in years
-- (subtracting year components ignores the month and day)
SELECT
employee_name,
EXTRACT(YEAR FROM SYSDATE) - EXTRACT(YEAR FROM birth_date) AS age
FROM employees;
```
Subqueries:
A subquery, also known as a nested query or inner query, is a query embedded within
another query. Subqueries can be used in various parts of a SQL statement, including
the SELECT, FROM, WHERE, and HAVING clauses. The result of a subquery can be
used as a value in the outer query.
```sql
-- Find employees whose salary is above the average salary in the department
SELECT employee_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees WHERE department_id = 1);
```
```sql
-- Retrieve each salesperson's total sales and their percentage
-- contribution to overall sales
SELECT salesperson_id,
       total_sales,
       total_sales / SUM(total_sales) OVER () * 100 AS percentage_contribution
FROM (
    SELECT salesperson_id, SUM(sales_amount) AS total_sales
    FROM sales
    GROUP BY salesperson_id
) sales_summary;
```
Joins:
Joins are used to combine rows from two or more tables based on a related column
between them. The common columns used for joining tables are typically primary and
foreign keys. There are several types of joins, including INNER JOIN, LEFT JOIN (or
LEFT OUTER JOIN), RIGHT JOIN (or RIGHT OUTER JOIN), and FULL JOIN (or FULL
OUTER JOIN).
```sql
-- Retrieve information about employees and their departments
SELECT employees.employee_id, employee_name, department_name
FROM employees
INNER JOIN departments ON employees.department_id = departments.department_id;
```
You can also use subqueries and joins together to create more complex queries,
combining the strengths of both.
```sql
-- Retrieve employees who have sales orders
SELECT employee_name
FROM employees
WHERE employee_id IN (SELECT DISTINCT employee_id FROM sales);
```
```sql
-- Retrieve departments with the total number of employees and their average salary
SELECT departments.department_id, department_name, COUNT(employee_id) AS
num_employees, AVG(salary) AS avg_salary
FROM departments
LEFT JOIN employees ON departments.department_id = employees.department_id
GROUP BY departments.department_id, department_name;
```
Understanding how to use subqueries and joins effectively is essential for crafting
powerful and efficient SQL queries when working with relational databases. These
techniques enable you to retrieve and analyze data from multiple tables, providing
flexibility in handling complex data relationships.
Query Optimization:
1. Use Indexing:
- Indexes speed up data retrieval operations by providing a quick path to relevant
data.
- Identify columns used in WHERE clauses, JOIN conditions, and ORDER BY clauses
for indexing.
- Regularly analyze and rebuild indexes to maintain optimal performance.
2. Optimize Queries:
- Write efficient queries by avoiding unnecessary joins, reducing the number of
columns retrieved, and optimizing WHERE clauses.
- Use appropriate aggregate functions and grouping to minimize data processing.
6. Partitioning Tables:
- Partition large tables to distribute data across multiple filegroups, making queries on
subsets of data more efficient.
- Partitioning is particularly useful for historical or time-series data.
7. Update Statistics:
- Regularly update database statistics to help the query optimizer generate efficient
execution plans.
- Outdated statistics can lead to suboptimal query performance.
9. Avoid Cursors:
- Cursors can be resource-intensive and slow. Where possible, use set-based
operations instead of row-by-row processing.
- Consider using alternatives like temp tables or table variables.
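To see how the optimizer will actually execute a query, most systems can display the execution plan. A sketch in Oracle syntax (in keeping with the Oracle functions used elsewhere in these notes; the query itself is an assumed example):
```sql
-- Generate and then display the execution plan for a query.
EXPLAIN PLAN FOR
SELECT employee_name FROM employees WHERE department_id = 1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```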
7. Indexes
Indexes play a crucial role in database management systems (DBMS) for improving the
speed of data retrieval operations. An index is a data structure associated with a table
that provides a quick reference to the rows in the table, based on the values of one or
more columns. Here are key points about indexes:
1. Purpose of Indexes:
- Faster Data Retrieval: Indexes speed up the retrieval of rows from a table by
allowing the DBMS to locate the desired rows more quickly.
- Efficient Query Processing: Indexes significantly improve the performance of
SELECT, JOIN, and WHERE clause operations.
2. Types of Indexes:
- Single-Column Index: Created on a single column.
- Composite Index: Created on multiple columns.
- Unique Index: Ensures that the indexed columns have unique values.
- Clustered Index: Determines the physical order of data rows in a table.
- Non-clustered Index: Stores a separate data structure from the actual table.
3. Creating Indexes:
- Syntax:
```sql
CREATE INDEX index_name
ON table_name (column1, column2, ...);
```
- Indexes can be created when defining a table or added later using the `CREATE
INDEX` statement.
7. Dropping Indexes:
- Unused or redundant indexes should be dropped to save storage space and reduce
the overhead of maintenance operations.
- Syntax:
```sql
DROP INDEX index_name;
```
Indexes are a vital component of database optimization, and their effective use can lead
to significant improvements in query performance. However, it's crucial to strike a
balance between creating indexes for performance gains and considering the impact on
data modification operations and storage requirements. Regular monitoring, analysis,
and maintenance are key to successful index management.
6. Views, Sequences
A view in a database is a virtual table that is based on the result of a SELECT query. It
does not store the data itself but provides a way to represent the data from one or more
tables in a structured manner. Views offer several advantages, including simplified
query complexity, enhanced security, and abstraction of underlying table structures.
Here are key points about views:
1. Creating a View:
```sql
CREATE VIEW view_name AS
SELECT column1, column2
FROM table_name
WHERE condition;
```
2. Updating a View:
A view's definition can be replaced using the `CREATE OR REPLACE VIEW` statement.
```sql
CREATE OR REPLACE VIEW view_name AS
SELECT new_columns
FROM new_table
WHERE new_condition;
```
3. Dropping a View:
```sql
DROP VIEW view_name;
```
4. Complex Views:
Views can involve multiple tables and include joins, aggregations, and other complex
queries.
5. Security:
- Views can be used to restrict access to specific columns or rows, providing a
security layer.
- Users can be granted permissions on views without giving direct access to the
underlying tables.
6. Materialized Views:
- A materialized view is a type of view that stores the result set physically, allowing for
faster retrieval.
- Materialized views need to be refreshed periodically to synchronize with the
underlying data.
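A hedged sketch of a materialized view in Oracle-style syntax (the view name, query, and refresh policy are illustrative assumptions):
```sql
-- Physically stores the aggregated result; refreshed on demand.
CREATE MATERIALIZED VIEW dept_salary_mv
REFRESH COMPLETE ON DEMAND
AS
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id;
```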
Sequences:
A sequence is a database object that generates an ordered series of numeric values, most often used to produce unique identifiers.
1. Creating a Sequence:
```sql
CREATE SEQUENCE sequence_name
START WITH 1
INCREMENT BY 1
MAXVALUE 1000
CYCLE;
```
2. Generating Values:
- To get the next value from a sequence:
```sql
SELECT sequence_name.NEXTVAL FROM dual;
```
4. Increment By:
- The `INCREMENT BY` clause defines the step size for the sequence.
6. Altering a Sequence:
- Sequences can be altered to modify parameters like increment, minvalue, maxvalue,
etc.
```sql
ALTER SEQUENCE sequence_name
INCREMENT BY 2
MAXVALUE 2000;
```
7. Dropping a Sequence:
```sql
DROP SEQUENCE sequence_name;
```
8. Usage in Tables:
- Sequences are commonly used to generate unique values for primary key columns
in tables.
```sql
CREATE TABLE example_table (
id NUMBER DEFAULT sequence_name.NEXTVAL PRIMARY KEY,
name VARCHAR2(50)
);
```
Sequences and views are powerful features in a relational database that enhance data
modeling, security, and data generation capabilities. Sequences are particularly useful
for generating unique identifiers, while views provide a flexible way to present and
manipulate data without directly altering underlying tables.
6. Data control permissions
Data control permissions, also known as data access control or data authorization, refer
to the mechanisms and policies put in place to regulate and restrict access to data
within a database system. These controls are crucial for maintaining data security,
ensuring that only authorized users can view, modify, or delete specific pieces of
information. Here are key concepts and mechanisms related to data control
permissions:
2. Privileges:
   - SELECT, INSERT, UPDATE, DELETE: Common privileges that regulate basic CRUD operations.
   - EXECUTE: Permission to execute stored procedures or functions.
   - ALL PRIVILEGES: Grants all available permissions on a specific object.
   (A sketch of granting and revoking these privileges appears after this list.)
6. Column-Level Security:
- Restricts access to specific columns within a table.
- Users are granted permission to access certain columns while being restricted from
others.
9. Triggers:
- Triggers can be used to automatically enforce data access policies when certain
events occur (e.g., before or after INSERT, UPDATE, DELETE).
- Custom logic can be executed to check and enforce access conditions.
10. Auditing:
- Auditing tools and mechanisms can be employed to track and log data access
activities.
- Audit logs can be reviewed to identify unauthorized access attempts or data
breaches.
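As referenced under Privileges above, here is a minimal sketch of granting and revoking permissions (the user, role, procedure, and table names are assumptions):
```sql
-- Allow a user to read and insert rows in a table.
GRANT SELECT, INSERT ON employees TO hr_user;

-- Allow a role to execute a stored procedure.
GRANT EXECUTE ON update_sales TO app_role;

-- Withdraw a previously granted privilege.
REVOKE INSERT ON employees FROM hr_user;
```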
Data control permissions are a critical aspect of database security and are essential for
protecting sensitive information, ensuring data integrity, and complying with regulatory
requirements. Implementing a well-designed data control strategy involves a
combination of role-based access controls, fine-grained access policies, and auditing
mechanisms to provide a robust defense against unauthorized access and data
breaches.