0% found this document useful (0 votes)
10 views33 pages

Dbms - Unit - 02

The Relational Database Model organizes data into tables with rows and columns, where each row represents an entity and each column represents an attribute. Key concepts include attributes, relation schemas, tuples, and various types of keys such as primary and foreign keys, which help maintain data integrity and relationships. The model supports operations that ensure data consistency and integrity, while also having advantages and disadvantages in terms of performance and complexity.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
10 views33 pages

Dbms - Unit - 02

The Relational Database Model organizes data into tables with rows and columns, where each row represents an entity and each column represents an attribute. Key concepts include attributes, relation schemas, tuples, and various types of keys such as primary and foreign keys, which help maintain data integrity and relationships. The model supports operations that ensure data consistency and integrity, while also having advantages and disadvantages in terms of performance and complexity.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 33

RELATIONAL DATABASE MODEL

The Relational Model organizes data using tables (relations) consisting of rows and
columns. Each column represents a specific attribute with a unique name, while each row
holds data about a real-world entity or relationship. As a record-based model, it stores data
in fixed-format records with defined attributes.
Introduced by E.F. Codd, the relational model transforms conceptual designs from
ER diagrams into implementable structures. These structures are used in relational database
systems like Oracle SQL and MySQL.
 What is the Relational Model?
The relational model represents how data is stored and managed in
Relational Databases. Data is organized into tables, each known as a relation, consisting
of rows (tuples) and columns (attributes). Each row represents an entity or record, and
each column represents a particular attribute of that entity. A relational
database consists of a collection of tables each of which is assigned a unique name.
consider a relation STUDENT with attributes
ROLL_NO, NAME, ADDRESS ADDRESS, PHONE, and AGE shown in the table.

 Key Terms in the Relational Model


1. Attribute: Attributes are the properties that define an entity.
 Example: ROLL_NO, NAME, ADDRESS etc.
2. Relation Schema: A relation schema defines the structure of the relation and
represents the name of the relation with its attributes.
 Example: STUDENT (ROLL_NO, NAME, ADDRESS, PHONE, and
AGE) is the relation schema for STUDENT. If a schema has more than 1
relation it is called Relational Schema.
3. Tuple: A Tuple represents a row in a relation. Each tuple contains a set of
attribute values that describe a particular entity.
 Example: (1, RAM, DELHI, 9455123451, 18) is a tuple in
the STUDENT table.
4. Relation Instance: The set of tuples of a relation at a particular instance of time
is called a relation instance. It can change whenever there is an insertion, deletion or update
in the database.
5. Degree: The number of attributes in the relation is known as the degree of the
relation.
 Example: The STUDENT relation has a degree of 5, as it has 5 attributes.
6. Cardinality: The number of tuples in a relation is known as cardinality.
 Example: The STUDENT relation defined above has cardinality 4.
7. Column: The column represents the set of values for a particular attribute.
 Example: The column ROLL_NO is extracted from the relation
STUDENT.
8. NULL Values: The value which is not known or unavailable is called a NULL
value. It is represented by NULL.
 Example: PHONE of STUDENT having ROLL_NO 4 is NULL.

 Types of Keys in the Relational Model


1. Primary Key:
A Primary Key uniquely identifies each tuple in a relation. It must contain
unique values and cannot have NULL values. Example: ROLL_NO in
the STUDENT table is the primary key.
2. Candidate Key
A Candidate Key is a set of attributes that can uniquely identify a tuple in a
relation. There can be multiple candidate keys, and one of them is chosen as the
primary key.
3. Super Key
A Super Key is a set of attributes that can uniquely identify a tuple. It may
contain extra attributes that are not necessary for uniqueness.
4. Foreign Key
A Foreign Key is an attribute in one relation that refers to the primary key of
another relation. It establishes relationships between
tables. Example: BRANCH_CODE in the STUDENT table is a foreign key that
refers to the primary key BRANCH_CODE in the BRANCH table.
5. Composite Key
A Composite Key is formed by combining two or more attributes to uniquely
identify a tuple. Example: A combination
of FIRST_NAME and LAST_NAME could be a composite key if no one in the
database shares the same full name.

 Relational Model Notation


 Relation schema R of degree n is denoted by by R(A1, A2, ...,An).
 Uppercase letters Q, R, S denote relation names.

2
 Lowercase letters q, r, s denote relation states.
 Letters t, u, v denote tuples.
 In general, the name of a relation schema such as STUDENT also indicates
the current set of tuples in that relation.
 An attribute A can be qualified with the relation name R to which it
belongs by using the dot notation R.A for example, STUDENT.Name or
STUDENT.Age.
 An n-tuple t in a relation r(R) is represented as t=<v1,v2,...,vn> where vi is
the value corresponding to the attribute Ai. The value vi for attribute Ai in
tuple t can be accessed using t[Ai] or t.Ai.

 Characteristics of the Relational Model


1. Data Representation: Data is organized in tables (relations), with rows (tuples)
representing records and columns (attributes) representing data fields.
2. Atomic Values: Each attribute in a table contains atomic values, meaning no multi-
valued or nested data is allowed in a single cell.
3. Unique Keys: Every table has a primary key to uniquely identify each record,
ensuring no duplicate rows.
4. Attribute Domain: Each attribute has a defined domain, specifying the valid data
types and constraints for the values it can hold.
5. Tuples as Rows: Rows in a table, called tuples, represent individual records or
instances of real-world entities or relationships.
6. Relation Schema: A table’s structure is defined by its schema, which specifies the
table name, attributes, and their domains.
7. Data Independence: The model ensures logical and physical data independence,
allowing changes in the database schema without affecting the application layer.
8. Integrity Constraints: The model enforces rules like:
9. Domain constraints: Attribute values must match the specified domain.
10. Entity integrity: No primary key can have NULL values.
11. Referential integrity: Foreign keys must match primary keys in the referenced table
or be NULL.
12. Relational Operations: Supports operations like selection, projection, join, union,
and intersection, enabling powerful data retrieval manipulation.
13. Data Consistency: Ensures data consistency through constraints, reducing
redundancy and anomalies.
14. Set-Based Representation: Tables in the relational model are treated as sets, and
operations follow mathematical set theory principles

 Constraints in Relational Model


While designing the Relational Model, we define some conditions which must
hold for data present in the database are called Constraints. These constraints are
checked before performing any operation (insertion, deletion, and updation) in the
database. If there is a violation of any of the constraints, the operation will fail.

3
1. Domain Constraints
Domain Constraints ensure that the value of each attribute A in a tuple must be
an atomic value derived from its specified domain, dom(A). Domains are defined by the
data types associated with the attributes. Common data types include:
 Numeric types: Includes integers (short, regular, and long) for whole
numbers and real numbers (float, double-precision) for decimal values,
allowing precise calculations.
 Character types: Consists of fixed-length (CHAR) and variable-
length (VARCHAR, TEXT) strings for storing text data of various sizes.
 Boolean values: Stores true or false values, often used for flags or
conditional checks in databases.
 Specialized types: Includes types
for date (DATE), time (TIME), timestamp (TIMESTAMP),
and money (MONEY), used for precise handling of time-related and
financial data.
2. Key Integrity
Every relation in the database should have at least one set of attributes that
defines a tuple uniquely. Those set of attributes is called keys. e.g.; ROLL_NO in
STUDENT is key. No two students can have the same roll number. So a key has two
properties:
 It should be unique for all tuples.
 It can’t have NULL values.
3. Referential Integrity Constraints
When one attribute of a relation can only take values from another attribute of the
same relation or any other relation, it is called referential integrity. Let us suppose we have
2 relations

4
Explanation: BRANCH_CODE of STUDENT can only take the values which are present
in BRANCH_CODE of BRANCH which is called referential integrity constraint. The
relation which is referencing another relation is called REFERENCING RELATION
(STUDENT in this case) and the relation to which other relations refer is called
REFERENCED RELATION (BRANCH in this case).

 Advantages
antages of the Relational Model
1. Simple model: Relational Model is simple and easy to use in comparison to
other languages.
2. Flexible: Relational Model is more flexible than any other relational model
present.
3. Secure: Relational Model is more secure than any other relational model.
4. Data Accuracy: Data is more accurate in the relational data model.
5. Data Integrity: The integrity of the data is maintained in the relational model.
6. Operations can be Applied Easily: It is better to perform operations in the
relational model.
 Disadvantages of the Relational Model
1. Performance: The relational model can experience performance issues with
very large databases.
2. Complexity for Complex Data:Data The model struggles with hierarchical or
complex data relationships, which might be better handled with other models
like the Graph or Document model.
model
3. Normalization Overhead
Overhead: Extensive use of normalization can result in
complex queries and slower performance.

LOGICAL VIEW OF DATABASE

A Logical Database is a special type of ABAP (Advance Business Application and


Programming) that is used to retrieve data from various tables and the data is interrelated to
each other. Also, a logical database provides a read
read-only view of Data.
 Structure Of Logical Database: A Logical database uses only a hierarchical
ierarchical
structure of tables i.e. Data is organized in a Tree-like
Tree like Structure and the data is stored
as records that are connected to each other through edges (Links). Logical Database
contains Open SQL statements which are used to read data from the database. The logical
database reads the program, stores them in the program if re required,
quired, and passes them
line by line to the application program.

 Features of Logical Database:


In this section, let us look at some features of a logical database:
 We can select only that type of Data that we need.
 Data Authentication is done in order tto maintain security.
 Logical Database uses hierarchical Structure due to this data integrity is
maintained.

 Goal Of Logical Database:


The goal of Logical Database is to create well
well-structured
structured tables that reflect the
need of the user. The tables of the Logical database store data in a non
non-redundant
manner and foreign keys will be used in tables so that relationships among tables and
entities will be supported.

 Tasks Of Logical Database:


Below is some important task of Logical Database:
 With thee help of the Logical database, we will read the same data from
multiple programs.
 A logical database defines the same user interface for multiple programs.
 Logical Database ensures the Authorization checks for the centralized
sensitive database.
 With the help of a Logical Database, Performance is improved. Like in
Logical Database we will use joins instead of multiple SELECT statements,
which will improve response time and this will increase the Performance of
Logical Database.

 Data View Of Logical Database:


Databa
Logical Database provides a particular view of Logical Database tables. A
logical database is appropriately used when the structure of the Database is Large. It
is convenient to use flow i.e
 SELECT
 READ
 PROCESS
 DISPLAY
In order to work with databases
atabases efficiently. The data of the Logical Database is hierarchical
in nature. The tables are linked to each other in a Foreign Key
relationship.Diagrammatically, the Data View of Logical Database is shown as:

Example:
Suppose in a Universityy or College, a HOD wants to get information about a specific
student. So for that, he firstly retrieves the data about its batch and Branch from a large
amount of Data, and he will easily get information about the required Student but didn't
alter the information about it.
 Advantages Of Logical Database:
Let us look at some advantages of the logical database:
 In a Logical database, we can select meaningful data from a large amount of data.
 Logical Database consists of Central Authorization which checks for f Database
Accesses is Authenticated or not.
 In this Coding, the part is less required to retrieve data from the database as
compared to Other Databases.
 Access performance of reading data from the hierarchical structure of the
Database is good.
 Easy to understand
nderstand user interfaces.
 Logical Database firstly check functions which further check that user input is
complete, correct, and plausible.

 Disadvantages Of Logical Database:


This section shows the disadvantages of the logical database:
 Logical Database takes more time when the required data is at the last because if
that table which is required at the lowest level then firstly all upper-level
upper tables
should be read which takes more time and this slows down the performance.
 In Logical Database ENDGET command
command doesn't exist due to this the code block
associated with an event ends with the next event statement.

INTEGRITY RULES

 What are Integrity Rules?


Integrity rules normally represent the pre-existing conditions, or constraints, of
data stored in the database that will ensure the data is valid and consistent. These rules,
defining permissible values, relationships, and data operations enclosed in the database
system, ensure the accuracy and reliability of the data to be used for its purposes. Rules
for integrity are considered barriers towards the possibility of some operations over the
database, such as deletion, modification, or insertion of data whose accuracy is doubtful
or inconsistent, thus enabling the database to be of high quality.

 Types of Integrity Rules


 Entity Integrity: Also called business key integrity, this rule proclaims that each
row in a table must contain some unique data, which is known as the business
key. Through this verification, a primary key will be ensured of non-duplication,
which also implies that no null values are allowed in the primary key column,
thus uniquely identifying each instance of an entity in the table.
 Referential Integrity: Referential integrity guarantees the consistent pass of key
reference constraints between data through main table and foreign keys
keys. This rule
means that for every fore
foreign key value in a child table, that value has to possess
the same primary key value in the parent table. This is the only way to maintain
foreign keys and make father child relationships work properly.
 Domain Integrity: Verifying domain validity means finding out if the value lies
within the allowed range for a particular attribute's column in the data table. It
implements data integrity by specifying data formats, ranges, and constraints, for
example, check constraints and limitations, that ensure that all officially
recognized data are kept.
 User-defined
defined Integrity: This is an enterprise-specific set of integrity rules that
would be established by different organizations depending on their business edges
and the rules ascribed to sensitive data. These rules can be the user-defined ones
like business rules or validation criteria or can even be the custom constraints that
are different from organization to organization.

 Implementation of Integrity Rules


Integrity rules are typically implemented through a combination of mechanisms
within the DBMS, including:
 Constraints: Constraints including a primary key constraint, a foreign key
constraint, a unique constraint, and a check constraint are to maintain the integrity
rules at the schema level.
 Triggers: Triggers are database elements that are activated on the fly to conduct
given activities (executed on insert, update, or delete operations, for example) on
specific database tables. Invoking triggers provides leverage to impose tougher
restrictions, thus enabling the enforcement of more advanced integrity rules or
conducting validation operations, which are a more complex kind of comparison
beyond checks of constraints.
 Stored Procedures: Stored procedures are precompiled sets of SQL operations that
specify business logic and data processing operations, and they are stored in a
database. They may need to be applied to the forces of rule-following and data
consistency tasks during the events of transactions.

 Significance of Integrity Rules


The rules of integrity, including the truth, credibility, and consistency, is one of
the key elements in ensuring proper functioning of the data in database. Their
significance extends to various aspects of database management, including:Their
significance extends to various aspects of database management, including:
 Data Quality: Integrity rules make sure data is precisely moderated, elaborated,
and of high- quality in the database, just like the data quality and reliability.
 Data Security: Integrity rules classify together while limiting and validating
requests. They are effective in defending against unauthorized access, data
breaches, and data manipulation attacks.
 Data Consistency: Ethical parameters help create a homogeneous database,
preventing data inconsistencies, anomalies, and errors that could arise from
invalid and incorrect data keystrokes, hence allowing for data stemming from
valid data sources that are consistently coherent and reliable both qualitatively
and quantitatively.
 Compliance: Extent rules make sure all regulations, industry standards, and
company policies governing data security, privacy, and management are complied
with.

RELATIONAL SET OPERATORS

 Relational Set Operators uses relational algebra to manipulate contents in a database.


All together there are eight different types of operators. These operators are SQL
commands.
 SELECT is the command to show all rows in a table. It can be used to select only
specific data from the table that meets certain criteria. This command is also referred
to as the Restrict command.
 UNION. It combines all of the rows in one table with all of the rows in another table
except for the duplicate tuples. The tables are required to have the same attribute
characteristics for the Union command to work. The tables must be union-compatible
which means that two tables being used have the same amount of columns and the
columns have the same names, and also need to share the same domain.

10
 INTERSECT is the second SQL command that takes two tables and combines only
the rows that appear in both
oth tables. The tables must be union-compatible
union compatible to be able to
use the Intersect command or else it won't work.
 DIFFERENCE in another SQL command that gets all rows in one table that are not
found in the other table. Basically it subracts one table from the other table to leave
only the attributes that are not the same in both tables. For this command to work both
tables must be union-compatible.
compatible.
 PRODUCT command would show all possible pairs of rows from both tables being
used. This command can also be refer
referred
red to as the Cartesian Product.
 PROJECT is the command that gives all values for certian attributes specified after
the command. It shows a vertical view of the given table.
 JOIN takes two or more tables and combines them into one table. This can be used in
combination with other commands to get specific information. There are several types
of the Join command. The Natural Join, Equijion, Theta Join etc.

DATA DICTIONARY AND THE SYSTEM CATLOG

 What is a Data Dictionary?


The data dictionary consists of two words, data, which represents data collected from
several sources, and dictionary, which represents where this data is available. The data
dictionary is an important part of the relational database because it provides additional
information about the relationship
ionship between several tables in the database. A data dictionary
in a DBMS helps users manage data in an orderly and orderly manner, thereby
preventing data redundancy.
Below is a data dictionary that describes the table that contains employee details.

Some of the advantages of using a data dictionary are:


 In DBMS, the data model proviprovides very little information about the database, so the
data dictionary is very important to get the right knowledge about the entities,
relationships, and attributes that exist in the data model.
 The data dictionary provides consistency by reducing data redundancydundancy in data
collection and use among different team members.
 Data dictionaries provide structured analysis and design tools by implementing data
standards. Data standards are sets of rules that govern the collection, recording, and
presentation of data.
 Using a data dictionary helps define the naming convention used in the model.
 Types of Data Dictionary in DBMS

There are basically two types of data dictionaries in a database management system:
 Integrated Data Dictionary
 Stand Alone Data Dictionary

 Integrated Data Dictionary


Every relational database has an Integrated Data Dictionary available in the
DBMS. This integrated data dictionary acts as a system directory that is accessed and
updated by the relational database. The old database does not have an integrated data
dictionary, so the database administrator must use the Stand Alone Data Dictionary. An
Integrated Data Dictionary in a DBMS can link metadata.
The integrated data dictionary can be further divided into two types:
1. Active: When any change
changes are made to the database, the active data dictionary
is automatically updated by the DBMS. It is also known as a self-updating
dictionary because it continuously updates its data.
2. Passive: Unlike active dictionaries, passive dictionaries must be updated
manually when there are changes in the database. This type of data dictionary
is difficult to manage because it requires proper functionality. Else, the
database and data dictionary will be synchronized.
3. Stand Alone Data Dictionary
This type of database in the DBMS is very adaptive because it grants the
administrator in charge of the confidential information complete autonomy to define and
manage all crucial data. Whether the information is printed or not has nothing to do with it.
A data dictionary that has a stand-alone
stand format enables database designers to have the
flexibility to communicate with end users regardless of their data dictionaries format.
There is no standard format for data dictionaries. Here are some common elements:
 Data Elements: The data dictionary describes each data element by specifying
the names, data types, storage formats and validation rules.
 Table: All information about the table, such as the user who created the table, the
number of rows and columns, the date the table was created and entered, etc.
 Indexes: Indexes for database tables are stored in the data dictionary. The DBMS
stores the index name used and index attributes, locations, and properties, as well
as the creation date, in each index.
 Programs: Applications defined for database access, reports, application formats
and screens, SQL queries, etc. also stored in the data dictionary.
 Relationships between data elements: A data dictionary stores relationship
types; for example, if it is mandatory or optional, the nature of the relationship
and connection, etc.
 Administration and End Users: The data dictionary stores all administrative and
end user data.
Metadata in a DBMS, stored in a data dictionary, is like a monitor that controls
database usage and whether users are allowed to access the database.

 How to Create a Data Dictionary?


 As mentioned above, most businesses rely on a database management system
that has an integrated data dictionary because it is automatically updated and
easy to maintain. Documentation for databases including MySQL, SQL Server,
Oracle, etc. This can be done in various relational databases such as
 Database administrators can use templates in SQL Server, Oracle, or Microsoft
Excel to create a stand--alone data dictionary.
 The various notations used to create a data dictionary are:

Examples
1. Employee Table:
Constraints:
 Primary Key: `EmployeeID`
 Foreign Key: `DepartmentID` references `Department.DepartmentID`

2. Product Inventory Table:

 Constraints:
 Primary Key: `ProductID`
 Foreign Key: `CategoryID`
yID` references `ProductCategory.CategoryID`
 Disadvantages of Data Dictionary
 Data dictionaries have little functional detail.
 Non-technical users may experience difficulties when using the data dictionary
for the first time.
 Data dictionary relational diagrams are often not visible.

Relationship set operators


 What is a Relationship in Database?
 A relationship in a DBMS exists when the variable has a connection with the
properties stored in different tables. Such relationships help the organization of
entities intertwined with each other, ultimately enabling efficient data
processing. They're exhibited usually via keys in a table, which is either
columns or fields that specify a distinctive arrangement for each record.
 As another example, one can think of the university database. Students,
courses, and instructors, variably, might become tables. The relationship
between the two tables is that courses indicate which courses the student can
enroll in. Database management facilitates the reliable data apostrophe and
performance of complex operations due to the fact that it guarantees the
quality and identity of the data.
 Why are Relationships Important in a Database?
Relationships are crucial in a database for several reasons:
 Data Integrity: The relationships make the necessary links between data sets and
ensure that information kept and joined together is accurate. In this way,
the consistency and correctness of the data are maintained.
 Efficient Data Retrieval: The relationship permits the complete search of data
among multiple tables, storing fewer inputs and executing the query quickly,
respectively.
 Normalization: Relations remain priorities that ensure the sync of data and
achieving a level of relevance.
 Complex Data Analysis: Relationships empower users for state-of-the-art
querying, aggregation, and analysis; thereby, users would be deriving meaningful
insights from their data.
 Scalability: It has turned out that well-defined relationships, in their turn, are the
key issue that helps the database to be scalable, thus having a structure that is
easy to manage and could be further adapted to possible expanding business
needs.
 Types Of Relationships In Database
1. One-to-One (1:1) Relationship
In one to one relationships, a record is present in one table along with its
corresponding existing relation, and the vacant relation among the records is present
in another table. The type of relationship we are talking about is not as usual, and it
is normally used when two entities that belong to a specific set need to be stored
independently for normalization or security purposes. In another case, a person's
employees' data consists of a record in the "personal details" table in a human
resources database.
2. One-to-Many (1:N) Relationship
A relationship where the items from one table can be linked to only one or
many items from another table is called a one-to-many relationship; in some cases,
one item from the first table correlates with only one item from the second table.
This connection becomes very strong in that it is particularly used to describe
situations where one object can be linked to many similar or identical objects. For
example, in an online store backend database, every customer may place multiple
orders, yet the master customer record stays the same. If a record has more than one
order, these are obtained from the backend database.
15
3. Many-to-Many (N:M) Relationship
The duality of a many-to-many relationship is characterized by the presence of
multiple records belonging to a table in association with multiple records from
another table. The interconnection of these relationships follows a junction table
format, which is the component that holds both tables together. In the many-to-many
relationship model, a wide variety of complex relationships can be established where
each entity has many related entities. Such a database for a music streaming service
could have a table representing each track that belongs to multiple playlists, and each
of them could contain multiple tracks.

DATA REDUNDANCY
Data redundancy in a Database Management System (DBMS) refers to the presence of
the same data in multiple locations within the database. While it can offer benefits like
improved data availability and simplified queries, uncontrolled redundancy can lead to
inconsistencies, storage inefficiency, and performance issues.
 What is Data Redundancy?
Data redundancy occurs when the same piece of information is stored in multiple
places within a database, either within the same table or across different tables. This can
happen due to poor database design, lack of normalization, or intentional denormalization
for performance optimization.
Problems Caused by Data Redundancy:
 Storage Inefficiency:Duplicate data consumes more storage space than
necessary.
 Data Inconsistency (Anomalies):When redundant data is updated in one
place but not in others, it can lead to inconsistencies, causing incorrect
information to be used.
 Update Anomalies: Updating a piece of data in one location but not in
others can lead to inconsistencies.
 Insertion Anomalies: Adding new data might require redundant entries in
multiple locations.
 Deletion Anomalies: Deleting data in one location might unintentionally
remove necessary data in another.
 Increased Complexity:Managing and querying redundant data can be more
complex and time-consuming.
 Performance Issues:Searching, updating, and inserting data can be slower
due to the extra processing required for redundant data.
Benefits of Data Redundancy (Controlled Redundancy):
 Improved Data Availability: Redundancy can provide backups in case of data loss or
system failures.
 Simplified Queries: In some cases, redundancy can eliminate the need for complex
joins, making queries simpler.
16
Managing Data Redundancy:
 Normalization:A A key technique for minimizing redundancy by organizing data into
tables and minimizing data duplication.
 Denormalization:Intentionally
Intentionally introducing redundancy, usually to improve query
performance, but this needs to be carefully managed to avoid inconsistencies.
 Data Governance Policies:Implementing
Policies:Implementing policies to control data redundancy and
ensure data quality.
 Replication:Creating
Creating copies of data for backup and disaster recovery purpose
purposes,
which can be considered a controlled form of redundancy.

INDEXES
Indexing is a crucial technique used in databases to optimize data retrieval
operations. It improves query performance by minimizing disk I/O operations, thus
reducing the time it takes to locate and access data. Essentially, indexing allows
the database management system (DBMS) to locate data more efficiently without having
to scan the entire dataset.
Indexes are organized data structures that allow quick searching based on key values.
When an index is created for a database table, it maintains a sorted order of key values
along with pointers to the actual data rows. This process significantly reduces the number
of disk accesses required to fulfill a query.

 Attributes of Indexing
Several Important attributes of indexing affect the performance and efficiency of
database operations:
1. Access Types: This refers to the type of access such as value-based
value search,
range access, etc.
2. Access Time: It refers to the time needed to find a particular
partic data element
or set of elements.
3. Insertion Time: It refers to the time taken to find the appropriate space and
insert new data.
4. Deletion Time: Time taken to find an item and delete it as well as update
the index structure.
5. Space Overhead: It refers to the additional space required by the index.

 File Organization in Indexing


File organization refers to how data and indexes are physically stored in memory or
on disk. The following are the common types of file organizations used in indexing:
1. Sequential File Organization (Ordered Index File)
In this type of organization, the indices are based on a sorted ordering of the values. These
are generally fast and a more traditional type of storing mechanism. These Ordered or
Sequential file organizations might store the data in a dense or sparse format.
 Dense Index: Every search key value in the data file corresponds to an index
record. This method ensures that each key value has a reference to its data
location.
Example: If a table contains multiple entries for the same key, a dense index
ensures that each key value has its own index record.

 Sparse Index: The index record appears only for a few items in the data file.
Each item points to a block as shown. To locate a record, we find the index record
with the largest search key value less than or equal to the search key value we are
looking for.
 Access Method: To locate a record, we find the index record with the largest key
value less than or equal to the search key, and then follow the pointers
sequentially.
 Access Cost: Accesses=log 2(n)+1\text{Accesses} = \log_2(n) +
1Accesses=log2(n)+1, where nnn is the number of blocks involved in the index
file.

2. Hash File Organization


In hash file organization, data is distributed across a range of buckets based on
a hash function applied to the key values. The hash function maps each key to a particular
bucket, where the corresponding data can be located.
 Types of Indexing Methods
There are different types of indexing techniques, each optimized for specific
use cases.
1. Clustered Indexing
 Clustered indexing is a technique where multiple related records are stored
together in the same file. This helps reduce the cost of searching because
related data is kept close to each other. Clustered indexing is especially useful
when multiple tables or records need to be frequently joined. Storing related
records together makes this process faster and more efficient.
 In clustered indexing, the data is stored in an ordered file, usually based on
a non-key field. This ordering can be based on a primary key or, in some
cases, a non-primary
primary key. When an index is created on non-primary
non key
columns, which may not be unique, the solution is to combine two or more
columns together to form a unique value. This combination is then used to
create the index.
 Clustered indexing works by grouping records with similar properties together.
For example, students can be grouped by their semester, such as first-semester,
second-semester, and so on. By grouping related records together, it becomes
faster to retrieve them because the index allows for quicker identification and
search of the data.
2. Primary Indexing
This is a type of Clustered Indexing wherein the data is sorted according to the search
key and the primary key of the database table is used to create the index. index It is a default
format of indexing where it induces sequential file organization. As primary keys
are unique and are stored in a sorted manner, the performance of the searching
operation is quite efficient.
 Key Features: The data is stored in sequential order, making searches faster and
more efficient.
3. Non-clustered or Secondary Indexing
A non-clustered index just tells us where the data lies, i.e. it gives us a list of virtual
pointers or references to the location where the data is actually stored. Data is not
physically stored in the order of the index. Instead, data is present in leaf nodes.
Example: The contents page of a book. Each entry gives us the page number or
location of the information stored. The actual data here(information on each page of the
book) is not organized but we have an ordered reference(contents page) to where the data
points actually lie. We can have only dense ordering in the non-clustered
clustered index as sparse
ordering is not possible because data is not physically organized accordingly.
rdingly.
It requires more time as compared to the clustered index because some amount of extra
work is done in order to extract the data by further following the pointer. In the case of a
clustered index, data is directly present in front of the index.
4. Multilevel Indexing
 With the growth of the size of the database, indices also grow. As the index is
stored in the main memory
memory, a single-level index might become too large a size to
store with multiple disk accesses. The multilevel indexing segregates the main
block into various smaller blocks so that the same can be stored in a single block
 The outer blocks are divided into inner blocks which in turn are pointed to the
data blocks. This can be easily stored in the main memory with fewer overheads.
This hierarchical approach reduces memory overhead and speeds up query
execution.

 Advantages of Indexing
 Improved Query Performance: Indexing enables faster data retrieval from the
database. The database may rapidly discover rows that match a specific value or
collection of values by generating an index on a column, minimizing the amount
of time it takes to perform a query.
 Efficient Data Access: Indexing can enhance data access efficiency by lowering
the amount of disk I/O required to retrieve data. The database can maintain the
data pages for frequently visited columns in memory by generating an index on
those columns, decreasing the requirement to read from disk.
 Optimized Data Sorting: Indexing can also improve the performance of sorting
operations. By creating an index on the columns used for sorting, the database can
avoid sorting the entire table and instead sort only the relevant rows.
 Consistent Data Performance: Indexing can assist ensure that the database
performs consistently even as the amount of data in the database rises. Without
indexing, queries may take longer to run as the number of rows in the table
grows, while indexing maintains a roughly consistent speed.
 Data Integrity: By ensuring that only unique values are inserted into columns
that have been indexed as unique, indexing can also be utilized to ensure the
integrity of data. This avoids storing duplicate data in the database, which might
lead to issues when performing queries or reports.

 Disadvantages of Indexing
While indexing offers many advantages, it also comes with certain trade-offs:
 Increased Storage Space: Indexes require additional storage. Depending on the
size of the data, this can significantly increase the overall storage requirements.
 Increased Maintenance Overhead: Indexes must be updated whenever data is
inserted, deleted, or modified, which can slow down these operations.
 Slower Insert/Update Operations: Since indexes must be maintained and
updated, inserting or updating data takes longer than in a non-indexed database.
 Complexity in Choosing the Right Index: Determining the appropriate indexing
strategy for a particular dataset can be challenging and requires an understanding
of query patterns and access behaviors.
 Features of Indexing
Several key features define the indexing process in databases:
 Efficient Data Structures: Indexes use efficient data structures like B-trees, B+
trees, and hash tables to enable fast data retrieval.
 Periodic Index Maintenance: Indexes need to be periodically maintained,
especially when the underlying data changes frequently. Maintenance tasks
include updating, rebuilding, or removing obsolete indexes.
 Query Optimization: Indexes play a critical role in query optimization. The
DBMS query optimizer uses indexes to determine the most efficient execution
plan for a query.
 Handling Fragmentation: Index fragmentation can reduce the effectiveness of
an index. Regular defragmentation can help maintain optimal performance.

22
CODD’S RULES
Codd's rules are proposed by a computer scientist named Dr. Edgar F. Codd and he
also invent the relational model for database management. These rules are made to ensure
data integrity, consistency, and usability. This set of rules basically signifies the
characteristics and requirements of a relational database management system (RDBMS).
Codd's Rules in DBMS

Rule 1: The Information Rule


All information, whether it is user information or metadata, that is stored in a
database must be entered as a value in a cell of a table. It is said that everything within the
database is organized in a table layout.
Rule 2: The Guaranteed Access Rule
Each data element is guaranteed to be accessible logically with a combination of the
table name, primary key (row value), and attribute name (column value).
Rule 3: Systematic Treatment of NULL Values
Every Null value in a database must be given a systematic and uniform treatment.
Rule 4: Active Online Catalog Rule
The database catalog, which contains metadata about the database, must be stored
and accessed using the same relational database management system.
Rule 5: The Comprehensive Data Sublanguage Rule
A crucial component of any efficient database system is its ability to offer an easily
understandable data manipulation language (DML) that facilitates defining, querying, and
modifying information within the database.
Rule 6: The View Updating Rule
All views that are theoretically updatable must also be updatable by the system.
Rule 7: High-level Insert, Update, and Delete
A successful database system must possess the feature of facilitating high-level
insertions, updates, and deletions that can grant users the ability to conduct these operations
with ease through a single query.
Rule 8: Physical Data Independence
Application programs and activities should remain unaffected when changes are
made to the physical storage structures or methods.
Rule 9: Logical Data Independence
Application programs and activities should remain unaffected when changes are
made to the logical structure of the data, such as adding or modifying tables.
Rule 10: Integrity Independence
Integrity constraints should be specified separately from application programs and
stored in the catalog. They should be automatically enforced by the database system.
Rule 11: Distribution Independence
The distribution of data across multiple locations should be invisible to users, and the
database system should handle the distribution transparently.
Rule 12: Non-Subversion Rule
If the interface of the system is providing access to low-level records, then the
interface must not be able to damage the system and bypass security and integrity
constraints.

ENTITY RELATIONSHIP MODEL


The Entity-Relationship Model (ER Model) is a conceptual model for designing a
databases. This model represents the logical structure of a database, including entities, their
attributes and relationships between them.
 Entity: An objects that is stored as data such as Student, Course or Company.
 Attribute: Properties that describes an entity such as StudentID, CourseName,
or EmployeeEmail.
 Relationship: A connection between entities such as "a Student enrolls in a Course".

24
The graphical representation of this model is called an Entity-Relation
Relation Diagram (ERD).
(ERD)

 ER Model in Database Design Process


We typically follow the below steps for designing a database for an application.
 Gather the requirements (functional and data) by asking questions to the database
users.
 Create a logical or conceptual design of the database. This is where ER model
plays a role. It is the most used graphical representation of the conceptual design
of a database.
 After this, focus on Physical Database Design (like indexing) and external design
(like views)

 Why Use ER Diagrams In DBMS?


 ER diagrams represent the E-R model in a database, making them easy to convert
into relations (tables).
 These diagrams serve the purpose of real-world modeling of objects which makes
them intently useful.
 Unlike technical schemas, ER diagrams require no technical knowledge of the
underlying DBMS used.
 They visually model data and its relationships, making complex systems easier to
understand.

 Symbols Used in ER Model


ER Model is used to model the logical view of the system from a data perspective which
consists of these symbols:
 Rectangles: Rectangles represent entities in the ER Model.
 Ellipses: Ellipses represent attributes in the ER Model.
 Diamond: Diamonds represent relationships among Entities.
 Lines: Lines represent attributes to entities and entity sets with other relationship
types.
 Double Ellipse: Double ellipses represent multi-valued Attributes, such as a
student's multiple phone numbers
 Double Rectangle: Represents weak entities, which depend on other entities for
identification.

 What is an Entity?
An Entity represents a real-world
world object, concept or thing about which data is stored in a
database. It act as a building block of a database. Tables in relational database represent
these entities.
Example of entities:
 Real-World Objects: Person, Car, Employee etc.
 Concepts: Course, Event, Reservation etc.
 Things: Product, Document, Device etc.
The entity type defines the structure of an entity, while individual instances of that
type represent specific entities.

 What is an Entity Set?


 An entity refers to an individual object of an entity type, and the collection of all
entities of a particular type is called an entity set. For example, E1 is an entity that
belongs to the entity type "Student," and the group of all students forms the entity
set.
 In the ER diagram below, the entity type is represented as:
 Types of Entity
There are two main types of entities:
1. Strong Entity
A Strong Entity is a type of entity that has a key Attribute that can uniquely
identify each instance of the entity. A Strong Entity does not depend on any other
Entity in the Schema for its identification. It has a primary key that ensures its
uniqueness and is represented by a rectangle in an ER diagram.
2. Weak Entity
A Weak Entity cannot be uniquely identified by its own attributes alone. It
depends on a strong entity to be identified. A weak entity is associated with an
identifying entity (strong entity), which helps in its identification. A weak entity are
represented by a double rectangle. The participation of weak entity types is always
total. The relationship between the weak entity type and its identifying strong entity
type is called identifying relationship and it is represented by a double diamond.
Example:
A company may store the information of dependents (Parents, Children,
Spouse) of an Employee. But the dependents can't exist without the employee. So
dependent will be a Weak Entity Type and Employee will be identifying entity type
for dependent, which means it is Strong Entity Type.

 Attributes in ER Model
Attributes are the properties that define the entity type. For example, for a Student
entity Roll_No, Name, DOB, Age, Address, and Mobile_No are the attributes that
define entity type Student. In ER diagram, the attribute is represented by an oval.

 Types of Attributes
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called the
key attribute. For example, Roll_No will be unique for each student. In ER diagram,
the key attribute is represented by an oval with an underline.
2. Composite Attribute
An attribute composed of many other attributes is called a composite attribute.
For example, the Address attribute of the student Entity type consists of Street, City,
State, and Country. In ER diagram, the composite attribute is represented
epresented by an oval
comprising of ovals.

3. Multivalued Attribute
An attribute consisting of more than one value for a given entity. For example,
Phone_No (can be more than one for a given student). In ER diagram, a multivalued
attribute is represented by a double oval.

4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is
known as a derived attribute. e.g.; Age (can be derived from DOB). In ER diagram,
the derived attribute is represented by a dashed oval.
 The Complete Entity Type Student with its Attributes can be represented as:

 Relationship Type and Relationship Set


A Relationship Type represents the association between entity types. For
example, ‘Enrolled in’ is a relationship type that exists between entity type Student and
Course. In ER diagram, the relationship type is represented by a diamond and
connecting the entities with lines.

A set of relationships of the same type is known as a relationship set. The following
relationship set depicts
icts S1 as enrolled in C2, S2 as enrol
enrolled
led in C1, and S3 as registered

in C3.
 Degree of a Relationship Set
The number of different entity sets participating in a relationship set is called
the degree of a relationship set.
1. Unary Relationship: When there is only ONE entity set participating in a
relation, the relationship is called a unary relationship. For example, one person is married
to only one person.
2. Binary Relationship: W When there are TWO entities set participating in a
relationship, the relationship is called a binary relationship. For example, a Student is
enrolled in a Course.

3. Ternary Relationship: When there are three entity sets participating in a


relationship, the relationship is called a ternary relationship.
4. N-ary Relationship: When there are n entities set participating in a relationship,
the relationship is called an n-ary relationship.

 Cardinality in ER Model
 The maximum number of times an entity of an entity set participates in a
relationship set is known as cardinality.
 Cardinality can be of different types:
1. One-to-One
When each entity in each entity set can take part only once in the relationship, the
cardinality is one-to-one.
one. Let us assume that a male can marry one female and a
female can marry one male. So the relationship will be one-to-.one
one
2.One-to-Many
In one-to-many mapping as well where each entity can be related to more than
one entity. Let us assume that one surgeon department can accommodate many
doctors. So the Cardinality will be 1 to M. It means one department has many
Doctors.

3. Many-to-One
 When entities in one entity set can take part only once in the relationship set
and entities in other entity sets can take part more than once in the relationship
set, cardinality is many to one.
 Let us assume that a student can take only one course but one course can be
taken by many students. So the cardinality will be n to 1. It means that for one
course there can be n students but for one student, there will be only one
course.
4. Many-to-Many
When entities in all entity sets can take part more than once in the relationship
cardinality is many to many. Let us assume that a student can take more than one
course and one course can be taken by many students. So the relationship will be
many to many.
ER DIAGRAM

You might also like