
Normalization

Normalization is the process of structuring a database to reduce data redundancy and improve data integrity. It organizes tables and columns to eliminate duplicate data and ensure that data dependencies make logical sense. The primary goal is to break down a large table into smaller, more manageable tables and establish relationships between them.

Unnormalized Table Example

Let's start with an unnormalized table named Student Course Enrollment that contains information about students, their courses, and their instructors.

Problems:

 Multiple values in the Courses and Instructor columns (bad for searching and updating)

 Redundancy: "Alice" appears twice

First Normal Form (1NF)

A table is in 1NF if it eliminates repeating groups and ensures that all data values are atomic, meaning they are indivisible. A single cell in a table should not contain multiple values, like a comma-separated list of items or an array. To achieve this, you break up a table with repeating information into separate rows.

Rule: A table is in 1NF if it meets the following criteria:

1. Atomicity: Each column contains atomic (indivisible) values.

2. No Repeating Groups: There are no repeating groups of columns.
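As a sketch of this decomposition, the following Python snippet takes hypothetical rows from the Student Course Enrollment table (the sample values are assumptions, since the original table is not reproduced here) and splits the multi-valued columns into atomic, one-value-per-cell rows:

```python
# Hypothetical unnormalized data (assumed values for illustration).
unnormalized = [
    {"StudentID": 1, "StudentName": "Alice", "Courses": "Math, Physics",
     "Instructor": "Dr. Rao, Dr. Lee"},
    {"StudentID": 2, "StudentName": "Bob", "Courses": "Math",
     "Instructor": "Dr. Rao"},
]

# 1NF: split the comma-separated lists so every cell holds a single atomic value.
first_normal_form = []
for row in unnormalized:
    courses = [c.strip() for c in row["Courses"].split(",")]
    instructors = [i.strip() for i in row["Instructor"].split(",")]
    for course, instructor in zip(courses, instructors):
        first_normal_form.append(
            {"StudentID": row["StudentID"], "StudentName": row["StudentName"],
             "Course": course, "Instructor": instructor}
        )

for row in first_normal_form:
    print(row)
```

Each student-course pair now occupies its own row, at the cost of repeating the student's name; the later normal forms address that repetition.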

Second Normal Form (2NF)

A table is in 2NF if it's already in 1NF and all its non-key attributes are fully dependent on the entire primary key. This rule primarily applies to tables with a composite primary key (a key made up of two or more columns). If a non-key column depends on only part of the primary key, it creates a partial dependency. To fix this, you split the table: the columns with the partial dependency are moved to a new table with the partial key as its primary key.

Rule:

Must be in 1NF

A table is in 2NF if it is in 1NF and all non-key attributes are fully functionally dependent on the entire primary key.
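The split can be sketched in a few lines of Python. Here the composite key is (StudentID, Course), and StudentName depends only on StudentID, which is a partial dependency (the data values are illustrative, not from the original text):

```python
# 1NF rows with a composite key (StudentID, Course).
rows = [
    (1, "Alice", "Math"),
    (1, "Alice", "Physics"),
    (2, "Bob",   "Math"),
]

# 2NF: move the partially dependent column (StudentName) into its own
# table keyed by StudentID alone; keep only the composite key in Enrollments.
students = {sid: name for sid, name, _ in rows}           # StudentID -> StudentName
enrollments = [(sid, course) for sid, _, course in rows]  # full composite key

print(students)      # {1: 'Alice', 2: 'Bob'}
print(enrollments)
```

The student's name is now stored once per student rather than once per enrollment.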
Third Normal Form (3NF)

A table is in 3NF if it's in 2NF and has no transitive dependencies among its non-key attributes. A transitive dependency occurs when a non-key attribute depends on another non-key attribute, which in turn depends on the primary key. In simpler terms, a non-key field must depend on "the key, the whole key, and nothing but the key."

Rule:

Must be in 2NF

A table is in 3NF if it is in 2NF and no non-key attribute is transitively dependent on the primary key.
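A minimal sketch of removing a transitive dependency, using hypothetical columns: Building depends on Department, which depends on the key StudentID, so Building is factored out into a Departments table:

```python
# (StudentID, Name, Department, Building): Building depends on Department,
# not directly on the key -- a transitive dependency. Illustrative data.
students = [
    (101, "Alice", "Physics", "Science Hall"),
    (102, "Bob",   "History", "Arts Block"),
    (103, "Carol", "Physics", "Science Hall"),  # Building repeats with Department
]

# 3NF: move the transitively dependent column into a Departments table.
departments = {dept: building for _, _, dept, building in students}
students_3nf = [(sid, name, dept) for sid, name, dept, _ in students]

print(departments)
print(students_3nf)
```

If a department moves to a new building, the change is now made in exactly one row instead of once per student.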
Database Language

Database languages are specialized languages used to communicate with and manage database systems. They are typically used to define the structure of a database, manipulate the data within it, control user access, and manage transactions. The most well-known of these is SQL (Structured Query Language), a comprehensive language that incorporates different types of commands to handle all these functions.

The four main types of database languages are:

Data Definition Language (DDL)

DDL commands are used to define and manage the structure of a database. They deal with the database's schema, which is the skeleton that holds the data. You use DDL to create, alter, and delete database objects like tables, views, indexes, and schemas. Once a DDL command is executed, the changes are permanent and cannot be rolled back.

Common DDL Commands:

 CREATE: Creates a new database object (e.g., CREATE TABLE Students).

 ALTER: Modifies the structure of an existing database object (e.g., ALTER TABLE Students ADD Email VARCHAR(255)).

 DROP: Deletes a database object and all its data (e.g., DROP TABLE Students).

 TRUNCATE: Deletes all data from a table while keeping its structure intact.

 RENAME: Changes the name of a database object.
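The CREATE, ALTER, and DROP commands above can be tried directly against SQLite's in-memory database (a convenient sandbox, though DDL behavior varies between DBMSs; SQLite, for instance, has no TRUNCATE command):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE and ALTER, as in the examples above.
cur.execute("CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, StudentName TEXT)")
cur.execute("ALTER TABLE Students ADD Email VARCHAR(255)")

# Inspect the resulting schema: the new Email column is present.
columns = [row[1] for row in cur.execute("PRAGMA table_info(Students)")]
print(columns)   # ['StudentID', 'StudentName', 'Email']

# DROP removes the table and all its data.
cur.execute("DROP TABLE Students")
conn.close()
```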

Data Manipulation Language (DML)

DML commands are used to manipulate the data stored within the database objects. These are the most frequently used commands, as they allow you to insert, retrieve, update, and delete the actual information. DML changes are not automatically saved, so they can be undone.

Common DML Commands:

 SELECT: Retrieves data from a database (e.g., SELECT * FROM Students WHERE StudentID = 101). This is often considered a subcategory called Data Query Language (DQL).

 INSERT: Adds new rows of data to a table (e.g., INSERT INTO Students (StudentID, StudentName) VALUES (104, 'Dave')).

 UPDATE: Modifies existing data within a table (e.g., UPDATE Students SET StudentMajor = 'Biology' WHERE StudentID = 101).

 DELETE: Removes rows of data from a table (e.g., DELETE FROM Students WHERE StudentID = 102).
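Running the four DML commands above in sequence against an in-memory SQLite table (sample rows are assumed for illustration) shows their combined effect:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, "
            "StudentName TEXT, StudentMajor TEXT)")

# INSERT three rows (one without a major, matching the example above).
cur.execute("INSERT INTO Students VALUES (101, 'Alice', 'Physics')")
cur.execute("INSERT INTO Students VALUES (102, 'Bob', 'History')")
cur.execute("INSERT INTO Students (StudentID, StudentName) VALUES (104, 'Dave')")

# UPDATE and DELETE, as in the examples above.
cur.execute("UPDATE Students SET StudentMajor = 'Biology' WHERE StudentID = 101")
cur.execute("DELETE FROM Students WHERE StudentID = 102")

# SELECT retrieves what remains.
rows = cur.execute("SELECT * FROM Students ORDER BY StudentID").fetchall()
print(rows)   # [(101, 'Alice', 'Biology'), (104, 'Dave', None)]
conn.close()
```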

Data Control Language (DCL)

DCL commands are used to control user access and permissions within the database. They are crucial for database security, allowing administrators to grant and revoke privileges for different users and roles. This ensures that only authorized users can perform specific actions on the data and its structure.

Common DCL Commands:

 GRANT: Gives a user or role specific privileges (e.g., GRANT SELECT ON Students TO user1).

 REVOKE: Removes previously granted privileges from a user or role (e.g., REVOKE DELETE ON Students FROM user1).

Transaction Control Language (TCL)

TCL commands are used to manage transactions: sequences of database operations that are treated as a single, indivisible unit. These commands ensure data integrity by allowing changes to be saved permanently, undone, or managed in a controlled way.

Common TCL Commands:

 COMMIT: Permanently saves the changes made by a transaction.

 ROLLBACK: Undoes all changes made since the last COMMIT or SAVEPOINT.

 SAVEPOINT: Creates a temporary point in a transaction to which you can later roll back.
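All three commands can be exercised in SQLite (with autocommit disabled so the transaction is controlled manually; table and data are hypothetical):

```python
import sqlite3

# isolation_level=None disables the driver's implicit transactions,
# so BEGIN/COMMIT/ROLLBACK are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE Accounts (name TEXT, balance INTEGER)")

cur.execute("BEGIN")
cur.execute("INSERT INTO Accounts VALUES ('alice', 100)")
cur.execute("SAVEPOINT before_bob")
cur.execute("INSERT INTO Accounts VALUES ('bob', 50)")
cur.execute("ROLLBACK TO before_bob")   # undoes only bob's insert
cur.execute("COMMIT")                   # permanently saves alice's row

rows = cur.execute("SELECT * FROM Accounts").fetchall()
print(rows)   # [('alice', 100)]
conn.close()
```

ROLLBACK TO a savepoint undoes part of a transaction without abandoning it, while COMMIT makes whatever survived permanent.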

Centralized Database

A centralized database stores all its data and related processing on a single, central server. All users access this one physical location over a network to retrieve and manage information. This model is straightforward and is commonly used by smaller organizations or for applications where data doesn't need to be globally distributed.

The primary advantage is simplicity: it's easy to manage, secure, and maintain because all of the data resides in one place. However, if the central server fails, the entire system becomes unavailable, which can cause significant downtime. As the number of users and the amount of data grow, a centralized database can also suffer from performance bottlenecks, as all traffic is directed to a single server. This makes it less scalable and less suitable for large, geographically diverse organizations that require high availability and local data access.

Advantages

1. Simplified Management: It's much easier to manage, maintain, and secure since all data is in one place. Tasks like backups and administration are straightforward.

2. Strong Data Integrity: With all data consolidated in a single location, it's easier to enforce data consistency and integrity, as there are no synchronization issues between different data copies.

3. Lower Cost: The initial setup and operational costs are generally lower. You only need to invest in one powerful server and the necessary software licenses.

4. Security Control: Security is more manageable as you only need to protect a single point of entry with firewalls, access controls, and other security measures.

5. Simplified Development: Application development is simpler because there's no need to consider data fragmentation or complex distributed transaction protocols.

Disadvantages

1. Single Point of Failure: If the central server goes down, the entire database becomes inaccessible, leading to a complete service outage.

2. Performance Bottleneck: All user queries and requests are routed to a single server. As the number of users grows, the server can become overloaded, leading to slow response times.

3. Scalability Issues: Scaling a centralized database is difficult and often requires expensive hardware upgrades (vertical scaling) to handle increased load.

4. Reduced Availability: The database is only available if the central server is running. Any downtime, planned or unplanned, affects all users.

5. High Latency: For geographically distant users, network latency can be a significant issue, as all data requests must travel to and from the central server.

Distributed Database

A distributed database is a database in which data is stored across multiple computers, also known as nodes, that are interconnected by a network. It differs from a centralized database, where all data is on a single machine. The system is designed to appear as a single, logical database to the user, who doesn't need to know where the data is physically located. This architecture is crucial for handling large volumes of data and high user traffic, as it can distribute the workload and data across multiple servers.

Distributed databases use two main strategies to manage data across nodes:

 Fragmentation: The database is broken into smaller logical units, or fragments, and each fragment is stored on a different node. For example, a global company might store customer data for a specific region on a server located within that region.

 Replication: Copies of the same data are stored on multiple nodes. This ensures that if one node fails, the data is still available from a replica on another node, increasing reliability and fault tolerance.

A Distributed Database Management System (DDBMS) is the software that manages this complex environment, handling tasks like data fragmentation, replication, and query processing to ensure consistency and transparency for the user.

Advantages

1. High Availability: There is no single point of failure. If one server goes down, the rest of the network can continue to operate, ensuring uninterrupted service.

2. Scalability: Distributed databases can be easily scaled horizontally by adding more nodes, which is often more cost-effective than upgrading a single, powerful server.

3. Improved Performance: Data can be stored closer to the users who access it most frequently, reducing network latency and improving response times.

4. Fault Tolerance: Data is often replicated across multiple nodes. If a node fails, its data can still be accessed from a healthy replica, making the system resilient to failures.

5. Modular Growth: New nodes can be added to the network incrementally to handle growth in data or user traffic without impacting the rest of the system.

Disadvantages

1. Complexity: They are significantly more complex to design, implement, and manage than centralized databases. Ensuring data consistency across all nodes is a major challenge.

2. Higher Cost: The initial setup and ongoing management can be more expensive due to the need for multiple servers, complex networking, and specialized administrative skills.

3. Data Consistency Issues: Keeping all replicated data consistent in real time is difficult. Network delays or failures can lead to temporary data inconsistencies.

4. Security Risks: A distributed system has multiple access points, increasing the overall security risk and making it more challenging to protect the data.

5. Lack of Standardization: There is a lack of universal standards for distributed database systems, which can make it challenging to integrate different systems or migrate data between them.

Data Security

Data security in a Database Management System (DBMS) is the set of measures and controls designed to protect the confidentiality, integrity, and availability of data. This involves safeguarding not only the data itself but also the DBMS software, the physical infrastructure, and any applications that access the data. Key components of database security include robust access control, which uses authentication to verify a user's identity and authorization to define what they are allowed to do. Encryption, both for data at rest (on disk) and in transit (over a network), serves as a final line of defense against unauthorized access. Additionally, constant auditing and monitoring of database activity is essential for detecting suspicious behavior, investigating security incidents, and ensuring compliance with regulations. Common threats that database security aims to mitigate include SQL injection attacks, insider threats, and misconfiguration, all of which can lead to data breaches or corruption. By implementing a multi-layered security strategy, organizations can significantly reduce the risk of data loss and maintain the trustworthiness of their information.

Key Security Measures

 Access Control: This is the most fundamental layer. It ensures only the right people can access the database. This includes:

 Authentication: Verifying a user's identity, usually with a username and password. Multi-factor authentication adds an extra layer of security.

 Authorization: Defining what an authenticated user is allowed to do. The principle of least privilege is key here: users are given only the minimum permissions needed for their job.

 Data Encryption: This is a vital last line of defense that makes data unreadable to anyone who doesn't have the decryption key. It's used in two main ways:

 Encryption at Rest: Encrypting data while it's stored on a disk. Even if a server is stolen, the data is useless.

 Encryption in Transit: Encrypting data as it travels over a network between the user and the database, preventing interception.

 Auditing and Monitoring: This involves continuously tracking and logging all database activity to detect suspicious behavior. Auditing creates a record of who did what and when, which is crucial for investigations and compliance. Real-time monitoring systems can alert administrators to unusual activity, such as multiple failed login attempts.

Common Threats

Database security must address a range of threats, including SQL injection attacks, where malicious code is inserted into an application's input fields, and insider threats, which are risks posed by employees or contractors with authorized access. Other threats include privilege abuse by a legitimate user and vulnerabilities caused by weak passwords or system misconfigurations.

Database Administrator (DBA)

A Database Administrator (DBA) is a specialized IT professional who manages, maintains, and secures a database system. They ensure that data is correctly stored, organized, and available to users when needed. The role is critical in any organization that relies on data for its operations, as the DBA is the primary person responsible for the performance, integrity, and security of the database. Key responsibilities include:

 Installation and Configuration: DBAs install and configure the database management system (DBMS) software and related tools, customizing them to meet an organization's specific requirements.

 Backup and Recovery: They create and implement comprehensive backup strategies to prevent data loss. This includes scheduling regular backups and testing recovery procedures to ensure data can be restored quickly and accurately after a system failure.

 Security Management: DBAs are the gatekeepers of the database. They manage user accounts, roles, and permissions to ensure only authorized individuals can access and modify sensitive data. They also monitor for security breaches.

 Performance Tuning: They constantly monitor database performance, analyze query efficiency, and fine-tune system parameters to ensure fast data retrieval and a smooth user experience.

 Data Integrity: DBAs enforce data integrity rules to ensure data is consistent and accurate. They often implement checks and constraints to prevent bad data from entering the database.

 Troubleshooting: When issues arise, such as a slow query, a system crash, or data corruption, the DBA is responsible for diagnosing the root cause and implementing a solution.

 Maintenance: This includes routine tasks like applying software patches and updates, managing storage space, and archiving old data to maintain a healthy database environment.

 Database Design: DBAs collaborate with developers and architects to design the database schema, including tables, relationships, and indexes, to ensure it is logical and efficient.

 Capacity Planning: They monitor database growth and usage trends to predict future resource needs, such as storage and memory, and plan for upgrades or new hardware.

 Data Migration: When an organization needs to move data, whether between different servers, from on-premises to the cloud, or to a new database system, the DBA is responsible for planning and executing the migration smoothly and without data loss.

ER Diagram

An Entity-Relationship (ER) diagram is a visual tool used to model the logical structure of a database. It acts as a blueprint, showing how different pieces of information, or entities, relate to each other. By using standardized symbols, it helps developers and analysts understand the relationships, attributes, and constraints of a database before it's actually built.

Key Figures and Their Workings

ER diagrams are built using three core components: entities, attributes, and relationships. Different notations exist (like Chen or Crow's Foot), but the basic figures and their meanings are consistent.

 Entities: Represented by a rectangle, an entity is a real-world object or concept that you want to store data about. Think of entities as the nouns in your database.

 Example: Student, Course, Instructor.

 Attributes: Represented by an oval or listed within the entity rectangle, attributes are the properties or characteristics of an entity.

 Example: A Student entity might have attributes like StudentID, StudentName, and StudentMajor. A key attribute, which uniquely identifies an entity (like StudentID), is often underlined.

 Relationships: Represented by a diamond, a relationship is how two or more entities are associated with each other. It's the verb that connects the nouns.

 Example: A Student enrolls in a Course. The diamond would be labeled "Enrolls In."

The lines connecting these figures also show cardinality, which defines the number of instances of one entity that can be related to instances of another. The three main types are:

 One-to-One (1:1): Each entity instance relates to exactly one instance of another entity (e.g., a person has one social security card).

 One-to-Many (1:N): One entity instance can relate to many instances of another (e.g., an instructor teaches many courses).

 Many-to-Many (M:N): Many instances of one entity can relate to many instances of another (e.g., many students enroll in many courses).
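When an ER diagram is translated into tables, 1:N cardinality becomes a foreign key on the "many" side, and M:N cardinality becomes a separate junction table whose composite key pairs the two entities. A sketch in SQLite (table and column names follow the Student/Course/Instructor examples above; the data values are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
cur = conn.cursor()

# One-to-many (1:N): an Instructor teaches many Courses,
# so the foreign key lives on the Course ("many") side.
cur.execute("CREATE TABLE Instructor (InstructorID INTEGER PRIMARY KEY, Name TEXT)")
cur.execute("CREATE TABLE Course (CourseID INTEGER PRIMARY KEY, Title TEXT, "
            "InstructorID INTEGER REFERENCES Instructor(InstructorID))")

# Many-to-many (M:N): Students enroll in Courses via a junction table.
cur.execute("CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT)")
cur.execute("CREATE TABLE Enrolls_In ("
            "StudentID INTEGER REFERENCES Student(StudentID), "
            "CourseID INTEGER REFERENCES Course(CourseID), "
            "PRIMARY KEY (StudentID, CourseID))")

cur.execute("INSERT INTO Instructor VALUES (1, 'Dr. Rao')")
cur.execute("INSERT INTO Course VALUES (10, 'Math', 1)")
cur.execute("INSERT INTO Course VALUES (11, 'Physics', 1)")  # same instructor, many courses
cur.execute("INSERT INTO Student VALUES (101, 'Alice')")
cur.execute("INSERT INTO Enrolls_In VALUES (101, 10)")
cur.execute("INSERT INTO Enrolls_In VALUES (101, 11)")       # one student, many courses

count = cur.execute("SELECT COUNT(*) FROM Enrolls_In").fetchone()[0]
print(count)   # 2
conn.close()
```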

Symbols Used in ER Model

The ER Model is used to model the logical view of the system from a data perspective. Its main symbols are the figures described above: rectangles for entities, ovals for attributes (with key attributes underlined), diamonds for relationships, and connecting lines that carry cardinality.
Key Advantages of ER Diagrams

1. Clarity and Simplicity: ER diagrams provide a simple, graphical representation of a complex database structure. This visual format is much easier to understand than a detailed textual description or a list of tables. It helps everyone involved, from developers and business analysts to clients, get a clear picture of how the data is organized.

2. Effective Communication: By acting as a universal blueprint, ER diagrams facilitate communication among various teams. Business users can validate the model to ensure it accurately represents their needs, while developers can use the diagram to build the database without misunderstanding the requirements.

3. Better Database Design: ER diagrams are a crucial step in the design process. They help designers identify and correct flaws like redundancy, missing relationships, or inconsistent data before any code is written. This proactive approach leads to a more efficient, logical, and robust final database.

4. Reduced Redundancy: The process of creating an ER diagram naturally encourages normalization. By clearly mapping out entities and their relationships, designers can spot repeating data and structure the database to minimize redundancy and improve data integrity.

5. Blueprint for Implementation: The ER diagram serves as a direct blueprint for building the database. Once the diagram is finalized, developers can easily translate entities into tables, attributes into columns, and relationships into foreign key constraints, significantly speeding up the implementation process.

What is an Entity?

An Entity represents a real-world object, concept, or thing about which data is stored in a database. It acts as a building block of a database. Tables in a relational database represent these entities.

Examples of entities:

 Real-World Objects: Person, Car, Employee, etc.

 Concepts: Course, Event, Reservation, etc.

 Things: Product, Document, Device, etc.

Types of Entity

Strong Entity

A strong entity is an independent entity that can exist on its own and has a primary key to uniquely identify its instances. It does not rely on any other entity for its existence.

For example, in a university database, Student is a strong entity because a student can exist independently of a course. The primary key, StudentID, is sufficient to identify each student.

Weak Entity

A weak entity is an entity that cannot exist without its related strong entity. It does not have a primary key of its own; instead, it relies on the primary key of its parent strong entity to form its own unique identifier.

For example, a Dependent entity (a child of an employee) is a weak entity because a dependent cannot exist in the database unless they are associated with a specific Employee (the strong entity).

Data Integrity

Data integrity refers to the overall accuracy, completeness, and consistency of data throughout its entire lifecycle. It's about ensuring that data is reliable and trustworthy. Maintaining data integrity is crucial for making sound business decisions and for meeting legal and regulatory compliance requirements.

There are two main types of data integrity:

 Entity Integrity: This ensures that each row in a table is uniquely identified. It is enforced by the primary key, which must contain unique values and cannot be NULL. This prevents duplicate rows and guarantees that every entity instance is distinct.

 Referential Integrity: This ensures that relationships between tables remain consistent. It is enforced using foreign keys. A foreign key in one table must either point to a valid, existing record in another table or be NULL. For example, a student record with a foreign key referencing a course ID must refer to a course that actually exists in the courses table. This prevents "orphan" records.

Data integrity is protected through various constraints and rules defined within the database schema, such as UNIQUE, NOT NULL, CHECK, and FOREIGN KEY constraints. These measures prevent the entry of incorrect, incomplete, or duplicate data.
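Both kinds of integrity can be demonstrated in SQLite: the primary key enforces entity integrity, and a FOREIGN KEY constraint rejects orphan records (note that SQLite only enforces foreign keys when the pragma below is enabled; the table names and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite requires this for FK enforcement
cur = conn.cursor()
cur.execute("CREATE TABLE Courses (CourseID INTEGER PRIMARY KEY, Title TEXT NOT NULL)")
cur.execute("CREATE TABLE Students ("
            "StudentID INTEGER PRIMARY KEY, "                 # entity integrity
            "Name TEXT NOT NULL, "
            "CourseID INTEGER REFERENCES Courses(CourseID))") # referential integrity

cur.execute("INSERT INTO Courses VALUES (1, 'Math')")
cur.execute("INSERT INTO Students VALUES (101, 'Alice', 1)")  # valid reference

rejected = False
try:
    # Course 99 does not exist, so this would create an "orphan" record.
    cur.execute("INSERT INTO Students VALUES (102, 'Bob', 99)")
except sqlite3.IntegrityError:
    rejected = True
print("orphan insert rejected:", rejected)
conn.close()
```

The database itself, not application code, guarantees that every student's CourseID points to a real course.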
