Advanced Database - Midterm RVWR

Uploaded by Ynn Delro
DATABASE BACKUP AND RECOVERY

● A database backup is a copy of the database that can be used to restore the database to a previous point in time in case of data loss, corruption, or failure. Backups ensure data is not lost permanently and can be restored when needed.

Why is Backup Important?

● Data Integrity: In the event of data corruption, backups ensure data integrity and minimize loss.
● Disaster Recovery: If a natural disaster, hardware failure, or cyberattack occurs, backups allow for restoring essential data.
● Business Continuity: With a solid backup and recovery plan, organizations can minimize downtime and continue operations with minimal disruption.

Types of Database Backup

● Full Backup: A full backup captures the entire database. It is the most complete type of backup and allows for a straightforward recovery process. However, it is time-consuming and storage-intensive.
● Differential Backup: A differential backup includes all changes made since the last full backup. This type of backup is faster and takes up less storage than a full backup but requires the last full backup for restoration.
● Incremental Backup: Incremental backups store only the data that has changed since the last backup, regardless of type (full or incremental). It’s efficient in terms of time and storage but can be more complex to restore.

Recovery Strategies

1. Point-in-Time Recovery: This allows the database to be restored to a specific point in time, often used when dealing with logical data errors like accidental data deletion.
2. Full Database Recovery: Using a full backup, the database is restored to the point at which the backup was taken. This method may result in data loss from the time the full backup was taken until the time of the failure.
3. Transaction Log Recovery: Transaction logs record all changes made to the database. These logs allow for recovery to a very specific point and are essential for minimizing data loss in certain situations.
4. Snapshot Recovery: Snapshots capture the database at specific moments in time. They are useful for quick recovery in environments where the database changes rapidly.

Best Practices for Database Backup and Recovery

● Schedule Automated Regular Backups: Schedule regular backups to ensure no important data is left out.
● Test Recovery Plans: Regularly test backups by performing recovery simulations to ensure they work correctly.
● Store Backups Offsite: Store backups in different locations (cloud or physical) to protect against disasters that affect the primary storage location.
● Encrypt Backups: Ensure all backups are encrypted to prevent unauthorized access.
● Keep Versioning: Keep multiple versions of backups to protect against long-term corruption that may go unnoticed.
Database Design

Database schema is a logical blueprint or structural framework that defines the organization, structure, and relationships of data elements within a database.

● It provides a high-level view of how data is organized and stored in a database management system (DBMS).

1. Tables: These are the primary data containers in a database, representing specific entities or concepts (e.g., customers, orders, products). Each table is defined with a unique name and a set of columns, each with a defined data type.

2. Columns: Columns represent the individual data attributes or fields within a table. Each column has a name and a data type that specifies the kind of data it can store (e.g., text, numbers, dates). Column definitions may also include constraints, such as primary keys, foreign keys, and default values.

3. Constraints: Constraints define rules and conditions that data in the database must adhere to. Common constraints include primary keys (to ensure uniqueness of records), foreign keys (to establish relationships between tables), and check constraints (to validate data values).

4. Relationships: The database schema outlines the relationships between tables, specifying how data in one table relates to data in another. Common relationship types include one-to-one, one-to-many, and many-to-many relationships.

5. Views: Views are virtual tables created based on the data from one or more underlying tables. They provide an abstracted and simplified representation of data, allowing users to query and manipulate data without directly accessing the underlying tables.

6. Stored Procedures and Functions: Some database schemas include definitions of stored procedures and functions, which are pre-defined sets of SQL statements that can be executed as a single unit. These procedures and functions can be used to perform data manipulations, calculations, or other operations.

7. Security and Access Control: The schema may also include information about access control, specifying who has permissions to access and manipulate data within the database.

Database schemas are crucial for maintaining data integrity, consistency, and organization within a database system. They serve as a foundation for designing, implementing, and managing databases, making it easier to understand the database's structure and interact with the data it contains.

Normalization Review: Definition Review

• Functional dependency
– The relationship (within the relation) that describes how the value of one attribute may be used to find the value of another attribute.
• Determinant
– The attribute that can be used to find the value of another attribute in the relation.
– The left-hand side of a functional dependency.
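The schema elements listed above (tables, columns, constraints, a relationship, and a view) can be sketched together in one small script. The names and sample data are illustrative; SQLite via Python's sqlite3 module stands in for any DBMS here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Tables with named columns, data types, and constraints.
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- relationship
    total       REAL CHECK (total >= 0)                              -- check constraint
);
-- A view: a virtual table abstracting the underlying join.
CREATE VIEW customer_totals AS
    SELECT c.name, SUM(o.total) AS total_spent
    FROM customers c JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.name;
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0)")
totals = conn.execute("SELECT * FROM customer_totals").fetchall()
print(totals)  # [('Ada', 15.0)]
```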
• Candidate key
– The value of a candidate key can be used to find the value of every other attribute in the table.
– A simple candidate key consists of only one attribute.
– A composite candidate key consists of more than one attribute.

Normal Forms
• There are many defined normal forms:
– First Normal Form (1NF)
– Second Normal Form (2NF)
– Third Normal Form (3NF)
– Boyce-Codd Normal Form (BCNF)
– Fourth Normal Form (4NF)
– Fifth Normal Form (5NF)
– Domain/Key Normal Form (DK/NF)

Representing Weak Entities
• If not ID-dependent, use the same techniques as for strong entities.
• If ID-dependent, then you must add the primary key of the parent entity.

Second Normal Form
• Table must be in First Normal Form
• Composite keys
– All columns in a row must refer to the entire key
• Benefits
– Increased storage efficiency
– Less data repetition

Third Normal Form
• Table must be in Second Normal Form
– If your table is 2NF, there is a good chance it is 3NF
• All columns must depend directly on the primary key
• “The key, the whole key, and nothing but the key”
• Benefits
– No extraneous data

Joining Tables
• Two Basic Joins
– Inner Join
– Outer Join
• LEFT OUTER JOIN
• RIGHT OUTER JOIN
• FULL OUTER JOIN
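The join types above can be sketched against a small normalized pair of tables (department data stored once, referenced by each employee). Table and column names are made up for illustration; sqlite3 stands in for the DBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Sales');
INSERT INTO employees VALUES (1, 'Ada', 1), (2, 'Grace', 1), (3, 'Alan', NULL);
""")

# Inner join: only rows that match on both sides.
inner = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e JOIN departments d ON e.dept_id = d.dept_id
""").fetchall()

# Left outer join: all employees, with NULL where no department matches.
left = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e LEFT OUTER JOIN departments d ON e.dept_id = d.dept_id
""").fetchall()

print(len(inner), len(left))  # 2 3
```

The unmatched employee appears only in the outer-join result, paired with NULL.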

Database Normalization
• Store only the minimal amount of information.
• Remove redundancies.
• Remove anomalies.
• Restructure data.

First Normal Form
• All values must be atomic
• Each row must be unique
– Use a primary key
• Benefits
– Easier to query/sort the data
– More scalable
– Each row can be identified for updating

De-Normalizing Tables
• Use with caution
• Normalize first, then de-normalize
• Cures the ‘SpreadSheet Syndrome’
• Use only when you cannot optimize
• Try temp tables, UNIONs, VIEWs, and subselects first

Database Design
• It is a collection of processes that facilitate the designing, development, implementation and maintenance of enterprise data management systems.
• It helps produce database systems:
+ That meet the requirements of the users
+ Have high performance
• The main objectives of database design are to produce logical and physical design models of the proposed database system.
• The logical model concentrates on the data requirements and the data to be stored, independent of physical considerations. The physical design model involves translating the logical design of the database onto physical media using hardware resources and software systems.

Conceptual Design
• It is a high-level data model of the specific application area.
• It describes how entities (objects, items) are related to each other.
• Example: ERD or UML Class Diagram

Logical Design
• The result of the logical design phase is a set of relation schemas. The ER diagram or class diagram is the basis for these relation schemas.
• The relation schemas are the basis for table definitions. In this phase, the primary keys and foreign keys are defined.

Normalization
• Normalization is the last part of the logical design.
• The goal of normalization is to eliminate redundancy and potential update anomalies.

Physical Design
• The goal of the last phase of database design is to implement the database. At this phase, one must know which database management system (DBMS) is used.

Entity-Relationship Model
• The entity-relationship model (ER model) defines the conceptual view of a database. It works around real-world entities and the associations among them. At the view level, the ER model is considered a good option for designing databases.

Entity
• An entity can be a real-world object, either animate or inanimate, that can be easily identified.
• An entity set is a collection of similar types of entities.

Attributes
• Entities are represented by means of their properties, called attributes. All attributes have values.
• There exists a domain or range of values that can be assigned to attributes.

Types of Attributes
• Simple attribute – attributes that are atomic values.
• Composite attribute – attributes made of more than one attribute.
• Derived attribute – attributes that do not exist in the physical database, but whose values are derived from other attributes present in the database.
• Single-value attribute – attributes that contain a single value.
• Multi-value attribute – attributes that may contain more than one value.

Entity Set and Keys
• A key is an attribute or collection of attributes that uniquely identifies an entity within an entity set.
• Super Key – a set of attributes (one or more) that collectively identifies an entity in a set.
• Candidate Key – an entity set may have more than one candidate key.
• Primary Key – one of the candidate keys, chosen by the database designer to uniquely identify the entity set.

Relationship
• The association among entities is called a relationship.
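The key and relationship concepts above map directly onto tables: a many-to-many relationship becomes a junction table whose composite primary key combines two foreign keys. A sketch with illustrative names (students, courses, enrollments):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE courses  (course_id  INTEGER PRIMARY KEY, title TEXT);
-- Junction table for the many-to-many relationship.
CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(student_id),
    course_id  INTEGER REFERENCES courses(course_id),
    PRIMARY KEY (student_id, course_id)   -- composite key
);
INSERT INTO students VALUES (1, 'Ada'), (2, 'Alan');
INSERT INTO courses VALUES (10, 'Databases'), (20, 'Algorithms');
INSERT INTO enrollments VALUES (1, 10), (1, 20), (2, 10);
""")
rows = conn.execute("""
    SELECT s.name, c.title
    FROM enrollments e
    JOIN students s ON s.student_id = e.student_id
    JOIN courses  c ON c.course_id  = e.course_id
    ORDER BY s.name, c.title
""").fetchall()
print(rows)  # [('Ada', 'Algorithms'), ('Ada', 'Databases'), ('Alan', 'Databases')]
```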
Cardinality
• Cardinality defines the number of entities in one entity set which can be associated with the number of entities of another set via the relationship set.
• One-to-one
• One-to-many
• Many-to-one
• Many-to-many

ER-Diagrams
• Entities are represented by means of rectangles. Rectangles are named with the entity set they represent.
• Attributes are the properties of entities. Attributes are represented by means of ellipses. Every ellipse represents one attribute and is directly connected to its entity.
• Relationships are represented by a diamond-shaped box.

Generalization
• Combining lower-level entity sets that share common features into a higher-level, generalized entity set.

Data Mining

Data mining is essentially about transforming raw data into valuable insights and patterns that can be used to make decisions or predictions.

Data Mining Algorithms and Applications

● Classification: Categorizes data into predefined classes.
Algorithm Example: Decision Trees, Naive Bayes.
Application: Email spam detection.
● Clustering: Groups data points into clusters based on similarity.
Algorithm Example: K-means, DBSCAN.
Application: Customer segmentation in marketing.
● Association Rule Learning: Identifies interesting relations between variables.
Algorithm Example: Apriori, Eclat.
Application: Market basket analysis to discover product purchase patterns.
● Regression: Predicts a continuous value based on input data.
Algorithm Example: Linear Regression, Polynomial Regression.
Application: House price prediction based on area, number of rooms, etc.

● Supervised Learning: A type of learning where the algorithm is trained on a labeled dataset.
● Unsupervised Learning: The algorithm identifies patterns in data that is not labeled.
● Overfitting: When a model learns the training data too well, including noise, making it less effective on new data.
● Feature Selection: The process of selecting the most important variables for model training.

Data Warehouse

A Data Warehouse (DW) is a central repository of integrated data from multiple sources. It is used to store historical data, allowing businesses to perform analysis, reporting, and data mining to make strategic decisions.

Key Characteristics
● Subject-Oriented: Organized around key subjects such as sales, customers, or inventory.
● Integrated: Combines data from different sources into a consistent format.
● Time-Variant: Stores historical data that can be analyzed over time.
● Non-Volatile: Once entered, data is not updated or deleted, making it stable for analysis.

Importance of Data Warehouse
● Facilitates data analysis and decision-making.
● Supports business intelligence (BI) and reporting.
● Helps with trend analysis, forecasting, and performance metrics.

Data warehousing is crucial for organizations seeking to leverage data for strategic decision-making. By implementing a well-structured data warehouse, businesses can gain valuable insights, improve efficiency, and enhance customer experiences across various industries.
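The support and confidence arithmetic behind association rule learning (the quantities Apriori prunes on) can be sketched in a few lines of plain Python. The basket data is made up for illustration:

```python
# Toy market-basket transactions (made-up data for illustration).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

# Confidence of the rule {bread} -> {milk}: how often milk appears
# among the baskets that contain bread.
conf = support({"bread", "milk"}) / support({"bread"})
print(support({"bread"}), round(conf, 3))  # 0.75 0.667
```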

Data Warehouse Architecture

● Data Source Layer
● ETL Process (Extract, Transform, Load)
● Presentation Layer

● What is the role of metadata in a data warehouse? It describes the source, structure, and meaning of the data.
● What is the primary purpose of a data warehouse? To provide a central repository for integrated data analysis.
● How is a data warehouse different from a regular database? It stores historical data and is used for analytics.
● What is a key characteristic of a data warehouse? Time-variant and non-volatile.
● In the context of data warehousing, what is a data mart? A smaller, more focused subset of a data warehouse.
● In the ETL process, what is the primary purpose of the "Load" phase? To move data from external sources to the warehouse.

ADDITIONALS: RDBMS constraints

Not Null: Ensures that a column does not have a null value, i.e. an empty cell.
Unique: Ensures that every value in a column is unique.
Check: Ensures that every entry in a column or row satisfies a specified condition.
Primary Key: Identifies each row of the table. The primary key must be unique and must not contain null values. One table can contain only one primary key. It links two or more tables.
Foreign Key: Links two tables. It is a field in one table that refers to a primary key located in another table. In other words, it creates a relation between two tables: tables can link with each other using these two keys.
Data Integrity: Ensures that data is validated before it is created. It is checked through the following categories.
Entity Integrity: Ensures that there is no duplicate row in the table.
Referential Integrity: Ensures that rows that are linked with other tables cannot be deleted.
Domain Integrity: Ensures that data entered into a table satisfies specific conditions, such as file format or range of values.
User-Defined Integrity: Ensures that data in the table satisfies user-defined conditions.
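A sketch of the engine enforcing these constraints, using sqlite3 with illustrative tables (note that SQLite enforces foreign keys only when the pragma is enabled; other RDBMSs enforce them by default):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK checks
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    qty         INTEGER CHECK (qty > 0)
);
INSERT INTO customers VALUES (1, 'a@example.com');
""")

bad_statements = [
    "INSERT INTO customers VALUES (2, NULL)",             # NOT NULL violated
    "INSERT INTO customers VALUES (3, 'a@example.com')",  # UNIQUE violated
    "INSERT INTO orders VALUES (1, 99, 1)",               # referential integrity violated
    "INSERT INTO orders VALUES (1, 1, 0)",                # CHECK violated
]
violations = 0
for stmt in bad_statements:
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError:
        violations += 1
print(violations)  # 4
```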
Data Definition Language (DDL)
DDL is a subset of SQL (Structured Query
Language) used to define and manage the
structure of a database. It includes
commands to create, modify, and delete
database objects such as tables, indexes,
views, and schemas. DDL statements
define the data types of columns, specify
constraints, and establish relationships
between tables.
● CREATE, ALTER, DROP, TRUNCATE
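A brief sketch of CREATE, ALTER, and DROP against an illustrative table (SQLite, used here via Python, has no TRUNCATE statement; `DELETE FROM t` plays that role there):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")  # CREATE
conn.execute("ALTER TABLE products ADD COLUMN price REAL")                 # ALTER
cols = [row[1] for row in conn.execute("PRAGMA table_info(products)")]
print(cols)  # ['id', 'name', 'price']
conn.execute("DROP TABLE products")                                        # DROP
```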

Data Manipulation Language (DML)

DML is another subset of SQL used to interact with the data stored in a database. It includes commands to insert, update, retrieve, and delete data from the database. DML statements allow users to work with the data stored in the database tables and apply changes as needed.

● SELECT, INSERT, UPDATE, DELETE
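The four DML commands in one pass, against an illustrative table via sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT, qty INTEGER)")

conn.execute("INSERT INTO inventory VALUES ('bolt', 100), ('nut', 50)")  # INSERT
conn.execute("UPDATE inventory SET qty = qty - 10 WHERE item = 'bolt'")  # UPDATE
conn.execute("DELETE FROM inventory WHERE item = 'nut'")                 # DELETE
rows = conn.execute("SELECT item, qty FROM inventory").fetchall()        # SELECT
print(rows)  # [('bolt', 90)]
```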

Structured Query Language (SQL)

SQL is a domain-specific language used for managing and querying relational databases. It encompasses both DDL and DML components. SQL enables users to define the structure of the database, manipulate data, retrieve information, and perform various database operations. It also includes transaction control commands:

● COMMIT, ROLLBACK, SAVEPOINT, SAVE TRANSACTION
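COMMIT and ROLLBACK behavior can be sketched with sqlite3, where `commit()` and `rollback()` issue the corresponding SQL. The account data is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, balance REAL)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100.0)")
conn.commit()    # COMMIT: make the changes permanent

conn.execute("UPDATE accounts SET balance = 0 WHERE name = 'alice'")
conn.rollback()  # ROLLBACK: undo the uncommitted update

balance = conn.execute("SELECT balance FROM accounts").fetchone()[0]
print(balance)  # 100.0
```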
