ADVANCED DATABASE CONCEPTS
1. Distributed Databases
• Concepts: A distributed database is a
collection of multiple interconnected
databases spread across different physical
locations. It is managed by a distributed
database management system (DDBMS),
which ensures that data is accessible from any
site within the distributed system.
• Advantages:
o Data Distribution: Data can be placed at
the sites that use it most, rather than on
one central server.
o Improved Performance: Localized data
access can reduce the load on individual
servers and decrease latency.
o Scalability: Adding more nodes or
databases can enhance the system's
capacity.
o Reliability and Availability: Replication
and redundancy increase fault tolerance,
ensuring the system remains operational
even if some sites fail.
o Flexibility: It can be tailored to meet
specific organizational needs, including
geographic distribution and local
autonomy.
• Distributed Database Design:
o Fragmentation: Dividing a database into
smaller pieces or fragments that can be
distributed across different locations.
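▪ Horizontal Fragmentation: Dividing a
table into subsets of rows, each with all
of the table's columns.
▪ Vertical Fragmentation: Dividing a
table into subsets of columns, each
keeping the primary key so that the
original rows can be rejoined.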
o Replication: Copying data fragments and
storing them in multiple locations to
improve reliability and availability.
o Allocation: Deciding where to place
fragments and replicas across the
distributed system based on factors like
network latency, access patterns, and
resource availability.
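The design steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not how a real DDBMS is implemented: the customers table, its column names, the helper functions, and the site names in the allocation map are all hypothetical.

```python
# Hypothetical customer table; column names and values are illustrative only.
customers = [
    {"id": 1, "name": "Asha", "region": "EU", "balance": 120.0},
    {"id": 2, "name": "Ben", "region": "US", "balance": 75.5},
    {"id": 3, "name": "Chen", "region": "EU", "balance": 310.0},
]

def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate (subset of rows, all columns)."""
    return [dict(row) for row in rows if predicate(row)]

def vertical_fragment(rows, columns, key="id"):
    """Keep a subset of columns, always including the key so rows can be rejoined."""
    keep = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in keep} for row in rows]

def rejoin(left, right, key="id"):
    """Rebuild full rows from two vertical fragments by joining on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]

# Horizontal fragments: one per region.
eu_rows = horizontal_fragment(customers, lambda r: r["region"] == "EU")
us_rows = horizontal_fragment(customers, lambda r: r["region"] == "US")

# Vertical fragments: profile columns and financial columns kept separately.
profile = vertical_fragment(customers, ["name", "region"])
finance = vertical_fragment(customers, ["balance"])

# Replication and allocation: a hypothetical map from fragment to the sites holding it.
allocation = {
    "eu_rows": ["site_frankfurt", "site_dublin"],  # replicated at two sites
    "us_rows": ["site_virginia"],                  # stored at one site
}

# The union of the horizontal fragments and the join of the vertical
# fragments should each recover the original table.
union = sorted(eu_rows + us_rows, key=lambda r: r["id"])
joined = rejoin(profile, finance)
```

A correct fragmentation loses no data: here both union and joined contain the same rows as customers.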
2. NoSQL Databases
• Introduction to NoSQL: NoSQL (Not Only
SQL) databases are designed to handle
unstructured or semi-structured data and
scale horizontally across many servers. They
are particularly well-suited for large-scale
data storage and real-time web applications.
• Types of NoSQL Databases:
o Document Stores: Store data as
documents, usually in JSON or BSON
format. Each document can have a
unique structure, allowing flexibility.
▪ Examples: MongoDB, Couchbase.
o Key-Value Stores: Store data as key-value
pairs, where each key is unique, and the
value can be any data type.
▪ Examples: Redis, DynamoDB.
o Column-Family Stores: Group related
columns into column families under a row
key and store only the columns each row
actually has, which allows for efficient
querying and storage of sparse data.
▪ Examples: Apache Cassandra, HBase.
o Graph Databases: Designed to store and
query data in the form of graphs, with
nodes, edges, and properties. They are
ideal for applications involving complex
relationships.
▪ Examples: Neo4j, Amazon Neptune.
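The four data models above can be compared side by side using plain Python structures. This is only a rough sketch of the shape of the data in each model, not the API of MongoDB, Redis, Cassandra, or Neo4j; every key, field, and function name below is a hypothetical example.

```python
# Document store: each document is a nested, JSON-like record,
# and documents in the same collection may have different fields.
documents = {
    "u1": {"name": "Asha", "emails": ["asha@example.com"]},
    "u2": {"name": "Ben", "address": {"city": "Austin"}},
}

# Key-value store: an opaque value looked up by a unique key.
key_value = {"session:42": b"serialized-session-bytes"}

# Column-family store: row key -> column family -> only the columns that row has.
column_family = {
    "u1": {"profile": {"name": "Asha"}, "activity": {"last_login": "2024-01-05"}},
    "u2": {"profile": {"name": "Ben"}},  # sparse: no activity columns stored
}

# Graph database: nodes and edges, each of which can carry properties.
nodes = {
    "u1": {"label": "User", "name": "Asha"},
    "u2": {"label": "User", "name": "Ben"},
    "u3": {"label": "User", "name": "Chen"},
}
edges = [
    ("u1", "FOLLOWS", "u2", {"since": 2021}),
    ("u2", "FOLLOWS", "u3", {"since": 2023}),
]

def neighbors(node_id, relation):
    """Return the nodes reachable from node_id over one edge of the given type."""
    return [dst for src, rel, dst, _props in edges
            if src == node_id and rel == relation]

def two_hop(node_id, relation="FOLLOWS"):
    """Return nodes exactly two edges away, the kind of query graphs make easy."""
    result = []
    for middle in neighbors(node_id, relation):
        for dst in neighbors(middle, relation):
            if dst != node_id and dst not in result:
                result.append(dst)
    return result
```

In the relational model the two-hop query would need a self-join on a relationship table; in the graph model it is a direct traversal of edges.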
3. Data Warehousing and Data Mining
• Concepts:
o Data Warehousing: A data warehouse is
a centralized repository for storing large
volumes of structured data from multiple
sources. It supports business intelligence
activities like querying, reporting, and
data analysis.
o Data Mining: The process of discovering
patterns, correlations, and anomalies
within large datasets to predict outcomes
or extract useful information.
• Architecture:
o Data Sources: Raw data is collected from
various operational databases, flat files,
and external sources.
o ETL Process (Extract, Transform, Load):
Data is extracted from source systems,
transformed into a suitable format, and
loaded into the data warehouse.
o Data Warehouse: Organized into fact and
dimension tables, typically following a
star or snowflake schema.
o OLAP (Online Analytical Processing):
Allows users to analyze data by providing
multi-dimensional views of data and
supporting complex queries.
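The ETL step and the star schema described above can be sketched with Python's built-in sqlite3 module. This is a small illustration under assumed names: the raw_sales records, the dim_product, dim_date, and fact_sales tables, and the transform and load helpers are hypothetical and stand in for a real warehouse pipeline.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from an operational system.
raw_sales = [
    {"date": "2024-01-05", "product": " Laptop ", "amount": "1200.50"},
    {"date": "2024-01-05", "product": "mouse", "amount": "25.00"},
    {"date": "2024-02-10", "product": "LAPTOP", "amount": "1150.00"},
]

def transform(record):
    """Clean one raw record: normalize the product name and parse the amount."""
    return {
        "date": record["date"],
        "month": record["date"][:7],
        "product": record["product"].strip().lower(),
        "amount": float(record["amount"]),
    }

# A tiny star schema: one fact table referencing two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, date TEXT UNIQUE, month TEXT);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );
""")

def load(conn, row):
    """Insert dimension rows if they are new, then insert the fact row."""
    conn.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)",
                 (row["product"],))
    conn.execute("INSERT OR IGNORE INTO dim_date (date, month) VALUES (?, ?)",
                 (row["date"], row["month"]))
    product_id = conn.execute(
        "SELECT product_id FROM dim_product WHERE name = ?",
        (row["product"],)).fetchone()[0]
    date_id = conn.execute(
        "SELECT date_id FROM dim_date WHERE date = ?",
        (row["date"],)).fetchone()[0]
    conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 (product_id, date_id, row["amount"]))

for record in raw_sales:           # extract
    load(conn, transform(record))  # transform, then load

# An analytical query joins the fact table to a dimension and aggregates.
totals = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.name ORDER BY p.name
""").fetchall()
```

The transform step is what lets " Laptop " and "LAPTOP" land on the same dimension row, so the totals query reports one laptop line instead of two.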
• Types of OLAP:
o MOLAP (Multidimensional OLAP): Data
is pre-aggregated in a multidimensional
cube, which allows for fast query
performance.
o ROLAP (Relational OLAP): Stores data in
standard relational tables and computes
aggregates at query time, typically with
SQL, which supports dynamic querying.
o HOLAP (Hybrid OLAP): Combines the
benefits of MOLAP and ROLAP by using a
combination of pre-aggregated cubes and
relational databases.
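The difference between pre-aggregation and query-time aggregation can be sketched in Python. This is a toy model under assumptions, not the storage format of any OLAP product: the facts list, the two dimensions, the "*" marker for a rolled-up dimension, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows with two dimensions and one measure.
facts = [
    {"region": "EU", "product": "laptop", "sales": 1200},
    {"region": "EU", "product": "mouse", "sales": 25},
    {"region": "US", "product": "laptop", "sales": 1150},
]
DIMENSIONS = ("region", "product")
ALL = "*"  # marker meaning "rolled up over every value of this dimension"

def build_cube(rows):
    """MOLAP-style: pre-aggregate sales for every combination of values and ALL."""
    cube = defaultdict(int)
    for row in rows:
        # Each dimension contributes either its actual value or ALL.
        choices = [(row[d], ALL) for d in DIMENSIONS]
        for cell in product(*choices):
            cube[cell] += row["sales"]
    return dict(cube)

def molap_query(cube, **filters):
    """Answer a query with a single lookup in the pre-aggregated cube."""
    cell = tuple(filters.get(d, ALL) for d in DIMENSIONS)
    return cube.get(cell, 0)

def rolap_query(rows, **filters):
    """ROLAP-style: scan the detail rows and aggregate at query time."""
    return sum(r["sales"] for r in rows
               if all(r[d] == v for d, v in filters.items()))

cube = build_cube(facts)
eu_total_molap = molap_query(cube, region="EU")
eu_total_rolap = rolap_query(facts, region="EU")
```

Both queries return the same total, but the cube pays its cost up front when it is built, while the scan pays on every query. A hybrid design would keep the commonly used aggregates in the cube and fall back to the detail rows for everything else.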
• Data Mining Techniques:
o Classification: Assigning data into
predefined categories or classes.
▪ Examples: Decision Trees, Support
Vector Machines.
o Clustering: Grouping data into clusters
based on similarity without predefined
categories.
▪ Examples: K-Means, Hierarchical
Clustering.
o Association Rule Learning: Discovering
interesting relationships or associations
between variables in large datasets.
▪ Examples: Apriori Algorithm,
FP-Growth.
o Anomaly Detection: Identifying unusual
patterns that do not conform to expected
behavior.
▪ Examples: Isolation Forest, DBSCAN (a
density-based clustering method whose
noise points can be treated as anomalies).
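As one concrete instance of the techniques above, clustering with K-Means can be sketched in plain Python. This is a minimal version of the standard assign-and-update loop (Lloyd's algorithm) for 2-D points; the sample points and the choice of starting centroids are made up for illustration, and a real project would more likely use a library implementation.

```python
def kmeans(points, centroids, max_iters=100):
    """Cluster 2-D points with k-means, starting from the given centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                              + (p[1] - centroids[i][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(old)  # leave an empty cluster where it is
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Start from one point in each group; real runs usually pick starts at random.
labels, centroids = kmeans(points, [points[0], points[3]])
```

No categories are supplied in advance: the groups emerge from distances alone, which is what distinguishes clustering from classification.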