CATHOLIC DIOCESE OF NSUKKA
ST CHARLES COLLEGE, OPI
COURSE: DATA PROCESSING
TERM: SECOND
CLASS: SS 3
Instructor: Fr. Maximillian ONOYIMA
Email: [email protected]
Phone: 08135778491
Topic: PARALLEL DATABASE
A parallel database is a system designed to enhance processing speed and input/output
operations by utilizing multiple CPUs and disks simultaneously. Unlike traditional databases that
process tasks sequentially (one after the other), parallel databases perform many tasks at the
same time. This approach significantly improves efficiency, especially when dealing with large
datasets or serving a large number of users. Organizations of all sizes benefit from parallel
databases because they enable better management of information, faster query responses, and
improved scalability. For instance, social media platforms like Instagram or TikTok rely on
parallel databases to handle millions of users uploading, viewing, and interacting with content
simultaneously.
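The difference between sequential and parallel processing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the like-counts are made-up data, the function name `count_popular` is hypothetical, and Python threads stand in for the separate CPUs and disks of a real parallel database.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: like-counts of posts, already split across four "disks".
partitions = [
    [12, 250, 7, 980],
    [45, 3, 1500, 60],
    [300, 8, 19, 720],
    [5, 640, 33, 110],
]

def count_popular(partition, threshold=100):
    """Count the posts in one partition with more than `threshold` likes."""
    return sum(1 for likes in partition if likes > threshold)

# Sequential: a single worker scans every partition, one after the other.
sequential_total = sum(count_popular(p) for p in partitions)

# Parallel: one worker per partition; the partial counts are added at the end.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    parallel_total = sum(pool.map(count_popular, partitions))

print(sequential_total, parallel_total)  # both approaches give the same answer: 7 7
```

Both approaches return the same result; the parallel version simply divides the scanning work among several workers.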
Architectures of Parallel Databases
There are three primary architectures used to build parallel database management systems
(DBMS): Shared Memory, Shared Disk, and Shared Nothing. Each architecture has its own
advantages and disadvantages, making them suitable for different scenarios.
1. Shared Memory System
In a Shared Memory System, multiple processors are connected through an interconnection network
and share a common region of memory. This setup is similar to a group of students working on
a project using a single whiteboard. All team members can see and modify the information on
the whiteboard, making collaboration straightforward.
Advantages:
Ease of Programming: This architecture is closer to conventional computing systems,
making it easier to program.
Low Overhead: The system has minimal complexity in managing resources.
Efficient Use of OS Services: The operating system can efficiently utilize additional CPUs.
Disadvantages:
Bottleneck Problem: If too many processors try to access the shared memory
simultaneously, it can slow down the system.
High Cost: Building such a system is expensive due to the need for specialized hardware.
Less Sensitive to Partitioning: Every processor reaches all data through the shared
memory, so the system gains little from careful data partitioning.
2. Shared Disk System
In a Shared Disk System, each processor has its own main memory but shares access to all disks
through an interconnection network. Imagine a library where each student has their own
notebook (memory) but shares access to the same bookshelves (disks). This setup allows for
some independence while still relying on shared resources.
Advantages:
Simplicity: It shares some of the simplicity of shared memory systems.
Disadvantages:
Interference: Processors may interfere with each other when accessing the same disk,
leading to delays.
High Network Bandwidth Requirement: The system needs a robust network to handle
the increased traffic.
Less Sensitive to Partitioning: As in shared memory systems, every processor can reach
all the data, so careful partitioning brings little extra benefit.
3. Shared Nothing System
In a Shared Nothing System, each processor has its own local main memory and disk space. No
two processors can access the same storage area, and all communication between processors
occurs through a network connection. Think of a group of students working on individual
laptops. Each student has their own files and can only share information by sending emails or
messages.
Advantages:
Scalability: The system can easily grow by adding more processors.
Efficient Partitioning: This architecture benefits from good data partitioning, which
improves performance.
Cost-Effective: It is cheaper to build compared to shared memory systems.
Disadvantages:
Complex Programming: Managing communication between processors is more
challenging.
Reorganization Required: Adding new nodes (processors) often requires reorganizing
the system.
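The Shared Nothing idea can be sketched as a small simulation. This is a minimal sketch, not a real DBMS: the classes `Node` and `Cluster`, the message format, and the rule `key % number_of_nodes` for choosing the owner of a row are all assumptions made for illustration.

```python
class Node:
    """One processor with its own private memory and disk (a dict here)."""

    def __init__(self, name):
        self.name = name
        self._local_rows = {}  # private storage: no other node reads or writes it

    def handle(self, message):
        """Process one message received over the (simulated) network."""
        kind, key, value = message
        if kind == "put":
            self._local_rows[key] = value
            return "ok"
        if kind == "get":
            return self._local_rows.get(key)
        raise ValueError(f"unknown message kind: {kind}")


class Cluster:
    """Routes each request to the single node that owns the key."""

    def __init__(self, node_count):
        self.nodes = [Node(f"node{i}") for i in range(node_count)]

    def _owner(self, key):
        # Assumed placement rule: integer key modulo the number of nodes.
        return self.nodes[key % len(self.nodes)]

    def put(self, key, value):
        return self._owner(key).handle(("put", key, value))

    def get(self, key):
        return self._owner(key).handle(("get", key, None))


cluster = Cluster(3)
cluster.put(10, "Ada")    # 10 % 3 == 1, so the row is stored on node1
cluster.put(11, "Chidi")  # 11 % 3 == 2, so the row is stored on node2
print(cluster.get(10))    # Ada
print(cluster.get(11))    # Chidi
```

In this sketch, adding a fourth node would change the placement rule to `key % 4`, so many existing rows would have to move to a different node. That is one way to picture the "Reorganization Required" disadvantage.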
Parallel Query Evaluation
In a parallel database, queries are executed using a relational query execution plan, which is
essentially a graph or tree of relational algebra operators. These operators can execute in
parallel, meaning multiple operations can happen simultaneously. For example, if one operator
consumes the output of another operator, the consumer can start working on the first rows while
the producer is still generating the rest. This is known as pipelined parallelism.
Practical Example: Imagine a factory assembly line where one worker passes their finished
product to the next worker for further processing. Each worker performs a specific task, and the
entire process is faster because multiple tasks are happening at the same time.
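The assembly line can be sketched with Python generators, where each operator hands a row to the next operator as soon as it is ready. The operator names (`scan`, `select`, `project`) and the student records are assumptions for illustration. Generators run in a single thread, so this shows the row-by-row handoff of a pipeline rather than true simultaneous execution on several processors.

```python
def scan(rows):
    """Produce the rows of a table one at a time."""
    for row in rows:
        yield row

def select(rows, predicate):
    """Pass on only the rows that satisfy the condition."""
    for row in rows:
        if predicate(row):
            yield row

def project(rows, columns):
    """Keep only the requested columns of each row."""
    for row in rows:
        yield {c: row[c] for c in columns}

# Hypothetical table of students.
students = [
    {"name": "Ada", "class": "SS 3", "score": 81},
    {"name": "Obi", "class": "SS 2", "score": 67},
    {"name": "Ngozi", "class": "SS 3", "score": 74},
]

# Query plan: scan -> select (class = SS 3) -> project (name, score).
plan = project(
    select(scan(students), lambda r: r["class"] == "SS 3"),
    ["name", "score"],
)
result = list(plan)
print(result)
# [{'name': 'Ada', 'score': 81}, {'name': 'Ngozi', 'score': 74}]
```

No operator waits for the whole table: each row flows through all three steps before the next row is read.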
Data Partitioning
To make parallel databases efficient, large databases are divided into smaller parts and stored
across multiple disks. This process is called data partitioning. There are three main ways to
partition data:
1. Round-Robin Partitioning
In Round-Robin Partitioning, data is distributed evenly across all processors in a circular
manner. For example, if there are four processors, the first piece of data goes to Processor 1,
the second to Processor 2, and so on, repeating the cycle. This method is ideal for queries that
need to access the entire database.
Practical Example: Imagine distributing candies to a group of friends by giving one candy to
each friend in turn. This ensures everyone gets an equal share.
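Round-robin distribution can be sketched as follows. The function name `round_robin_partition` and the row labels are assumptions for illustration; processors are numbered from 0 in the code.

```python
def round_robin_partition(rows, processor_count):
    """Give row i to processor i mod processor_count, cycling through them."""
    partitions = [[] for _ in range(processor_count)]
    for i, row in enumerate(rows):
        partitions[i % processor_count].append(row)
    return partitions

rows = [f"row{i}" for i in range(1, 11)]  # row1 ... row10
parts = round_robin_partition(rows, 4)
for number, part in enumerate(parts):
    print(number, part)
# 0 ['row1', 'row5', 'row9']
# 1 ['row2', 'row6', 'row10']
# 2 ['row3', 'row7']
# 3 ['row4', 'row8']
```

The partition sizes never differ by more than one row, which is why a full scan of the database is shared almost equally among the processors.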
2. Hash Partitioning
In Hash Partitioning, a hash function (a mathematical formula) is applied to specific fields of a
tuple to determine which processor it should be assigned to. With a well-chosen hash function,
this method keeps data evenly distributed, even as the database grows or shrinks over time.
Practical Example: Think of a game where each player is assigned to a team based on their birth
month. The rule "birth month decides the team" plays the role of the hash function, so players
with the same birth month always land on the same team.
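Hash partitioning can be sketched as follows. The hash function used here, the integer key modulo the number of processors, is an assumed stand-in chosen because it is easy to check by hand; real systems use more elaborate hash functions. The player records and the function name `hash_partition` are also hypothetical.

```python
def hash_partition(rows, key, processor_count):
    """Assign each row to the processor chosen by hashing one of its fields."""
    partitions = [[] for _ in range(processor_count)]
    for row in rows:
        index = row[key] % processor_count  # assumed hash function: simple modulo
        partitions[index].append(row)
    return partitions

# Hypothetical player records.
players = [
    {"id": 101, "name": "Ada"},
    {"id": 102, "name": "Obi"},
    {"id": 103, "name": "Ngozi"},
    {"id": 104, "name": "Emeka"},
    {"id": 105, "name": "Chidi"},
    {"id": 106, "name": "Uche"},
]

parts = hash_partition(players, "id", 3)
ids_per_processor = [[row["id"] for row in part] for part in parts]
print(ids_per_processor)  # [[102, 105], [103, 106], [101, 104]]
```

To find the player with id 104, the system computes `104 % 3 == 2` and searches only processor 2 instead of all three.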
3. Range Partitioning
In Range Partitioning, data is sorted and divided into ranges (e.g., A–D, E–H, etc.), and each
range is assigned to a processor. This method is useful for queries that need a specific range of
data.
Practical Example: Imagine organizing books in a library by their titles (A–D on one shelf, E–H on
another, etc.). This makes it easier to find books within a specific range.
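Range partitioning can be sketched as follows. The boundaries (A to D, E to H, and everything after H), the function name `range_partition`, and the sample book titles are assumptions for illustration; the standard-library function `bisect_left` finds which range a letter falls into.

```python
from bisect import bisect_left

# Assumed ranges: A-D -> partition 0, E-H -> partition 1, I-Z -> partition 2.
# Only the last letter of each range except the final one needs to be listed.
UPPER_BOUNDS = ["D", "H"]

def range_partition(titles, upper_bounds):
    """Place each title in the partition whose range covers its first letter."""
    partitions = [[] for _ in range(len(upper_bounds) + 1)]
    for title in titles:
        first_letter = title[0].upper()
        index = bisect_left(upper_bounds, first_letter)
        partitions[index].append(title)
    return partitions

titles = [
    "Chike and the River",
    "Half of a Yellow Sun",
    "Things Fall Apart",
    "Arrow of God",
    "Efuru",
]

parts = range_partition(titles, UPPER_BOUNDS)
for number, part in enumerate(parts):
    print(number, part)
# 0 ['Chike and the River', 'Arrow of God']
# 1 ['Half of a Yellow Sun', 'Efuru']
# 2 ['Things Fall Apart']
```

A query for titles beginning with E to H needs to look at partition 1 only, which is the main benefit of this method.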
Advantages of Parallel Databases
1. High Performance: Parallel databases can handle large amounts of data quickly, making
them ideal for applications like online gaming or e-commerce.
2. Speed: By performing multiple tasks simultaneously, parallel databases significantly
reduce processing time.
3. Reliability: Because data and work are spread across several processors and disks, the
failure of one component is less likely to stop the whole system.
4. Capacity: These systems can store and manage massive datasets, making them suitable
for big data applications.
Disadvantages of Parallel Databases
1. High Implementation Cost: Building and maintaining a parallel database requires
significant hardware and software resources.
2. Complexity: Managing and maintaining the system is challenging, requiring specialized
knowledge.
3. Resource-Intensive: Parallel databases need ongoing support and maintenance, which
can be costly.
Practical Applications for Teens
1. Social Media Platforms: Platforms like Instagram or TikTok use parallel databases to
handle millions of users uploading, viewing, and interacting with content simultaneously.
2. Online Gaming: Games like Fortnite or Minecraft rely on parallel databases to manage
player data, scores, and interactions in real-time.
3. E-commerce Websites: Websites like Amazon use parallel databases to process millions
of product searches, orders, and payments at the same time.
Conclusion
Parallel databases are a powerful tool for managing large datasets and serving a large user base.
By using multiple CPUs and disks simultaneously, they offer high performance, speed, and
reliability. However, they also come with challenges, such as high implementation costs and
complexity. Understanding the different architectures and partitioning methods is crucial for
designing efficient parallel database systems. As technology continues to evolve, parallel
databases will play an increasingly important role in applications like social media, online
gaming, and e-commerce.