Parallel Processing
Parallel processing is the ability to carry out multiple operations or tasks
simultaneously. It is the simultaneous use of more than one CPU or processor core to
execute a program or multiple computational threads.
“Parallel computing is a form of computation in which many calculations are carried
out simultaneously, operating on the principle that large problems can often be divided
into smaller ones, which are then solved concurrently.”
Types of Parallelism
Bit-level parallelism
Instruction-level parallelism
Data parallelism
Task parallelism (data and task parallelism are illustrated in the sketch below)
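As a rough illustration of the last two categories, here is a small sketch in Python using
the standard concurrent.futures module; the square and count_words functions are made-up
examples, not taken from any particular application. Applying the same operation to
different pieces of data is data parallelism; running two unrelated functions at the same
time is task parallelism.

from concurrent.futures import ProcessPoolExecutor

def square(n):
    # The same operation applied to each piece of data.
    return n * n

def count_words(text):
    # A different, independent task.
    return len(text.split())

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # Data parallelism: one function runs over many data items,
        # and the items are spread across the worker processes.
        squares = list(pool.map(square, range(10)))

        # Task parallelism: two unrelated tasks run at the same time
        # on different processors.
        f1 = pool.submit(square, 12)
        f2 = pool.submit(count_words, "parallel processing divides work")
        print(squares, f1.result(), f2.result())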
Basic requirements of parallel computing:
Computer hardware that is designed to work with multiple processors and that
provides a means of communication between those processors
Operating system that is capable of managing multiple processors
Application software that is capable of breaking large tasks into multiple smaller tasks
that can be performed in parallel
Classes of parallel computers
Parallel computers can be roughly classified according to the level at which the hardware
supports parallelism. This classification is broadly analogous to the distance between
basic computing nodes. These are not mutually exclusive; for example, clusters of
symmetric multiprocessors are relatively common.
Multicore computing
A multicore processor is a processor that includes multiple execution units ("cores") on
the same chip.
Symmetric multiprocessing
A symmetric multiprocessor (SMP) is a computer system with multiple identical
processors that share memory and connect via a bus.
Distributed computing
A distributed computer (also known as a distributed-memory multiprocessor) is a
computer system in which the processing elements have their own local memory and are
connected by a network.
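Because the processing elements in such a system do not share memory, they cooperate by
exchanging messages over the network. The sketch below only mimics that style on a single
machine, using Python's multiprocessing module with a Pipe standing in for the network
link; a real distributed computer would use an actual network and typically a
message-passing library such as MPI.

from multiprocessing import Process, Pipe

def worker(conn):
    # The worker has its own private memory; the only way to receive
    # work or return results is through the message channel.
    numbers = conn.recv()          # receive a message with the input data
    conn.send(sum(numbers))        # send the partial result back
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()   # stand-in for a network connection
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send([1, 2, 3, 4, 5])  # message to the remote processing element
    print("partial sum received:", parent_conn.recv())
    p.join()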
Specialized parallel computers
Within parallel computing, there are specialized parallel devices that remain niche areas
of interest. While not domain-specific, they tend to be applicable to only a few classes of
parallel problems.
Parallel computing has become the dominant paradigm in computer architecture, mainly
in the form of multicore processors.
Parallel computer programs are more difficult to write than sequential ones, because
concurrency introduces several new classes of potential software bugs, of which race
conditions are the most common. Communication and synchronization between the
different subtasks are typically among the greatest obstacles to getting good parallel
program performance.
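A minimal sketch of the most common such bug, using only Python's standard threading
module: two threads increment a shared counter. Without a lock, the read-modify-write
steps can interleave and updates are lost; with a lock, the update becomes a critical
section, which also shows why synchronization costs performance.

import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n):
    # counter += 1 is a read-modify-write sequence, so two threads can
    # interleave and overwrite each other's update (a race condition).
    global counter
    for _ in range(n):
        counter += 1

def safe_increment(n):
    # The lock turns the update into a critical section; only one thread
    # at a time may execute it, at the cost of synchronization overhead.
    global counter
    for _ in range(n):
        with lock:
            counter += 1

def run(target):
    global counter
    counter = 0
    threads = [threading.Thread(target=target, args=(100_000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print("without lock:", run(unsafe_increment))   # often less than 200000
    print("with lock:   ", run(safe_increment))     # always 200000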
Parallel processing involves taking a large task, dividing it into several smaller tasks, and
then working on each of those smaller tasks simultaneously. The goal of this divide-and-
conquer approach is to complete the larger task in less time than it would have taken to
do it in one large chunk.
Taxonomy of Architectures
MAIN GOAL:
Reduce wall-clock time (speedup)
Need for parallel processing.
Computers were invented to solve problems faster than a human being
could. Since day one, people have wanted computers to do more and to do it faster.
Vendors responded with improved circuitry design for the processor, improved
instruction sets, and improved algorithms to meet the demand for faster response time.
Advances in engineering made it possible to add more logic circuits to processors.
Processor circuit designs developed from small-scale to medium-scale integration, and
then to large-scale and very large-scale integration. Some of today's processors have
billions of transistors in them. The clock cycle of processors has also been reduced over
the years. Some of today's processors have a clock cycle on the order of nanoseconds,
and CPU frequencies have crossed the one-gigahertz barrier. All of these advances have
led to processors that can do more work faster than ever before.
However, there are physical limitations on this trend of constant
improvement. The processing speed of processors depends on the transmission speed of
information between the electronic components within the processor. As improvements
in clock speed and circuit design began to reach their practical limits, hardware designers
looked for other ways to increase performance. Parallelism is the result of those efforts.
Parallelism enables multiple processors to work simultaneously on several parts of a task
in order to complete it faster than could be done otherwise. Parallel processing not only
increases processing power, it also offers several other advantages when it's implemented
properly. These advantages are:
Higher throughput
More fault tolerance
Better price/performance
Parallel processing is useful only for applications that can break larger tasks into
smaller parallel tasks and that can manage the synchronization between those tasks. In
addition, the performance gain must be large enough to justify the overhead of
parallelism.
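A standard rule of thumb for judging whether the gain is large enough is Amdahl's law: if
a fraction p of the work can be run in parallel on n processors, the overall speedup is at
most 1 / ((1 - p) + p/n). The remaining serial fraction therefore caps the benefit; for
example, if 90% of a task is parallelizable, no number of processors can speed it up by
more than a factor of 10.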
Parallel Hardware Architectures
Symmetric Multiprocessing (SMP) systems
Massively Parallel Processing (MPP) systems
Clustered systems
Non-Uniform Memory Access (NUMA) systems
Conclusions
Parallel processing can significantly reduce wall-clock time.
Writing and debugging parallel software is more complicated.
Tools for automatic parallelization are evolving, but a skilled human programmer can still
do much better.
The overhead of parallelism consumes additional CPU time.
You need to decide which architecture is most appropriate for a given application.
Applications
1. Parallel Processing for Databases
Types of Parallelism in Databases
Database applications can exploit two types of parallelism in a parallel computing
environment: inter-query parallelism and intra-query parallelism. While inter-query
parallelism has been around for many years, database vendors have only more recently
started to implement intra-query parallelism as well.
Inter-query parallelism
Inter-query parallelism is the ability to use multiple processors to execute several
independent queries simultaneously. Figure 1-4 illustrates inter-query parallelism, showing
how three independent queries can be performed simultaneously by three processors.
Inter-query parallelism does not speed up any individual query, because each query is still
executed by only one processor. In online transaction processing (OLTP) applications, each query is
independent and takes a relatively short time to execute. As the number of OLTP users
increases, more queries are generated. Without inter-query parallelism, all queries will be
performed by a single processor in a time-shared manner. This slows down response
time. With inter-query parallelism, queries generated by OLTP users can be distributed
over multiple processors. Since the queries are performed simultaneously by multiple
processors, response time remains satisfactory.
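As a rough sketch of the idea, the following Python fragment uses a process pool to run
three independent queries at once; run_query is a hypothetical stand-in for submitting one
query to a database, and the queries and sleep time are invented for illustration. Note
that each query is still handled by a single worker.

from concurrent.futures import ProcessPoolExecutor
import time

def run_query(query):
    # Hypothetical stand-in for executing one independent query;
    # each query is still handled by a single processor.
    time.sleep(0.1)                      # pretend the query takes some time
    return f"result of {query!r}"

if __name__ == "__main__":
    queries = [
        "SELECT balance FROM accounts WHERE id = 1",
        "UPDATE orders SET status = 'shipped' WHERE id = 42",
        "SELECT COUNT(*) FROM sessions WHERE active = 1",
    ]
    # Inter-query parallelism: three independent queries, three workers.
    with ProcessPoolExecutor(max_workers=3) as pool:
        for result in pool.map(run_query, queries):
            print(result)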
Intra-query parallelism
Intra-query parallelism is the ability to break a single query into subtasks and to execute
those subtasks in parallel using a different processor for each. The result is a decrease in
the overall elapsed time needed to execute a single query. Intra-query parallelism is very
beneficial in decision support system (DSS) applications, which often have complex,
long-running queries. As DSS systems have become more widely used, database vendors
have been increasing their support for intra-query parallelism.
Figure 1-4. Inter-query parallelism
Figure 1-5. Intra-query parallelism
Figure 1-5 shows how one large query may be decomposed into two subtasks, which are
then executed simultaneously using two processors. The results of the subtasks are then
merged to generate a result for the original query. Intra-query parallelism is useful not
only with queries, but also with other tasks such as data loading, index creation, and so
on. Oracle, for example, provides support for intra-query parallelism.
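A rough sketch of the same idea in Python: one aggregate query is broken into
per-partition subtasks whose partial results are merged at the end. The "table" here is
just an in-memory list and the partitioning is simplified; this is not how any particular
database implements it.

from concurrent.futures import ProcessPoolExecutor

def partial_sum(rows):
    # Subtask: aggregate one partition of the table.
    return sum(rows)

if __name__ == "__main__":
    # Stand-in for "SELECT SUM(amount) FROM sales" over a large table.
    sales = list(range(1_000_000))
    n_workers = 4
    chunk = len(sales) // n_workers
    partitions = [sales[i * chunk:(i + 1) * chunk] for i in range(n_workers - 1)]
    partitions.append(sales[(n_workers - 1) * chunk:])  # last partition takes the remainder

    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        partial_results = pool.map(partial_sum, partitions)

    # Merge step: combine the subtask results into the final answer.
    print("SUM(amount) =", sum(partial_results))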
2. Weather Forecasting
Weather forecasting is a classic example of parallel processing. Weather satellites
collect millions of bytes of data per second on the condition of the earth's atmosphere,
the formation of clouds, wind intensity and direction, temperature, and so on. This huge
amount of data must be processed by complex algorithms to arrive at a proper forecast.
Thousands of iterations of computation may be needed to interpret this environmental
data. Parallel computers are used to perform these computations in a timely manner so
that a weather forecast can be generated early enough for it to be helpful.
3. Robotics Application
Parallel processing is an integral part of many robotic applications. The basic
capabilities an autonomous system needs to support its activities are sensing, planning,
and acting. These capabilities enable a robot to act safely in its environment and to
accomplish a given task. In dynamic environments, the necessary adaptation of the robot's
actions is provided by closed control loops comprising sensing, planning, and acting.
Unfortunately, this control loop often cannot be closed in dynamic environments because
of the long execution times of the individual components: the time intervals of individual
iterations become too large for sound integration of the components into a single control
loop. To pursue this approach, a reduction in runtime is required, and parallel computing
provides it. Industry, too, uses parallel processing in various areas of robotics.