2018
Traditional parallel data-processing algorithms partition the data to be processed (according to some criterion such as the number of cores or the processing speed) and then process each part separately on a different core (or processor). Partitioning takes considerable time and is not always an optimal solution; it cannot minimize the idle time of the cores, and it is not always possible to find an optimal partition (or to change the algorithm during execution). The algorithms we propose follow the principle that the cores should process in parallel while their idle time is minimized. The article reviews two algorithms built on this principle: "smart-delay" and a development of the transposed ribbon-like matrix multiplication algorithm.
2016
With the high demands of gene sequencing in scientific research, it is essential to make the sequencing process faster. One of the main sequencing operations is performed with the Smith-Waterman algorithm, whose matrix elements are conventionally evaluated in two ways: (i) sequential processing and (ii) conventional parallel processing. This work considers both of these approaches and develops a new one so that the main objectives are met: (a) less time to compute the matrix elements and (b) better utilization of the processors. Although conventional parallel processing, O(2m), is faster than sequential processing, O(mn), this work attempts to reduce the running time still further, to the extent of O(m), by introducing a cross-diagonal, element-wise parallel processing approach. The work also covers processor optimization for the conventional parallel and ...
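The cross-diagonal idea is that all cells on one anti-diagonal of the Smith-Waterman matrix depend only on previously computed diagonals, so they can be filled in parallel. Below is a minimal OpenMP sketch of that traversal; the scoring values (match = 2, mismatch = -1, gap = -1) and the example sequences are illustrative assumptions, not the paper's implementation.

```c
/* Minimal sketch: Smith-Waterman scoring by anti-diagonals.
 * Cells on one anti-diagonal depend only on earlier diagonals,
 * so the inner loop may run in parallel. Scoring values are
 * illustrative assumptions (match=2, mismatch=-1, gap=-1). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>

#define MATCH     2
#define MISMATCH -1
#define GAP      -1

static int max4(int a, int b, int c, int d) {
    int m = a > b ? a : b;
    m = m > c ? m : c;
    return m > d ? m : d;
}

int main(void) {
    const char *a = "GGTTGACTA";
    const char *b = "TGTTACGG";
    int m = (int)strlen(a), n = (int)strlen(b);
    int *H = calloc((size_t)(m + 1) * (n + 1), sizeof *H);

    /* Sweep anti-diagonals d = 2 .. m+n; each holds independent cells. */
    for (int d = 2; d <= m + n; d++) {
        int lo = d - n > 1 ? d - n : 1;
        int hi = d - 1 < m ? d - 1 : m;
        #pragma omp parallel for
        for (int i = lo; i <= hi; i++) {
            int j = d - i;
            int s = (a[i - 1] == b[j - 1]) ? MATCH : MISMATCH;
            H[i * (n + 1) + j] = max4(0,
                H[(i - 1) * (n + 1) + (j - 1)] + s,
                H[(i - 1) * (n + 1) + j] + GAP,
                H[i * (n + 1) + (j - 1)] + GAP);
        }
    }

    int best = 0;
    for (int k = 0; k < (m + 1) * (n + 1); k++)
        if (H[k] > best) best = H[k];
    printf("best local alignment score: %d\n", best);
    free(H);
    return 0;
}
```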
Parallel processing offers enhanced speed of execution and is facilitated by different approaches such as data parallelism and control parallelism. Graphics processing units provide faster execution thanks to dedicated hardware and tools. This paper presents two popular approaches and techniques, distributed computing and GPU computing, to assist a novice in parallel computing. The paper discusses the environment that needs to be set up for both approaches and, as a case study, demonstrates a matrix multiplication algorithm using a SIMD architecture.
Grid computing essentially builds a virtual supercomputer by connecting different machines remotely. Its major objective is resource sharing: combining different administrative domains to reach a common goal. In grid computing each node contributes resources to, and can obtain resources from, a shared pool. Unfortunately, grid computing has not gained a large place in academic circles because it is considered a hard-to-implement technology. Grids are constructed with general-purpose grid programming libraries called middleware. The same grid can be used for different applications, or a grid can be dedicated to some special application.
The main goal of this research is to use the OpenMP, POSIX Threads, and Microsoft Parallel Patterns libraries to design an algorithm that computes matrix multiplication effectively; with these libraries one can optimize the speedup of the algorithm. The first step is to write a simple program that calculates a predetermined matrix multiplication and outputs the result after compilation and execution; at this stage only a single processor core performs the computation. The OpenMP, POSIX Threads, and Microsoft Parallel Patterns libraries are then added separately, and their functions are used in the code to parallelize the computation across multiple cores. The program is executed and its run time is checked; a timer function is added to the code that measures how long the computer takes to perform the parallelized computation. The program is first run without the parallel libraries and then with the OpenMP, POSIX Threads, and Microsoft Parallel Patterns code, and it is executed for each input matrix size with the results collected. Up to five trials are conducted for each input size, recording the time taken to parallelize the matrix multiplication. Finally, the performance of OpenMP, POSIX Threads, and the Microsoft Parallel Patterns library is compared in terms of execution time and speedup for different matrix dimensions and different numbers of processors.
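As a rough illustration of the kind of measurement described, the sketch below times a serial matrix multiplication against the same loop parallelized with OpenMP; the matrix size, the use of omp_get_wtime(), and the data are assumptions, and the POSIX Threads and Microsoft Parallel Patterns variants would follow the same pattern.

```c
/* Minimal sketch of the timing experiment described above:
 * one serial matrix multiply as a baseline, then the same loop
 * with an OpenMP parallel-for. The size N and the use of
 * omp_get_wtime() are assumptions, not the study's exact setup. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 512

static void multiply(const double *A, const double *B, double *C, int use_omp) {
    #pragma omp parallel for if(use_omp)
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}

int main(void) {
    double *A = malloc(N * N * sizeof *A);
    double *B = malloc(N * N * sizeof *B);
    double *C = malloc(N * N * sizeof *C);
    for (int i = 0; i < N * N; i++) { A[i] = 1.0; B[i] = 2.0; }

    double t0 = omp_get_wtime();
    multiply(A, B, C, 0);                 /* single-core baseline */
    double t1 = omp_get_wtime();
    multiply(A, B, C, 1);                 /* multi-core, OpenMP   */
    double t2 = omp_get_wtime();

    printf("serial:   %.3f s\n", t1 - t0);
    printf("parallel: %.3f s  (speedup %.2fx)\n", t2 - t1, (t1 - t0) / (t2 - t1));
    free(A); free(B); free(C);
    return 0;
}
```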
This paper surveys current trends in parallel computing, covering all aspects of a parallel computing system. A large computational problem that cannot be solved by a single CPU can be divided into small enough subtasks that are processed simultaneously by a parallel computer. A parallel computer consists of parallel computing hardware, a parallel computing model, and software support for parallel programming. Parallel performance measurement parameters and parallel benchmarks are used to measure the performance of a parallel computing system. The hardware and the software are specially designed for parallel algorithms and programming. This paper explores all these aspects of parallel computing and its usefulness.
Parallel computing has become an essential subject in computer science and has proven critical for research into high-end solutions. With the evolution of computer architectures (multicore and manycore) toward an increasing number of cores, parallelism has become the approach of choice for speeding up algorithms; over the last few decades the graphics processing unit (GPU), alongside the CPU, has gained an essential place in high-performance computing (HPC) thanks to its low price and massive parallel processing power. In this paper, we survey the idea of parallel computing, especially GPU computing and its programming models, and give a number of theoretical and technical concepts that are often needed to understand the CPU and GPU and their massive parallelism. In particular, we show how this technology assists the field of computational physics, especially when the problem is data-parallel.
Acta Informatica, 1976
This paper presents a model of parallel computing. Six examples illustrate the method of programming. An implementation scheme for programs is also presented.
Journal of Communications and Networks, 2014
The high intensity of research and modeling in mathematics, physics, biology, and chemistry requires new computing resources. Because of the large computational complexity of such tasks, computing time is long and costly. The most efficient way to increase efficiency is to adopt parallel principles. The purpose of this paper is to present the issue of parallel computing, with emphasis on the analysis of parallel systems and the impact of communication delays on their efficiency and overall execution time. The paper focuses on finite algorithms for solving systems of linear equations, namely matrix manipulation (the Gauss elimination method, GEM). Algorithms are designed for architectures with shared memory (open multiprocessing, OpenMP), distributed memory (message passing interface, MPI), and their combination (MPI + OpenMP). The properties of the algorithms were determined analytically and verified experimentally. Conclusions are drawn for theory and practice.
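For the shared-memory variant, the forward-elimination loop of GEM is the natural place to parallelize, since the rows below the pivot row are updated independently of one another. The following is a minimal OpenMP sketch of that idea (no pivoting, illustrative data); it is not the paper's code, and the MPI and hybrid versions would distribute the rows across processes instead.

```c
/* Minimal OpenMP sketch of the forward-elimination step of GEM
 * (no pivoting, dense matrix): the rows below the pivot row are
 * updated independently, so that loop is parallelized. Data is
 * an illustrative assumption. */
#include <stdio.h>
#include <omp.h>

#define N 4

int main(void) {
    double A[N][N + 1] = {            /* augmented matrix [A | b] */
        { 2,  1, -1, 2,  5},
        { 4,  5, -3, 6,  9},
        {-2,  5, -2, 6,  4},
        { 4, 11, -4, 8,  2},
    };

    for (int k = 0; k < N - 1; k++) {
        /* Rows below the pivot row are eliminated independently. */
        #pragma omp parallel for
        for (int i = k + 1; i < N; i++) {
            double f = A[i][k] / A[k][k];
            for (int j = k; j <= N; j++)
                A[i][j] -= f * A[k][j];
        }
    }

    /* Back substitution (sequential: each unknown needs the later ones). */
    double x[N];
    for (int i = N - 1; i >= 0; i--) {
        double s = A[i][N];
        for (int j = i + 1; j < N; j++) s -= A[i][j] * x[j];
        x[i] = s / A[i][i];
    }
    for (int i = 0; i < N; i++) printf("x[%d] = %g\n", i, x[i]);
    return 0;
}
```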
2006
We present a new parallel computation model called the Parallel Resource-Optimal computation model. PRO is a framework being proposed to enable the design of efficient and scalable parallel algorithms in an architecture-independent manner, and to simplify the analysis of such algorithms. A focus on three key features distinguishes PRO from existing parallel computation models. First, the design and analysis of a parallel algorithm in the PRO model is performed relative to the time and space complexity of a sequential reference algorithm. Second, a PRO algorithm is required to be both time- and space-optimal relative to the reference sequential algorithm. Third, the quality of a PRO algorithm is measured by the maximum number of processors that can be employed while optimality is maintained. Inspired by the Bulk Synchronous Parallel model, an algorithm in the PRO model is organized as a sequence of supersteps. Each superstep consists of distinct computation and communication phases, but the supersteps are not required to be separated by synchronization barriers. Both computation and communication costs are accounted for in the runtime analysis of a PRO algorithm. Experimental results on parallel algorithms designed using the PRO model, and implemented using its accompanying programming environment SSCRAP, demonstrate that the model indeed delivers efficient and scalable implementations on a wide range of platforms.
This paper presents a 7-step, semi-systematic approach for designing and implementing parallel algorithms. In this paper, the target implementation uses MPI for message passing. The approach is applied to a family of matrix factorization algorithms (LU, QR, and Cholesky) which share a common structure, namely, that the second factor of each is upper right triangular. The efficacy of the approach is demonstrated by implementing, tuning, and timing execution on two commercially available multiprocessor computers.
Next-generation computing systems focus on parallel computing to solve problems quickly using parallel programming concepts. The running time of any complex algorithm or application can be improved by running more than one task at the same time on multiple processors concurrently. Here the performance improvement is measured in terms of the increase in the number of cores per machine and is analyzed for an energy-efficient, optimal workload balance. In this paper we review and survey the literature on parallel computers with a view to future improvements. We aim to present the theoretical and technical terminology of parallel computing. The focus is a comparative analysis of single-core and multicore systems running an application program, aimed at faster execution time and at optimizing the scheduler for better performance.
Lecture Notes in Computer Science, 2002
The emerging discipline of algorithm engineering has primarily focused on transforming pencil-and-paper sequential algorithms into robust, efficient, well tested, and easily used implementations. As parallel computing becomes ubiquitous, we need to extend algorithm engineering techniques to parallel computation. Such an extension adds significant complications. After a short review of algorithm engineering achievements for sequential computing, we review the various complications caused by parallel computing, present some examples of successful efforts, and give a personal view of possible future research.
Modern Applied Science
Recently, supercomputer structure and its software optimization have been popular subjects. Much recent software spends a long time both sorting and searching datasets, so optimizing these algorithms becomes a priority. In order to discover the most efficient sorting and searching algorithms for parallel processing units, one can compare CPU run time as a performance index. In this paper, the Quick, Bubble, and Merge sort algorithms were chosen for comparison, as well as sequential and binary search algorithms. Each of the sort and search algorithms was tested in worst-, average-, and best-case scenarios, and each scenario was applied using multiple techniques (sequential, multithreaded, and parallel processing) on various numbers of processors to spot differences and calculate the speedup factor. The proposed solution aims to optimize the performance of a supercomputer focusing on time efficiency; all tests were conducted on the IMAN1 supercomputer, which is...
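As an illustration of one such comparison, the sketch below runs the same quicksort sequentially and with OpenMP tasks and reports the wall-clock times; the array size, task cutoff, and random input data are assumptions rather than the study's actual test setup.

```c
/* Minimal sketch of one comparison from the study above:
 * the same quicksort run sequentially and with OpenMP tasks,
 * timed with omp_get_wtime(). Array size and the task cutoff
 * are illustrative assumptions. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <omp.h>

#define CUTOFF 10000   /* below this size, don't spawn tasks */

static void quicksort(int *a, long lo, long hi, int use_tasks) {
    if (lo >= hi) return;
    int pivot = a[(lo + hi) / 2];
    long i = lo, j = hi;
    while (i <= j) {                       /* Hoare-style partition */
        while (a[i] < pivot) i++;
        while (a[j] > pivot) j--;
        if (i <= j) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; j--; }
    }
    if (use_tasks && hi - lo > CUTOFF) {
        #pragma omp task shared(a)
        quicksort(a, lo, j, use_tasks);
        #pragma omp task shared(a)
        quicksort(a, i, hi, use_tasks);
        #pragma omp taskwait
    } else {
        quicksort(a, lo, j, use_tasks);
        quicksort(a, i, hi, use_tasks);
    }
}

int main(void) {
    long n = 5 * 1000 * 1000;
    int *a = malloc(n * sizeof *a), *b = malloc(n * sizeof *b);
    for (long k = 0; k < n; k++) a[k] = rand();
    memcpy(b, a, n * sizeof *a);

    double t0 = omp_get_wtime();
    quicksort(a, 0, n - 1, 0);             /* sequential     */
    double t1 = omp_get_wtime();
    #pragma omp parallel
    #pragma omp single
    quicksort(b, 0, n - 1, 1);             /* multithreaded  */
    double t2 = omp_get_wtime();

    printf("sequential: %.3f s, parallel: %.3f s, speedup: %.2fx\n",
           t1 - t0, t2 - t1, (t1 - t0) / (t2 - t1));
    free(a); free(b);
    return 0;
}
```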
The Journal of Supercomputing, 2020
The prefix computation strategy is a fundamental technique used to solve many problems in computer science such as sorting, clustering, and computer vision. A large number of parallel algorithms have been introduced that are based on a variety of high-performance systems. However, these algorithms do not consider the cost of the prefix computation operation. In this paper, we design a novel strategy for prefix computation to reduce the running time for high-cost operations such as multiplication. The proposed algorithm is based on (1) reducing the size of the partition and (2) keeping a fixed-size partition during all the steps of the computation. Experiments on a multicore system for different array sizes and number sizes demonstrate that the proposed parallel algorithm reduces the running time of the best-known optimal parallel algorithm in the average range of 62.7-79.6%. Moreover, the proposed algorithm has high speedup and is more scalable than those in previous works.
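A common way to organize a partition-based prefix computation is the blocked scan: each thread scans its own block, the block totals are scanned, and each block is then shifted by its offset. The sketch below shows this general idea in OpenMP; it does not reproduce the paper's specific fixed-partition-size policy or its high-cost multiplication operator.

```c
/* Minimal sketch of a partition-based parallel prefix sum:
 * each thread scans its own block, the block totals are scanned
 * by one thread, and each block is then offset. This shows the
 * general blocked-scan idea, not the paper's exact scheme. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void) {
    long n = 1000000;
    long *x = malloc(n * sizeof *x);
    for (long i = 0; i < n; i++) x[i] = 1;       /* expected prefix: 1,2,3,... */

    int maxt = omp_get_max_threads();
    long *block_sum = calloc(maxt + 1, sizeof *block_sum);
    int nthreads;

    #pragma omp parallel
    {
        #pragma omp single
        nthreads = omp_get_num_threads();        /* implicit barrier follows */

        int t = omp_get_thread_num();
        long lo = n * t / nthreads, hi = n * (t + 1) / nthreads;

        /* Phase 1: inclusive scan inside each block. */
        for (long i = lo + 1; i < hi; i++) x[i] += x[i - 1];
        block_sum[t + 1] = (hi > lo) ? x[hi - 1] : 0;
        #pragma omp barrier

        /* Phase 2: scan the block totals (cheap, done by one thread). */
        #pragma omp single
        for (int k = 1; k <= nthreads; k++) block_sum[k] += block_sum[k - 1];

        /* Phase 3: add each block's offset. */
        for (long i = lo; i < hi; i++) x[i] += block_sum[t];
    }

    printf("x[0]=%ld  x[%ld]=%ld\n", x[0], n - 1, x[n - 1]);  /* 1 and n */
    free(x); free(block_sum);
    return 0;
}
```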
2000
Large-scale parallel computations are more common than ever, due to the increasing availability of multi-processor systems. However, writing parallel software is often a complicated and error-prone task. To relieve Diffpack users of the tedious and low-level technical details of parallel programming, we have designed a set of new software modules, tools, and programming rules, which will be the topic of
This book grew out of lecture notes for a course on parallel algorithms that I gave at Drexel University over a period of several years. I was frustrated by the lack of texts that had the focus that I wanted. Although the book also addresses some architectural issues, the main focus is on the development of parallel algorithms on "massively parallel" computers. This book could be used in several versions of a course on Parallel Algorithms. We tend to focus on SIMD parallel algorithms in several general areas of application:
The Computer Journal, 2001
This paper presents a general data-parallel formulation for a class of problems based on the divide and conquer strategy. A combination of three techniques (mapping vectors, index-digit permutations, and space-filling curves) is used to reorganize the algorithmic dataflow, providing great flexibility to efficiently exploit data locality and to reduce and optimize communications. In addition, these techniques allow the easy translation of the reorganized dataflows into HPF (High Performance Fortran) constructs. Finally, experimental results on the Cray T3E validate our method.
In this paper, efforts have been made to illustrate how best to start programming in the data-parallel model, and the various things that need to be taken care of before starting with this approach. The data-parallel approach means splitting the data on which an instruction is to be applied and assigning the same task to different processing elements, each processing the individual data assigned to it. Once all the processing elements have finished the assigned task, the whole result is accumulated back in one place.
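A minimal concrete example of this model is a parallel reduction: the data is split across threads, each thread applies the same operation to its own chunk, and the partial results are combined at the end. The OpenMP sketch below (with assumed data) illustrates the pattern.

```c
/* Minimal sketch of the data-parallel model described above:
 * the array is split across threads, every thread applies the
 * same operation (summing squares) to its own chunk, and the
 * partial results are combined into one total at the end. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void) {
    long n = 1000000;
    double *data = malloc(n * sizeof *data);
    for (long i = 0; i < n; i++) data[i] = 1.0;

    double total = 0.0;
    /* OpenMP splits the iteration space (the data) among threads;
       the reduction clause accumulates the per-thread partial sums. */
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; i++)
        total += data[i] * data[i];

    printf("sum of squares = %g\n", total);   /* expected: 1e6 */
    free(data);
    return 0;
}
```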