1992, Microprocessors and Microsystems
This paper presents a novel procedure for effectively programming a distributed memory hypercube multicomputer, which transforms sequential algorithms into parallel programs suitable for execution across multiple processors. The procedure is implemented using ACLAN, a parallel programming language designed for this purpose, along with ACLE, a simulation tool for running ACLAN programs on sequential computers. The authors discuss the structure of hypercube architecture, explore parallel programming models (SPMD and MPMD), and illustrate the application of their technique across algorithms in matrix algebra and image processing, showcasing successful results.
SIAM J. Sci. Stat. Comput., 1988
EDITOR'S NOTE [This paper] reports on the research that was recognized by two awards, the Gordon Bell Award and the Karp Prize, at IEEE's COMPCON 1988 meeting in San Francisco on March 2. The Gordon Bell Award recognizes the best contributions to parallel processing, either speedup or throughput, for practical, full-scale problems. Two awards were proposed by Dr. Bell: one for the best speedup on a general-purpose computer and a second for the best speedup on a special-purpose architecture. This year the two awards were restructured into first through fourth place awards because of the nature of the eleven December 1987 submissions.
Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99, 1999
In this work, we propose a heuristic algorithm based on Genetic Algorithms for the task-to-processor mapping problem in the context of local-memory multiprocessors with a hypercube interconnection topology. Hypercube multiprocessors have offered a cost-effective and feasible approach to supercomputing through parallelism at the processor level, by directly connecting a large number of low-cost processors with local memory which communicate by message passing instead of shared variables. We use concepts from graph theory (task precedence graphs to represent parallel programs, graph partitioning to solve the program decomposition problem, etc.) to model the problem. The problem is NP-complete, which means heuristic approaches must be adopted; we develop a heuristic algorithm based on Genetic Algorithms to solve it.
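The flavor of such a mapping heuristic can be shown with a minimal sketch (an illustrative assumption, not the authors' algorithm): chromosomes are task-to-node permutations, fitness is the total hop count of the task-graph edges (hypercube distance equals the Hamming distance of node labels), and evolution uses order crossover plus swap mutation. All function names and parameters are invented for this sketch.

```python
import random

def hamming(a, b):
    # Hypercube hop distance between node labels = Hamming distance.
    return bin(a ^ b).count("1")

def comm_cost(mapping, edges):
    # Total hops incurred by the task-graph edges under a task->node mapping.
    return sum(hamming(mapping[u], mapping[v]) for u, v in edges)

def ga_map(dim, edges, pop_size=30, gens=150, seed=1):
    # One task per node of a dim-cube; chromosomes are permutations.
    rng = random.Random(seed)
    n = 2 ** dim
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda m: comm_cost(m, edges))
        elite = pop[: pop_size // 2]          # elitism: best half survives
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)
            head = a[:cut]
            child = head + [g for g in b if g not in head]  # order crossover
            if rng.random() < 0.3:            # mutation: swap two assignments
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda m: comm_cost(m, edges))
```

For a 4-task ring mapped onto a 2-cube, the optimum places the ring along a Hamiltonian cycle of the cube, so every task-graph edge costs exactly one hop.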
Parallel Computing, 1996
The popular hypercube interconnection network has high wiring (VLSI) complexity. The reduced hypercube (RH) is obtained by a uniform reduction in the number of channels for each hypercube node in order to reduce the VLSI complexity. It is known that the RH achieves performance comparable to that of the hypercube, at much lower hardware cost, through hypercube emulation. The reduced complexity of the RH permits the construction of powerful, massively parallel computers. This paper proposes algorithms for data broadcasting and reduction, prefix computation, and sorting on the RH. These operations are fundamental to many parallel algorithms. A worst-case analysis of each algorithm is given and compared with that of equivalent algorithms for the hypercube. It is shown that the proposed algorithms for the RH yield performance comparable to that of the hypercube. (S.G. Ziavras, A. Mukherjee, Parallel Computing 22 (1996).)
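For context on what the RH must emulate, here is a minimal sequential simulation of the classical d-round prefix computation on a full hypercube: in round k every node exchanges its running subcube total with its neighbor across dimension k, and nodes whose k-th address bit is set fold the received total into their prefix. This is the textbook pattern, sketched here as an illustration; it is not code from the paper.

```python
def hypercube_prefix(values):
    # values[i] lives on hypercube node i; length must be a power of two.
    n = len(values)
    d = n.bit_length() - 1
    assert 1 << d == n, "length must be a power of two"
    prefix = list(values)   # running inclusive prefix held by each node
    total = list(values)    # running subcube total held by each node
    for k in range(d):      # one pairwise exchange per hypercube dimension
        recv = [total[i ^ (1 << k)] for i in range(n)]  # simulate the swap
        for i in range(n):
            if i & (1 << k):        # partner subcube has smaller indices
                prefix[i] += recv[i]
            total[i] += recv[i]
    return prefix
```

After d rounds every node holds the inclusive prefix sum over node order, using only nearest-neighbor exchanges.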
Parallel Computing - PC, 1998
This paper presents a method to derive efficient algorithms for hypercubes. The method exploits two features of the underlying hardware: (a) the parallelism provided by the multiple communication links of each node, and (b) the possibility of overlapping computations and communications, a feature of machines supporting an asynchronous communication protocol. The method can be applied to a generic class of hypercube algorithms whose distinguishing features are quite frequent in common algorithms for hypercubes. Many examples of this class of algorithms are found in the literature for different problems. The paper shows the efficiency of the method for two case studies. The results show that the reduction in communication overhead is very significant in many cases. They also show that the algorithms produced by our method are always very close to the optimum in terms of execution time.
IEE Proceedings - Computers and Digital Techniques, 1994
Modified hypercubes (MHs) have been proposed as building blocks for hypercube-based parallel systems that support the application of incremental growth techniques. In contrast, systems implementing the standard hypercube network cannot be expanded in practice. However, processor allocation for MHs is a more difficult task due to a slight deviation in their topology from that of the standard hypercube. The paper proposes two strategies to solve the processor allocation problem for MHs. The proposed strategies are characterised by perfect subcube recognition ability and superior performance. Furthermore, two existing processor allocation strategies for standard hypercube networks, namely the buddy and free-list strategies, are shown to be ineffective for MHs, in the light of their inability to recognise many available subcubes. A comparative analysis that involves the buddy strategy and the new strategies is carried out using simulation results. 1 Introduction In the n-dimensional hypercube (or n-cube) parallel computer, n physical communication channels are attached to each of the 2^n processors. If distinct n-bit binary addresses are assigned to the processors, then two processors are neighbours if their addresses differ by a single bit. Hence, the expansion of existing hypercube systems can be accomplished only by replacing the processor chips with others containing more communication ports. This, together with the long wires in large systems, constitutes the major drawback of the hypercube network, in spite of its rich properties: (a) regular topology, (b) simple routing, (c) high degree of fault tolerance, (d) small diameter (for a 2^n processor system, the farthest node is only n links away), and (e) efficient emulation of other topologies [10, 11]. Another major drawback of the hypercube network is that its total number of processors is always a power of two.
Many hypercube variations reported in the literature fail to provide a solution to this incremental growth problem without expending extra resources for individual processors [7, 8]. Ziavras [10], on the other hand, has … (IEE, 1994, Paper 1036E (C2, C3), first received 13th May and in revised form 23rd …)
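The adjacency rule stated in the introduction above (neighbours differ in exactly one address bit) makes the basic n-cube properties two-line functions. A minimal sketch, with illustrative names:

```python
def neighbors(node, n):
    # In an n-cube, addresses differing in exactly one bit are adjacent,
    # so each node has exactly n neighbors: flip each bit in turn.
    return [node ^ (1 << k) for k in range(n)]

def distance(a, b):
    # Shortest-path length = number of differing address bits,
    # so the diameter of an n-cube is n.
    return bin(a ^ b).count("1")
```

For example, node 0 of a 3-cube is adjacent to nodes 1, 2 and 4, and the farthest node from any node of a 4-cube is 4 links away.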
Lecture Notes in Computer Science, 2005
In this paper, we describe our experience in writing parallel numerical algorithms using Hierarchically Tiled Arrays (HTAs). HTAs are classes of objects that encapsulate parallelism. HTAs allow the construction of single-threaded parallel programs where a master process distributes tasks to be executed by a collection of servers holding the components (tiles) of the HTAs. The tiled and recursive nature of HTAs facilitates the development of algorithms with a high degree of parallelism as well as locality. We have implemented HTAs as a MATLAB TM toolbox, overloading conventional operators and array functions such that HTA operations appear to the programmer as extensions of MATLAB TM . We have successfully used it to write some widely used parallel numerical programs. The resulting programs are easier to understand and maintain than their MPI counterparts.
Parallel Computing, 1999
This paper provides a survey of both architectural and algorithmic aspects of solving problems using parallel processors with ring, torus and hypercube interconnection.
… , IEEE Transactions on, 1995
Many parallel algorithms use hypercubes as the communication topology among their processes. When such algorithms are executed on hypercube multicomputers, the communication cost is kept to a minimum, since processes can be allocated to processors in such a way that only communication between neighboring processors is required. However, the scalability of hypercube multicomputers is constrained by the fact that the interconnection cost per node increases with the total number of nodes. From a scalability point of view, meshes and toruses are more interesting classes of interconnection topologies. This paper focuses on the execution of algorithms with hypercube communication topology on multicomputers with mesh or torus interconnection topologies. The proposed approach is based on looking at different embeddings of hypercube graphs onto mesh or torus graphs. The paper concentrates on toruses, since an already known embedding, called the standard embedding, is optimal for meshes. In this paper, an embedding of hypercubes onto toruses of any given dimension is proposed. This novel embedding is called the xor embedding. The paper presents a set of performance figures for both the standard and the xor embeddings and shows that the latter outperforms the former for any torus.
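The flavor of such embeddings can be seen with the binary-reflected Gray code: place hypercube node v at ring position gray_index(v) and measure, with wraparound, how far apart hypercube neighbors land. This gives the dilation of the Gray-code placement on a one-dimensional torus (a ring); the sketch is purely illustrative and does not reproduce the paper's xor embedding.

```python
def gray(i):
    # Binary-reflected Gray code: consecutive codes differ in one bit.
    return i ^ (i >> 1)

def gray_index(v):
    # Inverse Gray code: the ring position whose code is v.
    i = 0
    while v:
        i ^= v
        v >>= 1
    return i

def dilation_on_ring(d):
    # Worst ring distance (with wraparound) between any two hypercube
    # neighbors of a d-cube under the Gray-code placement.
    n = 1 << d
    worst = 0
    for u in range(n):
        for k in range(d):
            v = u ^ (1 << k)                  # neighbor across dimension k
            a, b = gray_index(u), gray_index(v)
            hop = min((a - b) % n, (b - a) % n)  # torus wraparound helps
            worst = max(worst, hop)
    return worst
```

For d = 2 the wraparound link makes the 2-cube exactly a 4-ring (dilation 1), while for d = 3 some cube edges stretch across three ring positions.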
Choice Reviews Online, 2004
The major parallel programming models for scalable parallel architectures are the message passing model and the shared memory model. This article outlines the main concepts of these models as well as the industry standard programming interfaces MPI and OpenMP. To exploit the potential performance of parallel computers, programs need to be carefully designed and tuned. We will discuss design decisions for good performance as well as programming tools that help the programmer in program tuning.
1991
Performance studies of a transputer-based extended hypercube. Microprocessor Applications Laboratory; Supercomputer Education and Research Centre and Department of Computer Science and Automation. Performance studies of multi-transputer architecture with static and dynamic links, Proc. Euromicro Conf., 1988.
Architectural patterns for parallel programming are a collection of patterns related to a method for developing the coordination structure of parallel software systems. These architectural patterns take as input (a) the available parallel hardware platform, (b) the parallel programming language of this platform, and (c) the analysis of the problem to solve, in terms of an algorithm and data. In this paper, we present the application of the architectural patterns, along with the method, for developing a coordination that solves a hypercube sorting problem. The method used here takes the information from the problem analysis, proposes an architectural pattern for the coordination, and provides some elements of its implementation.
A new methodology named CALMANT (CC-cube Algorithms on Meshes and Tori) is proposed for mapping a kind of algorithms that we call CC-cube algorithms onto multicomputers with hypercube, mesh or torus interconnection topology. This methodology is suitable when the initial problem can be expressed as a set of processes that communicate through a hypercube topology (a CC-cube algorithm). There are many important algorithms that fit into the CC-cube type. CALMANT is based on three different techniques: (a) the standard embedding to assign the processes of the algorithm to the nodes of the mesh multicomputer; (b) the communication pipelining technique to increase the level of communication parallelism inherent in CC-cube algorithms; and (c) optimal message-routing algorithms proposed in this work in order to avoid conflicts and thus minimize the communication time. Although CALMANT is proposed for multicomputers with different interconnection network topologies, this paper only focuses on the particular case of meshes.
… Computing Conference, 1990., Proceedings of the …, 1990
Parallel systems are in general complicated to utilize efficiently. As they evolve in complexity, it hence becomes increasingly important to provide libraries and language features that can spare the users from knowledge of low-level system details. Our effort in this direction is to develop a set of basic matrix algorithms for distributed memory systems such as the hypercube.
Microprocessing and Microprogramming, 1990
The Extended Hypercube is a new approach in multiprocessor architectures, which reduces the communication burden on the processor elements. We propose a scheme for implementing such an architecture using INMOS transputers as the processor and controller elements to achieve a very high computation to communication ratio.
Parallel Computing, 1987
We discuss algorithms for matrix multiplication on a concurrent processor containing a two-dimensional mesh or richer topology. We present detailed performance measurements on hypercubes with 4, 16, and 64 nodes, and analyze them in terms of communication overhead and load balancing. We show that the decomposition into square subblocks is optimal. C code implementing the algorithms is available.
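The square-subblock decomposition can be sketched sequentially: partition A and B into a p x p grid of blocks, let each grid position (standing in for one mesh node) accumulate sum_k A[i][k] * B[k][j], and reassemble. In a real mesh algorithm the k-terms would arrive via row and column shifts; this sketch, with invented names, only checks that the decomposition reproduces the ordinary product.

```python
def split(M, p):
    # Partition square matrix M into a p x p grid of equal square blocks.
    b = len(M) // p
    return [[[row[j*b:(j+1)*b] for row in M[i*b:(i+1)*b]]
             for j in range(p)] for i in range(p)]

def matmul(A, B):
    # Plain (sequential) matrix product, used per block.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def madd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def block_matmul(A, B, p):
    # Each (i, j) block accumulates sum_k A[i][k] * B[k][j], then the
    # p x p grid of result blocks is stitched back into one matrix.
    Ab, Bb = split(A, p), split(B, p)
    n = len(A)
    b = n // p
    out = [[0] * n for _ in range(n)]
    for i in range(p):
        for j in range(p):
            acc = [[0] * b for _ in range(b)]
            for k in range(p):
                acc = madd(acc, matmul(Ab[i][k], Bb[k][j]))
            for r in range(b):
                for c in range(b):
                    out[i*b + r][j*b + c] = acc[r][c]
    return out
```

With p = 2 on a 4 x 4 matrix, each of the four blocks is 2 x 2, matching a 2 x 2 mesh of nodes.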
1989
We consider the problem of subsystem allocation in the mesh, torus, and hypercube multicomputers. Although the usual practice is to use a serial algorithm on the host processor to do the allocation, we show how the free and non-faulty processors can be used to perform the allocation in parallel. The algorithms we provide are dynamic, require very little storage, and work correctly even in the presence of faults.
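For contrast with the parallel allocation of this paper, the classic sequential buddy strategy (the baseline these works compare against) can be sketched in a few lines: a k-subcube is taken to be an aligned run of 2^k consecutive addresses, which is precisely why the strategy misses available but unaligned subcubes. The function names are illustrative.

```python
def buddy_alloc(free, n, k):
    # Buddy strategy for an n-cube: treat a k-subcube as an aligned block
    # of 2**k consecutive addresses and scan candidate bases in order.
    size = 1 << k
    for base in range(0, 1 << n, size):
        if all(free[a] for a in range(base, base + size)):
            for a in range(base, base + size):
                free[a] = False       # mark the subcube's processors busy
            return base               # base address of the allocated subcube
    return None                       # no aligned free subcube exists

def buddy_release(free, base, k):
    # Return a previously allocated k-subcube to the free pool.
    for a in range(base, base + (1 << k)):
        free[a] = True
```

In a 3-cube, two 1-subcube requests take bases 0 and 2, a 2-subcube request then takes base 4, and the cube is full until something is released.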