1996, Lecture Notes in Computer Science
Efficient communication in networks is a prerequisite to exploit the performance of large parallel systems. For this reason much effort has been made in recent years to develop efficient communication mechanisms. In this paper we survey the foundations and recent developments in designing and analyzing efficient packet routing algorithms.
Organization of the Paper
In the following chapter we introduce the basic notation about networks, messages, and protocols for routing. In Chapter 3 we introduce the routing number of a network, and relate it to the dilation and congestion of path systems. Chapter 4 contains an overview of oblivious routing protocols, and Chapter 5 describes efficient adaptive routing protocols.
2 Networks, Messages, Protocols
In this chapter we introduce the basic notions used in routing theory. In particular, we describe a typically used hardware model and message passing model, define the routing problem, and describe different classes of strategies to solve routing problems.
2.1 The Hardware Model
We model the topology of a network as an undirected graph G = (V, E). V represents the computers or processors, and E represents the communication links. We assume the communication links to work bidirectionally, that is, each edge represents two links, one in each direction. The bandwidth of a link is defined as the number of messages it can forward in one time step. Unless explicitly mentioned we assume that the bandwidth
This article was processed using the LaTeX macro package with LLNCS style.
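The dilation and congestion of a path system mentioned above can be illustrated with a minimal Python sketch. It assumes the usual definitions (dilation is the length of the longest path, congestion is the largest number of paths crossing one link) and counts congestion per directed link, since each edge is modeled as two links. The function names and the example paths are hypothetical and do not come from the paper.

```python
from collections import Counter

def dilation(paths):
    """Length (in links) of the longest path in the path system."""
    return max((len(p) - 1 for p in paths), default=0)

def congestion(paths):
    """Largest number of paths that cross any single directed link."""
    load = Counter()
    for p in paths:
        # Consecutive node pairs are the directed links used by the path.
        for u, v in zip(p, p[1:]):
            load[(u, v)] += 1
    return max(load.values(), default=0)

# Hypothetical path system on a 4-node ring 0-1-2-3-0.
paths = [[0, 1, 2], [3, 0, 1], [0, 1]]
print(dilation(paths), congestion(paths))
```

In this example every path uses the link from node 0 to node 1, so the congestion is driven by that single link while the dilation is set by the two-hop paths.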
seekdlib.uacee.org
Communication is an essential part of distributed and parallel computing. Many topologies, network architectures, and routing schemes exist for this type of communication. In this paper, we review a few selected network architectures and routing schemes, which continue to evolve.
Concurrency and Computation: Practice and Experience, 1992
St. Petersburg FL 33733. USA
Theoretical Computer Science, 1987
The problem of efficient packet routing is central to the area of communication networks.
AlgorithmsESA' …, 1993
Journal of the ACM ( …, 2003
This paper provides necessary and sufficient conditions for deadlock-free unicast and multicast routing with the path-based routing model in interconnection networks which use the wormhole switching technique. The theory is developed around three central concepts: channel waiting, False Resource Cycles, and valid destination sets. The first two concepts are suitable extensions to those developed for unicast routing by two authors of this paper; the third concept has been developed by Lin and Ni. The necessary and sufficient conditions can be applied in a straightforward manner to prove deadlock freedom and to find more adaptive routing algorithms for collective communication. The latter point is illustrated by developing two routing algorithms for multicast communication in 2-D mesh architectures. The first algorithm uses fewer resources (channels) than a proposed algorithm in the literature but achieves the same adaptivity. The second achieves full adaptivity for multicast routing. Collective communication routines such as broadcast, scatter, gather, reductions, transpose, prefix computations (scan), etc. are very important for developing parallel programs that are both efficient and portable. Although there is a large body of research that has addressed the development of efficient collective communication algorithms (Kumar et al. [KGGK94] contains a good survey; a more recent survey dealing with wormhole-routed architectures appears in [MTR95]), this research has invariably assumed a simple underlying hardware model with nonadaptive (dimension-ordered) routing of point-to-point messages. This has been in large part because that model reflects the characteristics of most present day commercial multicomputers. In a position paper, Ni argues that supporting multicast at the router level is critical to the efficient performance of message-based parallel computers [Ni95].
There have been a number of recent research advances in adaptive routing and router models which permit multicasting in hardware. If these advances are to ultimately have practical impact on improving the performance of compute-intensive applications on parallel machines, the implementation of the libraries of collective communication primitives on these future machines should exploit these features.
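The nonadaptive (dimension-ordered) routing that this abstract names as the common baseline can be sketched in a few lines of Python. The sketch assumes a 2-D mesh whose nodes are (x, y) coordinates and corrects the X coordinate before the Y coordinate; the function name `xy_route` is hypothetical, and this is the baseline scheme, not either of the multicast algorithms the paper develops.

```python
def xy_route(src, dst):
    """Nodes visited from src to dst on a 2-D mesh, X dimension first."""
    x, y = src
    path = [(x, y)]
    # Correct the X coordinate one hop at a time.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Only then correct the Y coordinate.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))
```

Because every message finishes its X hops before taking any Y hop, the route between a given pair of nodes is fixed, which is what makes the scheme nonadaptive.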
Routing is an essential function of computer networks that influences both network management and the quality of service in global networks. Traffic-flow management has to carry the required volume of traffic while avoiding congestion in order to reduce transmission delays. These two requirements are in general contradictory. Optimal traffic management is a key issue for the quality of information services. Routing based on shortest-path algorithms is widely used in WAN communication protocols. Short explanations and illustrations of these algorithms are given.
Synthèse : Revue des Sciences et de la Technologie, 2016
The calculation of the shortest path between a pair of routers is an important problem in telecommunication and computer networks. Calculating the path in real time is useful in a number of situations. These include a routing process that attempts to reach its destination while minimizing the effects of collisions with obstacles. Previous work on the shortest path is limited to sequential and parallel algorithms on general-purpose architectures. Researchers are increasingly interested in hardware solutions. In this work, we propose an approach for implementing a routing algorithm that is more effective than Dijkstra's, using a Xilinx Virtex-type FPGA development board in order to accelerate the routing process by exploiting the speed of the hardware (FPGA). The results of the implementation on a Virtex-7 FPGA card are promising.
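For reference, the sequential Dijkstra baseline that this abstract compares against can be sketched as follows. This is a plain software version using Python's `heapq`, not the authors' FPGA design; the graph representation (a dict of neighbor-to-weight dicts) and the router names are assumptions made for the example.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source; graph maps node -> {neighbor: weight}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        # Skip stale heap entries left behind by earlier improvements.
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical four-router topology with non-negative link weights.
graph = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
print(dijkstra(graph, "A"))
```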
Readings in Computer Architecture, 2000
Efficient routing of messages is critical to the performance of direct network systems. The popular wormhole routing technique faces several challenges, particularly flow control and deadlock avoidance. Massively parallel computers with thousands of processors are considered the most promising technology to achieve teraflops computational power. Such large-scale multiprocessors are usually organized as ensembles of nodes, where each node has its own processor, local memory, and other supporting devices. These nodes may have different functional capabilities. For example, the set of nodes may include vector processors, graphics processors, I/O processors, and symbolic processors. The way the nodes are connected to one another varies among machines. In a direct network architecture, each node has a point-to-point, or direct, connection to some number of other nodes, called neighboring nodes. Direct networks have become a popular architecture for constructing massively parallel computers because they scale well; that is, as the number of nodes in the system increases, the total communication bandwidth, memory bandwidth, and processing capability of the system also increase. Figure 1 shows a generic multiprocessor with a set of nodes interconnected through a direct network. Because they do not physically share memory, nodes must communicate by passing messages through the network. Message size may vary, depending on the application. For efficient and fair use of network resources, a message is often divided into packets prior to transmission. A packet is the smallest unit of communication that contains routing and sequencing information; this information is carried in the packet header. Neighboring nodes may send packets to one another directly, while nodes that are not directly connected must rely on other nodes in the network to relay packets from source to destination. In many systems, each node contains a separate router to handle such communication-related tasks.
Although a router's function could be performed by the corresponding local processor, dedicated routers are used to allow overlapped computation and communication within each node. Figure 2 shows the architecture of a generic node. Each router supports some number of input and output channels. Normally, every input channel is paired with a corresponding output channel. Internal channels connect the local processor/memory to the router. Although it is common to provide only one pair of internal channels, some systems use more internal channels to avoid a communication bottleneck between the local processor/memory and the router. External channels are used for communication between routers and, therefore, between nodes. In
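The division of a message into packets, each carrying routing and sequencing information in its header, can be sketched in Python as below. The `Packet` fields, the function names, and the fixed `max_payload` size are all assumptions chosen for illustration; the abstract does not specify a header layout.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    src: int        # routing information: source node
    dst: int        # routing information: destination node
    seq: int        # sequencing information: position in the message
    total: int      # number of packets in the whole message
    payload: bytes

def packetize(message, src, dst, max_payload):
    """Split message into packets of at most max_payload bytes (max_payload > 0)."""
    chunks = [message[i:i + max_payload]
              for i in range(0, len(message), max_payload)] or [b""]
    return [Packet(src, dst, seq, len(chunks), c) for seq, c in enumerate(chunks)]

def reassemble(packets):
    """Rebuild the message even if the packets arrived out of order."""
    ordered = sorted(packets, key=lambda p: p.seq)
    return b"".join(p.payload for p in ordered)

pkts = packetize(b"hello world", src=0, dst=3, max_payload=4)
print(len(pkts), reassemble(list(reversed(pkts))))
```

The sequence number in each header is what lets the destination restore the original order when packets take different routes or arrive late.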
Proceedings of International Conference on Parallel Processing
Proceedings of MASCOTS '96 - 4th International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 1996
The performance of a massively parallel computing system is often limited by the speed of its interconnection network. One strategy that has been proposed for improving network efficiency is the use of adaptive routing, in which network state information can be used in determining message paths. The design of an adaptive routing system involves several parameters, and in order to build high speed scalable computing systems, it is important to understand the costs and performance benefits of these parameters. In this paper, we investigate the effect of buffer design on communication latency. Four message storage models and their related route selection algorithms are analyzed. A comparison of their performance is presented, and the features of buffer design which are found to significantly impact network efficiency are discussed.
1994
Adaptive routing algorithms have been frequently suggested as a means of improving communication performance in multicomputers. These algorithms, unlike deterministic routing, can utilize network state information to exploit the presence of multiple paths. Adaptive routing, however, is complex and expensive. Before such schemes can be successfully incorporated in multiprocessor systems, it is necessary to have a clear understanding of the factors which affect their performance potential. In this paper we present a simple and efficient scheme to model the performance of idealized adaptive routing. We evaluate a basic, high-performance adaptive system using an analytic queueing model which approximates its behavior. This analytic model predicts the performance of networks with varying parameters, and provides insight into the nature of message traffic. We have also conducted extensive simulation experiments, the results of which are used to validate the analytic model and to identify some of the conditions which promote high performance of adaptive routing in a communication network.
IEEE Parallel & Distributed Technology: Systems & Applications
Avoiding routing and virtual Path routing, two complementary strategies, achieve this for different communication patterns. Adaptive routing protocols exploit alternative paths between communicating nodes to more efficiently use network bandwidth and provide resilience to failures. The latter property is particularly important for large-scale architectures, because expanding the system size increases the probability of encountering a faulty network component. In static environments, where the number of application processes and their allocation do not change during execution, adaptive routing algorithms can take advantage of knowing the allocation of the communicating processes. Because the algorithm has complete knowledge of the network topology and the precise allocation of the destination process, at any step messages flow closer to the destination node. However, in dynamic environments the allocation of the application processes might change frequently: tasks can be either created or terminated dynamically. Moreover, trying to achieve load balancing can induce process migration. If the routing algorithm takes into account knowledge of process allocation, this information must be frequently updated so that each process can maintain knowledge of the allocation of other processes in the system. This updating causes high traffic, and does not guarantee the router a consistent view of the system. The only way to avoid these problems is to use routing policies that can deliver messages without requiring knowledge of process allocation. Unfortunately, these policies can misroute messages; the message in an intermediate node can move farther from the destination. Moreover, messages can become trapped in loops (this is called livelock).
IEEE Transactions on Computers, 1996
We propose a new switching format for multiprocessor networks, which we call Conflict Sense Routing Protocol. This switching format is a hybrid of packet and circuit switching, and combines advantages of both. We initially present the protocol in a way applicable to a general topology. We then present an implementation of this protocol for a hypercube computer and a particular routing algorithm. We also analyze the steady-state throughput of the hypercube implementation for random node-to-node communications.
IEEE Transactions on Communications, 1989
The joint problem of selecting a primary route for each communicating pair and a capacity value for each link in computer communication networks is considered. The network topology and traffic characteristics are given; a set of candidate routes and of candidate capacities for each link are also available. The goal is to obtain the least costly feasible design where the costs include both capacity and queuing components. Lagrangean relaxation and subgradient optimization techniques were used in order to obtain verifiable good solutions to the problem. The method was tested on several topologies, and in all cases good feasible solutions, as well as tight lower bounds were obtained. I. INTRODUCTION As a result of the important advantages they offer, both the number and the range of applications supported by communication based computer systems have significantly increased. A variety of computer networks, such as SNA [17], BNA [18], and DECNET [7] architectures, TELENET [25], TYMNET [26], TRANSPAC [6], and DATAPAC [4] are currently available. This paper deals with the following problem faced by the network designer whenever a new network is set up or when an existing network is to be expanded: how to simultaneously select the link capacities and the routes to be used by the communicating nodes in the network, such as to ensure an acceptable performance level at a minimum cost. The topology of the network and estimates of the external traffic requirements are given. Messages in the network follow static, nonbifurcated routes, a routing strategy adopted by many operational networks. The effectiveness of fixed routing methods is also supported by the simulation results presented in [15], suggesting that at steady state there is no significant difference between the delays induced in a network by good static and adaptive routing strategies.
Static routing policies are implemented by providing each pair of communicating nodes in the network with an ordered set of routes, out of which the first available route is chosen whenever a session is initiated. Such is, for instance, the general framework for routing in SNA-based networks (see [1]). Recently, the model presented in [13] has been implemented by IBM in a commercial product NETDA [23]. Consistent with this approach, we concentrate here on the choice of the primary route, i.e., the recommended one in the candidate set. Though some attempts at a formal treatment of the backbone network design problem in a general setting exist (see [3], [5], [16], [19], and more recently, [9], [10], and [24]), much of the Paper approved by the Editor for Wide Area Networks of the IEEE Communications Society.
1992
Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on 'real issues' such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) Coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as TCP/IP and parallel FDDI rings). (2) Scale-up is near linear in n, the number of protocol processors and channels (for small n and up to a few hundred Mbps). (3) Since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low-cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) Multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches; it also provides graceful degradation and low-cost load balancing. (2) Coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service. For example, one TCP handles small messages for many users, while other TCPs running in parallel provide high-bandwidth service to a single application. (3) Coarse-grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism), also with near-linear speed-ups.
This work is sponsored by CIT (596045), DARPA (N00174-C-91-0119), NASA (NAG187263), and SUN (596044) grants.
2001
We study routing and scheduling in packet-switched networks. We assume an adversary that controls the injection time, source, and destination for each packet injected. A set of paths for these packets is admissible if no link in the network is overloaded. We present the first on-line routing algorithm that finds a set of admissible paths whenever this is feasible. Our algorithm calculates a path for each packet as soon as it is injected at its source using a simple shortest path computation. The length of a link reflects its current congestion. We also show how our algorithm can be implemented under today's Internet routing paradigms.
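The idea of routing each packet on arrival along a shortest path whose link lengths reflect current congestion can be sketched as follows. The length function used here, 1 / (free capacity), and the rule that a full link is unusable are assumptions chosen for the example, not the function from the paper; `route_packet`, `links`, and `load` are hypothetical names.

```python
import heapq

def route_packet(links, load, src, dst):
    """Route one packet; links maps (u, v) -> capacity, load maps (u, v) -> usage.

    Returns the chosen path (and records it in load), or None if no
    path with spare capacity exists.
    """
    adj = {}
    for (u, v), cap in links.items():
        free = cap - load.get((u, v), 0)
        if free > 0:
            # Assumed length: a link gets longer as it fills up.
            adj.setdefault(u, []).append((v, 1.0 / free))
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    for u, v in zip(path, path[1:]):
        load[(u, v)] = load.get((u, v), 0) + 1
    return path

# Hypothetical network: a narrow route via "a" and a wider route via "b".
links = {("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 2, ("b", "t"): 2}
load = {}
routes = [route_packet(links, load, "s", "t") for _ in range(4)]
print(routes)
```

In this example the first packet takes the wider route, later packets spread over both routes as the lengths grow, and the fourth request is rejected because every link is at capacity.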
1981
A new distributed algorithm is presented for dynamically determining weighted shortest paths used for message routing in computer networks. The major features of the algorithm are that the paths defined do not form transient loops when weights change and the number of steps required to find new shortest paths when network links fail is less than for previous algorithms. Specifically, the worst case recovery time is proportional to the largest number of hops h in any of the weighted shortest paths. For previous loop-free distributed algorithms this recovery time is proportional to h^2.
1997
Adaptive routing is widely regarded as a promising approach to improving interconnection network performance. Many designers of adaptive routing algorithms have used synthetic communication patterns, such as uniform and transpose traffic, to compare the performance of various adaptive routing algorithms with each other and with oblivious routing. These comparisons have shown that the average message latency is usually lower with adaptive routing. On the other hand, when a parallel program is executed on a multiprocessor, the goal is to reduce the total execution time. In this paper, we explain why improving the average message latency of a routing algorithm does not necessarily lead to a lower execution time for real applications. We support this observation by reporting simulation results for both adaptive and oblivious routing using communication derived from real applications. Specifically, we report the performance of various routing algorithms for directed acyclic graphs (DAGs) derived from the Cholesky factorization of sparse matrices. Our results show that there is little correlation between average message latency and the total execution time of a parallel program. Hence, average message latency does not seem to be a useful measure of the performance of a routing algorithm. This strongly suggests that current comparisons of routing algorithms do not provide a reliable indication of the performance improvements to be realized by executing programs on a multiprocessor with such a routing algorithm. We interpret these results and suggest several alternatives for further research.
Lecture Notes in Computer Science, 1997
In this work we present models for asynchronous networks and their motivation and use them for the analysis of routing algorithms. We try to construct them in a way that they can be both realistic and easy to work with. For some of the models presented here variants of techniques used in the analysis of synchronous routing, like the delay sequence argument, can be adapted. On the other hand, for others we can only prove large upper bounds for any routing protocol. However, we present a model for which it seems possible to get better than trivial upper bounds, although known proof techniques (like the delay sequence argument) cannot be applied.