2010 IEEE Global Telecommunications Conference GLOBECOM 2010, 2010
Input-queued (IQ) switches are one of the reference architectures for the design of high-speed packet switches. Classical results in this field refer to the scenario in which the whole switch transfers the packets in a synchronous fashion, in phase with a sequence of fixed-size timeslots, selected to transport a minimum-size packet. However, for switches with a large number of ports and high bandwidth, maintaining an accurate global synchronization and transferring all the packets in a synchronous fashion is becoming more and more challenging. Furthermore, variable-size packets (as present in Internet traffic) require rather complex segmentation and reassembly processes, and some switching capacity is lost due to partial filling of timeslots. Thus, we consider a switch able to natively transfer packets in an asynchronous fashion thanks to a simple and distributed packet scheduler. We investigate the performance of asynchronous IQ switches and show that, despite their simplicity, their performance is comparable to, or even better than, that of synchronous switches. These partly unexpected results highlight the great potential of the asynchronous approach for the design of high-performance switches.
IEEE/ACM Transactions on Networking, 2003
Our work is motivated by the desire to design packet switches with large aggregate capacity and fast line rates. In this paper, we consider building a packet switch from multiple lower speed packet switches operating independently and in parallel. In particular, we consider a (perhaps obvious) parallel packet switch (PPS) architecture in which arriving traffic is demultiplexed over identical lower speed packet switches, switched to the correct output port, then recombined (multiplexed) before departing from the system. Essentially, the packet switch performs packet-by-packet load balancing, or inverse multiplexing, over multiple independent packet switches. Each lower speed packet switch operates at a fraction of the line rate, R. For example, with k such switches, each packet switch can operate at rate R/k. It is a goal of our work that all memory buffers in the PPS run slower than the line rate. Ideally, a PPS would share the benefits of an output-queued switch, i.e., the delay of individual packets could be precisely controlled, allowing the provision of guaranteed qualities of service.
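The inverse-multiplexing step described above can be sketched as a round-robin spreader over k slower planes followed by an order-preserving recombiner. This is a minimal illustration; the function names and the round-robin policy are ours, not necessarily the paper's dispatch algorithm.

```python
def spread(packets, k):
    """Demultiplex a packet stream over k slower planes, round-robin."""
    planes = [[] for _ in range(k)]
    for i, pkt in enumerate(packets):
        planes[i % k].append(pkt)
    return planes

def recombine(planes):
    """Multiplex the planes back into one stream, preserving arrival order."""
    out = []
    i = 0
    while any(planes):
        if planes[i % len(planes)]:
            out.append(planes[i % len(planes)].pop(0))
        i += 1
    return out

planes = spread(list(range(8)), k=4)
assert all(len(p) == 2 for p in planes)   # each plane carries 1/k of the load
assert recombine(planes) == list(range(8))
```

Each plane only ever sees every k-th packet, which is what lets its memory run at a fraction of the external line rate.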
Computers, Materials & Continua, 2022
The high-performance computing paradigm needs high-speed switching fabrics to meet the heavy traffic generated by their applications. These switching fabrics are efficiently driven by the deployed scheduling algorithms. In this paper, we propose two scheduling algorithms for input-queued switches whose operations are based on ranking procedures. First, we propose a Simple 2-Bit (S2B) scheme which uses a binary ranking procedure and queue size for scheduling the packets. Here, the Virtual Output Queue (VOQ) set with the maximum number of empty queues receives a higher rank than other VOQ sets. Through simulation, we show S2B has better throughput performance than Highest Ranking First (HRF) arbitration under uniform and non-uniform traffic patterns. To further improve the throughput-delay performance, an Enhanced 2-Bit (E2B) approach is proposed. This approach adopts an integer representation for rank, which is the number of empty queues in a VOQ set. The simulation results show E2B outperforms the S2B and HRF scheduling algorithms with maximum throughput-delay performance. Furthermore, the algorithms are simulated under hotspot traffic, and E2B proves to be more efficient.
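The E2B rank described above, an integer equal to the number of empty queues in an input's VOQ set, can be sketched as follows; the data layout and names are illustrative, not from the paper.

```python
def e2b_rank(voq_set):
    """E2B-style rank: the number of empty queues in the input's VOQ set."""
    return sum(1 for q in voq_set if len(q) == 0)

# Three inputs, each with 4 VOQs (one per output); list entries are queued cells.
inputs = [
    [[1, 2], [], [], []],   # 3 empty VOQs -> high rank
    [[1], [2], [], [3]],    # 1 empty VOQ  -> low rank
    [[], [], [], [1]],      # 3 empty VOQs -> high rank
]
ranks = [e2b_rank(v) for v in inputs]
assert ranks == [3, 1, 3]
```

The S2B variant compresses this information into a single bit (maximum-empty-count or not), which is what the integer-valued E2B rank refines.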
2001
A parallel packet switch (PPS) is a switch in which the memories run slower than the line rate. Arriving packets are spread (or load-balanced) packet-by-packet over multiple slower-speed packet switches. It is already known that with a speedup of two, a PPS can theoretically mimic a FCFS output-queued (OQ) switch. However, the theory relies on a centralized packet scheduling algorithm that is essentially impractical because of its high communication complexity. In this paper, we attempt to make a high performance PPS practical by introducing two results. First, we show that small co-ordination buffers can eliminate the need for a centralized packet scheduling algorithm, allowing a fully distributed implementation with low computational and communication complexity. Second, we show that without speedup, the resulting PPS can mimic an FCFS OQ switch within a delay bound.
IEEE Globecom 2006, 2006
Virtual Output Queuing is widely used by high-speed packet switches to overcome head-of-line blocking; this is done by means of matching algorithms. In fixed-length VOQ switches, variable-length IP packets are segmented into fixed-length cells at the inputs. When a cell is transferred to its destination output, it stays in the reassembly buffer and waits for the other cells of the same packet before the packet can be reassembled and forwarded.
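The segmentation-and-reassembly process the abstract refers to can be sketched as follows; the 64-byte cell size and the function names are illustrative assumptions, not from the paper.

```python
CELL = 64  # hypothetical fixed cell size in bytes

def segment(payload: bytes, cell=CELL):
    """Cut a variable-length packet into fixed-length cells, padding the last."""
    cells = [payload[i:i + cell] for i in range(0, len(payload), cell)]
    cells[-1] = cells[-1].ljust(cell, b'\x00')  # pad the final cell
    return cells

def reassemble(cells, length):
    """Rebuild the packet once all cells arrived; strip padding via the length."""
    return b''.join(cells)[:length]

pkt = b'x' * 150
cells = segment(pkt)
assert len(cells) == 3 and all(len(c) == CELL for c in cells)
assert reassemble(cells, len(pkt)) == pkt
```

The reassembly buffer the abstract mentions is exactly where `cells` accumulate until the packet's final cell arrives.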
2008 IEEE Sarnoff Symposium, 2008
Internet data is transmitted in variable-size packets to provide flexibility for different applications. There has been growing interest in designing Internet routers based on variable-length packets to improve performance and reduce the amount of reassembly memory. However, most variable-length designs follow a time-slotted approach, which makes them similar to routers that switch fixed-length packets. The use of slotted timing makes padding necessary when packet sizes are not proportional to the time-slot length. In this paper, we investigate the impact of concatenating packets to reduce the amount of padding in variable-length packet switches. This approach increases the utilization of interconnection bandwidth and overall throughput performance. A performance evaluation of an input-queued packet switch using packet concatenation is presented.
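A back-of-the-envelope sketch of why concatenation reduces padding: individually slotted packets each round up to a slot boundary, while a concatenated train rounds up only once at its tail. The 64-byte slot size is an assumption for illustration only.

```python
import math

SLOT = 64  # hypothetical time-slot payload size in bytes

def padded_slots(pkt_sizes):
    """Slots consumed when each packet is padded to a slot boundary on its own."""
    return sum(math.ceil(s / SLOT) for s in pkt_sizes)

def concatenated_slots(pkt_sizes):
    """Slots consumed when the packets are concatenated into one train."""
    return math.ceil(sum(pkt_sizes) / SLOT)

pkts = [40, 40, 1500, 72]                 # bytes
assert padded_slots(pkts) == 28           # 1 + 1 + 24 + 2 slots
assert concatenated_slots(pkts) == 26     # 1652 bytes -> 26 slots
```

The saving grows with the share of small packets, since each small packet otherwise wastes most of its slot.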
IEEE/ACM Transactions on Networking, 2002
We propose an efficient parallel switching architecture that requires no speedup and guarantees bounded delay. Our architecture consists of k input-output-queued switches with first-in-first-out queues, operating at the line speed in parallel under the control of a single scheduler, with k being independent of the number of inputs and outputs. Arriving traffic is demultiplexed (spread) over the k identical switches, switched to the correct output, and multiplexed (combined) before departing from the parallel switch.
IEEE Design & Test of Computers, 1994
As a follow-up to the well-known Integrated Services Digital Network (ISDN) communication network, a broadband network is evolving based on asynchronous transfer mode (ATM) principles. To meet the challenges of these new high-speed telecommunication networks, we have developed a highly integrated, flexible, high-speed switching component. We call it the 4/1 stage. In contrast to the fixed set of channels and capacities in ISDN networks, ATM offers variable bandwidth to every user. Instead of the physically switched communication links in a traditional communication system, ATM provides virtual channels: it multiplexes several communication channels onto the same physical media. Each channel contributes fixed-sized packets (ATM cells, having a 5-byte header and 48 bytes of user data) at an arbitrary rate, each header carrying a virtual channel identifier (VCI) label. To establish bundled communication links between switching nodes, ATM allows grouping of these channels into virtual paths identified by a virtual path identifier (VPI).
SPIE Proceedings, 2009
In this paper, the hardware implementation of a scheduler with QoS support is presented. The starting point is a Differentiated Services (DiffServ) network model. Each switch of this network classifies packets into flows, which are assigned to traffic classes depending on their requirements, with an independent queue available for each traffic class. Finally, the scheduler chooses the right queue in order to provide Quality of Service support. This scheduler considers the bandwidth distribution, introducing the time-frame concept, and the packet delay, assigning a priority to each traffic class. The architecture of this algorithm is also presented in this paper, describing its functionality and complexity. The architecture was described in Verilog HDL at RTL level. The complete system has been implemented on a Spartan-3 1000 FPGA device using ISE software from Xilinx, demonstrating that it is a suitable design for high-speed switches.
2002
Parallel Packet Switches (PPS) use internal, parallel switch planes that operate at less than the line speed. A PPS can scale up to faster line speeds than a single-plane switch can. Load balancing between planes and providing QoS to flows are open problems. Simulation is used to evaluate the performance of a 10-Gbps VIQ PPS that contains ten 1-Gbps switch planes. It is found that at high offered loads the mean delay of a VIQ PPS switch is lower than that of a single-plane iSLIP switch. For unbalanced loads, the VIQ PPS demonstrates stability where an iSLIP switch is unstable. Especially promising results are shown for VIQ PPS native switching of variable-length Ethernet packets.
IEEE/ACM Transactions on Networking, 2000
Input-queued (IQ) switches overcome the scalability problem suffered by output-queued switches. In order to provide differentiated quality of service (QoS), we need to efficiently schedule a set of incoming packets so that every packet can be transferred to its destined output port before its deadline. If no such schedule exists, we wish to find one that allows a maximum number of packets to meet their deadlines. Recently, this problem has been proved to be NP-complete if three or more distinct deadlines (classes) are present in the set. In this paper, we propose a novel algorithm named Flow-based Iterative Packet Scheduling (FIPS) for this scheduling problem. A key component in FIPS is a non-trivial algorithm that solves the problem for the case where two classes are present in the packet set. By repeatedly applying the algorithm for two classes, we solve the general case of an arbitrary number of classes more efficiently. Applying FIPS to a frame-based model effectively achieves differentiated QoS provision in IQ switches. Using simulations, we have compared FIPS's performance with five well-known existing heuristic algorithms, including Earliest-Deadline-First (EDF), Minimum-Laxity-First (MLF) and their variants. The simulation results demonstrate that our new algorithm solves the deadline-guaranteed packet scheduling problem with a much higher success rate and a much lower packet drop ratio than all other algorithms.
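For reference, the EDF baseline mentioned above can be sketched as a greedy per-slot selection under the crossbar constraint (at most one packet per input and per output in a slot). This illustrates EDF, not FIPS itself, and the names are ours.

```python
def edf_slot(packets):
    """Greedy EDF for one crossbar time slot.

    packets: list of (input, output, deadline) tuples.
    Returns the subset scheduled this slot, earliest deadlines first.
    """
    used_in, used_out, sched = set(), set(), []
    for pkt in sorted(packets, key=lambda p: p[2]):  # earliest deadline first
        i, o, _ = pkt
        if i not in used_in and o not in used_out:   # crossbar constraint
            sched.append(pkt)
            used_in.add(i)
            used_out.add(o)
    return sched

pkts = [(0, 0, 3), (0, 1, 1), (1, 0, 2), (1, 1, 5)]
assert edf_slot(pkts) == [(0, 1, 1), (1, 0, 2)]
```

Greedy EDF can strand packets with later deadlines at busy ports, which is the kind of case where a flow-based reformulation can schedule more packets.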
Proceedings. IEEE INFOCOM '98, the Conference on Computer Communications. Seventeenth Annual Joint Conference of the IEEE Computer and Communications Societies. Gateway to the 21st Century (Cat. No.98CH36169)
Input queueing is becoming increasingly used for high-bandwidth switches and routers. In previous work, it was proved that it is possible to achieve 100% throughput for input-queued switches using a combination of virtual output queueing and a scheduling algorithm called LQF. However, this is only a theoretical result: LQF is too complex to implement in hardware. In this paper we introduce a new algorithm called Longest Port First (LPF), which is designed to overcome the complexity problems of LQF, and can be implemented in hardware at high speed. By giving preferential service based on queue lengths, we prove that LPF can achieve 100% throughput.
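A sketch of the LPF weighting idea: an eligible input-output pair (one with a non-empty VOQ) is weighted by the combined occupancy of its input and output ports, so heavily loaded ports get preferential service. The matching computed over these weights is omitted here, and the names are illustrative.

```python
def lpf_weights(voq):
    """LPF-style weights; voq[i][j] = cells queued at input i for output j."""
    n = len(voq)
    row = [sum(voq[i]) for i in range(n)]                     # input occupancy
    col = [sum(voq[i][j] for i in range(n)) for j in range(n)]  # output occupancy
    return {(i, j): row[i] + col[j]
            for i in range(n) for j in range(n) if voq[i][j] > 0}

voq = [[2, 0],
       [1, 3]]
w = lpf_weights(voq)
assert w[(0, 0)] == 2 + 3  # input 0 holds 2 cells, 3 cells want output 0
assert w[(1, 1)] == 4 + 3
```

Because the weights are port sums rather than per-queue lengths, ties are far more common than under LQF, which is part of what makes a hardware implementation tractable.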
[Conference Record] GLOBECOM '92 - Communications for Global Users: IEEE, 1992
This paper deals with an efficient high-speed packet switch in which each packet arriving at an input is stored in one of N possible queues, one for each possible output link. An implementation architecture which permits the N separate queues to share the same input buffer is considered and studied. An important result shown in the paper is that the proposed multiple input queueing approach outperforms the output queueing approach without requiring a speed-up in the switching operations. Work carried out under the financial support of the National Research Council (C.N.R.).
IEEE INFOCOM '99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No.99CH36320), 1999
Input-queued switching architecture is becoming an attractive alternative for designing very high speed switches owing to its scalability. Tremendous efforts have been made to overcome the throughput problem caused by contentions occurring at the input and output sides of the switches. However, no QoS guarantees can be provided by current input-queued switch designs. In this paper, a frame-based scheduling algorithm, referred to as Store-Sort-and-Forward (SSF), is proposed to provide QoS guarantees for input-queued switches without requiring speedup. SSF uses a framing strategy in which the time axis is divided into constant-length frames, each made up of an integer multiple of time slots. Cells arriving during a frame are first held in the input buffers, and are then "sorted-and-transmitted" within the next frame. A bandwidth allocation strategy and a cell admission policy are adopted to regulate the traffic to conform to the (r, T) traffic model. A strict-sense 100% throughput is proved to be achievable by rearranging the cell transmission orders in each input buffer, and a sorting algorithm is proposed to order the cell transmission. The SSF algorithm guarantees bounded end-to-end delay and delay jitter. It is proved that a perfect matching can be achieved within N(ln N + O(1)) effective moves.
IEEE International Conference on Communications, 2007
Dealing with RTTs (Round-Trip Times) in IQ switches has recently been recognized as a challenging problem, especially when considering distributed (multi-chip) scheduler implementations, which are suited to reducing the hardware complexity of very large, high-speed switches. Traditional iterative three- or two-phase scheduling algorithms are based on a monolithic implementation, thus allowing instantaneous information exchange among input and output selectors to determine a matching. Multi-chip implementations imply that information exchange among inputs and outputs is delayed by an inter-chip latency. This delay requires non-trivial modifications to scheduling algorithms to allow a fully distributed implementation while keeping good performance. We propose a new scheduling algorithm, named SRR (Synchronous Round Robin), which is suited to a fully distributed implementation and provides good performance compared with more complex, non-fully-distributed, previously proposed scheduling algorithms.
International Journal of Innovative Research in Computer and Communication Engineering, 2015
In this paper, a packet switching architecture with output queuing is used. Here the switch is internally non-blocking, but if packets are destined to the same output, output blocking will occur even though there are output queues; the switch fabric has a capacity of N × N². An exact model of the switch has been developed which can be used to determine the blocking performance of the switch and to obtain both its throughput and packet loss characteristics. In this architecture, each line card is connected by a dedicated point-to-point link to the central switch fabric. Two structures can be classified: centralized and distributed. Buffer arrangements are also categorized into output-queued and combined input-output queued switches, and the hardware complexity of OQ and VOQ is also discussed.
2000
This work is motivated by the desire to build a very high speed packet-switch with extremely high line rates. In this work, we consider building a packet-switch from multiple, lower speed packet-switches operating independently and in parallel. In particular, we consider a (perhaps obvious) parallel packet switch (PPS) architecture in which arriving traffic is demultiplexed over identical, lower speed packet-switches, switched to the correct output port, then recombined (multiplexed) before departing from the system. Essentially, the packet-switch performs packet-by-packet load-balancing, or "inverse-multiplexing", over multiple independent packet-switches. Each lower-speed packet switch operates at a fraction of the line rate, R; for example, if each packet-switch operates at rate R/k, no memory buffers are required to operate at the full line-rate of the system. Ideally, a PPS would share the benefits of an output-queued switch; i.e., the delay of individual packets could be precisely controlled, allowing the provision of guaranteed qualities of service.
IEEE Global Telecommunications Conference, 2004. GLOBECOM '04., 2004
In this paper a new packet switch architecture with multiple output queuing (MOQ) is proposed. In this architecture a nonblocking switch fabric, which has a capacity of N × N², and output buffers arranged into N separate queues for each output, are applied. Each of the N queues in one output port stores packets directed to this output from only one input. Both switch fabric and buffers can operate at the same speed as input and output ports. This solution does not need any speedup in the switch fabric, nor arbitration logic for deciding which packets from inputs will be transferred to outputs. Two possible switch fabric structures are considered: the centralized structure, with the switch fabric located on one or several separate boards, and the distributed structure, with the switch fabric distributed over line cards. Buffer arrangements as separate queues with independent write pointers, or as a memory bank with one pointer, are also discussed. The mean cell delay and cell loss probability as performance measures for the proposed switch architecture are evaluated and compared with the performance of the OQ and VOQ architectures. The hardware complexity of OQ, VOQ and the presented MOQ are also compared. We conclude that the hardware complexity of the proposed switch is very similar to a VOQ switch, but its performance is comparable to an OQ switch.
A high performance packet switching architecture called the Pipeback switch is proposed. This architecture ensures lossless packet delivery while maintaining linear buffer complexity. The Pipeback switch improves upon the popular Knockout switch proposed by Y. Yeh et al. Both switches use an N × N space division fabric with output queuing, and both designs are motivated by the observation that the probability of more than L packets arriving in a given timeslot being destined for one particular output port sharply decreases as L is increased. This probability is comparable to the packet loss probability due to transmission errors for L ≪ N. While the arrival of more than L packets destined for a single output in a single timeslot in the Knockout switch results in dropped packets due to buffer blocking, the Pipeback switch avoids such loss by maintaining a separate shared buffer architecture common to all the output ports. This common architecture consists of a novel Pipeback concentration network and a buffer pool. The buffer pool accommodates all the knocked-out packets that the Knockout switch would have dropped as a result of buffer blocking, and pipes them back to a separate input line. We further show that the use of the buffer pool leads to a reduction in the number of separate output buffers required at each output port.
IEEE Journal on Selected Areas in Communications, 1997
In this paper, we introduce a new approach to ATM switching. We propose an ATM switch architecture which uses only a single shift-register-type buffering element to store and queue cells, and, within the same (physical) queue, switches the cells by organizing them in logical queues destined for different output lines. The buffer is also a sequencer which allows flexible ordering of the cells in each logical queue to achieve any appropriate scheduling algorithm. This switch is proposed for use as the building block of large-scale multistage ATM switches because of its low hardware complexity and flexibility in providing (per-VC) scheduling among the cells. The switch can also be used as a scheduler/controller for RAM-based switches. The single-queue switch implements output queueing and performs full buffer sharing. The hardware complexity is low. The number of input and output lines can vary independently without affecting the switch core. The size of the buffering space can be increased simply by cascading the buffering elements.
GLOBECOM '05. IEEE Global Telecommunications Conference, 2005
Recently, there has been tremendous interest in research on two-stage switches. Unlike input-buffered switches, two-stage switches do not need to find matchings between inputs and outputs. As such, they are much easier to scale and much simpler to implement. However, two-stage switches usually suffer from the out-of-sequence problem. Though there are several methods proposed in the literature to solve such a problem, these methods require either complex scheduling or additional hardware, which defeats the purpose of design simplicity. To design a simple and high performance switch using the two-stage architecture, we address three buffer design problems in this paper: re-sequencing buffers, central buffers and input buffers. We show that the size of the re-sequencing buffer needs to be proportional to the size of the central buffer to ensure that no packets are lost due to re-sequencing. Via simulations, we find that a moderate central buffer size yields good throughput when traffic is not bursty. However, when the traffic is bursty, one needs to address the head-of-line (HOL) blocking problem at the input. We also find that using the round-robin service policy for multiple virtual output queues (VOQs) at the inputs may exhibit a catastrophic phenomenon, called a non-ergodic mode. When a switch is trapped in a non-ergodic mode, its throughput is sharply reduced. To solve such a problem in input buffers, we show that one may introduce "randomness" into a switch to jump out of a non-ergodic mode.
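The randomized escape from a non-ergodic mode can be sketched as a round-robin VOQ server that occasionally picks a random non-empty VOQ instead of following its pointer; the 10% mixing probability and all names are assumptions for illustration, not the paper's exact mechanism.

```python
import random

def serve(voqs, pointer, rng, p_random=0.1):
    """Pick the next VOQ to serve: round-robin with occasional random escapes.

    Returns (chosen_voq_index, next_pointer), or (None, pointer) if all empty.
    """
    nonempty = [j for j, q in enumerate(voqs) if q]
    if not nonempty:
        return None, pointer
    if rng.random() < p_random:
        j = rng.choice(nonempty)            # random escape from lock-step service
    else:
        # plain round-robin: first non-empty VOQ at or after the pointer
        j = next((pointer + k) % len(voqs) for k in range(len(voqs))
                 if voqs[(pointer + k) % len(voqs)])
    return j, (j + 1) % len(voqs)

rng = random.Random(0)
voqs = [[], [1], [2], []]
j, ptr = serve(voqs, pointer=0, rng=rng)
assert j in (1, 2) and ptr == (j + 1) % 4
```

Without the random branch, inputs whose pointers move in lock-step can keep colliding on the same outputs indefinitely; the occasional random pick breaks that synchronization.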