2000, IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Multi-FPGA systems (MFSs) are used as custom computing machines, logic emulators and rapid prototyping vehicles. A key aspect of these systems is their programmable routing architecture, which is the manner in which wires, FPGAs and Field-Programmable Interconnect Devices (FPIDs) are connected. Several routing architectures for MFSs have been proposed and previous research has shown that the partial crossbar is one of the best existing architectures [Kim96] [Khal97]. In this paper we propose a new routing architecture, called the Hybrid Complete-Graph and Partial-Crossbar (HCGP), which has superior speed and cost compared to a partial crossbar. The new architecture uses both hardwired and programmable connections between the FPGAs. We compare the performance and cost of the HCGP and partial crossbar architectures experimentally, by mapping a set of 15 large benchmark circuits into each architecture. A customized set of partitioning and inter-chip routing tools was developed, with particular attention paid to architecture-appropriate inter-chip routing algorithms. We show that the cost of the partial crossbar (as measured by the number of pins on all FPGAs and FPIDs required to fit a design) is on average 20% more than the new HCGP architecture and as much as 25% more. Furthermore, the critical path delay for designs implemented on the partial crossbar was on average 20% more than on the HCGP architecture and up to 43% more. Using our experimental approach, we also explore a key architecture parameter associated with the HCGP architecture: the proportion of hard-wired connections versus programmable connections, to determine its best value.
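The pin-count cost metric described above can be illustrated with a rough sketch. This is not the authors' tool flow: it assumes a fixed board size, assumes that every FPGA pin routed through an FPID consumes exactly one FPID pin, and ignores how many FPGAs a given design actually needs in order to fit. The function names and the `programmable_pct` parameter are hypothetical.

```python
def partial_crossbar_pins(num_fpgas: int, pins_per_fpga: int) -> int:
    """Total FPGA + FPID pins when every FPGA I/O pin goes to an FPID.

    Assumption (not from the abstract): one FPID pin per FPGA pin.
    """
    fpga_pins = num_fpgas * pins_per_fpga
    fpid_pins = fpga_pins
    return fpga_pins + fpid_pins


def hcgp_pins(num_fpgas: int, pins_per_fpga: int, programmable_pct: int) -> int:
    """Total FPGA + FPID pins when only part of the FPGA pins reach FPIDs.

    programmable_pct is the (hypothetical) percentage of each FPGA's pins
    used for programmable connections; the rest are hardwired FPGA-to-FPGA
    and therefore need no FPID pins.
    """
    fpga_pins = num_fpgas * pins_per_fpga
    fpid_pins = fpga_pins * programmable_pct // 100
    return fpga_pins + fpid_pins


if __name__ == "__main__":
    # Hypothetical board: 4 FPGAs with 100 I/O pins each.
    print(partial_crossbar_pins(4, 100))
    print(hcgp_pins(4, 100, programmable_pct=60))
```

The inputs here are made up for illustration. In the study itself, cost depends on the board size each benchmark circuit needs in order to route, which this sketch does not model.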
1997
Multi-FPGA systems (MFSs) are used as custom computing machines, logic emulators and rapid prototyping vehicles. A key aspect of these systems is their programmable routing architecture, the manner in which wires, FPGAs and Field-Programmable Interconnect Devices (FPIDs) are connected. In this paper we present an experimental study for evaluating and comparing two commonly used routing architectures for multi-FPGA systems: 8-way mesh and partial crossbar. A set of 15 large benchmark circuits is mapped into these architectures, using a customized set of partitioning, placement and inter-chip routing tools. Particular attention was paid to the development of appropriate inter-chip routing algorithms for each architecture. The architectures are compared on the basis of cost (the total number of pins required in the system) and speed (determined by post inter-chip routing critical path delay). The results show that the 8-way mesh architecture has high cost and poor routability and speed, while the partial crossbar architecture gives relatively low cost and good routability and speed. Using our experimental approach, we also explore a key architecture parameter associated with the partial crossbar architecture, and its impact on the routability and speed of the architecture. We briefly describe an inter-chip router for the partial crossbar architecture, called PCROUTE, that gives excellent routability and speed results for real benchmark circuits.
2005
Modern FPGA architectures provide ample routing resources so that designs can be routed successfully. The routing architecture is designed to handle versatile connection configurations. However, providing such great flexibility comes at a high cost in terms of area, delay and power. We propose a new FPGA routing architecture that utilizes a mixture of hardwired and traditional flexible switches. (This work was supported in part by a grant from NSF under contract CAREER CCF-0347891.)
2000
Multi-FPGA boards are being used for logic emulation, rapid prototyping, custom computing and low-volume subsystem implementation. A key feature which characterizes these boards is their routing architecture (RA). Inter-FPGA connections in an RA can be of two types, namely fixed connections through direct wires and programmable connections through intermediate Field-Programmable Interconnect Devices. This paper presents an analytical approach for evaluating the routing performance of an RA employing both types of connections. Our approach consists of two steps: 1) generation of a random interconnection requirement matrix for modeling real circuits, with an orientation towards the available interconnection architecture; 2) checking the routability of the generated matrix on the given RA.
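The two steps above can be sketched in a few lines. This is only an illustrative simplification, not the paper's method: the matrix is drawn uniformly at random (the abstract's architecture-oriented generation is not reproduced), each FPGA pair is assumed to share the same number of fixed wires, and any demand beyond those wires is assumed to pass through an idealized, fully flexible interconnect device that is limited only by each FPGA's programmable pin count. All names and parameters are hypothetical.

```python
import random


def random_requirement_matrix(num_fpgas, max_wires, seed=0):
    """Step 1 (simplified): symmetric matrix of inter-FPGA wire demands."""
    rng = random.Random(seed)
    req = [[0] * num_fpgas for _ in range(num_fpgas)]
    for i in range(num_fpgas):
        for j in range(i + 1, num_fpgas):
            req[i][j] = req[j][i] = rng.randint(0, max_wires)
    return req


def is_routable(req, fixed_per_pair, prog_pins_per_fpga):
    """Step 2 (simplified): check demand against a hybrid architecture.

    Demand between a pair is served first by its fixed wires; the overflow
    needs one programmable pin on each endpoint FPGA. The interconnect
    device itself is assumed never to be the bottleneck.
    """
    n = len(req)
    for i in range(n):
        overflow = sum(
            max(0, req[i][j] - fixed_per_pair) for j in range(n) if j != i
        )
        if overflow > prog_pins_per_fpga:
            return False
    return True


if __name__ == "__main__":
    demand = random_requirement_matrix(num_fpgas=4, max_wires=10, seed=1)
    print(is_routable(demand, fixed_per_pair=4, prog_pins_per_fpga=12))
```

Repeating the check over many random matrices would give a rough routability estimate for one choice of fixed and programmable resources, under the assumptions stated above.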
Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2002
To fully realize the benefits of partial and rapid reconfiguration of field-programmable devices, we often need to dynamically schedule computing tasks and generate instance-specific configurations: new graphs which must be routed during program execution. Consequently, route time can be a significant overhead cost reducing the achievable net benefits of dynamic configuration generation. By adding hardware to accelerate routing, we show that it is possible to compute routes in one thousandth the time of a traditional, software router and achieve routes that are within 5% of the state-of-the-art offline routing algorithms for a sample set of application netlists and within 25% for a set of difficult synthetic benchmarks. We further outline how strategic use of parallelism can allow the total route time to scale substantially less than linearly in graph size. We detail the source of the benefits in our approach and survey a range of options for hardware assistance that vary from a speedup of over 10× with modest hardware overhead to speedups in excess of 1000×.
2004
We have developed a hop-based complete detailed router, ROAD-HOP, that uses the Bump & Refit ...
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2000
number of MUX inputs required. A MUX with n inputs contributes at most log(n) to the entropy, so we sum the log of the number of inputs over all the signals. We obtain an entropy of 240 bits per cluster, or 40.0 bits per basic logic cell. This looks reasonable compared to the lower bound. Now suppose we alter the previous parameters to Fcint = 0.25, Fcfb = 0.25, and Fc = 0.1. Then the entropy per cell becomes 26.6 bits, which we can confidently say is insufficient (based on our lower bound of 27) even for PLDs with only 65K cells. This is true for any detailed routing architecture consistent with these parameters and for any routing algorithm.
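The summation described above is easy to reproduce. The sketch below assumes base-2 logarithms (since the figures are quoted in bits) and uses made-up MUX sizes; it does not reconstruct the architecture that yields the 240-bit figure. The function names are hypothetical.

```python
import math


def routing_entropy_bits(mux_input_counts):
    """Upper bound on routing entropy: sum of log2(n) over all routing MUXes.

    Each n-input MUX can select one of n sources, so it contributes at most
    log2(n) bits. Base 2 is an assumption drawn from the use of "bits".
    """
    return sum(math.log2(n) for n in mux_input_counts)


def entropy_per_cell(mux_input_counts, cells_per_cluster):
    """Divide the cluster-level entropy evenly among its logic cells."""
    return routing_entropy_bits(mux_input_counts) / cells_per_cluster


if __name__ == "__main__":
    # Hypothetical cluster: four 16-input MUXes and eight 4-input MUXes.
    muxes = [16] * 4 + [4] * 8
    print(routing_entropy_bits(muxes))
    print(entropy_per_cell(muxes, cells_per_cluster=2))
```

The quoted figures of 240 bits per cluster and 40.0 bits per cell imply six cells per cluster. A design would be flagged as insufficient when the per-cell value falls below the lower bound, as in the 26.6 versus 27 comparison.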
Proceedings of the 14th ACM Great Lakes symposium on VLSI - GLSVLSI '04, 2004
In this paper we compare the routing architecture of island-style FPGAs based on field-programmable switch boxes with a mask-programmable routing structure, in order to assess its position in the design space of routing opportunities available to VLSI IC designers. Although the results presented in this work depend on a few implementation details that will be discussed in the paper, the mask-programmable routing structure shows a large area saving and delay improvement with respect to the field-programmable switch box. As a consequence, we believe that between the two bounds of the design space, i.e., ASICs and FPGAs, there are several hybrid architectural solutions trading off performance, power, area, and programmability, which in the future can be considered for different applications.
World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering, 2010
Modern applications realized on FPGAs exhibit high connectivity demands. In this paper we study the routing constraints of Virtex devices and propose a systematic methodology for designing a novel general-purpose interconnection network targeting reconfigurable architectures. This network consists of multiple segment wires and SB patterns, appropriately selected and assigned across the device. The goal of our proposed methodology is to maximize the hardware utilization of fabricated routing resources. The derived interconnection scheme is integrated on a Virtex-style FPGA. This device is characterized both by its high performance and by its low energy requirements. For this reason, the design criterion that guided our architecture selections was the minimal Energy×Delay Product (EDP). The methodology is fully supported by three new software tools, which belong to the MEANDER Design Framework. Using a typical set of MCNC benchmarks, an extensive comparison study in terms of several critical parameters proves the effectiveness of the derived interconnection network. More specifically, we achieve an average Energy×Delay Product reduction of 63%, a performance increase of 26%, a reduction in leakage power of 21%, and a reduction in total energy consumption of 11%, at the expense of a 20% increase in channel width.
2010 International Conference on Field-Programmable Technology, 2010
We consider coarse and fine-grained techniques for parallel FPGA routing on modern multi-core processors. In the coarse-grained approach, sets of design signals are assigned to different processor cores and routed concurrently. Communication between cores is through the MPI (message passing interface) communications protocol. In the fine-grained approach, the task of routing an individual load pin on a signal is parallelized using threads. Specifically, as FPGA routing resources are traversed during maze expansion, delay calculation, costing and priority queue insertion for these resources execute concurrently. The proposed techniques provide deterministic/repeatable results. Moreover, the coarse and fine-grained approaches are not mutually exclusive and can be used in tandem. Results show that on a 4-core processor, the techniques improve router run-time by ∼2.1×, on average, with no significant impact on circuit speed performance or interconnect resource usage.
Proceedings of the international symposium on Field programmable gate arrays - FPGA'06, 2006
A fundamental difference between ASICs and FPGAs is that wires in ASICs are designed such that they match the requirements of a particular design. Wire parameters such as length, width, layout and the number of wires can be varied to implement a desired circuit. Conversely, in an FPGA, area is fixed and routing resources exist whether or not they are used, so the goal becomes implementing a circuit within the limits of available resources. The architecture for existing routing structures in FPGAs has evolved over time to suit the requirements of large, localized digital circuits. However, FPGAs now have the capacity to implement networks of such circuits, and system-level interconnection becomes a key element of the design process.
Network-on-chip (NoC) architectures are emerging as a highly scalable, reliable, and modular on-chip communication infrastructure platform. The NoC architecture uses layered protocols and packet-switched networks which consist of on-chip routers, links, and network interfaces on a predefined topology. In this project, we design a network-on-chip which is based on the Cartesian network environment. This project proposes a new Cartesian topology which is used to reduce network routing time, and which is a suitable alternative for network design and implementation. The Cartesian network-on-chip can be modeled using Verilog HDL and simulated using ModelSim software.
2013 International Conference on Field-Programmable Technology (FPT), 2013
The FPGA's interconnection network not only requires the larger portion of the total silicon area in comparison to the logic available on the FPGA, it also contributes the majority of the delay and power consumption. Therefore it is essential that routing algorithms are as efficient as possible. In this work the connection router is introduced. It is capable of partially ripping up and rerouting the routing trees of nets. To achieve this, the main congestion loop rips up and reroutes connections instead of nets, which allows the connection router to converge much faster to a solution. The connection router is compared with the VPR directed search router on the basis of VTR benchmarks on a modern commercial FPGA architecture. It is able to find routing solutions 4.4% faster for a relaxed routing problem and 84.3% faster for hard instances of the routing problem. Given the same amount of time as the VPR directed search, the connection router is able to find routing solutions with 5.8% fewer tracks per channel.
1997
This paper describes the design and development of routing chips used in a proprietary high-speed network switch called HSSI. This high-speed switch is under development at our research lab. We design the chips using VHDL language and implement them using FPGA technology. The result is a high-performance routing chip set which can operate at a speed faster than 100 Mbps. We found that using VHDL along with the FPGA technology provides a fast development environment that can reduce the design effort tremendously.
2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2019
FPGA routing is an important part of physical design as the programmable interconnection network requires the majority of the total silicon area and the connections largely contribute to delay and power. It should also occur with minimum runtime to enable efficient design exploration. In this work we elaborate on the concept of the connection-based routing principle. The algorithm is improved and a timing-driven version is introduced. The router, called CROUTE, is implemented in an easy to adapt FPGA CAD framework written in Java, which is publicly available on GitHub. Quality and runtime are compared to the state-of-the-art router in VPR 7.0.7. Benchmarking is done with the TITAN23 design suite, which consists of large heterogeneous designs targeted to a detailed representation of the Stratix IV FPGA. CROUTE gains in both the total wirelength and maximum clock frequency while reducing the routing runtime. The total wire-length reduces by 11% and the maximum clock frequency increases by 6%. These high-quality results are obtained in 3.4x less routing runtime.
2011 21st International Conference on Field Programmable Logic and Applications, 2011
We propose a new FPGA routing approach that, when combined with a low-cost architecture change, results in a 34% reduction in router run-time, at the cost of a 3% area overhead, with no increase in critical path delay. Our approach begins with traditional PathFinder-style routing, which we run on a coarsened representation of the routing architecture. This leads to fast generation of a partial routing solution where signals are assigned to groups of wire segments rather than individual wire segments. A Boolean satisfiability (SAT)-based stage follows, generating a legal routing solution from the partial solution. Our approach points to a new research direction: reducing FPGA CAD run-time by exploring FPGA architectures and algorithms together.
2009 International Conference on Field-Programmable Technology, 2009
This paper optimizes the routing structure for hybrid FPGAs, in which high I/O density coarse-grained units are embedded within fine-grained logic. This significantly increases the routing resource requirement between elements. We investigate the routing demand for hybrid FPGAs over a set of domain-specific applications. The trade-offs in delay, area and routability of the separation distance between coarse-grained blocks are studied. The effects of adding routing switches to the coarse-grained blocks and using wider channels near them to meet extra routing demand are examined. Our optimized architectures are compared to an existing column-based architecture. The results show that (1) there is 44% track usage at the edge of the embedded blocks, (2) both the separation of embedded blocks and the addition of switches to embedded blocks can increase the area and delay performance by 48.4% compared to a column-based FPGA architecture, (3) wider channel width reduces the area of a highly congested system by 34.9%, but it cannot further improve a system with separation of embedded blocks and additional switches on embedded blocks.
Design Automation For Embedded Systems, 2004
Multi-FPGA Boards (MFBs) have been in use for more than a decade for implementing systems requiring high performance and for emulation/prototyping of multimillion-gate chips. It is important to develop an MFB architecture which can be used for emulation or prototyping of a large number of circuits. A key feature of an MFB is its routing architecture, defined by its inter-Field-Programmable Gate Array (FPGA) connections. There are two types of inter-FPGA connections, namely fixed connections (FCs), which connect a pair of FPGAs through dedicated wires, and programmable connections (PCs), which connect a pair of FPGAs through a programmable switch. An architecture which has a mix of both types of connections is called a hybrid routing architecture. It has been shown in the literature [7] that a hybrid MFB architecture is more efficient for emulation than an architecture with only one type of connection. The cost of an MFB and the delay of the emulated circuit on it depend on the number of PCs used for emulation. An objective of a designer of an MFB for circuit emulation is to minimize the required number of PCs. In this paper, we describe algorithms to evaluate the requirement of PCs for many hybrid routing architectures. The requirement of PCs can be reduced if some programmable connections are replaced by a connection using only FCs by routing through FPGAs. Such routing is called multi-hop routing. We present an optimal and a heuristic algorithm for estimation of PCs when a limited number of hops through FPGAs is permitted. The unique feature of our evaluation scheme is that it is generic and treats the routing architecture as a parameter. We have used benchmark circuits as well as synthetic cloned circuits for testing our algorithms. Our heuristic algorithm is very fast and gives optimal results most of the time. Our algorithms can be used for actual routing during circuit emulation.
20th International Parallel and Distributed Processing Symposium, IPDPS 2006, 2006
The novel design of an efficient FPGA interconnection architecture with multiple Switch Boxes (SBs) and hardwired connections for realizing data-intensive applications (e.g., DSP applications) is introduced. For that purpose, after exhaustive exploration, we modify the routing architecture through efficient selection of the appropriate switch box with hardwired connections, taking into account the statistical and spatial routing restrictions of DSP applications mapped onto FPGAs. More specifically, we propose a new technique for selecting the appropriate combination of switch boxes, depending on the localized performance and power consumption requirements of each specific region of the FPGA architecture. In order to perform the mapping, we developed a novel algorithm, which takes into account the modified architectural routing features. This algorithm was implemented within a new tool called EX-VPR. Using a number of DSP applications, an extensive comparison study of various combinations of switch boxes in terms of total power consumption, performance, and Power×Delay product proves the effectiveness of the proposed approach.
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays, 2008
Architecture development for FPGAs has typically been a very empirical discipline, requiring the synthesis of benchmark circuits into candidate architectures. This is difficult to do in the early stages of architecture development, however, because there is no complete architecture to synthesize circuits into. The effort required to create prototype tools for nascent architectures is far too great for every new logic block or routing architecture idea, and so it would be extremely helpful to have a simple and intuitive FPGA interconnect model to guide the architect. In this paper we present such an interconnect model for island-style FPGAs, whose single output is the estimated routing demand (often referred to as W, the number of routing tracks per channel) for an FPGA as a function of several logic block, circuit and routing architecture parameters. The goal of this model is to be as simple as possible, while still accurate enough to be useful, to provide understanding and intuition on FPGA routing. Our methodology is empirical: we propose model forms based on empirical observations, intuition and some derivation, and then fit models to experimentally generated data. We show the development of the model in stages, beginning with a fully flexible FPGA, and gradually proceeding to one which includes the key parameters that control the flexibility of FPGA routing, and one key parameter describing the logic block and another relating to the typical circuit. We then show how to use these models in early-stage architecture development to provide feedback on several aspects of logic block architecture. We also show how the model can be used to explore the routing architecture space itself and to provide an overall intuition for architecture development.