2010
…
184 pages
The advantage in multiprocessors is the performance speedup obtained with processor-level parallelism. Similarly, the flexibility for application-specific adaptability is the advantage in reconfigurable architectures. To benefit from both these architectures, we present a reconfigurable multiprocessor template that combines parallelism in multiprocessors and flexibility in reconfigurable architectures. A fast, single cycle, resource-efficient, run-time reconfiguration scheme accelerates customisations in the reconfigurable multiprocessor template. Based on this methodology, a four-core multiprocessor called QuadroCore has been implemented on UMC's 90nm standard cells and on Xilinx's FPGA. QuadroCore is customisable and adapts to variations in the granularity of parallelism, the amount of communication between tasks, and the frequency of synchronisation. To validate the advantages of this approach, a diverse set of applications has been mapped onto the QuadroCore multiprocessor. Experimental results show speedups in the range of 3 to 11 in comparison to a single processor. In addition, energy savings of up to 30% were noted on account of reconfiguration. Furthermore, to steer application mapping based on power considerations, an instruction-level power model has been developed. Using this model, power-driven instruction selection introduces energy savings of up to 70% in the QuadroCore multiprocessor.
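The idea of power-driven instruction selection on top of an instruction-level power model can be illustrated with a minimal sketch. It is not the model developed in the thesis: the energy table, the opcode names, and the helper functions below are hypothetical, and the sketch assumes the simplest possible model in which each instruction contributes a fixed base energy cost.

```python
# Hypothetical per-instruction base energy costs in arbitrary units.
# These values are illustrative assumptions, not measurements from QuadroCore.
ENERGY_PER_INSTRUCTION = {"mul": 3.0, "shl": 1.0, "add": 1.2, "ld": 2.5, "st": 2.8}


def sequence_energy(instructions):
    """Estimate energy as the sum of the base cost of each instruction."""
    return sum(ENERGY_PER_INSTRUCTION[op] for op in instructions)


def select_lowest_energy(candidates):
    """Among functionally equivalent sequences, pick the cheapest estimate."""
    return min(candidates, key=sequence_energy)


# Two hypothetical ways to compute x * 5:
# a single multiply, or a shift-left by 2 followed by an add.
candidates = [["mul"], ["shl", "add"]]
best = select_lowest_energy(candidates)

# Fraction of energy saved relative to the single-multiply version.
saving = 1.0 - sequence_energy(best) / sequence_energy(["mul"])
print(best, round(saving, 3))
```

A fuller model would typically also account for inter-instruction effects and operand-dependent switching, which this sketch deliberately leaves out.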
2009
Parallelism and adaptability are two distinct architectural design considerations in embedded processors. Multicore processors accelerate application execution on account of their inherent parallelism, and run-time reconfiguration capabilities add adaptability during in-field deployment. To benefit from both these features, a reconfigurable multiprocessor architecture, QuadroCore, has been developed. A novel reconfiguration mechanism has been incorporated that provides fast run-time adaptability in a 4-processor cluster. In this paper, this reconfiguration scheme is used to save energy when running data-parallel applications on QuadroCore. As a proof of concept, a data-intensive neural network application called Self-organising Maps has been implemented on QuadroCore. Via reconfiguration, energy reduction of up to 30% has been observed for an implementation in UMC's 90nm standard cell technology.
ACM Transactions on Reconfigurable Technology and Systems, 2010
In multiprocessors, performance improvement is typically achieved by exploring parallelism with fixed granularities, such as instruction-level, task-level, or data-level parallelism. We introduce a new reconfiguration mechanism that facilitates variations in these granularities in order to optimize resource utilization in addition to performance improvements. Our reconfigurable multiprocessor QuadroCore combines the advantages of reconfigurability and parallel processing. In this paper, a unified hardware-software approach for the design of our QuadroCore is presented. This design-flow is enabled via compiler-driven reconfiguration, which matches application-specific characteristics to a fixed set of architectural variations. A special reconfiguration mechanism has been developed that alters the architecture within a single clock cycle.
Microprocessors and Microsystems, 2012
In this paper, we address the problem of organization and management of threads on a multithreading custom computing machine composed of a General Purpose Processor (GPP) and Reconfigurable Coprocessors. We target higher portability, flexibility, and performance of the prospective design solutions by means of a strictly architectural approach. Our proposal to improve overall system performance is twofold. First, we provide architectural mechanisms to accelerate applications by supporting computationally intensive kernels with reconfigurable hardware accelerators. Second, we propose an infrastructure capable of facilitating thread management. Besides the architectural and microarchitectural extensions of the reconfigurable computing system, we also propose a hierarchical programming model. The model supports balanced and performance-efficient SW/HW co-execution of multithreading applications. We demonstrate that our approach provides better performance-portability and performance-flexibility trade-off characteristics compared to other state-of-the-art proposals. The experimental results, based on real applications, suggest average system speedups between 1.2 and 19.6. On a single-threaded synthetic benchmark, we achieve average speedups between 8.5 and 129. On a multithreaded synthetic benchmark, the average speedups are between 1.3 and 7.3.
Lecture Notes in Computer Science, 2011
In this paper, we address the organization and management of threads on a multithreading custom computing machine composed of a General Purpose Processor (GPP) and Reconfigurable Co-Processors. Our proposal to improve overall system performance is twofold. First, we provide architectural mechanisms to accelerate applications by supporting computationally intensive kernels with reconfigurable hardware accelerators. Second, we propose an infrastructure capable of facilitating thread management. The latter can be employed by, e.g., RTOS kernel services. Besides the architectural and microarchitectural extensions of the reconfigurable computing system, we also propose a hierarchical programming model. The model supports balanced and performance-efficient SW/HW co-execution of multithreading applications. Our experimental results based on real applications suggest average system speedups between 1.2 and 19.6 times; based on synthetic benchmarks, the achieved speedups are between 1.3 and 29.8 times compared to software-only implementations.
2011 9th IEEE International Conference on Industrial Informatics, 2011
The growing complexity and diversity of embedded systems, combined with continuing demands for higher performance and lower power consumption, place increasing pressure on embedded platform designers. To address these problems, the Embedded Reconfigurable Architectures project (ERA) investigates innovations in both hardware and tools to create next-generation embedded systems. Leveraging adaptive hardware enables maximum performance for given power budgets. We design our platform via a structured approach that allows integration of reconfigurable computing elements, network fabrics, and memory hierarchy components. Commercially available, off-the-shelf processors are combined with other proprietary and application-specific, dedicated cores. These computing and network elements can adapt their composition, organization, and even instruction-set architectures in an effort to provide the best possible trade-offs in performance and power for the given application(s). Likewise, network elements and topologies and memory hierarchy organization can be selected both statically at design time and dynamically at run-time. Hardware details are exposed to the operating system, run-time system, compiler, and applications. This combination supports fast platform prototyping of highly efficient embedded system designs. Our design philosophy supports the freedom to flexibly tune all these hardware elements, enabling a better choice of power/performance trade-offs than that afforded by the current state of the art.
Journal of Signal Processing Systems, 2012
Embedded systems increasingly integrate compute-intensive applications into their end products; cryptography and image and video processing are examples found in leading markets such as consumer electronics and automotive. To meet these ever-increasing computational demands, hardware accelerators synthesized in field-programmable gate arrays (FPGAs) can achieve processing speedups of orders of magnitude over their CPU-based software counterparts. However, the accompanying increase in physical resources raises cost. To address this issue, dynamically reconfigurable hardware technology has reached maturity. SRAM-based reconfigurable logic goes beyond the classical conception of static hardware resources distributed in space and held invariant for the entire application life cycle; it provides a new design abstraction characterized by the temporal partitioning of such resources to promote their continuous reuse, reconfiguring them on the fly to play a different role at each instant. This computing paradigm balances the design of embedded applications by partitioning their functionality in space and time, through a series of mutually exclusive processing tasks time-multiplexed on the same set of resources, thus achieving cost savings in both area and power. However, exploiting this versatility requires special attention to avoid performance degradation. This work addresses such technical aspects; it is intended as a survey of reconfigurable hardware technology and aims to define an open, standard, and cost-effective system architecture driven by flexible coprocessors instantiated on demand on the reconfigurable resources of an FPGA. This concept fits the functional features demanded of many embedded applications today, and its feasibility has been demonstrated on a state-of-the-art commercial SRAM-based FPGA platform. The achieved results highlight dynamic partial reconfiguration as a potential technology to lead the next computing wave in the industry.
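The area argument behind temporal partitioning can be made concrete with a small sketch. It assumes an idealised case that the abstract does not spell out: the tasks are strictly mutually exclusive, the shared region is sized for the largest task, and reconfiguration time and controller overhead are ignored. The task names and slice counts are hypothetical.

```python
def static_area(task_areas):
    """All tasks are instantiated side by side, so their areas add up."""
    return sum(task_areas.values())


def time_multiplexed_area(task_areas):
    """Mutually exclusive tasks share one region sized for the largest task."""
    return max(task_areas.values())


# Hypothetical accelerator sizes in logic slices (illustrative values only).
task_areas = {"aes": 1200, "fir_filter": 800, "jpeg_dct": 1500}

# Fraction of area saved by reusing one region instead of three static ones.
saved_fraction = 1 - time_multiplexed_area(task_areas) / static_area(task_areas)
print(static_area(task_areas), time_multiplexed_area(task_areas), round(saved_fraction, 3))
```

In practice the saving is offset by the time spent loading each configuration, which is the performance degradation the abstract warns about.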
Electronic devices such as smartphones and game consoles must now execute heterogeneous tasks in real time. Heterogeneous multi-core processors are the likely platforms to host these different tasks. However, heterogeneous multi-core design raises many hardware/software challenges, including resource sharing, task balancing, throughput, communication, and scheduling. This paper presents a review of heterogeneous reconfigurable multiprocessor design using Field Programmable Gate Array (FPGA) hardware platforms. The FPGA-based hardware design acts as a middleware structure that manages communication between the cores of a given processor by physically varying hardware blocks on the FPGA. The FPGA-based design provides an efficient platform for developing hardware/software multi-core systems for reconfigurable heterogeneous multiprocessors. Finally, the reconfigurable design improves heterogeneous multiprocessor performance by reducing communication latency, power consumption, and execution time.
Workshop on …, 2004
A growing number of reconfigurable architectures combine the advantages of a hardwired implementation (performance, power consumption) with the advantages of a software solution (flexibility, time to market). Today, there are devices on the market that can be dynamically reconfigured at run-time within one clock cycle. But the benefits of these architectures can only be utilized if applications can be mapped efficiently. In this paper we describe a design approach for reconfigurable architectures that takes into account three aspects: architecture, compiler, and applications. To realize the proposed design flow, we developed a synthesizable architecture model. From this model we obtain estimates of speed, area, and power that are used to provide the compiler with the necessary timing information and to optimize the architecture.
2005
The X4CP32 is an architecture that combines the parallel and reconfigurable paradigms. It consists of a grid of Reconfigurable and Programming Units (RPUs), each one containing 4 Cells (including a microprocessor in each Cell), responsible for all the processing and program flow. This paper presents architectural modifications in the X4CP32 in order to increase its performance. The RPU was implemented according to the VLIW (Very Long Instruction Word) methodology, and the Cells were redesigned with a pipelined implementation. These improvements raised the maximum IPC of the RPU from 0.5 to 4 with an area overhead of 26%. To evaluate the new architecture, versions of the 2D Discrete Cosine Transform, Montgomery Modular Multiplication and Color Space Conversion were mapped, using the baseline architecture and the pipelined VLIW architecture.
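The reported figures can be combined into a rough peak-throughput-per-area estimate. The sketch below is only arithmetic on the two numbers quoted in the abstract; it assumes the 26% area overhead applies to the same unit whose maximum IPC is quoted, and it says nothing about measured application speedup. The variable names are illustrative.

```python
# Figures quoted in the abstract.
baseline_ipc = 0.5     # maximum IPC of the original RPU
improved_ipc = 4.0     # maximum IPC of the pipelined VLIW RPU
area_overhead = 0.26   # relative area increase

# Peak IPC gain: 4 / 0.5 = 8x.
ipc_gain = improved_ipc / baseline_ipc

# Peak IPC gain normalised by the larger area: 8 / 1.26, roughly 6.35x.
ipc_gain_per_area = ipc_gain / (1 + area_overhead)
print(ipc_gain, round(ipc_gain_per_area, 2))
```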