Papers by Ricardo Menotti

Many video and image/signal processing applications can be structured as sequences of data-dependent tasks using a consumer/producer communication paradigm and are therefore amenable to pipelined execution. This paper presents an execution technique to speed up the overall execution of successive, data-dependent tasks on a reconfigurable architecture. The technique pipelines sequences of data-dependent tasks by overlapping their execution, subject to data dependences. It decouples the concurrent data-path and control units and uses a custom, application data-driven, fine-grained synchronization and buffering scheme. In addition, the execution scheme supports out-of-order but data-dependent producer-consumer pairs, which previous data-driven pipelining approaches do not allow. The approach has been exploited in the context of a high-level compiler targeting FPGAs. Preliminary experimental results for a number of benchmarks reveal noticeable performance improvements and buffer size reductions over traditional approaches.
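The following is a minimal Python model of fine-grained producer/consumer synchronization. It is not the scheme from the paper. It assumes that each task handles one element per cycle, that buffering is unbounded, and that every array element has a "ready" flag. The function names are hypothetical. The model compares the cycle count of a fully sequential execution with an overlapped execution for several consumer read orders.

```python
from typing import Iterable


def sequential_cycles(n: int) -> int:
    """Consumer starts only after the producer has written all n elements."""
    return 2 * n


def pipelined_cycles(prod_order: Iterable[int], cons_order: Iterable[int]) -> int:
    """Cycles until the consumer finishes, using a per-element 'ready' flag.

    Simplifying assumptions (not from the paper): one element per cycle for
    both tasks and unbounded buffering between them.
    """
    # ready[x] = cycle at whose end element x has been written by the producer
    ready = {x: t + 1 for t, x in enumerate(prod_order)}
    finish = 0
    for x in cons_order:
        # The consumer waits for its own previous step and for x's flag.
        start = max(finish, ready[x])
        finish = start + 1
    return finish


if __name__ == "__main__":
    n = 8
    in_order = list(range(n))
    swapped = [i ^ 1 for i in range(n)]   # reads 1, 0, 3, 2, ... (mildly out of order)
    reversed_order = in_order[::-1]       # reads the last element first
    print(sequential_cycles(n))                        # 16
    print(pipelined_cycles(in_order, in_order))        # 9
    print(pipelined_cycles(in_order, swapped))         # 10
    print(pipelined_cycles(in_order, reversed_order))  # 16
```

In this toy model, a mildly out-of-order consumer still overlaps with the producer (10 cycles instead of 16). A fully reversed read order removes the overlap entirely. This illustrates why handling out-of-order but data-dependent pairs matters.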
In embedded reconfigurable computing systems, general purpose processors (GPPs) are typically extended with coprocessors to meet specific goals, such as higher performance and/or energy savings. Coprocessors can range from specialized modules which execute a specific task to reconfigurable arrays of ALUs. This paper presents our ongoing work on techniques to dynamically offload computations being executed by a GPP to a coprocessor.
Embedded systems are considered one of the areas with the most potential for future innovation. Two embedded fields that will most certainly take a primary role in future innovation are mobile robotics and mobile computing. Mobile robots and smartphones are growing in number and functionality and are becoming a presence in our daily lives. In this paper, we study the current feasibility of using a smartphone to execute navigation algorithms and provide autonomous control, e.g., for a mobile robot.
Meeting safety requirements typically requires substantial invasive extensions to applications. Even in the absence of faults, the overhead associated with these invasive extensions may unacceptably increase execution time. In this paper, we focus on a number of experiments with error-detection schemes, using a 3D path-planning application for an avionics system as a case study. We analyze how these error-detection schemes can be implemented to meet the system's time budget.
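One common error-detection scheme is duplication with comparison: run the computation twice and flag any mismatch. The sketch below is a generic illustration, not one of the schemes evaluated in the paper. `path_length` is a hypothetical stand-in for a 3D path-planning kernel. Even without faults, duplicating the work roughly doubles its execution time, which is the kind of overhead the paper weighs against a time budget.

```python
import math
import time
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")
Point = Tuple[float, float, float]


class ErrorDetected(RuntimeError):
    """Raised when the two redundant executions disagree."""


def duplicate_and_compare(fn: Callable[..., T], *args) -> T:
    """Run fn twice on the same inputs and compare (temporal redundancy)."""
    first = fn(*args)
    second = fn(*args)
    if first != second:
        raise ErrorDetected(f"mismatch: {first!r} != {second!r}")
    return first


def path_length(waypoints: Sequence[Point]) -> float:
    """Hypothetical path-planning kernel: total length of a 3D polyline."""
    return sum(math.dist(a, b) for a, b in zip(waypoints, waypoints[1:]))


def timed(fn: Callable[..., T], *args) -> Tuple[T, float]:
    """Return fn's result and its wall-clock time in seconds."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0


if __name__ == "__main__":
    path = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
    plain, t_plain = timed(path_length, path)
    checked, t_checked = timed(duplicate_and_compare, path_length, path)
    print(plain, checked)  # 17.0 17.0
    print(f"unprotected: {t_plain * 1e6:.1f} us, duplicated: {t_checked * 1e6:.1f} us")
```

A transient fault that corrupts only one of the two runs makes the results differ, so `ErrorDetected` is raised. A fault that corrupts both runs identically would go unnoticed, which is a known limitation of pure duplication.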
This paper presents a novel approach to accelerate program execution by mapping repetitive traces of executed instructions, called Megablocks, to a runtime reconfigurable array of functional units. An offline tool suite extracts Megablocks from microprocessor instruction traces and generates a Reconfigurable Processing Unit (RPU) tailored for the execution of those Megablocks. The system transparently moves computations from the microprocessor to the RPU at runtime.
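To illustrate what "repetitive traces of executed instructions" means, here is a naive search for a back-to-back repeating pattern in a trace of instruction addresses. It is not the authors' Megablock detection algorithm, which has its own criteria. This is only a quadratic brute-force sketch, and the trace addresses are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def find_repeating_pattern(
    trace: Sequence[int], max_len: int = 16, min_reps: int = 2
) -> Optional[Tuple[int, List[int], int]]:
    """Naive search for a back-to-back repeating pattern in an address trace.

    Returns (start index, pattern, repetitions) for the first start position
    (and the shortest pattern at that position) that repeats at least
    min_reps times consecutively, or None if there is no such pattern.
    """
    trace = list(trace)
    n = len(trace)
    for start in range(n):
        for length in range(1, max_len + 1):
            if start + length * min_reps > n:
                break  # not enough trace left for min_reps copies
            pattern = trace[start:start + length]
            reps, pos = 1, start + length
            # Count consecutive copies of the candidate pattern.
            while trace[pos:pos + length] == pattern:
                reps += 1
                pos += length
            if reps >= min_reps:
                return start, pattern, reps
    return None


if __name__ == "__main__":
    # Hypothetical trace: two setup instructions, a 3-instruction loop body
    # executed three times, then an exit instruction.
    loop = [0x100, 0x104, 0x108]
    trace = [0x000, 0x004] + loop * 3 + [0x200]
    start, pattern, reps = find_repeating_pattern(trace)
    print(start, [hex(a) for a in pattern], reps)  # 2 ['0x100', '0x104', '0x108'] 3
```

A detected pattern like this loop body is the kind of candidate that could be mapped to an accelerator, with execution redirected to it whenever the pattern's first address is reached.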
Coarse-grained reconfigurable architectures have proven their value as programmable accelerators for general purpose processors. For early evaluation of those architectures, we need an approach able to exploit and retarget different processing elements (PEs) while maintaining the same compilation flow. With these aspects in mind, this paper describes an approach able to map, evaluate, and generate reconfigurable architectures based on an array of PEs.
Reconfigurable computing has already demonstrated significant potential for accelerating certain computing tasks. However, the most successful applications have relied on user expertise to design a specific architecture implemented by the hardware structures of the reconfigurable computing device. Hence, one of the most challenging issues is to map computations (described in software programming languages) to reconfigurable computing devices efficiently and automatically.
Typical computing systems based on general purpose processors (GPPs) are extended with coarse-grained reconfigurable arrays (CGRAs) to provide higher performance and/or energy savings. In order for applications to take advantage of these computing systems, efficient dynamic mapping techniques are required. Those dynamic mapping techniques will be responsible for automatically moving computations originally running in the GPP to the CGRA.
Typical computing systems based on general purpose processors (GPPs) can be extended with coarse-grained reconfigurable arrays (CGRAs) to provide higher performance and/or energy savings. In order for applications to take advantage of these computing systems, possibly including CGRAs varying in size, efficient dynamic compilation/mapping techniques are required. Dynamic mapping will be responsible for automatically moving computations originally running in the GPP to the CGRA.
This paper presents an offline tool-chain which automatically extracts loops (Megablocks) from MicroBlaze instruction traces and creates a tailored Reconfigurable Processing Unit (RPU) for those loops. The system moves loops from the CPU to the RPU transparently, at runtime, and without changing the executable binaries. The system was implemented on an FPGA, and for the tested kernels the measured speedups ranged from 3.9× to 18.2× over a MicroBlaze CPU without cache.
Coarse-grained reconfigurable computing architectures vary widely in the number and characteristics of the processing elements (cells) and routing topologies used. In order to exploit several different topologies, a place-and-route framework able to deal with such a vast design-exploration space is of paramount importance. Bearing this in mind, this paper proposes a placement scheme able to target different topologies when considering data-driven reconfigurable architectures.
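To show why one placer must be aware of the topology, the following sketch scores the same placement of a data-flow graph under two interconnects. In a 4-neighbour mesh, hops equal the Manhattan distance. In an 8-neighbour mesh with diagonal links, hops equal the Chebyshev distance. The sketch then improves the placement with a simple pairwise-swap local search. This is not the paper's placement scheme. It is a generic illustration that assumes unconstrained routing (no congestion), and all names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Tuple

Pos = Tuple[int, int]
Distance = Callable[[Pos, Pos], int]
Placement = Dict[Hashable, Pos]
Edges = List[Tuple[Hashable, Hashable]]


def mesh_distance(a: Pos, b: Pos) -> int:
    """Hop count in a 4-neighbour mesh (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def diagonal_mesh_distance(a: Pos, b: Pos) -> int:
    """Hop count in an 8-neighbour mesh with diagonal links (Chebyshev distance)."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def placement_cost(placement: Placement, edges: Edges, dist: Distance) -> int:
    """Total hops over all data-flow edges for the given topology."""
    return sum(dist(placement[u], placement[v]) for u, v in edges)


def improve_by_swaps(placement: Placement, edges: Edges, dist: Distance):
    """Greedy local search: swap two nodes' cells whenever that lowers the cost."""
    placement = dict(placement)
    best = placement_cost(placement, edges, dist)
    improved = True
    while improved:
        improved = False
        for u, v in combinations(list(placement), 2):
            placement[u], placement[v] = placement[v], placement[u]
            cost = placement_cost(placement, edges, dist)
            if cost < best:
                best, improved = cost, True
            else:
                placement[u], placement[v] = placement[v], placement[u]  # undo
    return placement, best


if __name__ == "__main__":
    edges = [("a", "b"), ("b", "c"), ("c", "d")]                    # chain a -> b -> c -> d
    initial = {"a": (0, 0), "b": (1, 1), "c": (0, 1), "d": (1, 0)}  # 2x2 array of cells
    print(placement_cost(initial, edges, mesh_distance))           # 5
    print(placement_cost(initial, edges, diagonal_mesh_distance))  # 3
    _, best = improve_by_swaps(initial, edges, mesh_distance)
    print(best)                                                    # 3
```

The same placement costs 5 hops on the plain mesh but only 3 hops when diagonal links exist. A placer that takes the distance function as a parameter can therefore be retargeted across topologies.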
The main characteristic of Reconfigurable Computing (RC) is the presence of hardware that can be reconfigured (reconfigware, RW) to implement specific functionality better suited to specially tailored hardware than to a simple uniprocessor. RC systems join microprocessors and programmable hardware in order to take advantage of the combined strengths of hardware and software [20, 5] and have been used in applications ranging from embedded systems to high performance computing.
This book describes a wide range of code transformations and mapping techniques for compiling programs written in high-level programming languages to reconfigurable architectures.
The invention relates to a method for compiling programs on a system consisting of at least one first processor and a reconfigurable unit. In this method, the code parts suitable for the reconfigurable unit are determined and extracted, and the remaining code is extracted for processing by the first processor.
The eXtreme Processing Platform (XPP) is a unique reconfigurable computing (RC) architecture supported by a complete set of design tools. This paper presents the XPP Vectorizing C Compiler XPP-VC, the first high-level compiler for this architecture. It uses new mapping techniques combined with efficient vectorization. A temporal partitioning phase guarantees the compilation of programs of unlimited complexity, provided that only the supported C subset is used.
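Temporal partitioning splits a computation that is too large for the array into a sequence of configurations that execute one after another. The sketch below uses a simple greedy heuristic, not XPP-VC's algorithm. It visits operations in topological order and starts a new configuration whenever the next operation would exceed a resource budget. The per-operation costs, the `capacity` parameter, and the function name are illustrative assumptions. Every node must appear in `deps`, either as a key or as a predecessor; an isolated node can be listed with an empty predecessor set.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, Hashable, Iterable, List


def temporal_partition(
    costs: Dict[Hashable, int],
    deps: Dict[Hashable, Iterable[Hashable]],
    capacity: int,
) -> List[List[Hashable]]:
    """Greedily pack operations into configurations of at most `capacity` units.

    Because nodes are visited in topological order, every predecessor of a node
    is placed in the same or an earlier configuration, so data dependences are
    respected when the configurations run in sequence.
    """
    partitions: List[List[Hashable]] = []
    current: List[Hashable] = []
    used = 0
    for node in TopologicalSorter(deps).static_order():
        cost = costs[node]
        if cost > capacity:
            raise ValueError(f"{node!r} needs {cost} units; capacity is {capacity}")
        if used + cost > capacity:
            partitions.append(current)  # close the current configuration
            current, used = [], 0
        current.append(node)
        used += cost
    if current:
        partitions.append(current)
    return partitions


if __name__ == "__main__":
    costs = {"a": 3, "b": 2, "c": 4, "d": 1}
    deps = {"b": {"a"}, "c": {"b"}, "d": {"c"}}  # chain a -> b -> c -> d
    print(temporal_partition(costs, deps, capacity=5))  # [['a', 'b'], ['c', 'd']]
```

Intermediate values that cross a configuration boundary (here, the output of `b`) would have to be kept in memory between configurations. A real compiler also has to account for that cost.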
The problem of simultaneous localization and mapping has been studied by the mobile robotics scientific community over the last two decades. Most solutions to this problem are based on probability theory in order to represent the uncertainty in robot perception and action. One of the most efficient probabilistic methods is the extended Kalman filter (EKF). However, the EKF demands a considerable amount of computing power and is usually run on high-end laptops coupled to the robots.
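For readers unfamiliar with the EKF, here is a minimal scalar sketch of one predict/update cycle. It assumes a robot moving along a line and a range measurement to a single landmark at a known position (lx, ly) with ly ≠ 0. The nonlinear measurement model is linearized through its Jacobian at each step. This is only the filter's core and not SLAM. EKF-SLAM also keeps landmark positions in the state, so the covariance matrix grows with the map, and that growth is where much of the computing cost comes from. All names and noise values are illustrative assumptions.

```python
import math
from typing import Tuple


def ekf_predict(x: float, p: float, u: float, q: float) -> Tuple[float, float]:
    """Motion model x' = x + u (Jacobian F = 1) with process noise variance q."""
    return x + u, p + q


def ekf_update(
    x: float, p: float, z: float, r: float, landmark: Tuple[float, float]
) -> Tuple[float, float]:
    """Correct x with a range z to a landmark at (lx, ly); robot is at (x, 0)."""
    lx, ly = landmark
    expected = math.hypot(x - lx, ly)  # h(x): predicted range (nonzero if ly != 0)
    h = (x - lx) / expected            # Jacobian dh/dx at the current estimate
    s = h * p * h + r                  # innovation variance
    k = p * h / s                      # Kalman gain
    x_new = x + k * (z - expected)     # correct with the innovation
    p_new = (1.0 - k * h) * p          # uncertainty shrinks after the update
    return x_new, p_new


if __name__ == "__main__":
    landmark = (0.0, 1.0)
    x, p = ekf_predict(0.0, 1.0, u=1.0, q=0.1)  # predicted position 1.0, variance 1.1
    z = math.hypot(1.5 - landmark[0], landmark[1])  # range seen from the true position 1.5
    x, p = ekf_update(x, p, z, r=0.01, landmark=landmark)
    print(f"x = {x:.3f}, p = {p:.4f}")
```

The estimate moves from the prediction toward the true position, and the variance drops sharply because the measurement noise is small relative to the predicted uncertainty.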
It is predicted that by the year 2010, 90% of the overall program code developed will be for embedded computing systems. This requires urgent changes in the organization of current computer science curricula, as advocated by a number of academics. The changes will help students deal with the idiosyncrasies of embedded systems, which require knowledge about the computation engine, its energy consumption model, performance, interfaced artifacts, reconfigurable hardware programming, etc.