Papers by George Economakos
Conference on Correct Hardware Design and Verification Methods, 1997
Proceedings of the 4th international workshop on Java technologies for real-time and embedded systems - JTRES '06, 2006
Java processors have been introduced to offer hardware acceleration for Java applications. They execute Java bytecodes directly in hardware. However, the stack nature of the Java virtual machine instruction set imposes a limitation on the achievable execution performance. If we intend to exploit instruction level parallelism, we must remove the stack completely. This can be achieved by recursive stack folding
Second NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2007), 2007
In this paper a novel coarse grained reconfigurable arithmetic unit (RAU) is introduced. The RAU's design is based on a technique that mines flexibility into custom Carry-Save-Arithmetic (CSA) circuits exploiting a stable and canonical interconnection scheme. The reconfigurable architecture prototype is presented. Two mapping strategies of DSP algorithms onto the proposed unit are also analyzed. Experimental results report an average
2010 IEEE Computer Society Annual Symposium on VLSI, 2010
This paper presents a methodology for fast and efficient Design Space Exploration during High Level Synthesis. An augmented instance of the design space is studied, taking into consideration the effects of both compiler- and architectural-level transformations on the final datapath. A new gradient-based pruning technique has been developed, which evaluates large portions of the augmented solution space in a quick
2009 NASA/ESA Conference on Adaptive Hardware and Systems, 2009
Datapath synthesis incorporating complex operation templates has been proven extremely efficient, especially for the digital signal processing (DSP) application domain. However, only architectural level optimizations have been reported for the specification and implementation of the operation templates. This paper introduces the consideration of arithmetic level optimizations for template based datapath synthesis. A high performance architecture for the implementation of DSP kernels

Abstract: Modern digital design, having to cope with the increasing device and application complexities, is based on high-level textual system specifications. While languages like VHDL and Verilog HDL have been effectively put to use for hardware design, system level hardware/software codesign requires more abstract and more powerful specification languages, like SystemC. However, for SystemC to be effective, it must be integrated in the existing design flow, taking advantage of previous work, avoiding past mistakes and generating the least possible new, offering clear and distinct advantages, implementing the latest research results and helping designers overcome the learning curve. This paper discusses issues of a pure SystemC design platform, states its applicability and its new perspective in system level design and offers tool interoperability solutions. Through such platforms, SystemC can leverage digital design industry from high level hardware design to system level har...

Journal of Systems Architecture, 2008
Java processors have been introduced to offer hardware acceleration for Java applications. They execute Java bytecodes directly in hardware. However, the stack nature of the Java virtual machine instruction set imposes a limitation on the achievable execution performance. In order to exploit instruction level parallelism and allow out of order execution, we must remove the stack completely. This can be achieved by recursive stack folding algorithms, such as OPEX, which dynamically transform groups of Java bytecodes to RISC-like instructions. However, the decoding throughputs that are obtained are limited. In this paper, we explore microarchitectural techniques to improve the decoding throughput of Java processors. Our techniques are based on the use of a predecoded cache to store the folding results, so that they can be reused. The ultimate goal is to exploit every possible instruction level parallelism in Java programs by having a superscalar out of order core in the backend being fed at a sustainable rate. With the use of a predecoded cache of 2 × 2048 entries and a 4-way superscalar core we have from 4.8 to 18.3 times better performance than an architecture employing pattern based folding.

Proceedings IEEE Computer Society Workshop on VLSI 2000. System Design for a System-on-Chip Era, 2000
Computer-aided synthesis of digital circuits from behavioral level specifications offers an effective way to deal with the increasing complexity of digital hardware design. A high-level synthesis tool transforms an abstract algorithmic description into a detailed register transfer level implementation. Even though considerable research on high-level synthesis has taken place, practical implementations are just emerging. This is because designers demand interaction at both the specification and implementation level. This paper describes an efficient implementation of an original idea for the design of a grammar based interactive design environment, which allows designers to supplement high-level synthesis optimizations and set constraints among the operators in the textual algorithmic description to meet their implementation preferences. The suggested methodology raises the feasibility of high level design space exploration by enabling synthesis results to be directly modifiable by the user.

Proceedings of 2010 International Symposium on VLSI Design, Automation and Test, 2010
In this paper a new technique for the design of combinational circuits for low power is introduced. The basic idea is to bypass blocks of logic when their function is not required, using low delay and area overhead components (transmission gates). While this technique offers great dynamic power savings mainly in array multipliers, due to their regular interconnection scheme, it misses the reduced area and fast speed advantages of tree multipliers. Therefore, a mixed style architecture, using a traditional, tree based part, combined with a bypass, array based part, is proposed. Through extensive experimentation it has been found that while the bypass technique offers the minimum dynamic power consumption value, the mixed architecture offers a delay*power product improvement ranging from 1.2x to 6.5x, compared to all other architectures. Furthermore, the tree part of the mixed architecture has enough timing slack to be implemented with high Vth low leakage components, offering an extra 20%-30% leakage power saving, which is a considerable value in deep submicron technologies.
Proceedings Design, Automation and Test in Europe, 1998
Attribute grammars have been used extensively in every phase of traditional compiler construction. Recently, it has been shown that they can also be effectively adopted to handle scheduling algorithms in high-level synthesis. Their main advantages are modularity and declarative notation in the development of design automation environments. In this paper, past results are further elaborated and more scheduling techniques are presented and implemented in a flexible environment for the design automation of digital systems. This novel approach can prove valuable for fast evaluation of new algorithms and techniques in the field.
Computers in Cardiology 1998. Vol. 25 (Cat. No.98CH36292), 1998
A novel system for ECG Telemedicine and Telecare is presented. It exhibits a low cost/performance ratio and thus can provide advanced telemedicine services in remote or isolated areas. It can be used as a fundamental telemedicine platform, offering essential medical services, allowing a referred physician from far away to offer assistance without traveling to the referred medical site. The
Computers in Cardiology 1998. Vol. 25 (Cat. No.98CH36292), 1998
Telemedicine applications are tools that help doctors perform their duties without being physically present. This can be accomplished provided that some kind of communications infrastructure can convey the same amount of information, and in the same way, as if the doctor were present. It is also desirable for that communications infrastructure to support mobile communications so as to be applicable
2010 IEEE International Symposium on Industrial Electronics, 2010
Although the performance of traditional PLC technology is adequate for the majority of industrial automation and control tasks, there exist a number of demanding applications, which need more powerful alternatives. One such alternative, which has received considerable research interest in recent years, is the implementation of control algorithms on FPGAs. An inherent difficulty of this approach is that it requires
Lecture Notes in Computer Science, 2011
The shrinking of interconnect width and thickness, due to technology scaling, along with the integration of low-k dielectrics, reveals novel reliability wear-out mechanisms, progressively affecting the performance of complex systems. These phenomena progressively deteriorate the electrical characteristics and therefore the delay of interconnects, leading to violations in timing-critical paths. This work estimates the timing impact of Time-Dependent Dielectric Breakdown (TDDB) between wires of the same layer, considering temperature variations. The proposed framework is evaluated on a Leon3 MP-SoC design, implemented in a 45 nm CMOS technology. The results evaluate the system's performance drift due to TDDB, considering different physical implementation scenarios.

2011 6th International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS), 2011
Reconfigurable computing is a cost-effective alternative to technology shrinking in order to achieve higher performance in digital design, especially considering run time reconfiguration. Research in the field consists of new reconfigurable architectures, either coarse-grain or fine-grain, and new methodologies to map applications onto them. Usually, top-down methodologies are proposed that start from the application's dataflow graph and try to merge different parts into the same reconfigurable component. This paper presents a bottom-up approach that searches available RTL component libraries for primitives that can be connected in alternative ways and generate new components, with different modes of functionality. Such components, called morphable components, are designed to impose the minimum accepted area and timing overhead, without any reconfiguration overhead. The great advantage of the bottom-up approach is that it can be integrated easily with existing design methodologies and tools, offering great overall performance improvements. The results obtained with different DSP benchmarks in a high-level synthesis environment show an average performance gain of 15%, without any practical datapath area increase, offering uniform and balanced resource utilization.
Journal of telemedicine and telecare, 1996
In this paper we present the principles of a new platform developed for handling ECG signals in a telemedicine setting. We focus on three basic services: an ECG file management system (acquisition, storage, transmission); ECG-oriented teleconferencing; and real-time transmission of ECGs over the telephone network. This work has been carried out in the context of national and EU-sponsored projects. Its main purpose was to help patients from remote or isolated areas, like small islands, with insufficient health-care services, to get appropriate and experienced medical care directly from large central hospitals. We present the design and the basic operations of the ECG handling system.

Lecture Notes in Computer Science, 1997
This paper considers the automatic synthesis of systolic architectures from nested loop algorithmic specifications. The high level input is given in the form of uniform dependence loops with unit dependencies and the target architecture is a multidimensional systolic array with an unbounded number of cells. A complete methodology for the hardware synthesis of the resulting architecture, based on VHDL specifications, is presented. This methodology automatically detects all necessary computation and communication elements and produces optimal layouts. The theoretical framework of our method is based on the properties of the generalized UET grids. First, we calculate the optimal makespan for the generalized UET grids and then we establish the minimum number of systolic cells required to achieve the optimal makespan. The complexity of the proposed scheduling algorithm is completely independent of the size of the nested loop and depends only on its dimension, thus being the most efficient (in terms of complexity) known to us. All these methods were implemented and incorporated in an integrated software package which provides the designer with a powerful parallel design environment, from high level algorithmic specifications to low-level (i.e., actual layouts) optimal implementation. Index terms: UET grid index space, optimal makespan, optimal mapping, number of systolic cells, uniform unit dependence vectors, VHDL based design automation.
2011 18th IEEE International Conference on Electronics, Circuits, and Systems, 2011
Reconfigurable computing is a cost-effective alternative to technology shrinking in order to achieve higher performance in digital design, especially considering run time reconfiguration. Research in the field consists of new reconfigurable architectures, either coarse-grain or fine-grain, and new methodologies to map applications onto them. A special case of coarse-grain reconfigurable components are morphable multipliers, which use multiplexers to feed different
2010 17th IEEE International Conference on Electronics, Circuits and Systems, 2010
The interface between storage and processing has always been one of the main bottlenecks to the performance and energy efficiency in embedded system design. In this paper we are exploring the potential to increase the bandwidth through this interface by increasing the number of physical connections. This option becomes available using 3D stacking technologies. Our aim is to evaluate the
2011 6th International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS), 2011
The continuous scaling of CMOS transistor and interconnect geometries brings to light novel challenges regarding the design of VLSI systems in the nanoscale era. On the other hand, most of the forthcoming deep-deep submicron technologies are not yet mature enough to be used for fabrication. Hence, the development of standard-cell libraries at the nanometer regime is emerging, in order to estimate