Grigoris Dimitroulakos

University of the Peloponnese, Department of Informatics and Telecommunications, Faculty Member

Papers by Grigoris Dimitroulakos

Simulating Software Evolution to Evaluate the Reliability of Early Decision-making among Design Alternatives toward Maintainability

ACM Transactions on Software Engineering and Methodology

Critical decisions among design alternatives with regard to maintainability arise early in the software design cycle. Existing comparison models relying on the structural evolution of the used design patterns are suitable to support such decisions. However, their effectiveness in predicting maintenance effort is usually verified on a limited number of case studies under heterogeneous metrics. In this article, a multi-variable simulation model is introduced for validating the decision-making reliability of the derived formal comparison models for the significant design problem of recursive hierarchies of part-whole aggregations, proposed in our prior work. In the absence of a strict validation, the simulation model has been thoroughly calibrated concerning its decision-making precision based on empirical distributions from time-series analysis, approximating the highly uncertain nature of the actual maintenance process. The decision reliability of the formal models has been...

An ultra high speed architecture for VLSI implementation of hash functions

10th IEEE International Conference on Electronics, Circuits and Systems (ICECS 2003), 2003

Today, security is a topic that attracts great interest from researchers. Many encryption algorithms have been investigated and developed in recent years. The research community's efforts are also centered on their efficient implementation, on both software platforms and hardware devices. This work concerns the FPGA implementation of hash functions. Two different hash functions are studied: RIPEMD-160 and SHA-1. A high-speed architecture is proposed for the implementation of both of them in the same hardware module. The proposed system reaches throughput values equal to 1,4 for SHA-1 and 1,6 for RIPEMD-160. The proposed system is compared with other related works in both software and hardware.

Speedups in embedded systems with a high-performance coprocessor datapath

ACM Transactions on Design Automation of Electronic Systems, 2007

This article presents the speedups achieved in a generic single-chip microprocessor system by employing a high-performance datapath. The datapath acts as a coprocessor that accelerates computationally intensive kernel sections, thereby increasing the overall performance. We have previously introduced the datapath, which is composed of Flexible Computational Components (FCCs). These components can realize any two-level template of primitive operations. The automated coprocessor synthesis method from a high-level software description and its integration into a design flow for executing applications on the system are presented. To evaluate the effectiveness of our coprocessor approach, an analytical study with respect to the type of the custom datapath and the microprocessor architecture is performed. The overall speedups of several real-life applications relative to software execution on the microprocessor are estimated using the design flow. These speedups range from 1.75 to ...

Automatic Generation of Code Analysis Tools: The CastQL Approach

Proceedings of the 1st International Workshop on Real World Domain Specific Languages, 2016

Source code analysis and manipulation tools have become an essential part of software development processes. Automating the development of such tools can heavily reduce development time, effort and cost. This paper proposes a framework for the efficient development of code analysis software. A tool for automatically generating the front end of analysis tools for a given language grammar is proposed. The proposed approach can be applied to any language that can be described using the BNF notation. The proposed framework also provides a domain-specific language to concisely express queries on the internal representation generated by the front end. This language tackles the problem of writing complex code in a general-purpose programming language in order to retrieve information from the internal representation. The approach has been evaluated through two different realistic usage scenarios applied to a number of different benchmark applications. The front end generator has also been t...

A Locality Optimizer for Loop-dominated Applications Based on Reuse Distance Analysis

ACM Transactions on Design Automation of Electronic Systems, 2020

Source code optimization can heavily improve software code implementation quality while still bei...

Early Evaluation of Implementation Alternatives of Composite Data Structures Toward Maintainability

ACM Transactions on Software Engineering and Methodology, 2017

Selecting between different design options is a crucial decision for object-oriented software developers that affects code quality characteristics. Conventionally, developers use their experience to make such decisions, which leads to suboptimal results regarding code quality. In this article, a formal model for providing early estimates of quality metrics of object-oriented software implementation alternatives is proposed. The model supports software developers in making fast decisions in a systematic way early during the design phase to achieve improved code characteristics. The approach employs a comparison model related to the application of the Visitor design pattern and inheritance-based implementation on structures following the Composite design pattern. The model captures maintainability as a metric of software quality and provides precise assessments of the quality of each implementation alternative. Furthermore, the model introduces the structural maintenance cost metric ba...

Matlab to C Compilation Targeting Application Specific Instruction Set Processors

Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016

This paper discusses a MATLAB to C compiler exploiting custom instructions, such as instructions for SIMD processing and instructions for complex arithmetic, present in Application Specific Instruction Set Processors (ASIPs). The compiler generates ANSI C code in which the processor's special instructions are represented via specialized intrinsic functions. By doing this, the generated code can be used as input to any C/C++ compiler. Thus the proposed compiler allows the description of the specialized instruction set of the target processor in a parameterized way, allowing the support of any processor. The proposed compiler has been used for the generation of application code for an ASIP targeting DSP applications. The code generated by the proposed compiler achieves a speedup of 2x to 30x on the targeted ASIP for six DSP benchmarks compared to the code generated by the MathWorks MATLAB to C compiler. Thus the proposed compiler can be employed to reduce the development time/effort/cost and time to market by raising the abstraction of application design in an embedded systems / system-on-chip development context while still improving implementation efficiency.

MEMSCOPT: A source-to-source compiler for dynamic code analysis and loop transformations

In this paper, we present MEMSCOPT, a source-to-source compiler incorporated in a system-level design tool chain for dynamic code analysis and loop transformations targeting memory performance optimization. MEMSCOPT is user interactive, supported by both Windows and Linux platforms and integrates with Visual Studio and NetBeans.

XMSIM: A tool for early memory hierarchy evaluation

In this demonstration we present the usage of XMSIM, a tool for memory hierarchy evaluation of mu...

Dynamic source code analysis for memory hierarchy optimization in multimedia applications

Realizing image and signal processing algorithms in embedded systems is a three-step process including algorithmic design, implementation and mapping to a target architecture and memory hierarchy. This paper presents MemAddIn, a dynamic analysis tool for C applications that exposes the application's critical loops, which deserve the designer's attention for memory hierarchy optimization. MemAddIn is based on an extension of the MEMSCOPT compiler and integrates into the Visual Studio IDE, offering a unified environment for the application's implementation and optimization. To determine the criticality of the application loops, the tool utilizes two metrics that are relevant to the cost and performance of the underlying memory architecture.

Reuse Distance Analysis for Locality Optimization in Loop-Dominated Applications

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015

This paper discusses MemAddIn, a compiler-assisted dynamic code analysis tool that analyzes C code and exposes critical parts for memory-related optimizations on embedded systems that can heavily affect system performance, power and cost. The tool includes enhanced features for data reuse distance analysis and source code transformation recommendations for temporal locality optimization. Several data reuse distance measurement algorithms have been implemented, leading to different trade-offs between accuracy and profiling execution time. The proposed tool can be easily and seamlessly integrated into different software development environments, offering a unified environment for application development and optimization. The novelties of our work over a similar optimization tool are also discussed. MemAddIn has been applied for the dynamic computation of data reuse distance for a number of different applications. Experimental results prove the effectiveness of the tool through the analysis and optimization of a realistic image processing application.

A MATLAB Vectorizing Compiler Targeting Application-Specific Instruction Set Processors

ACM Transactions on Design Automation of Electronic Systems, 2017

This article discusses a MATLAB-to-C vectorizing compiler that exploits custom instructions, for example, for Single Instruction Multiple Data (SIMD) processing and instructions for complex arithmetic present in Application-Specific Instruction Set Processors (ASIPs). Custom instructions are represented via specialized intrinsic functions in the generated code, and the generated code can be used as input to any C/C++ compiler supporting the target processor. Furthermore, the specialized instruction set of the target processor is described in a parameterized way using a target processor-independent architecture description approach, thus allowing the support of any processor. The compiler has been used for the generation of application code for two different ASIPs for several benchmarks. The code generated by the compiler achieves a speedup between 2×-74× and 2×-97× compared to the code generated by the MathWorks MATLAB-to-C compiler. Experimental results also prove that the compiler efficiently exploits SIMD custom instructions achieving a 3.3 factor speedup compared to cases where no SIMD processing is used. Thus the compiler can be employed to reduce the development time/effort/cost and time to market through raising the abstraction of application design in an embedded systems/system-on-chip development context.

A Retargetable MATLAB to C Compiler Exploiting Custom Instructions and Data Parallelism

ACM Transactions on Embedded Computing Systems

This article presents a MATLAB to C compiler that exploits custom instructions present in state-o...

A partitioning flow for accelerating applications in processor-FPGA systems

This paper presents a hardware/software partitioning flow for improving performance in systems-on-chip composed of a processor and a Field Programmable Gate Array. Speedups are achieved by executing critical software parts on the reconfigurable FPGA logic. A generic hybrid system architecture is considered by the methodology. The partitioning flow uses an automated analysis process at the basic-block level for detecting critical application parts. Two different instances of the generic platform and five real-world applications are used in the experiments. The analytical experimentation illustrates that the speedup of the applications ranges from 1.3 to 3.7 relative to an all-software solution.

Exploiting the Distributed Foreground Memory in Coarse Grain Reconfigurable Arrays for Reducing the Memory Bottleneck in DSP Applications

This paper presents a methodology for memory-aware mapping on 2-Dimensional coarse-grained reconfigurable architectures that aims at minimizing data memory accesses for DSP and multimedia applications. Additionally, the realistic 2-Dimensional coarse-grained reconfigurable architecture template that the mapping methodology targets models a large number of existing coarse-grained architectures. This is exploited for quantifying the influence that the architectural features have on the performance improvements achieved by our methodology. A novel mapping algorithm is also proposed that uses a list scheduling technique in which the binding, routing, and scheduling phases are considered together and are steered by a set of costs. The algorithm transfers data reuse values through the internal interconnection network rather than fetching them, in order to reduce the data transfer burden on the bus network. The experimental results show that memory accesses and execution time are reduced since the mapping methodology efficiently exploits the data reuse opportunities.

An Automated Methodology for Memory-Conscious Mapping of DSP Applications on Coarse-Grain Reconfigurable Arrays

2005 IEEE International Symposium on Circuits and Systems, 2005

This paper presents a memory-conscious mapping methodology for computationally intensive applications on coarse-grain reconfigurable arrays. By exploiting the inherently abundant data reuse in DSP applications, the methodology tries to minimize the data memory bandwidth, which constitutes a major bottleneck for application performance. This is achieved by using the distributed foreground storage elements in the architecture and by properly placing operations in the processing elements. The methodology considers a realistic 2-Dimensional coarse-grain reconfigurable architecture template which can model a large number of existing coarse-grain architectures. The experimental results show that memory accesses and execution time are reduced since the mapping methodology efficiently exploits the data reuse opportunities. The need for taking into account memory bandwidth limitations is also illustrated.

Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path

The execution time improvements achieved in a generic microprocessor system by employing a high-performance data-path are presented. The data-path acts as a coprocessor that accelerates computationally intensive kernel regions, thereby increasing the overall performance. The data-path has been previously introduced and is composed of Flexible Computational Components (FCCs) that can realize any two-level template of primitive operations. For evaluating the effectiveness of our coprocessor approach, several real-world DSP applications are mapped to the system. A study of the performance improvements relative to the microprocessor architecture and to the computational resources of the data-path is performed. Significant overall application speedups are reported that range from 1.75 to 3.95, with an average value of 2.72, while the overhead in circuit area is small.

Improving Performance of Embedded Processors with a High-Performance Coarse-Grained Reconfigurable Data-Path

MELECON 2006 - 2006 IEEE Mediterranean Electrotechnical Conference, 2006

An embedded system that extends microprocessor cores with a high-performance Coarse-Grained Reconfigurable Data-Path is introduced. The data-path is composed of computational resources able to realize complex operations, which aid in improving the performance of time-critical application parts, called kernels. A compilation flow is defined for mapping high-level software descriptions to the microprocessor system. The kernel code is mapped using a mapping algorithm developed for the Reconfigurable Data-Path. Extensive exploration is performed by mapping four real-life applications on three different instances of the system. Significant overall application speedups are reported that range from 1.70 to 3.68 relative to an all-processor execution.

Simulating Software Evolution to Evaluate the Reliability of Early Decision-making among Design Alternatives toward Maintainability

ACM Transactions on Software Engineering and Methodology

Critical decisions among design altern seventh atives with regards to maintainability arise early... more Critical decisions among design altern seventh atives with regards to maintainability arise early in the software design cycle. Existing comparison models relayed on the structural evolution of the used design patterns are suitable to support such decisions. However, their effectiveness on predicting maintenance effort is usually verified on a limited number of case studies under heterogeneous metrics. In this article, a multi-variable simulation model for validating the decision-making reliability of the derived formal comparison models for the significant designing problem of recursive hierarchies of part-whole aggregations, proposed in our prior work, is introduced. In the absence of a strict validation, the simulation model has been thoroughly calibrated concerning its decision-making precision based on empirical distributions from time-series analysis, approximating the highly uncertain nature of actual maintenance process. The decision reliability of the formal models has been...

An ultra high speed architecture for VLSI implementation of hash functions

10th IEEE International Conference on Electronics, Circuits and Systems, 2003. ICECS 2003. Proceedings of the 2003

Today, security is a topic which attacks the great interest of researchers. Many encryption algor... more Today, security is a topic which attacks the great interest of researchers. Many encryption algorithms have been investigated, and developed in the last years. The research community efforts are also centered to the efficient implementation of them, in both software platforms and hardware devices. This work is related to hash functions FPGA implementation. Two different hash functions are studied: RIPEMD-160 and SHA-1. A high speed architecture is proposed for the implementation of both of them in the same hardware module. The proposed system reaches throughput values equal to 1,4 for SHA-1 and 1,6 for RIPEMND-160. The proposed system is compared with other related works in both software and hardware.

Speedups in embedded systems with a high-performance coprocessor datapath

ACM Transactions on Design Automation of Electronic Systems, 2007

This article presents the speedups achieved in a generic single-chip microprocessor system by emp... more This article presents the speedups achieved in a generic single-chip microprocessor system by employing a high-performance datapath. The datapath acts as a coprocessor that accelerates computational-intensive kernel sections thereby increasing the overall performance. We have previously introduced the datapath which is composed of Flexible Computational Components (FCCs). These components can realize any two-level template of primitive operations. The automated coprocessor synthesis method from high-level software description and its integration to a design flow for executing applications on the system is presented. For evaluating the effectiveness of our coprocessor approach, analytical study in respect to the type of the custom datapath and to the microprocessor architecture is performed. The overall application speedups of several real-life applications relative to the software execution on the microprocessor are estimated using the design flow. These speedups range from 1.75 to ...

Automatic Generation of Code Analysis Tools: The CastQL Approach

Proceedings of the 1st International Workshop on Real World Domain Specific Languages, 2016

Source code analysis and manipulation tools have become an essential part of software development... more Source code analysis and manipulation tools have become an essential part of software development processes. Automating the development of such tools can heavily reduce development time, effort and cost. This paper proposes a framework for the efficient development of code analysis software. A tool for automatically generating the front end of analysis tools for a given language grammar is proposed. The proposed approach can be applied to any language that can be described using the BNF notation. The proposed framework also provides a domain specific language to concisely express queries on the internal representation generated by the front end. This language tackles the problem of writing complex code in a general purpose programming language in order to retrieve information from the internal representation. The approach has been evaluated through two different realistic usage scenarios applied to a number of different benchmark applications. The front end generator has also been t...

A Locality Optimizer for Loop-dominated Applications Based on Reuse Distance Analysis

ACM Transactions on Design Automation of Electronic Systems, 2020

Source code optimization can heavily improve software code implementation quality while still bei... more

Early Evaluation of Implementation Alternatives of Composite Data Structures Toward Maintainability

ACM Transactions on Software Engineering and Methodology, 2017

Selecting between different design options is a crucial decision for object-oriented software dev... more Selecting between different design options is a crucial decision for object-oriented software developers that affects code quality characteristics. Conventionally developers use their experience to make such decisions, which leads to suboptimal results regarding code quality. In this article, a formal model for providing early estimates of quality metrics of object-oriented software implementation alternatives is proposed. The model supports software developers in making fast decisions in a systematic way early during the design phase to achieve improved code characteristics. The approach employs a comparison model related to the application of the Visitor design pattern and inheritance-based implementation on structures following the Composite design pattern. The model captures maintainability as a metric of software quality and provides precise assessments of the quality of each implementation alternative. Furthermore, the model introduces the structural maintenance cost metric ba...

Matlab to C Compilation Targeting Application Specific Instruction Set Processors

Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016

This paper discusses a MATLAB to C compiler exploiting custom instructions such as instructions f... more This paper discusses a MATLAB to C compiler exploiting custom instructions such as instructions for SIMD processing and instructions for complex arithmetic present in Application Specific Instruction Set Processors (ASIPs). The compiler generates ANSI C code in which the processor's special instructions are represented via specialized intrinsic functions. By doing this the generated code can be used as input to any C/C++ compiler. Thus the proposed compiler allows the description of the specialized instruction set of the target processor in a parameterized way allowing the support of any processor. The proposed compiler has been used for the generation of application code for an ASIP targeting DSP applications. The code generated by the proposed compiler achieves a speed up between 2x-30x on the targeted ASIP for six DSP benchmarks compared to the code generated by Mathworks MATLAB to C compiler. Thus the proposed compiler can be employed to reduce the development time/effort/cost and time to market by raising the abstraction of application design in an embedded systems / system-onchip development context while still improving implementation efficiency.

MEMSCOPT: A source-to-source compiler for dynamic code analysis and loop transformations

In this paper, we present MEMSCOPT, a source-to-source compiler incorporated in a system level de... more In this paper, we present MEMSCOPT, a source-to-source compiler incorporated in a system level design tool chain for dynamic code analysis and loop transformations targeting memory performance optimization. MEMSCOPT is user interactive, supported by both Windows and Linux platforms and integrates with Visual Studio and NetBeans.

XMSIM: A tool for early memory hierarchy evaluation

In this demonstration we present the usage of XMSIM, a tool for memory hierarchy evaluation of mu... more

Dynamic source code analysis for memory hierarchy optimization in multimedia applications

Realizing image and signal processing algorithms in embedded systems is a three step process incl... more Realizing image and signal processing algorithms in embedded systems is a three step process including algorithmic design, implementation and mapping to a target architecture and memory hierarchy. This paper presents MemAddIn, a dynamic analysis tool for C applications that exposes the critical application's loops which deserve the designer's attention for memory hierarchy optimization. MemAddIn is based on an extension of MEMSCOPT compiler and integrates in the Visual Studio IDE offering a unified environment for the application's implementation and optimization. To conclude on the criticality of the application loops the tool utilizes two metrics which are relevant with the underlying memory architecture cost and performance.

Reuse Distance Analysis for Locality Optimization in Loop-Dominated Applications

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015, 2015

This paper discusses MemAddIn, a compiler assisted dynamic code analysis tool that analyzes C cod... more This paper discusses MemAddIn, a compiler assisted dynamic code analysis tool that analyzes C code and exposes critical parts for memory related optimizations on embedded systems that can heavily affect systems performance, power and cost. The tool includes enhanced features for data reuse distance analysis and source code transformation recommendations for temporal locality optimization. Several of data reuse distance measurement algorithms have been implemented leading to different trade-offs between accuracy and profiling execution time. The proposed tool can be easily and seamlessly integrated into different software development environments offering a unified environment for application development and optimization. The novelties of our work over a similar optimization tool are also discussed. MemAddIn has been applied for the dynamic computation of data reuse distance for a number of different applications. Experimental results prove the effectiveness of the tool through the analysis and optimization of a realistic image processing application.

A MATLAB Vectorizing Compiler Targeting Application-Specific Instruction Set Processors

ACM Transactions on Design Automation of Electronic Systems, 2017

This article discusses a MATLAB-to-C vectorizing compiler that exploits custom instructions, for example, for Single Instruction Multiple Data (SIMD) processing and instructions for complex arithmetic present in Application-Specific Instruction Set Processors (ASIPs). Custom instructions are represented via specialized intrinsic functions in the generated code, and the generated code can be used as input to any C/C++ compiler supporting the target processor. Furthermore, the specialized instruction set of the target processor is described in a parameterized way using a target processor-independent architecture description approach, thus allowing the support of any processor. The compiler has been used for the generation of application code for two different ASIPs for several benchmarks. The code generated by the compiler achieves speedups of 2× to 74× and 2× to 97× compared to the code generated by the MathWorks MATLAB-to-C compiler. Experimental results also show that the compiler efficiently exploits SIMD custom instructions, achieving a 3.3× speedup compared to cases where no SIMD processing is used. Thus the compiler can be employed to reduce development time, effort, cost and time to market by raising the abstraction level of application design in an embedded systems/system-on-chip development context.

A Retargetable MATLAB to C Compiler Exploiting Custom Instructions and Data Parallelism

ACM Transactions on Embedded Computing Systems

This article presents a MATLAB to C compiler that exploits custom instructions present in state-o...

Automatic generation of code analysis tools: The CastQL approach

Proceedings of the 1st International Workshop on Real World Domain Specific Languages (RWDSL)

Source code analysis and manipulation tools have become an essential part of software development processes. Automating the development of such tools can heavily reduce development time, effort and cost. This paper proposes a framework for the efficient development of code analysis software. A tool for automatically generating the front end of analysis tools for a given language grammar is proposed. The proposed approach can be applied to any language that can be described using the BNF notation. The proposed framework also provides a domain-specific language to concisely express queries on the internal representation generated by the front end. This language tackles the problem of writing complex code in a general-purpose programming language in order to retrieve information from the internal representation. The approach has been evaluated through two different realistic usage scenarios applied to a number of different benchmark applications. The front end generator has also been t...

MATLAB-to-C compilation targeting Application Specific Instruction Set Processors

Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)

This paper discusses a MATLAB to C compiler exploiting custom instructions, such as instructions for SIMD processing and instructions for complex arithmetic, present in Application Specific Instruction Set Processors (ASIPs). The compiler generates ANSI C code in which the processor's special instructions are represented via specialized intrinsic functions. As a result, the generated code can be used as input to any C/C++ compiler. The proposed compiler also allows the specialized instruction set of the target processor to be described in a parameterized way, allowing the support of any processor. The proposed compiler has been used for the generation of application code for an ASIP targeting DSP applications. The code generated by the proposed compiler achieves a speedup of 2x to 30x on the targeted ASIP for six DSP benchmarks compared to the code generated by the MathWorks MATLAB to C compiler. Thus the proposed compiler can be employed to reduce the development time/effort/cost...

A partitioning flow for accelerating applications in processor-FPGA systems

This paper presents a hardware/software partitioning flow for improving performance in systems-on-chip comprising a processor and a Field Programmable Gate Array (FPGA). Speedups are achieved by executing critical software parts on the reconfigurable FPGA logic. The methodology considers a generic hybrid system architecture. The partitioning flow uses an automated analysis process at the basic-block level for detecting critical application parts. Two different instances of the generic platform and five real-world applications are used in the experiments. The experiments show that the speedup of the applications ranges from 1.3 to 3.7 relative to an all-software solution.

Exploiting the Distributed Foreground Memory in Coarse Grain Reconfigurable Arrays for Reducing the Memory Bottleneck in DSP Applications

This paper presents a methodology for memory-aware mapping on 2-Dimensional coarse-grained reconfigurable architectures that aims at minimizing the data memory accesses for DSP and multimedia applications. Additionally, the realistic 2-Dimensional coarse-grained reconfigurable architecture template that the mapping methodology targets models a large number of existing coarse-grained architectures. This is exploited for quantifying the influence that the architectural features have on the performance improvements achieved by our methodology. A novel mapping algorithm is also proposed that uses a list scheduling technique in which the binding, routing, and scheduling phases are considered together and are steered by a set of costs. The algorithm transfers data reuse values over the internal interconnection network instead of fetching them, in order to reduce the data transfer burden on the bus network. The experimental results show that memory accesses and execution time are reduced since the mapping methodology efficiently exploits the data reuse opportunities.

An Automated Methodology for Memory-Conscious Mapping of DSP Applications on Coarse-Grain Reconfigurable Arrays

2005 IEEE International Symposium on Circuits and Systems, 2005

This paper presents a memory-conscious methodology for mapping computationally intensive applications on coarse-grain reconfigurable arrays. By exploiting the abundant data reuse inherent in DSP applications, the methodology tries to minimize the data memory bandwidth, which constitutes a major bottleneck for application performance. This is achieved by using the distributed foreground storage elements in the architecture and by properly placing operations in the processing elements. The methodology considers a realistic 2-Dimensional coarse-grain reconfigurable architecture template which can model a large number of existing coarse-grain architectures. The experimental results show that memory accesses and execution time are reduced since the mapping methodology efficiently exploits the data reuse opportunities. The need for taking into account memory bandwidth limitations is also illustrated.

Accelerating DSP Applications in Embedded Systems with a Coprocessor Data-Path

The execution time improvements achieved in a generic microprocessor system by employing a high-performance data-path are presented. The data-path acts as a coprocessor that accelerates computationally intensive kernel regions, thereby increasing the overall performance. The data-path has been previously introduced and is composed of Flexible Computational Components (FCCs) that can realize any two-level template of primitive operations. To evaluate the effectiveness of our coprocessor approach, several real-world DSP applications are mapped to the system. The performance improvements are studied relative to the microprocessor architecture and to the computational resources of the data-path. Significant overall application speedups are reported that range from 1.75 to 3.95, with an average value of 2.72, while the overhead in circuit area is small.

Improving Performance of Embedded Processors with a High-Performance Coarse-Grained Reconfigurable Data-Path

MELECON 2006 - 2006 IEEE Mediterranean Electrotechnical Conference, 2006

An embedded system that extends microprocessor cores with a high-performance Coarse-Grained Reconfigurable Data-Path is introduced. The data-path is composed of computational resources able to realize complex operations, which help improve the performance of time-critical application parts, called kernels. A compilation flow is defined for mapping high-level software descriptions to the microprocessor system. The kernel code is mapped using a mapping algorithm developed for the Reconfigurable Data-Path. Extensive exploration is performed by mapping four real-life applications on three different instances of the system. Significant overall application speedups are reported that range from 1.70 to 3.68 relative to an all-processor execution.