Papers by Panagiotis Michailidis
Systems and Computational Biology - Bioinformatics and Computational Modeling
Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms, 2011
Parameter estimation in nonlinear regression models (NRMs) represents a major challenge for various scientific computing applications. In this study, we briefly consider a recent population-based metaheuristic algorithm named Jaya and use it to estimate the parameters of NRMs. The algorithm is experimentally tested on a set of benchmark regression problems of various levels of difficulty. We show that the algorithm can be used as an alternative means of parameter estimation in NRMs: it is computationally efficient and achieves a high success rate and accuracy.
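The Jaya update rule moves each candidate toward the current best solution and away from the current worst, with no algorithm-specific control parameters. A minimal Python sketch of the rule, shown here minimising a simple sum-of-squares objective as a stand-in for the residual sum of squares of a regression model (the objective, bounds, and population settings are illustrative assumptions, not the paper's experimental setup):

```python
import random

def jaya(objective, bounds, pop_size=20, iters=200, seed=1):
    """Minimise objective over box bounds with the Jaya update rule."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(iters):
        scores = [objective(x) for x in pop]
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        for k, x in enumerate(pop):
            cand = []
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Jaya rule: move toward best, away from worst
                v = x[j] + r1 * (best[j] - abs(x[j])) - r2 * (worst[j] - abs(x[j]))
                cand.append(min(hi, max(lo, v)))    # clamp to bounds
            if objective(cand) < scores[k]:          # greedy acceptance
                pop[k] = cand
    return min(pop, key=objective)
```

For NRM fitting, `objective` would be the sum of squared residuals of the model over the data set.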

International Journal of Parallel, Emergent and Distributed Systems, 2017
In this work, we propose a hybrid parallel Jaya optimisation algorithm for a multi-core environment with the aim of solving large-scale global optimisation problems. The proposed algorithm, called HHCPJaya, combines the hyper-population approach with a hierarchical cooperative search mechanism. HHCPJaya divides the population into many small subpopulations, each of which focuses on a distinct block of the original problem dimensions. In the hyper-population approach, we increase the number of small subpopulations by assigning more than one subpopulation to each core, and each subpopulation evolves independently, enhancing the explorative and exploitative nature of the population. We combine this hyper-population approach with a two-level hierarchical cooperative search scheme that assembles global solutions from all subpopulations. Furthermore, we incorporate an additional updating phase on the respective subpopulations based on the global solutions, with the aim of further improving the convergence rate and the quality of solutions. Several experiments applying the proposed parallel algorithm in different settings show that it is promising with respect to both solution quality and convergence rate, and that it requires relatively little computational effort to solve complex, large-scale optimisation problems.
Proceedings of the 18th Panhellenic Conference on Informatics, 2014
In this paper we present a parallel implementation of the string matching problem with static text allocation. Experiments are realized using the Message Passing Interface (MPI) library on a homogeneous and a heterogeneous cluster of workstations. Our experimental results show that this implementation achieves significant speedup for different text sizes and numbers of workstations. We also present a performance prediction model that is general enough to address performance evaluation of both types of computation (homogeneous and heterogeneous). This model agrees well with our experimental measurements of both implementations.
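The static text allocation can be illustrated without MPI: the text is split into equal chunks, each extended by m−1 characters of overlap so that occurrences spanning a chunk boundary are not missed, and each chunk is searched independently. A serial Python sketch of this decomposition (in the paper each chunk would be searched by an MPI worker; the loop over workers here only simulates that, and the function names are illustrative):

```python
def find_all(chunk, pattern, base):
    """All occurrence positions of pattern in chunk, offset by base."""
    hits, i = [], chunk.find(pattern)
    while i != -1:
        hits.append(base + i)
        i = chunk.find(pattern, i + 1)
    return hits

def partitioned_search(text, pattern, workers):
    m, n = len(pattern), len(text)
    size = (n + workers - 1) // workers          # chunk size per worker
    hits = []
    for w in range(workers):                     # each iteration = one worker
        start = w * size
        end = min(n, start + size + m - 1)       # overlap of m-1 characters
        # keep only matches starting inside this worker's own chunk,
        # so overlap regions do not produce duplicates
        hits += [p for p in find_all(text[start:end], pattern, start)
                 if p < start + size]
    return hits
```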
Sci. Ann. Cuza Univ., 2002
Scalable Computing: Practice and Experience, Jun 30, 2015
Multiple pattern matching algorithms are used to locate the occurrences of patterns from a finite pattern set in a large input string. Aho-Corasick, Set Horspool, Set Backward Oracle Matching, Wu-Manber and SOG, five of the best-known algorithms for multiple pattern matching, require considerable computing power, particularly when large datasets must be processed, as is common in computational biology applications. Over the past years, Graphics Processing Units (GPUs) have evolved into powerful parallel processors that outperform CPUs in scientific applications. This paper evaluates the speedup of the basic parallel strategy and of different optimization strategies for parallelizing the Aho-Corasick, Set Horspool, Set Backward Oracle Matching, Wu-Manber and SOG algorithms on a GPU.
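As a point of reference for the sequential baseline, Aho-Corasick builds a trie of the pattern set with failure links and then scans the text in a single pass. A compact Python sketch (the GPU strategies evaluated in the paper parallelise this scan, e.g. over text segments; this is the textbook algorithm, not the paper's GPU code):

```python
from collections import deque

def build_automaton(patterns):
    """Aho-Corasick trie with failure links and output sets."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:                      # build the trie
        s = 0
        for ch in pat:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(pat)
    q = deque(goto[0].values())               # BFS to set failure links
    while q:
        r = q.popleft()
        for ch, s in goto[r].items():
            q.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0) if f or ch in goto[0] else 0
            if fail[s] == s:                  # depth-1 states fall back to root
                fail[s] = 0
            out[s] |= out[fail[s]]            # inherit matches via failure link
    return goto, fail, out

def search(text, patterns):
    """Return (start_position, pattern) pairs for all occurrences."""
    goto, fail, out = build_automaton(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for pat in out[s]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```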

Applied mathematical sciences, 2013
The main drawback of kernel density estimation methods is their huge computational requirements, especially for large data sets. One way to accelerate these methods is parallel processing. Recent advances in parallel processing have focused on the use of Graphics Processing Units (GPUs) with the Compute Unified Device Architecture (CUDA) programming model. In this work we discuss a naive and two optimised CUDA algorithms for two kernel estimation methods: univariate and multivariate. The optimised algorithms are based on shared memory tiles and loop unrolling techniques. We also present exploratory experimental results for the proposed CUDA algorithms over several parameter values, such as the number of threads per block, tile size, loop unroll level, number of variables, and data (sample) size. The experimental results show significant performance gains of all proposed CUDA algorithms over the serial CPU version, and modest further speed-ups of the two optimised CUDA algorithms over the naive GPU algorithm. Finally, general conclusions about all proposed CUDA algorithms for some parameters are drawn from the extended performance results.
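The univariate estimator evaluates f̂(x) = (1/nh) Σᵢ K((x − xᵢ)/h) at each query point, an O(n·m) computation for n samples and m query points, which is what the CUDA kernels parallelise over query points. A serial Python sketch with a Gaussian kernel (the bandwidth h is a free parameter; this shows the computation being accelerated, not the CUDA code itself):

```python
import math

def gaussian_kde(samples, points, h):
    """Evaluate a univariate Gaussian kernel density estimate at points."""
    n = len(samples)
    c = 1.0 / (n * h * math.sqrt(2 * math.pi))   # normalising constant
    # outer loop over query points = the loop a GPU maps to threads
    return [c * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)
            for x in points]
```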
Proceedings of the Sixth International Conference on Engineering Computational Technology
Searching for the Longest Common Subsequence (LCS for short) is one of the most fundamental tasks in bioinformatics. In this paper, we present a parallel implementation of the LCS computation for heterogeneous master-worker platforms. It is the first parallel implementation of this problem in a cluster environment. We also report a set of numerical experiments on a heterogeneous platform in which the computational resources have different computing powers, while the workers are connected to the master by communication links of the same capacity. The experimental results demonstrate that the proposed parallel implementation is efficient, with low communication overhead.
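The underlying computation is the classical dynamic programming recurrence L[i][j] = L[i−1][j−1] + 1 on a match, else max(L[i−1][j], L[i][j−1]). A serial Python sketch keeping only two rows of the table (a master-worker version would distribute bands of this table among workers; the sketch is the baseline, not the paper's parallel scheme):

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    prev = [0] * (n + 1)                 # row i-1 of the DP table
    for i in range(1, m + 1):
        cur = [0] * (n + 1)              # row i
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[n]
```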

Applied Numerical Mathematics, 2016
Numerical linear algebra is one of the most important forms of scientific computation. The basic computations in numerical linear algebra are matrix computations and the solution of linear systems, which are used as kernels in many computational problems. This study demonstrates the parallelisation of these scientific computations using multicore programming frameworks; specifically, Pthreads, OpenMP, Intel Cilk Plus, Intel TBB, SWARM, and FastFlow. A unified, exploratory performance evaluation and a qualitative study of these frameworks are presented for parallel scientific computations over several parameters. The OpenMP and SWARM models produce good results running in parallel with compiler optimisation when implementing matrix operations at large and medium scales, whereas the remaining models do not perform as well for some matrix operations. The qualitative results show that the OpenMP, Cilk Plus, TBB, and SWARM frameworks require minimal programming effort, whereas the other models require advanced programming skills and experience. Finally, based on an extended study, general conclusions regarding the programming models and matrix operations for some parameters were obtained.
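A matrix-vector product parallelises naturally by partitioning rows among workers, which is the decomposition all of the compared frameworks express in their own idioms. A Python sketch using the standard library's ThreadPoolExecutor (in CPython the GIL limits real speedup for pure-Python arithmetic, so this illustrates the partitioning pattern rather than the performance of the C-level frameworks in the paper):

```python
from concurrent.futures import ThreadPoolExecutor

def matvec_rows(A, x, lo, hi):
    """Compute rows lo..hi-1 of the product A @ x."""
    return [sum(a * b for a, b in zip(A[i], x)) for i in range(lo, hi)]

def parallel_matvec(A, x, workers=4):
    n = len(A)
    step = (n + workers - 1) // workers                  # rows per worker
    bounds = [(lo, min(n, lo + step)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        parts = ex.map(lambda b: matvec_rows(A, x, *b), bounds)
    return [y for part in parts for y in part]           # reassemble in order
```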
Proceedings of the 2012 16th Panhellenic Conference on Informatics, PCI 2012, 2012
This paper implements basic computational kernels of scientific computing, such as the matrix-vector product, matrix product and Gaussian elimination, on multi-core platforms using several parallel programming tools; specifically, Pthreads, OpenMP, Intel Cilk++, Intel TBB, Intel ArBB, SMPSs, SWARM and FastFlow. The aim of this paper is to present a unified quantitative and qualitative study of these tools for parallel computation of scientific computing kernels on multicore platforms. Finally, based on this study we conclude that the ...
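Of the three kernels, Gaussian elimination has the least trivial structure: each elimination step k updates rows k+1…n−1 independently of one another, and that row-update loop is what the multi-core tools parallelise. A serial Python sketch with partial pivoting (illustrative baseline, not the paper's parallel code):

```python
def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]          # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # partial pivoting
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):    # independent row updates -> parallel loop
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):   # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x
```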

Proceedings - 15th IEEE International Conference on Computational Science and Engineering, CSE 2012 and 10th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, EUC 2012, 2012
Social researchers use computationally intensive statistical and econometric methods for data analysis. One way to accelerate these computations is parallel computing on multi-core platforms. In this paper we parallelize some representative computational kernels from statistics and econometrics on a multi-core platform using programming libraries such as Pthreads, OpenMP, Intel Cilk++, Intel TBB, Intel ArBB, SWARM and FastFlow. Specifically, these kernels are multivariate descriptive statistics (multivariate mean and multivariate covariance) and kernel density estimation (univariate and multivariate). The purpose of this paper is to present an extensive quantitative and qualitative study of multi-core programming models for parallel statistical and econometric computations. Finally, based on this study we conclude that the Intel ArBB and SWARM programming environments are the most efficient for implementing statistical computations at large and small scales, respectively, because they combine good performance with simplicity of programming.
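The two descriptive-statistics kernels reduce to column means and an outer-product accumulation over deviations. A serial Python sketch (sample covariance with the n−1 divisor; the per-cell accumulations are the independent units a multi-core version would distribute among threads):

```python
def multivariate_mean(X):
    """Column-wise mean of a list of observation rows."""
    n = len(X)
    return [sum(col) / n for col in zip(*X)]

def covariance_matrix(X):
    """Sample covariance matrix (n-1 divisor) of observation rows X."""
    n = len(X)
    mu = multivariate_mean(X)
    d = len(mu)
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        dev = [xi - mi for xi, mi in zip(row, mu)]
        for i in range(d):            # each (i, j) cell is independent work
            for j in range(d):
                C[i][j] += dev[i] * dev[j] / (n - 1)
    return C
```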

Proceedings of the 14th IEEE International Conference on High Performance Computing and Communications, HPCC-2012 - 9th IEEE International Conference on Embedded Software and Systems, ICESS-2012, 2012
The broad introduction of multi-core platforms into computing has brought a great opportunity to develop computationally demanding applications, such as matrix computations, on parallel computing platforms. Basic matrix computations such as vector and matrix addition, dot product, outer product, matrix transpose, matrix-vector multiplication and matrix multiplication are challenging computational kernels arising in scientific computing. In this paper, we parallelize these basic matrix computations using multi-core parallel programming tools; specifically, Pthreads, OpenMP, Intel Cilk++, Intel TBB, Intel ArBB, SMPSs, SWARM and FastFlow. The purpose of this paper is to present a quantitative and qualitative study of these tools for parallel matrix computations. Finally, based on this study we conclude that the Intel ArBB and SWARM parallel programming tools are the most appropriate because they combine good performance with simplicity of programming.
Journal of Systems Architecture, 2008
In this paper, we present linear processor array architectures for flexible approximate string matching. These architectures are based on parallel realization of dynamic programming and non-deterministic finite automaton algorithms. The algorithms consist of two phases, preprocessing and searching. Starting from the data dependence graphs of the searching phase, parallel algorithms are derived that can be realized directly on special-purpose processor array architectures for approximate string matching. The preprocessing phase is also accommodated on the same processor array designs. Finally, the proposed architectures support flexible patterns, i.e. patterns with a 'don't care' symbol, a complement symbol, or a class symbol.
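The dynamic programming formulation behind approximate string matching is Sellers' algorithm: an edit-distance table whose first row is all zeros, so a match may start at any text position, and every position j with D[m][j] ≤ k ends an occurrence. A serial Python sketch, extended with a '?' don't-care symbol as one example of a flexible pattern (the '?' convention is illustrative; this is the textbook recurrence, not the processor-array realization):

```python
def approx_search(pattern, text, k):
    """Sellers' DP: 0-based end positions of matches with edit distance <= k."""
    m = len(pattern)
    prev = list(range(m + 1))            # column for the empty text prefix
    ends = []
    for j, tc in enumerate(text):
        cur = [0]                        # D[0][j] = 0: match starts anywhere
        for i, pc in enumerate(pattern, 1):
            cost = 0 if (pc == tc or pc == "?") else 1   # '?' is a don't care
            cur.append(min(prev[i - 1] + cost,           # match / substitution
                           prev[i] + 1,                  # insertion in text
                           cur[i - 1] + 1))              # deletion from text
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends
```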

The Journal of Supercomputing, 2005
A longest common subsequence (LCS) of two strings is a common subsequence of maximal length. The LCS problem is to find an LCS of two given strings and its length (LLCS). In this paper, we present a new linear processor array for solving the LCS problem. The array is based on parallelization of a recent LCS algorithm that consists of two phases, preprocessing and computation, where the computation phase is based on a bit-level dynamic programming approach. Implementations of both phases on the same processor array architecture are discussed. Further, we propose a block processor array architecture that reduces the overall communication and time requirements. Finally, we develop a performance model for estimating the performance of the processor array architecture on Pentium processors.
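Bit-level dynamic programming for the LLCS packs one DP column per bit, in the style of the Allison-Dix scheme: the preprocessing phase builds a match bit-mask per alphabet symbol, and the computation phase updates a single bit-vector V per text character, with zero bits of V marking positions where the LCS length has grown, so LLCS = m − popcount(V). A Python sketch of this scheme for reference (one variant of the bit-parallel approach, not necessarily the exact algorithm the array parallelizes):

```python
def llcs_bitparallel(a, b):
    """Bit-parallel LLCS: one bit of V per character of a."""
    m = len(a)
    mask = (1 << m) - 1
    pm = {}                               # preprocessing: match masks
    for i, ch in enumerate(a):
        pm[ch] = pm.get(ch, 0) | (1 << i)
    V = mask                              # all ones: LLCS 0 everywhere
    for ch in b:                          # one O(1) bit-update per character
        U = V & pm.get(ch, 0)
        V = ((V + U) | (V - U)) & mask
    return m - bin(V).count("1")          # zero bits count the LCS length
```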