Papers by Gianluigi Folino
Preface: nature inspired solutions for high performance computing
Computer systems are characterized by an ever-growing complexity and a pronounced distributed nature. Since controlling highly distributed systems and managing the communication among them are far beyond the capabilities of a central entity, it is essential to develop new decentralized architectures. Such architectures, for example Grids, Clouds and P2P systems, are increasingly popular, but they need new types of algorithms to be efficiently managed.
An Autonomic Middleware for Grid-Enabled Self-Organizing Applications
Gianluigi Folino and Giandomenico Spezzano, Institute for High Performance Computing and Networking (ICAR), National Research Council (CNR), Italy, Via P.
A cellular environment for steering high performance scientific computations
Abstract: This paper presents CARAVEL, a problem solving environment where simulation and steering are integrated to facilitate interactive exploration and modelling of complex applications on high performance computers. CARAVEL uses the cellular automata (CA) formalism both as a tool to model and simulate dynamic complex phenomena and as a computational model for parallel processing. It is an environment for CA programming and parallel execution.
Pruning GP-Based Classifier Ensembles by Bayesian Networks
Classifier ensemble techniques are effectively used to combine the responses provided by a set of classifiers. Classifier ensembles improve the performance of single classifier systems, although a large number of classifiers is often required. This implies large memory requirements and slow classification, which makes their use problematic in some applications. This problem can be reduced by selecting a fraction of the classifiers from the original ensemble.
Grid-based PSE Toolkits for Multidisciplinary Applications. FIRB Grid
Recently, ensemble techniques have also attracted the attention of Genetic Programming (GP) researchers. The goal is to further improve GP classification performance. Among the ensemble techniques, bagging and boosting have also been taken into account. These techniques improve classification accuracy by combining the responses of different classifiers using a majority vote rule. However, it is hard to ensure that the classifiers in the ensemble are appropriately diverse, so as to avoid correlated errors.
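The majority vote rule mentioned in this abstract can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' implementation: the function names `majority_vote` and `ensemble_predict`, the tie-breaking rule, and the toy labels are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by most classifiers for a single sample.

    `predictions` holds one label per classifier. Ties are broken in favour
    of the tied label that appears first in the list (an assumption made
    here for determinism).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def ensemble_predict(per_classifier_outputs):
    """Combine classifier outputs sample by sample.

    per_classifier_outputs[i][j] is the label that classifier i assigns
    to sample j. zip(*...) regroups the labels by sample.
    """
    return [majority_vote(list(column)) for column in zip(*per_classifier_outputs)]
```

Voting only helps when the classifiers disagree in different places: if every classifier makes the same mistake on a sample, the vote reproduces that mistake, which is why the abstract stresses diversity and uncorrelated errors.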
• Maintaining diversity in genetic programming is important, because it helps to prevent the GP process from converging prematurely.
• The lack of diversity may lead to convergence towards local optima or towards suboptimal behavior in dynamic environments.
• Experimental analysis of diversity can give us a better perspective on the population transition and the search process in GP.
Background: Sequencing technologies have different biases in single-genome sequencing and in metagenomic sequencing; these can significantly affect ORF recovery and the population distribution of a metagenome. In this paper we investigate how well different technologies represent information related to a given organism of interest in a metagenome, and whether it is beneficial to combine information obtained using different technologies.
Abstract: This paper presents a study that evaluates the influence of the parallel genetic programming (GP) models in maintaining diversity in a population. The parallel models used are the cellular and the multipopulation one. Several measures of diversity are considered to gain a deeper understanding of the conditions under which the evolution of both models is successful. Three standard test problems are used to illustrate the different diversity measures and analyze their correlation with performance.
Content-based mining for solving geoprocessing problems on grids
… on Use of P2P, GRID and …, Jan 1, 2007
Abstract: Geological data management and mining are critical areas of modern-day geology research. High throughput and high information content are two important aspects of any geoprocessing application. Geological data mining is faster and more efficient if the geological data are indexed, stored and mined on content. A challenge for geological information mining is the distributed nature of the resources. Grid computing has emerged as an important new field in the distributed computing arena. It focuses on intensive resource ...
Future Generation Computer Systems, Jan 1, 2010
Special Issue: Bio-Inspired Optimization Techniques for High Performance Computing
New Generation Computing, Jan 1, 2011
In the last few years, bio-inspired algorithms, mimicking Darwinian evolution or the behavior of ant colonies, flocks of birds, insect swarms, 1) etc., have emerged as a viable solution to many parallel and distributed computational problems, and they have proved effective especially when distributed systems need adaptive and fault-tolerance properties. 3, 4) Their inherent parallelism and scalability make these kinds of algorithms very suitable for dynamically changing environments and systems, such as Grid Computing, Cloud ...
Nature Inspired Cooperative Strategies for …, Jan 1, 2010
In the last few years, the bio-inspired community has experienced a growing interest in the field of Swarm Intelligence algorithms applied to real world problems. In spite of the large number of algorithms using this approach, few methodologies exist for evaluating the self-organizing properties of these kinds of algorithms and the effectiveness of using them. This paper presents an entropy-based model that can be used to evaluate self-organizing properties of Swarm Intelligence algorithms, and its application to SPARROW-SNN, an adaptive flocking algorithm used for performing approximate clustering. Preliminary experiments, performed on a synthetic and a real-world data set, confirm the presence of self-organizing characteristics, unlike in the classical flocking algorithm.
Proceedings of the …, Jan 1, 2011
In this paper we present a novel approach for combining GP-based ensembles by means of a Bayesian Network. The proposed system is able to effectively learn decision tree ensembles using two different strategies: decision tree ensembles are learned by means of a boosted GP algorithm, and the responses of the learned ensembles are combined using a Bayesian network, which also implements a selection strategy that reduces the size of the built ensembles.
Genetic Programming and Evolvable …, Jan 1, 2010
A distributed data mining algorithm to improve the detection accuracy when classifying malicious or unauthorized network activity is presented. The algorithm is based on genetic programming (GP) extended with the ensemble paradigm. A GP ensemble is particularly suitable for distributed intrusion detection because it allows a network profile to be built by combining different classifiers that together provide complementary information. The main novelty of the algorithm is that data is distributed across multiple autonomous sites and the learner component acquires useful knowledge from this data in a cooperative way. The network profile is then used to predict abnormal behavior. Experiments on the KDD Cup 1999 Data show the capability of genetic programming in successfully dealing with the problem of intrusion detection on distributed data.
Scalable classification of large data sets by parallel genetic programming
Distributed and parallel …, Jan 1, 2000
Handling Different Categories of Concept Drifts in Data Streams Using Distributed GP
Genetic Programming, Jan 1, 2010
Abstract. Using Genetic Programming (GP) for classifying data streams is problematic as GP is slow compared with traditional single solution techniques. However, the availability of cheaper and better-performing distributed and parallel architectures makes it possible to deal with complex problems that were previously hard to solve owing to the large amount of time necessary. This work presents a general framework based on a distributed GP ensemble algorithm for coping with different types of concept drift for the task of classification of large ...
StreamGP: tracking evolving GP ensembles in distributed data streams using fractal dimension
… of the 9th annual conference on …, Jan 1, 2007
Abstract: The paper presents an adaptive GP boosting ensemble method for the classification of distributed homogeneous streaming data that comes from multiple locations. The approach is able to handle concept drift by employing a change detection strategy based on the self-similarity of the ensemble behavior, measured by its fractal dimension. It is efficient since each node of the network works with its local streaming data, and communicates only the local model computed with the other peer-nodes. ...

Parallel and Distributed …, Jan 1, 2008
The comparison of protein tertiary structures is a key milestone in many structural bioinformatics activities that rely on comparing very large structure datasets. As the number of proteins in the dataset increases, the computational time taken by the protein structure comparison algorithms also increases, quadratically for an all-against-all comparison and linearly for an all-against-target assessment. Thus ever larger proteomics problems call for the distribution of pairwise comparison jobs in the form of well granulated subsets/packages to be run in parallel on a pool of networked processors/workstations under the coordination of a Message Passing Interface (MPI) environment. This paper evaluates the effect on the performance of such jobs when the MPI environment is integrated with a Local Resource Management System (LRMS) such as Sun Grid Engine (SGE). From our experiments with different ways of integration we draw a comparative picture of all possible approaches, with a description of resource usage information for each parallel job on each processor. Understanding the different ways of integration sheds light on the most promising routes for setting up an efficient environment for very large scale protein structure comparisons.
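The growth rates and the packaging of pairwise jobs described in this abstract can be sketched as follows. This is an illustrative sketch only, not the paper's scheduler: it assumes the comparison is symmetric and that no structure is compared with itself, so an all-against-all run over n structures yields n(n-1)/2 pairs, and it assumes the target is an external query, so an all-against-target run yields n pairs. The names `all_against_all`, `all_against_target` and `packages` are hypothetical, and no MPI or SGE calls are involved.

```python
from itertools import combinations, islice


def all_against_all(n):
    """Yield every unordered pair (i, j) with i < j among n structures."""
    return combinations(range(n), 2)


def all_against_target(n, target="target"):
    """Yield one (target, i) pair per structure in the dataset."""
    return ((target, i) for i in range(n))


def packages(pairs, size):
    """Group an iterable of comparison jobs into packages of at most `size` jobs.

    Each package is a list that could be handed to one worker process.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(pairs)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk
```

For example, `list(packages(all_against_all(5), 4))` groups the 10 pairs of a five-structure dataset into packages of 4, 4 and 2 jobs. Doubling the dataset roughly quadruples the all-against-all workload but only doubles the all-against-target workload, which is the motivation the abstract gives for distributing the jobs.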