Papers by Francesco Taglino
Control Engineering and Applied Informatics, Jun 27, 2023
In this work, the problem of evaluating semantic similarity in a taxonomy by relying on the notion of information content is investigated. In particular, a measure is considered that takes into account not only the generic sense of a concept but also its intended sense in a given context. Such a measure requires a semantic relatedness approach in order to evaluate the relatedness between the generic sense and the intended sense of a concept. In this work, we show that relying on the Linked Data Semantic Distance with Global Normalization leads to higher Spearman's correlation values with human judgment than both the original proposal and the authors' previous experiments.
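For readers unfamiliar with information-content-based similarity, the following minimal sketch shows the classic Resnik-style baseline that such contextual extensions build upon: the similarity of two concepts is the information content of their most informative common ancestor. The toy taxonomy, frequencies, and function names are illustrative assumptions and do not come from the paper.

```python
import math

# Toy taxonomy: child -> parent (illustrative, not from the paper)
PARENT = {
    "car": "vehicle", "bicycle": "vehicle",
    "vehicle": "artifact", "artifact": "entity",
}

# Illustrative corpus frequencies used to estimate p(c)
FREQ = {"car": 40, "bicycle": 10, "vehicle": 60, "artifact": 80, "entity": 100}
TOTAL = FREQ["entity"]

def ancestors(concept):
    """Return the concept itself plus all of its ancestors."""
    chain = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain

def ic(concept):
    """Information content: -log p(c), with p(c) estimated from frequencies."""
    return -math.log(FREQ[concept] / TOTAL)

def resnik_similarity(c1, c2):
    """Similarity as the IC of the most informative common ancestor."""
    common = set(ancestors(c1)) & set(ancestors(c2))
    return max(ic(c) for c in common)

print(resnik_similarity("car", "bicycle"))  # IC of 'vehicle'
```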

2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018
The recent advances in biotechnology and IT have led to an ever-increasing availability of public biomedical data distributed in large databases. Analyzing this huge volume of data is a challenging task because of its complexity, high heterogeneity, and numerous correlated factors. In the framework of neurodegenerative diseases, recent years have witnessed the creation of specialized databases such as the one maintained by the international ADNI project (Alzheimer's Disease Neuroimaging Initiative). The main obstacles to fully exploiting this database concern the querying, integration, and analysis of the data themselves. Here, we aim to develop a detailed ontology for the clinical multidimensional datasets of the ADNI repository in order to simplify data access and to obtain new diagnostic knowledge about Alzheimer's Disease.
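As a rough illustration of how such a conceptual model can be expressed, here is a minimal sketch using rdflib; the namespace, class names (Patient, ClinicalAssessment), and the MMSE property are hypothetical and are not the ontology developed in the paper.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS
from rdflib.namespace import OWL, XSD

# Hypothetical namespace; not the ontology from the paper
ADNI = Namespace("http://example.org/adni-ontology#")

g = Graph()
g.bind("adni", ADNI)

# A few illustrative classes and a datatype property
g.add((ADNI.Patient, RDF.type, OWL.Class))
g.add((ADNI.ClinicalAssessment, RDF.type, OWL.Class))
g.add((ADNI.hasMMSEScore, RDF.type, OWL.DatatypeProperty))
g.add((ADNI.hasMMSEScore, RDFS.domain, ADNI.ClinicalAssessment))
g.add((ADNI.hasMMSEScore, RDFS.range, XSD.integer))

# An illustrative individual
g.add((ADNI.assessment_001, RDF.type, ADNI.ClinicalAssessment))
g.add((ADNI.assessment_001, ADNI.hasMMSEScore, Literal(27)))

print(g.serialize(format="turtle"))
```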
International Journal of Computer Integrated Manufacturing, Jun 1, 2009
This paper presents a semantic-mediation architecture that enables standards-based interoperability between heterogeneous supply-chain applications. The architecture was implemented using a state-of-the-art semantic-mediation toolset for design-time and run-time integration tasks. The design-time tools supported domain ontology definition, message annotations, message schema transformations, and reconciliation rule specifications. The run-time tools performed exchanges, transformations, and reconciliations of the messages. The architecture supports a supply-chain integration scenario where heterogeneous automotive manufacturing supply-chain applications exchange inventory information.
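To give a flavor of ontology-mediated message reconciliation, the sketch below annotates the fields of two heterogeneous inventory messages with shared ontology concepts and uses those annotations to translate one format into the other. The field names, concept identifiers, and mappings are invented for illustration and do not reproduce the toolset described in the paper.

```python
# Hypothetical annotations: local message fields -> shared ontology concepts
SUPPLIER_SCHEMA = {"part_no": "ont:PartIdentifier", "qty_on_hand": "ont:StockLevel"}
MANUFACTURER_SCHEMA = {"itemId": "ont:PartIdentifier", "availableUnits": "ont:StockLevel"}

def to_ontology_terms(message, schema):
    """Lift a local message to ontology-level terms using its annotations."""
    return {schema[field]: value for field, value in message.items() if field in schema}

def from_ontology_terms(terms, schema):
    """Lower ontology-level terms into a target message format."""
    inverse = {concept: field for field, concept in schema.items()}
    return {inverse[concept]: value for concept, value in terms.items() if concept in inverse}

# A supplier inventory message translated for the manufacturer's application
supplier_msg = {"part_no": "AX-42", "qty_on_hand": 130}
canonical = to_ontology_terms(supplier_msg, SUPPLIER_SCHEMA)
manufacturer_msg = from_ontology_terms(canonical, MANUFACTURER_SCHEMA)
print(manufacturer_msg)  # {'itemId': 'AX-42', 'availableUnits': 130}
```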

Semantic search is the new frontier for the search engines of the latest generation. Advanced semantic search methods are exploring the use of weighted ontologies, i.e., domain ontologies where concepts are associated with weights inversely related to their selective power. In this paper, we present and assess four different ontology weighting methods, organized in two groups: intensional methods, based on the ontology structure alone, and extensional methods, where the content of the search space is also considered. The comparative assessment is carried out by embedding the different methods within the semantic search engine SemSim, based on weighted ontologies, and then by running four retrieval tests over a search space we have previously proposed in the literature. In order to reach a broad audience of readers, the key concepts of this paper are presented by using a simple taxonomy and a dataset already used in previous experiments.
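The distinction between intensional and extensional weighting can be illustrated with a small sketch: the intensional weight below depends only on the taxonomy structure (following the intuition of intrinsic information content), while the extensional weight is derived from how often a concept annotates resources in the search space. The taxonomy, counts, and formulas are illustrative assumptions, not the four methods assessed in the paper.

```python
import math

# Toy taxonomy: parent -> children (illustrative only)
CHILDREN = {
    "root": ["databases", "ai"],
    "ai": ["machine_learning", "knowledge_representation"],
    "databases": [],
    "machine_learning": [],
    "knowledge_representation": [],
}

def descendants(concept):
    """All concepts subsumed by the given concept, including itself."""
    result = {concept}
    for child in CHILDREN.get(concept, []):
        result |= descendants(child)
    return result

TOTAL_CONCEPTS = len(descendants("root"))

def intensional_weight(concept):
    """Structure-only weight: fewer descendants -> more selective -> higher weight."""
    return 1.0 - math.log(len(descendants(concept))) / math.log(TOTAL_CONCEPTS)

# Illustrative annotation counts in the search space (extensional evidence)
ANNOTATIONS = {"databases": 50, "ai": 30, "machine_learning": 25,
               "knowledge_representation": 5, "root": 110}

def extensional_weight(concept):
    """Content-based weight: concepts that annotate fewer resources weigh more."""
    return -math.log(ANNOTATIONS[concept] / ANNOTATIONS["root"])

for concept in ["ai", "machine_learning"]:
    print(concept, round(intensional_weight(concept), 2), round(extensional_weight(concept), 2))
```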
Information Sciences, Apr 1, 2023

Springer eBooks, 2004
In this chapter, the main issues of an ontology-based platform for semantic interoperability are illustrated, with particular attention to the underlying methodology. The solutions presented here have been developed in the context of Harmonise, an IST project aimed at developing an interoperability platform for SMEs in the tourism sector. The illustrated platform is an advanced software solution, based on the use of computational ontologies, aimed at the reconciliation of the conceptual, structural, and formatting differences that hamper information exchange. The proposed approach relies on the availability of a domain ontology, used as a semantic reference for cooperating systems. We then elaborate on semantic clashes, which arise when a conceptual local schema is contrasted with the ontology. Semantic clashes are taken into account when the elements of a conceptual local schema are semantically annotated. Semantic annotation is required to reconcile the differences among cooperating information systems. Having illustrated the underlying approach, we briefly report on the main phases required for an information system to enter the Harmonise space (i.e., to acquire interoperability capability) and, finally, on the overall Harmonise platform.

Springer eBooks, 2011
In recent years, the evolution of infrastructures and technologies driven by emerging paradigms, such as Cloud Computing, the Future Internet, and SaaS (Software-as-a-Service), has been leading the area of enterprise systems into a progressive, significant transformation process. This evolution is characterized by two aspects: a progressive commoditization of the traditional ES functions, i.e., the 'usual' management and planning of resources, and a shift of the challenge toward the support of enterprise innovation. This process will be accelerated by the advent of FInES (Future Internet Enterprise Systems) research initiatives, where different scientific disciplines converge, together with empirical practices, engineering techniques, and technological solutions. Together, they aim at revisiting the development methods and architectures of future enterprise systems, according to the different articulations that Future Internet Systems (FIS) are assuming, to achieve the Future Internet Enterprise Systems (FInES). In particular, this paper foresees the progressive implementation of a rich, complex, articulated digital world that reflects the real business world, where computational elements, referred to as FInERs (Future Internet Enterprise Resources), will directly act and evolve according to what exists in the real world.

Lecture Notes in Computer Science, 2023
To ensure critical infrastructure is operating as expected, high-quality sensors are increasingly installed. However, due to the enormous amounts of high-frequency time series they produce, it is impossible or infeasible to transfer or even store these time series in the cloud when using state-of-the-practice compression methods. Thus, simple aggregates, e.g., 1–10-minute averages, are stored instead of the raw time series. However, by only storing these simple aggregates, informative outliers and fluctuations are lost. Many Time Series Management Systems (TSMSs) have been proposed to efficiently manage time series, but they are generally designed for either the edge or the cloud. In this paper, we describe a new version of the open-source model-based TSMS ModelarDB. The system is designed to be modular, and the same binary can be efficiently deployed on the edge and in the cloud. It also supports continuously transferring high-frequency time series compressed using models from the edge to the cloud. We first provide an overview of ModelarDB, analyze the requirements and limitations of the edge, and evaluate existing query engines and data stores for use on the edge. Then, we describe how ModelarDB has been extended to efficiently manage time series on the edge, a novel file-based data store, how ModelarDB's compression has been improved by not storing time series that can be derived from base time series, and how ModelarDB transfers high-frequency time series from the edge to the cloud. As the work that led to ModelarDB began in 2015, we also reflect on the lessons learned while developing it.
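As a simplified illustration of model-based compression, the sketch below replaces runs of points with a constant model whenever every point in the run stays within a user-defined error bound, so only the model parameters and segment lengths need to be stored or transferred. This is a toy example under assumed names, not ModelarDB's actual model types or storage format.

```python
def compress_constant_model(values, error_bound):
    """Greedily group consecutive values into segments representable by a single
    constant (their running mean) without exceeding the absolute error bound."""
    segments = []  # list of (segment_length, constant_value)
    start = 0
    while start < len(values):
        end = start + 1
        while end <= len(values):
            window = values[start:end]
            candidate = sum(window) / len(window)
            if all(abs(v - candidate) <= error_bound for v in window):
                end += 1
            else:
                break
        segment = values[start:end - 1]
        segments.append((len(segment), sum(segment) / len(segment)))
        start = end - 1
    return segments

# A slowly varying signal with one outlier; the outlier forces its own segment
readings = [10.0, 10.1, 9.9, 10.0, 25.0, 10.2, 10.1]
print(compress_constant_model(readings, error_bound=0.5))
```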

Journal of Web Semantics, Apr 1, 2023
We present the parametric method SemSim^p, aimed at measuring the semantic similarity of digital resources. SemSim^p is based on the notion of information content, and it leverages a reference ontology and taxonomic reasoning, encompassing different approaches for weighting the concepts of the ontology. In particular, weights can be computed by considering either the available digital resources or the structure of the reference ontology of a given domain. SemSim^p is assessed against six representative semantic similarity methods for comparing sets of concepts proposed in the literature, by carrying out an experimentation that includes both a statistical analysis and an expert judgement evaluation. For the purpose of achieving a reliable assessment, we used a real-world large dataset based on the Digital Library of the Association for Computing Machinery (ACM) and a reference ontology derived from the ACM Computing Classification System (ACM-CCS). For each method, we considered two indicators: the first concerns the degree of confidence in identifying the similarity among papers belonging to selected special issues of the ACM Transactions on Information Systems journal; the second is the Pearson correlation with human judgement. The results reveal that one of the configurations of SemSim^p outperforms the other assessed methods. An additional experiment performed in the domain of physics shows that, in general, SemSim^p provides better results than the other similarity methods. Keywords: semantic similarity reasoning; weighted ontology; information content; statistical analysis; expert judgement; benchmarking.
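Comparing sets of concepts typically reduces to aggregating pairwise concept similarities; a common baseline is to take, for each concept in one set, its best match in the other set and average the results in both directions. The sketch below illustrates that baseline; the concepts, pairwise scores, and aggregation are illustrative assumptions and not SemSim^p's actual formulation.

```python
# Illustrative pairwise concept similarities in [0, 1] (invented values)
PAIR_SIM = {
    frozenset(["databases", "information_retrieval"]): 0.6,
    frozenset(["machine_learning", "neural_networks"]): 0.8,
    frozenset(["ontologies", "information_retrieval"]): 0.4,
}

def concept_sim(c1, c2):
    """Look up the similarity of two concepts; identical concepts score 1.0."""
    if c1 == c2:
        return 1.0
    return PAIR_SIM.get(frozenset([c1, c2]), 0.0)

def set_similarity(set_a, set_b):
    """Average best-match similarity, symmetrized over both directions."""
    def one_way(src, dst):
        return sum(max(concept_sim(c, d) for d in dst) for c in src) / len(src)
    return (one_way(set_a, set_b) + one_way(set_b, set_a)) / 2

# Two digital resources annotated with sets of ontology concepts
doc1 = {"databases", "machine_learning"}
doc2 = {"information_retrieval", "neural_networks", "ontologies"}
print(round(set_similarity(doc1, doc2), 3))
```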
Computing and Informatics

Transactions on Large-Scale Data- and Knowledge-Centered Systems LIII
To ensure critical infrastructure is operating as expected, high-quality sensors are increasingly installed. However, due to the enormous amounts of high-frequency time series they produce, it is impossible or infeasible to transfer or even store these time series in the cloud when using state-of-the-practice compression methods. Thus, simple aggregates, e.g., 1–10-minute averages, are stored instead of the raw time series. However, by only storing these simple aggregates, informative outliers and fluctuations are lost. Many Time Series Management Systems (TSMSs) have been proposed to efficiently manage time series, but they are generally designed for either the edge or the cloud. In this paper, we describe a new version of the open-source model-based TSMS ModelarDB. The system is designed to be modular, and the same binary can be efficiently deployed on the edge and in the cloud. It also supports continuously transferring high-frequency time series compressed using models from the edge to the cloud.

Background: The recent advances in biotechnology and computer science have led to an ever-increasing availability of public biomedical data distributed in large databases worldwide. However, these data collections are far from being "big" enough and "standardized" enough to be integrated, making it impossible to fully exploit the latest machine learning technologies for the analysis of the data themselves. Hence, facing this huge flow of biomedical data is a challenging task for researchers and clinicians due to its complexity and high heterogeneity. An effective strategy to address this issue could be the building of a formal conceptual model, which in general allows the design of semantic tools to collect and explore data for a given pathology. This is the case for neurodegenerative diseases and, in particular, Alzheimer's Disease (AD). The last years have witnessed the creation of specialized data collections such as the one maintained by the Alzheimer's Disease Neuroimaging Initiative (ADNI)...

These data refer to the benchmarking of 10 methods for computing semantic similarity applied to 13 golden datasets. The 10 addressed methods are:
- Wikipedia Link-based Measure (WLM) [Witten and Milne, 2008]
- Linked Open Data Description Overlap-based approach (LODDO) [Zhou et al., 2012]
- Linked Data Semantic Distance (LDSD) [Passant, 2010]
- Linked Data Semantic Distance with Global Normalization (LDSDGN) [Piao and Breslin, 2016]
- Propagated Linked Data Semantic Distance (PLDSD) [Alfarhood et al., 2017]
- Information Content-based approach [Schuhmacher and Ponzetto, 2014]
- REWOrD [Pirrò, 2012]
- Exclusivity-based [Hulpuş et al., 2015]
- ASRMP [El Vaigh et al., 2020]
- Proximity-based [Leal, 2013]
The 13 golden datasets are: Atlasify240, B0, B1, GM30, MTurk287, R122, RG65, MC30, KORE-IT, KORE-HW, KORE-VG, KORE-TV, KORE-CN. In the experimentation, two DBpedia knowledge graphs have been considered, i.e., with and without the dbo:wikiPageWikiLink links.
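Benchmarks of this kind are usually scored by correlating each method's similarity values with the human ratings of a golden dataset; below is a minimal sketch of that evaluation loop using Spearman's rank correlation, with invented scores in place of real method outputs.

```python
from scipy.stats import spearmanr

# Word pairs with human similarity ratings (invented values, not a real golden dataset)
gold = {("car", "automobile"): 3.9, ("coast", "shore"): 3.6, ("noon", "string"): 0.1}

# Invented outputs of two hypothetical similarity methods on the same pairs
method_scores = {
    "method_a": {("car", "automobile"): 0.95, ("coast", "shore"): 0.80, ("noon", "string"): 0.05},
    "method_b": {("car", "automobile"): 0.40, ("coast", "shore"): 0.90, ("noon", "string"): 0.30},
}

pairs = list(gold)
human = [gold[p] for p in pairs]
for name, scores in method_scores.items():
    predicted = [scores[p] for p in pairs]
    rho, _ = spearmanr(human, predicted)
    print(f"{name}: Spearman rho = {rho:.2f}")
```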

This dataset represents the results of the experimentation of a method for evaluating semantic similarity between concepts in a taxonomy. The method is based on the information-theoretic approach and allows senses of concepts in a given context to be considered. The relevance of senses is calculated in terms of semantic relatedness with the compared concepts. In a previous work [9], the adopted semantic relatedness method was the one described in [10], while in this work we adopted the one described in [11]. This results in an improvement of the method. The dataset is composed of two folders, which contain the results of the previous and the new experimentation, respectively. In particular, each folder contains a set of files, each referring to one pair of the well-known Miller and Charles benchmark dataset [1] for assessing semantic similarity. For each pair of concepts, the same 28 pairs are all considered as possible different contexts. We applied our proposal by extending 7 methods for computing semantic similarity in a taxonomy, selected from the literature.

This dataset represents the results of the experimentation of a method for evaluating semantic similarity between concepts in a taxonomy. The method is based on the information-theoretic approach and allows senses of concepts in a given context to be considered. The dataset is composed of 28 files. Each file refers to one pair of the well-known Miller and Charles [1] benchmark dataset for assessing semantic similarity. For each pair of concepts, the same 28 pairs are all considered as possible different contexts. We applied our proposal by extending 7 methods for computing semantic similarity in a taxonomy, selected from the literature. The methods considered in the experiment are referred to as R [2], W&P [3], L [4], J&C [5], P&S [6], A [7], and A&M [8].
REFERENCES
[1] Miller, G.A., Charles, W.G.: Contextual correlates of semantic similarity. Language and Cognitive Processes 6(1), 1-28 (1991)
[2] Resnik, P.: Using Information Content to Evaluate Semantic Similarity in a Taxonomy. In Pro...