Papers by Lucas Zamboulis
XML Schema Matching & XML Data Migration & Integration: A Step Towards The Semantic Web Vision
Technical Report: XML Schema Matching & XML Data Migration & Integration: A Step Towards The Semantic Web Vision
2008 8th IEEE International Conference on BioInformatics and BioEngineering, Oct 1, 2008
The ASSIST project aims to facilitate cervical cancer research by integrating medical records containing both phenotypic and genotypic data, and residing in different medical centres or hospitals. The goal of ASSIST is to enable the evaluation of medical hypotheses and the conduct of association studies in an intuitive manner, thereby allowing medical researchers to identify risk factors that can then be used at the point of care to identify women who are at high risk of developing cervical cancer.
Processing IQL queries and migrating data in the automed toolkit
Processing IQL Queries and Migrating Data in the AutoMed toolkit. Edgar Jasper, Alex Poulovassilis, Lucas Zamboulis. School of Computer Science and ... The excerpt shows an IQL comprehension over <<female>> and distinct <<person,name>>, querying for the names of the females working in the 'CS-BBK' department: [n ...
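The excerpt above gestures at an IQL comprehension selecting the names of the females working in the 'CS-BBK' department. As a rough analogue only (Python list comprehensions standing in for IQL; the data and collection names below are invented for illustration, not taken from the paper):

```python
# Hypothetical stand-ins for the <<female>>, <<person,name>> and a
# department collection; the real IQL schemes are only partially
# recoverable from the garbled excerpt.
female = ["p1", "p3"]
person_name = [("p1", "Ann"), ("p2", "Bob"), ("p3", "Cat")]
person_dept = [("p1", "CS-BBK"), ("p2", "CS-BBK"), ("p3", "Maths")]

# In the spirit of an IQL comprehension such as
#   [n | {f} <- <<female>>; {fl,n} <- distinct <<person,name>>; f = fl; ...]
# select the names of females working in the 'CS-BBK' department.
names = [n for (pid, n) in person_name
         if pid in female
         and any(p == pid and d == "CS-BBK" for (p, d) in person_dept)]
print(names)  # → ['Ann']
```

IQL, like other comprehension-based query languages, is here approximated by filtering a join of the collections; the real system evaluates such comprehensions over schema constructs rather than Python lists.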
BNCOD, 2004
This paper describes the integration of XML data sources within the AutoMed heterogeneous data integration system. The paper presents a description of the overall framework, as well as an overview of and comparison with related work and implemented solutions by other researchers. The main contribution of this research is an algorithm for the integration of XML data sources, based on graph restructuring of their schemas.
Data integration in the life sciences requires resolution of conflicts arising from the heterogeneity of data resources and from incompatibilities between the inputs and outputs of services used in the analysis of the resources. This paper presents an approach that addresses these problems in a uniform way. We present results from the application of our approach for the integration of ...
Ontologies-Based Databases and Information Systems, 2008
Schema-based data transformation and integration (DTI) has been an active research area for some time, while more recent advances in ontologies have led to significant research in ontology-based DTI. These two approaches present some overlaps and some differences, and in this paper we investigate possible synergies between them. In particular, we show how ontologies can enhance schema-based DTI approaches ...
Conference on Advanced Information Systems Engineering, 2006
This paper proposes a framework for transforming and integrating heterogeneous XML data sources, making use of known correspondences from them to ontologies expressed in the form of RDFS schemas. The paper first illustrates how correspondences to a single ontology can be exploited. The approach is then extended to the case where correspondences may refer to multiple ontologies, ...
Data Integration over the Web, 2004
This paper describes how the AutoMed data integration system is being extended to support the integration of heterogeneous XML documents. So far, the contributions of this research have been the development of two algorithms. One restructures the schema describing an XML document into another schema, and the other materialises an integrated schema resulting from the transformation of several source ...
Lecture Notes in Computer Science, 2007
This paper focuses on the problem of bioinformatics service reconciliation in a generic and scalable manner so as to enhance interoperability in a highly evolving field. Using XML as a common representation format, but also supporting existing flat-file representation formats, we propose an approach for the scalable semi-automatic reconciliation of services, possibly invoked from within a scientific workflows tool. Service reconciliation may use the AutoMed heterogeneous data integration system as an intermediary service, or may use AutoMed to produce services that mediate between services. We discuss the application of our approach for the reconciliation of services in an example bioinformatics workflow. The main contribution of this research is an architecture for the scalable reconciliation of bioinformatics services.

Lecture Notes in Computer Science, 2006
Grid computing has great potential for supporting the integration of complex, fast changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources which are rapidly being developed with the emergence of affordable, reliable methods to study the proteome. The protein identifications arising from these methods derive from multiple repositories which need to be integrated to enable uniform access to them. A number of technologies exist which enable these resources to be accessed in a Grid environment, but the independent development of these resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture which supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture for the integration of several autonomous proteomics data resources.
This paper proposes a framework for transforming and integrating heterogeneous XML data sources, making use of known correspondences from them to ontologies expressed in the form of RDFS schemas. Our algorithms generate schema transformation/integration rules which are implemented in the AutoMed heterogeneous data integration system. The paper first illustrates how correspondences to a single ontology can be exploited. The approach is then extended to the case where correspondences may refer to multiple ontologies, themselves interconnected via schema transformation rules. The contribution of this research is a set of automatic, XML-specific algorithms for the transformation and integration of XML data, making use of RDFS ontologies as a 'semantic bridge'.
This technical report gives an outline of the IQL query language used within the AutoMed heterogeneous data integration system, and describes query processing in AutoMed. This report aims to serve as a guide to the query processing components of the AutoMed toolkit ...
Processing IQL queries and migrating data in the AutoMed toolkit
Technical report, AutoMed Project, 2006
Abstract: The string representations of IQL queries must be parsed to create an abstract syntax tree representation. A binary tree representation is used for the latter. These trees represent repeated function applications. All non-leaf cells are either apply cells (@) or lambda cells (λ). An apply cell represents the left child being applied to the right child. For example, a tree of apply cells applying (+) to 2 and then to 1. This tree represents the ...
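The binary-tree representation described above can be sketched as follows. This is a minimal, hypothetical Python rendering of the idea, not the actual AutoMed implementation; lambda cells are omitted for brevity, and the class and function names are invented:

```python
import operator

class Apply:
    """An apply cell (@): the left child applied to the right child."""
    def __init__(self, left, right):
        self.left, self.right = left, right

def evaluate(node):
    """Recursively reduce a tree of apply cells to a value."""
    if isinstance(node, Apply):
        fn = evaluate(node.left)    # left child evaluates to a function
        arg = evaluate(node.right)  # right child evaluates to its argument
        return fn(arg)
    return node  # a leaf: a constant or a (curried) function

# Curried addition, so each apply cell supplies exactly one argument.
plus = lambda x: lambda y: operator.add(x, y)

# The tree @(@((+), 2), 1), i.e. the curried application (+) 2 1
tree = Apply(Apply(plus, 2), 1)
print(evaluate(tree))  # → 3
```

Representing every expression as repeated one-argument applications keeps the tree strictly binary, which is why currying the operator is needed before building the tree.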
Nucleic Acids Research, 2008

The performance of Grid computing technologies for distributed data access and query processing has been investigated in a number of studies. However, different Grid data sources may have schema conflicts which require fine-grained resolution through the use of data integration technologies that are not supported by the current generation of Grid data access and querying middleware. This is particularly the case with distributed querying and analysis of complex data sources as found in Life Sciences applications. The query performance of architectures that combine Grid data access and query processing capabilities with data integration technologies has not been investigated to date. In this paper, we investigate architectural, optimisation and performance issues arising from the coupling of Grid data access and distributed querying together with data integration technologies. Specifically, we investigate these issues for the OGSA-DAI and OGSA-DQP open-source Grid access and ...
Proceedings of the UK e-Science All Hands Meeting, 2005
The aim of the ISPIDER project is to create a proteomics grid; that is, a technical platform that supports bioinformaticians in constructing, executing and evaluating in silico analyses of proteomics data. It will be constructed using a combination of generic e-science and Grid technologies, plus proteomics specific components and clients that embody knowledge of the proteomics domain and the available resources. In this paper, we describe some of our earlier results in prototyping specific examples of proteomics data integration, ...
Grid data sources may have schema- and data-level conflicts that need to be addressed using data transformation and integration technologies not supported by the current generation of Grid data access and querying middleware. We present an architecture that combines Grid data access and distributed querying with fine-grained data transformation/integration technologies, and the results of a query performance evaluation on this architecture. The performance evaluation indicates that it is indeed feasible to combine such technologies while achieving acceptable query performance. We also discuss the significance of our results for the further development of query performance over heterogeneous Grid data sources.

Lecture Notes in Computer Science, 2005
This paper presents an extensible architecture that can be used to support the integration of heterogeneous biological data sets. In our architecture, a clustering approach has been developed to support distributed biological data sources with inconsistent identification of biological objects. The architecture uses the AutoMed data integration toolkit to store the schemas of the data sources and the semiautomatically generated transformations from the source data into the data of an integrated warehouse. AutoMed supports bi-directional, extensible transformations which can be used to update the warehouse data as entities change, are added, or are deleted in the data sources. The transformations can also be used to support the addition or removal of entire data sources, or evolutions in the schemas of the data sources or of the warehouse itself. The results of using the architecture for the integration of existing genomic data sets are discussed.