Papers by Lucas Zamboulis
XML Schema Matching & XML Data Migration & Integration: A Step Towards The Semantic Web Vision
Technical Report: XML Schema Matching & XML Data Migration & Integration: A Step Towards The Semantic Web Vision
2008 8th IEEE International Conference on BioInformatics and BioEngineering, Oct 1, 2008
The ASSIST project aims to facilitate cervical cancer research by integrating medical records containing both phenotypic and genotypic data, and residing in different medical centres or hospitals. The goal of ASSIST is to enable the evaluation of medical hypotheses and the conduct of association studies in an intuitive manner, thereby allowing medical researchers to identify risk factors that can then be used at the point of care to identify women who are at high risk of developing cervical cancer.
Processing IQL queries and migrating data in the automed toolkit
Processing IQL Queries and Migrating Data in the AutoMed toolkit. Edgar Jasper, Alex Poulovassilis, Lucas Zamboulis. School of Computer Science and ... The excerpt shows an IQL comprehension over <<female>> and distinct <<person,name>>, querying for the names of the females working in the 'CS-BBK' department: [n ...
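The excerpt above gestures at an IQL comprehension selecting the names of the females working in the 'CS-BBK' department. As a rough analogue only (Python list comprehensions standing in for IQL; the data and collection names below are invented for illustration, not taken from the paper):

```python
# Hypothetical stand-ins for the <<female>>, <<person,name>> and a
# department collection; the real IQL schemes are only partially
# recoverable from the garbled excerpt.
female = ["p1", "p3"]
person_name = [("p1", "Ann"), ("p2", "Bob"), ("p3", "Cat")]
person_dept = [("p1", "CS-BBK"), ("p2", "CS-BBK"), ("p3", "Maths")]

# In the spirit of an IQL comprehension such as
#   [n | {f} <- <<female>>; {fl,n} <- distinct <<person,name>>; f = fl; ...]
# select the names of females working in the 'CS-BBK' department.
names = [n for (pid, n) in person_name
         if pid in female
         and any(p == pid and d == "CS-BBK" for (p, d) in person_dept)]
print(names)  # → ['Ann']
```

IQL, like other comprehension-based query languages, is here approximated by filtering a join of the collections; the real system evaluates such comprehensions over schema constructs rather than Python lists.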
BNCOD, 2004
This paper describes the integration of XML data sources within the AutoMed heterogeneous data integration system. The paper presents a description of the overall framework, as well as an overview of and comparison with related work and implemented solutions by other researchers. The main contribution of this research is an algorithm for the integration of XML data sources, based on graph restructuring of their schemas.
Data integration in the life sciences requires resolution of conflicts arising from the heterogeneity of data resources and from incompatibilities between the inputs and outputs of services used in the analysis of the resources. This paper presents an approach that addresses these problems in a uniform way. We present results from the application of our approach for the integration of ...
Ontologies-Based Databases and Information Systems, 2008
Schema-based data transformation and integration (DTI) has been an active research area for some time, while more recent advances in ontologies have led to significant research in ontology-based DTI. These two approaches present some overlaps and some differences, and in this paper we investigate possible synergies between them. In particular, we show how ontologies can enhance schema-based DTI approaches ...
Conference on Advanced Information Systems Engineering, 2006
This paper proposes a framework for transforming and integrating heterogeneous XML data sources, making use of known correspondences from them to ontologies expressed in the form of RDFS schemas. The paper first illustrates how correspondences to a single ontology can be exploited. The approach is then extended to the case where correspondences may refer to multiple ontologies, ...
Data Integration over the Web, 2004
This paper describes how the AutoMed data integration system is being extended to support the integration of heterogeneous XML documents. So far, the contributions of this research have been the development of two algorithms. One restructures the schema describing an XML document into another schema, and the other materialises an integrated schema resulting from the transformation of several source ...
Lecture Notes in Computer Science, 2007
This paper focuses on the problem of bioinformatics service reconciliation in a generic and scalable manner so as to enhance interoperability in a highly evolving field. Using XML as a common representation format, but also supporting existing flat-file representation formats, we propose an approach for the scalable semi-automatic reconciliation of services, possibly invoked from within a scientific workflows tool. Service reconciliation may use the AutoMed heterogeneous data integration system as an intermediary service, or may use AutoMed to produce services that mediate between services. We discuss the application of our approach for the reconciliation of services in an example bioinformatics workflow. The main contribution of this research is an architecture for the scalable reconciliation of bioinformatics services.

Lecture Notes in Computer Science, 2006
Grid computing has great potential for supporting the integration of complex, fast changing biological data repositories to enable distributed data analysis. One scenario where Grid computing has such potential is provided by proteomics resources which are rapidly being developed with the emergence of affordable, reliable methods to study the proteome. The protein identifications arising from these methods derive from multiple repositories which need to be integrated to enable uniform access to them. A number of technologies exist which enable these resources to be accessed in a Grid environment, but the independent development of these resources means that significant data integration challenges, such as heterogeneity and schema evolution, have to be met. This paper presents an architecture which supports the combined use of Grid data access (OGSA-DAI), Grid distributed querying (OGSA-DQP) and data integration (AutoMed) software tools to support distributed data analysis. We discuss the application of this architecture for the integration of several autonomous proteomics data resources.
This paper proposes a framework for transforming and integrating heterogeneous XML data sources, making use of known correspondences from them to ontologies expressed in the form of RDFS schemas. Our algorithms generate schema transformation/integration rules which are implemented in the AutoMed heterogeneous data integration system. The paper first illustrates how correspondences to a single ontology can be exploited. The approach is then extended to the case where correspondences may refer to multiple ontologies, themselves interconnected via schema transformation rules. The contribution of this research is a set of automatic, XML-specific algorithms for the transformation and integration of XML data, making use of RDFS ontologies as a 'semantic bridge'.
This technical report gives an outline of the IQL query language used within the AutoMed heterogeneous data integration system, and describes query processing in AutoMed. This report aims to serve as a guide to the query processing components of the AutoMed toolkit ...
Processing IQL queries and migrating data in the AutoMed toolkit
Technical report, AutoMed Project, 2006
Abstract: The string representations of IQL queries must be parsed to create an abstract syntax tree representation. A binary tree representation is used for the latter. These trees represent repeated function applications. All non-leaf cells are either apply cells (@) or lambda cells (λ). An apply cell represents the left child being applied to the right child. For example, a tree of apply cells applying (+) to 2 and then to 1. This tree represents the ...
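The binary-tree representation described above can be sketched as follows. This is a minimal, hypothetical Python rendering of the idea, not the actual AutoMed implementation; lambda cells are omitted for brevity, and the class and function names are invented:

```python
import operator

class Apply:
    """An apply cell (@): the left child applied to the right child."""
    def __init__(self, left, right):
        self.left, self.right = left, right

def evaluate(node):
    """Recursively reduce a tree of apply cells to a value."""
    if isinstance(node, Apply):
        fn = evaluate(node.left)    # left child evaluates to a function
        arg = evaluate(node.right)  # right child evaluates to its argument
        return fn(arg)
    return node  # a leaf: a constant or a (curried) function

# Curried addition, so each apply cell supplies exactly one argument.
plus = lambda x: lambda y: operator.add(x, y)

# The tree @(@((+), 2), 1), i.e. the curried application (+) 2 1
tree = Apply(Apply(plus, 2), 1)
print(evaluate(tree))  # → 3
```

Representing every expression as repeated one-argument applications keeps the tree strictly binary, which is why currying the operator is needed before building the tree.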
Nucleic Acids Research, 2008

The performance of Grid computing technologies for distributed data access and query processing has been investigated in a number of studies. However, different Grid data sources may have schema conflicts which require fine-grained resolution through the use of data integration technologies that are not supported by the current generation of Grid data access and querying middleware. This is particularly the case with distributed querying and analysis of complex data sources as found in Life Sciences applications. The query performance of architectures that combine Grid data access and query processing capabilities with data integration technologies has not been investigated to date. In this paper, we investigate architectural, optimisation and performance issues arising from the coupling of Grid data access and distributed querying together with data integration technologies. Specifically, we investigate these issues for the OGSA-DAI and OGSA-DQP open-source Grid access and ...
Proceedings of the UK e-Science All Hands Meeting, 2005
The aim of the ISPIDER project is to create a proteomics grid; that is, a technical platform that supports bioinformaticians in constructing, executing and evaluating in silico analyses of proteomics data. It will be constructed using a combination of generic e-science and Grid technologies, plus proteomics specific components and clients that embody knowledge of the proteomics domain and the available resources. In this paper, we describe some of our earlier results in prototyping specific examples of proteomics data integration, ...
Grid data sources may have schema- and data-level conflicts that need to be addressed using data transformation and integration technologies not supported by the current generation of Grid data access and querying middleware. We present an architecture that combines Grid data access and distributed querying with fine-grained data transformation/integration technologies, and the results of a query performance evaluation on this architecture. The performance evaluation indicates that it is indeed feasible to combine such technologies while achieving acceptable query performance. We also discuss the significance of our results for the further development of query performance over heterogeneous Grid data sources.

Lecture Notes in Computer Science, 2005
This paper presents an extensible architecture that can be used to support the integration of heterogeneous biological data sets. In our architecture, a clustering approach has been developed to support distributed biological data sources with inconsistent identification of biological objects. The architecture uses the AutoMed data integration toolkit to store the schemas of the data sources and the semiautomatically generated transformations from the source data into the data of an integrated warehouse. AutoMed supports bi-directional, extensible transformations which can be used to update the warehouse data as entities change, are added, or are deleted in the data sources. The transformations can also be used to support the addition or removal of entire data sources, or evolutions in the schemas of the data sources or of the warehouse itself. The results of using the architecture for the integration of existing genomic data sets are discussed.