Helena Deus

DERI, Health Care and Life Sciences, Post-Doc

Papers by Helena Deus

RPPAML/RIMS: A metadata format and an information management system for reverse phase protein arrays

Background
Reverse Phase Protein Arrays (RPPA) are convenient assay platforms to investigate the presence of biomarkers in tissue lysates. As with other high-throughput technologies, substantial amounts of analytical data are generated. Over 1000 samples may be printed on a single nitrocellulose slide. Up to 100 different proteins may be assessed using immunoperoxidase or immunofluorescence techniques in order to determine relative amounts of protein expression in the samples of interest.

SemScape: Visualizating Semantic Web Data Landscapes

SemScape is a plugin for the popular network biology software Cytoscape [1] that allows user interaction and queries over remote SPARQL endpoints. In particular, it allows visualization of SPARQL query results, interactive navigation of RDF graphs and reconstruction of semantic data landscapes. Cytoscape is a network visualization and analysis tool that, while inherently domain independent, provides links and shortcuts to typical biomedical datasets and manipulations.
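To make the idea of visualizing SPARQL query results as a network concrete, here is a minimal sketch, not SemScape's actual implementation. It assumes the results arrive in the standard SPARQL JSON results format with variables named `s`, `p` and `o`; the function name `bindings_to_graph` and all example URIs are hypothetical.

```python
import json

# A tiny document in the W3C SPARQL JSON results format.
# The URIs are made-up placeholders, not data from SemScape.
SAMPLE_RESULTS = json.loads("""
{
  "head": {"vars": ["s", "p", "o"]},
  "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://example.org/geneA"},
     "p": {"type": "uri", "value": "http://example.org/interactsWith"},
     "o": {"type": "uri", "value": "http://example.org/geneB"}},
    {"s": {"type": "uri", "value": "http://example.org/geneA"},
     "p": {"type": "uri", "value": "http://www.w3.org/2000/01/rdf-schema#label"},
     "o": {"type": "literal", "value": "Gene A"}}
  ]}
}
""")


def bindings_to_graph(results):
    """Turn ?s ?p ?o bindings into a node set and a list of labelled edges."""
    nodes, edges = set(), []
    for row in results["results"]["bindings"]:
        # Skip rows where any of the three variables is unbound.
        if not all(var in row for var in ("s", "p", "o")):
            continue
        source, target = row["s"]["value"], row["o"]["value"]
        nodes.update((source, target))
        edges.append((source, row["p"]["value"], target))
    return nodes, edges


nodes, edges = bindings_to_graph(SAMPLE_RESULTS)
print(len(nodes), "nodes,", len(edges), "edges")
```

Each triple becomes one edge labelled with its predicate, which is the shape a network tool needs in order to draw an RDF graph and let a user expand it node by node.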

SemScape: Visualizating Semantic Web Data Landscapes with Cytoscape 3.0

Abstract. Core to the success of applying Semantic Web technologies (SWT) towards supporting Life Sciences research is the availability of tools that lower the entry barrier for adoption by biomedical researchers. Researchers need to easily and intuitively exploit and query the wealth of data that is available behind SPARQL endpoints. Here, we present SemScape, a Semantic Web-enabled plugin for the popular network biology software Cytoscape.

Emerging practices for mapping and linking life sciences data using RDF—A case series

Members of the W3C Health Care and Life Sciences Interest Group (HCLS IG) have published a variety of genomic and drug-related data sets as Resource Description Framework (RDF) triples. This experience has helped the interest group define a general data workflow for mapping health care and life science (HCLS) data to RDF and linking it with other Linked Data sources.

Cataloguing and Linking Life Sciences LOD Cloud

Abstract. The Life Sciences Linked Open Data (LSLOD) Cloud currently comprises multiple datasets that add high value to biomedical research. The ability to navigate through these datasets in order to derive and discover new meaningful biological correlations is considered one of the most significant resources for supporting clinical decision making.

Provenance of Microarray Experiments for a Better Understanding of Experiment Results

Abstract. This paper describes a Semantic Web (SW) model for
gene lists and the metadata required for their practical
interpretation. Our provenance information captures the context
of experiments as well as the processing and analysis parameters
involved in deriving the gene lists from DNA microarray
experiments. We demonstrate a range of practical neuroscience
queries which draw on the proposed model. Our provenance
representation includes the origins of the gene list and basic
information about the data set itself (e.g. last modification date
and original data source), in order to facilitate the federation of
gene lists with other types of Semantic Web-formatted data and
include the integration of a broader molecular context through
additional omics data.
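
As an illustration of what such provenance metadata might look like as RDF, the following sketch emits N-Triples for a gene list with its original data source and last modification date. It is not the paper's model: the `http://example.org/genelist/` namespace, the `hasMember` property, the function name and the gene symbols are hypothetical placeholders, and Dublin Core Terms (`source`, `modified`) is an assumed vocabulary choice.

```python
from datetime import date

DCTERMS = "http://purl.org/dc/terms/"          # Dublin Core Terms namespace
XSD = "http://www.w3.org/2001/XMLSchema#"      # XML Schema datatypes
EX = "http://example.org/genelist/"            # hypothetical namespace


def provenance_triples(list_id, source_url, modified, genes):
    """Return N-Triples lines describing a gene list and its provenance."""
    subject = f"<{EX}{list_id}>"
    lines = [
        # Where the gene list was originally derived from.
        f"{subject} <{DCTERMS}source> <{source_url}> .",
        # When the gene list was last modified, as a typed xsd:date literal.
        f'{subject} <{DCTERMS}modified> "{modified.isoformat()}"^^<{XSD}date> .',
    ]
    for gene in genes:
        # hasMember is an illustrative property, not a published term.
        lines.append(f'{subject} <{EX}hasMember> "{gene}" .')
    return lines


triples = provenance_triples(
    "list1",
    "http://example.org/microarray/experiment42",  # placeholder source
    date(2011, 5, 1),                               # placeholder date
    ["APP", "PSEN1"],                               # placeholder gene symbols
)
print("\n".join(triples))
```

Because the provenance statements share a subject with the membership statements, a single query can filter gene lists by source or modification date before joining them with other data.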

S3QL: A distributed domain specific language for controlled semantic integration of life sciences data

Background
The value and usefulness of data increase when it is explicitly interlinked with related data. This is the core principle of Linked Data. For life sciences researchers, harnessing the power of Linked Data to improve biological discovery is still challenged by a need to keep pace with rapidly evolving domains and requirements for collaboration and control as well as with the reference semantic web ontologies and standards. Knowledge organization systems (KOSs) can provide an abstraction for publishing biological discoveries as Linked Data without complicating transactions with contextual minutiae such as provenance and access control.

We have previously described the Simple Sloppy Semantic Database (S3DB) as an efficient model for creating knowledge organization systems using Linked Data best practices with explicit distinction between domain and instantiation and support for a permission control mechanism that automatically migrates between the two. In this report we present a domain specific language, the S3DB query language (S3QL), to operate on its underlying core model and facilitate management of Linked Data.

Results
Reflecting the data driven nature of our approach, S3QL has been implemented as an application programming interface for S3DB systems hosting biomedical data, and its syntax was subsequently generalized beyond the S3DB core model. This achievement is illustrated with the assembly of an S3QL query to manage entities from the Simple Knowledge Organization System. The illustrative use cases include gastrointestinal clinical trials, genomic characterization of cancer by The Cancer Genome Atlas (TCGA) and molecular epidemiology of infectious diseases.

Conclusions
S3QL was found to provide a convenient mechanism to represent context for interoperation between public and private datasets hosted at biomedical research institutions and linked data formalisms.

AGUIA: autonomous graphical user interface assembly for clinical trials semantic data services

Background
AGUIA is a front-end web application originally developed to manage clinical, demographic and biomolecular patient data collected during clinical trials at MD Anderson Cancer Center. The diversity of methods involved in patient screening and sample processing generates a variety of data types that require a resource-oriented architecture to capture the associations between the heterogeneous data elements. AGUIA uses a semantic web formalism, resource description framework (RDF), and a bottom-up design of knowledge bases that employ the S3DB tool as the starting point for the client's interface assembly.

Methods
The data web service, S3DB, meets the necessary requirements of generating the RDF and of explicitly distinguishing the description of the domain from its instantiation, while allowing for continuous editing of both. Furthermore, it uses an HTTP-REST protocol, has a SPARQL endpoint, and has open source availability in the public domain, which facilitates the development and dissemination of this application. However, S3DB alone does not address the issue of representing content in a form that makes sense for domain experts.

Results
We identified an autonomous set of descriptors, the GBox, that provides user and domain specifications for the graphical user interface. This was achieved by identifying a formalism that makes use of an RDF schema to enable the automatic assembly of graphical user interfaces in a meaningful manner while using only resources native to the client web browser (JavaScript interpreter, document object model). We defined a generalized RDF model such that changes in the graphic descriptors are automatically and immediately (locally) reflected into the configuration of the client's interface application.

Exposing the cancer genome atlas as a SPARQL endpoint.

The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to characterize several types of cancer. Datasets from biomedical domains such as TCGA present a particularly challenging task for those interested in dynamically aggregating its results because the data sources are typically both heterogeneous and distributed. The Linked Data best practices offer a solution to integrate and discover data with those characteristics, namely through exposure of data as Web services supporting SPARQL, the Resource Description Framework query language. Most SPARQL endpoints, however, cannot easily be queried by data experts. Furthermore, exposing experimental data as SPARQL endpoints remains a challenging task because, in most cases, data must first be converted to Resource Description Framework triples. In line with those requirements, we have developed an infrastructure to expose clinical, demographic and molecular data elements generated by TCGA as a SPARQL endpoint by assigning elements to entities of the Simple Sloppy Semantic Database (S3DB) management model. All components of the infrastructure are available as independent Representational State Transfer (REST) Web services to encourage reusability, and a simple interface was developed to automatically assemble SPARQL queries by navigating a representation of the TCGA domain. A key feature of the proposed solution that greatly facilitates assembly of SPARQL queries is the distinction between the TCGA domain descriptors and data elements. Furthermore, the use of the S3DB management model as a mediator enables queries to both public and protected data without the need for prior submission to a single data source.
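
The automatic assembly of SPARQL queries from domain descriptors can be sketched roughly as follows. This is an illustration under stated assumptions, not the interface described in the paper: the endpoint address, class URI, property URI and function names are all hypothetical. The only standard piece relied on is the SPARQL Protocol convention of passing the query text in a `query` parameter of a GET request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select(class_uri, properties, limit=10):
    """Assemble a SELECT query from a chosen class and its chosen properties.

    `properties` maps a variable name to a property URI; both play the role
    of domain descriptors picked by the user while browsing the domain.
    """
    variables = ["?item"] + [f"?{name}" for name in properties]
    patterns = [f"?item a <{class_uri}> ."]
    for name, prop_uri in properties.items():
        patterns.append(f"?item <{prop_uri}> ?{name} .")
    return "SELECT {} WHERE {{ {} }} LIMIT {}".format(
        " ".join(variables), " ".join(patterns), limit
    )


def query_url(endpoint, query):
    """Encode the query as a SPARQL Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


query = build_select(
    "http://example.org/tcga/Patient",            # hypothetical class URI
    {"age": "http://example.org/tcga/ageAtDiagnosis"},  # hypothetical property
)
url = query_url("http://example.org/sparql", query)  # hypothetical endpoint
print(query)
print(url)
```

Keeping the descriptors (class and properties) separate from the data elements is what lets the query text be generated mechanically: the user selects descriptors, and the data elements only appear as variable bindings in the result.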

DASMiner: discovering and integrating data from DAS sources

Background
DAS is a widely adopted protocol for providing syntactic interoperability among biological databases. The popularity of DAS is due to a simplified and elegant mechanism for data exchange that consists of sources exposing their RESTful interfaces for data access. As a growing number of DAS services are available for molecular biology resources, there is an incentive to explore this protocol in order to advance data discovery and integration among these resources.

Results
We developed DASMiner, a Matlab toolkit for querying DAS data sources that enables creation of integrated biological models using the information available in DAS-compliant repositories. DASMiner is composed of a browser application and an API that work together to facilitate gathering of data from different DAS sources, which can be used for creating enriched datasets from multiple sources.

The browser is used to formulate queries and navigate data contained in DAS sources. Users can execute queries against these sources in an intuitive fashion, without needing to know the specific DAS syntax for the particular source. Using the source's metadata provided by the DAS Registry, the browser's layout adapts to expose only the set of commands and coordinate systems supported by the specific source. For this reason, the browser can interrogate any DAS source, regardless of the type of data being served.

The API component of DASMiner may be used for programmatic access of DAS sources by programs in Matlab. Once the desired data is found during navigation, the query is exported in the format of an API call to be used within any Matlab application. We illustrate the use of DASMiner by creating integrative models of histone modification maps and protein-protein interaction networks. These enriched datasets were built by retrieving and integrating distributed genomic and proteomic DAS sources using the API.
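
For readers unfamiliar with the DAS syntax that the browser hides, the sketch below builds a `features` request URL. It is written in Python rather than Matlab and does not reproduce DASMiner's API; the function name, server address and source name are hypothetical, and the URL layout follows the DAS 1.x convention (`/das/<source>/features?segment=<reference>:<start>,<stop>`, with optional `;type=` filters) as an assumption to be checked against the specification.

```python
def das_features_url(server, source, reference, start, stop, types=()):
    """Build a DAS 1.x style 'features' request URL for one segment."""
    if start > stop:
        raise ValueError("start must not exceed stop")
    url = (
        f"{server.rstrip('/')}/das/{source}/features"
        f"?segment={reference}:{start},{stop}"
    )
    # Optional feature-type filters are appended with ';' separators.
    for feature_type in types:
        url += f";type={feature_type}"
    return url


url = das_features_url(
    "http://example.org",   # hypothetical DAS server
    "demo_source",          # hypothetical data source name
    "1", 1000, 2000,        # chromosome 1, positions 1000 to 2000
    types=["exon"],
)
print(url)
```

A query exported from a browsing session would amount to a call like this one with the arguments filled in, which is why it can be reused unchanged inside a script.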

Conclusion
Support for the DAS protocol allows hundreds of molecular biology databases to be treated as a federated, online collection of resources. DASMiner enables full exploration of these resources, and can be used to deploy applications and create integrated views of biological systems using the information deposited in DAS repositories.
