2012, Lecture Notes in Computer Science
As Open Data becomes commonplace, methods are needed to integrate disparate data from a variety of sources. Although Linked Data design has promise for integrating worldwide data, integrators often struggle to provide appropriate transparency for their sources and transformations. Without this transparency, cautious consumers are unlikely to find enough information to allow them to trust third-party content. While capturing provenance in RPI's Linking Open Government Data project, we faced the common problem that only a portion of the provenance captured is effectively used. Using our water quality portal as an example, we argue that one key to enabling provenance use is a better treatment of provenance granularity. To address this challenge, we have designed an approach that supports deriving abstracted provenance from granular provenance in an open environment. We describe the approach, show how it addresses naturally occurring unmet provenance needs in a family of applications, and describe how it addresses similar problems in open provenance and open data environments.
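To make the idea concrete, here is a minimal sketch (my own illustration, not the project's actual model) of deriving abstracted provenance from granular provenance: each fine-grained step is collapsed into the high-level activity it belongs to, and only dependencies that cross activity boundaries survive in the abstracted view. The steps, activities, and grouping are hypothetical.

```python
# Granular provenance: each step is tagged with the high-level activity
# it belongs to and the steps it depends on. All names are illustrative.
granular = {
    # step: (high-level activity, steps it depends on)
    "fetch_csv":     ("ingest",  []),
    "parse_rows":    ("ingest",  ["fetch_csv"]),
    "drop_invalid":  ("clean",   ["parse_rows"]),
    "convert_units": ("clean",   ["drop_invalid"]),
    "publish_rdf":   ("publish", ["convert_units"]),
}

def abstract_provenance(graph):
    """Collapse granular steps into their activities, keeping only
    dependencies that cross activity boundaries."""
    edges = set()
    for step, (activity, deps) in graph.items():
        for dep in deps:
            dep_activity = graph[dep][0]
            if dep_activity != activity:
                edges.add((dep_activity, activity))
    return sorted(edges)

print(abstract_provenance(granular))
# [('clean', 'publish'), ('ingest', 'clean')]
```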
2011
The third provenance challenge was organized to evaluate the efficacy of the Open Provenance Model (OPM) in representing and sharing provenance, with the goal of improving the specification. A data-loading scientific workflow that ingests data files into a relational database for the Pan-STARRS sky survey project was selected as a candidate for collecting provenance.
2011
Abstract—Linked data has finally arrived. But with the availability and actual usage of linked data, data from different sources is quickly mixed and merged. While there is much fundamental work on the provenance of metadata, and a commonly recognized demand for expressing provenance information, there is still no standard or even best-practice recommendation. In this paper, we summarize our own requirements for metadata provenance, based on experiences at the Mannheim University Library; examine the feasibility of implementing these requirements with currently available (de facto) standards; and propose a way to bridge the remaining gaps. With this paper, we hope to obtain additional feedback, which we will feed back into ongoing discussions within the recently founded DCMI task group on metadata provenance.
Linked Open Data (LOD) is rapidly emerging as a way of publishing and sharing structured data over the Semantic Web, using URIs and RDF, in many application domains such as fisheries, health, environment, education, and agriculture. Since different schemas with the same semantics appear in different datasets of the LOD Cloud, the problem of managing semantic heterogeneity among schemas is growing. Schema-level mapping among the datasets of the LOD Cloud is necessary, since instance-level mapping is not feasible for making knowledge discovery easy and systematic. To correctly interpret query results over an integrated dataset, schema-level mapping provenance is necessary. In this paper, we review existing approaches to linked data provenance representation, storage, and querying, as well as applications of linked data provenance where mapping is at the instance level. The analysis of existing approaches will assist us in revealing open research problems in the area of linked data provenance where mapping is at the schema level. Furthermore, we explain how schema-level mapping provenance in linked data can be used to facilitate data integration and data mining, and to ensure quality and trust in data.
Lecture Notes in Computer Science, 2010
The World Wide Web is evolving into a Web of Data: a huge, globally distributed dataspace that contains a rich body of machine-processable information from a virtually unbounded set of providers covering a wide range of topics. However, due to the openness of the Web, little is known about who created the data and how. The fact that a large amount of the data on the Web is derived by replication, query processing, modification, or merging raises concerns about information quality. Poor-quality data may propagate quickly and contaminate the Web of Data. Provenance information about who created and published the data, and how, provides the means for quality assessment. This paper takes a first step towards creating a quality-aware Web of Data: we present approaches to integrate provenance information into the Web of Data and illustrate how this information can be consumed. In particular, we introduce a vocabulary to describe the provenance of Web data as metadata and discuss possibilities for making such provenance metadata accessible as part of the Web of Data. Furthermore, we describe how this metadata can be queried and consumed to identify outdated information.
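As a rough illustration of the consumption side, the following rdflib sketch attaches provenance metadata to a named graph and flags graphs that predate a freshness cutoff. Dublin Core terms stand in for the paper's own provenance vocabulary, and all URIs, dates, and the cutoff are hypothetical.

```python
# Minimal sketch: publish data in a named graph, describe that graph's
# provenance in a metadata graph, then consume the metadata to spot
# potentially outdated information.
from datetime import date
from rdflib import Dataset, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, XSD

EX = Namespace("http://example.org/")
ds = Dataset()

# Publish a statement in its own named graph so metadata can refer to it.
g = ds.graph(URIRef("http://example.org/graph/1"))
g.add((EX.Berlin, EX.population, Literal(3400000)))

# Provenance metadata about that graph: who created it, and when.
meta = ds.graph(URIRef("http://example.org/meta"))
meta.add((URIRef("http://example.org/graph/1"), DCTERMS.creator, EX.someAgency))
meta.add((URIRef("http://example.org/graph/1"),
          DCTERMS.created, Literal("2008-01-01", datatype=XSD.date)))

# Consume the metadata: flag graphs created before a freshness cutoff.
cutoff = date(2010, 1, 1)
for graph_uri, created in meta.subject_objects(DCTERMS.created):
    if created.toPython() < cutoff:
        print(f"{graph_uri} may be outdated (created {created})")
```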
2011
Abstract In this article, the authors provide an example workflow, and a simple classification of user questions on the workflow's data products, to combine and interchange contextual metadata through a semantic data model and infrastructure. They also analyze their approach's potential to support enhanced semantic provenance applications.
IEEE Internet Computing, 2000
Editors: M. Brian Blake • mb7@georgetown.edu; Michael N. Huhns • huhns@sc.edu
International Journal of Information Technology and Computer Science, 2015
Tracking the provenance of RDF resources is an important task in Linked Data generating applications, where it plays a central role in documenting both source data and workflow. Various Linked Data generating applications have evolved for converting legacy data to RDF resources, covering bibliographic, geographic, government, publication, and cross-domain data. However, most of them do not support tracking data and workflow provenance for individual RDF resources. In such cases, these applications need to track, store, and disseminate provenance information describing their source data and the operations involved. In this article, we introduce an approach for tracking the provenance of RDF resources. Provenance information is tracked during the conversion process and stored in a triple store; it is then disseminated via provenance URIs. The proposed framework has been analyzed using the Harvard Library Bibliographic Datasets. The evaluation converted legacy data into RDF and Linked Data with provenance. The outcome has been quite promising, in the sense that the framework enables data publishers to generate relevant provenance information with less time and effort.
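A minimal sketch of this style of tracking, with hypothetical URIs and record data, and with PROV-O standing in for whatever vocabulary the framework actually uses:

```python
# Sketch: convert a legacy record to RDF while emitting a provenance
# resource (with its own dereferenceable URI) describing the derivation.
from datetime import datetime, timezone
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import PROV, RDF, XSD

EX = Namespace("http://example.org/")
g = Graph()  # stands in for the triple store

def convert_with_provenance(record_id, title):
    """Emit an RDF resource plus a provenance resource describing
    how and when it was derived from the legacy record."""
    resource = EX[f"book/{record_id}"]
    legacy = EX[f"legacy/{record_id}"]
    prov_uri = EX[f"provenance/{record_id}"]  # provenance URI for dissemination

    g.add((resource, EX.title, Literal(title)))

    g.add((prov_uri, RDF.type, PROV.Activity))
    g.add((resource, PROV.wasDerivedFrom, legacy))
    g.add((resource, PROV.wasGeneratedBy, prov_uri))
    g.add((prov_uri, PROV.endedAtTime,
           Literal(datetime.now(timezone.utc).isoformat(),
                   datatype=XSD.dateTime)))
    return resource

convert_with_provenance("0001", "A Catalogue of Birds")
print(g.serialize(format="turtle"))
```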
Concurrency and Computation: Practice and Experience, 2008
VisTrails is a new workflow and provenance management system that provides support for scientific data exploration and visualization. Whereas workflows have been traditionally used to automate repetitive tasks, for applications that are exploratory in nature, change is the norm. VisTrails uses a new change-based provenance mechanism which was designed to handle rapidly-evolving workflows. It uniformly and automatically captures provenance information for data products and for the evolution of the workflows used to generate these products. In this paper, we describe how the VisTrails provenance data is organized in layers and present a first approach for querying this data that we developed to tackle the Provenance Challenge queries.
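A toy illustration of the change-based idea (my own simplification, not VisTrails' actual data structures): workflow versions form a tree of change actions, and any version is reconstructed by replaying the actions on its path from the root, so exploratory branches share history instead of overwriting it.

```python
# Change-based provenance sketch: store the evolution of a workflow as
# a tree of actions rather than as full workflow snapshots.
actions = {
    # version: (parent version, action applied to the parent)
    1: (None, ("add_module", "ReadCSV")),
    2: (1,    ("add_module", "Plot")),
    3: (2,    ("connect", ("ReadCSV", "Plot"))),
    4: (2,    ("add_module", "Histogram")),  # a branch: exploration, not overwrite
}

def replay(version):
    """Collect the chain of actions that produces a workflow version."""
    chain = []
    while version is not None:
        parent, action = actions[version]
        chain.append(action)
        version = parent
    return list(reversed(chain))

print(replay(3))
# [('add_module', 'ReadCSV'), ('add_module', 'Plot'), ('connect', ('ReadCSV', 'Plot'))]
print(replay(4))  # the sibling branch shares its first two actions with version 3
```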
Lecture Notes in Computer Science, 2013
Assessing the quality of linked data currently published on the Web is a crucial need of various data-intensive applications. Extensive work on similar applications for relational data and queries has shown that data provenance can be used to compute the trustworthiness, reputation, and reliability of query results, based on the source data and query operators involved in their derivation. In particular, abstract provenance models can be employed to record information about source data and query operators during query evaluation, which can later be used, e.g., to assess trust in individual query results. In this paper, we investigate the extent to which relational provenance models can be leveraged for capturing the provenance of SPARQL queries over linked data, and identify their limitations. To overcome these limitations, we advocate the need for new provenance models that capture the full expressive power of SPARQL and can be used to support assessment of various forms of data quality for linked data manipulated declaratively by such queries. (An earlier version of this paper appeared in IEEE Internet Computing 15(1): 31–39, 2011.)
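For a flavor of the problem, the rdflib sketch below exposes a simple form of provenance by binding the named graph that contributed each SPARQL solution. As the paper argues, this kind of source tracking falls well short of SPARQL's full expressive power (e.g., OPTIONAL or UNION); the data and URIs here are made up.

```python
# Query across named graphs and record which source asserted each answer,
# a rudimentary form of why-provenance for SPARQL results.
from rdflib import Dataset, Literal, Namespace

EX = Namespace("http://example.org/")
ds = Dataset()

# Two sources disagree about the same measurement.
ds.graph(EX.sourceA).add((EX.lakeX, EX.pH, Literal(6.1)))
ds.graph(EX.sourceB).add((EX.lakeX, EX.pH, Literal(7.4)))

q = """
PREFIX ex: <http://example.org/>
SELECT ?g ?ph WHERE { GRAPH ?g { ex:lakeX ex:pH ?ph } }
"""
for row in ds.query(q):
    print(f"pH {row.ph} asserted by {row.g}")
```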
2010
Data provenance graphs are a form of metadata that can be used to establish a variety of properties of data products that undergo sequences of transformations, typically specified as workflows. Their usefulness for answering user provenance queries is limited, however, unless the graphs are enhanced with domain-specific annotations. In this paper we propose a model and architecture for semantic, domain-aware provenance, and demonstrate its usefulness in answering typical user queries.
International Journal of Computer Applications, 2012
The Linked Open Data project has changed the landscape of data publishing by allowing publishers to publish data of any kind as linked data and to share and reuse it alongside other data. In Open Data, each piece of data is easily accessible and machine-processable, so one can navigate an endless web of data sources. At present, however, Linked Data still often suffers from problems of trust, quality, and privacy. It is necessary to provide provenance access mechanisms that express the diverse characteristics of a dataset. Given the huge volume of data sources, it is difficult to find trusted data and to determine what a given dataset is meant for. Data provenance in the Web of Data enables publishers and consumers of Linked Data to assess the quality and trustworthiness of the data. Several techniques have emerged to represent and describe provenance metadata for linked datasets. In this paper we appraise the different techniques in this field, mostly in terms of the representation, storage, and generation of provenance information for Linked Data. In addition, we illustrate, evaluate, and identify contemporary research challenges in this field.
Future Generation …, 2010
2011
Abstract Reproducibility has been a cornerstone of the scientific method for hundreds of years. The range of sources from which data now originates, the diversity of the individual manipulations performed, and the complexity of the orchestrations of these operations all limit the reproducibility that a scientist can ensure solely by manually recording their actions.
2015
The Open Provenance Model (OPM) is a community data model for provenance, designed to facilitate the meaningful interchange of provenance information between systems. Underpinning OPM is a notion of directed graph, used to represent the data products and processes involved in past computations and the dependencies between them; it is complemented by inference rules allowing new dependencies to be derived. The Open Provenance Model was designed from requirements captured in two "Provenance Challenges" and tested during the third; these challenges were international, multi-disciplinary activities aiming to exchange provenance information between multiple systems and query it. The design of OPM was mostly driven by practical and pragmatic considerations. The purpose of this paper is to formalize the theory underpinning this data model. Specifically, the paper proposes a temporal semantics for OPM graphs, defined in terms of a set of ordering constraints between time-points associated with OPM constructs. OPM inferences are characterized with respect to this temporal semantics, and a novel set of patterns is introduced to establish soundness and completeness properties. Building on this foundation, the paper proposes new definitions for graph algebraic operations, graph refinement, and the notion of account, by which multiple descriptions of the same execution are allowed to co-exist in the same graph. Overall, this paper provides a strong theoretical underpinning to a data model being adopted by a community of users, helping its disambiguation and promoting interoperability.
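The following sketch illustrates, in simplified form, the kind of inference OPM's rules license (my own simplification, not the paper's formal temporal semantics): multi-step dependencies are derived as the transitive closure of direct wasDerivedFrom edges. Artifact names are hypothetical.

```python
# Direct "was derived from" edges between artifacts in an OPM-style graph.
derived_from = {
    "report":  {"table"},
    "table":   {"raw_csv"},
    "raw_csv": set(),
}

def all_ancestors(artifact, edges):
    """Transitive closure of wasDerivedFrom for one artifact: every
    artifact it depends on, directly or through intermediate steps."""
    seen, frontier = set(), set(edges[artifact])
    while frontier:
        node = frontier.pop()
        if node not in seen:
            seen.add(node)
            frontier |= edges[node]
    return seen

print(all_ancestors("report", derived_from))  # {'table', 'raw_csv'}
```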
2010
This paper describes the foundations of a framework for constructing interoperable semantic applications that support the recording of provenance information. The framework uses a client-server infrastructure to control the encoding of applications. Provenance records for application components, settings, and data sources are stored as part of the final application file using the Open Provenance Model (OPM) [1].
Journal of Database Management, 2015
Existing provenance systems operate at a single layer of abstraction (workflow/process/OS) at which they record and store provenance. However, provenance captured at different layers provides the greatest benefit when integrated through a unified provenance framework. To build such a framework, the first step is a comprehensive provenance model able to represent the provenance of data objects with various semantics and granularity. In this paper, the authors propose a provenance model able to represent the provenance of any data object captured at any abstraction layer, and present an abstract schema of the model. The expressive nature of the model enables a wide range of provenance queries. The authors also illustrate the utility of their model in real-world data processing systems. They also introduce a distributed data provenance middleware system composed of several components and services that capture provenance according to their model and securely ...
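A hypothetical sketch of such a unified record, with illustrative field names rather than the authors' actual schema: because every layer shares one model, a single query can span workflow-level and OS-level provenance for the same data object.

```python
# One record type describes provenance captured at any abstraction layer.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProvenanceRecord:
    subject: str      # the data object described
    layer: str        # "workflow", "process", or "os"
    granularity: str  # e.g. "file", "tuple", "byte-range"
    operation: str    # what happened to the object
    inputs: List[str] = field(default_factory=list)

records = [
    ProvenanceRecord("results.csv", "workflow", "file", "generated",
                     inputs=["model.step3"]),
    ProvenanceRecord("results.csv", "os", "byte-range", "written",
                     inputs=["pid:4242"]),
]

# A single query spans both layers for the same object.
print([r.layer for r in records if r.subject == "results.csv"])
```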
International Journal of Scientific and Engineering Research
The provenance of data items plays a pivotal role in the reuse and integration of data from diverse sources. Determining trust and authenticity is essential to verify the various data products available on the Web. Over the past few years, data publication in the Linking Open Data (LOD) cloud has been growing rapidly. In the absence of metadata or provenance, data on the Web suffers from problems of trust and quality. Many standard approaches have been proposed for converting legacy data to Linked Data; however, these approaches still lack a reliable method for tracking the provenance of data items. Knowing the existing approaches is essential for a better understanding of how provenance is tracked and stored. This article reviews the most prominent approaches to tracking provenance during the generation of Linked Data from legacy data systems and presents them concisely, analyzing them against one another.
ACM Transactions on Software …, 2009