2009
Most application provenance systems are hard-coded for a particular type of system or data, while current provenance file systems maintain in-memory provenance graphs and reside in kernel space, leading to complex and constrained implementations. Story Book resides in user space and treats provenance events as a generic event log, leading to a simple, flexible, and easily optimized system.
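As a rough illustration of treating provenance as a generic event log kept in user space, the following Python sketch appends provenance events to an ordinary JSON-lines file and answers a history query by scanning that log. The event fields (`subject`, `action`, `source`) and the class names are hypothetical; they are not Story Book's actual record format or API.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Iterator, List, Optional


@dataclass
class ProvenanceEvent:
    # Hypothetical schema: which object changed, how, and from what source.
    subject: str
    action: str
    source: Optional[str] = None


class ProvenanceLog:
    """Append-only provenance log stored as JSON lines in a regular file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, event: ProvenanceEvent) -> None:
        # One event per line; appending never rewrites earlier history.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(event)) + "\n")

    def events(self) -> Iterator[ProvenanceEvent]:
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                yield ProvenanceEvent(**json.loads(line))

    def history(self, subject: str) -> List[ProvenanceEvent]:
        # A query is a scan over the log; indexes could be layered on later.
        return [e for e in self.events() if e.subject == subject]


if __name__ == "__main__":
    log = ProvenanceLog(os.path.join(tempfile.mkdtemp(), "provenance.log"))
    log.append(ProvenanceEvent("report.pdf", "create"))
    log.append(ProvenanceEvent("report.pdf", "derive", source="data.csv"))
    log.append(ProvenanceEvent("notes.txt", "create"))
    for e in log.history("report.pdf"):
        print(e.action, e.source)
```

Because the log is just a file written from user space, it can be optimized (batched, indexed, compressed) without touching the kernel, which is the flexibility the abstract points to.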
University of Southampton, …, 2006
Journal of Database Management, 2015
Existing provenance systems operate at a single layer of abstraction (workflow/process/OS) at which they record and store provenance. However, the provenance captured from different layers provides the highest benefit when integrated through a unified provenance framework. To build such a framework, a comprehensive provenance model able to represent the provenance of data objects with various semantics and granularity is the first step. In this paper, the authors propose a provenance model able to represent the provenance of any data object captured at any abstraction layer and present an abstract schema of the model. The expressive nature of the model enables a wide range of provenance queries. The authors also illustrate the utility of their model in real world data processing systems. In the paper, they also introduce a data provenance distributed middleware system composed of several different components and services that capture provenance according to their model and securely ...
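The abstract does not reproduce the authors' schema, so the following Python sketch is only one hypothetical way to represent provenance records captured at different abstraction layers and to link a coarse record (a workflow step) to the finer records (process and OS activity) that detail it. The `Layer` values, field names, and the `refines` link are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Layer(Enum):
    WORKFLOW = "workflow"
    PROCESS = "process"
    OS = "os"


@dataclass(frozen=True)
class ProvenanceRecord:
    record_id: int
    object_id: str                  # the data object being described
    layer: Layer                    # abstraction layer that captured the record
    activity: str                   # workflow step, function call, syscall, ...
    inputs: Tuple[str, ...] = ()    # objects this object was derived from
    refines: Optional[int] = None   # coarser-layer record that this one details


def by_layer(records: List[ProvenanceRecord], object_id: str) -> Dict[Layer, List[str]]:
    # Collect everything known about one object, grouped by capturing layer.
    grouped: Dict[Layer, List[str]] = {}
    for r in records:
        if r.object_id == object_id:
            grouped.setdefault(r.layer, []).append(r.activity)
    return grouped


def drill_down(records: List[ProvenanceRecord], record_id: int) -> List[int]:
    # Follow `refines` links to finer-grained records (assumes they form a tree).
    result: List[int] = []
    frontier = [record_id]
    while frontier:
        parent = frontier.pop()
        children = [r.record_id for r in records if r.refines == parent]
        result.extend(children)
        frontier.extend(children)
    return sorted(result)


if __name__ == "__main__":
    records = [
        ProvenanceRecord(1, "out.fits", Layer.WORKFLOW, "step calibrate", ("raw.fits",)),
        ProvenanceRecord(2, "out.fits", Layer.PROCESS, "calibrate.py main()", ("raw.fits",), refines=1),
        ProvenanceRecord(3, "out.fits", Layer.OS, "write(out.fits)", refines=2),
    ]
    print(by_layer(records, "out.fits"))
    print(drill_down(records, 1))  # [2, 3]
```

Keeping the layer as an explicit field, rather than using a separate store per layer, is what lets a single query span workflow, process, and OS provenance.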
The provenance of a file represents the origin and history of the file data. A Distributed Provenance Aware Storage System (DPASS) tracks the provenance of files in a distributed file system. The provenance information can be used to identify potential dependencies between files in a file system. Some applications of provenance tracking include (i) tracking the transformations applied to process raw data in scientific communities and (ii) intrusion detection and forensic analysis of computer systems. In this report we present the design and implementation of a provenance aware storage system, which efficiently stores and retrieves provenance information for files in a distributed file system, while incurring minimal space and time overheads.
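To show how provenance records reveal dependencies between files, the following Python sketch keeps a single-node, in-memory graph of "output was derived from inputs" records and walks it in both directions: upstream for a file's origins, downstream for the files a compromised input may have affected. It does not model DPASS's distributed storage layout; the class and method names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set


class ProvenanceGraph:
    """In-memory record of which files each file was derived from."""

    def __init__(self) -> None:
        self._parents: Dict[str, Set[str]] = defaultdict(set)
        self._children: Dict[str, Set[str]] = defaultdict(set)

    def record(self, output: str, inputs: Iterable[str]) -> None:
        # One provenance record: `output` was produced by reading `inputs`.
        for src in inputs:
            self._parents[output].add(src)
            self._children[src].add(output)

    def ancestors(self, path: str) -> Set[str]:
        # Every file `path` may depend on, directly or transitively.
        return self._walk(path, self._parents)

    def descendants(self, path: str) -> Set[str]:
        # Every file that may have been affected by `path` (e.g. after an intrusion).
        return self._walk(path, self._children)

    @staticmethod
    def _walk(start: str, edges: Dict[str, Set[str]]) -> Set[str]:
        # Breadth-first traversal; `seen` also protects against cycles.
        seen: Set[str] = set()
        queue = deque(edges.get(start, ()))
        while queue:
            node = queue.popleft()
            if node not in seen:
                seen.add(node)
                queue.extend(edges.get(node, ()))
        return seen


if __name__ == "__main__":
    g = ProvenanceGraph()
    g.record("clean.csv", ["raw.csv"])
    g.record("plot.png", ["clean.csv", "style.cfg"])
    print(sorted(g.ancestors("plot.png")))    # ['clean.csv', 'raw.csv', 'style.cfg']
    print(sorted(g.descendants("raw.csv")))   # ['clean.csv', 'plot.png']
```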
2008
The concept of provenance is already well understood in the study of fine art, where it refers to the trusted, documented history of some work of art. Given that documented history, the object attains an authority that allows scholars to understand and appreciate its importance and context relative to other works of art. This same concept of provenance may also be applied to data and information generated within a computer system, particularly when the information is subject to regulatory control over an extended period of time.
ACM Transactions on Software …, 2009
Provenance is the metadata that describes the history of objects. Provenance provides new functionality in a variety of areas, including experimental documentation, debugging, search, and security. As a result, a number of groups have built systems to capture provenance. Most of these systems focus on provenance collection, and a few focus on building applications that use provenance, but all of them ignore an important aspect: efficient long-term storage of provenance. In this article, we first analyze the provenance collected from multiple workloads and characterize the properties of provenance with respect to long-term storage. We then propose a hybrid scheme that takes advantage of the graph structure of provenance data and the inherent duplication in provenance data. Our evaluation indicates that our hybrid scheme, a combination of Web graph compression (adapted for provenance) and dictionary encoding, provides the best trade-off in terms of compression ratio, compression time, and query performance when compared to other compression schemes.
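The two ingredients of the hybrid scheme can be illustrated in miniature. The following Python sketch dictionary-encodes repeated provenance strings and compresses a node's sorted adjacency list with gap (delta) encoding plus variable-byte integers. This is a simplified stand-in: it uses only the gap-encoding idea from Web graph compression, not the full scheme (reference copying, interval and instantaneous codes) or the authors' provenance-specific adaptation, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    # Replace each repeated string (e.g. a program path) with a small integer ID.
    table: Dict[str, int] = {}
    codes = []
    for v in values:
        if v not in table:
            table[v] = len(table)
        codes.append(table[v])
    return list(table), codes  # dicts keep insertion order, i.e. ID order


def encode_varint(n: int) -> bytes:
    # 7 payload bits per byte; the high bit means "more bytes follow".
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def compress_adjacency(neighbors: List[int]) -> bytes:
    # Sort node IDs and store gaps; nearby IDs give small gaps and short codes.
    out = bytearray()
    prev = 0
    for node in sorted(neighbors):
        out += encode_varint(node - prev)
        prev = node
    return bytes(out)


def decompress_adjacency(data: bytes) -> List[int]:
    nodes: List[int] = []
    value, shift, prev = 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += value          # undo the gap encoding
            nodes.append(prev)
            value, shift = 0, 0
    return nodes


if __name__ == "__main__":
    print(dictionary_encode(["/bin/sort", "/bin/cat", "/bin/sort"]))
    packed = compress_adjacency([1000, 1003, 1001])
    print(len(packed), decompress_adjacency(packed))  # 4 [1000, 1001, 1003]
```

Three neighbor IDs that would take 12 bytes as 32-bit integers fit in 4 bytes here, because only the first ID is large and the remaining gaps (1 and 2) need a single byte each.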
2015
We present a formalism for provenance in distributed systems based on the pi-calculus. Its main feature is that all data products are annotated with metadata representing their provenance. The calculus is given a provenance tracking semantics, which ensures that data provenance is updated as the computation proceeds. The calculus also enjoys a pattern-restricted input primitive which allows processes to decide what data to receive and what branch of computation to proceed with based on the provenance information of data. We give examples to illustrate the use of the calculus and discuss some of the semantic properties of our provenance notion. We conclude by reviewing related work and discussing directions for future research.
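The two mechanisms the abstract names, provenance-tracking communication and pattern-restricted input, can be mimicked in ordinary code. The following Python sketch is a loose executable analogue, not the calculus's formal semantics: every value carries a tuple of `(principal, action, channel)` events that grows on each send and receive, and a receiver accepts only a message whose provenance satisfies a pattern. The event format and the names `Annotated`, `Channel`, and `sent_by` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A provenance record is a sequence of (principal, action, channel) events.
Event = Tuple[str, str, str]
Pattern = Callable[[Tuple[Event, ...]], bool]


@dataclass(frozen=True)
class Annotated:
    value: object
    provenance: Tuple[Event, ...] = ()


class Channel:
    def __init__(self, name: str) -> None:
        self.name = name
        self.queue: List[Annotated] = []

    def send(self, sender: str, item: Annotated) -> None:
        # Provenance tracking: every output extends the value's history.
        event = (sender, "sent", self.name)
        self.queue.append(Annotated(item.value, item.provenance + (event,)))

    def receive(self, receiver: str, pattern: Pattern) -> Optional[Annotated]:
        # Pattern-restricted input: take the first message whose provenance matches.
        for i, item in enumerate(self.queue):
            if pattern(item.provenance):
                del self.queue[i]
                event = (receiver, "received", self.name)
                return Annotated(item.value, item.provenance + (event,))
        return None  # nothing acceptable yet


def sent_by(principal: str) -> Pattern:
    # Example pattern: the most recent event is an output by `principal`.
    return lambda prov: bool(prov) and prov[-1][:2] == (principal, "sent")


if __name__ == "__main__":
    c = Channel("c")
    c.send("mallory", Annotated(1))
    c.send("alice", Annotated(2))
    msg = c.receive("bob", sent_by("alice"))
    print(msg.value, msg.provenance)
```

Here Bob skips Mallory's earlier message and receives Alice's, whose provenance then records both the send on `c` and Bob's receipt.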
2009
In this demonstration we present the Perm provenance management system (PMS). Perm is capable of computing, storing and querying provenance information for the relational data model. Provenance is computed by using query rewriting techniques to annotate tuples with provenance information. Thus, provenance data and provenance computations are represented as relational data and queries and, hence, can be queried, stored and optimized using standard relational database techniques. This demo shows the complete Perm system and lets attendees examine in detail the process of query rewriting and provenance retrieval in Perm, the most complete data provenance system available today. For example, Perm supports lazy and eager provenance computation, external provenance and various contribution semantics for an almost complete subset of SQL.
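To make provenance computation by query rewriting concrete, the following Python sketch uses the standard `sqlite3` module to run an aggregation query and a hand-written rewrite of it that pairs each result tuple with the source tuples that contributed to it. Perm produces its rewrites automatically; the `emp` table, the `prov_*` column names, and this particular rewrite (joining the aggregate back to its input on the group-by attribute) are illustrative assumptions rather than Perm's exact output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER);
    INSERT INTO emp VALUES (1, 'db', 100), (2, 'db', 150), (3, 'os', 120);
""")

# Original query: total salary per department.
original = "SELECT dept, SUM(salary) AS total FROM emp GROUP BY dept"

# Rewritten query: each result tuple is repeated once per contributing input
# tuple, with that tuple's columns attached as extra prov_* attributes.
# `IS` is SQLite's NULL-safe equality, so a NULL group still finds its inputs.
rewritten = f"""
    SELECT agg.dept, agg.total,
           e.id AS prov_emp_id, e.dept AS prov_emp_dept, e.salary AS prov_emp_salary
    FROM ({original}) AS agg
    JOIN emp AS e ON e.dept IS agg.dept
    ORDER BY agg.dept, e.id
"""

for row in conn.execute(rewritten):
    print(row)
```

Because the rewritten query is ordinary SQL, its result can be materialized, filtered, or optimized like any other query, which is the property the abstract emphasizes.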
In many application areas, such as scientific computing, data-warehousing, and data integration, detailed information about the origin of data is required. This kind of information is often referred to as data provenance. The provenance of a piece of data, a so-called data item, includes information about the source data from which it is derived and the transformations that lead to its creation and current representation. In the context of relational databases, provenance has been studied from both a theoretical and an algorithmic perspective. Yet, in spite of these advances, very few practical systems support generating, querying and storing provenance information (we refer to such systems as provenance management systems, or PMS). These systems support only a subset of SQL, a severe limitation in practice, since most of the application domains that benefit from provenance information use complex queries. Such queries typically involve nested sub-queries, aggregat...
Future Generation Computer Systems, 2011
The third provenance challenge was organized to evaluate the efficacy of the Open Provenance Model (OPM) in representing and sharing provenance, with the goal of improving the specification. A data loading scientific workflow that ingests data files into a relational database for the Pan-STARRS sky survey project was selected as a candidate for collecting provenance. Challenge participants record provenance, run queries over it, and export/import provenance as OPM documents with other teams to verify interoperability. Fourteen teams participated in the challenge, which concluded at a workshop in June 2009 in Amsterdam. The experiences of several participating teams are included in this special issue. In this editorial, we describe the challenge in detail, review its outcome, and introduce the articles included in this special issue.
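For readers unfamiliar with OPM, the following Python sketch models its core: three node kinds (artifacts, processes, agents) and five causal edge types, each pointing from effect to cause, with a type check on edge endpoints and a query that finds every artifact a given artifact transitively depends on. OPM's roles, accounts, and timestamps are omitted, and the node names mimic a data-loading workflow only hypothetically; they are not taken from the Pan-STARRS workflow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# OPM causal edges, written effect -> cause, with the node kinds they connect.
EDGE_KINDS = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}


@dataclass
class OPMGraph:
    nodes: Dict[str, str] = field(default_factory=dict)              # id -> kind
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (effect, kind, cause)

    def add_node(self, node_id: str, kind: str) -> None:
        self.nodes[node_id] = kind

    def add_edge(self, effect: str, kind: str, cause: str) -> None:
        # Reject edges whose endpoints have the wrong node kinds.
        expected = EDGE_KINDS[kind]
        actual = (self.nodes[effect], self.nodes[cause])
        if actual != expected:
            raise ValueError(f"{kind} expects {expected}, got {actual}")
        self.edges.append((effect, kind, cause))

    def upstream_artifacts(self, artifact: str) -> Set[str]:
        # Follow causal edges (but not agent control) back from `artifact`.
        followed = {"used", "wasGeneratedBy", "wasDerivedFrom", "wasTriggeredBy"}
        visited: Set[str] = set()
        stack = [artifact]
        while stack:
            node = stack.pop()
            for effect, kind, cause in self.edges:
                if effect == node and kind in followed and cause not in visited:
                    visited.add(cause)
                    stack.append(cause)
        return {n for n in visited if self.nodes[n] == "artifact"}


if __name__ == "__main__":
    g = OPMGraph()
    for node, kind in [("csv", "artifact"), ("load", "process"), ("staging", "artifact"),
                       ("merge", "process"), ("table", "artifact"), ("operator", "agent")]:
        g.add_node(node, kind)
    g.add_edge("load", "used", "csv")
    g.add_edge("staging", "wasGeneratedBy", "load")
    g.add_edge("merge", "used", "staging")
    g.add_edge("table", "wasGeneratedBy", "merge")
    g.add_edge("load", "wasControlledBy", "operator")
    print(sorted(g.upstream_artifacts("table")))  # ['csv', 'staging']
```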
Egyptian Computer Science Journal, 2000
This document covers the logical and process architectures of provenance systems. The logical architecture identifies key roles and their interactions, whereas the process architecture discusses distribution and security. A fundamental aspect of our presentation is its technology-independent nature, which makes it reusable: the principles that are exposed in this document may be applied to different technologies.
Lecture Notes in Computer Science, 2012
In this paper, we propose a provenance model able to represent the provenance of any data object captured at any abstraction layer (workflow/process/OS) and present an abstract schema of the model. The expressive nature of the model makes it suitable for use in real-world data processing systems.
Future Generation …, 2010
International Journal of Computers and Applications, 2018
Distributed computing infrastructure such as cloud computing has become an essential part of the computing landscape over the past years. It is rapidly being adopted by many organizations, due partly to its robustness and ease of use. Data provenance was introduced to secure data integrity in cloud computing environments. Current data provenance information systems mainly address the problems and challenges of provenance capture, query, and storage, as well as their security. In this paper, we consider how to manage the volume of provenance by introducing a Lightweight Intuitive Provenance in a cloud computing environment. We introduce an arithmetic coding method to enhance the provenance compression model through a space usage model. To reduce provenance search time in our system, a time efficiency model is applied. Experimental results show that our approach is a feasible mechanism for provenance storage resource management in a cloud computing environment.
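Arithmetic coding, the compression technique named in this abstract, can be shown in a few lines. The following Python sketch is a textbook exact-precision coder built on `fractions.Fraction` with a static frequency model: each symbol narrows an interval in [0, 1), and any number inside the final interval identifies the whole sequence. It is not the paper's space usage model; applying it to provenance edge labels is an assumption for illustration, and production coders use fixed-precision integer arithmetic with renormalization instead of exact fractions.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple

Model = Dict[str, Tuple[Fraction, Fraction]]  # symbol -> [cum_low, cum_high)


def build_model(symbols: Sequence[str]) -> Model:
    # Static model: each symbol gets a slice of [0, 1) proportional to its count.
    counts = Counter(symbols)
    total = sum(counts.values())
    model: Model = {}
    low = Fraction(0)
    for sym in sorted(counts):
        high = low + Fraction(counts[sym], total)
        model[sym] = (low, high)
        low = high
    return model


def encode(symbols: Sequence[str], model: Model) -> Fraction:
    # Narrow [low, high) once per symbol; frequent symbols shrink it less.
    low, high = Fraction(0), Fraction(1)
    for sym in symbols:
        width = high - low
        s_low, s_high = model[sym]
        low, high = low + width * s_low, low + width * s_high
    return low  # `low` always lies inside the final interval


def decode(code: Fraction, length: int, model: Model) -> List[str]:
    # Replay the narrowing: find which symbol's slice contains the code.
    out: List[str] = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(length):
        width = high - low
        target = (code - low) / width
        for sym, (s_low, s_high) in model.items():
            if s_low <= target < s_high:
                out.append(sym)
                low, high = low + width * s_low, low + width * s_high
                break
    return out


if __name__ == "__main__":
    labels = "used used wasGeneratedBy used wasDerivedFrom".split()
    model = build_model(labels)
    code = encode(labels, model)
    print(code, decode(code, len(labels), model) == labels)
```

The final interval's width equals the product of the symbol probabilities, so a bit-level coder needs roughly -log2(high - low) bits plus a small constant; skewed distributions, such as the repetitive edge labels typical of provenance, therefore compress well.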
Computer Standards & Interfaces, 2013
In this paper we propose an efficient and scalable storage model and lookup scheme for provenance logs. The proposed system exploits the loosely coupled structure of provenance logs by separating metadata from the generating process, which allows it to manage large datasets with good scalability. In addition, the system uses a trie-based lookup table to greatly improve provenance data lookup time. Performance results on thousands of graph logs show that our prototype implementation can handle these logs effectively without over-utilizing resources, leading to good scalability.
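A trie-based lookup table can be sketched briefly. The following Python example maps provenance keys, assumed here to be hierarchical strings such as `run1/a.dat`, to the byte offsets of matching records in a log file, and supports both exact and prefix lookups. The key scheme, the offsets, and the class names are hypothetical; the abstract does not specify the layout of the authors' table.

```python
from typing import Dict, List, Optional


class TrieNode:
    __slots__ = ("children", "offsets")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.offsets: List[int] = []  # positions of matching records in the log


class ProvenanceIndex:
    """Character trie from provenance keys to log record offsets."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, key: str, offset: int) -> None:
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.offsets.append(offset)

    def lookup(self, key: str) -> List[int]:
        # Exact match costs O(len(key)), independent of the number of records.
        node = self._find(key)
        return list(node.offsets) if node else []

    def prefix_lookup(self, prefix: str) -> List[int]:
        # All records whose key starts with `prefix`, e.g. every file of one run.
        node = self._find(prefix)
        if node is None:
            return []
        result: List[int] = []
        stack = [node]
        while stack:
            n = stack.pop()
            result.extend(n.offsets)
            stack.extend(n.children.values())
        return sorted(result)

    def _find(self, key: str) -> Optional[TrieNode]:
        node: Optional[TrieNode] = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node


if __name__ == "__main__":
    idx = ProvenanceIndex()
    idx.insert("run1/a.dat", 0)
    idx.insert("run1/b.dat", 120)
    idx.insert("run2/a.dat", 260)
    print(idx.lookup("run1/b.dat"))      # [120]
    print(idx.prefix_lookup("run1/"))    # [0, 120]
```

Keys that share prefixes share trie nodes, so records from the same run or directory are grouped naturally, and a prefix query visits only the matching subtree rather than scanning the whole log.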
Concurrency and Computation: Practice and Experience, 2008
Provenance-aware storage systems (PASS) are a new class of storage system treating provenance as a first-class object, providing automatic collection, storage, and management of provenance as well as query capabilities. We developed the first PASS prototype between 2005 and 2006, targeting scientific end users. Prior to undertaking the provenance challenge, we had focused on provenance collection and storage, without much emphasis on a query model or language. The challenge forced us to (quickly) develop a query model and infrastructure implementing this model. We present a brief overview of the PASS prototype and a discussion of the evolution of the query model that we developed for the challenge.
2007
In many application areas, like e-science and data-warehousing, detailed information about the origin of data is required. This kind of information is often referred to as data provenance or data lineage. The provenance of a data item includes information about the processes and source data items that lead to its creation and current representation. The diversity of data representation models and application domains has led to a number of more or less formal definitions of provenance. Most of them are limited to a special application domain, data representation model or data processing facility. Not surprisingly, the associated implementations are also restricted to some application domain and depend on a special data model. In this paper we give a survey of data provenance models and prototypes, present a general categorization scheme for provenance models and use this categorization scheme to study the properties of the existing approaches. This categorization enables us to distinguish between different kinds of provenance information and could lead to a better understanding of provenance in general. Besides the categorization of provenance types, it is important to include the storage, transformation and query requirements for the different kinds of provenance information and application domains in our considerations. The analysis of existing approaches will assist us in revealing open research problems in the area of data provenance.