2008, Concurrency and Computation: Practice and Experience
Provenance-aware storage systems (PASS) are a new class of storage system treating provenance as a first-class object, providing automatic collection, storage, and management of provenance as well as query capabilities. We developed the first PASS prototype between 2005 and 2006, targeting scientific end users. Prior to undertaking the provenance challenge, we had focused on provenance collection and storage, without much emphasis on a query model or language. The challenge forced us to (quickly) develop a query model and infrastructure implementing this model. We present a brief overview of the PASS prototype and a discussion of the evolution of the query model that we developed for the challenge.
2009
Most application provenance systems are hard-coded for a particular type of system or data, while current provenance file systems maintain in-memory provenance graphs and reside in kernel space, leading to complex and constrained implementations. Story Book resides in user space and treats provenance events as a generic event log, leading to a simple, flexible, and easily optimized system.
Concurrency and Computation: Practice and Experience, 2008
VisTrails is a new workflow and provenance management system that provides support for scientific data exploration and visualization. Whereas workflows have been traditionally used to automate repetitive tasks, for applications that are exploratory in nature, change is the norm. VisTrails uses a new change-based provenance mechanism which was designed to handle rapidly-evolving workflows. It uniformly and automatically captures provenance information for data products and for the evolution of the workflows used to generate these products. In this paper, we describe how the VisTrails provenance data is organized in layers and present a first approach for querying this data that we developed to tackle the Provenance Challenge queries.
University of Southampton, …, 2006
Future Generation …, 2010
2011
The third provenance challenge was organized to evaluate the efficacy of the Open Provenance Model (OPM) in representing and sharing provenance with the goal of improving the specification. A data loading scientific workflow that ingests data files into a relational database for the Pan-STARRS sky survey project was selected as a candidate for collecting provenance.
Journal of Database Management, 2015
Existing provenance systems operate at a single layer of abstraction (workflow/process/OS) at which they record and store provenance. However, the provenance captured from different layers provides the highest benefit when integrated through a unified provenance framework. To build such a framework, a comprehensive provenance model able to represent the provenance of data objects with various semantics and granularity is the first step. In this paper, the authors propose a provenance model able to represent the provenance of any data object captured at any abstraction layer and present an abstract schema of the model. The expressive nature of the model enables a wide range of provenance queries. The authors also illustrate the utility of their model in real world data processing systems. In the paper, they also introduce a data provenance distributed middleware system composed of several different components and services that capture provenance according to their model and securely ...
Provenance is the metadata that describes the history of objects. Provenance provides new functionality in a variety of areas, including experimental documentation, debugging, search, and security. As a result, a number of groups have built systems to capture provenance. Most of these systems focus on provenance collection, a few systems focus on building applications that use the provenance, but all of these systems ignore an important aspect: efficient long-term storage of provenance. In this article, we first analyze the provenance collected from multiple workloads and characterize the properties of provenance with respect to long-term storage. We then propose a hybrid scheme that takes advantage of the graph structure of provenance data and the inherent duplication in provenance data. Our evaluation indicates that our hybrid scheme, a combination of Web graph compression (adapted for provenance) and dictionary encoding, provides the best trade-off in terms of compression ratio, compression time, and query performance when compared to other compression schemes.
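The dictionary-encoding half of the hybrid scheme can be illustrated with a minimal sketch. This is not the authors' implementation: the function names and the sample records are hypothetical, and the Web graph compression step is not shown. It only demonstrates how duplicated strings in provenance records can be replaced by small integer codes and recovered losslessly.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Map each distinct string to an integer code, in order of first appearance."""
    codes: Dict[str, int] = {}
    dictionary: List[str] = []
    encoded: List[int] = []
    for value in values:
        if value not in codes:
            # First time this string is seen: assign it the next free code.
            codes[value] = len(dictionary)
            dictionary.append(value)
        encoded.append(codes[value])
    return dictionary, encoded


def dictionary_decode(dictionary: List[str], encoded: List[int]) -> List[str]:
    """Recover the original strings from the dictionary and the code sequence."""
    return [dictionary[code] for code in encoded]


# Hypothetical provenance attribute values with heavy duplication.
records = ["/usr/bin/gcc", "/home/u/a.c", "/usr/bin/gcc", "/home/u/b.c", "/usr/bin/gcc"]
dictionary, encoded = dictionary_encode(records)
```

Each repeated value is stored once in `dictionary`, and `encoded` holds one integer per record, so the saving grows with the amount of duplication in the input.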
In many application areas, such as scientific computing, data warehousing, and data integration, detailed information about the origin of data is required. This kind of information is often referred to as data provenance. The provenance of a piece of data, a so-called data item, includes information about the source data from which it is derived and the transformations that lead to its creation and current representation. In the context of relational databases, provenance has been studied both from a theoretical and algorithmic perspective. Yet, in spite of the advances made, there are very few practical systems available that support generating, querying and storing provenance information (we refer to such systems as provenance management systems, or PMS). These systems support only a subset of SQL, a severe limitation in practice since most of the application domains that benefit from provenance information use complex queries. Such queries typically involve nested sub-queries, aggregat...
The provenance of a file represents the origin and history of the file data. A Distributed Provenance Aware Storage System (DPASS) tracks the provenance of files in a distributed file system. The provenance information can be used to identify potential dependencies between files in a file system. Some applications of provenance tracking include (i) tracking the transformations applied to process raw data in scientific communities and (ii) intrusion detection and forensic analysis of computer systems. In this report we present the design and implementation of a provenance-aware storage system, which efficiently stores and retrieves provenance information for files in a distributed file system, while incurring minimal space and time overheads.
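Identifying potential dependencies between files amounts to a reachability query over recorded provenance. The following is a minimal sketch under stated assumptions, not the DPASS design: provenance is assumed to be available as an in-memory mapping from each file to the files it was directly derived from, and the function name and file names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def ancestors(depends_on: Dict[str, List[str]], target: str) -> Set[str]:
    """Return every file that `target` transitively depends on (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in depends_on.get(node, []):
            if parent not in seen:
                # Record each ancestor once so cycles cannot loop forever.
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical provenance: each key was derived from the files in its list.
provenance = {
    "plot.png": ["result.csv", "plot.py"],
    "result.csv": ["raw.dat", "clean.py"],
}
lineage = ancestors(provenance, "plot.png")
```

Reversing the mapping (from each file to the files derived from it) and running the same traversal gives the forward query used in forensic analysis: everything potentially affected by a given file.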
Concurrency and Computation: Practice and Experience, 2008
Lecture Notes in Computer Science, 2015
ACM Transactions on Software …, 2009