Lecture Notes in Computer Science
18 pages
The provenance of research data is of critical importance to the reproducibility of and trust in scientific results. As research infrastructures provide more amalgamated datasets for researchers and more integrated facilities for processing and publishing data, the capture of provenance in a standard, machine-actionable form becomes especially important. Significant progress has already been made in providing standards and tools for provenance tracking, but the integration of these technologies into research infrastructure remains limited in many scientific domains. Further development and collaboration are required to provide frameworks for provenance capture that can be adopted as widely as possible, facilitating interoperability as well as dataset reuse. In this chapter, we examine the current state of the art for provenance, and the current state of provenance capture in environmental and Earth science research infrastructures in Europe, as surveyed in the course of the ENVRIplus project. We describe a service developed for the upload, dissemination and application of provenance templates that can be used to generate standardised provenance traces from input data in accordance with current best practice and standards. The use of such a service by research infrastructure architects and researchers can expedite both the understanding and use of provenance technologies, and so drive the standard use of provenance capture technologies in future research infrastructure developments.
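The idea of applying a provenance template to input data can be illustrated with a minimal sketch. It is not the service described in the chapter: the template layout, the `var:` prefix convention, the function names `expand` and `to_provn`, and the `ex:` identifiers are all assumptions made for illustration. The output only resembles W3C PROV-N statements in their simplest form.

```python
from typing import Dict, List, Tuple

# Hypothetical template: PROV-style statements whose arguments may be
# placeholders of the form "var:<name>" (an assumed convention).
Statement = Tuple[str, ...]

TEMPLATE: List[Statement] = [
    ("entity", "var:input"),
    ("entity", "var:output"),
    ("activity", "var:process"),
    ("used", "var:process", "var:input"),
    ("wasGeneratedBy", "var:output", "var:process"),
    ("wasDerivedFrom", "var:output", "var:input"),
]


def expand(template: List[Statement], bindings: Dict[str, str]) -> List[Statement]:
    """Replace every "var:<name>" argument with its bound identifier."""
    trace: List[Statement] = []
    for relation, *args in template:
        resolved = []
        for arg in args:
            if arg.startswith("var:"):
                name = arg[len("var:"):]
                if name not in bindings:
                    raise KeyError(f"no binding for template variable {name!r}")
                resolved.append(bindings[name])
            else:
                resolved.append(arg)
        trace.append((relation, *resolved))
    return trace


def to_provn(trace: List[Statement]) -> List[str]:
    """Render each statement as a PROV-N-like line, e.g. used(a, e)."""
    return [f"{relation}({', '.join(args)})" for relation, *args in trace]


if __name__ == "__main__":
    # Illustrative bindings; the identifiers are invented for this example.
    bindings = {
        "input": "ex:raw_observations",
        "output": "ex:cleaned_observations",
        "process": "ex:quality_control",
    }
    for line in to_provn(expand(TEMPLATE, bindings)):
        print(line)
```

Keeping the template separate from the bindings means the same template can be reused for every run of a process, with only the bindings changing per dataset.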
Earth Science Informatics, 2010
Tremendous volumes of data have been captured, archived and analyzed. Sensors, algorithms and processing systems for transforming and analyzing the data are evolving over time. Web portals and services can create transient data sets on demand. Data are transferred from organization to organization with additional transformations at every stage. Provenance in this context refers to the source of data and a record of the process that led to its current state. It encompasses the documentation of a variety of artifacts related to particular data. Provenance is important for understanding and using scientific datasets, and critical for independent confirmation of scientific results. Managing provenance throughout scientific data processing has gained interest lately and there are a variety of approaches. Large-scale scientific datasets consisting of thousands to millions of individual data files and processes offer particular challenges. This paper uses ...
Communicated by: Thomas Narock. Author: C. Tilmes
International Journal of …, 2011
Studies in Computational Intelligence, 2013
… in Science and …, 2008
Concurrency and Computation: Practice and Experience, 2008
Workflows and data pipelines are becoming increasingly valuable to computational and experimental sciences. These automated systems are capable of generating significantly more data within the same amount of time compared to their manual counterparts. Automatically capturing and recording data provenance and annotation as part of these workflows is critical for data management, verification, and dissemination. We have been prototyping a workflow provenance system, targeted at biological workflows, that extends our content management technologies and other open source tools. We applied this prototype to the provenance challenge to demonstrate an end-to-end system that supports dynamic provenance capture, persistent content management, and dynamic searches of both provenance and metadata. We describe our prototype, which extends the Kepler system for the execution environment, the Scientific Annotation Middleware (SAM) content management software for data services, and an existing HTTP-based query protocol. Our implementation offers several unique capabilities and, through the use of standards, is able to provide access to the provenance record with a variety of commonly available client tools.
Geological Society of America Special Papers, 2011
A necessary first step in the preservation of digital scientific data is gathering enough information "about" a scientific outcome or data collection that it can be discovered and used a decade from now as easily as it is reused next week. Data provenance, or the lineage of a collection, can capture how a particular scientific collection was created, when, and by whom. Our goal is to devise tools that automate the collection of provenance so that this task does not fall onto the researcher, and to efficiently store and represent the provenance data that make the data more amenable to long-term preservation. We demonstrate through application to several projects that automated provenance collection can reach the necessary level of provenance, but challenges remain in addressing provenance collection in a non-workflow setting, and in data preservation in cyberinfrastructure architectures.
AGU Fall Meeting Abstracts, 2010
The notion of sharing scientific data has only recently begun to gain ground in science, where data is still considered a private asset. There is growing evidence, however, that the benefits of scientific collaboration through early data sharing during the course of a science project may outweigh the risk of losing exclusive ownership of the data. As exemplar success stories are making the headlines [1], principles of effective information sharing have become the subject of e-science research. In particular, any piece of published data ...
2011
Reproducibility has been a cornerstone of the scientific method for hundreds of years. The range of sources from which data now originate, the diversity of the individual manipulations performed, and the complexity of the orchestrations of these operations all limit the reproducibility that a scientist can ensure solely by manually recording their actions.
Lecture Notes in Computer Science, 2008
NASA and other organizations involved with climate research have captured huge archives of Earth observations. The sensors, spacecraft, and science algorithms for transforming and analyzing the data, and the processing frameworks, are evolving over time. Science Data Processing Systems (SDPSes) should capture, archive, and distribute provenance information for all externally received data and algorithms, and describe all internal processes used for data transformation. This will make the data sets produced by the systems easier to understand, enable independent scientific reproducibility, and ultimately increase the credibility of the scientific research that makes use of those data sets.
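A rough sketch of what recording provenance for each internal processing step might look like follows. It does not describe any actual SDPS: the `ProvenanceLog` and `StepRecord` names, the record fields, and the choice of SHA-256 checksums to identify inputs and outputs are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List


def digest(data: bytes) -> str:
    """Identify a piece of data by its SHA-256 checksum."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class StepRecord:
    """Hypothetical provenance record for one processing step."""
    algorithm: str
    version: str
    inputs: Dict[str, str]  # input name -> checksum of the received data
    output: str             # checksum of the produced data
    started: str            # UTC timestamps in ISO 8601 format
    ended: str


@dataclass
class ProvenanceLog:
    records: List[StepRecord] = field(default_factory=list)

    def run(self, algorithm: str, version: str,
            func: Callable[..., bytes], inputs: Dict[str, bytes]) -> bytes:
        """Run func on the inputs and append a record describing the step."""
        started = datetime.now(timezone.utc).isoformat()
        result = func(**inputs)
        ended = datetime.now(timezone.utc).isoformat()
        self.records.append(StepRecord(
            algorithm=algorithm,
            version=version,
            inputs={name: digest(data) for name, data in inputs.items()},
            output=digest(result),
            started=started,
            ended=ended,
        ))
        return result


if __name__ == "__main__":
    log = ProvenanceLog()
    # A stand-in "science algorithm"; real processing would go here.
    calibrated = log.run("calibrate", "1.0",
                         lambda granule: granule.upper(),
                         {"granule": b"raw sensor counts"})
    print(calibrated, log.records[0].output)
```

Recording the algorithm name and version alongside checksums is one way to let a later reader tell which code produced which file, even after the algorithms have evolved.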
2018
Scientific communities are building platforms where the use of data-intensive workflows is crucial to conducting their research campaigns. However, managing and effectively supporting the understanding of 'live' processes, and fostering computational steering and the sharing and re-use of data and methods, present several bottlenecks. These are often caused by poor documentation of the methods and the data, and of how users interact with them. This work explores how, in such systems, flexibility in the management of provenance and its adaptation to different users and application contexts can lead to new opportunities for its exploitation, improving productivity.
Journal of Grid Computing, 2007
Proceedings of the VLDB Endowment
Concurrency and Computation: Practice and Experience, 2008
Future Generation …, 2010
2012
Workflows in Support …
Journal of Grid Computing, 2005