2012, Journal of Grid Computing
Digital preservation is the persistent archiving of digital assets for future access and reuse, irrespective of the underlying platform and software solutions. Existing preservation systems have a strong focus on grids, but the advent of cloud technologies offers an attractive option. We describe a middleware system that enables a flexible choice between a grid and a cloud for ad-hoc computations that arise during the execution of a preservation workflow and also for archiving digital objects. The choice between different infrastructures remains open during the lifecycle of the archive, ensuring a smooth switch between different solutions to accommodate the changing requirements of the organization that needs its digital assets preserved. We also offer insights on the costs, running times, and organizational issues of cloud computing, proving that the cloud alternative is particularly attractive for smaller organizations without access to a grid or with limited IT infrastructure.
2013 IEEE International Conference on Cloud Engineering (IC2E), 2013
The emergence of the cloud and advanced object-based storage services provides opportunities to support novel models for long term preservation of digital assets. Among the benefits of this approach is leveraging the cloud's inherent scalability and redundancy to dynamically adapt to evolving needs of digital preservation. Preservation DataStores in the Cloud (PDS Cloud) is an OAIS-based preservation-aware storage service employing multiple heterogeneous cloud providers. It materializes the logical concept of a preservation information-object into physical cloud storage objects. Preserved information can be interpreted by deploying virtual appliances in the compute cloud, built from readily available components and provisioned with data objects together with their designated rendering software. PDS Cloud has a hierarchical data model and resource naming structure, supporting independent tenants whose assets are organized in multiple aggregations based on content and value. Each aggregation has a separate preservation profile that is reconfigurable as requirements keep changing over the long term. Continuous changes to data objects, life-cycle activities, virtual appliances and cloud providers are applied in a manner transparent to the client. PDS Cloud is being developed as an infrastructure component of the European Union ENSURE project, where it is used for preservation of medical and financial data.
We present, compare and contrast new directions in long-term digital preservation as covered by the four large European Community funded research projects that started in 2011. The new projects widen the domain of digital preservation from the traditional purview of memory institutions preserving documents to include scenarios such as health-care, data with direct commercial value, and web-based data. Some of these projects consider not only how to preserve the programs needed to interpret the data but also how to manage and preserve the related workflows. Considerations such as risk analysis and cost estimation are built into some of them, and more than one of these efforts is examining the use of cloud-based technologies. All projects look into programmatic solutions, while emphasizing different aspects such as data collection, scalability, reconfigurability, and full lifecycle management. These new directions will make digital preservation applicable to a wider domain of users and will give better tools to assist in the process.
Purpose - The purpose of this paper is to examine the characteristics of managing records in a cloud computing environment and compare these with existing archiving models, exemplified by the open archival information system (OAIS) reference model. Design/methodology/approach - The authors compare the functional entities in OAIS with a layered model of cloud computing, in which services are abstracted and shared between layers. Findings - It is concluded that there are a number of areas where OAIS does not integrate well with cloud computing systems. Based on the findings, a new layered model for a cloud archiving system is defined using the concepts and information types from the OAIS reference model. The proposed model allows the sharing of functionality and information objects by making them available as services to higher layers. The model covers the entire document lifecycle, making archive functionality such as preservation planning possible at an early stage and helping to simplify records transfer.
2008
The SHAMAN project targets a framework integrating advances in the data grid, digital library, and persistent archives communities in order to achieve a long-term preservation environment. Within the project we identified several challenges for digital preservation in the area of memory institutions, where existing systems start to struggle with, e.g., complex objects or many small objects. In order to overcome these, we propose a grid-based framework for digital preservation. In this paper we describe the main objectives of the SHAMAN project and the identified challenges for a heterogeneous and distributed environment. On the one hand, we assess the capabilities and interfaces of legacy systems in a bottom-up approach; on the other hand, we derive requirements based on project objectives. The focus is on the integration of storage infrastructures and distributed data management. In the end we derive a service-oriented architecture with a grid-based integration layer as an approach to manage the challenges.
The Memory of the World in the Digital Age: Digitization and Preservation / Duranti, Luciana ; Shaffer, Elizabeth (Eds.). UNESCO, 2013
The research is positioned in the context of the responsibility of archives to preserve important records in an increasingly changing technological environment, and focused on the impact of cloud solutions on archival theory and practice. Authors address several questions which they consider crucial for archival science and community. Results of the survey on the usage of private cloud are given. In view of that, the authors examine if the concept "Archiving-as-a-Service" will require redefinition of archival practice in the new technological and organizational context. Finally, they suggest the need for transition from postcustodial to "postcustodial 2.0" paradigm.
2006
" Both informal collaborations (associations and alliances) and formal partnerships among contractors and subcontractors will also surely arise, in which responsibilities for archiving are allocated among various other interests in digital information.
Individuals and organizations are increasingly drawn by the lure of cloud computing for the many benefits it offers. Scalable, agile, efficient, on-demand computing resources mean that email, photos, data, documents and records can be easily stored and shared through a seemingly endless number of hosted web applications, and that sophisticated software, platforms, and infrastructure are available to the budget-conscious and technology-resource limited. Commercial cloud architectures offer on-demand access to services across a network of standard internet-accessible devices (mobile phones, tablets, laptops) and a vast array of other devices, such as game consoles, MP3 players, and e-business technologies. Resources are shared among users, and resource use is monitored and invoiced based on usage for service. People use, and increasingly rely on, cloud services for communication, backup and storage, collaboration, distribution, recordkeeping and preservation. While they engage in such use, these technologies and services change their behavior.
Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries, 2013
The rapidly growing production of digital data, together with their increasing importance and essential demands for their longevity, urgently requires systems that provide reliable long-term preservation of digital objects. These systems have to ensure guaranteed availability, integrity, authenticity, and interpretability over the course of the preservation, where the preservation period may last for several years, for instance in business or scientific applications, the lifetime of a human in medical applications, up to potentially unlimited time-spans for preservation in cultural heritage digital libraries. This means that all kinds of technical problems (network, software or hardware failures) need to be reliably handled and that the evolution of data formats is supported. At the same time, systems need to scale with the volume of data to be archived. Thus, long-term digital preservation systems have to be inherently distributed to allow content to be replicated. Institutions with long-term archiving needs for the preservation of digital data have to collaborate in order to build a highly reliable and available, geographically distributed Internet-based digital archiving system. When employing distributed systems technologies, be it for the creation of a small cooperating network of few institutions with limited resources, or a large network with many nodes providing combined potentially vast amounts of globally distributed resources, the challenges lie in the autonomic, efficient, and fault-tolerant use of these resources without a centralized global coordinator. We present novel concepts for a distributed long-term preservation system for digital data, with a focus on long-term preservation as required by archives, museums, research communities, or the corporate sector.
These concepts are the result of combining distributed, autonomic, and process-oriented computing with requirements from the digital preservation community regarding special system, user, and metadata functionality. Originating from this fusion, our novel concepts are the main ingredients of the described system model, consisting of a data model and different processes. At the data level, support is provided for complex data objects, management of collections, annotations, and arbitrary links between digital objects. At the process level, our proposed archiving system model supports automated processes that provide dynamic replication, consistency checks, and automated recovery of the archived digital objects utilizing autonomic behavior governed by preservation policies without any centralized coordinator in a fully distributed network. This allows for an efficient and fault-tolerant use of the resources provided in the network. Further, we present a prototype implementation of the DISTARNET (DISTributed ARchival NETwork) system, a distributed long-term digital preservation solution, which utilizes the described novel concepts. While implementing the described data model and processes, our implementation is additionally governed by considerations such as fault-tolerance on the node level, maintainability and extendability, and long-term use of the system, which all flow into the described system architecture and resulting implementation. We then perform an evaluation of the implemented prototype and the underlying concepts, with the use of realistic scenarios. The evaluation consists of two parts. In the first part, we define and employ a benchmark geared towards triple stores, in which we evaluate the feasibility and the constraints of using triple stores for RDF-based metadata storage and management in the context of long-term preservation systems.
In the second part, we perform a qualitative and quantitative evaluation of the DISTARNET system prototype implementation. The former looks at the correct execution of the developed processes, and the latter looks at the performance of the system regarding the overall archiving storage capacity and scalability of the system.
I would like to express the deepest appreciation to my thesis advisor and mentor Prof. Dr. Heiko Schuldt for his extraordinary supervision, and for the generous and friendly support he provided over these years while displaying a lot of patience. With his stimulating discussions he gave me a lot of insight which I have been able to express in this thesis. He also gave me the huge opportunity to write this thesis in his group, and to say the least, this thesis would not have been possible without him. I wish to thank Prof. Dr. Andreas Rauber from the Vienna University of Technology in Austria, for kindly agreeing to be my co-referee. I am particularly grateful for the patient guidance, support, and advice given by P.D. Dr. Lukas Rosenthaler, Prof. Dr. Rudolf Gschwind, Dr. Simon Margulies, and all the great people at the Imaging and Media Lab at the University of Basel. I would also like to thank Daniela Bienz, for her loving support and enthusiastic encouragement during this time. Finally, I wish to thank my father, Branislav Subotic, for his patience and continuous support throughout my thesis.
Carol, head of the computer science department at ACME Ltd., a large multinational pharmaceutical corporation, has the task of implementing a new digital archiving solution which is compliant with the company's preservation policy.
In addition to the standard requirements for digital preservation (integrity, authenticity, etc.), this policy requires that data be redundantly stored at at least three different locations, with added constraints regarding the minimum distance between locations and, for some types of data, also the country or state in which the data is allowed to be stored. Other issues like enforcing integrity (uncorrupted record), authenticity (linking of provenance information to each record), chain of custody (tracking of location and management controls within the preservation environment, e.g., who, where, and when handled the archived data in the network), trustworthiness (sustainability of the records), and readability (long-term access through data format migration) also need to be addressed. ACME Ltd. operates several data-centers worldwide, which Carol can use for deploying her new archiving solution.
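The placement constraints in this scenario (a minimum number of replicas, a minimum distance between locations, and restrictions on the country of storage) can be illustrated with a small check. The following Python sketch is not taken from DISTARNET; the names `Site`, `distance_km`, and `placement_ok` are hypothetical, the coordinates are approximate, and the distance rule is assumed to apply pairwise to all chosen locations.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Site:
    """A candidate storage location (hypothetical model, not from the source)."""
    name: str
    country: str
    lat: float  # degrees
    lon: float  # degrees


def distance_km(a: Site, b: Site) -> float:
    """Great-circle distance via the haversine formula (mean Earth radius 6371 km)."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def placement_ok(
    sites: Iterable[Site],
    min_replicas: int = 3,
    min_distance_km: float = 0.0,
    allowed_countries: Optional[Set[str]] = None,
) -> bool:
    """Return True if the chosen sites satisfy all three policy constraints."""
    chosen = list(sites)
    # Constraint 1: enough distinct locations.
    if len({s.name for s in chosen}) < min_replicas:
        return False
    # Constraint 2: every site lies in a permitted country (if restricted).
    if allowed_countries is not None:
        if any(s.country not in allowed_countries for s in chosen):
            return False
    # Constraint 3 (assumed pairwise): every pair is far enough apart.
    return all(distance_km(a, b) >= min_distance_km for a, b in combinations(chosen, 2))


# Example data with approximate coordinates, for illustration only.
BASEL = Site("Basel", "CH", 47.56, 7.59)
ZURICH = Site("Zurich", "CH", 47.37, 8.54)
GENEVA = Site("Geneva", "CH", 46.20, 6.14)
```

With these example sites, `placement_ok([BASEL, ZURICH, GENEVA], min_distance_km=50)` accepts the placement, while raising the threshold to 100 km rejects it because Basel and Zurich are closer than that.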
Proceedings of the IEEE
The integration of grid, data grid, digital library, and preservation technology has resulted in software infrastructure that is uniquely suited to the generation and management of data. Grids provide support for the organization, management, and application of processes. Data grids manage the resulting digital entities. Digital libraries provide support for the management of information associated with the digital entities. Persistent archives provide long-term preservation. We examine the synergies between these data management systems and the future evolution that is required for the generation and management of information.
2011
Digital preservation deals with the problem of retaining the meaning of digital information over time to ensure its accessibility. The process often involves a workflow which transforms the digital objects. The workflow defines document pipelines containing transformations and validation checkpoints, either to facilitate migration for persistent archival or to extract metadata. The transformations, nevertheless, are computationally expensive, and therefore digital preservation can be out of reach for an organization whose core operation is not in data conservation. The operations described in the document workflow, however, do not frequently reoccur. This paper combines an implementation-independent workflow designer with cloud computing to support small institutions in their ad hoc peak computing needs that stem from their efforts in digital preservation.
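A document pipeline of transformations and validation checkpoints, as described in this abstract, might be sketched as follows. This is a minimal Python illustration under stated assumptions, not the paper's workflow designer: `DigitalObject`, `run_pipeline`, `ValidationError`, and the example steps are all hypothetical names chosen for the sketch.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class DigitalObject:
    """A preserved object: raw bytes plus extracted metadata (hypothetical model)."""
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class ValidationError(Exception):
    """Raised when a validation checkpoint rejects the object."""


# A step is (kind, name, function); kind is "transform" or "validate".
Step = Tuple[str, str, Callable]


def run_pipeline(obj: DigitalObject, steps: List[Step]) -> DigitalObject:
    """Apply transformations in order, stopping at the first failed checkpoint."""
    for kind, name, fn in steps:
        if kind == "transform":
            obj = fn(obj)
        elif kind == "validate":
            if not fn(obj):
                raise ValidationError(f"checkpoint failed: {name}")
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return obj


def normalize_newlines(obj: DigitalObject) -> DigitalObject:
    """Toy 'migration' step: convert CRLF line endings to LF."""
    return DigitalObject(obj.data.replace(b"\r\n", b"\n"), dict(obj.metadata))


def extract_metadata(obj: DigitalObject) -> DigitalObject:
    """Metadata extraction step: record size and SHA-256 digest."""
    meta = dict(obj.metadata)
    meta["size"] = str(len(obj.data))
    meta["sha256"] = hashlib.sha256(obj.data).hexdigest()
    return DigitalObject(obj.data, meta)


STEPS: List[Step] = [
    ("validate", "non-empty input", lambda o: len(o.data) > 0),
    ("transform", "normalize newlines", normalize_newlines),
    ("transform", "extract metadata", extract_metadata),
    ("validate", "digest recorded", lambda o: "sha256" in o.metadata),
]
```

Calling `run_pipeline(DigitalObject(b"a\r\nb\r\n"), STEPS)` returns an object with LF line endings and populated metadata, while an empty object is rejected at the first checkpoint. Because each step is a plain function, the same step list could in principle be dispatched to local or remote workers, which is the kind of flexibility the abstract attributes to an implementation-independent workflow.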
International Journal of Distributed and Cloud Computing, 2019
9th Annual International Conference on Computer Science Education: Innovation and Technology (CSEIT 2018), 2018
IBM Journal of Research and Development, 2008
Proceedings of the third ACM conference on Digital libraries - DL '98, 1998
Serials: The Journal for the Serials …, 2008
Makerere University, 2019
Medical Imaging 2009: Advanced PACS-based Imaging Informatics and Therapeutic Applications, 2009
Canadian Journal of Information and Library Science, 2015