1998, Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing - PODC '98
High latency and low bandwidth to web-enabled clients prevent the timely delivery of software. We present an algorithm for modifying delta compressed files so that the compressed versions may be reconstructed without scratch space. This allows network clients with limited resources to efficiently update software by retrieving delta compressed versions over a network.

Delta compression for binary files, compactly encoding a version of data with only the changed bytes from a previous version, may be used to efficiently distribute software over low-bandwidth channels, such as the Internet. Traditional methods for rebuilding these delta files require memory or storage space on the target machine for both the old and the new version of the file to be reconstructed. With the advent of network computing and Internet-enabled devices, many of these network-attached target machines have limited additional scratch space. We present an algorithm for modifying a delta compressed version file so that it may rebuild the new file version in the space that the current version occupies.

Differential or delta compression [5, 11], compactly encoding a new version of a file using only the changed bytes from a previous version, can be used to reduce the size of the file to be transmitted and consequently the time to perform a software update. Currently, decompressing delta encoded files requires scratch space: additional disk or memory storage used to hold a required second copy of the file. Two file versions must be concurrently available, as the delta file contains directives to read data from the old file version while the new file version is being materialized in another region of storage. This presents a problem. Network-attached devices often have limited memory resources and no disks and therefore are not capable of storing two file versions at the same time. Furthermore, adding storage to network-attached devices is not viable, as keeping these devices simple limits their production costs.
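As a rough illustration of what such a delta file contains, here is a minimal Python sketch of conventional (scratch-space) delta reconstruction. The directive format is assumed for illustration only; the paper's in-place variant, which reorders and rewrites directives so the new version can be built over the old one, is not shown.

```python
# Minimal sketch (assumed directive format, not the paper's encoding): a delta is
# a list of directives that either COPY a byte range from the old version or ADD
# literal bytes.  This is the conventional reconstruction that needs the old and
# new versions side by side, i.e. the scratch space the paper eliminates.

def apply_delta(old: bytes, delta):
    """delta: iterable of ('copy', offset, length) or ('add', literal_bytes)."""
    out = bytearray()
    for directive in delta:
        if directive[0] == 'copy':
            _, offset, length = directive
            out += old[offset:offset + length]      # read from the old version
        elif directive[0] == 'add':
            out += directive[1]                     # literal data carried in the delta
        else:
            raise ValueError(f"unknown directive {directive[0]!r}")
    return bytes(out)

old = b"the quick brown fox jumps over the lazy dog"
delta = [('copy', 0, 10), ('add', b'red'), ('copy', 15, 28)]
new = apply_delta(old, delta)   # b"the quick red fox jumps over the lazy dog"
```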
Inexpensive storage and more powerful processors have resulted in a proliferation of data that needs to be reliably backed up. Network resource limitations make it increasingly difficult to backup a distributed file system on a nightly or even weekly basis. By using delta compression algorithms, which minimally encode a version of a file using only the bytes that have changed, a backup system can compress the data sent to a server. With the delta backup technique, we can achieve significant savings in network transmission time over previous techniques. Our measurements indicate that file system data may, on average, be compressed to within 10% of its original size with this method and that approximately 45% of all changed files have also been backed up in the previous week. Based on our measurements, we conclude that a small file store on the client that contains copies of previously backed up files can be used to retain versions in order to generate delta files. To reduce the load on the backup server, we implement a modified version storage architecture, version jumping, that allows us to restore delta encoded file versions with at most two accesses to tertiary storage. This minimizes server work-load and network transmission time on file restore.
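A minimal sketch of a version-jumping style storage policy, under assumed parameters; the paper's actual client/server split and delta format are not reproduced here. Every k-th version is stored whole as a reference and the versions in between are delta-encoded against it, so restoring any version touches at most two stored objects.

```python
# Illustrative version-jumping policy (K and the storage layout are assumptions).
K = 8  # jump interval: every K-th version is a full reference copy

def store_version(version_number, data, make_delta, storage):
    """make_delta(base, new) -> delta; storage maps version numbers to records."""
    base = (version_number // K) * K
    if version_number == base:
        storage[version_number] = ('full', data)             # reference version
    else:
        # assumes the reference for this interval was stored earlier
        reference = storage[base][1]
        storage[version_number] = ('delta', base, make_delta(reference, data))

def restore_version(version_number, apply_delta, storage):
    record = storage[version_number]
    if record[0] == 'full':
        return record[1]                                      # one access
    _, base, delta = record
    return apply_delta(storage[base][1], delta)               # at most two accesses
```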
2015
Real-time compression for primary storage is quickly becoming widespread as data continues to grow exponentially, but adding compression on the data path consumes scarce CPU and memory resources on the storage system. Our work aims to mitigate this cost by introducing methods to quickly and accurately identify the data that will yield significant space savings when compressed. The first level of filtering that we employ is at the data set level (e.g., volume or file system), where we estimate the overall compressibility of the data at rest. According to the outcome, we may choose to enable or disable compression for the entire data set, or to employ a second level of finer-grained filtering. The second filtering scheme examines data being written to the storage system in an online manner and determines its compressibility. The first-level filtering runs in mere minutes while providing mathematically proven guarantees on its estimates. In addition to aiding in selecting which vo...
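A hedged sketch of the first-level filter's idea, using random block sampling plus zlib as a stand-in estimator; the paper's actual estimator and its proven confidence bounds are not reproduced.

```python
# Rough compressibility estimate for a data set: compress a random sample of
# blocks rather than the whole volume.  Block size, sample count, and the use of
# zlib level 1 are assumptions for illustration.
import os
import random
import zlib

def estimate_compression_ratio(path, block_size=64 * 1024, samples=256):
    size = os.path.getsize(path)
    if size == 0:
        return 1.0
    blocks = max(1, size // block_size)
    picks = random.sample(range(blocks), min(samples, blocks))
    raw = compressed = 0
    with open(path, 'rb') as f:
        for b in picks:
            f.seek(b * block_size)
            data = f.read(block_size)
            raw += len(data)
            compressed += len(zlib.compress(data, 1))   # cheap, fast level
    return compressed / raw   # well below 1.0 means the data set is compressible

# e.g. enable compression for the whole data set only if the estimate is low,
# otherwise fall back to finer-grained, per-write filtering.
```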
The present study introduces a new technique for storing and retrieving data without using any storage appliance/device, any of the common compression/zipping or decompression/unzipping software applications, or any database environment to store or retrieve data. The outcomes showed that the mechanism of the new technique, AMA-TECH, is able to store the entire given data in one single code and retrieve that data, as stored, from that code itself. AMA-TECH was able to manipulate all data types at once (image, text, or mixed data). What makes AMA-TECH different is that the generated code is not an index to a record in a computer database's table or the name of a compressed file. Technically, the code itself becomes the storage medium used to store and retrieve data as stored. Accordingly, data size was not measured in kilobytes, megabytes, and so on; instead, data was measured by how many nanoseconds, seconds, or minutes AMA-TECH needed to store and retrieve data in and from the code itself. To our knowledge so far, this solution has never been introduced before. However, many other issues were raised that we are still working on.
2004
We consider the utility of two key properties of network-embedded storage: programmability and network-awareness. We describe two extensive applications, whose performance and functionalities are significantly enhanced through innovative combination of the two properties. One is an incremental file-transfer system tailor-made for low-bandwidth conditions. The other is a “customizable” distributed file system that can assume very different personalities in different topological and workload environments. The applications show how both properties are necessary to exploit the full potential of network-embedded storage. We also discuss the requirements of a general infrastructure to support easy and effective access to network-embedded storage, and describe a prototype implementation of such an infrastructure.
For backup storage, increasing compression allows users to protect more data without increasing their costs or storage footprint. Though removing duplicate regions (deduplication) and traditional compression have become widespread, further compression is attainable. We demonstrate how to efficiently add delta compression to deduplicated storage to compress similar (non-duplicate) regions. A challenge when adding delta compression is the large number of data regions to be indexed. We observed that stream-informed locality is effective for delta compression, so an index for delta compression is unnecessary, and we built the first storage system prototype to combine delta compression and deduplication with this technology. Beyond demonstrating extra compression benefits between 1.4X and 3.5X, we also investigate throughput and data integrity challenges that arise.
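The stream-informed idea might be sketched as follows. The sketch function, cache size, and pluggable delta encoder are assumptions for illustration, not the Data Domain implementation: similar chunks tend to arrive near their bases in the backup stream, so a bounded cache keyed by a similarity sketch can stand in for a full on-disk index.

```python
# Simplified stream-informed delta compression: non-duplicate chunks get a small
# similarity sketch; a bounded LRU cache of recently seen chunks replaces a
# persistent similarity index.
import hashlib
from collections import OrderedDict

def sketch(chunk: bytes, shingle=32) -> int:
    # one min-hash feature over overlapping shingles (a real system combines
    # several features into super-features)
    return min(
        int.from_bytes(hashlib.blake2b(chunk[i:i + shingle], digest_size=8).digest(), 'big')
        for i in range(0, max(1, len(chunk) - shingle + 1))
    )

class StreamInformedDelta:
    def __init__(self, delta_encode, cache_size=4096):
        self.delta_encode = delta_encode    # pluggable delta encoder
        self.cache = OrderedDict()          # sketch -> base chunk, LRU-bounded
        self.cache_size = cache_size

    def compress_chunk(self, chunk: bytes):
        s = sketch(chunk)
        base = self.cache.get(s)
        self.cache[s] = chunk
        self.cache.move_to_end(s)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # drop the least recently used base
        if base is not None and base != chunk:
            return ('delta', s, self.delta_encode(base, chunk))
        return ('literal', chunk)
```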
2008
To a storage systems researcher, all user bytes are created opaque and equal. Whether they encode a timeless wedding photograph, a recording of a song downloaded from the WWW, or tax returns of yesteryear is not germane to the already complex problem of their storage. But according to Cathy Marshall, this is only one stripe of the storage beast. There is a bigger problem that we will have to confront eventually: The exponentially growing size of our personal digital estate will soon—if it has not done so already—surpass our management abilities.
Operating Systems Review, 1998
This paper describes the Network-Attached Secure Disk (NASD) storage architecture, prototype implementations of NASD drives, array management for our architecture, and three filesystems built on our prototype. NASD provides scalable storage bandwidth without the cost of servers used primarily for transferring data from peripheral networks (e.g. SCSI) to client networks (e.g. ethernet). Increasing dataset sizes, new attachment technologies, the convergence of peripheral and interprocessor switched networks, and the increased availability of on-drive transistors motivate and enable this new architecture. NASD is based on four main principles: direct transfer to clients, secure interfaces via cryptographic support, asynchronous non-critical-path oversight, and variably-sized data objects. Measurements of our prototype system show that these services can be cost-effectively integrated into a next generation disk drive ASIC. End-to-end measurements of our prototype drive and filesystems suggest that NASD can support conventional distributed filesystems without performance degradation. More importantly, we show scalable bandwidth for NASD-specialized filesystems. Using a parallel data mining application, NASD drives deliver a linear scaling of 6.2 MB/s per client-drive pair, tested with up to eight pairs in our lab. Keywords D.4.3 File systems management, D.4.7 Distributed systems, B.4 Input/Output and Data Communications.
Computers and Electronics in Agriculture, 2014
We consider the bandwidth bottleneck problem that arises in ISO 11783 networks of mobile farm equipment when large file transfers are performed. To overcome this problem, a compression protocol called ISOBUSComp is proposed, allowing the implementation of dynamic ("on the fly") data compression services for general ISO 11783 file transfers. As a result, transmitting Electronic Control Units are free to choose any data compression technique they wish, and receiving Electronic Control Units need not be aware of such decisions, but just be able to process a suitable Universal Decompression Virtual Machine. Comprehensive simulation studies show that dynamic data compression services built upon the proposed protocol help to reduce bus utilization of ISO 11783 networks by between 28% and 63%, thus speeding up the time for large file transfers.
International Journal of Recent Technology and Engineering (IJRTE), 2019
In the digital world today, data is growing tremendously, and the onus lies on the network to compute, process, transfer and store this data. There is a directly proportional relationship between the size of the data and the efficiency of a given system. The major challenge that systems face today is the size of data; the goal of these systems is therefore to compress data as much as possible so that storage space and processing time are reduced, making the system more effective. Since data compression (DC) leads to effective use of the available storage space and transfer bandwidth, various methodologies have been developed from several angles. A detailed survey of many existing DC techniques is presented to address present requirements with respect to data quality, coding schemes and applications. In order to analyze how DC techniques and their applications have evolved, a comparative study is performed to identify the contribution of the reviewed techniques in terms of their strengths, fundamental ideas,...
Journal of the Chinese Institute of Engineers, 2012
Bell Labs Technical Journal, 2007
Current cell phones are capable of accessing the Internet, downloading music and games, taking pictures, and recording videos. These activities can easily result in a huge amount of multimedia content requiring tens or hundreds of gigabytes of storage, which is several orders of magnitude more than the storage capacity of today's cell phones. In this paper, we propose a novel remote storage solution to increase the amount of storage accessible to a mobile user. Our solution consists of two components: a client program running on the cell phone and a remote server to manage remote storage. The novelty of our solution results from the client program that makes the accesses to the remote storage completely transparent and seamless to the user. Thus, the user's perception of the amount of storage available on the cell phone is considerably higher than the actual capacity of the phone. In addition, the client program implements sophisticated caching algorithms and wireless optimizations to ensure that the user-perceived performance is not significantly impacted. © 2007 Alcatel-Lucent.
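A toy sketch of the client-side idea: a bounded local cache backed by the remote store, so the user sees one large, seamless storage space. The caching policy (plain LRU with write-through) and the interfaces are assumptions, not Alcatel-Lucent's implementation.

```python
# Transparent remote storage client: serve reads from a bounded local cache and
# fetch misses over the wireless link.  remote_fetch/remote_store are assumed
# callables standing in for the remote server protocol.
from collections import OrderedDict

class TransparentRemoteFS:
    def __init__(self, remote_fetch, remote_store, local_capacity_bytes):
        self.fetch, self.store = remote_fetch, remote_store
        self.capacity = local_capacity_bytes
        self.cache = OrderedDict()   # path -> bytes, most recently used last
        self.used = 0

    def read(self, path: str) -> bytes:
        if path in self.cache:
            self.cache.move_to_end(path)   # cache hit: serve locally
            return self.cache[path]
        data = self.fetch(path)            # cache miss: go over the wireless link
        self._insert(path, data)
        return data

    def write(self, path: str, data: bytes):
        self.store(path, data)             # write-through keeps the server current
        self._insert(path, data)

    def _insert(self, path, data):
        if path in self.cache:
            self.used -= len(self.cache.pop(path))
        self.cache[path] = data
        self.used += len(data)
        while self.used > self.capacity and self.cache:
            _, evicted = self.cache.popitem(last=False)   # evict least recently used
            self.used -= len(evicted)
```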
Due to the booming growth of Information and Communication Technology (ICT), a vast amount of data is produced at a considerably high rate; it drives the traditional methods of storing data to their limits and most of the time simply overwhelms current storage systems. Because of that, throughout the history of ICT the effort to find an efficient and feasible data storage system with substantial capacity to cater to current data storage needs has been relentless. Currently available shared storage devices are mostly file servers and peer-to-peer systems organized in various architectures, but certain areas pose problems in implementing such systems at small scale and also at the enterprise level. The Networked Shared Storage System (NSS) is introduced as a system motivated by that historical desire to achieve the ultimate reliable and secure storage media, and it represents the way ahead in discovering the ultimate solution for this long-lasting problem. NSS is a Local Area Network (LAN) based, secure and reliable distributed storage system. Its primary objective is to use the free local hard disk space available in the workstations connected to a LAN as its storage media. This is achieved by the radical but fail-proof method of breaking down a single file into a set of data chunks and distributing them throughout the LAN. These chunks are then remerged to reproduce the original file at the users' request. Through this method, the largely unutilized free disk space of nodes connected to the LAN is used to create a free disk space pool that serves the storage needs of the users of that same network, rather than incorporating separate data servers. A LAN-based storage system is invariably challenged by the inherent unavailability of the nodes of a LAN, but NSS overcomes this problem via a robust and efficient data replication algorithm that makes replicas of the data chunks when storing them, thus providing a high degree of availability and reliability for the stored data. Peer-to-peer communication is used when distributing the chunked data throughout the network via the embedded FTP servers. This architecture minimizes security issues and protects the privacy of data, which is greatly challenged in a LAN-based environment. NSS is highly scalable and applicable to both a medium-scale LAN and its enterprise-level equivalent, with no additional modification to the architecture and with less cost and effort than most existing solutions (cloud servers and server farms), thus making it the way ahead in achieving the ultimate storage media.
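An illustrative sketch of the chunk/replicate/remerge cycle described above. Chunk size, replica count, node selection, and the transport (the embedded FTP servers) are all assumptions abstracted behind callables, not the NSS implementation.

```python
# Split a file into chunks, place each chunk on several LAN nodes, and reassemble
# from whichever replicas are reachable.
import hashlib

CHUNK_SIZE = 1 << 20   # 1 MiB chunks (assumed)
REPLICAS = 3           # replicas per chunk (assumed)

def split_into_chunks(data: bytes):
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

def place_chunks(chunks, nodes):
    """Return a manifest: for each chunk, its digest and the nodes holding replicas."""
    manifest = []
    for idx, chunk in enumerate(chunks):
        digest = hashlib.sha256(chunk).hexdigest()
        start = idx % len(nodes)
        holders = [nodes[(start + r) % len(nodes)] for r in range(min(REPLICAS, len(nodes)))]
        manifest.append({'index': idx, 'digest': digest, 'nodes': holders})
    return manifest

def reassemble(manifest, fetch_chunk):
    """fetch_chunk(node, digest) -> bytes, or None if that node is offline."""
    parts = []
    for entry in sorted(manifest, key=lambda e: e['index']):
        chunk = None
        for node in entry['nodes']:
            chunk = fetch_chunk(node, entry['digest'])
            if chunk is not None:
                break
        if chunk is None or hashlib.sha256(chunk).hexdigest() != entry['digest']:
            raise IOError(f"chunk {entry['index']} unavailable or corrupted")
        parts.append(chunk)
    return b''.join(parts)
```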
In this paper, we show that coding can be used in storage area networks (SANs) to improve various quality of service metrics under normal SAN operating conditions, without requiring additional storage space. For our analysis, we develop a model which captures modern characteristics such as constrained I/O access bandwidth. Using this model, we consider two important cases: single-resolution (SR) and multi-resolution (MR) systems. For SR systems, we use blocking probability as the quality of service metric and propose the network coded storage (NCS) scheme as a way to reduce blocking probability. The NCS scheme codes across file chunks in time, exploiting file striping and file duplication. Under our assumptions, we illustrate cases where SR NCS provides an order of magnitude savings in blocking probability. For MR systems, we introduce saturation probability as a quality of service metric to manage multiple user types, and we propose the uncoded resolution-aware storage (URS) and coded resolution-aware storage (CRS) schemes as ways to reduce saturation probability. In MR URS, we align our MR layout strategy with traffic requirements. In MR CRS, we code videos across MR layers. Under our assumptions, we illustrate that URS can in some cases provide an order of magnitude gain in saturation probability over classic non-resolution-aware systems. Further, we illustrate that CRS provides additional saturation probability savings over URS.
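To make the intuition behind coded storage concrete, here is a minimal XOR-parity example; it is not the paper's NCS construction. Two stripes plus their parity are placed on three servers, so any two reachable (non-busy) servers can serve the whole object, which is what lowers blocking probability relative to plain striping.

```python
# Minimal coded-storage illustration (toy code, assumed layout).
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(file_data: bytes):
    half = (len(file_data) + 1) // 2
    a = file_data[:half]
    b = file_data[half:].ljust(half, b'\x00')      # pad so stripes match in length
    return {'server0': a, 'server1': b, 'server2': xor_bytes(a, b)}

def decode(available: dict, original_length: int) -> bytes:
    a, b, p = (available.get(k) for k in ('server0', 'server1', 'server2'))
    if a is None:
        a = xor_bytes(b, p)       # recover stripe A from B and parity
    elif b is None:
        b = xor_bytes(a, p)       # recover stripe B from A and parity
    return (a + b)[:original_length]

data = b"multi-resolution video object"
placed = encode(data)
# any one server may be busy or unreachable; the object is still readable:
subset = {k: v for k, v in placed.items() if k != 'server1'}
assert decode(subset, len(data)) == data
```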
… Research TR-2006 …, 2006
Remote Differential Compression (RDC) protocols can efficiently update files over a limited-bandwidth network when two sites have roughly similar files; no site needs to know the content of another's files a priori. We present a heuristic approach to identify and transfer the file differences that is based on finding similar files, subdividing the files into chunks, and comparing chunk signatures. Our work significantly improves upon previous protocols such as LBFS and RSYNC in three ways. Firstly, we present a novel algorithm to efficiently find the client files that are the most similar to a given server file. Our algorithm requires 96 bits of metadata per file, independent of file size, and thus allows us to keep the metadata in memory and eliminate the need for expensive disk seeks. Secondly, we show that RDC can be applied recursively to signatures to reduce the transfer cost for large files. Thirdly, we describe new ways to subdivide files into chunks that identify file differences more accurately. We have implemented our approach in DFSR, a state-based multimaster file replication service shipping as part of Windows Server 2003 R2. Our experimental results show that similarity detection produces results comparable to LBFS while incurring a much smaller overhead for maintaining the metadata. Recursive signature transfer further increases replication efficiency by up to several orders of magnitude.
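A hedged sketch in the spirit of the similarity detection described above; the exact trait construction used in DFSR is not reproduced. Each file is summarized by a few one-byte "traits" derived from min-hashes of its chunk signatures, so 12 traits keep per-file metadata at 96 bits, and the client files sharing the most traits with a server file are treated as the best candidates for differential transfer.

```python
# Compact per-file similarity traits (illustrative construction).
import hashlib

NUM_TRAITS = 12  # 12 x 8 bits = 96 bits of metadata per file

def chunk_signatures(data: bytes, avg_chunk=4096):
    # placeholder fixed-size chunking; RDC uses content-defined chunk boundaries
    return [hashlib.sha1(data[i:i + avg_chunk]).digest()
            for i in range(0, max(1, len(data)), avg_chunk)]

def traits(data: bytes) -> bytes:
    sigs = chunk_signatures(data)
    out = []
    for salt in range(NUM_TRAITS):
        salted_min = min(hashlib.sha1(bytes([salt]) + s).digest() for s in sigs)
        out.append(salted_min[0])          # keep one byte per trait
    return bytes(out)

def most_similar(server_traits: bytes, client_files: dict):
    """client_files: name -> 12-byte traits; rank by number of matching traits."""
    def score(name):
        return sum(a == b for a, b in zip(server_traits, client_files[name]))
    return max(client_files, key=score)
```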
Replicating data off-site is critical for disaster recovery reasons, but the current approach of transferring tapes is cumbersome and error-prone. Replicating across a wide area network (WAN) is a promising alternative, but fast network connections are expensive or impractical in many remote locations, so improved compression is needed to make WAN replication truly practical. We present a new technique for replicating backup datasets across a WAN that not only eliminates duplicate regions of files (deduplication) but also compresses similar regions of files with delta compression, which is available as a feature of EMC Data Domain systems. Our main contribution is an architecture that adds stream-informed delta compression to already existing deduplication systems and eliminates the need for new, persistent indexes. Unlike techniques based on knowing a file’s version or that use a memory cache, our approach achieves delta compression across all data replicated to a server at any time in the past. From a detailed analysis of datasets and hundreds of customers using our product, we achieve an additional 2X compression from delta compression beyond deduplication and local compression, which enables customers to replicate data that would otherwise fail to complete within their backup window.
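A very small delta encoder sketch, complementing the similarity detection above; it is not the Data Domain encoder. The similar base region is indexed by fixed-size anchors, and the new region is walked once, emitting COPY directives where an anchor match extends and literal INSERTs elsewhere.

```python
# Greedy anchor-based delta encoding of a target region against a similar base
# region (anchor size is an assumed parameter).
ANCHOR = 16  # bytes per anchor

def delta_encode(base: bytes, target: bytes):
    index = {base[i:i + ANCHOR]: i for i in range(0, len(base) - ANCHOR + 1)}
    out, literal, pos = [], bytearray(), 0
    while pos < len(target):
        window = target[pos:pos + ANCHOR]
        base_off = index.get(window)
        if base_off is not None:
            # extend the match forward as far as it goes
            length = ANCHOR
            while (pos + length < len(target) and base_off + length < len(base)
                   and target[pos + length] == base[base_off + length]):
                length += 1
            if literal:
                out.append(('insert', bytes(literal)))
                literal.clear()
            out.append(('copy', base_off, length))
            pos += length
        else:
            literal.append(target[pos])
            pos += 1
    if literal:
        out.append(('insert', bytes(literal)))
    return out
```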
Proc. Int. Web Caching Workshop, 1999
This paper contemplates two important information-provisioning paradigms, caching and replication, and proposes a new approach to network storage in order to meet the increasing demands for large files. The key idea of our system is to have two tailor-made servers, for large and small objects respectively. A modified proxy cache is responsible for storing and delivering small objects. For large objects, we introduce a replication mechanism. By replicating large objects in a dedicated server and relaying them through a proxy cache, storage can be fully utilized. Large objects can be accessed without a priori knowledge of the server location. Our system provides a scalable solution to the continuing growth of traffic volume while it retains transparency for users. It is also designed to ensure compatibility with existing networking standards, applications and system software.
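A thumbnail sketch of the dispatch rule implied above; the size threshold and the fetch interfaces are assumptions, not the paper's protocol.

```python
# Small objects are cached and served by the proxy; large objects are relayed
# from a dedicated replica server so they do not churn the proxy cache.
LARGE_OBJECT_THRESHOLD = 1 << 20   # 1 MiB, assumed cut-off

class SplitProxy:
    def __init__(self, cache_fetch, origin_fetch, replica_fetch):
        self.cache_fetch = cache_fetch        # small-object proxy cache
        self.origin_fetch = origin_fetch      # origin web server
        self.replica_fetch = replica_fetch    # dedicated large-object replica server

    def get(self, url: str, size_hint: int) -> bytes:
        if size_hint >= LARGE_OBJECT_THRESHOLD:
            # relay from the replica server; the client needs no knowledge of its location
            return self.replica_fetch(url)
        cached = self.cache_fetch(url)
        return cached if cached is not None else self.origin_fetch(url)
```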
2004
This paper presents a distributed mobile storage system designed for storage elements connected by a network of non-uniform quality. Flexible data placement is crucial, and it leads to challenges for locating data and keeping it consistent. Our system employs a location- and topology-sensitive multicast-like solution for locating data, lazy peer-to-peer propagation of invalidation information for ensuring consistency, and a distributed snapshot mechanism for supporting sharing. The combination of these mechanisms allows a user to make the most of what a non-uniform network has to offer in terms of gaining fast access to fresh data, without incurring the foreground penalty of keeping distributed elements on a weak network consistent.
IAEME PUBLICATION, 2015
Cloud computing promises to increase the velocity with which applications are deployed. Data compression in cloud computing deals with reducing the storage space and providing privacy for users. Each authorized user is able to get an individual token for their file from the duplicate check based on their privileges. An authorized user is able to use his/her individual private keys to generate queries, and hence attributes are attached along with the file. Attributes are found in the private cloud, and hence control immediately passes to the private cloud, where the duplicate check is performed. Data stored in the public cloud is accessed only by authorized users by providing different encryption privilege keys. Convergent and symmetric encryption techniques produce identical ciphertext, which results in minimal overhead. Proof of reliability assures a verifier, via a proof, that a user's file is available.
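A toy illustration of the convergent-encryption property that the scheme above relies on: the key is derived from the content itself, so identical files encrypt to identical ciphertext and the cloud can run its duplicate check without seeing plaintext. The keystream below is NOT a secure cipher and is not the paper's construction; a real system would use a standard cipher such as AES under the convergent key.

```python
# Convergent encryption property: K = H(M), so equal plaintexts yield equal
# ciphertexts (illustrative keystream only, not cryptographically secure).
import hashlib

def convergent_key(plaintext: bytes) -> bytes:
    return hashlib.sha256(plaintext).digest()          # K = H(M)

def keystream(key: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, 'big')).digest()
        counter += 1
    return bytes(out[:length])

def convergent_encrypt(plaintext: bytes):
    key = convergent_key(plaintext)
    cipher = bytes(p ^ k for p, k in zip(plaintext, keystream(key, len(plaintext))))
    return key, cipher

# two users with the same file produce byte-identical ciphertext,
# so the server-side duplicate check works on encrypted data
k1, c1 = convergent_encrypt(b"quarterly report")
k2, c2 = convergent_encrypt(b"quarterly report")
assert c1 == c2 and k1 == k2
```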
MASCOTS, 2013
Data compression and decompression utilities can be critical in increasing communication throughput, reducing communication latencies, achieving energy-efficient communication, and making effective use of available storage. This paper experimentally evaluates several such utilities for multiple compression levels on systems that represent current mobile platforms. We characterize each utility in terms of its compression ratio, compression and decompression throughput, and energy efficiency. We consider different use cases that are typical for modern mobile environments. We find a wide variety of energy costs associated with data compression and decompression and provide practical guidelines for selecting the most energy efficient configurations for each use case.
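A small harness in the spirit of that characterization, limited to compression ratio and throughput (energy is not measured here; the paper pairs such numbers with power measurements). The chosen utilities, levels, and the sample file path are examples, not the paper's exact configuration.

```python
# Characterize a few standard-library codecs by ratio and compression throughput.
import bz2
import lzma
import time
import zlib

CODECS = {
    'zlib-1': lambda d: zlib.compress(d, 1),
    'zlib-9': lambda d: zlib.compress(d, 9),
    'bz2-9':  lambda d: bz2.compress(d, 9),
    'lzma-6': lambda d: lzma.compress(d, preset=6),
}

def characterize(data: bytes):
    rows = []
    for name, compress in CODECS.items():
        start = time.perf_counter()
        out = compress(data)
        elapsed = time.perf_counter() - start
        rows.append((name, len(data) / len(out), len(data) / elapsed / 1e6))
    return rows   # (codec, compression ratio, throughput in MB/s)

sample = open('/usr/share/dict/words', 'rb').read()   # any representative file
for name, ratio, mbps in characterize(sample):
    print(f"{name:8s}  ratio {ratio:5.2f}  throughput {mbps:7.1f} MB/s")
```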
Cloud computing is a rapidly growing computation paradigm that combines all Information Technology (IT) capabilities. As it has become standard for cloud providers to operate numerous data centers around the world, significant demand exists for inter-data-center data transfers in large volumes, e.g., migration of big data. A challenge arises in how to schedule these bulk data transfers at different urgency levels, in order to fully utilize the available inter-data-center bandwidth. We model a bulk data transfer system based on the OpenFlow framework. The Bulk Data Transfer (BDT) system lives above the transport layer in the network stack, with no special requirements on the lower transport-layer and network-layer protocols, where the standard TCP/IP stack is adopted. It applies the time-expanded graph technique, which was later adopted by Postcard, whose computational overhead prevents its scheduling from running as frequently as ours. We propose a reliable and efficient bulk data transfer service in an inter-data-center network, including optimal routing for distinct chunks over time, which can be temporarily stored at intermediate data centers and forwarded at carefully computed times. Our objective is to optimize the data transfer by using efficient Snappy compression/decompression techniques on the client side. Snappy has a high compression ratio and high input/output performance. Compression is performed in terms of block size, tag, offset, length, index, and encoding. It reduces transfer time, and it is a lossless algorithm. Hence we propose an improvement in data movement by using Snappy together with BDT. Keywords: Snappy data compression/decompression; optimizing data movement; Bulk Data Transfer (BDT)
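A minimal sketch of the client-side step proposed above: compress each block with Snappy before handing it to the bulk-transfer service and decompress on arrival. It assumes the python-snappy package; the block size and the transmit/receive interfaces are placeholders, and the BDT scheduling itself is not modeled.

```python
# Snappy-compressed block transfer on the client side.
import snappy   # python-snappy

BLOCK_SIZE = 4 * 1024 * 1024   # 4 MiB blocks (assumed)

def send_file(path, transmit):
    """transmit(block_index, payload) hands one compressed block to the BDT layer."""
    with open(path, 'rb') as f:
        index = 0
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            transmit(index, snappy.compress(block))   # lossless and fast
            index += 1

def receive_blocks(blocks, out_path):
    """blocks: iterable of (block_index, compressed_payload), any arrival order."""
    with open(out_path, 'wb') as out:
        for _, payload in sorted(blocks):
            out.write(snappy.decompress(payload))
```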