2013 IEEE International Symposium on Information Theory, 2013
Distributed storage systems need to store data redundantly in order to provide some fault-tolerance and guarantee system reliability. Different coding techniques have been proposed to provide the required redundancy more efficiently than traditional replication schemes. However, compared to replication, coding techniques are less efficient for repairing lost redundancy, as they require retrieval of larger amounts of data from larger subsets of storage nodes. To mitigate these problems, several recent works have presented locally repairable codes designed to minimize the repair traffic and the number of nodes involved per repair. Unfortunately, existing methods often lead to codes where there is only one subset of nodes able to repair a piece of lost data, limiting the local repairability to the availability of the nodes in this subset.
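The single-repair-group limitation described above can be seen in a toy XOR-based local layout (an illustrative sketch with made-up parameters, not a construction from the paper): losing a block leaves exactly one subset of nodes able to rebuild it locally.

```python
# Toy locally repairable layout: two data blocks per local group,
# each group protected by one XOR parity (illustrative parameters).
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

d1, d2, d3, d4 = b"\x01", b"\x02", b"\x04", b"\x08"
p1 = xor(d1, d2)  # local parity for group {d1, d2}
p2 = xor(d3, d4)  # local parity for group {d3, d4}

# Repairing a lost d1 touches only its local group (two reads), but
# {d2, p1} is also the *only* subset that can perform this local repair:
repaired_d1 = xor(p1, d2)
assert repaired_d1 == d1
```

If d2 or p1 is unavailable, the local repair of d1 is blocked, which is precisely the availability limitation the abstract points out.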
IEEE Transactions on Information Theory, 2000
Regenerating codes are a class of recently developed codes for distributed storage that, like Reed-Solomon codes, permit data recovery from any subset of k nodes within the n-node network. However, regenerating codes possess, in addition, the ability to repair a failed node by connecting to an arbitrary subset of d nodes. It has been shown that for the case of functional repair, there is a tradeoff between the amount of data stored per node and the bandwidth required to repair a failed node. A special case of functional repair is exact repair, where the replacement node is required to store data identical to that in the failed node. Exact repair is of interest as it greatly simplifies system implementation.
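The functional-repair tradeoff mentioned here is usually stated via the cut-set bound (standard notation from the regenerating-codes literature, not specific to this paper): a file of size B stored with α symbols per node, repaired by downloading β symbols from each of d helpers, must satisfy

```latex
B \le \sum_{i=0}^{k-1} \min\{\alpha,\; (d-i)\beta\}.
```

The two extreme points of the resulting curve are the minimum-storage regenerating (MSR) point, where α = B/k, and the minimum-bandwidth regenerating (MBR) point, which minimizes the total repair download dβ.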
2014 IEEE International Symposium on Information Theory, 2014
The repair locality of a storage code is the maximum number of nodes that may be contacted during the repair of a failed node. Having small repair locality is desirable since it is proportional to the number of disk accesses required during a node repair, which for certain applications seems to be the main bottleneck. However, recent publications show that small repair locality comes with a penalty in terms of code distance or storage overhead, at least if exact repair is required. Here, we first review some of the recent work on possible (information-theoretical) trade-offs between repair locality and other code parameters like storage overhead (or, equivalently, coding rate) and code distance, which all assume the exact repair regime. Then, we present some new information theoretical lower bounds on the storage overhead as a function of the repair locality, valid for most common coding and repair models.
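One widely cited distance penalty of this kind (a known bound for an [n, k] code with repair locality r, stated here for context) is d ≤ n − k − ⌈k/r⌉ + 2, which reduces to the Singleton bound when r ≥ k:

```python
from math import ceil

def singleton_bound(n, k):
    # classical Singleton bound on minimum distance: d <= n - k + 1
    return n - k + 1

def locality_distance_bound(n, k, r):
    # distance bound for codes with repair locality r:
    # d <= n - k - ceil(k/r) + 2
    return n - k - ceil(k / r) + 2

# Example (illustrative parameters): a [16, 10] code with locality r = 5
# gives up one unit of minimum distance relative to the Singleton bound.
assert singleton_bound(16, 10) == 7
assert locality_distance_bound(16, 10, 5) == 6
```

Smaller r buys cheaper repairs at the price of the extra ⌈k/r⌉ − 1 term, which is the kind of locality-versus-distance penalty the abstract refers to.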
IEEE Information Theory Workshop 2010 (ITW 2010), 2010
We consider the setting of data storage across n nodes in a distributed manner. A data collector (DC) should be able to reconstruct the entire data by connecting to any k out of the n nodes and downloading all the data stored in them. When a node fails, it has to be regenerated using the existing nodes. An obvious means of accomplishing this is to use a Reed-Solomon type MDS code where each node stores a single finite field symbol and where one downloads the entire file for regeneration of a failed node. However, storing vectors in place of symbols makes it easy to extract partial information from a node, and helps in reducing the amount of download required for regeneration of a failed node, termed the repair bandwidth.
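The Reed-Solomon-style baseline described above can be sketched with polynomial evaluation over a small prime field (toy parameters, assumed for illustration): any k nodes reconstruct the file, but regenerating even a single node's one symbol requires contacting k nodes and downloading the whole file's worth of data.

```python
P = 257  # toy prime field size (assumed parameter)

def interp_at(pts, x0):
    # Lagrange interpolation over GF(P): evaluate the unique polynomial
    # through the given (x, y) points at x0
    total = 0
    for xi, yi in pts:
        num, den = 1, 1
        for xj, _ in pts:
            if xj != xi:
                num = num * ((x0 - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

msg = [42, 7, 99]  # k = 3 file symbols = coefficients of polynomial f
f = lambda x: sum(c * pow(x, j, P) for j, c in enumerate(msg)) % P
shares = [(x, f(x)) for x in range(1, 7)]  # node x stores the symbol f(x)

# Any k = 3 nodes recover the file -- but regenerating node 1's single
# symbol likewise needs k full downloads, the inefficiency noted above:
subset = [shares[1], shares[3], shares[5]]
assert interp_at(subset, 1) == shares[0][1]
assert interp_at(subset, 0) == 42  # f(0) is the first file symbol
```

Regenerating codes reduce this repair download by letting each node store a vector of symbols and ship only a function of it during repair.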
2014
Google, Amazon, and other services store data in multiple geographically separated disks called nodes, among other reasons, to safeguard the data from node failures. Standard techniques for such a distributed way of storage include multiple backups (typically triple replication) or using erasure codes such as Reed-Solomon codes. The latter codes are the most space-efficient for a targeted worst-case number of simultaneous node failures. They are extremely inefficient, however, for repairing the frequently occurring single node failure. Replication provides the most cost-effective repair in this scenario, but is ultimately untenable given today’s rate of data proliferation. New erasure codes are therefore required to simultaneously optimize storage efficiency, worst-case resilience and repair costs for single node failures. This dissertation looks at two such erasure codes: regenerating codes, which optimize the communication costs, and locally repairable codes (LRCs), which optimize the I/...
2013
We consider the design of regenerating codes for distributed storage systems that enjoy the property of local, exact and uncoded repair, i.e., (a) upon failure, a node can be regenerated by simply downloading packets from the surviving nodes and (b) the number of surviving nodes contacted is strictly smaller than the number of nodes that need to be contacted for reconstructing the stored file.
2009
Erasure coding techniques are used to increase the reliability of distributed storage systems while minimizing storage overhead. Also of interest is minimization of the bandwidth required to repair the system following a node failure. In a recent paper, Wu et al. characterize the tradeoff between the repair bandwidth and the amount of data stored per node. They also prove the existence of regenerating codes that achieve this tradeoff.
IEEE INFOCOM 2014 - IEEE Conference on Computer Communications, 2014
The reliability of erasure-coded distributed storage systems, as measured by the mean time to data loss (MTTDL), depends on the repair bandwidth of the code. Repair-efficient codes provide reliability values several orders of magnitude better than conventional erasure codes. Current state-of-the-art codes fix the number of helper nodes (nodes participating in repair) a priori. In practice, however, it is desirable to allow the number of helper nodes to be adaptively determined by the network traffic conditions. In this work, we propose an opportunistic repair framework to address this issue. It is shown that there exists a threshold on the storage overhead, below which such an opportunistic approach does not lose any efficiency from the optimal storage-repair-bandwidth tradeoff; i.e., it is possible to construct a code simultaneously optimal for different numbers of helper nodes. We further examine the benefits of such opportunistic codes, and derive the MTTDL improvement for two repair models: one with limited total repair bandwidth and the other with limited individual-node repair bandwidth. In both settings, we show orders of magnitude improvement in MTTDL. Finally, the proposed framework is examined in a network setting where a significant improvement in MTTDL is observed.
IEEE ACM Transactions on Networking, 2018
A significant amount of research on using erasure coding for distributed storage has focused on reducing the amount of data that needs to be transferred to replace failed nodes. This continues to be an active topic as the introduction of faster storage devices looks to put an even greater strain on the network. However, with a few notable exceptions, most published work assumes a flat, static network topology between the nodes of the system. We propose a general framework to find the lowest-cost feasible repairs in a more realistic, heterogeneous and dynamic network, and examine how the number of repair strategies to consider can be reduced for three distinct erasure codes. We devote a significant part of the paper to determining the set of feasible repairs for random linear network coding (RLNC) and describe a system of efficient checks using techniques from the arsenal of dynamic programming. Our solution involves decomposing the problem into smaller steps, memoizing, and then reusing intermediate results. All computationally intensive operations are performed prior to the failure of a node to ensure that the repair can start with minimal delay, based on up-to-date network information. We show that all three codes benefit from being network aware and find that the extra computations required for RLNC can be reduced to a viable level for a wide range of parameter values.
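For RLNC, the feasibility check described above reduces to rank computations over the coding field. A minimal sketch over GF(2) (real deployments use larger fields; the encoding vectors here are hypothetical), with memoization so that results computed before a failure can be reused across candidate helper sets:

```python
from functools import lru_cache

def gf2_rank(rows):
    # Gaussian elimination over GF(2); each row is an int bitmask
    basis = {}  # leading-bit position -> reduced row
    for row in rows:
        cur = row
        while cur:
            lead = cur.bit_length() - 1
            if lead not in basis:
                basis[lead] = cur
                break
            cur ^= basis[lead]
    return len(basis)

# Hypothetical RLNC encoding vectors over GF(2) for k = 3 source packets
VECTORS = {"A": 0b100, "B": 0b010, "C": 0b110, "D": 0b001}

@lru_cache(maxsize=None)  # memoize: reuse results across candidate sets
def feasible(helpers, k=3):
    # a helper set can rebuild the data iff its vectors have full rank k
    return gf2_rank([VECTORS[h] for h in helpers]) == k

assert feasible(("A", "B", "D"))
assert not feasible(("A", "B", "C"))  # C = A xor B, so rank is only 2
```

Caching rank results for shared subsets is one way to realize the paper's idea of decomposing the problem and reusing intermediate results ahead of a failure.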
In distributed storage systems, regenerating codes can achieve the optimal tradeoff between storage capacity and repair bandwidth. However, a critical drawback of existing regenerating codes, in general, is their high coding and repair complexity, since the coding and repair processes involve expensive multiplication operations in finite fields. In this paper, we present a design framework for regenerating codes, named BASIC regenerating codes, which employs binary addition and bitwise cyclic shift as the elemental operations. The proposed BASIC regenerating codes can be regarded as a concatenated code with the outer code being a binary parity-check code, and the inner code being a regenerating code utilizing the binary parity-check code as the alphabet. We show that the proposed functional-repair BASIC regenerating codes can asymptotically achieve the fundamental storage-repair-bandwidth tradeoff curve of functional-repair regenerating codes with less computational complexity. Furthermore, we demonstrate that the existing exact-repair product-matrix construction of regenerating codes can be modified to exact-repair BASIC product-matrix regenerating codes with much lower encoding, repair, and decoding complexity in the theoretical analysis, and shorter encoding, repair, and decoding times in the implementation results.
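The shift-and-XOR arithmetic such a framework is built on can be illustrated with a toy parity (a sketch of the elemental operations only, with made-up block contents, not the actual BASIC construction):

```python
def rot(block, s, L):
    # cyclic right-shift by s of an L-bit block represented as an int
    s %= L
    mask = (1 << L) - 1
    return ((block >> s) | (block << (L - s))) & mask

L = 8
d1, d2 = 0b10110001, 0b01100110
# Encoding: a parity symbol is an XOR of cyclically shifted data blocks,
# so no finite-field multiplications are needed.
parity = rot(d1, 1, L) ^ rot(d2, 3, L)

# Repair of d1 from d2 and the parity again uses only shifts and XORs:
assert rot(parity ^ rot(d2, 3, L), L - 1, L) == d1
```

Cyclic shifts and XORs map to very cheap CPU operations, which is the source of the complexity savings claimed over finite-field multiplication.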
EuroSys 2016
With the explosion of data in applications all around us, erasure-coded storage has emerged as an attractive alternative to replication because, even with significantly lower storage overhead, it provides better reliability against data loss. The Reed-Solomon code is the most widely used erasure code because it provides maximum reliability for a given storage overhead and is flexible in the choice of coding parameters that determine the achievable reliability. However, reconstruction time for unavailable data becomes prohibitively long, mainly because of network bottlenecks. Some proposed solutions either use additional storage or limit the coding parameters that can be used. In this paper, we propose a novel distributed reconstruction technique, called Partial Parallel Repair (PPR), which divides the reconstruction operation into small partial operations and schedules them on multiple nodes already involved in the data reconstruction. A distributed protocol then progressively combines these partial results to reconstruct the unavailable data blocks, reducing network pressure. Theoretically, our technique can complete the network transfer in ⌈log2(k + 1)⌉ time units, compared to the k time units needed for a (k, m) Reed-Solomon code. Our experiments show that PPR reduces repair time and degraded read time significantly. Moreover, our technique is compatible with existing erasure codes and does not require any additional storage overhead. We demonstrate this by overlaying PPR on top of two prior schemes, Local Reconstruction Code and Rotated Reed-Solomon code, to gain additional savings in reconstruction time.
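The ⌈log2(k + 1)⌉-versus-k gap comes from combining partial results pairwise in a binary tree. A minimal simulation of that combining pattern (illustrative only, with XOR standing in for the partial-sum operation):

```python
from math import ceil, log2

def tree_combine(parts):
    # combine partial results pairwise per round, tree-style
    rounds = 0
    while len(parts) > 1:
        nxt = [a ^ b for a, b in zip(parts[0::2], parts[1::2])]
        if len(parts) % 2:
            nxt.append(parts[-1])  # odd one out waits for the next round
        parts, rounds = nxt, rounds + 1
    return parts[0], rounds

# k = 6 helper contributions: serial transfer to one node takes 6 rounds,
# while the tree finishes in 3 <= ceil(log2(6 + 1)) rounds (the quoted
# bound also accounts for delivering the final result).
total, rounds = tree_combine([1, 2, 4, 8, 16, 32])
assert total == 63 and rounds == 3
assert rounds <= ceil(log2(6 + 1))
```

Each round transfers at most one block per participating link, so latency scales with tree depth rather than with k.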