0% found this document useful (0 votes)
48 views19 pages

United States: (12) Patent Application Publication (10) Pub. No.: US 2011/0289351 A1

This patent application describes a distributed storage system that stores source data across storage nodes in an encoded and redundant manner. When a storage node fails, it can be repaired by downloading data from other existing nodes in a way that minimizes storage space and network bandwidth usage. The system allows end users to recover the source data by connecting to subsets of storage nodes. It employs erasure coding and regenerating codes to distribute, store, and repair data in an efficient manner.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
48 views19 pages

United States: (12) Patent Application Publication (10) Pub. No.: US 2011/0289351 A1

This patent application describes a distributed storage system that stores source data across storage nodes in an encoded and redundant manner. When a storage node fails, it can be repaired by downloading data from other existing nodes in a way that minimizes storage space and network bandwidth usage. The system allows end users to recover the source data by connecting to subsets of storage nodes. It employs erasure coding and regenerating codes to distribute, store, and repair data in an efficient manner.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 19

US 20110289351A1

( 19) United Sta tes


( 12) Pa tent Applica tion Publica tion ( 10) Pub. N o. : US 2011/0289351 A1
RASHMI et a l. ( 4 3) Pub. D a te: N OV. 24 , 2011
( 54 ) D ISTRIBUTED STORAGE SYSTEM AN D A Publica tion C la s s i? ca tion
METHOD THEREOF ( 51) Int- C l
( 7 5) l t K V RASHMI ( US) N 'h B G06F 11/20 ( 2006. 01)
nven ors : . . ' 1 a r .
52 US. C l. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 14 /632~ 7 14 /E11. 084
SHAH, ( US) ; P. Vij a y KUMAR, ( )
( Us ) ( 57 ) ABSTRAC T
_ Embodiments of the dis clos ure rela te to a dis tributed s tora ge
( 7 3) As s lgnee? IN D IAN IN STITUTE OF s y s tem compris ing of s tora ge nodes s toring s ource da ta
SC IEN C E, Ba nga lore ( IN ) a mongs t them in a coded, a nd ty pica lly , redunda nt ma nner.
The da ta to be s tored in the s tora ge nodes ma y be dis s emi
( 21) Appl. N o. : 13/110, 534 na ted from the s ource a cros s the network in a dis tributed
ma nner. The s y s tem a ls o includes end us ers who recover the
( 22) Filed; Ma y 18, 2011 s ource da ta by connecting to s ubs ets of the s tora ge nodes . A
fa iled s tora ge node ma y be repa ired by downloa ding da ta
( 30) Foreign Applica tion Priority D a ta from s ubs ets of ex is ting nodes . The s tora ge s pa ce req uired in
the nodes , a nd the network ba ndwidth utiliZ ed for repa ir a re
Ma y 21, 2010 ( IN ) . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 21/C HE/2010 minimiZ ed
114 m0
106
End [ us er
106 102
Stora ge node
116
Repla cement
node
Source W
da ta '
I 110 112
Pa tent Applica tion Publica tion N ov. 24 , 2011 Sheet 1 0f 6 US 2011/0289351 A1
HBO
we
End ueer
Q D RB
106 102
Stora ge nq q e
N etwork
Repla cement
node
114
Source
H03 112
Figure 1
Pa tent Applica tion Publica tion N ov. 24 , 2011 Sheet 2 0f 6 US 2011/0289351 A1
Mes s a ge C ons truction o1b All > Encoding
Sy mbols mes s a ge ma trix '
Figure 2
Pa tent Applica tion Publica tion N ov. 24 , 2011 Sheet 3 0f 6 US 2011/0289351 A1
nd Us er
~ ( a te coilector)
node ik
a ta recovery
nodes
Figure 3[ \
D a ta recons truction block a t end- us er
w? 1' , 7 , 7 1
Input m if . s
from k m. j rj U Era s ure Hr _ Source
da ta * " decoding " da ta
recovery .
nodes r I
- Ji'j r 7
+ 311;
Figure 3B
Pa tent Applica tion Publica tion N ov. 24 , 2011 Sheet 4 0f 6 US 2011/0289351 A1
b1
Helper nodes
Figure 4 A
Heiper N ode:
FYI. As s is ting Block
Figure 4 B
Pa tent Applica tion Publica tion N ov. 24 , 2011 Sheet 5 0f 6 US 2011/0289351 A1
Repa ir Block a t repia cement nply de
Input 1; ? Eur [ 1.
- 1
from it; a ir/ilk 151w
Figure 4 C
N ode 1 N a me 2 N ix ie 3 i ? ii? i 1i
K2 gm, 2114 m? a + my % + m?
* y
K m2 m5 m8 332 + m5 ~+ ~ mg
/ R23 { 116 m5 mg + mg 11 m3
m3 mq ? ny nra mi? my iimg
a m m3 m2 " '1' m2 + m3 mii + 2m; + 3mg mi + 3m? + 2mg
: 21. 1% m5 m4 + m5 7 + mg mgi + 21115 ~ ~ 31: 15 115 1+ 31115 + 2ma
m; m3 m; + ms + mg m; < 1 21% + 3mg m; + Ema + 2mg
N ude % Ma de 2 N a cie 3 iimz ie d N a de 5
Figure 5
Pa tent Applica tion Publica tion N ov. 24 , 2011 Sheet 6 0f 6 US 2011/0289351 A1
Kmies W i? i co? mi da ta
a 4 ! a An P1 0 a t. T E. u nu
g
Fiod? - W iiimm da m
Figure 6
US 2011/0289351A1
D ISTRIBUTED STORAGE SYSTEM AN D A
METHOD THEREOF
TEC HN IC AL FIELD
[ 0001] Embodiments of the pres ent dis clos ure rela te to dis
tributed s tora ge networks . More pa rticula rly , the embodi
ments rela te to a n optimiz ed dis tributed s tora ge s y s tem for
repa iring fa iled s tora ge nodes a nd recons tructing the s ource
da ta from the da ta s tored in s ubs ets of nodes .
BAC KGROUN D
[ 0002] In a dis tributed s tora ge s y s tem, informa tion perta in
ing to a da ta ? le is dis pers ed a cros s nodes in a netW ork in s uch
a ma nner tha t a n end- us er ca n retrieve the da ta s tored by
ta pping into a s ubs et of the nodes . A popula r option tha t
reduces netW ork conges tion a nd tha t lea ds to increa s ed res il
iency in the fa ce of node fa ilures , is to employ era s ure coding,
for ex a mple by ca lling upon ma x imum- dis ta nce- s epa ra ble
( MD S) codes s uch a s Reed- Solomon ( RS) codes . Let B be the
tota l ? le s iz e mea s ured in terms of s y mbols over a ? nite ? eld
Fq of s iz e q . W ith RS codes , da ta is s tored a cros s n nodes in the
netW ork in s uch a W a y tha t the entire da ta ca n be recovered by
a da ta collector by connecting to a ny k nodes , a proces s of
da ta recovery tha t W e W ill refer to a s recons truction. Severa l
dis tributed s tora ge s y s tems s uch a s RAID - 6, Ocea nStore a nd
Tota l Reca ll employ s uch a n era s ure- coding option.
[ 0003] Regenera ting a lgorithms or codes in a dis tributed
s tora ge netW ork perform the regenera tion of a fa iled node.
Upon fa ilure of a n individua l node, a s elf- s us ta ining da ta
s tora ge netW ork mus t neces s a rily pos s es s the a bility to regen
era te ( or repa ir) a fa iled node. An obvious mea ns to a ccom
plis h this is to permit the repla cement node to connect to a ny
k nodes , doW nloa d the entire da ta , a nd ex tra ct the da ta tha t
W a s s tored in the fa iled node. But doW nloa ding the entire B
units of da ta in order to recover the da ta s tored in a s ingle node
tha t s tores only a fra ction of the entire da ta ? le is W a s teful, a nd
ra is es the q ues tion a s to W hether there is a better option.
[ 0004 ] C onventiona l RS codes trea t ea ch fra gment s tored in
a node a s a s ingle s y mbol belonging to the ? nite ? eld Fq . It ca n
be s hoW n tha t W hen individua l nodes a re res tricted to perform
only linea r opera tions over Fq , the tota l a mount of da ta doW n
loa d needed to repa ir a fa iled node ca n be no s ma ller tha n B,
the s iz e of the entire ? le. In contra s t, regenera ting codes a re
codes over a vector a lpha bet a nd hence trea t ea ch fra gment a s
being compris ed of a s y mbols over the ? nite ? eld Fq . Linea r
opera tions over R, in this ca s e, permit the tra ns fer of a fra ction
of the da ta s tored a t a pa rticula r node. Apa rt from this neW
pa ra meter 0t, tW o other pa ra meters ( d, [ 3) a re a s s ocia ted W ith
regenera ting codes . Under the de? nition of regenera ting
codes , a fa iled node is permitted to connect to a ? x ed number
d of the rema ining nodes W hile doW nloa ding [ 32a s y mbols
from ea ch node. This proces s is termed a s regenera tion a nd
the tota l a mount d[ 3 of da ta doW nloa ded for repa ir purpos es a s
the repa ir ba ndW idth. Ty pica lly , W ith a regenera ting code, the
a vera ge repa ir ba ndW idth is s ma ll compa red to the s iz e of the
? le B.
[ 0005] It W ill be a s s umed throughout the des cription, tha t
for [ n, k, d] regenera ting code, k, d a re the minimum va lues
under W hich recons truction a nd regenera tion ca n a lW a y s be
gua ra nteed. This res tricts the ra nge of d to
k d n- l ( 1)
[ 0006] The ? rs t ineq ua lity a ris es beca us e if the regenera
tion pa ra meter d W ere les s tha n the da ta - recons truction
N ov. 24 , 2011
pa ra meter k then one could, in fa ct, recons truct da ta by con
necting to a ny d< k nodes ( trea ting the da ta s tored in every
other node a s a function of tha t s tored in thes e d nodes )
thereby contra dicting the minima lity of k. Fina lly , W hile a
regenera ting code over Fq is a s s ocia ted W ith the collection of
pa ra meters
it W ill be found more convenient to rega rd pa ra meters { n, k,
d} a s prima ry a nd { ( x , [ 3, B} a s s econda ry a nd thus W e W ill
ma ke freq uent references in the s eq uel, to a code W ith thes e
pa ra meters a s a n [ n, k, d] regenera ting code ha ving pa ra meter
s et ( 0t, [ 3, B) .
[ 0007 ] Ex a ct vers us Functiona l Regenera tion: In the con
tex t of a regenera ting code, by functiona l regenera tion of a
fa iled node v, W e W ill mea n, repla cement of the fa iled node by
a neW node v' in s uch a W a y tha t folloW ing repla cement, the
res ulting netW ork of n nodes continues to pos s es s the recon
s truction a nd regenera tion properties . In contra s t, by ex a ct
regenera tion mea ns repla cement of a fa iled node vby a
repla cement node v' tha t s tores ex a ctly the s a me da ta a s W a s
s tored in node v. An ex a ct- regenera ting code is a regenera ting
code tha t ha s the ca pa bility of ex a ctly regenera ting ea ch
ins ta nce of a fa iled node. C lea rly W here it is pos s ible, a n
ex a ct- regenera tion code is to be preferred over a functiona l
regenera tion code s ince under functiona l regenera tion; there
is need for the netW ork to inform a ll nodes in the netW ork of
the repla cement, W herea s this is clea rly unneces s a ry under
ex a ct regenera tion.
[ 0008] C ut- Set Bound a nd the Stora ge- Repa ir- Ba ndW idth
Tra de- off ha s a ma j or res ult in the ? eld of regenera ting codes ,
tha t us es the cut- s et bound of netW ork coding to es ta blis h tha t
the pa ra meters of a functiona l regenera ting code mus t neces
s a rily s a tis fy :
1H ( 2)
[ 0009] It is des ira ble to minimiz e both 0t a s W ell a s [ 3 s ince
minimiz ing a res ults in a minimum s tora ge s olution W hile
minimiz ing [ 3 ( for ? x ed d) res ults in a s tora ge s olution tha t
minimiz es repa ir ba ndW idth. It ca n be deduced from ( 2) tha t
it is not pos s ible to minimiz e both 0t a nd [ 3 s imulta neous ly a nd
there is in fa ct a tra deoffbetW een choices of the pa ra meters 0t
a nd [ 3. The tW o ex treme points in this tra deoff a re termed the
minimum s tora ge regenera tion ( MSR) a nd minimum ba nd
W idth regenera tion ( MBR) points res pectively . The pa ra m
eters 0t a nd [ 3 for the MSR point on the tra deoff ca n be
obta ined by ? rs t minimiz ing 0t a nd then minimiz ing [ 3 to
obta in
_ B ( 3)
a _ I,
_ B
B _ k( d - k + 1)
[ 0010] Revers ing the order, lea ds to the MBR point W hich
thus corres ponds to
US 2011/0289351Al
[ 0011] W e de? ne a n optima l [ n, k, d] regenera ting code a s
a code W ith pa ra meters ( 0t, [ 3, B) s a tis fy ing the tW in req uire
ments tha t
l) the pa ra meter s et ( 0t, [ 3, B) a chieves the cut- s et bound W ith
eq ua lity a nd
2) decrea s ing either 0t or [ 3 W ill res ult in a neW pa ra meter s et
tha t viola tes the cut s et bound.
[ 0012] An MSR code is then de? ned a s a n [ n, k, d] regen
era ting code W hos e pa ra meters ( 0t, [ 3, B) s a tis fy ( 3) a nd s imi
la rly , a n MBR code a s one W ith pa ra meters ( 0t, [ 3, B) s a tis fy
ing ( 4 ) . C lea rly , both MSR a nd MBR codes a re optima l
regenera ting codes .
[ 0013] The na ture of the cut- s et bound permits a divide
a nd- conq uer a pproa ch to be us ed in the a pplica tion of optima l
regenera ting codes to la rge ? le s iZ es , thereby s implify ing
s y s tem implementa tion
[ 0014 ] Given a n optima l [ n, k, d] regenera ting code W ith
pa ra meter s et ( 0t, [ 3, B) , a s econd optima l regenera ting code
W ith pa ra meter s et ( 00: 6 0t, [ 3': 6 [ 3, B': 6 B) for a ny pos itive
integer 6 ca n be cons tructed, by dividing the 6 B mes s a ge
s y mbols into 6 groups of B s y mbols ea ch, a nd a pply ing the
( 0t, [ 3, B) code to ea ch group independently . Secondly , a
common fea ture of both MSR a nd MBR regenera ting codes is
tha t in either ca s e, their pa ra meter s et ( 0t, [ 3, B) is s uch tha t
both 0t a nd B a re multiples of [ 3. It folloW s tha t if one ca n
cons truct a n ( optima l) [ n, k, d] MSR or MBR code W ith [ 3: 1,
then one ca n cons truct a n ( optima l) [ n, k, d] MSR or MBR
code for a ny la rger va lue of [ 3.
[ 0015] The problem of minimiZ ing repa ir ba ndW idth for
functiona l repa ir of a fa iled s tora ge node is cons idered, W here
the evolution of the s tora ge netW ork through a s eq uence of
fa ilures a nd regenera tions is repres ented a s a netW ork, W ith a ll
pos s ible da ta collectors repres ented a s s inks . The da ta - recon
s truction req uirement is formula ted a s a multica s t netW ork
coding problem, W ith the netW ork ha ving a n in? nite number
of nodes . The cut- s et a na ly s is of this netW ork lea ds to the
rela tion betW een the pa ra meters of a regenera ting code given
in eq ua tion ( 2) . It ca n be s een tha t there is a tra deoff betW een
the choice of the pa ra meters 0t a nd [ 3 for a ? x ed B a nd this is
termed a s the s tora ge pa ir ba ndW idth tra deoff. This tra deoff is
a chieva ble under functiona l- regenera tion. HoW ever, the cod
ing s chemes s ugges ted a re not ex plicit a nd req uire la rge ? eld
s iz e.
[ 0016] A code s tructure tha t gua ra ntees ex a ct- regenera tion
of j us t the s y s tema tic nodes is provided for the MSR point
W ith pa ra meters [ n: d+ l, k, d>: 2k l] . This code ma kes us e
of interference a lignment, a nd is termed a s the MISER code.
Subs eq uently , it W a s s hoW n tha t for this s et of pa ra meters a nd
the code introduced for ex a ct- regenera tion of only the s y s
tema tic nodes ca n a ls o be us ed to repa ir the non- s y s tema tic
( pa rity ) node fa ilures ex a ctly provided repa ir cons truction
s chemes a re a ppropria tely des igned. Such a n ex plicit repa ir
s cheme is indeed des igned, W hich conta ins a n ex a ct- regener
a ting MSR code for pa ra meter s et [ n: 5, k: 3, d: 4 ] . A proof of
non- a chieva bility of the cut- s et bound on ex a ct- regenera tion
a t the MSR point, for the pa ra meters [ n, k, d< 2k 3] W hen
N ov. 24 , 2011
[ 3: 1 On the other ha nd, the MSR point is s hoW n to be a chiev
a ble in the limiting ca s e of B a pproa ching in? nity ( i. e. , [ 3
a pproa ching in? nity ) .
SUMMARY
[ 0017 ] The s hortcomings of the prior a rt a re overcome a nd
a dditiona l a dva nta ges a re provided through the provis ion of a
method a nd s y s tem a s des cribed in the des cription.
[ 0018] In one embodiment, the dis clos ure provides a dis
tributed da ta s tora ge netW ork. The netW ork compris es of a
plura lity of s tora ge nodes . An encoder is con? gured to obta in
a nd a s s ign pre- determined a mount of da ta to be s tored in ea ch
s tora ge node in the netW ork. An end us er compris ing of da ta
recons truction block con? gured to obta in s ource da ta by con
necting to one of a plura lity of s ubs ets of the s tora ge nodes .
Als o, W hen a s tora ge node fa ils , it is repla ced by a neW node
a nd cons is ting of a repa ir block for retrieving the da ta s tored
da ta s tored in the fa iled node by connecting to one of a
plura lity of s ubs ets of s tora ge nodes . Further, ea ch s tora ge
node cons is ts of a da ta a s s is ting block to compute a nd tra ns
mit da ta to a s s is t in the repa ir of fa iled nodes .
[ 0019] In a nother embodiment, the dis clos ure provides a
method of obta ining a dis tributed s tora ge netW ork. The
method compris es of performing coding opera tion on s ource
da ta of s iZ e B for obta ining a s et of coded da ta , a s s igning ea ch
node W ith a prede? ned a mount of coded da ta ( 0t) , performing
da ta recons truction opera tion a t the da ta recons truction block
a t a n end us er by connecting to a predetermined number of
s tora ge nodes ( k) , performing repa ir opera tion a t repa ir block
of the neW node repla cing a fa iled s tora ge node by retrieving
da ta s tored in the fa iled node by doW nloa ding prede? ned
a mount of coded da ta ( [ 3) from a predetermined number of
s tora ge nodes ( d) ; a nd a s s is ting in the repa ir of fa iled node by
performing a computa tion by the repa ir a s s is ting block of d
s tora ge nodes chos en by s a id neW node a nd tra ns mitting the
res ult of the computa tion to s a id neW node.
BRIEF D ESC RIPTION OF THE
AC C OMPAN YIN G FIGURES
[ 0020] The novel fea tures a nd cha ra cteris tics of the dis clo
s ure a re s et forth in the a ppended cla ims . The embodiments of
the dis clos ure its elf, hoW ever, a s W ell a s a preferred mode of
us e, further obj ectives a nd a dva nta ges thereof, W ill bes t be
unders tood by reference to the folloW ing deta iled des cription
of a n illus tra tive embodiment W hen rea d in conj unction W ith
the a ccompa ny ing ? gures . One or more embodiments a re
noW des cribed, by W a y of ex a mple only , W ith reference to the
a ccompa ny ing ? gures in W hich:
[ 0021] FIG. 1 illus tra tes one embodiment of a dis tributed
s tora ge netW ork W ith s tora ge nodes .
[ 0022] FIG. 2 illus tra tes the block dia gra m of the Encoder.
[ 0023] FIG. 3A illus tra tes da ta recovery by a n end us er
( da ta collector) by connecting to a ny k nodes ,
[ 0024 ] FIG. 3B illus tra tes a block dia gra m of the da ta
recons truction block a t the end- us ers .
[ 0025] FIG. 4 A illus tra tes repa ir of a fa iled node, node f,
W ith the repla cement node f doW nloa ding da ta from a ny d
rema ining nodes .
[ 0026] FIG. 4 B illus tra tes the intera ction betW een the
repla cement node f a nd repa ir a s s is ting block of the helper
node.
[ 0027 ] FIG. 4 C illus tra tes repa irblock a t repla cement node,
a s one embodiment.
US 2011/0289351Al
[ 0028] FIG. 5 illus tra tes a n ex a mple of the tW in code fra me
W ork.
[ 0029] FIG. 6 illus tra tes da ta deploy ment us ing the TW in
code fra mework.
[ 0030] The ? gures depict embodiments of the dis clos ure
for purpos es of illus tra tion only . One s killed in the a rt W ill
rea dily recogniz e from the folloW ing des cription tha t a lterna
tive embodiments of the s tructures a nd methods illus tra ted
herein ma y be employ ed W ithout depa rting from the prin
ciples of the dis clos ure des cribed herein.
D ETAILED D ESC RIPTION
[ 0031] The foregoing ha s broa dly outlined the fea tures a nd
technica l a dva nta ges of the pres ent dis clos ure in order tha t the
deta iled des cription of the dis clos ure tha t folloW s ma y be
better unders tood. Additiona l fea tures a nd a dva nta ges of the
dis clos ure W ill be des cribed hereina fter W hich form the s ub
j ect of the cla ims of the dis clos ure. It s hould be a pprecia ted by
thos e s killed in the a rt tha t the conception a nd s peci? c
embodiment dis clos ed ma y be rea dily utiliZ ed a s a ba s is for
modify ing or des igning other s tructures for ca rry ing out the
s a me purpos es of the pres ent dis clos ure. It s hould a ls o be
rea liZ ed by thos e s killed in the a rt tha t s uch eq uiva lent con
s tructions do not depa rt from the s pirit a nd s cope of the
dis clos ure a s s et forth in the a ppended cla ims . The novel
fea tures W hich a re believed to be cha ra cteris tic of the dis clo
s ure, both a s to its orga niZ a tion a nd method of opera tion,
together W ith further obj ects a nd a dva nta ges W ill be better
unders tood from the following des cription W hen cons idered
in connection W ith the a ccompa ny ing ? gures . It is to be
ex pres s ly unders tood, hoW ever, tha t ea ch of Figures is pro
vided for the purpos e of illus tra tion a nd des cription only a nd
is not intended a s a de? nition of the limits of the pres ent
dis clos ure.
[ 0032] An ex empla ry embodiment of the pres ent dis clos ure
is a s y s tem 100 to ena ble da ta to be s tored in a dis tributed
ma nner compris ing of a s et of n s tora ge nodes 106, ea ch
ha ving a ca pa city of a s y mbols , connected through a netW ork
a nd s toring s y mbols coded from a s et of B s ource s y mbols .
FIG. 1 is a n illus tra tion of the dis tributed s tora ge s y s tem. An
encoder 108 is illus tra ted a s block dia gra m repres enta tion in
the FIG. 2 W hich computes the coded s y mbols to be s tored in
the s tora ge nodes . End- us ers 102, a ls o ca lled da ta collectors ,
ha ving connectivity to a k- s iZ ed s ubs et of the s tora ge nodes
con? gured to decode the s tored coded- s y mbols in order to
recover the da ta . A repla cement 112 for fa iled node 110, us ing
a regenera ting decoder ( repa ir block 114 ) , being a ble to
recover the da ta tha t W a s s tored in the fa iled node by doW n
loa ding [ 3 s y mbols ea ch from a d- s iZ ed s ubs et of the ex is ting
s tora ge nodes .
[ 0033] The encoder 108 is con? gured to encode the s ource
da ta , a nd a s s ign pre- determined a mount of da ta to be s tored in
ea ch s tora ge node 106 in the netW ork 104 . An end us er 102
compris ing of a proces s ing unit con? gured to obta in s ource
da ta by connecting to one of a plura lity of s ubs ets of the
s tora ge nodes S 106. Als o, ea ch s tora ge node 106 cons is ts of
a repa ir block 114 ( RB) or a mea ns for repla cing a ny fa iled
s tora ge node to repa ir a s tora ge node 110 W ith a neW s tora ge
node 112 a nd retrieving the da ta s tored in s a id fa iled node by
connecting to one of a plura lity of s ubs ets of s tora ge nodes .
The s tora ge node 106, 112 a ls o cons is ts of a repa ir a s s is ting
block 116 ( RAB) or mea ns for computing a nd tra ns mitting
da ta to a s s is t in the repa ir of fa iled nodes . The repa ir a s s is ting
block 116 ( RAB) is s hoW n in FIG. 4 B.
N ov. 24 , 2011
[ 0034 ] FIGS. 3A a nd 3B illus tra te the da ta recovery a t the
end us er ( da ta collector) , a nd FIGS. 4 A, 4 8 a nd 4 C illus tra te
repa ir of a fa iled node. The proces s of a da ta collector recov
ering the s ource da ta is ca lled recons truction, a nd the proces s
of repla cement node obta ining the da ta tha t W a s s tored in the
fa iled node is ca lled repa ir or regenera tion. The a mount of
da ta doW nloa d tha t the repla cement node needs to perform is
termed a s the repa ir ba ndW idth, d[ 3.
[ 0035] D a ta D is tribution is us ed to dis tribute da ta , W hich is
genera ted a t a point s ource, a cros s a netW ork to ena ble recon
s truction of da ta in a dis tributed ma nner. A s tora ge node tha t
conta ins the a ppropria te encoded da ta is referred a s a ? lled
node, a nd s tora ge nodes tha t do not conta in a ny da ta a s the
empty nodes .
[ 0036] Initia lly , a ll the s tora ge nodes a re a s s umed to be
empty . Firs t, the da ta s ource encodes the da ta to be s tored in
its neighbouring empty nodes , a nd tra ns mits this da ta to them.
Subs eq uently , the nodes in the neighbourhood of ? lled nodes
doW nloa d da ta from them us ing the regenera tion a lgorithm.
This proces s goes on till a ll the nodes a re ? lled.
[ 0037 ] The tota l a mount of da ta tra ns fer in this proces s is
the minimum pos s ible, a nd eq ua l to the a mount of da ta tra ns
fer req uired if a centra l node W ould ha ve tra ns mitted a ppro
pria te encoded da ta to a ll the s tora ge nodes . HoW ever, the
dis tributed na ture of the da ta dis tribution ha s s evera l a dva n
ta ges . Loa d ba la ncing is a chieved, a nd the tra ? ic is uniform
a cros s the netW ork. The s ource node ca n become una va ila ble
a fter the initia l tra ns mis s ion of coded da ta to the s tora ge
nodes in its neighbourhood. Though the tota l a mount of da ta
tra ns fer is eq ua l, or nea rly eq ua l, in both ca s es , the number of
hops a cros s the netW ork tha t a ny da ta pa cket W ill ha ve to
tra vel in the ca s e of thes e codes W ill be cons idera bly les s er,
thereby reducing the netW ork conges tion.
[ 0038] The da ta s tored in the ith s tora ge node in the s y s tem
is completely determined a s ingle encoding vector, a s com
pa red to a la rge genera tor ma trix in genera l. The encoding
vector s uf? ces for encoding, recons truction, a nd regenera
tion. The s hort length of the encoding vector reduces the
overhea d a s s ocia ted W ith the need for nodes to communica te
their encoding vectors to the D C during recons truction, a nd to
the repla cement node during regenera tion of a fa iled node.
[ 0039] The dis clos ure provides three va ria tions of s uch dis
tributed s tora ge s y s tems , a s des cribed s ubs eq uently . The
choice of the s y s tem to be us ed W ill depend on the tra de- off
betW een the cos ts of s tora ge s pa ce, netW ork ba ndW idth utili
Z a tion a nd connectivity in the netW ork. The ? rs t va ria tion
W orks on the netW ork ba ndW idth utiliZ a tion a nd repa irs a
fa iled node us ing the minimum pos s ible repa ir ba ndW idth.
Als o, this va ria tion minimiZ es the s tora ge s pa ce. The s econd
va ria tion is s uited W hen there is a higher price for s tora ge
s pa ce. It ? rs t minimiZ es the a mount of da ta needed to be
s tored in ea ch node, a nd then reduces the repa ir ba ndW idth to
the minimum pos s ible for this a mount of s tora ge. Thes e tW o
s y s tems ena ble a da ta collector to recover the entire s ource
da ta by doW nloa ding da ta from a ny k nodes , a nd a ls o permit
node repa ir by connecting to a ny d nodes . The ? rs t a nd s econd
va ria tions a re ca lled a s common product- ma trix fra meW ork.
The third va ria tion s imulta neous ly a chieves both the a b s olute
minimum s tora ge, a s W ell a s a bs olute minimum repa ir ba nd
W idth. The third va ria tion is ca lled a s TW in- code fra meW ork.
[ 004 0] All three va ria tions fa ll under a common fra me
W ork, in W hich the s tora ge nodes s tore coded da ta or s y mbols
tha t is ex pres s ed by the ma trix product ex pres s ion IP M. Ea ch
codeW ord in the dis tributed s tora ge code ca n be repres ented
US 2011/0289351 A1
by a n ( nx a ) code ma trix C whos e ith row cti, conta ins a
s y mbols s tored by the ith node. Ea ch code ma trix is a product
C IW M ( 1)
of a n ( n>< d) encoding ma trix II a nd a n ( d>< 0t) mes s a ge ma trix
M. The entries of the ma trix 11) a re ? x ed a priori a nd a re
independent of the mes s a ge s y mbols . The mes s a ge ma trix M
conta ins the B mes s a ge s y mbols , with s ome s y mbols pos s ibly
repea ted. The ith row 1] ) , - of II a re referred a s the encoding
vector of node i a s it is this vector tha t is us ed to encode the
mes s a ge into the form in which it is s tored within the ith node:
C IZ IIIPIM ( 2)
where the s upers cript t is us ed to denote the tra ns pos e of a
ma trix . The choice of T a nd M depends on the va ria tion under
cons idera tion. Ea ch node is a s s ocia ted with a vector ( a row of
1I ) , a nd the encoder multiplies tha t vector with M to obta in
the s y mbols s tored in the node. The s ource s y mbols or da ta
a re from a ? nite ? eld Fq , a nd a ny s tored s y mbol is a linea r
combina tion of thes e s y mbols over Fq . The s iZ e of the ? nite
? eld req uired is of the order of n, a nd no further res trictions
a re impos ed on the ? eld s iZ e.
[ 004 1] The entire da ta is divided into s tripes of s iZ es corre
s ponding to [ 3: 1. Since ea ch s tripe is of minima l s iZ e, the
complex ity of encoding, recons truction a nd regenera tion
opera tions a re cons idera bly lowered, a nd s o a re the buffer
s iZ es req uired a t da ta collectors a nd repla cement nodes . Fur
thermore, the opera tions tha t need to be performed on ea ch
s tripe a re identica l a nd independent, a nd hence ca n be per
formed in pa ra llel e? iciently by a GPU/FPGA/multi- core
proces s or.
[ 004 2] The end- us ers downloa d the da ta s tored in a s ubs et
of k nodes , a nd a ls o downloa d a diges t ( i. e. , a pos s ibly con
dens ed form) of their node vectors , a nd a decoder a t the
end- us er ca n recover the s ource da ta from this informa tion.
[ 004 3] A repla cement node cons is ts of a decoder tha t ? rs t
a s s imila tes the downloa ded da ta , a nd s ubs eq uently decodes
this coded da ta to recover the da ta tha t wa s s tored in the fa iled
node.
[ 004 4 ] Ea ch s tora ge node ca n a ls o a s s is t in the repa ir of a ny
fa iled node. Onbeing conta cted by the node repla cing a fa iled
node, a s tora ge node computes certa in linea r combina tions of
the s y mbols s tored in it, this linea r combina tion being depen
dent on the encoding vector of the fa iled node, a nd pa s s es on
the res ult.
[ 004 5] The following a re the s teps performed in recon
s truction a nd regenera tion for the three va ria tions of the dis
tributed s tora ge s y s tem.
[ 004 6] In one embodiment, the ? rs t va ria tion is one of the
product- ma trix fra mework which ha s a da ta collector ca n
recover the entire s ource by downloa ding da ta from a ny k
s tora ge nodes , a nd a fa iled node ca n be regenera ted by a ny d
ex is ting nodes . The common s tructure of the code ma trices
lea ds to common a rchitectures for both da ta - recons truction
a nd ex a ct- regenera tion.
[ 004 7 ] D a ta - recons truction a mounts to recovering the mes
s a ge ma trix M from the KO. s y mbols obta ined from a n a rbi
tra ry s et of k s tora ge nodes . Let the s et of k nodes to which the
da ta collector connects be denoted a s { i 1, . . . , ik} . The ith node
in the s et pa s s es on the mes s a ge vector to the da ta
collector. The da ta collector thus obta ins the product ma trix
where 11) D C is the s ub ma trix of cons is ting of the k rows [ i] , .
. . , ik} . It then us es the properties of the ma trices a nd M to
N ov. 24 , 2011
recover the mes s a ge. The precis e procedure for recovering M
is a function of the pa rticula r cons truction. Ea ch node in the
network is a s s ocia ted to a dis tinct ( d>< 1) encoding vector 11) , .
In the regenera tion proces s , a rela ted vector ul- of length 0t is
us ed, tha t conta ins a s ubs et of the components of 11) , .
[ 004 8] The repla cement of a fa iled node is illus tra ted in
FIGS. 4 A, 4 B a nd 4 C . To regenera te a fa iled node f, the node
repla cing the fa iled node connects to a n a rbitra ry s ubs et [ h] ,
. . , hd} of d s tora ge nodes which we will refer to a s the d
helper nodes which is s hown in FIG. 4 A. Ea ch helper node
pa s s es on the inner product of a s y mbols s tored in it with pf,
to the repla cement node a s s hown in FIG. 4 B, the helper node
hj pa s s es
[ 004 9] The repla cement node thus obta ins the product
ma trix which is s hown in FIG. 4 C , i. e.
where wrepa l- r is the s ubma trix of 11) cons is ting of the d rows
{ hp . . . , hd} . From this it turns out, a s will be s hown
s ubs eq uently , tha t one ca n recover the des ired s y mbols . Here
a ga in, the precis e procedure is dependent on the pa rticula r
cons truction. An importa nt fea ture of the product- ma trix con
s truction is tha t ea ch of the nodes pa rticipa ting in the
regenera tion of node f, needs only ha s knowledge of the
encoding vector of the fa iled node f a nd not the identity of the
other nodes pa rticipa ting in the regenera tion. This s igni?
ca ntly s impli? es the opera tion of the s y s tem.
[ 0050] In one embodiment the product- ma trix MBR code
cons truction is des cribed. The s peci? c ma ke- up of the encod
ing ma trix 11) a nd the mes s a ge ma trix M tha t res ults in a n [ n,
k, d] MBR code with [ 3: 1 . A nota ble fea ture of the cons truc
tion is tha t it is a pplica ble to a ll fea s ible va lues of [ n, k, d] , i. e. ,
a ll n, k, d s a tis fy ing k d n- l. Since the code is req uired to
be a n MBR code with [ 3: 1, it mus t pos s es s the da ta - recon
s truction a nd ex a ct- regenera tion properties req uired of a
regenera ting code, a nd in a ddition, ha ve pa ra meters { ( x , B}
tha t s a tis fy the eq ua tions for the MBR point. In pa rticula r, the
ex pres s ion for the ? le s iZ e B a t the MBR point ca n be rewrit
ten in the form:
[ 0051] Thus the pa ra meter s et of the des ired [ n, k, d] MBR
code is ( ( Fd, [ 3: 1, B: ( ( k+ 1) /2+ k( d k) ) .
[ 0052] Let S be a ( k>< k) ma trix cons tructed s o tha t the
k( k+ 1) /2 entries in the upper- tria ngula r ha lf of the ma trix a re
? lled up by k( k+ 1) /2 dis tinct mes s a ge s y mbols dra wn from
the s et { ul- ] >Bi: l. The k( k 1) /2 entries in the s trictly lower
tria ngula r portion of the ma trix a re then chos en s o a s to ma ke
the ma trix S a s y mmetric ma trix . The rema ining k ( d- k)
mes s a ge s y mbols a re us ed to ? ll up a s econd ( k>< ( d k) ) ma trix
T. The mes s a ge ma trix M is then de? ned a s the ( d>< d) s y m
metric ma trix given by
s T ( 3)
: iT' 01
US 2011/0289351 A1
[ 0053] The s y mmetry of the ma trix W ill be ex ploited W hen
ena bling node repa ir. N ex t, de? ne the encoding ma trix to be
a ny ( n>< d) ma trix of the form
W here ( I) a nd A a re ( n>< k) a nd ( n>< ( d k) ) ma trices res pectively ,
chos en in s uch a W a y tha t
[ 0054 ] 1) a ny d roW s of P a re linea rly independent,
[ 0055] 2) a ny k roW s of ( I) a re linea rly independent.
[ 0056] MBR Ex a ct- Regenera tion of a ny fa iled node ca n be
a chieved by connecting to a ny d of the ( n- 1) rema ining
nodes .
[ 0057 ] Let Ip fbe the roW of P corres ponding to the fa iled
node f. The d s y mbols s tored in the fa iled node corres pond to
the vector
111W ( 4 )
[ 0058] The repla cement for the fa iled node f connects to a n
a rbitra ry s et { hJ- Ij II, . . . , d} ofd helper nodes . Upon being
conta cted by the repla cement node, the helper node com
putes the inner product
a nd pa s s es on this va lue to the repla cement node. Thus , in the
pres ent cons truction, the vector pf eq ua ls Ipf its elf. The
repla cement node thus obta ins the d s y mbols Prepm- rM Ipf
from the d helper nodes , W here
21
w!
Pra wn: Th2
1; ,
[ 0059] By cons truction, the ( d>< d) ma trix PrePm- r is invert
ible. Thus , the repla cement node recovers M11) fthrough mul
tiplica tion on the left by Pdwpm- r. Since M is s y mmetric,
a nd this is precis ely the da ta previous ly s tored in the fa iled
node.
[ 0060] In MBR D a ta - Recons truction, a ll the B mes s a ge
s y mbols ca n be recovered by connecting to a ny k nodes , i. e. ,
the mes s a ge s y mbols ca n be recovered through linea r opera
tions on the entries of a ny k roW s of the ma trix C .
be the ( kx d) s ub- ma trix of P, corres ponding to the k roW s of
to W hich the da ta collector connects . Thus the da ta collector
ha s a cces s to the s y mbols
[ 0062] By cons truction, C ID D C is a non- s ingula r ma trix .
Hence, by multiply ing the ma trix PD C M on the left by C IJD C T
1, one ca n recover ? rs t T a nd s ubs eq uently , S.
[ 0063] Any ex a ct- regenera ting code ca n be ma de s y s tem
a tic through a non- s ingula r tra ns forma tion of the mes s a ge
s y mbols . In the pres ent ca s e, there is a s impler a pproa ch, in
W hich the ma trix ca n be chos en in s uch a W a y tha t the code is
a utoma tica lly s y s tema tic. W e s imply ma ke the choice:
N ov. 24 , 2011
W here Ik is the ( kx k) identity ma trix , 0 is a ( k>< ( d k) ) Z ero
ma trix , ( I9 a nd Al a re ma trices of s iZ es ( ( n k) >< k) a nd ( ( n k) ><
( d k) ) res pectively , s uch tha t [ 'EID Al] is a C a uchy ma trix .
C lea rly the code is s y s tema tic. It ca n be veri? ed tha t the
ma trix ha s the properties lis ted in MBR Ex a ct- Regenera tion.
[ 0064 ] A node repla cing the fa iled node needs to doW nloa d
the a bs olute minimum a mount of da ta , i. e. , doW nloa ds only
W ha t it s tores : d[ 3: ot. Further, the s tora ge s pa ce req uired a t
ea ch node a ha s the minimum pos s ible va lue for this s etting,
eq ua l to
[ 0065] As in the fra mework, the coded- s y mbols s tored in
the s tora ge nodes ca n be ex pres s ed by the ma trix product
ex pres s ion [ C IJA] M, W herein a ny k roW s of ( I) a nd a ny d roW s
of P a re linea rly independent.
[ 0066] In one embodiment va ria tion 2 W hich is ca lled MSR
is des cribed. As in va ria tion 1, a da ta collector ca n recover the
entire s ource by doW nloa ding da ta from a ny k s tora ge nodes ,
a nd a fa iled node ca n be regenera ted by a ny d ex is ting nodes .
[ 0067 ] Ea ch s tora ge node s tores the a bs olute minimum
a mount of da ta , i. e. , l/k of the tota l s ource da ta . For this
s etting, W e ena ble a ny fa iled node to be repa ired, W hen
d>: 2k 2, W ith [ 3 ha ving minimum pos s ible va lue.
[ 0068] In a nother embodiment the product- ma trix MSR
code cons truction is ex pla ined, identify the s peci? c ma ke- up
of the encoding ma trix a nd the mes s a ge ma trix M tha t res ults
in a n [ n, k, d] MSR code W ith [ 3: 1. The cons truction a pplies
to a ll pa ra meters [ n, k, dZ 2k- 2] . Since the code is req uired to
be a n MSR code W ith [ 3: 1, it mus t pos s es s the da ta - recon
s truction a nd ex a ct- regenera tion properties req uired of a
regenera ting code, a nd in a ddition, the pa ra meters { ( x , B}
s a tis fy eq ua tion the conditions for the MSR point. W e ? rs t
cons truct MSR codes for the ca s e W hen d: 2k 2 a nd s ubs e
q uently ex tend it to the ca s e W hen d> 2k 2.
[ 0069] The cons truction a n MSR code in the product- ma
trix forma t for d: 2k 2 is a s folloW s :
[ 007 0] At the MSR point W ith d: 2k 2 W e ha ve
0t: d k+ 1 Ik- 1, ( 9)
a nd hence
d: 2a . ( 10)
Als o,
[ 007 1] The ( d>< 0t) mes s a ge ma trix M is
S1 ( 12)
S2 1,
US 2011/0289351 A1
W here S1 a nd S2 a re ( otx ) s y mmetric ma trices cons tructed
s uch tha t the 0t( 0t+ 1) /2 entries in the upper- tria ngula r pa rt of
ea ch of the tW o ma trices a re ? lled up by 0t( 0t+ 1) /2 dis tinct
mes s a ge s y mbols . Thus , a ll the B: 0t( 0t+ 1) mes s a ge s y mbols
a re conta ined in the tW o ma trices S 1 a nd S2. The entries in the
s trictly loW er- tria ngula r portion of the tW o ma trices S l a nd S2
a re chos en s o a s to ma ke the ma trices S 1 a nd S2 s y mmetric.
The encoding ma trix is a n ( n>< d) ma trix given by
W here ( I) is a n ( n>< 0t) ma trix a nd A is a n ( n>< n) dia gona l ma trix .
The elements of a re chos en s uch tha t the folloW ing conditions
a re s a tis ? ed:
1) a ny d roW s of II a re linea rly independent,
2) a ny a roW s of ( I) a re linea rly independent a nd
3) the n dia gona l elements of A a re dis tinct.
[ 007 2] The a bove req uirements a re met by choos ing II to be
a Va ndermonde ma trix W ith elements chos en ca refully to
s a tis fy the third condition. Then under our code- cons truction
fra mework, the roW of the ( n>< 0t) product ma trix C IPM,
conta ins a code s y mbols s tored by the ith node.
[ 007 3] In one embodiment, MSR Ex a ct- Regenera tion of
a ny fa iled node ca n be a chieved by connecting to a ny d: 2k 2
of the rema ining ( n 1) nodes .
[ 007 4 ] Let [ q fj x fq ff] be the roW of corres ponding to the
fa iled node. Thus the ot s y mbols s tored in the fa iled node W ere
W W 4 ) ? ! " : f91+ 7 ~/ f921 ( 14 )
[ 007 5] The repla cement for the fa iled node f connects to a n
a rbitra ry s et { hj | j : 1, . . . , d} ofd helper nodes . Upon being
conta cted by the repla cement node, the helper node com
putes the inner product 1p h_ Mq ) /l a nd pa s s es on this va lue to
the repla cement node. Thus , in the pres ent cons truction, the
vector ufeq ua ls Bf. The repla cement node thus obta ins the d
s y mbols lI'mPm- rMBffrom the d helper nodes , W here
2; ,
if
l repa ir = 3
1; ,
[ 007 6] By cons truction, the ( d>< d) ma trix Prepa y is invert
ible. Thus the repla cement node noW ha s a cces s to
Slff
S2 ff
Q f
[ 007 7 ] As S1 a nd S2 a re s y mmetric ma trices , the repla ce
ment node ha s thus a cq uired through tra ns pos ition, both
( pt/? ll a nd ( PZ SZ I. Us ing this it ca n obta in
W hich is precis ely the da ta previous ly s tored in the fa iled
node.
[ 007 8] In one embodiment of the MSR D a ta - Recons truc
tion, a ll the B mes s a ge s y mbols ca n be recovered by connect
ing to a ny knodes , i. e. , the mes s a ge s y mbols ca nbe recovered
through linea r opera tions on the entries of a ny k roW s of the
code ma trix C .
N ov. 24 , 2011
be the ( k>< d) s ubma trix of, conta ining the k roW s of W hich
corres pond to the k nodes to W hich the da ta collector con
nects . Hence the da ta collector obta ins the s y mbols
[ 0080] The da ta collector ca n pos t- multiply this term W ith
C ID D C to obta in
[ ( 1) 0651 + AD C < I>D C S2I< I>Z C = < I>D C S1< I>Z C + AD C < I>D C S2< I>Z C ( 18)
[ 0081] N ex t, let the ma trices P a nd Q be de? ned a s
PID D C S1D D C T,
Q IQ D C SZ Q D C T' ( 19)
[ 0082] As S1 a nd S2 a re s y mmetric, the s a me is true ofthe
ma trices P a nd Q . In terms of P a nd Q , the da ta collector ha s
a cces s to the s y mbols of the ma trix
P+ AD C Q ( 20)
The ( i, j ) , l i, j k, element of this ma trix is
W hile the ( j , i) element is given by
W here eq ua tion ( 22) folloW s from the s y mmetry of P a nd Q .
By cons truction, a ll the 7 , - a re dis tinct a nd hence us ing ( 21)
a nd ( 22) , the da ta collector ca n s olve for the va lues of Pi] , Q U
for a ll i# j .
[ 0084 ] C ons ider ? rs t the ma trix P. Let C ID D C be given by
2'1 ( 23)
gird
[ 0085] All the non- dia gona l elements of P a re knoW n. The
elements in the ith roW ( ex cluding the dia gona l element) a re
given by
( PC - 511% - - - . >1 i+ 1 - - - Q + 1] - ( 24 )
[ 0086] HoW ever, the ma trix to the right is non- s ingula r by
cons truction a nd hence the da ta collector ca n obta in
[ 0087 ]
a cces s to
Selecting the ? rs t a of thes e, the da ta collector ha s
US 2011/0289351A1
( 26)
[ 0088] The ma trix on the left is a ls o non- s ingula r by con
s truction a nd hence the da ta collector ca n recover S1. Simi
la rly , us ing the va lues of the non- dia gona l elements of Q , the
da ta collector ca n recover S2.
[ 0089] In one embodiment, a s s ta ted in code cons truction
fra mework, tha t every ex a ct- regenera ting code ha s a s y s tem
a tic vers ion a nd further, tha t the code could be ma de s y s tem
a tic through a proces s of mes s a ge- s y mbol rema pping. The
folloW ing ma kes this more ex plicit in the contex t of the prod
uct- ma trix MSR code.
[ 0090] Let lPk be the ( k>< d) s ubma trix of 11', conta ining the
k roW s of II corres ponding to the k nodes W hich a re chos en to
be ma de s y s tema tic. The s et of k0. s y mbols s tored in thes e k
nodes a re given by the elements of the ( k>< 0t) ma trix 1PkM. Let
U be a ( k>< 0t) ma trix conta ining the B: k0t s ource s y mbols .
Ma pping
\ IJkMIU, ( 26)
a nd s olve for the entries of M in terms of the s y mbols in U.
This is precis ely the da ta - recons truction proces s tha t ta kes
pla ce W hen a da ta collector connects to the k chos en nodes .
Thus , the va lue of the entries in M ca n be obta ined by folloW
ing the procedure outlined in MSR D a ta - Recons truction.
Then, us e this M to obta in the code C IPM. C lea rly , in this
repres enta tion, the k chos en nodes s tore the s ource s y mbols U
in uncoded form.
[ 0091] In one embodiment, a n ex plicit MSR Product- Ma
trix C odes for dZ Z k- Z is des cribed. An MSR code for
d: 2k 2 ca n be us ed to obta in MSR codes for a ll dZ Z k- Z .
[ 0092] An ex plicit [ n': n+ 1, k': k+ 1, d': d+ 1] ex a ct- regen
era ting code C ' tha t a chieves the cut- s et bound a t the MSR
point ca n be us ed to cons truct a n ex plicit [ n, k, d] ex a ct
regenera ting code C tha t a ls o a chieves the cut- s et bound a t the
MSR point. Furthermore if d': a k'+ b in code C ', d: a k+ b+ ( a
1) in code C . If C ' is linea r, s o is C .
[ 0093] If both codes opera te a t the MSR point, then the
number of mes s a ge s y mbols B', B in the tW o ca s es mus t
s a tis fy
[ 0094 ] Firs t cons truct a n MSR- point- optima l [ n', d'] ex a ct
regenera ting code C in s y s tema tic form W ith the ? rs t k' roW s
conta ining the B' mes s a ge s y mbols . Let C " be the s ubcode of
C ' cons is ting of a ll code ma trices in C ' W hos e top roW is the
a ll- Z ero roW ( i. e. , the ? rs t a of the B' mes s a ge s y mbols a re a ll
Z ero) . C lea rly , the s ubcode C " is of s iZ e q BLOIq B. N ote tha t
C " a ls o pos s es s es the s a me ex a ct- regenera tion a nd da ta - re
cons truction properties a s does the pa rent code C '.
[ 0095] Let the code C noW be formed from s ubcode C " by
puncturing ( i. e. , deleting) the ? rs t roW in ea ch code ma trix of
C " . C lea rly , code C is a ls o of s iZ e q B. W e cla im tha t C is a n [ n,
k, d] ex a ct- regenera ting code. The da ta - recons truction
req uirement req uires tha t the B underly ing mes s a ge s y mbols
be recovera ble from the contents of a ny k roW s of a code
ma trix C in C . But this folloW s s ince, by a ugmenting the
N ov. 24 , 2011
ma trices of code C by pla cing a t the top a n a dditiona l a ll- Z ero
roW , to obta in a code ma trix in C " a nd code C " ha s the
property tha t the da ta ca n be recovered from a ny ( k+ 1) roW s
of ea ch code ma trix in C " . A s imila r a rgument s hoW s tha t code
C a ls o pos s es s es the ex a ct- regenera tion property . C lea rly if C '
is linea r, s o is code C . Fina lly ,
[ 0096] C odes for the ca s e d>: 2k 2 ca n noW be obta ined
s imply by itera ting the a bove procedure multiple times .
[ 0097 ] In one embodiment of the tW in- code fra mework, the
n s tora ge nodes in the dis tributed netW ork a re pa rtitioned
a rbitra rily into tW o ty pes , la belled a s Ty pe 0 a nd Ty pe 1 nodes .
The number of nodes of Ty pe 0 is denoted by no, a nd the
number of nodes of Ty pe 1 is denoted by n1. To a rrive a t the
da ta s tored in nodes of Ty pe 0, the mes s a ge is encoded us ing
a n a rbitra ry linea r block code C O. To a rrive a t the da ta s tored
in Ty pe 1 nodes , the mes s a ge s y mbols a re ? rs t permuted by
mea ns of s imple tra ns pos ition a nd then encoded by a s econd
a rbitra ry linea r block code C 1. The tW o codes do not neces
s a rily ha ve to be dis tinct. Repa ir of a fa iled node of one ty pe
is a ccomplis hed by doW nloa ding da ta from a s ubs et of nodes
of the other ty pe. Further, a da ta - collector ca n recover the
entire mes s a ge by connecting to a s ubs et of the nodes of the
s a me ty pe The mes s a ge to be s tored a cros s the netW ork com
pris es of B s y mbols dra W n from the ? nite ? eld F q . For iIO, 1,
let C , - be a n a rbitra ry [ n, , k] code over Fq ha ving ( k>< ni) gen
era tor ma trix Gi. Let g( l. , Z ) ( 1 l ni) denote the 1th | column of
Gi.
[ 0098] In one embodiment of the tW in- code fra meW ork is
encoding. The mes s a ge is ? rs t divided into fra gments com
pris ing of k2 s y mbols . The code for a s ingle fra gment a nd
cons tructions for the ca s e W hen the s iZ e of the mes s a ge is
la rger ca n be obta ined by conca tena ting multiple s uch fra g
ments . Thus , W e ha ve
BIk2 ( 27 )
[ 0099] Let the k2 s y mbols be a rra nged in the form of a ( k>< k)
ma trix MO, W hich W e W ill ca ll the mes s a ge ma trix . Let
M1 2 M5, ( 28)
W here the s upers cript t denotes the tra ns pos e. For iIO, 1, the
da ta s tored in the nodes of Ty pe i is obta inedus ing code C l. a nd
mes s a ge ma trix M, - a s folloW s . Ea ch node of Ty pe i s tores the
k s y mbols corres ponding to a column of the ( k>< ni) ma trix
MiGi. More s peci? ca lly , node 1 ( 1 l ni) s tores the s y mbols
in its 1th lcolumn, given by
Migw) ( 29)
g0. is encoding vector of this node. Thus , under our encoding
a lgorithm, every node s tores k s y mbols .
[ 0100] In one embodiment, the proces s es of da ta - recon
s truction a nd node- repa ir is des cribed under the TW in- code
fra meW ork, the proces s of node- repa ir is reduced to one of
era s ure decoding of the cons tituent codes . This is des cribed in
deta il in the s ubs eq uent s ubs ections . Here, W e de? ne the term
fea s ible s et of nodes W hich W ill a id in the des cription.
US 2011/0289351A1
[ 0101] A fea s ible s et ofnodes is a s et ofnodes ofthe s a me
ty pe, s a y Ty pe i, i in { 0, 1} , s uch tha t the components of the
codeword of the C ode C , - s tored in thes e nodes permit era s ure
decoding of C ode C i. For ex a mple, in the ca s e when C ode C O
is MD S, a ny s et ofk nodes of Ty pe 0 is a fea s ible s et.
[ 0102] In one embodiment, the da ta - recons truction a nd
node- repa ir for the ca s e when both C O a nd C 1 a re MD S codes
a re des cribed. MD S codes s erve a s a s imple a nd concrete
ex a mple, a nd MD S codes turn in a s trong performa nce in tha t
they s imulta neous ly minimiz e the s tora ge per node a s well a s
the a mount of downloa d req uired to repa ir a fa iled node. The
codes res ulting from the Twin- code fra mework when the
cons tituent codes a re MD S a re ca lled a s Twin- MD S ( dis trib
uted s tora ge) codes .
[ 0103] Minimiz ing Both Stora ge per N ode a nd Repa ir
Ba ndwidth for the ca s e when both C O a nd C l a re MD S codes
over Fq . Under the Twin- code fra mework, da ta - recons truc
tion ca n be a ccomplis hed by downloa ding the da ta s tored in
a ny fea s ible s et of nodes of the s a me ty pe. A fa iled node of one
ty pe ca n be repa ired by downloa ding one s y mbol ea ch from
a ny fea s ible s et of nodes of the other ty pe. In the pres ent ca s e,
the cons tituent codes C O a nd C 1 a re MD S, a nd permit decod
ing from a ny k nodes . Thus , in this ca s e,
[ 0104 ] a da ta - collector ca n recover the entire ? le by con
necting to a ny k nodes of the s a me ty pe a nd downloa ding
the k2 s y mbols s tored in them, a nd
[ 0105] a repla cement node of a certa in ty pe ca n recover
the k s y mbols tha t were s tored in the fa iled node by
downloa ding a s ingle s y mbol over Fq from a ny s ubs et of
k nodes belonging to the other ty pe.
[ 0106] For the da ta - recons truction in the a bove ca s e, let the
da ta - collector connects to the ? rs t k nodes of Ty pe 1. Thus ,
the da ta - collector ga ins a cces s to the k2 s y mbols :
[ 0107 ] The MD S property of code C l gua ra ntees linea r
independence of the corres ponding k columns of G1, a nd
hence the da ta - collector ca n recover the mes s a ge ma trix M1
by inverting the ma trix formed by thes e columns . Eq uiva
lently , the a ction of the da ta - collector ca n a ls o be viewed a s
era s ure decoding of code C 1 in the pres ence of ( nl k) era
s ures . For this , cons ider the k s y mbols in the ? rs t row of the
ma trix in eq ua tion ( 30) . C lea rly , thes e a re k components of
the codeword of code C l corres ponding to the k mes s a ge
s y mbols in the ? rs t row of M1. Thus , ea ch of the rows of
mes s a ge ma trix Ml ca n be decoded independently , a llowing
the da ta - collector to perform s ca la r decoding of the vector
code.
[ 0108] For node- repa ir, let node f of Ty pe 0 fa ils . Thus the
repla cement node needs to recover the k s y mbols
Mogm) ( 31)
[ 0109] Further, let the node connects to the ? rs t k nodes of
Ty pe 1. Under the Twin- code fra mework, the repla cement
node ma kes known its identity to ea ch of the helper nodes . In
turn, the 1th helper node ( 1 l k) | pa s s es the inner product of
the k s y mbols Mlgm? l s tored in it with the encoding vector
ga m of the fa iled node: gta mMlgui) Thus , the repla cement
node ga ins a cces s to the k s y mbols
N ov. 24 , 2011
[ 0110] N ow, s etting
1 A r ( 33)
H _ 510mm
we s ee tha t the repla cement node ha s a cces s to
lit[ g( l, l) - - - g( l, k) ] ( 34 )
[ 0111] It is clea r tha t [ 1t ca n be recovered by era s ure decod
ing of the MD S code C 1 under ( nl k) era s ures . Reca ll tha t by
cons truction, we ha ve MOIMtI, a nd hence
[ 0112] Thus , the vector 11. compris es of precis ely the k s y m
bols req uired by the repla cement node. It follows tha t repa ir
of a node of Ty pe 0 is reduced to era s ure decoding of code C 1.
[ 0113] The Twin- MD S codes a re optima l in the following
s ens e. Under the da ta - recons truction condition tha t ( a ) the
mes s a ge be recovera ble from one or more s ubs ets of k nodes ,
( b) every node mus t belong to a t lea s t one s uch s ubs et, a nd ( c)
ea ch node s tore a n eq ua l a mount of da ta , the s tora ge s pa ce of
B/kIk s y mbols req uired per node is clea rly the minimum
req uired. Furthermore, the downloa d of k s y mbols from the
network to recover the k s y mbols s tored prior to fa ilure, is
a ls o clea rly the minimum pos s ible. Thus , Twin- MD S codes
s imulta neous ly minimiZ e the s tora ge a nd the repa ir- ba nd
width. Us ing twin code fra mework s imulta neous minimiZ a
tion of the two res ources is a chieved.
[ 0114 ] FIG. 5 illus tra tes a numerica l ex a mple, where the
network ha s n: 9 nodes , with no: 4 , nl: 5, k: 3, giving
B: k2: 9. The code pres ented opera tes over the ? nite ? eld F7 .
The mes s a ge ma trices MO a nd Ml( : M O) popula ted by the
mes s a ge s y mbols { ml- PH, a nd the genera tor ma trices GO a nd
G1 a re:
m1m4 m7 1001 10111
[ 0115] It ca n be veri? ed tha t this code permits a da ta - col
lector to recons truct the entire mes s a ge from a ny three nodes
of the s a me ty pe. The ? gure a ls o ex pla ins the repa ir of node 1
of Ty pe 0, which ha s a n encoding vector [ 1 0 0] , with the help
ofnodes 2, 3 a nd 4 ofTy pe l.
[ 0116] In a Twin- code fra mework, a da ta - collector ca n
recover the entire mes s a ge by connecting to a ny fea s ible
s ubs et of nodes of the s a me ty pe. To s ee how da ta - recons truc
tion is a ccomplis hed, let us s uppos e tha t the da ta - collector
connects to a fea s ible s ubs et of Ty pe 1 nodes . Res tricting our
a ttention to nodes of Ty pe 1 a lone, the s y s tem is identica l to a
dis tributed s tora ge s y s tem employ ing only code C 1. Thus , the
da ta - collector ca n recover the entire mes s a ge by a pply ing the
decoding procedure of code C l, k times .
[ 0117 ] In one embodiment of the Twin- code fra mework, a
repla cement node of a certa in ty pe ca n recover the s y mbols
s tored in the fa iled node by downloa ding a s ingle s y mbol
from a ny fea s ible s ubs et of the other ty pe. The repa ir a lgo
rithm reduces the problem of node- repa ir to one of da ta
recons truction, however with a n a mount of downloa d which
is clos e to a s ma ll fra ction of the entire da ta . Let the fa ilure of
node f is of Ty pe 0. The repla cement node des ires to recover
the k s y mbols
US 2011/0289351A1
M .
05m)
Under the Twin- code fra mework, the repla cement node con
nects to a fea s ible s et of Ty pe 1 nodes a nd ma kes known its
identity . In turn, ea ch helper node pa s s es ba ck a n inner prod
uct of the k s y mbols Mlgui) s tored in it with the encoding
vector ga m of the fa iled node: helper node 1 pa s s es the s y mbol
[ 0118] As in the ca s e of MD S codes , s etting
E g gla nMl ( 37 )
the repla cement node ha s a cces s to the s y mbols
{ u guy blnode Z in the pa rticula r fea s ible s et] >l ( 3 8)
[ 0119] FIG. 6 illus tra tes da ta deploy ment us ing the Twin
code fra mework where ( a ) the point s ource tra ns fers coded
da ta to a s ubs et of nodes . ( b) , ( c) Thes e nodes now help in the
repa ir of nodes in their neighbourhood tha t ha ve not
received da ta . The s y s tem eventua lly rea ches a s ta te a s s hown
in ( d) .
[ 0120] Since the s et of helper nodes form a fea s ible s et of
Ty pe 1 nodes , it is clea r tha t the repla cement node ca n recover
pf through era s ure decoding of code C l. Under our fra me
work, MOIM l a nd hence,
[ 0121] The vector [ 1. thus compris es of precis ely the k s y m
bols req uired by the repla cement node. In this ma nner, our
repa ir a lgorithm req uires a downloa d eq ua l to a fra ction 1/k of
the da ta downloa d req uired during the da ta - recons truction
proces s .
[ 0122] The following a re the a dva nta ges :
[ 0123] D uring regenera tion of a fa iled node, the informa
tion pa s s ed on to the repla cement node by a helper node is
only a function of the index of the fa iled node. Thus , it is
independent of the identity of the d 1 other nodes tha t a re
pa rticipa ting in the regenera tion. Once a ga in, this reduces the
communica tion overhea d by req uiring les s informa tion to be
dis s emina ted.
[ 0124 ] Ha ndling Sy s tem D y na mics for a ny n: The s y s tem
works for a ll va lues of n, a nd independent of the va lues of the
pa ra meters k a nd d. This ma kes the code cons tructions pre
s ented here pra ctica lly a ppea ling, a s in a ny rea l- world dis
tributed s tora ge a pplica tion s uch a s a peer- to- peer s tora ge,
cloud s tora ge, etc, ex pa ns ion or s hrinking of the s y s tem s iZ e
is very na tura l. For ex a mple, in peer- to- peer s y s tems , indi
vidua l nodes a re free to come a nd go a t their own will.
[ 0125] The encoding ma trix 1P, for the codes des cribed, ca n
be chos en a s a Va nderrnonde ma trix . Then ea ch node vector
ca n be des cribed by j us t a s ca la r. Moreover with this choice,
the encoding, recons truction, a nd regenera tion opera tions
a re, for the mos t pa rt, identica l to encoding or decoding of
conventiona l Reed- Solomon codes .
[ 0126] D y na mic Ada pta tion to the N etwork Ba ndwidths is
a chieved. The ba s ic form of the propos ed codes a s s umes a
ra ther rigid s tructure, tha t of connecting to ex a ctly k nodes
a nd downloa ding a ll the da ta for recons truction, a nd connect
ing to ex a ctly d nodes for regenera tion. The propos ed codes
N ov. 24 , 2011
ca n be ea s ily enha nced to a highly ? ex ible s etup, where a da ta
collector or a new node repla cing a fa iled node ca n connect to
a va ria ble number of nodes a nd downloa d different a mounts
of da ta from ea ch node to which it connects to. In this wa y , it
ca n ma ke us e of pa ra llel downloa ding when it ha s a higher
connectivity thereby ma king its downloa d much fa s ter. Als o,
when the links a re not homogeneous , it ca n choos e to down
loa ds higher a mounts from the nodes with fa s ter link, a nd
les s er from nodes with s lower link.
[ 0127 ] The C a s e of N on- uniform s tora ge s pa ces , tha t is
when s tora ge ca pa cities of the nodes a re different, ea ch s tripe
ca n be encoded s epa ra tely with a different n ( a nd pos s ibly a
different k a nd d a s well) . A s tripe will s ee only thos e s tora ge
nodes in which the da ta perta ining to tha t s tripe is s tored, thus
s ea mles s ly ha ndling the ca s e of va ria ble 0t.
[ 0128] Recovering intermedia te ( Pa rtia l) D a ta is a chieved.
Since ea ch s tripe ca n be decoded independently , when only a
s ma ll chunk of intermedia te da ta is req uired, only the s tripes
conta ining the des ired da ta ca n be decoded, ins tea d of decod
ing the entire da ta .
[ 0129] Ea s ter D ownloa ds is a chieved. In the dis tributed
s tora ge s y s tem, a da ta collector ( repla cement node) ca n tra de
the network us a ge to increa s e the downloa d s peed. Ins tea d of
s ending downloa d req ues ts for recons tructing ( or, regenera t
ing) a s tripe to s ome k ( or, d) nodes , the da ta collector ca n
s end req ues ts to s ome k+ 1 ( or, d+ 1) nodes . Out of thes e, the k
( or, d) pa rts of the s tripe tha t a rrive ? rs t ca n be us ed for
recons truction ( regenera tion) , thereby decrea s ing the net
time for da ta recons truction ( node regenera tion) .
[ 0130] For Pa ra meters N ot C onforming to Either va ria tion:
Va ria tion 1 req uires the minimum pos s ible s tora ge per,
wherea s the va ria tion 2 point req uires the minimum pos s ible
repa ir ba ndwidth per s tora ge node ( though the a mount of
s tora ge is higher tha n the MSR point) . D ue to the s triped
na ture of our s y s tem, the s y s tem ca n be opera ted a t a ny
intermedia te point between via s tora ge- s pa ce s ha ring.
REFEREN C ES
[ 0131] [ 1] D . A. Pa tters on, G. Gibs on, a nd R. H. Ka tZ , A
C a s e for Redunda nt Arra y s of Inex pens ive D is ks ( RAID ) ,
in Proc. AC M SIGMOD interna tiona l conference on ma n
a gement of da ta , C hica go, USA, June 1988, pp. 109- 116.
[ 0132] [ 2] S. Rhea , P. Ea ton, D . Geels , H. W ea thers poon, B.
Z ha o, a nd J. Kubia towicZ , Pond: The Ocea nStore Proto
ty pe, in Proc. 2nd USEN IX conference on File a nd Stor
a ge Technologies ( FAST) , 2003, pp. 1- 14 .
[ 0133] [ 3] R. Bha gwa n, K. Ta ti, Y. C . C heng, S. Sa va ge, a nd
G. M. Voelker, Tota l Reca ll: Sy s tem Support for Auto
ma ted Ava ila bility Ma na gement, in Proc. 1s t conference
on Sy mpos ium on N etworked Sy s tems D es ign a nd Imple
menta tion ( N SD I) , 2004 .
[ 0134 ] [ 4 ] A. G. D ima kis , P. B. Godfrey , M. W a inwright,
a nd K. Ra mcha ndra n, N etwork C oding for dis tributed
s tora ge s y s tems , in Proc. 26th IEEE Interna tiona l C onfer
ence on C omputer C ommunica tions ( IN FOC OM) ,
Anchora ge, Ma y 2007 , pp. 2000- 2008.
[ 0135] [ 5] Y. W u, A. G. D ima kis , a nd K. Ra mcha ndra n,
D eterminis tic Regenera ting codes for D is tributed Stor
a ge, in Proc. 4 5thAnnua lAllerton C onference on C ontrol,
C omputing, a nd C ommunica tion, Urba na - C ha mpa ign,
September 2007 .
[ 0136] [ 6] N . B. Sha h, K. V. Ra s hmi, P. V. Kuma r, a nd K.
Ra mcha ndra n, Ex plicit C odes Minimiz ing Repa ir Ba nd
US 2011/0289351 A1
W idth for D is tributed Stora ge, in Proc. IEEE Informa tion
Theory W orks hop ( ITW ) , C a iro, Ja nua ry 2010.
[ 0137 ] [ 7 ] C . Sub a nd K. Ra mcha ndra n, Ex a ct- Repa ir
MD S C odes for D is tributed Stora ge Us ing Interference
Alignment, in Proc. IEEE Interna tiona l Sy mpos ium on
Informa tion Theory ( ISIT) , Aus tin, June 2010, pp. 161
165.
[ 0138] [ 8] Y. W u, Ex is tence a nd C ons truction of C a pa city
Achieving N etwork C odes for D is tributed Stora ge, in
IEEE Journa l on Selected Area s in C ommunica tions , vol.
28, no. 2, 2010, pp. 27 7 - 288.
[ 0139] [ 9] A. G. D ima kis , P. B. Godfrey , Y. W u, M. W a in
W right, a nd K. Ra mcha ndra n, N etwork C oding for D is
tributed Stora ge Sy s tems , IEEE Tra ns a ctions on Inforrna
tion Theory , vol. 56, no. 9, pp. 4 539- 4 551, 2010.
[ 014 0] [ 10] A. D uminuco a nd E. Biers a ck, A Pra ctica l
Study of Regenera ting C odes for Peer- to- Peer Ba ckup
Sy s tems , in Proc. 29th IEEE Interna tiona l C onference on
D is tributed C omputing Sy s tems ( IC D C S) , June 2009, pp.
37 6- 384 .
[ 014 1] [ 1 1] Y. W u a ndA. D ima kis , Reducing Repa ir Tra f
? c for Era s ure C oding- Ba s ed Stora ge via Interference
Alignment, in Proc. IEEE Interna tiona l Sy mpos ium on
Informa tion Theory ( ISIT) , Seoul, July 2009, pp. 227 6
2280.
[ 014 2] [ 12] K. V. Ra s hmi, N . B. Sha h, P. V. Kuma r, a nd K.
Ra mcha ndra n, Ex plicit C ons truction of Optima l Ex a ct
Regenera ting C odes for D is tributed Stora ge, in Proc. 4 7 th
Annua l Allerton C onference on C ommunica tion, C ontrol,
a nd C omputing, Urba na - C ha mpa ign, September 2009, pp.
124 3- 124 9.
[ 014 3] [ 13] D . C ullina , A. G. D ima kis , a nd T. Ho, Sea rch
ing for Minimum Stora ge Regenera ting C odes , in Proc.
4 7 th Annua l Allerton C onference on C ommunica tion,
C ontrol, a nd C omputing, Urba na - C ha mpa ign, September
2009.
[ 014 4 ] [ 14 ] N . B. Sha h, K. V. Ra s hmi, P. V. Kuma r, a nd K.
Ra mcha ndra n, Interference Alignment in Regenera ting
C odes for D is tributed Stora ge: N eces s ity a nd C ode C on
s tructions , IEEE Tra ns a ctions on Informa tion Theory ,
s ubmitted for publica tion. [ Online] . Ava ila ble: a rXiv:
1005. 1634 v2 [ cs . IT]
[ 014 5] [ 15] Y. W u, A C ons truction of Sy s tema tic MD S
C odes W ith Minimum Repa ir Ba ndW idth, IEEE Tra ns a c
tions on Informa tion Theory , s ubmitted for publica tion.
[ Online] . Ava ila ble: a rXiv: 0910. 24 86 [ cs . IT]
[ 014 6] [ 16] V. R. C a da mbe, S. A. Ja fa r, a nd H. Ma leki,
D is tributed D a ta Stora ge W ith Minimum Stora ge Regen
era ting C odes iEx a ct a nd Functiona l Repa ir a re As y mp
totica lly Eq ua lly Ef? cient. [ Online] . Ava ila ble: a rXiv:
1004 . 4 299 [ cs . IT]
[ 014 7 ] [ 17 ] C . Suh a nd K. Ra mcha ndra n, On the Ex is tence
of Optima l Ex a ct- Repa ir MD S C odes for D is tributed Stor
a ge. [ Online] . Ava ila ble: a rXiv: 1004 . 4 663 [ cs . IT]
[ 014 8] [ 18] N . B. Sha h, K. V. Ra s hmi, a nd P. V. Kuma r, A
Flex ible C la s s of Regenera ting C odes for D is tributed Stor
a ge, in Proc. IEEE Interna tiona l Sy mpos ium on Inforrna
tion Theory ( ISIT) , Aus tin, June 2010, pp. 194 3- 194 7 .
[ 014 9] [ 19] N . B. Sha h, K. V. Ra s hmi, P. V. Kuma r, a nd K.
Ra mcha ndra n, D is tributed Stora ge C odes W ith Repa ir
by - Tra ns fer a nd N on- a chieva bility of Interior Points on the
Stora ge- Ba ndW idth Tra deoff, IEEE Tra ns a ctions on
Informa tion Theory , s ubmitted for publica tion. [ Online] .
Ava ila ble: a rXiv: 1011. 2361 [ cs . IT]
N ov. 24 , 2011
[ 0150] [ 20] D . S. Berns tein, Ma trix ma thema tics : Theory ,
fa cts , a nd formula s W ith a pplica tion to linea r s y s tems
theory . Princeton Univers ity Pres s , 2005.
[ 0151] [ 21] M. Ma dda h- Ali, A. Mota ha ri, a ndA. Kha nda ni,
C ommunica tion over MIMO X C ha nnels : Interference
Alignment, D ecompos ition, a nd Performa nce Ana ly s is ,
IEEE Tra ns a ctions on Informa tion Theory , vol. 54 , no. 8,
pp. 34 57 - 34 7 0, 2008.
[ 0152] [ 22] V. C a da mbe a nd S. Ja fa r, Interference Align
ment a nd Spa tia l D egrees of Freedom for the k Us er Inter
ference C ha nnel, IEEE Tra ns a ctions on Informa tion
Theory , vol. 54 , no. 8, pp. 34 25- 34 4 1, Augus t 2008.
[ 0153] [ 23] A. G. D ima kis , P. B. Godfrey , Y W u, M. W a in
W right, a nd K. Ra mcha ndra n, N etW ork C oding for D is
tributed Stora ge Sy s tems , IEEE Tra ns , on Inf. Theory , vol.
56, no. 9, pp, 4 539- 4 551, 2010.
[ 0154 ] [ 24 ] K. V. Ra s hmi, N . B. Sha h, a nd P. V. Kuma r,
Optima l Ex a ct- Regenera ting C odes for the MSR a nd
MBR Points via a Product- Ma trix C ons truction, IEEE
Tra ns . on Inf. Theory , s ubmitted for publica tion. [ Online] .
Ava ila ble: a rx iv: 1005. 4 17 8 [ cs . IT]
[ 0155] [ 25] Z . W a ng, A. G. D ima kis , a nd J. Bruck,
Rebuilding for Arra y C odes in D is tributed Stora ge Sy s
tems , in AC TEMT, D ecember 2010.
[ 0156] [ 26] P. Elia s , Error- free C oding, IRE Tra ns a ctions
on Informa tion Theory , vol. 4 , no. 4 , pp. 29- 37 , September
1 954 .
[ 0157 ] The pres ent dis clos ure is not to be limited in terms of
the pa rticula r embodiments des cribed in this a pplica tion,
W hich a re intended a s illus tra tions of va rious a s pects . Ma ny
modi? ca tions a nd va ria tions ca n be ma de W ithout depa rting
from its s pirit a nd s cope, a s W ill be a ppa rent to thos e s killed in
the a rt. Functiona lly eq uiva lent methods a nd devices W ithin
the s cope of the dis clos ure, in a ddition to thos e enumera ted
herein, W ill be a ppa rent to thos e s killed in the a rt from the
foregoing des criptions . Such modi? ca tions a nd va ria tions a re
intended to fa ll W ithin the s cope of the a ppended cla ims . The
pres ent dis clos ure is to be limited only by the terms of the
a ppended cla ims , a long W ith the full s cope of eq uiva lents to
W hich s uch cla ims a re entitled. It is a ls o to be unders tood tha t
the terminology us ed herein is for the purpos e of des cribing
pa rticula r embodiments only , a nd is not intended to be limit
ing.
[ 0158] W ith res pect to the us e of a ny plura l a nd/ or s ingula r
terms herein, thos e ha ving s kill in the a rt ca n tra ns la te from
the plura l to the s ingula r a nd/ or from the s ingula r to the plura l
a s is a ppropria te to the contex t a nd/or a pplica tion. The va ri
ous s ingula r/plura l permuta tions ma y be ex pres s ly s et forth
herein for s a ke of cla rity .
[ 0159] In a ddition, W here fea tures or a s pects of the dis clo
s ure a re des cribed in terms of Ma rkus h groups , thos e s killed
in the a rt W ill recogniZ e tha t the dis clos ure is a ls o thereby
des cribed in terms of a ny individua l member or s ubgroup of
members of the Ma rkus h group.
[ 0160] W hile va rious a s pects a nd embodiments ha ve been
dis clos ed herein, other a s pects a nd embodiments W ill be
a ppa rent to thos e s killed in the a rt. The va rious a s pects a nd
embodiments dis clos ed herein a re for purpos es of illus tra tion
US 2011/0289351A1
a nd a re not intended to be limiting, with the true s cope a nd
s pirit being indica ted by the following cla ims .
1. A dis tributed da ta s tora ge network, compris ing:
a plura lity of s tora ge nodes ;
a n encoder con? gured to obta in a nd a s s ign pre- determined
a mount of da ta to be s tored in ea ch s tora ge node in the
network;
a da ta recons truction block a t ea ch end- us er to obta in
s ource da ta by connecting to one of a plura lity of s ubs ets
of the s tora ge nodes ; a nd
mea ns for repla cing a ny fa iled s tora ge node with a new
node;
a repa ir block in ea ch s tora ge node repla cing a fa iled s tor
a ge node for retrieving the da ta s tored in s a id fa iled node
by connecting to one of a plura lity of s ubs ets of s tora ge
nodes ; a nd
a repa ir a s s is ting block in ea ch s tora ge node for computing
a nd tra ns mitting da ta to a s s is t in the repa ir of fa iled
nodes .
2. The dis tributed da ta s tora ge network a s cla imed in cla im
1, wherein the da ta recons truction block a t a n end- us er per
forms da ta recons truction by s electing a ny one of the prede
termined s ubs ets of s tora ge nodes ; downloa ding the da ta
s tored in the s tora ge nodes in the s ubs et s elected; a nd decod
ing the da ta downloa ded to obta in the s ource da ta .
3. The dis tributed da ta s tora ge network a s cla imed in cla im
1, wherein the repa ir block in the new node repla cing a fa iled
s tora ge node choos es a ny one of the predetermined s ubs ets of
s tora ge nodes ; downloa ds a predetermined a mount of da ta
from ea ch of the s tora ge nodes in the chos en s ubs et; a nd
decodes the downloa ded da ta to obta in the da ta tha t wa s
s tored in the fa iled node.
4 . The dis tributed da ta s tora ge network a s cla imed in cla im
3, wherein the predetermined a mount of da ta downloa ded by
the repa ir block is the res ult of computa tion performed by the
repa ir a s s is ting block of the s tora ge nodes in s a id chos en
s ubs et.
5. A method of obta ining a dis tributed s tora ge network
compris ing a cts of:
performing coding opera tion on s ource da ta ( of s iZ e B) for
obta ining a s et of coded da ta ;
a s s igning ea ch node with a prede? ned a mount of coded
da ta ( or) ;
performing da ta recons truction opera tion a t the recons truc
tion block a t a n end us er by connecting to a predeter
mined number of s tora ge nodes ( k) ; a nd
performing repa ir opera tion a t repa ir block of a new node
repla cing a fa iled s tora ge node by retrieving da ta s tored
in the fa iled node by downloa ding prede? ned a mount of
coded da ta ( [ 3) from a predetermined number of s tora ge
nodes ( d) ; a nd a s s is ting in the repa ir of fa iled node by
performing a computa tion by the repa ir a s s is ting block
of d s tora ge nodes chos en by s a id new node a nd tra ns
mitting res ult of computa tion to s a id new node.
6. The method a s cla imed in cla im 5, wherein the pre
de? ned a mount of da ta downloa ded by the repa ir block of a
new node repla cing a fa iled node is minima l compa red to the
conventiona l techniq ues .
7 . The method a s cla imed in cla im 5, wherein the coded
da ta is obta ined by ta king linea r combina tions of the s ource
da ta with a predetermined s et of encoding vectors .
8. The method a s cla imed in cla im 7 , wherein the coded
da ta ca n be repres ented a s a product of a predetermined
encoding ma trix a nd a predetermined mes s a ge ma trix .
11
N ov. 24 , 2011
9. The method a s cla imed in cla im 8, wherein the mes s a ge
ma trix is obta ined us ing the s ource da ta .
10. The method a s cla imed in cla im 8, wherein coded da ta
s tored in a ny of the s tora ge nodes is repres ented a s a product
of a predetermined vector with the mes s a ge ma trix .
11. The method a s cla imed in cla im 5, wherein the com
puta tion performed a t the repa ir a s s is ting block of a s tora ge
node a s s is ting in the repa ir of a ny fa iled node is repres ented a s
a linea r combina tion of s y mbols s tored in s a id s tora ge node.
12. The method a s cla imed in cla im 11, wherein the linea r
combina tion is ba s ed on the encoding vector a s s ocia ted with
the fa iled node.
13. The method a s cla imed in cla im 5, wherein the coded
da ta is obta ined ba s ed on the rela tion between the tota l
a mount of da ta to be s tored ( B) , a mount of da ta s tored in ea ch
node ( or) , a mount of da ta downloa ded from ea ch node while
retrieving the da ta s tored in a fa iled node ( [ 3) , number s tora ge
nodes to which the end- us ers connect ( k) a nd number of
s tora ge nodes to which a new s tora ge node repla cing a fa iled
node connects ( d) .
14 . The method a s cla imed in cla im 5, wherein the coded
da ta is obta ined a nd a s s igned to ea ch s tora ge node ba s ed on
minimum s tora ge regenera tion ( MSR) , minimum ba ndwidth
regenera tion ( MBR) a nd twin- code techniq ue.
15. The method a s cla imed in cla im 13, wherein the pre
de? ned a mount of coded da ta in ea ch s tora ge node is lea s t for
the MSR techniq ue.
16. The method a s cla imed in cla im 13, wherein the pre
de? ned a mount of da ta downloa ded by a new s tora ge node
repla cing a fa iled node is lea s t for the MBR techniq ue.
17 . The method a s cla imed in cla im 13, wherein the pre
de? ned a mount of coded da ta in ea ch s tora ge node a nd the
prede? ned a mount of da ta downloa ded by a new s tora ge node
repla cing a fa iled node a re lea s t for the twin- code techniq ue.
18. The method a s cla imed in cla im 5, wherein coded da ta
is obta ined us ing MSR techniq ue a nd the da ta recons truction
opera tion a t a ny end- us er compris es a cts of s electing one of
the s ubs ets of a predetermined number of nodes ( k) ; down
loa ding the da ta s tored in the s tora ge nodes in the s ubs et
s elected; a nd decoding the da ta downloa ded to obta in the
s ource da ta .
19. The method a s cla imed in cla im 5, wherein coded da ta
is obta ined us ing the MSR techniq ue a nd retrieving da ta
s tored in a fa iled s tora ge node by a new node repla cing the
fa iled node compris es a cts of choos ing one of the s ubs ets of a
predetermined number of nodes ( d) ; downloa ding a predeter
mined a mount of da ta ( [ 3) from ea ch of the s tora ge nodes in
the chos en s ubs et; a nd decoding the downloa ded da ta to
obta in the da ta tha t wa s s tored in the fa iled node.
20. The method a s cla imed in cla im 14 , wherein coded da ta
is obta ined us ing MBR techniq ue a nd the da ta recons truction
opera tion a t a ny end- us er compris es a cts of s electing one of
the s ubs ets of a predetermined number of nodes ( k) ; down
loa ding the da ta s tored in the s tora ge nodes in the s ubs et
s elected; a nd decoding the da ta downloa ded to obta in the
s ource da ta .
21. The method a s cla imed in cla im 14 , wherein coded da ta
is obta ined us ing MBR techniq ue a nd retrieving da ta s tored in
a fa iled s tora ge node by a new node repla cing the fa iled node
compris es a cts of choos ing one of the s ubs ets of a predeter
mined number of nodes ( d) ; downloa ding a predetermined
a mount of da ta ( [ 3) from ea ch of the s tora ge nodes in the
chos en s ubs et; a nd decoding the downloa ded da ta to obta in
the da ta tha t wa s s tored in s a id fa iled node.
US 2011/0289351A1
22. The method a s cla imed in cla im 5, wherein for tW in
code techniq ue ea ch node in the network is a s s igned W ith one
of a plura lity of ty pes .
23. The method a s cla imed in cla im 5, W herein the da ta
recons truction opera tion is performed by choos ing one ty pe
of nodes to connect to from a plura lity of ty pes ; choos ing one
of a ll pos s ible s ubs ets of a predetermined number of nodes ( k)
cons is ting of nodes of the chos en ty pe; doW nloa ding the da ta
s tored in the s tora ge nodes in the chos en s ub s et; a nd decoding
the da ta doW nloa ded to obta in the s ource da ta .
24 . The method a s cla imed in cla im 5, W herein da ta to be
s tored in the neW node repla cing a fa iled s tora ge node is
obta ined by choos ing one ty pe of nodes to connect to from a
plura lity of ty pes , s a id ty pe being dis tinct from the ty pe of the
repla cement node; choos ing one of a ll pos s ible s ubs ets of a
predetermined number of nodes ( k) cons is ting of nodes of the
chos en ty pe; doW nloa ding a predetermined a mount of da ta
( [ 3) from ea ch of the s tora ge nodes in the chos en s ubs et; a nd
decoding the doW nloa ded da ta .
25. The method a s cla imed in cla im 5, W herein the coded
da ta s tored in the s tora ge nodes a s s ocia ted to ea ch one of a
plura lity of ty pes is obta ined us ing a ny a rbitra ry linea r error
correcting code.
26. The method a s cla imed in cla im 25, W herein the da ta
recons truction opera tion is performed by choos ing one ty pe
of nodes to connect to from a plura lity of ty pes ; choos ing one
N ov. 24 , 2011
of a ll pos s ible s ubs ets cons is ting of nodes of the chos en ty pe,
the s a id pos s ible s ubs ets being dependent on the linea r error
correcting code a s s ocia ted W ith the chos en ty pe; doW nloa d
ing the da ta s tored in the s tora ge nodes in the chos en s ubs et;
a nd decoding the da ta doW nloa ded us ing the decoding a lgo
rithm of the s a id linea r error- correcting code.
27 . The method a s cla imed in cla im 25, W herein da ta to be
s tored in the neW node repla cing a fa iled s tora ge node is
obta ined by choos ing one ty pe of nodes to connect to from a
plura lity of ty pes , the s a id ty pe being dis tinct from the ty pe of
the repla cement node; choos ing one of a ll pos s ible s ubs ets
cons is ting of nodes of the chos en ty pe, the s a id pos s ible
s ubs ets being dependent on the linea r error- correcting code
a s s ocia ted W ith the chos en ty pe; doW nloa ding a predeter
mined a mount of da ta ( [ 3) from ea ch of the s tora ge nodes in
the chos en s ubs et; a nd us ing the decoding a lgorithm of the
linea r error- correcting code a s s ocia ted W ith the chos en ty pe.
28. The method a s cla imed in cla im 5, W herein the da ta to
be s tored in the s tora ge nodes ca n be deploy ed to the nodes in
a dis tributed ma nner.
29. The method a s cla imed in 28, W herein a node tha t ha s
not received the da ta a cts a s the repla cement of a fa iled node,
a nd doW nloa ds the des ired da ta from one of a plura lity of
s ubs ets of nodes W hich ha ve received the da ta .
* * * * *

You might also like