Distributed Storage

The document discusses distributed storage systems, focusing on file organization, retrieval, and access control. It outlines requirements for distribution such as transparency, fault-tolerance, and security, and describes both naive and actual implementations like NFS and GFS. Key components include directory services, client-server architecture, caching strategies, and operations for reading and writing data.

Uploaded by

Maxine Niigambo

distributed storage/1

dsa621s - 2022
distributed file systems

file systems

• organisation, storage and retrieval of files

• naming, sharing and protection of files

• interface to file data

file

• data (seq. of bytes)

• attributes

• file length, file type

directory

• special file

• maps names to file identifiers

modules

• directory

• file

• access control

• lower level

• file access, block, device

operations

• create

• open, close

• read, write

• seek

• link, unlink
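
The operations listed above correspond closely to POSIX-style calls. A minimal sketch using Python's `os` module on a local temporary directory follows; the file names `notes.txt` and `alias.txt` are illustrative only, and the sketch assumes a file system that supports hard links.

```python
import os
import tempfile

def demo_file_operations():
    """Exercise create, open/close, read/write, seek, link and unlink."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "notes.txt")    # illustrative name
        alias = os.path.join(workdir, "alias.txt")   # illustrative name

        # create + open: O_CREAT creates the file if it does not exist
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
        try:
            os.write(fd, b"hello world")             # write
            os.lseek(fd, 6, os.SEEK_SET)             # seek to byte offset 6
            tail = os.read(fd, 5)                    # read 5 bytes from there
        finally:
            os.close(fd)                             # close

        os.link(path, alias)                         # link: second name, same file
        links_after_link = os.stat(path).st_nlink
        os.unlink(alias)                             # unlink: remove one name
        links_after_unlink = os.stat(path).st_nlink
        return tail, links_after_link, links_after_unlink
```

The link count returned by `os.stat` shows that `link` adds a name to an existing file and `unlink` removes a name; the data is only reclaimed once the last name is gone.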

distribution — requirements (1)

• transparency (access, location, etc.)

• concurrent updates

• advisory or mandatory locks

• file or record level locks
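
To illustrate what "advisory" means for a file-level lock, here is a small sketch using `fcntl.flock`, which is available on Unix-like systems only. Two independent opens of the same local file stand in for two clients; this is an assumption made for the demonstration, not a distributed lock service.

```python
import fcntl
import os
import tempfile

def try_exclusive_lock(fd):
    """Return True if an exclusive advisory lock was acquired without blocking."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False

def demo_advisory_lock():
    with tempfile.NamedTemporaryFile() as tmp:
        # Two independent opens of the same file stand in for two clients.
        fd_a = os.open(tmp.name, os.O_RDWR)
        fd_b = os.open(tmp.name, os.O_RDWR)
        try:
            a_first = try_exclusive_lock(fd_a)      # A takes the lock
            b_refused = try_exclusive_lock(fd_b)    # B is refused while A holds it
            # Advisory only: B can still write if it chooses to ignore the lock.
            written = os.write(fd_b, b"x")
            fcntl.flock(fd_a, fcntl.LOCK_UN)        # A releases the lock
            b_after = try_exclusive_lock(fd_b)      # now B can acquire it
        finally:
            os.close(fd_a)
            os.close(fd_b)
        return a_first, b_refused, written, b_after
```

A mandatory lock would make the write by the second client fail or block; with an advisory lock, correctness depends on every client checking the lock before it touches the file.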

distribution — requirements (2)

• file/block replication

• fault-tolerance

• consistency

distribution — requirements (3)

• hardware & os heterogeneity

• security

• efficiency

abstract architecture (1)

• flat file service

• unique file identifier (UFID)

abstract architecture (2)

• directory service

• map names to UFID
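
The split between the two services can be sketched as two small classes: one stores contents keyed by UFID, the other only maps names to UFIDs. The class and method names are hypothetical, and a simple counter stands in for UFID generation (a real service would use identifiers that are hard to guess).

```python
import itertools

class FlatFileService:
    """Stores file contents keyed by UFID; knows nothing about names."""

    def __init__(self):
        self._files = {}
        self._next_ufid = itertools.count(1)   # simplification: sequential UFIDs

    def create(self):
        ufid = next(self._next_ufid)
        self._files[ufid] = bytearray()
        return ufid

    def write(self, ufid, offset, data):
        buf = self._files[ufid]
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))   # pad any gap with zeros
        buf[offset:offset + len(data)] = data

    def read(self, ufid, offset, count):
        return bytes(self._files[ufid][offset:offset + count])

class DirectoryService:
    """Maps text names to UFIDs; knows nothing about file contents."""

    def __init__(self):
        self._names = {}

    def add_name(self, name, ufid):
        self._names[name] = ufid

    def lookup(self, name):
        return self._names[name]

files = FlatFileService()
directory = DirectoryService()
ufid = files.create()
directory.add_name("report.txt", ufid)             # illustrative name
files.write(ufid, 0, b"draft")
content = files.read(directory.lookup("report.txt"), 0, 5)
```

Every `read` and `write` carries the UFID and the offset explicitly, so the file service needs no per-client state such as an open-file table or a current position.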

[Figure: file service architecture. The client computer runs application programs and the client module; the server computer runs the directory service and the flat file service.]

• client module

• runs on client machines

• integrates all server modules

• server module

• rpc implementation for flat file service (e.g., create, read, write, delete)

• rpc implementation for directory service (e.g., lookup, addName)

• checks access rights

• stateless server

naive implementation (1)

• rpc for every call to the server

• arg and result serialisation
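
What "an rpc for every call" involves can be sketched with a client stub and a server stub that exchange serialised messages. JSON with base64-encoded file data is an assumption chosen for readability, not a real wire format; `FILES`, `server_handle` and `rpc_read` are hypothetical names, and the transport is a direct function call standing in for the network.

```python
import base64
import json

FILES = {1: b"hello world"}   # hypothetical server-side store: UFID -> contents

def marshal(obj):
    """Serialise a request or reply to bytes for the wire."""
    return json.dumps(obj).encode("utf-8")

def unmarshal(payload):
    """Rebuild a request or reply from wire bytes."""
    return json.loads(payload.decode("utf-8"))

def server_handle(request_bytes):
    """Server stub: decode the request, run the operation, encode the result."""
    request = unmarshal(request_bytes)
    if request["op"] == "read":
        ufid, offset, count = request["args"]
        data = FILES[ufid][offset:offset + count]
        return marshal({"result": base64.b64encode(data).decode("ascii")})
    return marshal({"error": "unknown operation " + request["op"]})

def rpc_read(ufid, offset, count, transport=server_handle):
    """Client stub: one full round trip (marshal, send, unmarshal) per call."""
    request = marshal({"op": "read", "args": [ufid, offset, count]})
    reply = unmarshal(transport(request))
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return base64.b64decode(reply["result"])
```

Reading a file in small pieces through `rpc_read` pays the serialisation and round-trip cost once per piece.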

naive implementation (2)

• pro: same behaviour as a local fs

• cons: poor performance (high server load and latency), poor scalability

naive implementation (3)

• mitigation

• caching

• what to cache?

• cache consistency…

actual implementations
nfs
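[Figure: NFS architecture. On the client computer, application programs issue UNIX system calls to the UNIX kernel. The kernel's virtual file system sends local requests to the UNIX file system (or another file system) and remote requests to the NFS client. The NFS client talks to the NFS server on the server computer using the NFS protocol; the server's virtual file system passes requests on to its UNIX file system.]

virtual file system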

• distinguishes between local and remote files

• translates between unix file identifiers and nfs file identifiers

client module

• similar to the one in the abstract model

• integrated with the unix kernel

access control

• nfs server is stateless

• checks user identity against file access permissions

• pluggable security module (e.g., kerberos)

interface

• file

• read, write, lookup, getAttr, setAttr

• directory

• create, remove, rename

caching

• server

• write-through cache

• commit
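
One reading of the two bullets above, assuming that "commit" refers to flushing delayed writes to disk as in NFS version 3: a write-through write reaches the disk before the reply is sent, while a delayed write stays in the server's memory cache until a commit. The class below is a toy sketch; a dict stands in for the disk and all names are hypothetical.

```python
class ServerBlockCache:
    """Toy server cache over a dict that stands in for the disk."""

    def __init__(self, disk):
        self.disk = disk
        self.cache = {}
        self.dirty = set()              # blocks written to the cache but not to disk

    def write(self, block, data, write_through=True):
        self.cache[block] = data
        if write_through:
            self.disk[block] = data     # on disk before the reply is sent
            self.dirty.discard(block)
        else:
            self.dirty.add(block)       # held in memory until commit()

    def commit(self):
        """Flush every delayed write to disk."""
        for block in sorted(self.dirty):
            self.disk[block] = self.cache[block]
        self.dirty.clear()
```

Write-through is simpler and survives a server crash at the cost of a disk write per request; delayed writes batch the disk work but need the explicit commit before the data can be considered safe.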

caching

• client

• attribute values (Tmserver) piggybacked on results of file operations

• client updates cache if Tmserver changed

• also uses freshness interval

(T − Tc < t) ∨ (Tmclient = Tmserver)
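
The validity condition above can be written as a small function. The sketch reads the symbols in the usual textbook way, which is an assumption since the slide does not define them: T is the current time, Tc the time the cache entry was last validated, t the freshness interval, and Tmclient and Tmserver the modification times recorded at the client and at the server. The `fetch_tm_server` callable is a hypothetical stand-in for a getAttr request.

```python
def cache_entry_is_valid(now, t_validated, freshness_interval,
                         tm_client, fetch_tm_server):
    """Evaluate (T - Tc < t) or (Tm_client == Tm_server).

    fetch_tm_server is only called when the freshness interval has
    expired, so a fresh entry costs no request to the server.
    """
    if now - t_validated < freshness_interval:
        return True                          # recently validated: trust the cache
    return tm_client == fetch_tm_server()    # otherwise compare modification times

server_calls = []

def fake_getattr():
    """Stand-in for the server: records the call and reports Tm_server = 100."""
    server_calls.append("getattr")
    return 100

fresh = cache_entry_is_valid(10.0, 8.0, 3.0, 100, fake_getattr)       # within interval
revalidated = cache_entry_is_valid(20.0, 8.0, 3.0, 100, fake_getattr)  # expired, unchanged
stale = cache_entry_is_valid(20.0, 8.0, 3.0, 90, fake_getattr)         # expired, changed
```

The first disjunct is what saves server load: within the freshness interval the client answers from its cache without contacting the server, at the price of possibly serving data that is up to t seconds out of date.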

gfs
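[Figure: GFS architecture. Several clients communicate with a GFS cluster made up of one master and several chunk servers.]

[Figure: GFS request flow. The client sends a request for metadata to the master, which holds the metadata, and keeps the metadata response in its metadata cache. Read/write requests and their responses then go directly between the client and the chunk servers, each of which stores its chunks on a Linux FS.]

chunks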

• analogous to fs blocks (but larger)

• 64 MB
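
With a fixed chunk size, translating a byte offset in a file into a chunk index is plain integer arithmetic. A short sketch, taking 64 MB to mean 64 × 2^20 bytes (the function names are hypothetical):

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB, read here as 64 * 2**20 bytes

def locate(byte_offset):
    """Return (chunk index, offset within that chunk) for a file byte offset."""
    return divmod(byte_offset, CHUNK_SIZE)

def chunks_spanned(byte_offset, length):
    """Return the chunk indices touched by a read of `length` > 0 bytes."""
    first = byte_offset // CHUNK_SIZE
    last = (byte_offset + length - 1) // CHUNK_SIZE
    return list(range(first, last + 1))
```

For example, `locate(CHUNK_SIZE + 5)` gives chunk index 1 and offset 5, and a two-byte read starting at the last byte of chunk 0 spans chunks 0 and 1.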

metadata

• file and chunk namespace

• file-to-chunk mapping

• location of a chunk replica
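
The last two kinds of metadata can be pictured as two lookup tables held by the master. The sketch below is an illustration under assumed names (`MasterMetadata`, string chunk handles, server names such as `cs1`), not the actual GFS data structures.

```python
class MasterMetadata:
    """In-memory tables held by a hypothetical master."""

    def __init__(self):
        self.file_chunks = {}       # file path -> ordered list of chunk handles
        self.chunk_locations = {}   # chunk handle -> set of chunk server names

    def add_chunk(self, path, handle, servers):
        """Append a chunk to a file and record where its replicas live."""
        self.file_chunks.setdefault(path, []).append(handle)
        self.chunk_locations[handle] = set(servers)

    def lookup(self, path, chunk_index):
        """Answer a client's metadata request for one chunk of a file."""
        handle = self.file_chunks[path][chunk_index]
        return handle, sorted(self.chunk_locations[handle])

master = MasterMetadata()
master.add_chunk("/logs/a", "h1", ["cs1", "cs2", "cs3"])   # illustrative values
master.add_chunk("/logs/a", "h2", ["cs2", "cs4", "cs5"])
handle, replicas = master.lookup("/logs/a", 1)
```

A client that has the handle and the replica list can cache them and contact a chunk server directly, so the master stays out of the data path.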

membership & group management

• master and chunk servers communicate regularly (heartbeat)

• failure detection

• chunk server

• disk failure

• replica corruption
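
Heartbeat-based failure detection can be sketched as a table of last-seen times plus a timeout. The class name, the timeout value and the server names are assumptions for illustration; times are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat from each chunk server."""

    def __init__(self, timeout):
        self.timeout = timeout      # seconds of silence before a server is suspected
        self.last_seen = {}         # chunk server name -> time of last heartbeat

    def heartbeat(self, server, now):
        """Record that `server` was heard from at time `now`."""
        self.last_seen[server] = now

    def suspected_failed(self, now):
        """Return the servers that have been silent for longer than the timeout."""
        return sorted(server for server, seen in self.last_seen.items()
                      if now - seen > self.timeout)

monitor = HeartbeatMonitor(timeout=10)
monitor.heartbeat("cs1", now=0)
monitor.heartbeat("cs2", now=0)
monitor.heartbeat("cs1", now=8)       # cs1 keeps reporting, cs2 goes silent
suspects = monitor.suspected_failed(now=12)
```

A timeout can only suspect a failure: a slow network looks the same as a crashed server, so the choice of timeout trades detection speed against false alarms.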

operations

• read

• mutation

• write

• record append

read operation (1)

[Figure: Read Protocol; one element is labelled Cache]

read operation (2)

[Figure: Read Protocol]

write operation (1)

[Figure]

write operation (2)

[Figure: Write Algorithm]

write operation (3)

[Figure]

write operation (4)

[Figure: Write Algorithm]

• to be continued…
