Distributed Systems
Principles and Paradigms
Chapter 10
(version 27th November 2001)
Maarten van Steen
Vrije Universiteit Amsterdam, Faculty of Science
Dept. Mathematics and Computer Science
Room R4.20. Tel: (020) 444 7784
E-mail: [email protected], URL: www.cs.vu.nl/~steen/
01 Introduction
02 Communication
03 Processes
04 Naming
05 Synchronization
06 Consistency and Replication
07 Fault Tolerance
08 Security
09 Distributed Object-Based Systems
10 Distributed File Systems
11 Distributed Document-Based Systems
12 Distributed Coordination-Based Systems
00 – 1 /
Distributed File Systems
Sun NFS
Coda
10 – 1 Distributed File Systems/
Sun NFS
Sun Network File System: now at version 3; version 4 is coming up.
Basic model: Remote file service: try to make a file
system transparently available to remote clients.
Follows remote access model (a)
instead of upload/download model (b):
[Figure: (a) Remote access model: the file stays on the server; requests from the client are sent to the server to access the remote file. (b) Upload/download model: 1. the file is moved to the client; 2. accesses are done on the client; 3. when the client is done, the (new) file is returned to the server.]
10 – 2 Distributed File Systems/10.1 NFS
NFS Architecture
NFS is implemented using the Virtual File System abstraction, which is now used in many different operating systems:
[Figure: the NFS architecture. On both client and server, a system call layer sits on top of a virtual file system (VFS) layer. On the client, the VFS layer dispatches to either the local file system interface or the NFS client, which sends requests through an RPC client stub across the network; on the server, the RPC server stub hands them to the NFS server, which accesses the server's local file system interface via its VFS layer.]
Essence: the VFS provides a standard file system interface, and makes it possible to hide the difference between accessing a local or a remote file system.
Question: Is NFS actually a file system?
10 – 3 Distributed File Systems/10.1 NFS
NFS File Operations
Oper. v3 v4 Description
Create Yes No Create a regular file
Create No Yes Create a nonregular file
Link Yes Yes Create a hard link to a file
Symlink Yes No Create a symbolic link to a file
Mkdir Yes No Create a subdirectory
Mknod Yes No Create a special file
Rename Yes Yes Change the name of a file
Remove Yes Yes Remove a file from a file system
Rmdir Yes No Remove an empty subdirectory
Open No Yes Open a file
Close No Yes Close a file
Lookup Yes Yes Look up a file by means of a name
Readdir Yes Yes Read the entries in a directory
Readlink Yes Yes Read the path name in a symbolic link
Getattr Yes Yes Get the attribute values for a file
Setattr Yes Yes Set one or more file-attribute values
Read Yes Yes Read the data contained in a file
Write Yes Yes Write data to a file
Question: Anything unusual between v3 and v4?
10 – 4 Distributed File Systems/10.1 NFS
Communication in NFS
Essence: All communication is based on the (best-effort) Open Network Computing RPC (ONC RPC).
Version 4 now also supports compound procedures:
[Figure: (a) a normal RPC sequence: the client sends LOOKUP (server looks up the name), then OPEN (server opens the file), then READ (server reads the file data), each as a separate round trip. (b) a compound RPC: LOOKUP, OPEN, and READ are combined into a single request; the server looks up the name, opens the file, and reads the file data, returning a single reply.]
(a) Normal RPC
(b) Compound RPC: the first failure aborts execution of the rest of the RPC
Question: What’s the use of compound RPCs?
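As an illustration, a compound procedure can be sketched as a loop that executes operations in order and stops at the first failure. The operation set, the server layout, and all names below are made up for the sketch; this is not the real NFSv4 wire protocol:

```python
# Sketch of a compound RPC in the spirit of NFSv4: several operations are
# sent in one request; the first failure aborts the remaining operations.

def compound(server, ops):
    """ops: list of (name, args) pairs. Returns results up to the first failure."""
    results, handle = [], None
    for name, args in ops:
        try:
            if name == "LOOKUP":
                handle = server["names"][args]     # resolve a name to a file handle
            elif name == "READ":
                results.append(server["data"][handle])
            else:
                raise ValueError("unknown operation")
        except KeyError:
            results.append(("error", name))
            break                                  # first failure stops the rest
    return results

server = {"names": {"foo": "fh1"}, "data": {"fh1": b"hello"}}
assert compound(server, [("LOOKUP", "foo"), ("READ", None)]) == [b"hello"]
assert compound(server, [("LOOKUP", "bad"), ("READ", None)]) == [("error", "LOOKUP")]
```

The point of batching is latency: one round trip instead of three for the common lookup-open-read pattern.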
10 – 5 Distributed File Systems/10.1 NFS
Naming in NFS (1/2)
Essence: NFS provides support for mounting remote
file systems (and even directories) into a client’s local
name space:
[Figure: the server exports the directory users/steen, which contains mbox. Client A mounts it as /remote/vu, client B mounts it as /work/me, so the same file appears as /remote/vu/mbox on client A and as /work/me/mbox on client B.]
Watch it: Different clients may have different local
name spaces. This may make file sharing extremely
difficult (Why?).
Question: What are the solutions to this problem?
10 – 6 Distributed File Systems/10.1 NFS
Naming in NFS (2/2)
Note: A server cannot export an imported directory.
The client must mount the server-imported directory:
[Figure: server A's exported directory (packages/draw) contains a subdirectory install that server A itself has imported from server B. The client imports draw from server A, but since a server cannot export an imported directory, the client needs to explicitly import the install subdirectory from server B.]
10 – 7 Distributed File Systems/10.1 NFS
Automounting in NFS
Problem: To share files, we partly standardize local name spaces and mount shared directories. Mounting very large directories (e.g., all subdirectories in ) takes a lot of time (Why?).
Solution: Mount on demand — automounting
[Figure: automounting. 1. A lookup of "/home/alice" reaches the automounter via the local file system interface; 2. the automounter creates the subdirectory "alice" under /home; 3. it issues a mount request to the NFS client; 4. the subdirectory "alice" (under users on the server machine) is mounted from the server.]
Question: What’s the main drawback of having the
automounter in the loop?
10 – 8 Distributed File Systems/10.1 NFS
File Sharing Semantics (1/2)
Problem: When dealing with distributed file systems, we need to take into account the ordering of concurrent read/write operations, and the expected semantics (= consistency).
[Figure: (a) On a single machine, when process A writes "c" to a file containing "ab", a subsequent read by process B returns "abc". (b) In a distributed system with client caching, client 1 reads "ab" from the file server and writes "c" to its local copy; a later read by client 2 still gets "ab" from the server.]
10 – 9 Distributed File Systems/10.1 NFS
File Sharing Semantics (2/2)
UNIX semantics: a read operation returns the effect of the last write operation → can only be implemented for remote access models in which there is only a single copy of the file
Transaction semantics: the file system supports transactions on a single file → issue is how to allow concurrent access to a physically distributed file
Session semantics: the effects of read and write operations are seen only by the client that has opened (a local copy of) the file → what happens when a file is closed (only one client may actually win)
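Session semantics can be sketched in a few lines. The class name and the dictionary standing in for the server's store are purely illustrative:

```python
# Toy model of session semantics: writes go to a private local copy and
# become visible to other clients only when the file is closed.

class SessionFile:
    def __init__(self, server_store, name):
        self.store, self.name = server_store, name
        self.local = server_store[name]       # copy fetched on open

    def write(self, data):
        self.local += data                    # only the local copy changes

    def close(self):
        self.store[self.name] = self.local    # propagate to the server on close

server = {"f": "ab"}
s1 = SessionFile(server, "f")
s2 = SessionFile(server, "f")
s1.write("c")
assert s2.local == "ab"       # s2 does not see the write...
s1.close()
assert server["f"] == "abc"   # ...until s1 closes
```

If two sessions close concurrently, whichever close lands last wins, which is exactly the "only one client may actually win" issue noted above.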
10 – 10 Distributed File Systems/10.1 NFS
File Locking in NFS
Observation: It could have been simple, but it isn’t.
NFS supports an explicit locking protocol (stateful),
but also an implicit share reservation approach:
Can a requested access be granted, given the file's current denial state?

                    Current denial state
                    None   Read   Write  Both
Requested  Read     OK     Fail   OK     Fail
access     Write    OK     OK     Fail   Fail
           Both     OK     Fail   Fail   Fail

Can a requested denial state be granted, given the file's current access state?

                    Requested denial state
                    None   Read   Write  Both
Current    Read     OK     Fail   OK     Fail
access     Write    OK     OK     Fail   Fail
state      Both     OK     Fail   Fail   Fail
Question: What’s the use of these share reservations?
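One way to read the two tables is as a single compatibility check: an open succeeds only if its requested access is not denied by existing opens, and its requested denials do not conflict with access already granted. A minimal sketch, with invented names (not the NFSv4 implementation):

```python
# Sketch of NFSv4-style share reservations. Access and deny modes are
# bit masks; BOTH corresponds to READ | WRITE.

READ, WRITE = 1, 2

class FileShareState:
    def __init__(self):
        self.granted_access = 0   # union of access bits of current opens
        self.denied = 0           # union of deny bits of current opens

    def try_open(self, access, deny):
        if access & self.denied:        # someone denies what we want to do
            return False
        if deny & self.granted_access:  # we deny what someone already does
            return False
        self.granted_access |= access
        self.denied |= deny
        return True

f = FileShareState()
assert f.try_open(access=READ, deny=WRITE)      # a reader, denying writers
assert not f.try_open(access=WRITE, deny=0)     # a writer is now blocked
assert f.try_open(access=READ, deny=0)          # another reader is fine
```

Each row/column of the slide's tables corresponds to one of the two `if` tests.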
10 – 11 Distributed File Systems/10.1 NFS
Caching & Replication
Essence: Clients are on their own.
Open delegation: Server will explicitly permit a client
machine to handle local operations from other clients
on that machine. Good for performance. Does require
that the server can take over when necessary:
[Figure: open delegation. 1. The client asks for a file; 2. the server delegates the file, and the client works on a local copy; 3. the server recalls the delegation when necessary; 4. the client returns the updated file.]
Question: Would this scheme fit into v3?
Question: What kind of file access model are we
dealing with?
10 – 12 Distributed File Systems/10.1 NFS
Fault Tolerance
Important: Until v4, fault tolerance was easy due to
the stateless servers. Now, problems come from the
use of an unreliable RPC mechanism, but also stateful
servers that have delegated matters to clients.
RPC: Cannot detect duplicates. Solution: use a duplicate-request cache:
[Figure: three retransmission scenarios, all for a request with XID = 1234 and a duplicate-request cache at the server. (a) The retransmission arrives while the original request is still being processed; (b) the retransmission arrives just after the reply has been returned; (c) the reply is lost, and the server answers the retransmission with the reply cached under XID 1234.]
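The duplicate-request cache can be sketched as a toy model keyed on the XID. Names are invented; a real cache would also be bounded and would age out old entries:

```python
# Minimal sketch of a duplicate-request cache: retransmitted requests are
# either dropped (still in progress) or answered from the cached reply,
# so a non-idempotent request is executed at most once.

class DuplicateRequestCache:
    def __init__(self):
        self.in_progress = set()   # XIDs currently being executed
        self.replies = {}          # XID -> cached reply

    def handle(self, xid, execute):
        if xid in self.in_progress:
            return None                    # still working on it: drop duplicate
        if xid in self.replies:
            return self.replies[xid]       # reply was lost: resend cached copy
        self.in_progress.add(xid)
        reply = execute()                  # run the request exactly once
        self.in_progress.discard(xid)
        self.replies[xid] = reply
        return reply

cache = DuplicateRequestCache()
calls = []
r1 = cache.handle(1234, lambda: calls.append(1) or "ok")
r2 = cache.handle(1234, lambda: calls.append(1) or "ok")  # retransmission
assert r1 == r2 == "ok" and len(calls) == 1               # executed only once
```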
Locking/Open delegation: Essentially, a recovered server offers clients a grace period to reclaim their locks. When the period is over, the server resumes its normal local manager function.
10 – 13 Distributed File Systems/10.1 NFS
Security
Essence: Set up a secure RPC channel between
client and server:
Secure NFS: Use Diffie-Hellman key exchange to set up a secure channel. However, it uses only 192-bit keys, which have been shown to be easy to break.
RPCSEC GSS: A standard interface that allows inte-
gration with existing security services:
[Figure: on both the client and the server machine, the NFS client/server calls an RPC stub, which uses RPCSEC_GSS; RPCSEC_GSS in turn reaches the actual security services (Kerberos, LIPKEY, and others) through the GSS-API.]
10 – 14 Distributed File Systems/10.1 NFS
Coda File System
Developed in the 90s as a descendant of the Andrew File System (CMU)
Now shipped with Linux distributions (after 10 years!)
Emphasis: support for mobile computing, in particular disconnected operation.
[Figure: Virtue clients get transparent access to a group of Vice file servers.]
10 – 15 Distributed File Systems/10.2 Coda
Coda Architecture
[Figure: a Virtue client machine. User processes and the Venus process run on top of the local OS; the virtual file system layer inside the OS forwards file operations either to the local file system interface or to Venus, which contacts the servers through its RPC client stub.]
Note: The core of the client machine is the Venus
process (yes, it’s a nasty beast). Note that most stuff
is at user level.
10 – 16 Distributed File Systems/10.2 Coda
Communication in Coda (1/2)
Essence: All client-server communication (and server-server communication) is handled by means of a reliable RPC subsystem. Coda RPC supports side effects:
[Figure: the client application and the server communicate through RPC client/server stubs using the RPC protocol; alongside, a client-side and a server-side "side effect" component talk directly to each other using an application-specific protocol.]
Note: a side effect allows a separate protocol to handle, e.g., multimedia streams.
10 – 17 Distributed File Systems/10.2 Coda
Communication in Coda (2/2)
Issue: Coda servers allow clients to cache whole files. Modifications by other clients are announced through invalidation messages → there is a need for multicast RPC:
[Figure: a server invalidating the cached copies at two clients. (a) Sequential RPCs: the server sends Invalidate to the first client, waits for its Reply, then repeats this for the second client. (b) Multicast RPCs: both Invalidate messages are sent in parallel and the Replies are collected as they come in.]
Question: Why do multi RPCs really help?
10 – 18 Distributed File Systems/10.2 Coda
Naming in Coda
Essence: Similar remote mounting mechanism as in NFS, except that there is a shared name space between all clients:
[Figure: naming is inherited from the server's name space. The server's exported directory (local, containing bin and pkg) is mounted by both client A and client B under /afs, so all clients see the shared files under the same path names.]
10 – 19 Distributed File Systems/10.2 Coda
File Handles in Coda
Background: Coda assumes that files may be replicated between servers. The issue becomes how to track a file in a location-transparent way:
[Figure: a Coda file identifier consists of an RVID plus a file handle. The volume replication database maps the RVID to the VIDs of the physical volumes (e.g., VID1 and VID2); the volume location database maps each VID to a file server (Server1, Server2), yielding (server, file handle) pairs that address the physical copies.]
Files are contained in a volume (cf. a UNIX file system on a disk)
Volumes have a Replicated Volume Identifier (RVID)
Volumes may be replicated; each physical volume has its own VID
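The two-step mapping (RVID → VIDs → servers) can be sketched with made-up database contents:

```python
# Illustrative two-step resolution of a Coda-style file identifier
# (RVID + file handle) to its physical replicas.

replication_db = {"rvid-17": ["vid-1", "vid-2"]}        # RVID -> replicated VIDs
location_db = {"vid-1": "server1", "vid-2": "server2"}  # VID -> server

def resolve(rvid, file_handle):
    """Return (server, file_handle) pairs addressing each physical copy."""
    return [(location_db[vid], file_handle)
            for vid in replication_db[rvid]]

assert resolve("rvid-17", "fh-42") == [("server1", "fh-42"),
                                       ("server2", "fh-42")]
```

Location transparency comes from the fact that file identifiers contain only the RVID: replicas can move between servers by updating the databases, without changing any identifier.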
10 – 20 Distributed File Systems/10.2 Coda
File Sharing Semantics in Coda
Essence: Coda assumes transactional semantics, but
without the full-fledged capabilities of real transactions.
[Figure: client 1 opens file f for reading (session S_A) and receives a copy of f from the server. Client 2 then opens f for writing (session S_B), receives a copy, and closes; at that point the server sends an invalidation to client 1, which nevertheless continues its session on the old copy until it closes.]
Note: Transactional issues reappear in the form of
“this ordering could have taken place.”
10 – 21 Distributed File Systems/10.2 Coda
Caching in Coda
Essence: Combined with the transactional semantics, we obtain flexibility when it comes to letting clients operate on local copies:
[Figure: client A runs two read sessions (S_A) and client B two write sessions (S_B) on file f. A's first Open(RD) transfers f to A. B's first Open(WR) also transfers f; when B closes, the server invalidates A's copy (callback break). B's second Open(WR) is answered with OK and no file transfer, since B already holds the latest copy; A's second Open(RD) causes a fresh copy of f to be transferred.]
Note: A writer can continue to work on its local copy;
a reader will have to get a fresh copy on the next open.
Question: Would it be OK if the reader continued to
use its own local copy?
10 – 22 Distributed File Systems/10.2 Coda
Server Replication in Coda (1/2)
Essence: Coda uses ROWA (Read One, Write All) for server replication:
Files are grouped into volumes (cf. a traditional UNIX file system)
The collection of servers replicating the same volume forms that volume’s Volume Storage Group (VSG)
Writes are propagated to a file’s VSG
Reads are done from one server in a file’s VSG
Problem: what to do when the VSG partitions and the partition is later healed?
[Figure: servers S1, S2, and S3 replicate a volume. A broken network partitions them: client A can reach only S1 and S2, while client B can reach only S3.]
10 – 23 Distributed File Systems/10.2 Coda
Server Replication in Coda (2/2)
Solution: Detect inconsistencies using version vectors:
CVVi(f)[j] = k means that server Si knows that server Sj has seen version k of file f.
When a client reads file f from server Si, it receives CVVi(f).
Updates are multicast to all reachable servers (the client’s accessible VSG, or AVSG), which increment their own entry CVVi(f)[i].
When the partition is restored, comparison of the version vectors will allow detection of conflicts and possible reconciliation.
Note: the client informs a server about the servers in
the AVSG where the update has also taken place.
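Conflict detection boils down to a pairwise comparison of version vectors: either one dominates the other (and the stale replica can simply be brought up to date), or neither does and there is a genuine conflict. A sketch, with vectors as plain lists whose indices stand for servers:

```python
# Sketch of Coda-style version-vector (CVV) comparison after a partition
# heals. Two vectors conflict when neither dominates the other.

def compare(cvv_a, cvv_b):
    """Return 'equal', 'a_newer', 'b_newer', or 'conflict'."""
    a_ge = all(x >= y for x, y in zip(cvv_a, cvv_b))
    b_ge = all(y >= x for x, y in zip(cvv_a, cvv_b))
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "a_newer"
    if b_ge:
        return "b_newer"
    return "conflict"

# Partition example from the slide: {S1, S2} applied one update, {S3} another.
assert compare([2, 2, 1], [1, 1, 1]) == "a_newer"   # reconcile by copying
assert compare([2, 2, 1], [1, 1, 2]) == "conflict"  # concurrent updates
```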
10 – 24 Distributed File Systems/10.2 Coda
Fault Tolerance
Note: Coda achieves high availability through client-side caching and server replication
Disconnected operation: When a client is no longer
connected to one of the servers, it may continue with
the copies of files that it has cached. Requires that
the cache is properly filled (hoarding).
Compute a priority for each file
Bring the user’s cache into equilibrium (hoard walk):
– There is no uncached file with higher priority
than a cached file
– The cache is full or no uncached file has nonzero
priority
– Each cached file is a copy of a file maintained
by the client’s AVSG
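A toy hoard walk satisfying the first two equilibrium conditions above can be sketched as follows. The priorities, file names, and cache size are invented; real Coda derives priorities from a user-maintained hoard database and recent references:

```python
# Toy hoard walk: fill a bounded cache so that no uncached file has a
# higher priority than any cached file, and zero-priority files are skipped.

def hoard_walk(files, cache_slots):
    """files: dict mapping name -> priority; return the set of names to cache."""
    ranked = sorted(files, key=lambda name: files[name], reverse=True)
    return {name for name in ranked[:cache_slots] if files[name] > 0}

prios = {"mail": 90, "thesis.tex": 80, "mp3s": 0, "news": 10}
assert hoard_walk(prios, cache_slots=2) == {"mail", "thesis.tex"}
```

The third condition (each cached file is a copy maintained by the AVSG) is a freshness check against the servers and is not modeled here.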
Note: Disconnected operation works best when there is hardly any write-sharing.
10 – 25 Distributed File Systems/10.2 Coda
Security
Essence: All communication is based on a secure
RPC mechanism that uses secret keys. When logging
into the system, a client receives:
A clear token CT from an AS (containing a generated shared secret key KS). CT has time-limited validity.
A secret token ST = Kvice(CT), which is an encrypted and cryptographically sealed version of CT.
[Figure: setting up a secure channel between a client (Alice) and a Vice server (Bob):
1. Alice → Bob: ST, KS(RA)
2. Bob → Alice: KS(RA + 1, RB)
3. Alice → Bob: KS(RB + 1)
4. Bob → Alice: KS(KS2)]
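The handshake can be mimicked with a toy model. The "encryption" below is just a tagged tuple used to show the message flow, and all names are illustrative; real Coda of course uses actual cryptography:

```python
import secrets

# Toy model of the four-message mutual-authentication handshake. enc/dec
# stand in for encryption with the shared secret key KS from the clear token.
def enc(key, *values):
    return ("enc", key, values)

def dec(key, msg):
    tag, k, values = msg
    assert tag == "enc" and k == key, "wrong key"
    return values

KS = "shared-secret-KS"
RA = secrets.randbelow(1_000_000)          # Alice's challenge

# 1. Alice -> Bob: ST, KS(RA)
msg1 = ("ST", enc(KS, RA))

# 2. Bob -> Alice: KS(RA + 1, RB) -- Bob proves he could decrypt RA
(ra,) = dec(KS, msg1[1])
RB = secrets.randbelow(1_000_000)          # Bob's challenge
msg2 = enc(KS, ra + 1, RB)

# 3. Alice -> Bob: KS(RB + 1) -- Alice proves knowledge of KS as well
ra1, rb = dec(KS, msg2)
assert ra1 == RA + 1                       # Alice checks Bob's response
msg3 = enc(KS, rb + 1)

# 4. Bob -> Alice: KS(KS2) -- a fresh key KS2 for the rest of the session
(rb1,) = dec(KS, msg3)
assert rb1 == RB + 1                       # Bob checks Alice's response
msg4 = enc(KS, "session-key-KS2")
```

The nonces RA and RB guard against replay; only a party holding KS can produce RA + 1 or RB + 1 under encryption.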
10 – 26 Distributed File Systems/10.2 Coda