Module 2 - Part 1
4
ARCHITECTURES OF DISTRIBUTED SYSTEMS
4.1 INTRODUCTION
The term distributed system is used to describe a system with the following characteristics: it consists of several computers that do not share a memory or a clock; the computers communicate with each other by exchanging messages over a communication network (see Fig. 4.1); and each computer has its own memory and runs its own operating system (see Sec. 4.5.8). The resources owned and controlled by a computer are said to be local to it, while the resources owned and controlled by other computers and those that can only be accessed through the network are said to be remote. Typically, accessing remote resources is more expensive than accessing local resources because of the communication delays that occur in the network and the CPU overhead incurred to process communication protocols (see Sec. 4.6). Based on the context, the terms computer, node, host, site, machine, processor, and workstation are used interchangeably to denote a computer throughout this book.
The main purpose of this chapter is to serve as an introduction to Part II (Distributed Operating Systems) and Part III (Distributed Resource Management). This chapter presents the issues that arise in the design of a distributed operating system. In addition, this chapter discusses how communication is handled between computers and how programs communicate to perform distributed computations.
FIGURE 4.1 Architecture of a distributed system.

...tolerant. Services are processes that provide functionality (e.g., a file service provides file system management; a mail service provides an electronic mail facility).

Modular expandability. Distributed computing systems are inherently amenable to modular expansion because new hardware and software resources can be easily added without replacing the existing resources.
...locates files by broadcasting queries. Under such a file system, every computer in the distributed system is subjected to message handling overhead, irrespective of whether it has the requested file or not. As the number of users increases and the system gets larger, the number of file location queries will increase and the overhead will grow larger as well, hurting the performance of every computer. In general, any design approach in which the requirement for a scarce resource (such as storage, communication bandwidth, and manpower) increases linearly with the number of computers in the system is likely to be too costly to implement. For example, a design requiring that information regarding the system's configuration or directories be stored at every computer is not suitable even for systems of moderate size [11].

Chapter 6 describes several algorithms for achieving mutual exclusion and compares their performance.

In distributed systems, processes can request resources (local or remote) and release resources in any order that may not be known a priori. If the sequence of the allocation of resources to processes is not controlled in such environments, deadlocks may occur. It is important that deadlocks are detected and resolved as soon as possible; otherwise, system performance can degrade severely. Chapter 7 discusses several deadlock handling strategies and describes several deadlock detection techniques for distributed systems.
...the other hand, the distributed system cannot include computers with different architectures from the same or different vendors. Because of this major restriction, binary compatibility is rarely supported in large distributed systems.

Execution level compatibility is said to exist in a distributed system if the same source code can be compiled and executed properly on any computer in the system [11]. Both the Andrew [32] and Athena [11] systems support execution level compatibility.

Protocol level compatibility is the least restrictive form of compatibility. It achieves interoperability by requiring all system components to support a common set of protocols. A significant advantage of protocol level compatibility is that individual computers can run different operating systems while not sacrificing their interoperability. For example, a distributed system supporting protocol level compatibility employs common protocols for essential system services such as file access (for example, see Sun NFS, Sec. 9.5.1), naming, and authentication.

DATA MIGRATION. In the process of data migration, data is brought to the location of the computation that needs access to it by the distributed operating system. The data in question may be a file (stored locally or remotely) or the contents of a physical memory (local or of another computer). If a computation updates a set of data, the original location (remote or local) may have to be similarly updated.

If the data accessed is a file, then the computation's data access request is brought under the purview of the distributed file system by the distributed operating system. A distributed file system is the component of a distributed operating system that implements a common file system available to the autonomous computers in the system. The primary goal of a distributed file system is to provide the same functional capability to access files regardless of their location in the network as that provided by a file system of a time-sharing mainframe operating system that only accesses files residing at one location. Ideally, the user doesn't need to be aware of the location of files to access them. This property of a distributed file system is known as network transparency. Important issues in the design of a distributed file system, common mechanisms employed in the building of distributed file systems, and several case studies of distributed file systems are presented in Chap. 9.

If, on the other hand, the data accessed is in the physical memory of another computer, then a computation's data access request is brought under the purview of distributed shared memory management by the distributed operating system. A distributed shared memory provides a virtual address space that is shared among all the computers in a distributed system. A distributed shared memory is an implementation of the shared memory concept in distributed systems that have no physically shared memory. The major issues in distributed shared memory implementation concern the...

4.5.5 Process Synchronization

The synchronization of processes in distributed systems is difficult because of the unavailability of shared memory. A distributed operating system has to synchronize processes running at different computers when they try to concurrently access a shared resource, such as a file directory. For correctness, it is necessary that the shared resource be accessed by a single process at a time. This problem is known as the mutual exclusion problem, wherein concurrent access to a shared resource by several uncoordinated...
4.5.7 Security

The security of a system is the responsibility of its operating system. Two issues that must be considered in the design of security for computer systems are authentication and authorization [11]. Authentication is the process of guaranteeing that an entity is what it claims to be. Authorization is the process of deciding what privileges an entity has and making only these privileges available. The security of computer systems and various protection mechanisms to achieve security are discussed in Chaps. 14 and 15.
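To keep the two notions apart, a minimal sketch is given below: one routine checks that a principal is who it claims to be, the other checks what an already authenticated principal may do. The credential table, principal names, and operations are invented for the illustration and do not correspond to any particular system.

#include <stdio.h>
#include <string.h>

/* Hypothetical credential table: principal name -> shared secret. */
struct credential { const char *principal; const char *secret; };
static const struct credential cred_table[] = {
    { "alice", "wonderland" },
    { "bob",   "builder"    },
};

/* Authentication: is the entity what it claims to be? */
static int authenticate(const char *principal, const char *secret) {
    for (size_t i = 0; i < sizeof cred_table / sizeof cred_table[0]; i++)
        if (strcmp(cred_table[i].principal, principal) == 0)
            return strcmp(cred_table[i].secret, secret) == 0;
    return 0;
}

/* Authorization: which privileges does an authenticated entity hold? */
static int authorize(const char *principal, const char *operation) {
    /* Illustrative policy: anyone authenticated may read; only "alice" may write. */
    if (strcmp(operation, "read") == 0)  return 1;
    if (strcmp(operation, "write") == 0) return strcmp(principal, "alice") == 0;
    return 0;
}

int main(void) {
    const char *who = "bob", *pw = "builder", *op = "write";
    if (!authenticate(who, pw))
        printf("%s: authentication failed\n", who);
    else if (!authorize(who, op))
        printf("%s: authenticated but not authorized for %s\n", who, op);
    else
        printf("%s: %s permitted\n", who, op);
    return 0;
}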
4.5.8 Structuring

...interaction (through messages) between the processes providing the system services. In addition, the microkernel provides services that are typically essential to every computer in a distributed system, such as task management (e.g., local scheduling of tasks), processor management, virtual memory management, etc. The microkernel runs on all the computers in a distributed system. The others (all or a few) may or may not run at a computer depending on the need and the hardware available at that computer. The collective kernel structure can readily make use of a very helpful design technique known as policy and mechanism separation [52]. By separating policies and mechanisms in an implementation, one can change any given policy without changing the underlying mechanisms.

Mach [1], V-kernel [12], Chorus [39], and Galaxy [43] are examples of operating systems that use the collective kernel structuring technique.
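A minimal sketch of policy and mechanism separation as used in such kernels: the dispatch mechanism below stays fixed, while the scheduling policy is a function that can be swapped without touching it. The task fields and the two policies are assumptions made up for the example, not code from any of the systems cited.

#include <stdio.h>

/* A minimal task descriptor; the fields are illustrative. */
struct task { int id; int priority; int arrival; };

/* Policy: given a ready list, return the index of the task to run next. */
typedef int (*sched_policy)(const struct task *ready, int n);

/* Two interchangeable policies. */
static int pick_fifo(const struct task *ready, int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (ready[i].arrival < ready[best].arrival) best = i;
    return best;
}
static int pick_priority(const struct task *ready, int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (ready[i].priority > ready[best].priority) best = i;
    return best;
}

/* Mechanism: dispatching is identical no matter which policy is plugged in. */
static void dispatch_next(const struct task *ready, int n, sched_policy policy) {
    int i = policy(ready, n);
    printf("dispatching task %d\n", ready[i].id);
}

int main(void) {
    struct task ready[] = { {1, 5, 30}, {2, 9, 10}, {3, 1, 20} };
    dispatch_next(ready, 3, pick_fifo);      /* changing only the policy ...      */
    dispatch_next(ready, 3, pick_priority);  /* ... leaves the mechanism untouched */
    return 0;
}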
4.6 COMMUNICATION NETWORKS

This section introduces the communication aspects of distributed systems. All the computers in a distributed system are interconnected through a computer communication network. A computer can exchange messages with other computers and access data stored at another computer through this network. Communication networks are broadly classified as Wide Area Networks and Local Area Networks.

4.6.1 Wide Area Networks

Wide area networks (WANs) are employed to interconnect various devices (such as computers and terminals) spread over a wide geographic area that may cover different cities, states, and countries. WANs have also been referred to as Long Haul Networks. The communication facility in a WAN consists of switches that are interconnected by communication links (see Fig. 4.2). These links may be established through...

PACKET SWITCHING VERSUS CIRCUIT SWITCHING. The communication network can be utilized in one of the following two modes, namely, circuit switching or packet switching [46]. In circuit switching, a dedicated path is established between two devices wishing to communicate, and the path remains intact for the entire duration in which the two devices communicate. The telephone system uses circuit switching. When one subscriber dials another subscriber's number, or when one connects a terminal to a computer by dialing the computer's number, a dedicated path between the two points is established through various switches. The path is broken when one side terminates the conversation.

In packet switching, a connection is established between the source device (terminal or computer) and its nearest switch [46]. The data or message to be communicated is broken down into smaller units called packets (several hundred to several thousand bytes in length), with each packet containing the address of the destination. The packets are then sent to the nearest switch. These packets are routed from one switch to another switch in the communication network until they arrive at the switch connected to the destination device, at which point the data is delivered to the destination. Thus, in packet switching, a communication path (switches and links) is not reserved by any two devices wishing to communicate, but rather is dynamically shared among many devices on a demand basis. The achievable utilization of the communication network is higher under packet switching compared to circuit switching because the network can be shared by many devices. Also, parallel transmission, and hence a reduction in data transmission time, is possible because packets forming one message may travel along different paths. Moreover, data transmission in computer networks is sporadic rather than continuous. Thus, most computer networks use packet switching to permit better utilization of the network. One disadvantage of packet switching, however, is that the breaking of a message into packets and assembling them back at the destination carries some cost.
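As a rough sketch of the packetizing step just described, the fragment below breaks a message into fixed-size packets, each stamped with the destination address and a sequence number so the receiver can reassemble the message. The packet layout and sizes are assumptions for the illustration, not any real switch's format.

#include <stdio.h>
#include <string.h>

#define PAYLOAD_MAX 8   /* deliberately tiny so the example produces several packets */

/* Each packet carries the destination plus enough bookkeeping to reassemble. */
struct packet {
    int  dest;                  /* destination address           */
    int  seq;                   /* position within the message   */
    int  len;                   /* bytes of payload actually used */
    char payload[PAYLOAD_MAX];
};

/* Split msg into packets addressed to dest; returns the number of packets made. */
static int packetize(int dest, const char *msg, struct packet *out, int max_pkts) {
    int total = (int)strlen(msg), made = 0;
    for (int off = 0; off < total && made < max_pkts; off += PAYLOAD_MAX, made++) {
        out[made].dest = dest;
        out[made].seq  = made;
        out[made].len  = (total - off < PAYLOAD_MAX) ? total - off : PAYLOAD_MAX;
        memcpy(out[made].payload, msg + off, (size_t)out[made].len);
    }
    return made;
}

int main(void) {
    struct packet pkts[16];
    int n = packetize(42, "a message split across packets", pkts, 16);
    for (int i = 0; i < n; i++)
        printf("to %d, seq %d: %.*s\n", pkts[i].dest, pkts[i].seq,
               pkts[i].len, pkts[i].payload);
    return 0;
}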
FIGURE 4.2 A point-to-point network.

THE ISO OSI REFERENCE MODEL. Generally, WANs must interconnect heterogeneous types of equipment (e.g., computers, terminals, printers). These types of equipment may differ from each other in their speed, word length, information representation, or in many other criteria. To communicate in such a heterogeneous environment, the ISO OSI reference model† provides a framework for communication protocols [53]. It organizes the protocols as seven layers and specifies the functions of each layer. The organization of these seven layers is shown in Fig. 4.3. In this model, user programs run in the application layer.

When an application program running at computer A wants to send a message to an application program at computer B, the following chain of events occurs. The application at computer A passes the message down to the presentation layer (on computer A itself). The presentation layer transforms the data (explained later), adds a header containing some control information to the message, and passes the resulting message to the session layer. The session layer adds its own header to the message and passes the resulting message to the next layer. This continues on until the message reaches the physical layer. The physical layer on computer A transmits the raw data bits to the physical layer running at computer B. Note that the message is routed to computer B through various intermediate switches in the communication network. Once the message is received by computer B, the protocol at each layer strips the header added by its counterpart at computer A, performs the necessary processing identified by the header, and passes the message on to the next layer. This continues until the message reaches its destination: a process in the application layer at computer B.

The ISO OSI model does not specify how the layers should be implemented. Every layer is aware only of the protocols and header formats of its counterpart. It does not understand the headers or the protocols used by the other layers. This makes each layer independent, so any layer can change its protocol without affecting other layers as long as the interfaces between the layers remain unchanged.

We next give a brief overview of the ISO OSI model [46]. Complete information about the model can be found in [46, 53].

AN OVERVIEW OF THE ISO OSI LAYERS. The physical layer's function is to allow a device to send raw bit streams of data over the communication network. It is not concerned with transmission errors, how bits are organized, or what they mean. This layer should, however, be aware of and take care of the communication network implementation details such as circuit/packet switching, the type of network (i.e., telephone system, digital transmission, etc.), the voltage levels used for representing 0 and 1 bits, the number of pins and their assignments in network connectors, etc.

The data-link layer is responsible for recovering from transmission errors and for flow control. Flow control takes care of any disparity between the speeds at which the bits can be sent and received. The data-link layer makes the communication facility provided by the physical layer reliable.

The network layer is mainly responsible for routing and congestion control. It breaks a message into packets and decides which outgoing line will carry the packets toward their destination.

The transport layer's primary function is to hide all the details of the communication network from the layers above. It provides a network-independent device-to-device (or end-to-end) communication. This layer can provide the ability to the network to inform a host that the network has crashed or has lost certain packets. Thus, the transport layer can provide improved reliability if necessary.

The session layer is responsible for establishing and maintaining a connection, known as a session, between two processes. Establishing a connection may involve the authentication of the communicating processes and the selection of the right transport service. In addition, the session layer may keep track of the outstanding requests and replies from processes and order them in such a manner as to simplify the design of user programs.

The presentation layer is the interface between a user program and the rest of the network. It provides data transformation utilities to take care of the differences in representing information at the source and at the destination. In addition, the presentation layer may perform data compression, encryption, and conversion to and from network standards for terminals and files.

The application layer's function is to provide a facility for the user to use the ISO OSI protocols. Its content is left to the users. (For example, specific applications such as airline and banking may have their own standards for the application layer.)
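The chain of events described above, in which each layer prepends its own header on the way down and its counterpart strips it on the way up, can be sketched as nested framing. The two headers below stand in for two of the seven layers; their fields are invented for the example and are not the actual OSI formats.

#include <stdio.h>
#include <string.h>

struct transport_hdr { int src_port, dst_port; };
struct network_hdr   { int src_addr, dst_addr; };

/* Sending side: each layer prepends its header to what the layer above handed down. */
static size_t send_down(char *frame, const char *app_data) {
    struct transport_hdr th = { 5000, 80 };   /* illustrative values */
    struct network_hdr   nh = { 1, 2 };
    size_t dlen = strlen(app_data) + 1;
    size_t off  = sizeof nh + sizeof th;       /* leave room for the headers below */
    memcpy(frame + off, app_data, dlen);                     /* application data        */
    off -= sizeof th; memcpy(frame + off, &th, sizeof th);   /* transport adds a header */
    off -= sizeof nh; memcpy(frame + off, &nh, sizeof nh);   /* network adds a header   */
    return sizeof nh + sizeof th + dlen;
}

/* Receiving side: each layer strips the header added by its counterpart. */
static void receive_up(const char *frame) {
    struct network_hdr   nh;
    struct transport_hdr th;
    size_t off = 0;
    memcpy(&nh, frame + off, sizeof nh); off += sizeof nh;   /* network strips its header   */
    memcpy(&th, frame + off, sizeof th); off += sizeof th;   /* transport strips its header */
    printf("to address %d, port %d: %s\n", nh.dst_addr, th.dst_port, frame + off);
}

int main(void) {
    char frame[256];
    send_down(frame, "hello from computer A");
    receive_up(frame);
    return 0;
}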
FIGURE 4.3 The ISO OSI reference model.

†The International Standards Organization's Reference Model of Open Systems Interconnection.

...the data communicating devices typically include computers, terminals, and peripheral devices. Some of the key characteristics of LANs are:

• High data transmission rates (10 megabits per second to 100 megabits per second).
• The geographic scope of LANs is small, generally confined to a single building or perhaps several buildings (such as a college campus).
• Low transmission error rate.
...available to all the other devices, but is received only by the addressed device [45]. A tree topology LAN is obtained by interconnecting many bus topology LANs to a common bus (see Fig. 4.4). Bus topology LANs can be viewed as branches of a tree. In the bus topology, all the devices connected to the LAN are allowed to transmit at any time. However, as the devices share a common data path (i.e., the bus), a protocol to control the access to the bus is necessary. We next discuss two protocols to control access to the bus.

The CSMA/CD protocol. The most commonly used access control protocol for bus topology is CSMA/CD (Carrier Sense Multiple Access with Collision Detection) [45]. Under this protocol, a device wishing to transmit listens to the medium to determine whether another transmission is in progress. If so, the device waits for a random amount of time before trying again. If no other transmission is in progress, the device starts transmitting data and continues to listen to the medium while it is transmitting. If another device starts transmitting simultaneously, the two transmissions collide. If a collision is detected, a short jamming signal is transmitted over the bus to inform all the devices that there has been a collision. The devices will then wait for a random amount of time before attempting to transmit again. The principal advantage of this protocol is its simplicity. Its principal disadvantage is that under a heavy load, contention for the bus rises and performance degrades because of frequent collisions. Thus, a bus using the CSMA/CD protocol cannot support a large number of devices per bus. Ethernet is an example of a LAN that is based on the CSMA/CD principle [31].

When the CSMA/CD protocol is used for a tree topology, packets transmitted by a device on one branch will not enter the common bus unless the destination device is on another branch of the tree. Hence, the common bus serves as a backbone connecting the branches.

The token ring protocol. A widely used protocol to control access to the ring is the token ring technique [17, 45]. Under this technique, a token circulates around the ring. The token is labeled free when no device is transmitting. When a device wishes to transmit, it waits for the token to arrive, labels the token as busy on arrival, and retransmits the token. Immediately following the release of the token, the device transmits data. The transmitting device will mark the token as free when the busy token returns to the device and the device has completed its transmission. The main advantage of the token ring protocol is that it is not sensitive to the load on the network; the entire bandwidth of the medium can be utilized. The major disadvantage of the token ring protocol is its complexity. The token has to be maintained error-free. If the token is lost, care must be taken to generate only one token. The maintenance of the token may require a separate process to monitor it.

The slotted ring protocol. The slotted ring is another technique used to control access to a ring network [37, 45]. In this technique, a number of fixed-length slots continuously circulate around the ring. The ring is like a conveyor belt. A device wishing to transmit data waits for a slot marked empty to arrive, marks it full, and inserts the destination's address and the data into the slot as it goes by. The device is not allowed to retransmit again until this slot returns to the device, at which time it is marked as empty by the device. As each device knows how many slots are circulating around the ring, it can determine the slot it had marked previously. After the newly emptied slot continues on, the device is again free to transmit data. A few bits are reserved in each slot so that the result of the transmission (accepted, busy, or rejected) can be returned to the source.

The key advantage of the slotted ring technique is its simplicity, which translates into reliability. The prime disadvantage is wasted bandwidth. When the ring is not heavily utilized, many empty slots will be circulating, but a particular device wishing
to transmit considerable amounts of data can only transmit once per round-trip ring time.

4.7 COMMUNICATION PRIMITIVES

The communication network provides a means to send raw bit streams of data in distributed systems. The communication primitives are the high-level constructs with which programs use the underlying communication network. They play a significant role in the effective usage of distributed systems. The communication primitives influence a programmer's choice of algorithms as well as the ultimate performance of the programs. They influence both the ease of use of a system and the efficiency of applications that are developed for the system [29].

We next discuss two communication models, namely, message passing and remote procedure call, that provide communication primitives. These two models have been widely used to develop distributed operating systems and applications for distributed systems.

4.7.1 The Message Passing Model

The message passing model provides two basic communication primitives, namely, SEND and RECEIVE [47]. The SEND primitive has two parameters: a message and its destination. The RECEIVE primitive has two parameters: the source (including anyone) of a message and a buffer for storing the message. An application of these primitives can be found in the client-server computation model. In the client-server model, a client process needing some service (e.g., reading data from a file) sends a message to the server and waits for a reply message. After performing the task, the server process sends the result in the form of a reply message to the client process. While these two primitives provide the basic communication ability to programs, the semantics of these primitives also play a significant role in the ease of developing programs that use them. We next discuss two design issues that decide the semantics of these two primitives.

BLOCKING VS. NONBLOCKING PRIMITIVES. In the standard message passing model, messages are copied three times: from the user buffer to the kernel buffer, from the kernel buffer on the sending computer to the kernel buffer on the receiving computer, and finally from the buffer on the receiving computer to a user buffer [29]. This is known as the buffered option.

With nonblocking primitives, the SEND primitive returns control to the user process as soon as the message is copied from the user buffer onto the kernel buffer. The corresponding RECEIVE primitive signals its intention to receive a message and provides a buffer to copy the message. The receiving process may either periodically check for the arrival of a message or be signaled by the kernel upon the arrival of a message. The primary advantage of nonblocking primitives is that programs have maximum flexibility to perform computation and communication in any order they want. On the other hand, a significant disadvantage of nonblocking primitives is that programming becomes difficult. Programs (or system states) may become time-dependent, where problems are irreproducible, making the programs very difficult to debug [47].

In the unbuffered option, data is copied from one user buffer to another user buffer directly [29]. In this case, a program issuing a SEND should avoid reusing the user buffer until the message has been transmitted. For large messages (thousands of bytes), a combination of unbuffered and nonblocking semantics allows almost complete overlap between the communication and the ongoing computational activity in the user program.

A natural use of nonblocking communication occurs in producer-consumer relationships. The consumer process can issue a nonblocking RECEIVE. If a message is present, the consumer process reads it; otherwise it performs some other computation. The producer process can issue nonblocking SENDs. If a SEND fails for any reason (e.g., the buffer is full), it can be retried later.

With blocking primitives, the SEND primitive does not return control to the user program until the message has been sent (an unreliable blocking primitive) or until an acknowledgment has been received (a reliable blocking primitive). In both cases, the user buffer can be reused as soon as control is returned to the user program. The corresponding RECEIVE primitive does not return control until a message is copied to the user buffer. A reliable RECEIVE primitive automatically sends an acknowledgment, while an unreliable RECEIVE primitive does not send an acknowledgment [47]. The primary advantage of employing blocking primitives is that the behavior of the programs is predictable, and hence programming is relatively easy. The primary disadvantage is the lack of flexibility in programming and the absence of concurrency between computation and communication.

SYNCHRONOUS VS. ASYNCHRONOUS PRIMITIVES. We now address the question of whether to buffer or not to buffer messages. With synchronous primitives, a SEND primitive is blocked until a corresponding RECEIVE primitive is executed at the receiving computer. This strategy is also referred to as a rendezvous. A blocking synchronous primitive can be extended to a nonblocking synchronous primitive by first copying the message to a buffer at the sending side, and then allowing the process to perform other computational activity except another SEND.

With asynchronous primitives, the messages are buffered. A SEND primitive does not block even if there is no corresponding execution of a RECEIVE primitive. The corresponding RECEIVE primitive can either be a blocking or a nonblocking primitive. One disadvantage of buffering messages is that it is more complex, as it involves creating, managing, and destroying buffers. Another issue is what to do with the messages that are meant for processes that have already died [47].

4.7.2 Remote Procedure Calls

While the message passing communication model provides a highly flexible communication ability, programmers using such a model must handle the following details:

• Pairing of responses with request messages.
• Data representation (when computers of different architectures or programs written in different programming languages are communicating).
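As a concrete illustration of the first of these details, the sketch below shows the bookkeeping a message-passing client must do by hand: every request carries an identifier, and each incoming reply is matched against the outstanding requests. The message layout and the SEND/RECEIVE stand-ins are assumptions made up for the illustration only.

#include <stdio.h>

/* Illustrative message layout: every request carries an id that the reply echoes. */
struct msg { int request_id; int opcode; int value; };

#define MAX_PENDING 8
static struct msg pending[MAX_PENDING];   /* requests still awaiting a reply */
static int npending = 0;
static int next_id  = 1;

/* Stand-in for SEND: records the request so the reply can be paired later. */
static int send_request(int opcode, int value) {
    struct msg m = { next_id++, opcode, value };
    pending[npending++] = m;
    printf("sent request %d (op %d)\n", m.request_id, m.opcode);
    return m.request_id;
}

/* Stand-in for RECEIVE: pairs an incoming reply with its outstanding request. */
static void receive_reply(struct msg reply) {
    for (int i = 0; i < npending; i++) {
        if (pending[i].request_id == reply.request_id) {
            printf("reply %d matches request op %d, result %d\n",
                   reply.request_id, pending[i].opcode, reply.value);
            pending[i] = pending[--npending];   /* request is no longer outstanding */
            return;
        }
    }
    printf("reply %d matches no outstanding request, dropped\n", reply.request_id);
}

int main(void) {
    int a = send_request(1, 10);
    int b = send_request(2, 20);
    struct msg r1 = { b, 2, 200 };   /* replies may arrive out of order */
    struct msg r2 = { a, 1, 100 };
    receive_reply(r1);
    receive_reply(r2);
    return 0;
}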
...and data across a communication network (a non-shared memory system) [9]. A remote procedure call can be viewed as an interaction between a client and a server, where the client needing a service invokes a procedure at the server. A simple RPC mechanism works as follows.

Basic RPC operation. On invoking a remote procedure, the calling process (the client) is suspended and parameters, if any, are passed to the remote machine (the server) where the procedure will execute. On completion of the procedure execution, the results are passed back from the server to the client and the client resumes execution as if it had called a local procedure. While the RPC mechanism looks simple, the issues that arise in designing and implementing it are not so simple. We next discuss several of those issues.

4.7.3 Design Issues in RPC

Structure. A widely used organization for RPC mechanisms is based on the concept of stub procedures [4, 9, 40] (see Fig. 4.5). When a program (client) makes a remote procedure call, say P(x,y), it actually makes a local call on a dummy procedure or a client-stub procedure corresponding to procedure P. The client-stub procedure constructs a message containing the identity of the remote procedure and the parameters, if any, to be passed. It then sends the message to the remote server machine (a primitive similar to SEND explained in Sec. 4.7.1 may be used for this purpose). A stub procedure at the remote machine receives the message (a primitive similar to RECEIVE may be used), makes a local call to the procedure specified in the message, and passes the parameters received to the procedure. When the remote procedure completes execution, the control returns to the server-stub procedure. The server-stub procedure passes the results back to the client-stub procedure at the calling machine, which returns the results to the client. The stub procedures can be generated at compile time or can be linked at run time.

Binding. Binding is a process that determines the remote procedure, and the machine on which it will be executed, upon a remote procedure invocation. The binding process may also check the compatibility of the parameters passed and the procedure type called with what is expected from the remote procedure.

FIGURE 4.5 Remote procedure call.

Parameter and result passing. To pass parameters or results to a remote procedure, a stub procedure has to convert the parameters and results into an appropriate representation (a representation that is understood by the remote machine) first and then pack them into a buffer in a form suitable for transmission. After the message is received, the message must be unpacked (see Fig. 4.5).

Converting data into an appropriate representation becomes expensive if it has to be done on every call. One way to avoid conversions is to send the parameters along with a code identifying the format used so that the receiver can do the conversion (only, of course, if it uses a different representation). This approach requires the machine to know how to convert all the formats that can possibly be used. This approach also has poor portability, because whenever a new representation (because of a new machine type or a new language) is introduced into the system, existing software needs to be updated [47].

Alternatively, each data type may have a standard format in the message. In this technique the sender will convert the data to the standard format and the receiver will convert from the standard format to its local representation. With this approach a machine doesn't need to know how to convert all the formats that can possibly be used.
This method, however, is wasteful if both the sender and the receiver use the same internal representation [47].
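A minimal sketch of the standard-format approach just described: the sending stub converts each field to a fixed (big-endian) byte order before packing it, and the receiving stub converts back, so neither side needs to know the other's native representation. The two-field record is invented for the example; real stubs are typically generated from an interface description.

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl/ntohl: host <-> big-endian (network) byte order */

/* An illustrative record to be passed to a remote procedure. */
struct record { uint32_t account; uint32_t amount; };

/* Sender-side stub work: convert each field to the standard (big-endian) form. */
static size_t marshal(const struct record *r, unsigned char *buf) {
    uint32_t f1 = htonl(r->account), f2 = htonl(r->amount);
    memcpy(buf,             &f1, sizeof f1);
    memcpy(buf + sizeof f1, &f2, sizeof f2);
    return sizeof f1 + sizeof f2;
}

/* Receiver-side stub work: convert from the standard form back to local form. */
static void unmarshal(const unsigned char *buf, struct record *r) {
    uint32_t f1, f2;
    memcpy(&f1, buf,             sizeof f1);
    memcpy(&f2, buf + sizeof f1, sizeof f2);
    r->account = ntohl(f1);
    r->amount  = ntohl(f2);
}

int main(void) {
    struct record in = { 42, 1000 }, out;
    unsigned char wire[8];
    marshal(&in, wire);      /* the bytes in "wire" are representation-independent */
    unmarshal(wire, &out);
    printf("account %u, amount %u\n", (unsigned)out.account, (unsigned)out.amount);
    return 0;
}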
Another issue is how to deal with passing parameters by value and by reference. Passing parameters by value is simple, as the stub procedure simply copies the parameters into the message. However, passing parameters by reference is exceedingly more complicated. For example, just passing file pointers is inadequate, because the privileges associated with the calling process also have to be passed [47]. Moreover, the semantics associated with passing parameters in local procedure calls and remote procedure calls can be different. For example, in Argus [28], the structures pointed to by pointers are copied onto the remote machine, and hence, the calling machine and the remote machine do not share the data structures directly.

Error handling, semantics, and correctness. A remote procedure call can fail for at least two reasons: computer failures and communication failures (such as when a transmitted message does not reach its destination).

Handling failures in distributed systems is difficult. For example, consider the case where messages are lost occasionally. If the remote server is slow, the program that invokes the remote procedure may invoke the remote procedure more than once, suspecting a message loss. This could result in more than one execution of the procedure at the remote machine. Also, consider the case where the client machine, after having invoked a remote procedure, crashes immediately. In this case, the remote procedure is executed in vain, as there is no machine to receive the result. On the other hand, if the client machine recovers quickly and reissues the remote procedure call, then there is a possibility of more than one execution of the remote procedure. The unwanted executions have been referred to as orphans [22].

In view of the above problems associated with distributed system failure, it is clear that the semantics of RPCs play a significant role in the ease of development of programs for distributed computation. The semantics of RPCs are classified as follows [33, 36]:

"At least once" semantics. If the remote procedure call succeeds, it implies that at least one execution of the remote procedure has taken place at the remote machine. If the call does not succeed, it is possible that zero, a partial, one, or more executions have taken place.

"Exactly once" semantics. If the remote procedure call succeeds, it implies that exactly one execution of the remote procedure has taken place at the remote machine. However, if the remote procedure call does not succeed, it is possible that zero, a partial, or one execution has taken place.

In view of the above, it is apparent that a stronger semantics for RPCs is necessary for the RPC mechanism to significantly improve upon the message passing model. Liskov and Scheifler [28] define the following stronger semantics for RPCs.

"At most once" semantics. Same as exactly once semantics, but in addition, calls that do not terminate normally do not produce any side effects. These semantics are also referred to as zero-or-one semantics. A number of RPC mechanisms implemented support at-most-once semantics [2, 5, 9, 36, 44].

Correctness condition. Panzieri and Shrivastava [36] define a simple correctness condition for remote procedure calls as follows. Let Ci denote a call made by a machine and Wi represent the corresponding computation invoked at the called machine. Let C2 happen after C1 (denoted by C1 → C2) and let computations W1 and W2 share the same data such that W1 and/or W2 modify the shared data. To be correct in the presence of failures, an RPC implementation should satisfy the following correctness criterion:

C1 → C2 implies W1 → W2.

OTHER ISSUES. The RPC mechanisms make use of the communication facility provided by the underlying network to pass messages to remote machines. One of the issues to be resolved is whether to build the RPC mechanism on top of a flow-controlled and error-controlled virtual-circuit mechanism (similar to establishing sessions in WANs) or directly on top of an unreliable, connectionless (datagram) service [47]. (In a datagram service, a machine simply sends a message in the form of packets to the destination, and there is no extensive two-way communication such as automatic acknowledgments.)

As RPC mechanisms became a widely accepted method for communication in distributed systems, a need to specify a remote procedure call as low-latency or high-throughput became necessary, depending on the application. Low-latency calls require minimum round-trip delay and are made in the case of infrequent calls (such as calls to a mail server). On the other hand, the aim of high-throughput calls is to obtain the maximum possible throughput from the underlying communication facility. This type of call is typically made when bulk data transfer is required, such as in the case of calls to file servers. The ASTRA RPC mechanism [4] provides the ability for a user to specify whether low latency or high throughput is desired. For high-throughput calls, ASTRA makes use of a virtual-circuit (TCP) protocol. For low-latency calls, ASTRA makes use of a datagram facility that is more suitable for intermittent exchange due to its simplicity. Stream [28] is another RPC mechanism designed mainly to achieve high throughput. It also makes use of the TCP protocol. In both the ASTRA and Stream implementations, high throughput is achieved by buffering the messages and immediately returning control to the user. The user can then make more calls. The buffer is flushed when it is full or convenient. By buffering the messages, the overhead to process the communication protocols on every remote procedure call is avoided. For low-latency calls, the buffer is flushed immediately. Future [51] is an RPC facility that is specifically designed for low latency.

Note that invoking a remote procedure call blocks the calling process. However, these semantics severely limit the concurrency that can be achieved. Several RPC designs have tried to overcome this limitation.

One way to achieve parallelism is through creating multiple processes for each remote procedure call [6]. This scheme allows a process to make multiple calls to many servers and still execute in parallel with the servers. However, creating processes, switching between processes, and destroying processes may not be economical under all circumstances. This approach also does not scale well for large systems consisting of hundreds of computers [40].
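Tying this back to the call semantics above, one common way for a server to approximate at-most-once behavior is to remember the identifiers of requests it has already executed and to replay the saved reply, rather than re-execute, when a retransmitted duplicate arrives. The table and procedure below are a deliberately simplified sketch under that assumption, not the mechanism of any particular RPC system.

#include <stdio.h>

#define MAX_SEEN 64

/* One entry per request id the server has already executed. */
struct done { int request_id; int saved_reply; };
static struct done seen[MAX_SEEN];
static int nseen = 0;

/* The remote procedure itself; executing it twice would violate at-most-once. */
static int deposit(int amount) {
    static int balance = 0;
    balance += amount;
    return balance;
}

/* Server dispatch: execute new requests once, replay the reply for duplicates. */
static int handle_request(int request_id, int amount) {
    for (int i = 0; i < nseen; i++)
        if (seen[i].request_id == request_id) {
            printf("request %d is a duplicate, replaying reply\n", request_id);
            return seen[i].saved_reply;
        }
    int reply = deposit(amount);
    if (nseen < MAX_SEEN)
        seen[nseen++] = (struct done){ request_id, reply };
    return reply;
}

int main(void) {
    printf("reply: %d\n", handle_request(7, 100));   /* first attempt             */
    printf("reply: %d\n", handle_request(7, 100));   /* client retry: not re-run  */
    printf("reply: %d\n", handle_request(8, 50));    /* a new request             */
    return 0;
}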
...like in message passing primitives, programming becomes difficult.

Bershad et al. [7] performed a study to determine the frequency of intermachine procedure calls. According to their study, less than ten percent of all system calls cross the machine boundary. Note that intramachine calls can be made efficient by avoiding the marshaling of data and other RPC-related network protocols. ASTRA optimizes the intramachine call by avoiding the above overhead and by using the most efficient interprocess communication mechanism provided by the host operating system to pass messages.

According to Gifford [18], existing RPC facilities have the following two shortcomings:

• Incremental results: In the present RPC facilities, a remote procedure cannot easily return incremental results to the calling process while its execution is still in progress.
• Protocol flexibility: In present RPC systems, remote procedures are not first-class objects. (A first-class object is a value that can be freely stored in memory, passed as a parameter to both local and remote procedures, and returned as a result from both local and remote procedures [18].) This feature can make protocols inflexible. For example, the following protocol is not implementable unless remote procedures are first-class objects: a process wishes to provide a server with a procedure for use under certain circumstances, and the server then wishes to pass this procedure on to another server.

To overcome the above limitations, Gifford has proposed a new communication model called the channel model [18]. In this model, remote procedures are first-class objects. An abstraction called a pipe permits the efficient transportation of bulk data and incremental results, and an abstraction called channel groups allows the sequencing of calls on pipes and procedures. Complete details on this model are beyond the scope of this book and can be found in [18].

...were introduced in Sec. 4.5. These areas will be discussed in detail in their forthcoming chapters (5 through 12).

Local area networks provide a basic communication facility for distributed systems, due to their low communication delays and low error rates. Wide area networks, because of their higher communication delays and error rates, are mainly employed to interconnect distributed systems based on LANs and computers spread over a wide geographic area.

Communication primitives are the means through which programs can use the underlying communication network. They play a significant role in the effective usage of distributed systems. Communication primitives influence the programmer's choice of algorithms as well as the ultimate performance of the programs. They influence both the ease of use of a system and the efficiency of applications that are developed for the system. Message passing primitives are the basic communication primitives that provide a facility for programs to send and receive messages. However, writing programs using these primitives is difficult because a programmer has to take care of many details, such as the pairing of responses with request messages, data representation, knowing the address of the remote machine or server, and taking care of communication and system failures. The RPC mechanism has been widely accepted as the mechanism to support communication in distributed systems, as it takes care of the above details. RPCs are based on the well known and well understood procedure call mechanism. An RPC mechanism extends a procedure-call mechanism to encompass communication networks.

In the coming decade, we can hope to see many more advances in scalability techniques (as distributed systems grow to include thousands of computers) and in techniques for linking heterogeneous environments (which combine different machines running different operating systems and groups of machines linked together by different types of networks).
4.9 FURTHER READING

The development of distributed operating systems has come a long way in the past decade. Some examples of distributed operating systems are: Mach [1], Eden [3, 23], ISIS [8], Athena [11], V-system [13], Clouds [16], Domain [25], Argus [27], Andrew [32], Sprite [35], Galaxy [43], Amoeba [48], and Locus [50].

The distributed object-based programming system is an amalgamation of the concepts of object-based programming (which encourages the design of a program as a set of autonomous components) and distributed systems (which permits a collection of autonomous components to be treated as a single entity). Chin and Chanson [14] provide a survey of the design and implementation of distributed object-based programming systems. An in-depth discussion on client-server computing can be found in a paper by Sinha [42].

Lin and Gannon [26] discuss an atomic RPC scheme which supports at-most-once semantics with the help of built-in facilities for backward error recovery. Panzieri and Shrivastava [36] describe an RPC facility supporting orphan detection and killing. Tay and Ananda [49] have presented a survey of RPC mechanisms and a comprehensive bibliography for RPC mechanisms.

REFERENCES

1. Accetta, M., R. Baron, W. Bolosky, D. Golub, R. Rashid, A. Tevanian, and M. Young, "Mach: A New Kernel Foundation for UNIX Development," Proceedings of the Summer USENIX Conference, June 1986, pp. 93-113.
2. Almes, G., "The Impact of Language and System on Remote Procedure Call Design," Proceedings of the 6th International Conference on Distributed Computing Systems, May 1986, pp. 414-421.
3. Almes, G. T., A. P. Black, E. D. Lazowska, and J. D. Noe, "The Eden System: A Technical Review," IEEE Transactions on Software Engineering, vol. 11, no. 1, Jan. 1985.
4. Ananda, A. L., B. H. Tay, and E. K. Koh, "ASTRA—An Asynchronous Remote Procedure Call Facility," Proceedings of the 11th International Conference on Distributed Computing Systems, May 1991, pp. 172-179.
5. Bacon, J. M., and K. G. Hamilton, "Distributed Computing with the RPC: The Cambridge Approach," ACM Transactions on Computer Systems, vol. 5, no. 3, July 1983.
6. Bal, H. E., R. van Renesse, and A. S. Tanenbaum, "Implementing Distributed Algorithms Using Remote Procedure Calls," Proceedings of the National Computer Conference, AFIPS, 1987, pp. 499-505.
7. Bershad, B. N., T. E. Anderson, E. D. Lazowska, and H. M. Levy, "Lightweight Remote Procedure Call," Proceedings of the 12th ACM Symposium on Operating Systems Principles, Special Issue of Operating Systems Review, vol. 23, no. 5, Dec. 1989, pp. 102-113.
8. Birman, K., and R. Cooper, "The ISIS Project: Real Experience with a Fault Tolerant Programming System," Operating Systems Review, 1991, pp. 103-107.
9. Birrell, A. D., and B. J. Nelson, "Implementing Remote Procedure Calls," ACM Transactions on Computer Systems, vol. 2, no. 1, Feb. 1984, pp. 39-59.
10. Campbell, R. H., G. M. Johnston, P. W. Madany, and V. F. Russo, "Principles of Object-Oriented Operating System Design," Technical Report R-89-1510, Dept. of Computer Science, University of Illinois, Urbana-Champaign, Apr. 1989.
11. Champine, G. A., D. E. Geer, and W. N. Ruh, "Project Athena as a Distributed Computer System," IEEE Computer, Sept. 1990, pp. 40-50.
12. Cheriton, D. R., "The V Kernel: A Software Base for Distributed Systems," IEEE Software, vol. 1, no. 2, Apr. 1984, pp. 19-42.
13. Cheriton, D. R., "The V Distributed System," Communications of the ACM, vol. 31, no. 3, March 1988, pp. 314-333.
14. Chin, R. S., and S. T. Chanson, "Distributed Object-Based Programming Systems," ACM Computing Surveys, vol. 23, no. 1, March 1991, pp. 91-124.
15. Xerox Corporation, "Courier: The Remote Procedure Call Protocol," Xerox System Integration Standard 038112, Xerox OPD, Dec. 1981.
16. Dasgupta, P., R. J. LeBlanc Jr., and W. F. Appelbe, "The Clouds Distributed Operating System," Proceedings of the 8th International Conference on Distributed Computing Systems, June 1988, pp. 2-9.
17. Farmer, W. D., and E. E. Newhall, "An Experimental Distributed Switching System to Handle Bursty Computer Traffic," Proceedings of the ACM Symposium on Problems in the Optimization of Data Communications, 1969.
18. Gifford, D. K., and N. Glasser, "Remote Pipes and Procedures for Efficient Distributed Communication," ACM Transactions on Computer Systems, vol. 6, no. 3, 1988, pp. 258-283.
19. Goscinski, A., Distributed Operating Systems: The Logical Design, Addison-Wesley, Reading, Mass., 1991.
20. Hutchinson, N. C., L. L. Peterson, M. B. Abbott, and S. O'Malley, "RPC in the x-Kernel: Evaluating New Design Techniques," Proceedings of the 12th ACM Symposium on Operating Systems Principles, Special Issue of Operating Systems Review, vol. 23, no. 5, Dec. 1989, pp. 91-101.
21. Jul, E., H. Levy, N. Hutchinson, and A. Black, "Fine-Grained Mobility in the Emerald System," ACM Transactions on Computer Systems, vol. 6, no. 1, Feb. 1988, pp. 109-133.
22. Lampson, B., "Remote Procedure Calls," Lecture Notes in Computer Science, vol. 105, Springer-Verlag, New York, 1981, pp. 365-370.
23. Lazowska, E. D., H. M. Levy, G. T. Almes, M. J. Fischer, R. J. Fowler, and S. C. Vestal, "The Architecture of the Eden System," Proceedings of the 8th ACM Symposium on Operating Systems Principles, Dec. 1981, pp. 148-159.
24. LeLann, G., "Motivation, Objectives, and Characteristics of Distributed Systems," in Lampson et al. (eds.), Distributed Systems—Architecture and Implementation, Springer-Verlag, New York, 1981.
25. Levine, P., "The DOMAIN System," Proceedings of the 2nd ACM SIGOPS Workshop on Making Distributed Systems Work, Special Issue of Operating Systems Review, vol. 21, no. 1, Jan. 1987, pp. 49-84.
26. Lin, K. J., and J. D. Gannon, "Atomic Remote Procedure Call," IEEE Transactions on Software Engineering, vol. 11, no. 10, Oct. 1985, pp. 1126-1135.
27. Liskov, B., D. Curtis, P. Johnson, and R. Scheifler, "Implementation of Argus," Proceedings of the 11th ACM Symposium on Operating Systems Principles, 1987, pp. 111-122.
28. Liskov, B., and R. Scheifler, "Guardians and Actions: Linguistic Support for Robust Distributed Programs," Distributed Processing, IFIP, pp. 355-369. Also in ACM Transactions on Programming Languages and Systems, vol. 5, no. 3, 1983, pp. 381-404.
29. Maggio, M. D., and D. W. Krumme, "Communication in a Distributed Memory Multicomputer," Operating Systems Review, vol. 25, no. 2, Apr. 1991, pp. 4-21.
30. Martin, B., C. Bergan, et al., "Experience with PARPC," Proceedings of the Winter USENIX Conference.
31. Metcalfe, R. M., and D. R. Boggs, "Ethernet: Distributed Packet Switching for Local Computer Networks," Communications of the ACM, vol. 19, no. 7, July 1976.
32. Morris, J. H., M. Satyanarayanan, M. H. Conner, J. H. Howard, D. S. Rosenthal, and F. D. Smith, "Andrew: A Distributed Personal Computing Environment," Communications of the ACM, vol. 29, no. 3, March 1986.
33. Nelson, B. J., "Remote Procedure Call," PhD thesis, Computer Science, Carnegie Mellon University, 1981. Available as Technical Report CMU-CS-81-119.
34. Ousterhout, J. K., "Partitioning and Cooperation in a Distributed Multiprocessor Operating System: Medusa," Technical Report CMU-CS-80-112, Dept. of Computer Science, Carnegie Mellon University.