Module2 - Part 1

CONTENTS ix

Part II Distributed Operating Systems

Architectures of Distributed Systems 71
4.1 Introduction 71
4.2 Motivations 72
4.3 System Architecture Types 73
4.4 Distributed Operating Systems 74
4.5 Issues in Distributed Operating Systems 74
4.5.1 Global Knowledge 74
4.5.2 Naming 75
4.5.3 Scalability 75
4.5.4 Compatibility 76
4.5.5 Process Synchronization 76
4.5.6 Resource Management 77
4.5.7 Security 78
4.5.8 Structuring 78
4.5.9 Client-Server Computing Model 80
4.6 Communication Networks 80
4.6.1 Wide Area Networks 80
4.6.2 Local Area Networks 83
4.7 Communication Primitives 86
4.7.1 The Message Passing Model 86
4.7.2 Remote Procedure Calls 87
4.7.3 Design Issues in RPC 88
4.8 Summary 93
4.9 Further Reading 94
References 94

Distributed Mutual Exclusion 121
6.1 Introduction 121
6.2 The Classification of Mutual Exclusion Algorithms 122
6.3 Preliminaries 122
6.3.1 Requirements of Mutual Exclusion Algorithms 123
6.3.2 How to Measure Performance 123
6.4 A Simple Solution to Distributed Mutual Exclusion 125
6.5 Non-Token-Based Algorithms 125
6.6 Lamport's Algorithm 125
6.7 The Ricart-Agrawala Algorithm 128
6.8 Maekawa's Algorithm 131
6.9 A Generalized Non-Token-Based Algorithm 133
6.9.1 Information Structures 133
6.9.2 The Generalized Algorithm 134
6.9.3 Static versus Dynamic Information Structures 137
6.10 Token-Based Algorithms 137
6.11 Suzuki-Kasami's Broadcast Algorithm 137
6.12 Singhal's Heuristic Algorithm 139
6.13 Raymond's Tree-Based Algorithm 142
6.14 A Comparative Performance Analysis 145
6.14.1 Response Time 145
6.14.2 Synchronization Delay 146
6.14.3 Message Traffic 146
6.14.4 Universal Performance Bounds 147
6.15 Summary 147
6.16 Further Reading 148
Problems 148
References 149

Distributed Deadlock Detection 151
7.1 Introduction 151
7.2 Preliminaries 151
7.2.1 The System Model 151
7.2.2 Resource versus Communication Deadlocks 152
7.2.3 A Graph-Theoretic Model 152
7.3 Deadlock Handling Strategies in Distributed Systems 153
7.3.1 Deadlock Prevention 153
7.3.2 Deadlock Avoidance 153
7.3.3 Deadlock Detection 154
7.4 Issues in Deadlock Detection and Resolution 154
7.5 Control Organizations for Distributed Deadlock Detection 155
7.5.1 Centralized Control 155
7.5.2 Distributed Control 155
7.5.3 Hierarchical Control 156
7.6 Centralized Deadlock-Detection Algorithms 156
7.6.1 The Completely Centralized Algorithm 156
7.6.2 The Ho-Ramamoorthy Algorithms 157
7.7 Distributed Deadlock Detection Algorithms 158
7.7.1 A Path-Pushing Algorithm 159
7.7.2 An Edge-Chasing Algorithm 160
7.7.3 A Diffusion Computation Based Algorithm 163
7.7.4 A Global State Detection Based Algorithm 164
7.8 Hierarchical Deadlock Detection Algorithms 170
7.8.1 The Menasce-Muntz Algorithm 170
7.8.2 The Ho-Ramamoorthy Algorithm 171
7.9 Perspective 171
7.10 Summary 174
7.11 Further Reading 175
Problems 176
References 176
Agreement Protocols 178
8.1 Introduction 178
8.2 The System Model 179
8.2.1 Synchronous versus Asynchronous Computations 179
8.2.2 Model of Processor Failures 180
8.2.3 Authenticated versus Non-Authenticated Messages 180
8.2.4 Performance Aspects 180
8.3 A Classification of Agreement Problems 181
8.3.1 The Byzantine Agreement Problem 181
8.3.2 The Consensus Problem 182
8.3.3 The Interactive Consistency Problem 182
8.3.4 Relations Among the Agreement Problems 182
8.4 Solutions to the Byzantine Agreement Problem 183
8.4.1 The Upper Bound on the Number of Faulty Processors 183
8.4.2 An Impossibility Result 184
8.4.3 Lamport-Shostak-Pease Algorithm 185
8.4.4 Dolev et al.'s Algorithm 187
8.5 Applications of Agreement Algorithms 189
8.5.1 Fault-Tolerant Clock Synchronization 190
8.5.2 Atomic Commit in DDBS 192
8.6 Summary 192
8.7 Further Reading 194
Problems 194
References 194
CHAPTER 4

ARCHITECTURES OF DISTRIBUTED SYSTEMS

4.1 INTRODUCTION
The term distributed system is used to describe a system with the following characteristics: it consists of several computers that do not share a memory or a clock; the computers communicate with each other by exchanging messages over a communication network (see Fig. 4.1); and each computer has its own memory and runs its own operating system (see Sec. 4.5.8). The resources owned and controlled by a computer are said to be local to it, while the resources owned and controlled by other computers and those that can only be accessed through the network are said to be remote. Typically, accessing remote resources is more expensive than accessing local resources because of the communication delays that occur in the network and the CPU overhead incurred to process communication protocols (see Sec. 4.6). Based on the context, the terms computer, node, host, site, machine, processor, and workstation are used interchangeably to denote a computer throughout this book.

The main purpose of this chapter is to serve as an introduction to Part II (Distributed Operating Systems) and Part III (Distributed Resource Management). This chapter presents the issues that arise in the design of a distributed operating system. In addition, it discusses how communication is handled between computers and how programs communicate to perform distributed computations.
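A toy sketch can make the "no shared memory, messages only" point concrete. The following Python fragment (all class and method names are hypothetical, invented for illustration; they are not from this book) simulates two nodes whose only way to learn each other's state is an explicit message over a stand-in for the communication network of Fig. 4.1:

```python
# Minimal sketch of message-passing nodes (illustrative only).
# Each Node has a private memory; another node's state can be
# learned only through an explicit message.

class Node:
    def __init__(self, name):
        self.name = name
        self.memory = {}      # private memory: invisible to other nodes
        self.inbox = []       # messages delivered by the "network"

    def send(self, network, dest, payload):
        network.deliver(dest, (self.name, payload))

    def receive(self):
        return self.inbox.pop(0) if self.inbox else None

class Network:
    """Toy stand-in for the communication network."""
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def deliver(self, dest, message):
        self.nodes[dest].inbox.append(message)

a, b = Node("A"), Node("B")
net = Network([a, b])
a.memory["x"] = 42                         # local update: B cannot see it
a.send(net, "B", ("x", a.memory["x"]))     # B learns of x only by message
sender, (key, value) = b.receive()
b.memory[key] = value
print(b.memory)                            # {'x': 42}
```

In a real system the `Network` object would be replaced by actual links with unpredictable delays, which is precisely what makes the design issues of Sec. 4.5 hard.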


FIGURE 4.1 Architecture of a distributed system. [Figure: several computers, each with its own local memory, interconnected by a communication network.]

4.2 MOTIVATIONS

The impetus behind the development of distributed systems was the availability of powerful microprocessors at low cost, as well as significant advances in communication technology. The availability of powerful yet cheap microprocessors led to the development of powerful workstations that satisfy a single user's needs. These powerful stand-alone workstations satisfy user needs by providing such things as bit-mapped displays and visual interfaces, which traditional time-sharing mainframe systems do not support.

When a group of people work together, there is generally a need to communicate with each other, to share data, and to share expensive resources (such as high quality printers, disk drives, etc.). This requires interconnecting computers and resources. Designing such systems became feasible with the availability of cheap and powerful microprocessors and advances in communication technology.

When a few powerful workstations are interconnected and can communicate with each other, the total computing power available in such a system can be enormous. Such a system generally costs only tens of thousands of dollars. On the other hand, if one tries to obtain a single machine with computing power equal to that of a network of workstations, the cost can be as high as a few million dollars. Hence, the main advantage of distributed systems is that they have a decisive price/performance advantage over traditional time-sharing systems [47].

Other significant advantages of distributed systems over traditional time-sharing systems are as follows:

• Resource sharing. Since a computer can request a service from another computer directly by sending an appropriate request to it over the communication network, hardware and software resources can be shared among computers. For example, a printer, a compiler, a text processor, or a database at a computer can be shared with remote computers.

• Enhanced performance. A distributed computing system is capable of providing rapid response time and higher system throughput. This ability is mainly due to the fact that many tasks can be concurrently executed at different computers. Moreover, distributed systems can employ a load distributing technique to improve response time. In load distributing, tasks at heavily loaded computers are transferred to lightly loaded computers, thereby reducing the time tasks wait before receiving service.

• Improved reliability and availability. A distributed computing system provides improved reliability and availability because a few components of the system can fail without affecting the availability of the rest of the system. Also, through the replication of data (e.g., files and directories) and services, distributed systems can be made fault tolerant. Services are processes that provide functionality (e.g., a file service provides file system management; a mail service provides an electronic mail facility).

• Modular expandability. Distributed computing systems are inherently amenable to modular expansion because new hardware and software resources can be easily added without replacing the existing resources.

4.3 SYSTEM ARCHITECTURE TYPES

Tanenbaum and Renesse [47] classified distributed systems into three broad categories, namely, the minicomputer model, the workstation model, and the processor pool model.

In the minicomputer model, the distributed system consists of several minicomputers (e.g., VAXs). Each computer supports multiple users and provides access to remote resources. The ratio of the number of processors to the number of users is normally less than one.

In the workstation model, the distributed system consists of a number of workstations (up to several thousand). Each user has a workstation at his disposal, where, in general, all of the user's work is performed. With the help of a distributed file system, a user can access files regardless of the location of the files or of the user's workstation. The ratio of the number of processors to the number of users is normally one. The workstations are typically equipped with a powerful processor, memory, a bit-mapped display, and in some cases a math co-processor and local disk storage. Athena [11] and Andrew [32] are examples of this workstation model.

In the processor pool model, the ratio of the number of processors to the number of users is normally greater than one. This model attempts to allocate one or more processors according to a user's needs. Once the processors assigned to a user complete their tasks, they return to the pool and await a new assignment. Amoeba [48] is an experimental system that is a combination of the workstation and the processor pool models. In Amoeba, each user has a workstation where the user performs tasks that require a quick interactive response (such as editing). In addition to the workstation, users have access to a pool of processors for running applications that require greater speed (such as parallel algorithms performing significant numerical computations).
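The processor pool model lends itself to a small sketch. The following toy fragment (hypothetical names; this is not code from Amoeba or any system cited above) models processors as a shared pool from which a user's task draws several processors and to which they return on completion:

```python
# Toy illustration of the processor pool model: processors are
# allocated on demand and returned to the pool when a task finishes,
# so the processors-per-user ratio can exceed one.

class ProcessorPool:
    def __init__(self, processor_ids):
        self.free = list(processor_ids)

    def allocate(self, n):
        if n > len(self.free):
            raise RuntimeError("not enough free processors")
        granted, self.free = self.free[:n], self.free[n:]
        return granted

    def release(self, processors):
        self.free.extend(processors)   # processors await a new assignment

pool = ProcessorPool(["p0", "p1", "p2", "p3"])
mine = pool.allocate(3)                # e.g., a parallel numerical computation
print(len(pool.free))                  # 1
pool.release(mine)
print(len(pool.free))                  # 4
```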
4.4 DISTRIBUTED OPERATING SYSTEMS

An operating system is a program that manages the resources of a computer system and provides users with a friendly interface to the system. A distributed operating system extends the concepts of resource management and user-friendly interface for shared memory computers a step further, encompassing a distributed computing system consisting of several autonomous computers connected by a communication network.

A distributed operating system appears to its users as a centralized operating system for a single machine, but it runs on multiple independent computers. An identical copy of the operating system (or a different operating system providing similar services) may run at every computer. On the other hand, some computers in the system that serve a special purpose might run an extended version of the operating system. The key concept is transparency. In other words, the use of multiple processors and the accessing of remote data should be invisible (transparent) to the user. The user views the system as a virtual uniprocessor, and not as a collection of distinct machines [47]. For instance, a user simply submits a job to the distributed operating system through a computer. The distributed operating system performs distributed execution of the job. The user does not know on what computers the job was executed, on what computers the files needed for the execution were stored, or how the communication and synchronization among different computers were carried out.

4.5 ISSUES IN DISTRIBUTED OPERATING SYSTEMS

Some important issues that arise in the design of a distributed operating system include the unavailability of up-to-date global knowledge, naming, scalability, compatibility, process synchronization, resource management, security, and structuring of the operating system. These issues are discussed next.

4.5.1 Global Knowledge

In the case of shared memory computer systems, the up-to-date state of all the processes and resources, in other words, the global (entire) state of the system, is completely and accurately known. Hence, the potentially problematic issues that arise in the design of these systems are well understood and efficient solutions to them exist. In distributed computing systems, these same issues take on new dimensions and their solutions become much more complex for the following reasons. Due to the unavailability of a global memory and a global clock, and due to unpredictable message delays, it is practically impossible for a computer to collect up-to-date information about the global state of the distributed computing system [24]. Therefore, a fundamental problem in the design of a distributed operating system is to determine efficient techniques to implement decentralized system-wide control, where a computer does not know the current and complete status of the global state. Another significant problem, given the absence of a global clock, is the question of how to order all the events that occur at different times at different computers present in the system. Note that the temporal ordering of events is a fundamental concept in the design and development of distributed systems (e.g., an operating system may schedule jobs based on their time of arrival).

Chapter 5 describes two logical clock schemes which allow the ordering of events in distributed systems despite the absence of a global clock. It also presents a mechanism employed to obtain a global state in distributed systems.

Chapter 8 describes several algorithms to achieve consensus in distributed systems in the absence of global knowledge. For example, a token (a special control message) controls access to shared data in many mutual exclusion algorithms (see Chap. 6), and a token can also be used to control access to the communication network (see Sec. 4.6.2). If the token is lost due to a communication or computer failure, then only one computer should generate a new token and only one token should be generated. This requires that all the computers in the system arrive at the consensus that the token is indeed lost and consequently decide which computer should generate a new token. Note that due to unpredictable communication delays, a token may be in transit, and thus may only appear to be lost. Thus, in the absence of global knowledge about the state of the computers and their communication links, arriving at a consensus in distributed systems is a significant challenge.

4.5.2 Naming

Names are used to refer to objects. Objects that can be named in computer systems include computers, printers, services, files, and users. An example of a service is a name service. A name service maps a logical name into a physical address by making use of a table lookup, an algorithm, or a combination of the two [11]. In the implementation of a table lookup, tables (also known as directories) that store names and their physical addresses are used for mapping names to their addresses. In distributed systems, the directories may be replicated and stored at many different locations to overcome a single point of failure as well as to increase the availability of the name service. The two main drawbacks of replication are: (1) it requires more storage capacity, and (2) synchronization requirements need to be met when directories are updated, as the directory at each location would need an updated copy. On the other hand, directories may be partitioned to overcome the drawbacks of replicated directories. The problem with partitioned directories is the difficulty encountered when attempting to find the partition containing a name and address of interest. This may be handled through yet another directory or through a broadcast search [11]. Note, however, that a partitioned directory is less reliable than a replicated directory.

If an algorithm is used for mapping, the algorithm would depend upon the structure of the names. Several examples of name resolving algorithms can be found in Sec. 9.4.1.

Another issue in naming is the method of naming objects such that an object can be located irrespective of its logical name. This topic is treated in Sec. 9.4.1.

4.5.3 Scalability

Systems generally grow with time. The techniques used in designing a system should not result in system unavailability or degraded performance when growth occurs. For example, broadcast based protocols work well for small systems (systems having a small number of computers) but not for large systems. Consider a distributed file system that
locates files by broadcasting queries. Under such a file system, every computer in the distributed system is subjected to message handling overhead, irrespective of whether it has the requested file or not. As the number of users increases and the system gets larger, the number of file location queries will increase and the overhead will grow larger as well, hurting the performance of every computer. In general, any design approach in which the requirement for a scarce resource (such as storage, communication bandwidth, and manpower) increases linearly with the number of computers in the system is likely to be too costly to implement. For example, a design requiring that information regarding the system's configuration or directories be stored at every computer is not suitable even for systems of moderate size [11].

4.5.4 Compatibility

Compatibility refers to the notion of interoperability among the resources in a system. Three different levels of compatibility exist in distributed systems: the binary level, the execution level, and the protocol level [11].

In a system that is compatible at the binary level, all processors execute the same binary instruction repertoire, even though the processors may differ in performance and in input-output. The Emerald distributed system exhibits binary level compatibility. A significant advantage of binary level compatibility is that it eases system development, as the code for many functions provided by the system programs depends directly on the underlying machine level instructions. On the other hand, the distributed system cannot include computers with different architectures from the same or different vendors. Because of this major restriction, binary compatibility is rarely supported in large distributed systems.

Execution level compatibility is said to exist in a distributed system if the same source code can be compiled and executed properly on any computer in the system [11]. Both the Andrew [32] and Athena [11] systems support execution level compatibility.

Protocol level compatibility is the least restrictive form of compatibility. It achieves interoperability by requiring all system components to support a common set of protocols. A significant advantage of protocol level compatibility is that individual computers can run different operating systems without sacrificing their interoperability. For example, a distributed system supporting protocol level compatibility employs common protocols for essential system services such as file access (for example, see Sun NFS, Sec. 9.5.1), naming, and authentication.

4.5.5 Process Synchronization

The synchronization of processes in distributed systems is difficult because of the unavailability of shared memory. A distributed operating system has to synchronize processes running at different computers when they try to concurrently access a shared resource, such as a file directory. For correctness, it is necessary that the shared resource be accessed by a single process at a time. This problem is known as the mutual exclusion problem, wherein concurrent access to a shared resource by several uncoordinated user requests must be serialized to secure the integrity of the shared resource. Chapter 6 describes several algorithms for achieving mutual exclusion and compares their performance.

In distributed systems, processes can request resources (local or remote) and release resources in any order that may not be known a priori. If the sequence of the allocation of resources to processes is not controlled in such environments, deadlocks may occur. It is important that deadlocks are detected and resolved as soon as possible, otherwise system performance can degrade severely. Chapter 7 discusses several deadlock handling strategies and describes several deadlock detection techniques for distributed systems.

4.5.6 Resource Management

Resource management in distributed operating systems is concerned with making both local and remote resources available to users in an effective manner. Users of a distributed operating system should be able to access remote resources as easily as they can access local resources. In other words, the specific location of resources should be hidden from the users. The resources of a distributed system are made available to users in the following ways: data migration, computation migration, and distributed scheduling [41].

DATA MIGRATION. In the process of data migration, data is brought to the location of the computation that needs access to it by the distributed operating system. The data in question may be a file (stored locally or remotely) or the contents of a physical memory (local or of another computer). If a computation updates a set of data, the original location (remote or local) may have to be similarly updated.

If the data accessed is a file, then the computation's data access request is brought under the purview of the distributed file system by the distributed operating system. A distributed file system is the component of a distributed operating system that implements a common file system available to the autonomous computers in the system. The primary goal of a distributed file system is to provide the same functional capability to access files regardless of their location in the network as that provided by a file system of a time-sharing mainframe operating system that only accesses files residing at one location. Ideally, the user does not need to be aware of the location of files to access them. This property of a distributed file system is known as network transparency. Important issues in the design of a distributed file system, common mechanisms employed in the building of distributed file systems, and several case studies of distributed file systems are presented in Chap. 9.

If, on the other hand, the data accessed is in the physical memory of another computer, then a computation's data access request is brought under the purview of distributed shared memory management by the distributed operating system. A distributed shared memory provides a virtual address space that is shared among all the computers in a distributed system. A distributed shared memory is an implementation of the shared memory concept in distributed systems that have no physically shared memory. The major issues in distributed shared memory implementation concern the
maintenance of the consistency of the shared data and the minimization of delays in the access of data. Distributed shared memory management is discussed in Chap. 10.
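The distributed shared memory idea described above can be illustrated with a deliberately simplified fragment (hypothetical names; real DSM systems, such as those discussed in Chap. 10, are far more involved). Pages of a shared virtual address space are each owned by one node; a read of a missing page fetches a copy from the owner, and a write invalidates cached copies elsewhere, a crude nod to the consistency problem just noted:

```python
# Toy sketch of a distributed shared memory: a shared page directory
# records each page's owner; reads fetch remote pages on demand, and
# writes invalidate stale cached copies to keep the data consistent.

class DSMNode:
    def __init__(self, name, directory):
        self.name = name
        self.directory = directory     # shared page directory {page: owner}
        self.pages = {}                # pages this node holds (owned or cached)
        self.cached = set()

    def read(self, page):
        if page not in self.pages:     # page fault: fetch a copy from the owner
            owner = self.directory[page]
            self.pages[page] = owner.pages[page]
            self.cached.add(page)
        return self.pages[page]

    def write(self, page, value, all_nodes):
        for node in all_nodes:         # invalidate stale cached copies
            if node is not self and page in node.cached:
                del node.pages[page]
                node.cached.discard(page)
        self.directory[page] = self    # the writer becomes the owner
        self.pages[page] = value

directory = {}
n1 = DSMNode("n1", directory)
n2 = DSMNode("n2", directory)
n1.write("pg0", "hello", [n1, n2])
print(n2.read("pg0"))        # 'hello' (fetched across the "network")
n1.write("pg0", "bye", [n1, n2])
print(n2.read("pg0"))        # 'bye' (the old cached copy was invalidated)
```

In a real DSM the directory itself is distributed and the fetch/invalidate messages incur the network delays that make minimizing access latency a central design issue.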
COMPUTATION MIGRATION. In computation migration, the computation migrates to another location. Migrating a computation may be efficient under certain circumstances. For example, when information is needed concerning a remote file directory, it is more efficient to send a message (i.e., a computation) requesting the necessary information and receive the information back, rather than having the entire directory transferred and then finding the necessary information locally. In distributed scheduling (discussed next), one computer may require another computer's status (such as its load level). It is more efficient and safe to find this information at the remote computer and send the required information back, rather than to transfer the private data structure of the operating system at the remote computer to the requesting computer so that it can obtain the necessary information. The remote procedure call (RPC) mechanism has been widely used for computation migration and for providing communication between computers. The RPC mechanism is discussed in Sec. 4.7.2. Note that in computation migration, only a part of the computation of a process is normally carried out on a different machine.

DISTRIBUTED SCHEDULING. In distributed scheduling, processes are transferred from one computer to another by the distributed operating system. That is, a process may be executed at a computer different from where it originated. Process relocation may be desirable if the computer where a process originated is overloaded or it does not have the necessary resources (such as a math co-processor) required by the process. Distributed scheduling is responsible for judiciously and transparently distributing processes amongst computers such that overall performance is maximized. Improved performance is mainly due to the enhanced utilization of computers through the concurrent execution of processes. Various issues in distributed scheduling, several distributed scheduling algorithms, and case studies of several implementations are discussed in Chap. 11.

4.5.7 Security

The security of a system is the responsibility of its operating system. Two issues that must be considered in the design of security for computer systems are authentication and authorization [11]. Authentication is the process of guaranteeing that an entity is what it claims to be. Authorization is the process of deciding what privileges an entity has and making only these privileges available. The security of computer systems and various protection mechanisms to achieve security are discussed in Chaps. 14 and 15.

4.5.8 Structuring

The structure of an operating system defines how its various parts are organized. Three structuring techniques are discussed next.

THE MONOLITHIC KERNEL. The traditional method of structuring operating systems is to construct them as one big monolithic kernel. This kernel consists of all the services provided by the operating system. However, in the case of distributed systems that often consist of diskless workstations, workstations with local disk storage, multiprocessor computers suitable for intensive numerical computations, etc., it seems wasteful for every computer to run a huge monolithic operating system when not every computer would use every service provided by the operating system. For example, a diskless workstation would not make use of the storage related operations provided by the file system. This concern has led to the development of the collective kernel structure.

THE COLLECTIVE KERNEL STRUCTURE. In the collective kernel structure, an operating system is structured as a collection of processes that are largely independent of each other [52]. In collective kernel structuring, the operating system services (e.g., distributed memory management, distributed file systems, distributed scheduling, name services, the RPC facility, time management, etc.) are implemented as independent processes. The nucleus of the operating system, also referred to as the microkernel, supports the interaction (through messages) between the processes providing the system services. In addition, the microkernel provides services that are typically essential to every computer in a distributed system, such as task management (e.g., local scheduling of tasks), processor management, virtual memory management, etc. The microkernel runs on all the computers in a distributed system. The other services (all or a few) may or may not run at a computer, depending on the need and the hardware available at that computer.

The collective kernel structure can readily make use of a very helpful design technique known as policy and mechanism separation [52]. By separating policies and mechanisms in an implementation, one can change any given policy without changing the underlying mechanisms.

Mach [1], V-kernel [12], Chorus [39], and Galaxy [43] are examples of operating systems that use the collective kernel structuring technique.

OBJECT-ORIENTED OPERATING SYSTEMS. While the various services provided by an operating system can be realized as a set of processes (this model has been referred to as the process-model by Goscinski [19]), another popular approach is to implement the services as objects. An operating system that is structured using objects in this manner is known as an object-oriented operating system.

In an object-oriented operating system, the system services are implemented as a collection of objects. Each object encapsulates a data structure and defines a set of operations on that data structure. Each object is given a type that designates the properties of the object: process, directory, file, etc. By performing the operations defined on an object, the encapsulated data can be accessed and modified. This model is amenable to the collective kernel structuring and policy and mechanism separation techniques. Examples of object-oriented operating systems are Eden [3], Choices [10], x-kernel [20], Medusa [34], Clouds [38], Amoeba [48], and Muse [52].
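The policy/mechanism separation mentioned above can be shown in a few lines (hypothetical code, not taken from Mach, Chorus, or any system cited). The dispatch mechanism below is fixed, while the scheduling policy is a pluggable function, so a policy can be changed without touching the mechanism:

```python
# Policy and mechanism separation: one fixed dispatch mechanism,
# interchangeable scheduling policies supplied as functions.

def dispatch(ready_queue, policy):
    """Mechanism: remove and return the task chosen by the given policy."""
    if not ready_queue:
        return None
    task = policy(ready_queue)
    ready_queue.remove(task)
    return task

# Two interchangeable policies over (name, length) tasks:
fifo = lambda queue: queue[0]                                  # first come, first served
shortest_first = lambda queue: min(queue, key=lambda t: t[1])  # shortest task first

tasks = [("edit", 5), ("compile", 2), ("simulate", 9)]
print(dispatch(list(tasks), fifo))            # ('edit', 5)
print(dispatch(list(tasks), shortest_first))  # ('compile', 2)
```

Swapping `fifo` for `shortest_first` changes the scheduling behavior while `dispatch` itself is untouched, which is exactly the benefit claimed for this design technique.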
4.5.9 Client-Server Computing Model

In the client-server model, processes are categorized as servers and clients. Servers are the processes that provide services; processes that need services are referred to as clients. In the client-server model, a client process needing a service (e.g., reading data from a file) sends a message to the server and waits for a reply message. The server process, after performing the requested task, sends the result in the form of a reply message to the client process. Note that servers merely respond to the requests of the clients and do not typically initiate conversations with clients. In systems with multiple servers, it is desirable that, when providing services to clients, the locations of and conversations among the servers are transparent to the clients. Clients typically make use of a cache to minimize the frequency of sending data requests to the servers. Systems structured on the client-server model can easily adapt the collective kernel structuring technique.

4.6 COMMUNICATION NETWORKS ::, 1 •• i PACKET SWITCHING VERSUS CIRCUIT SWITCHING. The communication net
work can be utilized in one of the following two modes, namely, circuit switching or
This Section introduces the communication aspects of distributed systems, All the co packet switching (46]. In circuit switching, a dedicated path is established between two
puters ina distributed system are interconnected through a computer communication devices wishing to communicate, and the path remains intact for the entire duration
network. A computer can exchange messages with other computers and access data in which the two devices communicate. The telephone system uses circuit switching.
stored at another computer through this network. Communication networks are broadly When one subscriber dials another subscriber's number or when one connects bis ter•
classified as Wide Area Networks and ·Local Area Networks., _, , ., •,·.. •.; -,• .,'•• ,_- minal to a computer by dialing the computer's number, a dedicated path between the
two points is established through various switches. The path is broken when one side
., r,, I 1·.,_1! terminates the conversation.
4.6.1 Wide Area Networks ' · ·_,···, • ·, '·:•-··:,., ,.·_ ··· , In packet switching, a connection is established between the source device
J ;, _, <;.. (termi nal or computer) and its nearest switch (46]. The data or message to be
Wide area networks (WANs) are employed·to interconnect various devices (such·aa communicated is broken down into smaller units called packets (several hundred to
computers and terminals) spread over a wide geographic area that may cover different several thousands bytes in length), with each packet containing the address of the
cities, states, and countries. WANs have also been referred to as Long HaulNetworks destination. The pack ets are then sent to the nearest switch. These packets are
The-communication facility in a WAN consists of switches that are interconnected routed from one switch to another switch in the communication network until they
by communication links. (See Fig. 4.2.) These links may. be established arrive at the switch connected to the destination device, at which point the data is
througlt; delivered to the destination. Thus. in packet switching, a communication path
I, ,t·_ J 1
(switches and links) is not reserved by any two devices wishing to communicate, but
rather is dynamically shared among many devices on a demand basis. The achievable
1

, •j '.>-; '_.. ,II utilization of the communication network is higher under packet switching compared to
circuit switching because the network can be shared by many devices. Also, parallel
Computer; transmission, and hence reduction in data transmission time, is possible because
packets forming one message may travel along different paths. Moreover. data
transmission in computer networks is sporadic ra1her than continuous. Thus, most
\. computer networks use packet switching to pennil better utilization of the network.

.........
r One disadvantage of packet switching, however. is that the breaking of a message
into packets and assembling them back at the destination curies some cost.
··········..•....

FIGURE 4.2
A point-to-point network.

THE ISO OSI REFERENCE MODEL. Generally, WANs must interconnect heterogeneous types of equipment (e.g., computers, terminals, printers). These types of equipment may differ from each other in their speed, word length, information representation, or in many other criteria. To communicate in such a heterogeneous environment, the ISO OSI reference model† provides a framework for communication protocols [53]. It organizes the protocols as seven layers and specifies the functions of each layer. The organization of these seven layers is shown in Fig. 4.3. In this model, user processes run in the application layer.

When an application program running at computer A wants to send a message to an application program at computer B, the following chain of events occurs. The application at computer A passes the message down to the presentation layer (on computer A itself). The presentation layer transforms the data (explained later), adds a header containing some control information to the message, and passes the resulting message to the session layer. The session layer adds its own header to the message and passes the resulting message to the next layer. This continues until the message reaches the physical layer. The physical layer on computer A transmits the raw data bits to the physical layer running at computer B. Note that the message is routed to computer B through various intermediate switches in the communication network. Once the message is received at computer B, the protocol at each layer strips the header added by its counterpart at computer A, performs the necessary processing identified by the header, and passes the message on to the next layer. This continues until the message reaches its destination: the process in the application layer at computer B.

The ISO OSI model does not specify how the layers should be implemented. Every layer is aware only of the protocols and header formats of its counterpart. It does not understand the headers or the protocols used by the other layers. This makes each layer independent, so any layer can change its protocol without affecting other layers as long as the interfaces between the layers remain unchanged.

We next give a brief overview of the ISO OSI model [46]. Complete information about the model can be found in [46, 53].

†The International Standards Organization's Reference Model of Open Systems Interconnection.

FIGURE 4.3
The ISO OSI reference model.

AN OVERVIEW OF THE ISO OSI LAYERS. The physical layer's function is to allow a device to send raw bit streams of data over the communication network. It is not concerned with transmission errors, how bits are organized, or what they mean. This layer should, however, be aware of and take care of the communication network implementation details such as circuit/packet switching, the type of network (i.e., telephone system, digital transmission, etc.), the voltage levels used for representing 0 and 1 bits, the number of pins and their assignments in network connectors, etc.

The data-link layer is responsible for recovering from transmission errors and for flow control. Flow control takes care of any disparity between the speeds at which the bits can be sent and received. The data-link layer makes the communication facility provided by the physical layer reliable.

The network layer is mainly responsible for routing and congestion control. It breaks a message into packets and decides which outgoing line will carry the packets toward their destination.

The transport layer's primary function is to hide all the details of the communication network from the layers above. It provides a network-independent device-to-device (or end-to-end) communication. This layer can provide the ability to the network to inform a host that the network has crashed or has lost certain packets. Thus, the transport layer can provide improved reliability if necessary.

The session layer is responsible for establishing and maintaining a connection, known as a session, between two processes. Establishing a connection may involve the authentication of the communicating processes and the selection of the right transport service. In addition, the session layer may keep track of the outstanding requests and replies from processes and order them in such a manner as to simplify the design of user programs.

The presentation layer is the interface between a user program and the rest of the network. It provides data transformation utilities to take care of the differences in representing information at the source and at the destination. In addition, the presentation layer may perform data compression, encryption, and conversion to and from network standards for terminals and files.

The application layer's function is to provide a facility for the user to use the ISO OSI protocols. Its content is left to the users. (For example, specific applications such as airline and banking may have their own standards for the application layer.)

4.6.2 Local Area Networks

A local area network (LAN) is a communication network that interconnects a variety of data communication devices within a small geographic area [45]. In our context, the data communicating devices typically include computers, terminals, and peripheral devices. Some of the key characteristics of LANs are:

• High data transmission rates (10 megabits per second to 100 megabits per second).
• The geographic scope of LANs is small, generally confined to a single building or perhaps several buildings (such as a college campus).
• Low transmission error rate.
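The layer-by-layer encapsulation described in the ISO OSI discussion above, where each layer on the sender adds a header and its counterpart on the receiver strips exactly that header, can be sketched as a toy example. The bracketed text headers and the function names are illustrative only; real OSI protocols use binary header formats.

```python
# Illustrative sketch (not the actual OSI header formats): each layer on the
# sending side prepends its own header; the peer layer on the receiving side
# strips the header added by its counterpart. Layer names follow the
# seven-layer model.

LAYERS = ["application", "presentation", "session",
          "transport", "network", "data-link", "physical"]

def send_down(message: str) -> str:
    """Pass the message from the application layer down to the physical
    layer, each layer adding a header for its counterpart."""
    for layer in LAYERS:
        message = f"[{layer}]" + message
    return message

def receive_up(frame: str) -> str:
    """Each layer strips the header added by its counterpart, then hands
    the remainder up to the next layer."""
    for layer in reversed(LAYERS):      # physical layer sees the frame first
        header = f"[{layer}]"
        assert frame.startswith(header), f"malformed {layer} header"
        frame = frame[len(header):]
    return frame

assert receive_up(send_down("hello B")) == "hello B"
```

Because each layer touches only its own header, any layer can change its format without affecting the others, which is exactly the independence property noted above.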



Widely used network topologies for LANs are bus, ring, and tree (see Fig. 4.4). The communication media can be coaxial cable, twisted pair wire, or optical fiber.

FIGURE 4.4
Network topologies.

BUS/TREE TOPOLOGY. In the bus topology, the communication devices transmit data in the form of packets, where each packet contains the address of the destination and a message. A packet propagates throughout the medium, i.e., the bus, and is available to all the other devices, but is received only by the addressed device [45]. A tree topology LAN is obtained by interconnecting many bus topology LANs to a common bus (see Fig. 4.4). Bus topology LANs can be viewed as branches of a tree. In the bus topology, all the devices connected to the LAN are allowed to transmit at any time. However, as the devices share a common data path (i.e., the bus), a protocol to control access to the bus is necessary. We next discuss two protocols to control access to the bus.

The CSMA/CD protocol. The most commonly used access control protocol for the bus topology is CSMA/CD (Carrier Sense Multiple Access with Collision Detection) [45]. Under this protocol, a device wishing to transmit listens to the medium to determine whether another transmission is in progress. If so, the device waits for a random amount of time before trying again. If no other transmission is in progress, the device starts transmitting data and continues to listen to the medium while it is transmitting. If another device starts transmitting simultaneously, the two transmissions collide. If a collision is detected, a short jamming signal is transmitted over the bus to inform all the devices that there has been a collision. The devices will then wait for a random amount of time before attempting to transmit again. The principal advantage of this protocol is its simplicity. Its principal disadvantage is that under a heavy load, contention for the bus rises and performance degrades because of frequent collisions. Thus, a bus using the CSMA/CD protocol cannot support a large number of devices per bus. Ethernet is an example of a LAN that is based on the CSMA/CD principle [31].

If the CSMA/CD protocol is used for a tree topology, packets transmitted by a device on one branch will not enter the common bus unless the destination device is on another branch of the tree. Hence, the common bus serves as a backbone connecting many LANs. In large systems, the backbone is normally a high speed medium with a bandwidth of 100 megabits per second.

The token bus protocol. An alternative to the CSMA/CD protocol is the token bus technique [45]. In this technique, devices physically organized in a bus/tree topology form a logical ring, and each device knows the identity of the devices preceding and following it on the ring. Access to the bus is controlled through a token (a control packet). The device holding the token is allowed to transmit, poll other devices, and receive replies from other devices. The devices not holding the token can receive messages and can only respond to polls or requests for acknowledgment. A device is allowed to keep the token for a specified amount of time, after which it has to send the token to the device following it on the logical ring.

RING TOPOLOGY. The main alternative to the bus/tree topology is the ring topology (see Fig. 4.4). Note that, in the token bus protocol, the ring is logical, whereas in the ring topology, the ring is physical. In this topology, data is transmitted point-to-point. At each point, the address on the packet is copied and checked to see if the packet is meant for the device connected at that point. If the address of the device and the address in the packet match, the rest of the packet is copied; otherwise, the entire packet is retransmitted to the next device on the ring.

The token ring protocol. A widely used access control protocol to control access to the ring is the token ring technique [17, 45]. Under this technique, a token circulates around the ring. The token is labeled free when no device is transmitting. When a device wishes to transmit, it waits for the token to arrive, labels the token as busy on arrival, and retransmits the token. Immediately following the release of the token, the device transmits data. The transmitting device will mark the token as free when the busy token returns to the device and the device has completed its transmission. The main advantage of the token ring protocol is that it is not sensitive to the load on the network; the entire bandwidth of the medium can be utilized. The major disadvantage of the token ring protocol is its complexity. The token has to be maintained error-free. If the token is lost, care must be taken to generate only one token. The maintenance of the token may require a separate process to monitor it.

The slotted ring protocol. The slotted ring is another technique used to control access to a ring network [37, 45]. In this technique, a number of fixed-length slots continuously circulate around the ring. The ring is like a conveyor belt. A device wishing to transmit data waits for a slot marked empty to arrive, marks it full, and inserts the destination's address and the data into the slot as it goes by. The device is not allowed to retransmit again until this slot returns to the device, at which time it is marked as empty by the device. As each device knows how many slots are circulating around the ring, it can determine the slot it had marked previously. After the newly emptied slot continues on, the device is again free to transmit data. A few bits are reserved in each slot so that the result of the transmission (accepted, busy, or rejected) can be returned to the source.

The key advantage of the slotted ring technique is its simplicity, which translates into reliability. The prime disadvantage is wasted bandwidth. When the ring is not heavily utilized, many empty slots will be circulating, but a particular device wishing to transmit considerable amounts of data can only transmit once per round-trip ring time.
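The token ring discipline described above can be sketched as a small simulation. This is a minimal sketch under idealized assumptions (no lost tokens, no transmission errors, instant delivery); the names `Device` and `circulate` are illustrative, not part of any real token ring standard.

```python
# Minimal token-ring sketch: a single token circulates around the ring; a
# device may transmit only after seizing the free token, and it frees the
# token when the token comes back around to it after transmission.

class Device:
    def __init__(self, name):
        self.name = name
        self.pending = None   # (destination, data) waiting to be sent
        self.inbox = []       # data received from the ring

def circulate(ring, rounds=4):
    token_free, holder = True, None
    for _ in range(rounds):               # token travels around the ring
        for dev in ring:
            if holder is dev:             # busy token returned to sender:
                token_free, holder = True, None   # mark it free again
            if token_free and dev.pending:
                token_free, holder = False, dev   # seize token, mark busy
                dest, data = dev.pending          # then transmit the data
                dest.inbox.append(data)
                dev.pending = None

a, b, c = Device("A"), Device("B"), Device("C")
a.pending = (c, "hello")
b.pending = (a, "world")
circulate([a, b, c])
assert c.inbox == ["hello"] and a.inbox == ["world"]
```

Note how B cannot transmit until the busy token has returned to A and been freed: at most one device holds the ring at a time, which is the property the protocol exists to guarantee.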
buffer directly (29]. In this case, a program issuing a SEND should avoid reusing the
4.7 COMMUNICATION PRIMITIVES
user buffer nti themessage has bee!] transmitted. For large messages (thousands of
bytes). a combmation of unbuffered and nonblocking semantics allows almost complete
The communication network provides l:I'means to se!1d raw bit streams of data in overlap between the communication and the ongoing computational activity iD the user
distributed systems. The communication primitives are the high-level constructs with prognun-
which programs use the underlying communication network. They play a significant role A natural use of nonblocking communication occurs iD produccr-coDIWJIG' rela
in the effective usage of distributed systems. The communication primitives influence a tionships. The consumer process can issue a nonblocking RECEIVE. If a message is
programmer's choice of algorithms as well as the ultimate performance of the programs_ present, the consumer process reads it, otherwise it performs some other computation.
They influence both the ease of use of a system and the efficiency of applications that The producer process can issue nonblocking SENDs. If a SEND iails (Cir any reuoa
are developed for the system [29J. (e.g., the buffer is full), it can be retried later.
We next discuss two communication_models, namely, message passing and remote With blocking primitives, the SEND primitive does not return conbOI to the user
procedure call, that provide communication primitives. These two models have been program until the message has been sent (an unreliable blocking primitive) or un1i1 an
widely used to develop distributed operating systems and applications for distributed acknowledgment has been received (a reliable blocking primitive). In both cases.the
systems. · user buffer can be reused as soon as the control is returned to the user program. 1bc
corresponding RECEIVE primitive does not return control until a mdsage is copied ro
4.7.1 The Mes.sage Passing Model the user buffer. A reliable RECEIVE primitive automatically sends an acknowledgment,
while an unreliable RECEIVE primitive does not send an acknowledgment [47]. 1be
The message passing model provides. two basic communication primitives, name!y, primary advantage of employing blocking primitives is that the behavior of the programs
SEND and RECEIVE [47J. The SEND primitive has two paramete . a message and 1ts is predictable, and hence programming is relatively easy. The primary disadvantage
destination. The RECEIVE primitive has two parameters, the source (including anyone) is the lack of flexibility in programming and the absence of c:oncum:ncy between
ofa message anda buffer for storing the message: An application of these primitives computation and commwtlcation.
can be found in the client-server computation model. In the client-server model,a client
process needing some service (e.g., reading data from a file) sendsa message to the SYNCHRONOUS VS. ASYNCHRONOUS PRJMITIVE& We now address the
server and waits for a reply message. After performing the task, the server process ques tion of whether to buffer or not to buffer messages. With syncluono,u
sends the result in the form of a reply message to the client process. While these two. primitives. a SEND primitive is blocked until a corresponding RECEIVE primitive is
primitives provide the basic communication ability to programs, the semantics of these executed at the receiving computer. This strategy is also referred ro as a rmdezvo,u. A
primitives also playa significant role in ease of developing programs that use them. blocking synchronous primitive can be extended to an unblocking-synchronous primitive
We next discuss two design issues that decide the semantics of these two primitives. by first copying the message to a buffer at the sending side, and then allowing the
, process to perform other computational activity except another SEND.
BLOCKING VS. NONBLOCKING PRIMITIVEs. In the standard message passing
model. messages are copied three times: from the user buffer to the kernel buffer, With asynchronous primitives, the 111essages are buffered. A SEND primitive
does not block even if there is no corresponding execution of a RECEIVE primitive. 1bc
from the kernel buffer on the sending computer to the kernel buffer on the receiving
corresponding RECEIVE primitive can either be a blocking or a nonblocking primitive..
coi:n uter, and finally from the buffer on the receiving computer toa user buffer [29].
This 1s known as the buffered option. . : - , One disadvantage of buffering messages is that it is more complex. u it involves cre
With nonblocking primitives, the SEND primitive returns control to the user pro ating, managing, and destroying buffers. AnodJer issue is what ro do with the messages
·
cess as soon as the message is copied from the user buffer onto the kernel buffer. that are meant for processes that have already died '[47).

=de
The sponding REcEivE primitive signals its intention to receivea message and
a buff tocopy the message. The receiving process may either periodically
k for th amval ofa message or be signaled by the kernel upon arrival ofa mes 4.7.2 Remote Procedure Calls

sother ID
becomes wa;_
pnmary advantageof_nonblocking primitives is that programs havemaximum.

...._, .!,.
com utation and communication in any order theywant. On the
and gnifi t disadvantage
difliculL Proo..,.mr of nonblocking
bee · primitives is that programming
While the message passing communication model provides a highly flexible communi
cation ability, programmers using such a model must handle the following details_

(or 5YUem·sta1cs) an:. . ..--,..., '?3Y ome time-dependent where problems • Pairing of responses with request messages.
IITeproducible, making the programs very difficult to debug[47]. • Data representation (when computers of different architectures or programs written
in different programming languages are communicating).
88 ADVANCED CONCEPTS IN OPERATING SYSTEMS

• Knowing the address of the remote machine or the server.


• Taking care of communication and system failures.· AllCHrTl:CT\JW OF DIST1UIIUT!D SYfflMS 89
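The first chore in that list, pairing responses with request messages, can be sketched in a few lines. The sketch is illustrative only: the in-memory queue stands in for SEND/RECEIVE over a network, and `send_request`, `await_reply`, and the uppercase-echo server are invented names, not a real API.

```python
# Sketch of pairing responses with requests by hand, as a message passing
# programmer must. An in-memory queue simulates the network; the "server"
# simply echoes the payload in upper case. All names here are hypothetical.

import itertools
import queue

_network = queue.Queue()
_req_ids = itertools.count(1)

def send_request(payload):
    """SEND a request; the simulated server immediately queues a reply
    tagged with the same request id."""
    req_id = next(_req_ids)
    _network.put(("reply", req_id, payload.upper()))
    return req_id

def await_reply(wanted_id):
    """RECEIVE until the reply matching wanted_id arrives; replies meant
    for other outstanding requests must be stashed and requeued."""
    stashed = []
    while True:
        kind, req_id, data = _network.get()
        if req_id == wanted_id:
            for m in stashed:          # put back replies claimed by others
                _network.put(m)
            return data
        stashed.append((kind, req_id, data))

r1 = send_request("first")
r2 = send_request("second")
assert await_reply(r2) == "SECOND"    # replies can arrive out of order
assert await_reply(r1) == "FIRST"
```

All of this bookkeeping (ids, stashing, requeueing) is exactly what the RPC mechanism discussed next hides from the programmer.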
The handling of all these details in programs makes the development of programs for distributed computations difficult. In addition, these programs can be time-dependent, making it almost impossible to reproduce errors and debug. These difficulties led to the development of the remote procedure call (RPC) mechanism [9]. RPC mechanisms hide all the above details from programmers. The RPC mechanism is based on the observation that a procedure call is a well known and well understood mechanism for the transfer of control and data within a program running on a single computer (a shared memory system). The RPC mechanism extends this same mechanism to transfer control and data across a communication network (a non-shared memory system) [9]. A remote procedure call can be viewed as an interaction between a client and a server, where the client needing a service invokes a procedure at the server. A simple RPC mechanism works as follows.

Basic RPC operation. On invoking a remote procedure, the calling process (the client) is suspended and parameters, if any, are passed to the remote machine (the server) where the procedure will execute. On completion of the procedure execution, the results are passed back from the server to the client and the client resumes execution as if it had called a local procedure. While the RPC mechanism looks simple, the issues that arise in designing and implementing it are not so simple. We next discuss several of those issues.

4.7.3 Design Issues in RPC

Structure. A widely used organization for RPC mechanisms is based on the concept of stub procedures [4, 9, 40] (see Fig. 4.5). When a program (client) makes a remote procedure call, say P(x,y), it actually makes a local call on a dummy procedure, or client-stub procedure, corresponding to procedure P. The client-stub procedure constructs a message containing the identity of the remote procedure and the parameters, if any, to be passed. It then sends the message to the remote server machine (a primitive similar to SEND explained in Sec. 4.7.1 may be used for this purpose). A stub procedure at the remote machine receives the message (a primitive similar to RECEIVE may be used), makes a local call to the procedure specified in the message, and passes the parameters received to the procedure. When the remote procedure completes execution, control returns to the server-stub procedure. The server-stub procedure passes the results back to the client-stub procedure at the calling machine, which returns the results to the client. The stub procedures can be generated at compile time or can be linked at run time.

Binding. Binding is the process that determines the remote procedure, and the machine on which it will be executed, upon a remote procedure invocation. The binding process may also check the compatibility of the parameters passed and the procedure type called with what is expected from the remote procedure.

One approach for binding in the client-server model makes use of a binding server [9, 44] (see Fig. 4.5). The servers register the services they provide with the binding server. The binding server essentially stores the server machine addresses along with the services they provide. When a client makes a remote procedure call, the client-stub procedure obtains the address of the server machine by querying the binding server. Another approach used for binding is where the client specifies the machine and the service required, and the binding server returns the port number required for communicating with the service requested [4]. Note that the first method is location transparent while the latter is not.

Parameter and result passing. To pass parameters or results to a remote procedure, a stub procedure has to convert the parameters and results into an appropriate representation (a representation that is understood by the remote machine) first and then pack them into a buffer in a form suitable for transmission. After the message is received, the message must be unpacked (see Fig. 4.5).

Converting data into an appropriate representation becomes expensive if it has to be done on every call. One way to avoid conversions is to send the parameters along with a code identifying the format used so that the receiver can do the conversion (only, of course, if it uses a different representation). This approach requires each machine to know how to convert all the formats that can possibly be used. This approach also has poor portability because whenever a new representation (because of a new machine type or a new language) is introduced into the system, existing software needs to be updated [47].

Alternatively, each data type may have a standard format in the message. In this technique, the sender will convert the data to the standard format and the receiver will convert from the standard format to its local representation. With this approach, a machine doesn't need to know how to convert all the formats that can possibly be used.

FIGURE 4.5
Remote procedure call.
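The stub organization and the pack/unpack steps described above can be sketched as follows. This is a toy sketch, not a real RPC system: JSON stands in for the marshalling format, a direct function call stands in for SEND/RECEIVE over the network, and the names `client_stub`, `server_stub`, and `PROCEDURES` are invented for the example.

```python
# Hedged sketch of stub-based RPC: the client stub packs the procedure
# identity and parameters into a message, the server stub unpacks the
# message, calls the local procedure, and packs the result for the reply.

import json

# --- server side -----------------------------------------------------------
def add(x, y):
    return x + y

PROCEDURES = {"add": add}          # procedures the server exports

def server_stub(message: bytes) -> bytes:
    call = json.loads(message)                      # unpack the request
    result = PROCEDURES[call["proc"]](*call["args"])  # local call
    return json.dumps({"result": result}).encode()  # pack the reply

# --- client side -----------------------------------------------------------
def client_stub(proc, *args):
    request = json.dumps({"proc": proc, "args": args}).encode()  # pack
    reply = server_stub(request)    # stands in for SEND + RECEIVE on the net
    return json.loads(reply)["result"]              # unpack the reply

assert client_stub("add", 2, 3) == 5   # looks just like a local call P(x, y)
```

Using a single standard format (here JSON) for every data type is an instance of the second conversion strategy above: neither side needs to know the other's native representation.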

This method, however, is wasteful if both the sender and the receiver use the same internal representation [47].

Another issue is how to deal with passing parameters by value and by reference. Passing parameters by value is simple, as the stub procedure simply copies the parameters into the message. However, passing parameters by reference is exceedingly more complicated. For example, just passing file pointers is inadequate, because the privileges associated with the calling process also have to be passed [47]. Moreover, the semantics associated with passing parameters in local procedure calls and remote procedure calls can be different. For example, in Argus [28], the structures pointed to by pointers are copied onto the remote machine, and hence, the calling machine and the remote machine do not share the data structures directly.

Error handling, semantics, and correctness. A remote procedure call can fail for at least two reasons: computer failures and communication failures (such as when a transmitted message does not reach its destination).

Handling failures in distributed systems is difficult. For example, consider the case where messages are lost occasionally. If the remote server is slow, the program that invokes the remote procedure may invoke the remote procedure more than once, suspecting a message loss. This could result in more than one execution of the procedure at the remote machine. Also, consider the case where the client machine, after having invoked a remote procedure, crashes immediately. In this case, the remote procedure is executed in vain, as there is no machine to receive the result. On the other hand, if the client machine recovers quickly and reissues the remote procedure call, then there is a possibility of more than one execution of the remote procedure. Such unwanted executions have been referred to as orphans [22].

In view of the above problems associated with distributed system failure, it is clear that the semantics of RPCs play a significant role in the ease of development of programs for distributed computation. The semantics of RPCs are classified as follows [33, 36]:

"At least once" semantics. If the remote procedure call succeeds, it implies that at least one execution of the remote procedure has taken place at the remote machine. However, if the call does not succeed, it is possible that zero, a partial, one, or more executions have taken place.

"Exactly once" semantics. If the remote procedure call succeeds, it implies that exactly one execution of the remote procedure has taken place at the remote machine. However, if the remote procedure call does not succeed, it is possible that zero, a partial, or one execution has taken place.

In view of the above, it is apparent that a stronger semantics for RPCs is necessary for the RPC mechanism to significantly improve upon the message passing model. Liskov and Scheifler [28] define the following stronger semantics for RPCs.

"At most once" semantics. Same as "exactly once" semantics, but in addition, calls that do not terminate normally do not produce any side effects.

Correctness condition. Panzieri and Shrivastava [36] define a simple correctness condition for remote procedure calls as follows:

Let Ci denote a call made by a machine and Wi represent the corresponding computation invoked at the called machine. Let C2 happen after C1 (denoted by C1 → C2), and let computations W1 and W2 share the same data such that W1 and/or W2 modify the shared data. To be correct in the presence of failures, an RPC implementation should satisfy the following correctness criterion:

C1 → C2 implies W1 → W2.

OTHER ISSUES. The RPC mechanisms make use of the communication facility provided by the underlying network to pass messages to remote machines. One of the issues to be resolved is whether to build the RPC mechanism on top of a flow-controlled and error-controlled virtual-circuit mechanism (similar to establishing sessions in WANs) or directly on top of an unreliable, connectionless (datagram) service [47]. (In a datagram service, a machine simply sends a message in the form of packets to the destination, and there is no extensive two-way communication such as automatic acknowledgments.)

As RPC mechanisms became a widely accepted method for communication in distributed systems, a need arose to specify a remote procedure call as low-latency or high-throughput, depending on the application. Low-latency calls require minimum round-trip delay and are made in the case of infrequent calls (such as calls to a mail server). On the other hand, the aim of high-throughput calls is to obtain the maximum possible throughput from the underlying communication facility. This type of call is typically made when bulk data transfer is required, such as in the case of calls to file servers. The ASTRA RPC mechanism [4] provides the ability for a user to specify whether low latency or high throughput is desired. For high-throughput calls, ASTRA makes use of a virtual circuit (TCP) protocol. For low-latency calls, ASTRA makes use of a datagram facility that is more suitable for intermittent exchange due to its simplicity. Stream [28] is another RPC mechanism designed mainly to achieve high throughput. It also makes use of the TCP protocol. In both the ASTRA and Stream implementations, high throughput is achieved by buffering the messages and immediately returning control to the user. The user can then make more calls. The buffer is flushed when it is full or when convenient. By buffering the messages, the overhead to process the communication protocols on every remote procedure call is avoided. For low-latency calls, the buffer is flushed immediately. Future [51] is an RPC facility that is specifically designed for low latency.

Note that invoking a remote procedure call blocks the calling process. However, this blocking severely limits the concurrency that can be achieved. Several RPC designs have tried to overcome this limitation.

One way to achieve parallelism is through creating multiple processes for each remote procedure call [6]. This scheme allows a process to make multiple calls to many servers and still execute in parallel with the servers. However, creating processes, switching between processes, and destroying processes may not be economical.
a suppon t
I stot as Zero-or-one semantics A numbe J e e ects. These semantics are
is, -once semantics [2, 5, 9, , { of RPC mechanisms implemented
a ·mo 36 44
allcirc
umsta
nces.
This
appro
ach
also
does
not
scale
well
for
large
syste
ms
consis
ting
of
hundr
eds of
comp
uters
[40].
-..- t,;.,.. ., • 1•--.. :r -
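At-most-once behavior is commonly obtained by tagging each call with a unique identifier and having the server filter retransmissions against a cache of completed calls. The following sketch illustrates only this duplicate-filtering idea; the names (`AtMostOnceServer`, `execute`) are invented for the illustration and are not taken from any of the systems cited above.

```python
import uuid

class AtMostOnceServer:
    """Toy server that filters duplicate requests by call id.

    A real RPC runtime would also bound the reply cache and handle
    crash recovery; this sketch shows only the filtering idea.
    """
    def __init__(self):
        self._replies = {}   # call_id -> cached result

    def execute(self, call_id, proc, *args):
        # A retransmitted request carries the same call_id, so the
        # procedure body runs at most once; the cached reply is resent.
        if call_id not in self._replies:
            self._replies[call_id] = proc(*args)
        return self._replies[call_id]

counter = {"n": 0}
def increment():
    counter["n"] += 1
    return counter["n"]

server = AtMostOnceServer()
cid = uuid.uuid4().hex
first = server.execute(cid, increment)
# Client times out and retries with the same call id: no second execution.
retry = server.execute(cid, increment)
print(first, retry, counter["n"])   # 1 1 1
```

With at-least-once semantics, by contrast, the retry would simply re-execute the procedure, and `counter["n"]` would reach 2.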

92 ADVANCED CONCEPTS IN OPERATING SYSTEMS
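The batch-and-flush behavior used by ASTRA and Stream for high-throughput calls can be sketched in a few lines. The class name `BufferedCallStream` and the `transport` callable are hypothetical stand-ins for the real TCP machinery; only the buffering policy is modeled.

```python
class BufferedCallStream:
    """Batches outgoing call messages and flushes when the buffer fills.

    `transport` is any callable taking a list of messages; it stands in
    for the virtual-circuit (TCP) connection of the real systems.
    """
    def __init__(self, transport, capacity=4):
        self._transport = transport
        self._capacity = capacity
        self._buffer = []

    def call(self, proc_name, args):
        # Control returns to the caller immediately; the per-call
        # protocol-processing overhead is paid once per batch.
        self._buffer.append((proc_name, args))
        if len(self._buffer) >= self._capacity:
            self.flush()

    def flush(self):
        # For low-latency calls this would be invoked after every call.
        if self._buffer:
            self._transport(list(self._buffer))
            self._buffer.clear()

batches = []
stream = BufferedCallStream(batches.append, capacity=3)
for i in range(7):
    stream.call("append_record", (i,))
stream.flush()                      # drain the partially filled buffer
print([len(b) for b in batches])    # [3, 3, 1]
```

Seven calls cross the "network" in three batches rather than seven, which is the source of the throughput gain; a low-latency configuration would instead flush after every `call`.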


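The effect of issuing several remote calls in parallel and accepting the replies in whatever order they arrive can be approximated, within a single address space, by using threads as stand-ins for server processes. This is only an analogy to the multiple-process scheme above; `remote_call` and its delay parameter are invented for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def remote_call(server, delay, value):
    # Stand-in for a remote procedure; the delay models network latency.
    time.sleep(delay)
    return server, value * value

# The client issues three calls without waiting for each to return,
# then accepts the replies in completion order, not in issue order.
with ThreadPoolExecutor() as pool:
    futures = [
        pool.submit(remote_call, "s1", 0.03, 1),
        pool.submit(remote_call, "s2", 0.01, 2),
        pool.submit(remote_call, "s3", 0.02, 3),
    ]
    replies = [f.result()[1] for f in as_completed(futures)]

print(sorted(replies))   # [1, 4, 9]
```

A strictly blocking RPC would serialize the three delays; here they overlap, and the caller processes each reply as it becomes available, much as ASTRA permits.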
In MultiRPC [40], a process is allowed to invoke a procedure on many servers concurrently, but not to invoke two different procedures in parallel. The calling process is blocked until all the responses are received or until the call is explicitly terminated by the calling process.

Parallel procedure call (PARPC) [30] is another scheme, similar to MultiRPC. In PARPC, invoking a parallel procedure call executes a procedure in n different address spaces (e.g., at n different servers) in parallel. The caller remains blocked while the n procedures execute. When a result becomes available, the caller is unblocked to execute a statement to process the result of the returned call. After executing the statement, the caller reblocks to wait for the next result. The caller resumes execution after all the n calls have returned or when the caller explicitly terminates the PARPC call.

To overcome the limitations of blocking semantics, asynchronous RPC mechanisms have been developed in which a calling process does not block after invoking a remote procedure [4, 44]. ASTRA [4] offers further flexibility by allowing client processes to accept replies in any order. The main disadvantage of these semantics is that, as with message passing primitives, programming becomes difficult.

Bershad et al. [7] performed a study to determine the frequency of intermachine procedure calls. According to their study, less than ten percent of all system calls cross the machine boundary. Note that intramachine calls can be made efficient by avoiding the marshaling of data and other RPC-related network protocols. ASTRA optimizes the intramachine call by avoiding the above overhead and by using the most efficient interprocess communication mechanism provided by the host operating system to pass messages.

According to Gifford [18], existing RPC facilities have the following two shortcomings:

• Incremental results: In present RPC facilities, a remote procedure cannot easily return incremental results to the calling process while its execution is still in progress.

• Protocol flexibility: In present RPC systems, remote procedures are not first-class objects. (A first-class object is a value that can be freely stored in memory, passed as a parameter to both local and remote procedures, and returned as a result from both local and remote procedures [18].) This can make protocols inflexible. For example, the following protocol is not implementable unless remote procedures are first-class objects: a process wishes to provide a server with a procedure for use under certain circumstances, and the server then wishes to pass this procedure on to another server.

To overcome the above limitations, Gifford has proposed a new communication model called the channel model [18]. In this model, remote procedures are first-class objects. An abstraction called a pipe permits the efficient transportation of bulk data and incremental results, and an abstraction called channel groups allows the sequencing of calls on pipes and procedures. Complete details on this model are beyond the scope of this book and can be found in [18].

4.8 SUMMARY

The driving force behind the development of distributed systems is the availability of powerful yet cheap microprocessors and advances in communication technology. These developments have made it cost-effective and feasible to design and develop distributed systems composed of many computers, interconnected by communication networks. The primary advantages of distributed systems over traditional time-sharing mainframe systems are: a low price/performance ratio, resources that can be shared among many users, improved system response through load distribution, higher availability and reliability, and modular expandability.

While many architectures are possible for distributed systems, the workstation model, where a number of workstations are interconnected by LANs, is the most widely employed model for distributed systems.

To realize the benefits of a distributed system, it is necessary that its resources are managed efficiently. It is the responsibility of distributed operating systems to manage resources efficiently and to provide a friendly interface to users. Several areas of distributed operating systems in which a significant amount of work has been done were introduced in Sec. 4.5. These areas will be discussed in detail in the forthcoming chapters (5 through 12).

Local area networks provide a basic communication facility for distributed systems, due to their low communication delays and low error rates. Wide area networks, because of their higher communication delays and error rates, are mainly employed to interconnect distributed systems based on LANs and computers spread over a wide geographic area.

Communication primitives are the means through which programs use the underlying communication network. They play a significant role in the effective usage of distributed systems. Communication primitives influence the programmer's choice of algorithms as well as the ultimate performance of the programs. They influence both the ease of use of a system and the efficiency of applications that are developed for the system. Message passing primitives are the basic communication primitives that provide a facility for programs to send and receive messages. However, writing programs using these primitives is difficult because a programmer has to take care of many details, such as the pairing of responses with request messages, data representation, knowing the address of the remote machine or server, and taking care of communication and system failures. The RPC mechanism has been widely accepted as the mechanism to support communication in distributed systems because it hides many of the above details. It is based on the well-known and well-understood procedure call mechanism. An RPC mechanism extends a procedure-call mechanism to encompass communication networks.

In the coming decade, we can hope to see many more advances in scalability techniques (as distributed systems grow to include thousands of computers) and in techniques for linking heterogeneous environments (which combine different machines running different operating systems and groups of machines linked together by different types of networks).
4.9 FURTHER READING

The development of distributed operating systems has come a long way in the past decade. Some examples of distributed operating systems are: Mach [1], Eden [3, 23], ISIS [8], Athena [11], V-system [13], Clouds [16], Domain [25], Argus [27], Andrew [32], Sprite [35], Galaxy [43], Amoeba [48], and Locus [50].

The distributed object-based programming system is an amalgamation of the concepts of object-based programming (which encourages the design of a program as a set of autonomous components) and distributed systems (which permits a collection of autonomous components to be treated as a single entity). Chin and Chanson [14] provide a survey of the design and implementation of distributed object-based programming systems. An in-depth discussion of client-server computing can be found in a paper by Sinha [42].

Lin and Gannon [26] discuss an atomic RPC scheme which supports at-most-once semantics with the help of built-in facilities for backward error recovery. Panzieri and Shrivastava [36] describe an RPC facility supporting orphan detection and killing. Tay and Ananda [49] have presented a survey of RPC mechanisms and a comprehensive bibliography for RPC mechanisms.

REFERENCES

1. Accetta, M., R. Baron, W. Bolosky, D. Golub, R. Rashid, A. Tevanian, and M. Young, "Mach: A New Kernel Foundation for UNIX Development," Proceedings of the Summer USENIX Conference, June 1986, pp. 93-113.
2. Almes, G., "The Impact of Language and System on Remote Procedure Call Design," Proceedings of the 6th International Conference on Distributed Computing Systems, May 1986, pp. 414-421.
3. Almes, G. T., A. P. Black, E. D. Lazowska, and J. D. Noe, "The Eden System: A Technical Review," IEEE Transactions on Software Engineering, vol. 11, no. 1, Jan. 1985.
4. Ananda, A. L., B. H. Tay, and E. K. Koh, "ASTRA - An Asynchronous Remote Procedure Call Facility," Proceedings of the 11th International Conference on Distributed Computing Systems, May 1991, pp. 172-179.
5. Bacon, J. M., and K. G. Hamilton, "Distributed Computing with the RPC: The Cambridge Approach," ACM Transactions on Computer Systems, vol. 5, no. 3, July 1983.
6. Bal, H. E., R. van Renesse, and A. S. Tanenbaum, "Implementing Distributed Algorithms Using Remote Procedure Calls," Proceedings of the National Computer Conference, AFIPS, 1987, pp. 499-505.
7. Bershad, B. N., T. E. Anderson, E. D. Lazowska, and H. M. Levy, "Lightweight Remote Procedure Call," Proceedings of the 12th ACM Symposium on Operating Systems Principles, Special Issue of Operating Systems Review, vol. 23, no. 5, Dec. 1989, pp. 102-113.
8. Birman, K., and R. Cooper, "The ISIS Project: Real Experience with a Fault Tolerant Programming System," Operating Systems Review, 1991, pp. 103-107.
9. Birrell, A., and B. J. Nelson, "Implementing Remote Procedure Calls," ACM Transactions on Computer Systems, vol. 2, no. 1, Feb. 1984, pp. 39-59.
10. Campbell, R. H., G. M. Johnston, P. W. Madany, and V. F. Russo, "Principles of Object-Oriented Operating System Design," Technical Report R-89-1510, Dept. of Computer Science, University of Illinois, Urbana-Champaign, Apr. 1989.
11. Champine, G. A., D. E. Geer, and W. N. Ruh, "Project Athena as a Distributed Computer System," IEEE Computer, vol. 23, no. 9, Sept. 1990, pp. 40-50.
12. Cheriton, D. R., "The V Kernel: A Software Base for Distributed Systems," IEEE Software, vol. 1, no. 2, Apr. 1984, pp. 19-42.
13. Cheriton, D. R., "The V Distributed System," Communications of the ACM, vol. 31, no. 3, March 1988, pp. 314-333.
14. Chin, R. S., and S. T. Chanson, "Distributed Object-Based Programming Systems," ACM Computing Surveys, vol. 23, no. 1, March 1991, pp. 91-124.
15. Xerox Corporation, "Courier: The Remote Procedure Call Protocol," Xerox System Integration Standard 038112, Xerox OPD, Dec. 1981.
16. Dasgupta, P., R. J. LeBlanc Jr., and W. F. Appelbe, "The Clouds Distributed Operating System," Proceedings of the 8th International Conference on Distributed Computing Systems, June 1988, pp. 2-9.
17. Farmer, W. D., and E. E. Newhall, "An Experimental Distributed Switching System to Handle Bursty Computer Traffic," Proceedings of the ACM Symposium on Problems in the Optimization of Data Communications, 1969.
18. Gifford, D. K., and N. Glasser, "Remote Pipes and Procedures for Efficient Distributed Communication," ACM Transactions on Computer Systems, vol. 6, no. 3, Aug. 1988, pp. 258-283.
19. Goscinski, A., Distributed Operating Systems: The Logical Design, Addison-Wesley, 1991.
20. Hutchinson, N. C., L. L. Peterson, M. B. Abbott, and S. O'Malley, "RPC in the x-Kernel: Evaluating New Design Techniques," Proceedings of the 12th ACM Symposium on Operating Systems Principles, Special Issue of Operating Systems Review, vol. 23, no. 5, Dec. 1989, pp. 91-101.
21. Jul, E., H. Levy, N. Hutchinson, and A. Black, "Fine-Grained Mobility in the Emerald System," ACM Transactions on Computer Systems, vol. 6, no. 1, Feb. 1988, pp. 109-133.
22. Lampson, B., "Remote Procedure Calls," Lecture Notes in Computer Science, vol. 105, Springer-Verlag, New York, 1981, pp. 365-370.
23. Lazowska, E. D., H. M. Levy, G. T. Almes, M. J. Fisher, R. J. Fowler, and S. C. Vestal, "The Architecture of the Eden System," Proceedings of the 8th ACM Symposium on Operating Systems Principles, Dec. 1981, pp. 148-159.
24. LeLann, G., "Motivation, Objectives, and Characteristics of Distributed Systems," in Lampson et al. (eds.), Distributed Systems - Architecture and Implementation, Springer-Verlag, New York, 1981.
25. Levine, P., "The DOMAIN System," Proceedings of the 2nd ACM SIGOPS Workshop on Making Distributed Systems Work, Special Issue of Operating Systems Review, vol. 21, no. 1, Jan. 1987, pp. 49-84.
26. Lin, K. J., and J. D. Gannon, "Atomic Remote Procedure Call," IEEE Transactions on Software Engineering, vol. 11, no. 10, Oct. 1985, pp. 1126-1135.
27. Liskov, B., D. Curtis, P. Johnson, and R. Scheifler, "Implementation of Argus," Proceedings of the 11th ACM Symposium on Operating Systems Principles, 1987, pp. 111-122.
28. Liskov, B., and R. Scheifler, "Guardians and Actions: Linguistic Support for Robust Distributed Programs," Distributed Processing, IFIP. Also in ACM Transactions on Programming Languages and Systems, vol. 5, no. 3, 1983, pp. 381-406.
29. Maggio, M. D., and D. W. Krumme, "…Communication in a Distributed Memory Multicomputer," Operating Systems Review, vol. 25, no. 2, Apr. 1991, pp. 4-21.
30. Martin, B., C. Bergan, et al., "…Experience with PARPC," Proceedings of the Winter USENIX Conference, 1989.
31. Metcalfe, R. M., and D. R. Boggs, "Ethernet: Distributed Packet Switching for Local Computer Networks," Communications of the ACM, vol. 19, no. 7, July 1976.
32. Morris, J. H., M. Satyanarayanan, M. H. Conner, J. H. Howard, D. S. Rosenthal, and F. D. Smith, "Andrew: A Distributed Personal Computing Environment," Communications of the ACM, vol. 29, no. 3, March 1986.
33. Nelson, B. J., "Remote Procedure Call," PhD thesis, Computer Science, Carnegie Mellon University, 1981. Available as Technical Report CMU-CS-81-119.
34. Ousterhout, J. K., "Partitioning and Cooperation in a Distributed Multiprocessor Operating System: Medusa," Technical Report CMU-CS-80-112, Dept. of Computer Science, Carnegie-Mellon University, 1980.
