CS3551 – Distributed Systems
UNIT I INTRODUCTION
Introduction: Definition-Relation to Computer System Components –
Motivation – Message -Passing Systems versus Shared Memory Systems –
Primitives for Distributed Communication – Synchronous versus
Asynchronous Executions – Design Issues and Challenges; A Model of
Distributed Computations: A Distributed Program – A Model of Distributed
Executions – Models of Communication Networks – Global State of a
Distributed System.
DEFINITION:
A distributed system is a collection of independent entities that cooperate to solve
a problem that cannot be solved by any of them individually.
It consists of autonomous processors communicating over a communication network.
Some characteristics:
• The crash of one computer does not prevent the others from continuing their work.
• The computers communicate by passing messages over a communication network.
• They cooperate to address a problem collectively.
Features:
• No common physical clock: It introduces the element of “distribution” in the
system and gives rise to the inherent asynchrony amongst the processors.
• No shared memory: It may be noted that a distributed system may still
provide the abstraction of a common address space via the distributed shared
memory abstraction.
• Geographical separation: Recently, the network/cluster of workstations
(NOW/COW) configuration connecting processors on a LAN is also being
increasingly regarded as a small distributed system.
• Autonomy and heterogeneity: The processors are “loosely coupled”
in that they have different speeds and each can be running a different
operating system.
-------------------------------------------------------------------------------------------
Course Instructor: Dr. Suja A. Alex, AP/IT Page 1
RELATION TO COMPUTER SYSTEM COMPONENTS
Distributed System Model
Figure 1.1 A distributed system connects processors by a communication network.
A typical distributed system is shown in Figure 1.1. Each computer has a memory-
processing unit and the computers are connected by a communication network.
Figure 1.2 shows the relationships of the software components that run on each of
the computers and use the local operating system and network protocol stack for
functioning. The distributed software is also termed as middleware. A distributed
execution is the execution of processes across the distributed system to
collaboratively achieve a common goal. An execution is also sometimes termed a
computation or a run.
Figure 1.2: Interaction of the software components at each process.
Figure 1.2 schematically shows the interaction of this software with these system
components at each processor. Here we assume that the middleware layer does not
contain the traditional application layer functions of the network protocol stack, such
as http, mail, ftp, and telnet. Various primitives and calls to functions defined in
various libraries of the middleware layer are embedded in the user program code.
There exist several libraries to choose from to invoke primitives for the more
common functions – such as reliable and ordered multicasting – of the middleware
layer. There are several standards, such as the Object Management Group’s (OMG)
common object request broker architecture (CORBA) [36], and the remote
procedure call (RPC) mechanism.
--------------------------------------------------------------------------------------------
MOTIVATION
• Inherently distributed computation:
In many applications such as money transfer in banking, or reaching
consensus among parties that are geographically distant, the computation is
inherently distributed.
• Resource sharing:
Resources such as peripherals, complete data sets in databases, special
libraries, as well as data (variables/files) cannot be fully replicated at all the
sites because it is often neither practical nor cost-effective. For example,
distributed databases such as DB2 partition the data sets across several
servers.
• Access to remote resources:
The data cannot be replicated at every site participating in the distributed
execution because it may be too large or too sensitive to be replicated. For
example, payroll data within a multinational corporation is both too large and
too sensitive to be replicated at every branch office/site. It is therefore stored
at a central server which can be queried by branch offices.
• Increased performance/cost ratio:
By resource sharing and accessing geographically remote data and resources,
the performance/cost ratio is increased. Any task can be partitioned across the
various computers in the distributed system.
• Reliability (availability, integrity, fault-tolerance):
A distributed system has the inherent potential to provide increased reliability
because of the possibility of replicating resources and executions. Reliability entails:
o availability, i.e., the resource should be accessible at all times;
o integrity, i.e., the value/state of the resource should be correct, in the
face of concurrent access from multiple processors, as per the semantics
expected by the application;
o fault-tolerance, i.e., the ability to recover from system failures.
• Scalability:
As the processors are usually connected by a wide-area network, adding more
processors does not pose a direct bottleneck for the communication network.
• Modularity and incremental expandability:
Heterogeneous processors may be easily added to the system without
affecting performance, as long as those processors are running the same
middleware algorithms. Similarly, existing processors may be easily replaced
by other processors.
--------------------------------------------------------------------------------------
MESSAGE-PASSING SYSTEMS VERSUS SHARED MEMORY SYSTEMS
Shared memory systems are those in which there is a (common) shared
address space throughout the system. Communication among processors takes place
via shared data variables, and control variables for synchronization among the
processors. Semaphores and monitors that were originally designed for shared
memory uniprocessors and multiprocessors are examples of how synchronization
can be achieved in shared memory systems. All multicomputer (Non Uniform
Memory Access [NUMA] as well as message-passing) systems that do not have a
shared address space provided by the underlying architecture and hardware
necessarily communicate by message passing.
The abstraction called shared memory is sometimes provided to simulate a
shared address space. For a distributed system, this abstraction is called distributed
shared memory. Implementing this abstraction has a certain cost but it simplifies the
task of the application programmer. There also exists a well-known folklore result
that communication via message-passing can be simulated by communication via
shared memory and vice-versa.
Emulating MP over SM:
The shared address space can be partitioned into disjoint parts, one part being
assigned to each processor. “Send” and “receive” operations can be implemented by
writing to and reading from the destination/sender processor’s address space,
respectively.
Specifically, a separate location can be reserved as the mailbox for each ordered pair
of processes. A Pi–Pj message-passing can be emulated by a write by Pi to the
mailbox and then a read by Pj from the mailbox. In the simplest case, these
mailboxes can be assumed to have unbounded size. The write and read operations
need to be controlled using synchronization primitives to inform the receiver/sender
after the data has been sent/received.
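The mailbox emulation above can be sketched in Python, with threads standing in for processors and a condition variable playing the role of the synchronization primitive (the class and method names are illustrative, not from any standard distributed-systems library):

```python
import threading
from collections import deque

class Mailbox:
    """One mailbox per ordered pair (Pi, Pj): a Pi-Pj message-passing is a
    write by Pi followed by a read by Pj. A condition variable stands in
    for the primitive that informs the receiver that data has been sent."""
    def __init__(self):
        self._buf = deque()               # unbounded, as in the simplest case
        self._cv = threading.Condition()

    def send(self, msg):                  # Pi writes into the mailbox
        with self._cv:
            self._buf.append(msg)
            self._cv.notify()             # wake a blocked receiver

    def receive(self):                    # Pj reads from the mailbox
        with self._cv:
            while not self._buf:          # block until data is available
                self._cv.wait()
            return self._buf.popleft()
```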
Emulating SM over MP:
This involves the use of “send” and “receive” operations for “write” and “read”
operations. Each shared location can be modeled as a separate process; “write” to a
shared location is emulated by sending an update message to the corresponding
owner process; a “read” to a shared location is emulated by sending a query message
to the owner process. As accessing another processor’s memory requires send and
receive operations, this emulation is expensive. Although emulating shared memory
might seem to be more attractive from a programmer’s perspective, it must be
remembered that in a distributed system, it is only an abstraction. Thus, the latencies
involved in read and write operations may be high even when using shared memory
emulation because the read and write operations are implemented by using network-
wide communication under the covers.
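The owner-process emulation can be sketched as follows, again with threads standing in for processes and queues standing in for the network channels (all names here are illustrative):

```python
import queue
import threading

class OwnerProcess(threading.Thread):
    """Models one shared location as a separate process. 'write' and 'read'
    arrive as messages on an inbox queue; a read is answered on a per-request
    reply queue, modelling the send/receive pair of the emulation."""
    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self._value = None

    def run(self):
        while True:
            op, payload, reply = self.inbox.get()
            if op == "write":
                self._value = payload      # apply the update message
            elif op == "read":
                reply.put(self._value)     # answer the query message

def write(owner, value):
    """Emulated 'write': send an update message to the owner process."""
    owner.inbox.put(("write", value, None))

def read(owner):
    """Emulated 'read': send a query message and wait for the reply --
    a full network round-trip in a real distributed system."""
    reply = queue.Queue()
    owner.inbox.put(("read", None, reply))
    return reply.get()
```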
In a MIMD message-passing multicomputer system, each “processor” may be a
tightly coupled multiprocessor system with shared memory. Within the
multiprocessor system, the processors communicate via shared memory. Between
two computers, the communication is by message passing.
--------------------------------------------------------------------------------------------------------------------------------------------
PRIMITIVES FOR DISTRIBUTED COMMUNICATION
Blocking/non-blocking, synchronous/asynchronous primitives:
• Message send and message receive communication primitives are denoted
Send() and Receive(), respectively.
A Send primitive has at least two parameters
o the destination, and
o the buffer in the user space, containing the data to be sent.
Receive primitive has at least two parameters –
o The source from which the data is to be received (this could be a
wildcard), and
o the user buffer into which the data is to be received.
There are two ways of sending data when the Send primitive is invoked – the buffered
option and the unbuffered option.
• The buffered option, which is the standard option, copies the data from the
user buffer to the kernel buffer. The data later gets copied from the kernel
buffer onto the network.
• In the unbuffered option, the data gets copied directly from the user buffer
onto the network. For the Receive primitive, the buffered option is usually
required because the data may already have arrived when the primitive is
invoked, and needs a storage place in the kernel.
Synchronous (send/receive)
• Handshake between sender and receiver
• Send completes when Receive completes
• Receive completes when data copied into buffer
Asynchronous (send)
• Control returns to process when data copied out of user-specified buffer
Blocking (send/receive)
• Control returns to invoking process after processing of primitive (whether
sync or async) completes
Nonblocking (send/receive)
• Control returns to process immediately after invocation
• Send: control returns to the process even before data copied out of user buffer
• Receive: control returns to the process even before data may have arrived
from sender
Non-blocking Primitive
Figure 1.7: A nonblocking send primitive. When the Wait call returns, at least one
of its parameters is posted.
If at the time that Wait() is issued, the processing for the primitive (whether
synchronous or asynchronous) has completed, the Wait returns immediately. The
completion of the processing of the primitive is detectable by checking the value of
handle_k. If the processing of the primitive has not completed, the Wait blocks and
waits for a signal to wake it up. When the processing for the primitive completes,
the communication subsystem software sets the value of handle_k and wakes up
(signals) any process with a Wait call blocked on this handle_k. This is called posting
the completion of the operation.
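A minimal sketch of this posting mechanism, using a threading.Event as the handle and a helper thread as the communication subsystem (the function names and destination buffer are assumptions for illustration):

```python
import threading

def nonblocking_send(dest_buffer, data):
    """Returns a handle immediately. A helper thread stands in for the
    communication subsystem: it copies the data out of the user buffer
    and then 'posts' the handle, waking any Wait() blocked on it."""
    handle = threading.Event()
    def subsystem():
        dest_buffer.append(list(data))    # copy data out of the user buffer
        handle.set()                      # post completion; signal waiters
    threading.Thread(target=subsystem, daemon=True).start()
    return handle

def wait(handle, timeout=None):
    """Wait(): returns immediately if the handle is already posted,
    otherwise blocks until the subsystem posts it."""
    return handle.wait(timeout)
```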
Blocking/non-blocking, synchronous/asynchronous send/receive primitives
Send primitive
◦ synchronous blocking,
◦ synchronous non-blocking,
◦ asynchronous blocking, and
◦ asynchronous non-blocking.
Receive primitive
◦ blocking synchronous, and
◦ non-blocking synchronous versions.
Timing diagram
Blocking synchronous Send
Non-blocking synchronous Send
Blocking asynchronous Send
Non-blocking asynchronous Send
Blocking Receive
Non-blocking Receive
Blocking synchronous Send: The data gets copied from the user buffer to the kernel
buffer and is then sent over the network. After the data is copied to the receiver’s
system buffer and a Receive call has been issued, an acknowledgement back to the
sender causes control to return to the process that invoked the Send operation and
completes the Send.
Non-blocking synchronous Send: Control returns back to the invoking process as
soon as the copy of data from the user buffer to the kernel buffer is initiated. A
parameter in the non-blocking call also gets set with the handle of a location that the
user process can later check for the completion of the synchronous send operation.
The location gets posted after an acknowledgement returns from the receiver. The
user process can keep checking for the completion of the non-blocking synchronous
Send by testing the returned handle, or it can invoke the blocking Wait operation on
the returned handle.
Figure 1.8 Blocking/non-blocking and synchronous/asynchronous primitives [12].
Process Pi is sending and process Pj is receiving. (a) Blocking synchronous Send
and blocking (synchronous) Receive. (b) Non-blocking synchronous Send and
nonblocking (synchronous) Receive. (c) Blocking asynchronous Send. (d) Non-
blocking asynchronous Send.
Blocking asynchronous Send: The user process that invokes the Send is blocked
until the data is copied from the user’s buffer to the kernel buffer.
Non-blocking asynchronous Send: Control returns to the invoking process as soon
as the transfer of the data from the user’s buffer to the kernel buffer is initiated. A
parameter in the non-blocking call also gets set with the handle of a location that
the user process can check later, using the Wait operation, for the completion of the
asynchronous Send operation.
Blocking Receive: The Receive call blocks until the data expected arrives and is
written in the specified user buffer.
Non-blocking Receive: The Receive call will cause the kernel to register the call
and return the handle of a location that the user process can later check for the
completion of the non-blocking Receive operation. The user process can check for
the completion of the non-blocking Receive by invoking the Wait operation on the
returned handle.
Processor synchrony
• Processor synchrony indicates that all the processors execute in lock-step
with their clocks synchronized.
This abstraction is implemented using some form of barrier synchronization to
ensure that no processor begins executing the next step of code until all the
processors have completed executing the previous steps of code assigned to each of
the processors.
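The barrier synchronization described above can be sketched with Python's threading.Barrier, with threads standing in for lock-step processors (the processor function and logging are illustrative):

```python
import threading

N = 3                           # number of lock-step "processors"
barrier = threading.Barrier(N)  # the barrier synchronization primitive
steps_log = []                  # (step, pid) records, appended under the GIL

def processor(pid, steps):
    for step in range(steps):
        steps_log.append((step, pid))  # execute this step of code
        barrier.wait()                 # no one starts step+1 until all finish step

threads = [threading.Thread(target=processor, args=(i, 2)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every thread appends its record before calling barrier.wait(), all step-0 records precede all step-1 records, which is exactly the lock-step guarantee.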
Libraries and standards
• software products (banking, payroll, etc., applications) use proprietary
primitive libraries
• The message-passing interface (MPI) library and the PVM (parallel
virtual machine) library
• socket primitives or socket-like transport layer primitives, invoked (e.g., by
the RPC mechanism) to call a procedure remotely.
• software, libraries for remote method invocation (RMI) and remote object
invocation (ROI).
• CORBA (common object request broker architecture)
• DCOM (distributed component object model)
---------------------------------------------------------------------------------------------------
SYNCHRONOUS VERSUS ASYNCHRONOUS EXECUTIONS
➢ asynchronous execution
◦ no processor synchrony
◦ no bound on the drift rate of processor clocks,
◦ message delays (transmission + propagation times) are finite
◦ no upper bound on the time taken by a process to execute a step
Figure 1.9 An example of an asynchronous execution in a message-passing
system.
A timing diagram is used to illustrate the execution. The arrows denote the
messages; the tail and head of an arrow mark the send and receive events for that
message, respectively.
➢ synchronous execution
• processors are synchronized
• clock drift rate between any two processors is bounded.
• message delivery (transmission + delivery) times are such that they occur
in one logical step or round.
• known upper bound on the time taken by a process to execute a step.
Figure 1.10 An example of a synchronous execution in a message-passing system.
All the messages sent in a round are received within that same round. The arrows
denote the messages.
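The round structure of a synchronous execution can be sketched as a lock-step simulation in which all messages sent in round r are delivered before round r+1 begins (the function and process names are illustrative):

```python
def run_synchronous(processes, rounds):
    """Lock-step simulation: in round r every process consumes its inbox and
    emits (destination, payload) messages; all of them are delivered before
    round r+1 begins, as in a synchronous execution."""
    inboxes = {p: [] for p in processes}
    for r in range(rounds):
        outboxes = {p: processes[p](r, inboxes[p]) for p in processes}
        inboxes = {p: [] for p in processes}       # fresh inboxes for next round
        for sender, msgs in outboxes.items():
            for dest, payload in msgs:
                inboxes[dest].append((sender, payload))
    return inboxes                                 # messages pending after the last round
```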
Emulating an asynchronous system by a synchronous system (A→S)
An asynchronous program (written for an asynchronous system) can be
emulated on a synchronous system fairly trivially as the synchronous system
is a special case of an asynchronous system – all communication finishes
within the same round in which it is initiated.
Emulating a synchronous system by an asynchronous system (S→A)
A synchronous program (written for a synchronous system) can be emulated
on an asynchronous system using a tool called a synchronizer.
System Emulations
Using the emulations shown, any class can be emulated by any other. If
system A can be emulated by system B, denoted A/B, and if a problem is not solvable
in B, then it is also not solvable in A. Likewise, if a problem is solvable in A, it is
also solvable in B. Hence, in a sense, all four classes are equivalent in terms of
“computability” – what can and cannot be computed – in failure-free systems.
Figure 1.11 Emulations among the principal system classes in a failure-free system.
---------------------------------------------------------------------------------------------
DESIGN ISSUES AND CHALLENGES
The primary issues in the design of distributed systems have included providing
access to remote data in the face of failures, file system design, and directory
structure design.
Important design issues and challenges:
(i) having a greater component related to systems design and operating
systems design
(ii) having a greater component related to algorithm design,
(iii) emerging from recent technology advances and/or driven by new
applications.
Challenges: System Perspective
• Communication mechanisms: This task involves designing appropriate
mechanisms for communication among the processes in the network, e.g.,
remote procedure call (RPC), remote object invocation (ROI), and message-
oriented vs. stream-oriented communication.
• Processes: Code migration, process/thread management at clients and servers,
design of software and mobile agents.
Naming: Easy-to-use identifiers are needed to locate resources and processes
transparently and scalably. Naming in mobile systems poses additional challenges.
Synchronization: Mechanisms for synchronization or coordination among the
processes are essential, e.g., mutual exclusion.
Data storage and access
• Schemes for data storage, and implicitly for accessing the data in a fast and
scalable manner across the network are important for efficiency.
• Revisit file system design.
Consistency and replication
To avoid bottlenecks, to provide fast access to data, and to provide scalability,
replication of data objects is highly desirable. This leads to issues of managing the
replicas, and dealing with consistency among the replicas/caches in a distributed
setting.
Fault-tolerance: Fault tolerance requires maintaining correct and efficient operation
in spite of any failures of links, nodes, and processes. Process resilience, reliable
communication, distributed commit, checkpointing and recovery, agreement and
consensus, failure detection, and self-stabilization are some of the mechanisms to
provide fault-tolerance.
Distributed systems security
Distributed systems security involves various aspects of cryptography, secure
channels, access control, key management (generation and distribution),
authorization, and secure group management.
Scalability and modularity of algorithms, data, services
The algorithms, data (objects), and services must be as distributed as possible.
Various techniques such as replication, caching and cache management, and
asynchronous processing help to achieve scalability.
Some experimental systems: Globe, Globus, Grid
API for communications, services: The API for communication and other
specialized services is important for the ease of use and wider adoption of the
distributed systems services by non-technical users.
Transparency: hiding implementation policies from user
• Access: hide differences in data rep across systems, provide uniform
operations to access resources
• Location: locations of resources are transparent
• Migration: relocate resources without renaming
• Relocation: relocate resources as they are being accessed
• Replication: hide replication from the users
• Concurrency: mask the use of shared resources
• Failure: reliable and fault-tolerant operation.
Algorithmic challenges in distributed computing:
• Useful execution models and frameworks: to reason with and design correct
distributed programs
o Interleaving model
o Partial order model
o Input/Output automata
o Temporal Logic of Actions
- Useful for operational reasoning and the design of distributed algorithms.
- provide different degrees of infrastructure for reasoning more formally
with and proving the correctness of distributed programs.
• Dynamic distributed graph algorithms and routing algorithms
o System topology: distributed graph, with only local neighborhood
knowledge
o Graph algorithms: building blocks for group communication, data
dissemination, object location, object search functions.
o Algorithms need to deal with dynamically changing graphs
o Algorithm efficiency: also impacts resource consumption, latency,
traffic, and congestion in the network.
Time and global state
• The challenges pertain to providing accurate physical time, and to providing
a variant of time, called logical time. Logical time captures inter-process
dependencies and tracks relative time progress at each process.
• Global state observation: Due to the inherent distributed nature of the
system, it is not possible for any one process to directly observe a meaningful
global state across all the processes.
• Concurrency measures: concurrency depends on program logic, execution
speeds within logical threads, communication speeds
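Logical time as described above is commonly realized with Lamport's scalar clocks; a minimal sketch (the class and method names are illustrative):

```python
class LamportClock:
    """Scalar logical clock: it ticks on local and send events, and on a
    receive it takes max(local, piggybacked) + 1, so any event that causally
    depends on another gets a strictly larger timestamp."""
    def __init__(self):
        self.time = 0

    def tick(self):                # local or send event
        self.time += 1
        return self.time

    def recv(self, msg_time):      # receive event; msg_time rides on the message
        self.time = max(self.time, msg_time) + 1
        return self.time
```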
Synchronization/coordination mechanisms
Synchronization is essential for the distributed processes to overcome the limited
observation of the system state from the viewpoint of any one process.
• Physical clock synchronization: Physical clocks usually diverge in their
values due to hardware limitations.
• Leader election: All the processes need to agree on which process will
play the role of a distinguished process – called a leader process.
• Mutual exclusion: access to the critical resource(s) has to be coordinated.
• Distributed deadlock detection and resolution: Deadlock detection should
be coordinated to avoid duplicate work, and deadlock resolution should
be coordinated to avoid unnecessary aborts of processes.
• Termination detection: This requires cooperation among the processes
to detect the specific global state of quiescence.
• Garbage collection: collecting garbage (objects no longer in use) requires
coordination among the processes.
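As one illustration of leader election from the list above, a Chang–Roberts-style election on a unidirectional ring can be sketched sequentially (the simulation compresses the message exchanges into loops; all names are illustrative):

```python
def ring_election(ids):
    """Chang-Roberts-style election on a unidirectional ring (sequential
    sketch): each process injects its id as a token; a process forwards
    tokens larger than its own id, swallows smaller ones, and a token that
    returns to its originator identifies the leader (the maximum id)."""
    n = len(ids)
    leader = None
    for start in range(n):
        token = ids[start]
        for hop in range(1, n + 1):
            node = (start + hop) % n
            if token > ids[node]:
                pass             # node forwards the larger token
            elif token < ids[node]:
                token = None     # node swallows the smaller token
                break
            else:
                leader = token   # token returned to its originator: it wins
    return leader
```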
Group communication, multicast, and ordered message delivery
• Group: A group is a collection of processes that share a common context and
collaborate on a common task within an application domain. Specific
algorithms need to be designed to enable efficient group communication and
group management wherein processes can join and leave groups dynamically,
or even fail.
Monitoring distributed events and predicates
• Predicates defined on program variables that are local to different processes
are used for specifying conditions on the global system state, and are useful
for applications such as debugging, sensing the environment, and in industrial
process control.
• An important paradigm for monitoring distributed events is that of event
streaming, wherein streams of relevant events reported from different
processes are examined collectively to detect predicates.
Distributed program design and verification tools
Debugging distributed programs
Debugging sequential programs is hard; debugging distributed programs is that
much harder because of the concurrency in actions and the ensuing uncertainty due
to the large number of possible executions defined by the interleaved concurrent
actions.
Data replication, consistency models, and caching
Replicating data objects and caching them locally provide fast access, but managing
the replicas in the face of updates introduces the problems of ensuring consistency
among the replicas and cached copies. Additionally, placement of the replicas in the
system is also a challenge because resources usually cannot be freely replicated.
World Wide Web design: caching, searching, scheduling
The Web is an example of a widespread distributed system with a direct interface to
the end user, wherein the operations are predominantly read-intensive on most
objects. Object search and navigation on the web are important functions in the
operation of the web, and are very resource-intensive. Designing mechanisms to do
this efficiently and accurately is a great challenge.
Distributed shared memory abstraction
In the middleware layer, the abstraction of a shared address space has to be
implemented by using message-passing. Hence, in terms of overheads, the shared
memory abstraction is not less expensive.
• Wait-free algorithm design: Wait-freedom, which can be informally defined
as the ability of a process to complete its execution irrespective of the actions of
other processes, gained prominence in the design of algorithms to control access
to shared resources in the shared memory abstraction. While wait-free algorithms
are highly desirable, they are also expensive, and designing low-overhead wait-
free algorithms is a challenge.
Mutual exclusion
• Bakery algorithm, semaphores, algorithms based on atomic hardware
primitives, and fast algorithms for contention-free access.
Register constructions
Register constructions deal with the design of registers from scratch, with very
weak assumptions on the accesses allowed to a register. This area is relevant in
light of promising emerging technologies (such as bio-computing and quantum
computing) that can alter the present foundations of computer “hardware” design.
It forms a foundation for future architectures that allow concurrent access even to
primitive units of memory (independent of technology) without any restrictions on
the concurrency permitted.
Consistency models:
For multiple copies of a variable/object, varying degrees of consistency among the
replicas can be allowed. Clearly, a strict definition of consistency (such as in a
uniprocessor system) would be expensive to implement in terms of high latency,
high message overhead, and low concurrency.
Reliable and fault-tolerant distributed systems
• Consensus algorithms: Consensus algorithms allow correctly functioning
processes to reach agreement among themselves in spite of the existence of
some malicious (adversarial) processes whose identities are not known to
the correctly functioning processes.
• Replication and replica management:
Replication is a classical method of providing fault-tolerance. The triple
modular redundancy (TMR) technique has long been used in software as well
as hardware installations.
• Distributed databases, commit: For distributed databases, the traditional
properties of the transaction (A.C.I.D. – atomicity, consistency, isolation,
durability) need to be preserved in the distributed setting.
• Self-stabilizing systems: A self-stabilizing algorithm is any algorithm that is
guaranteed to eventually take the system to a good state even if a bad state
were to arise due to some error. Self-stabilizing algorithms require some in-
built redundancy to track additional variables of the state and do extra work.
Designing efficient self-stabilizing algorithms is a challenge.
• Checkpointing and recovery algorithms: roll back and restart from an earlier
“saved” state. Checkpointing in a distributed environment is difficult because
if the checkpoints at the different processes are not coordinated, the local
checkpoints may become useless because they are inconsistent with the
checkpoints at other processes.
• Failure detectors:
Failure detectors represent a class of algorithms that probabilistically suspect
another process as having failed (such as after timing out after non-receipt of
a message for some time), and then converge on a determination of the
up/down status of the suspected process.
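A timeout-based failure detector of the kind described above can be sketched as follows (the class and method names are illustrative; the `now` parameter makes the sketch testable without real clocks):

```python
import time

class HeartbeatDetector:
    """Timeout-based failure detector sketch: a process is suspected when no
    heartbeat has been received from it within `timeout` seconds. Suspicion
    is probabilistic -- a slow process may be wrongly suspected and later
    revised once a heartbeat arrives."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, pid, now=None):
        """Record a heartbeat message received from process pid."""
        self.last_seen[pid] = time.monotonic() if now is None else now

    def suspected(self, pid, now=None):
        """Suspect pid if we have never heard from it, or not recently."""
        now = time.monotonic() if now is None else now
        last = self.last_seen.get(pid)
        return last is None or now - last > self.timeout
```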
• Load balancing:
Load balancing may be necessary because of a variety of factors such as high
network traffic or high request rate causing the network connection to be a
bottleneck, or high computational load.
The following are some forms of load balancing:
• Data migration: The ability to move data (which may be replicated) around
in the system, based on the access pattern of the users.
• Computation migration: The ability to relocate processes in order to
perform a redistribution of the workload.
• Distributed scheduling: This achieves a better turnaround time for the users
by using idle processing power in the system more efficiently.
• Real-time scheduling:
Real-time scheduling is important for mission-critical applications, to
accomplish the task execution on schedule. The problem becomes more
challenging in a distributed system where a global view of the system state is
absent. On-line or dynamic changes to the schedule are also harder to make
without a global view of the state.
• Performance modeling and analysis:
In large distributed systems, network latency (propagation and transmission
times) and access to shared resources can lead to large delays which must be
minimized.
The following are some example issues that arise in determining the performance:
• Metrics: Appropriate metrics must be defined or identified for measuring the
performance of theoretical distributed algorithms, as well as for
implementations of such algorithms.
• Measurement methods/tools: As a real distributed system is a complex entity
and has to deal with all the difficulties that arise in measuring performance
over a WAN/the Internet, appropriate methodologies and tools must be
developed for measuring the performance metrics.
Applications of distributed computing and newer challenges
• Mobile systems
Mobile systems typically use wireless communication which is based on
electromagnetic waves and utilizes a shared broadcast medium.
• Many issues such as range of transmission and power of transmission come
into play, besides various engineering issues such as battery power
conservation, interfacing with the wired Internet, signal processing, and
interference.
• There are two popular architectures for a mobile network. The first is the base-
station approach, also known as the cellular approach, wherein a cell which
is the geographical region within range of a static but powerful base
transmission station is associated with that base station. All mobile processes
in that cell communicate with the rest of the system via the base station. The
second approach is the ad-hoc network approach where there is no base
station (which essentially acted as a centralized node for its cell). All
responsibility for communication is distributed among the mobile nodes,
wherein mobile nodes have to participate in routing by forwarding packets of
other pairs of communicating nodes.
• Sensor networks:
A sensor is a processor with an electro-mechanical interface that is capable of
sensing physical parameters, such as temperature, velocity, pressure,
humidity, and chemicals. Recent developments in cost-effective hardware
technology have made it possible to deploy very large numbers (of the order of
10^6 or higher) of low-cost sensors. An important paradigm for monitoring
distributed events is that of event streaming.
Sensors may be mobile or static; sensors may communicate wirelessly,
although they may also communicate across a wire when they are statically
installed. Sensors may have to self-configure to form an ad-hoc network,
which introduces a whole new set of challenges, such as position estimation
and time estimation.
• Ubiquitous or pervasive computing
Ubiquitous systems represent a class of computing where the processors
embedded in and seamlessly pervading through the environment perform
application functions in the background, much like in sci-fi movies.
o E.g., intelligent home, smart workplace
The processors may be connected to more powerful networks and processing
resources in the background for processing and collating data.
• Peer-to-peer computing
o Peer-to-peer (P2P) computing represents computing over an
application layer network wherein all interactions among the processors
are at a “peer” level.
o Some of the key challenges include: object storage mechanisms,
efficient object lookup, and retrieval in a scalable manner; dynamic
reconfiguration with nodes as well as objects joining and leaving the
network randomly; replication strategies to expedite object search;
tradeoffs between object size, latency, and table sizes; anonymity,
privacy, and security.
• Publish/subscribe, content distribution
o In a dynamic environment where the information constantly fluctuates
(varying stock prices is a typical example), there needs to be:
(i) an efficient mechanism for distributing this information (publish),
(ii) an efficient mechanism to allow end users to indicate interest in
receiving specific kinds of information (subscribe), and
(iii) an efficient mechanism for aggregating large volumes of published
information and filtering it as per the user’s subscription filter.
o Content distribution refers to a class of mechanisms, primarily in the
web and P2P computing context, whereby specific information which
can be broadly characterized by a set of parameters is to be distributed
to interested processes.
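The publish/subscribe mechanisms described above can be sketched as a toy topic-based broker. This is a minimal illustration, not a real middleware API; the class and method names (`PubSub`, `subscribe`, `publish`) and the stock-price messages are assumptions for the example:

```python
from collections import defaultdict

class PubSub:
    """Toy topic-based publish/subscribe broker (illustrative only)."""
    def __init__(self):
        # topic -> list of (subscription filter, callback)
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback, predicate=lambda msg: True):
        # 'predicate' models the user's subscription filter.
        self.subscribers[topic].append((predicate, callback))

    def publish(self, topic, msg):
        # Deliver to every subscriber whose filter accepts the message.
        for predicate, callback in self.subscribers[topic]:
            if predicate(msg):
                callback(msg)

broker = PubSub()
received = []
# Subscriber interested only in stock prices above 100 (hypothetical filter).
broker.subscribe("stock", received.append, predicate=lambda m: m["price"] > 100)
broker.publish("stock", {"symbol": "ABC", "price": 95})   # filtered out
broker.publish("stock", {"symbol": "XYZ", "price": 120})  # delivered
```

The filter predicate plays the role of the user's subscription filter in step (iii) above, aggregating and filtering published information per subscriber.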
• Distributed agents
Agents are software processes or robots that can move around the system to
do specific tasks for which they are specially programmed.
Challenges in distributed agent systems include coordination mechanisms
among the agents, controlling the mobility of the agents, and their software
design and interfaces. Research in agents is inter-disciplinary: spanning
artificial intelligence, mobile computing, economic market models, software
engineering, and distributed computing.
• Distributed data mining
o Data mining algorithms examine large amounts of data to detect patterns and
trends in the data, to mine or extract useful information.
A traditional example is: examining the purchasing patterns of customers in
order to profile the customers and enhance the efficacy of directed marketing
schemes. The mining can be done by applying database and artificial intelligence
techniques to a data repository. In many situations, the data is necessarily
distributed and cannot be collected in a single repository, as in banking
applications where the data is private and sensitive, or in atmospheric weather
prediction where the data sets are far too massive to collect and process at a
single repository in real-time. In such cases, efficient distributed data mining
algorithms are required.
• Grid computing
Grid computing aims to utilize the idle computing power of machines
connected over the network as a single unified computing resource.
Many challenges in making grid computing a reality include:
o scheduling jobs in such a distributed environment,
o a framework for implementing quality of service and real-time
guarantees,
o security of individual machines as well as of jobs being
executed in this setting.
• Security
o The traditional challenges of security in a distributed setting include:
o confidentiality (ensuring that only authorized processes can access certain
information),
o authentication (ensuring the source of received information and the
identity of the sending process), and
o availability (maintaining allowed access to services despite malicious
actions).
o The goal is to meet these challenges with efficient and scalable solutions.
Issues that complicate these challenges include lack of trust, shared broadcast
media, resource-constrained devices, and lack of fixed structure.
---------------------------------------------------------------------------------------------
A Model of Distributed Computations
• A distributed system consists of a set of processors that are connected by a
communication network.
• The processors do not share a common global memory and communicate
solely by passing messages over the communication network.
• The system can be modeled as a directed graph in which vertices represent the
processes and edges represent unidirectional communication channels.
• A distributed application runs as a collection of processes on a distributed
system.
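The directed-graph model above can be written out minimally. The process names and channel set here are hypothetical; the point is only that a unidirectional channel is a directed edge from one process vertex to another:

```python
# Processes are vertices; unidirectional channels are directed edges.
processes = {"p1", "p2", "p3"}
channels = {("p1", "p2"), ("p2", "p1"), ("p2", "p3")}  # e.g., C12, C21, C23

def can_send(sender, receiver):
    """p_i can send a message directly to p_j only if the edge (channel)
    from p_i to p_j exists in the graph."""
    return (sender, receiver) in channels
```

Note that the graph is directed: here p2 can send to p3, but p3 has no channel back to p2.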
A Distributed Program
o A distributed program is composed of a set of n asynchronous processes, p1,
p2, ..., pi , ..., pn.
o The processes do not share a global memory and communicate solely by
passing messages.
o The processes do not share a global clock that is instantaneously accessible
to these processes.
o Process execution and message transfer are asynchronous.
o Without loss of generality, we assume that each process is running on a
different processor.
o Let Cij denote the channel from process pi to process pj and let mij denote a
message sent by pi to pj.
o The message transmission delay is finite and unpredictable.
A Model of Distributed Executions
o The execution of a process consists of a sequential execution of its actions.
o The actions are atomic and the actions of a process are modeled as three types
of events, namely, internal events, message send events, and message
receive events.
o Let e_i^x denote the x-th event at process p_i.
o For a message m, let send (m) and rec (m) denote its send and receive events,
respectively.
o The occurrence of events changes the states of respective processes and
channels.
o An internal event changes the state of the process at which it occurs.
o A send event changes the state of the process that sends the message and the
state of the channel on which the message is sent.
o A receive event changes the state of the process that receives the message and
the state of the channel on which the message is received.
o The events at a process are linearly ordered by their order of occurrence.
o The execution of process p_i produces a sequence of events e_i^1, e_i^2, ...,
e_i^x, e_i^{x+1}, ... and is denoted by H_i, where
o H_i = (h_i, →_i),
o h_i is the set of events produced by p_i, and
o the binary relation →_i defines a linear order on these events.
o Relation →_i expresses causal dependencies among the events of p_i.
o The send and the receive events signify the flow of information between
processes and establish causal dependency from the sender process to the
receiver process.
o A relation →msg that captures the causal dependency due to message
exchange, is defined as follows. For every message m that is exchanged
between two processes, we have
o send(m) →msg rec (m).
o Relation →msg defines causal dependencies between the pairs of
corresponding send and receive events.
o The evolution of a distributed execution is depicted by a space-time diagram.
o A horizontal line represents the progress of the process; a dot indicates an
event; a slant arrow indicates a message transfer.
o Since we assume that an event execution is atomic (hence, indivisible and
instantaneous), it is justified to denote it as a dot on a process line.
o In Figure 2.1, for process p1, the second event is a message send event,
the third event is an internal event, and the fourth event is a message receive
event.
Figure 2.1: The space-time diagram of a distributed execution.
Causal Precedence Relation
The execution of a distributed application results in a set of distributed events
produced by the processes.
Let H = ∪_i h_i denote the set of events executed in a distributed computation.
A binary relation → on the set H, which expresses causal dependencies between
events in the distributed execution, is defined as follows: for two events
e_i^x and e_j^y,
e_i^x → e_j^y iff (i = j and x < y), or (e_i^x →msg e_j^y), or (there exists an
event e_k^z such that e_i^x → e_k^z and e_k^z → e_j^y).
The causal precedence relation induces an irreflexive partial order on the events of
a distributed computation that is denoted as H=(H, →).
Note that the relation → is nothing but Lamport’s “happens before” relation.
For any two events e_i and e_j, if e_i → e_j, then event e_j is directly or
transitively dependent on event e_i. (Graphically, it means that there exists a
path consisting of message arrows and process-line segments (along increasing
time) in the space-time diagram that starts at e_i and ends at e_j.)
For example, in Figure 2.1, e_1^1 → e_3^3 and e_3^3 → e_2^6.
The relation → denotes flow of information in a distributed computation, and
e_i → e_j dictates that all the information available at e_i is potentially
accessible at e_j.
For example, in Figure 2.1, event e_2^6 has the knowledge of all other events
shown in the figure.
For any two events e_i and e_j, e_i ↛ e_j denotes the fact that event e_j does
not directly or transitively depend on event e_i. That is, event e_i does not
causally affect event e_j. In this case, event e_j is not aware of the execution
of e_i or of any event executed after e_i on the same process.
For example, in Figure 2.1, e_1^3 ↛ e_3^3 and e_2^4 ↛ e_3^1.
Note the following two rules:
For any two events e_i and e_j, e_i ↛ e_j ⇏ e_j ↛ e_i.
For any two events e_i and e_j, e_i → e_j ⇒ e_j ↛ e_i.
Concurrent events
For any two events e_i and e_j, if e_i ↛ e_j and e_j ↛ e_i,
then events e_i and e_j are said to be concurrent (denoted as e_i ǁ e_j).
In the execution of Figure 2.1, e_1^3 ǁ e_3^3 and e_2^4 ǁ e_3^1.
The relation ǁ is not transitive; that is, (e_i ǁ e_j) ∧ (e_j ǁ e_k) ⇏ e_i ǁ e_k.
For example, in Figure 2.1, e_3^3 ǁ e_2^4 and e_2^4 ǁ e_1^5; however, e_3^3 ∦ e_1^5.
For any two events e_i and e_j in a distributed execution,
e_i → e_j, or e_j → e_i, or e_i ǁ e_j.
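The relations above can be sketched computationally: model the events as a directed graph whose edges are the process order →_i and the message relation →msg, so that → becomes reachability, and two events are concurrent when neither reaches the other. The event layout below is illustrative and does not reproduce Figure 2.1 exactly:

```python
# Events are named (process, index). Edges encode both the process order
# (e.g., p1's 1st event precedes its 2nd) and message transfers.
edges = {
    ("p1", 1): [("p1", 2)],
    ("p1", 2): [("p1", 3), ("p2", 2)],  # p1's 2nd event sends a message to p2
    ("p1", 3): [],
    ("p2", 1): [("p2", 2)],
    ("p2", 2): [],
}

def happens_before(a, b):
    """a -> b iff there is a causal path (process lines + messages) from a to b."""
    if a == b:
        return False  # the relation is irreflexive
    stack, seen = [a], set()
    while stack:
        e = stack.pop()
        for nxt in edges.get(e, []):
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def concurrent(a, b):
    """a || b iff neither event causally precedes the other."""
    return not happens_before(a, b) and not happens_before(b, a)
```

With this layout, `happens_before(("p1", 2), ("p2", 2))` holds via the message edge, while p1's third event and p2's second event are concurrent: neither has a causal path to the other.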
Logical vs. Physical Concurrency
In a distributed computation, two events are logically concurrent if and only if they
do not causally affect each other.
Physical concurrency, on the other hand, has a connotation that the events occur at
the same instant in physical time.
Two or more events may be logically concurrent even though they do not occur at
the same instant in physical time.
However, had processor speeds and message delays been different, the
execution of these events could very well have coincided in physical time.
Whether a set of logically concurrent events coincide in the physical time or not,
does not change the outcome of the computation.
Therefore, even though a set of logically concurrent events may not have occurred
at the same instant in physical time, we can assume that these events occurred at the
same instant in physical time.
Models of Communication Networks
There are several models of the service provided by communication networks,
namely, FIFO, Non-FIFO, and causal ordering.
In the FIFO model, each channel acts as a first-in first-out message queue and thus,
message ordering is preserved by a channel.
In the non-FIFO model, a channel acts like a set in which the sender process adds
messages and the receiver process removes messages from it in a random order.
The “causal ordering” model is based on Lamport’s “happens before” relation.
A system that supports the causal ordering model satisfies the following property:
CO: For any two messages m_ij and m_kj, if send(m_ij) → send(m_kj), then
rec(m_ij) → rec(m_kj).
This property ensures that causally related messages destined to the same
destination are delivered in an order that is consistent with their causality relation.
Causally ordered delivery of messages implies FIFO message delivery. (Note that
CO ⊂ FIFO ⊂ Non-FIFO.)
The causal ordering model considerably simplifies the design of distributed
algorithms because it provides built-in synchronization.
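The FIFO and non-FIFO channel models can be contrasted with a small sketch: a queue for the FIFO channel, and an unordered bag (set-like) for the non-FIFO channel, from which the receiver removes messages in arbitrary order. The class names are hypothetical:

```python
import random
from collections import deque

class FIFOChannel:
    """FIFO model: the channel is a first-in first-out queue,
    so message ordering is preserved."""
    def __init__(self):
        self.queue = deque()
    def send(self, msg):
        self.queue.append(msg)
    def deliver(self):
        return self.queue.popleft()

class NonFIFOChannel:
    """Non-FIFO model: the channel acts like a set; the receiver
    removes messages in an arbitrary (here random) order."""
    def __init__(self):
        self.bag = []
    def send(self, msg):
        self.bag.append(msg)
    def deliver(self):
        return self.bag.pop(random.randrange(len(self.bag)))

fifo = FIFOChannel()
for m in ["m1", "m2", "m3"]:
    fifo.send(m)
assert [fifo.deliver() for _ in range(3)] == ["m1", "m2", "m3"]
```

Causal ordering is a stronger, cross-channel property: it constrains delivery order across all senders to a common destination, which per-channel FIFO queues alone cannot guarantee.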
Global State of a Distributed System
“A collection of the local states of its components, namely, the processes and
the communication channels.”
The state of a process is defined by the contents of processor registers, stacks, local
memory, etc. and depends on the local context of the distributed application.
The state of a channel is given by the set of messages in transit in the channel.
The occurrence of events changes the states of respective processes and channels.
An internal event changes the state of the process at which it occurs.
A send event changes the state of the process that sends the message and the state
of the channel on which the message is sent.
A receive event changes the state of the process that receives the message and
the state of the channel on which the message is received.
Notations
LS_i^x denotes the state of process p_i after the occurrence of its x-th event
e_i^x (LS_i^0 is the initial state).
send(m_ij) ≤ LS_i^x means that the send of message m_ij occurred at p_i no later
than event e_i^x; rec(m_ij) ≰ LS_j^y means that the receive of m_ij has not
occurred at p_j by event e_j^y.
A Channel State
The state of a channel depends upon the states of the processes it connects.
The state of channel C_ij, denoted SC_ij, is the set of messages sent by p_i
that have not yet been received by p_j:
SC_ij = { m_ij | send(m_ij) ≤ LS_i^x ∧ rec(m_ij) ≰ LS_j^y }.
Global State
The global state of a distributed system is a collection of the local states of the
processes and the channels.
Notationally, global state GS is defined as
GS = { ∪_i LS_i, ∪_{j,k} SC_jk },
the collection of the local process states LS_i and the channel states SC_jk.
For a global state to be meaningful, the states of all the components of the distributed
system must be recorded at the same instant.
This will be possible if the local clocks at processes were perfectly synchronized or
if there were a global system clock that can be instantaneously read by the processes.
(However, both are impossible.)
A Consistent Global State
Even if the state of all the components is not recorded at the same instant, such a
state will be meaningful provided every message that is recorded as received is also
recorded as sent.
Basic idea is that a state should not violate causality – an effect should not be present
without its cause. A message cannot be received if it was not sent.
Such states are called consistent global states and are meaningful global states.
Inconsistent global states are not meaningful in the sense that a distributed system
can never be in an inconsistent state.
An Example
Consider the distributed execution of Figure 2.2.
Figure 2.2: The space-time diagram of a distributed execution.
In Figure 2.2:
A global state GS1 is inconsistent because the state of p2 has recorded the
receipt of message m12, whereas the state of p1 has not recorded its send.
A global state GS2 is consistent; all the channels are empty except C21, which
contains message m21.
A global state is transitless iff all channels are recorded as empty in it.
Cuts of a Distributed Computation
“In the space-time diagram of a distributed computation, a cut is a zigzag line
joining one arbitrary point on each process line.”
• A cut slices the space-time diagram, and thus the set of events in the
distributed computation, into a PAST and a FUTURE.
• The PAST contains all the events to the left of the cut and the FUTURE
contains all the events to the right of the cut.
• For a cut C, let PAST(C) and FUTURE(C) denote the set of events in the
PAST and FUTURE of C, respectively.
• Every cut corresponds to a global state and every global state can be
graphically represented as a cut in the computation’s space-time diagram.
• Cuts in a space-time diagram provide a powerful graphical aid in representing
and reasoning about global states of a computation.
Figure 2.3: Illustration of cuts in a distributed execution.
• In a consistent cut, every message received in the PAST of the cut was sent
in the PAST of that cut. (In Figure 2.3, cut C2 is a consistent cut.)
• All messages that cross the cut from the PAST to the FUTURE are in transit
in the corresponding consistent global state.
• A cut is inconsistent if a message crosses the cut from the FUTURE to the
PAST. (In Figure 2.3, cut C1 is an inconsistent cut.)
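The consistency test above can be sketched directly: a cut is consistent iff no message crosses it from the FUTURE to the PAST, i.e., every receive event recorded in the PAST has its send event in the PAST as well. The event and message names below are hypothetical:

```python
def is_consistent_cut(past_events, messages):
    """A cut, represented by its PAST event set, is consistent iff every
    message whose receive event is in the PAST also has its send event
    in the PAST (no FUTURE -> PAST message crossing).

    'messages' maps each message name to its (send_event, receive_event)."""
    for send_ev, recv_ev in messages.values():
        if recv_ev in past_events and send_ev not in past_events:
            return False
    return True

# Hypothetical execution: m1 is sent and received inside the cut;
# m2 is sent inside the cut but received in the FUTURE (in transit).
messages = {"m1": ("e1", "e2"), "m2": ("e3", "e4")}
assert is_consistent_cut({"e1", "e2", "e3"}, messages)  # m2 is simply in transit
assert not is_consistent_cut({"e2"}, messages)          # receive of m1 without its send
```

Messages like m2, which cross the cut from PAST to FUTURE, are exactly the in-transit messages that form the channel states of the corresponding consistent global state.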