CS3551 – Distributed Systems
UNIT I INTRODUCTION
Introduction: Definition-Relation to Computer System Components –
Motivation – Message -Passing Systems versus Shared Memory Systems –
Primitives for Distributed Communication – Synchronous versus
Asynchronous Executions – Design Issues and Challenges; A Model of
Distributed Computations: A Distributed Program – A Model of Distributed
Executions – Models of Communication Networks – Global State of a
Distributed System.
DEFINITION:
A distributed system is a collection of independent entities that cooperate to solve
a problem that cannot be solved by any of them individually.
It consists of autonomous processors communicating over a communication network.
Some characteristics:
• The crash of one computer does not prevent the others from continuing their work.
• The computers communicate by passing messages over a communication network.
• They cooperate to address a problem collectively.
Features:
• No common physical clock: It introduces the element of “distribution” in the
system and gives rise to the inherent asynchrony amongst the processors.
• No shared memory: It may be noted that a distributed system may still
provide the abstraction of a common address space via the distributed shared
memory abstraction.
• Geographical separation: Recently, the network/cluster of workstations
(NOW/COW) configuration connecting processors on a LAN is also being
increasingly regarded as a small distributed system.
• Autonomy and heterogeneity: The processors are “loosely coupled”
in that they have different speeds and each can be running a different
operating system.
-------------------------------------------------------------------------------------------
Course Instructor: Dr. Suja A. Alex, AP/IT Page 1
RELATION TO COMPUTER SYSTEM COMPONENTS
Distributed System Model
Figure 1.1 A distributed system connects processors by a communication network.
A typical distributed system is shown in Figure 1.1. Each computer has a memory-
processing unit and the computers are connected by a communication network.
Figure 1.2 shows the relationships of the software components that run on each of
the computers and use the local operating system and network protocol stack for
functioning. The distributed software is also termed as middleware. A distributed
execution is the execution of processes across the distributed system to
collaboratively achieve a common goal. An execution is also sometimes termed a
computation or a run.
Figure 1.2: Interaction of the software components at each process.
Figure 1.2 schematically shows the interaction of this software with these system
components at each processor. Here we assume that the middleware layer does not
contain the traditional application layer functions of the network protocol stack, such
as http, mail, ftp, and telnet. Various primitives and calls to functions defined in
various libraries of the middleware layer are embedded in the user program code.
There exist several libraries to choose from to invoke primitives for the more
common functions – such as reliable and ordered multicasting – of the middleware
layer. There are several standards, such as the Object Management Group’s (OMG)
common object request broker architecture (CORBA) [36], and the remote
procedure call (RPC) mechanism.
--------------------------------------------------------------------------------------------
MOTIVATION
• Inherently distributed computation:
In many applications such as money transfer in banking, or reaching
consensus among parties that are geographically distant, the computation is
inherently distributed.
• Resource sharing:
Resources such as peripherals, complete data sets in databases, special
libraries, as well as data (variables/files) cannot be fully replicated at all the
sites because it is often neither practical nor cost-effective. For example,
distributed databases such as DB2 partition the data sets across several
servers.
• Access to remote resources:
The data cannot be replicated at every site participating in the distributed
execution because it may be too large or too sensitive to be replicated. For
example, payroll data within a multinational corporation is both too large and
too sensitive to be replicated at every branch office/site. It is therefore stored
at a central server which can be queried by branch offices.
• Increased performance/cost ratio:
By resource sharing and accessing geographically remote data and resources,
the performance/cost ratio is increased. Any task can be partitioned across the
various computers in the distributed system.
• Reliability (availability, integrity, fault-tolerance):
A distributed system has the inherent potential to provide increased reliability
because of the possibility of replicating resources and executions. Reliability entails:
o availability, i.e., the resource should be accessible at all times;
o integrity, i.e., the value/state of the resource should be correct, in the
face of concurrent access from multiple processors, as per the semantics
expected by the application;
o fault-tolerance, i.e., the ability to recover from system failures.
• Scalability:
As the processors are usually connected by a wide-area network, adding more
processors does not pose a direct bottleneck for the communication network.
• Modularity and incremental expandability:
Heterogeneous processors may be easily added to the system without
affecting performance, as long as those processors are running the same
middleware algorithms. Similarly, existing processors may be easily replaced
by other processors.
--------------------------------------------------------------------------------------
MESSAGE-PASSING SYSTEMS VERSUS SHARED MEMORY SYSTEMS
Shared memory systems are those in which there is a (common) shared
address space throughout the system. Communication among processors takes place
via shared data variables, and control variables for synchronization among the
processors. Semaphores and monitors that were originally designed for shared
memory uniprocessors and multiprocessors are examples of how synchronization
can be achieved in shared memory systems. All multicomputer (Non Uniform
Memory Access [NUMA] as well as message-passing) systems that do not have a
shared address space provided by the underlying architecture and hardware
necessarily communicate by message passing.
The abstraction called shared memory is sometimes provided to simulate a
shared address space. For a distributed system, this abstraction is called distributed
shared memory. Implementing this abstraction has a certain cost but it simplifies the
task of the application programmer. There also exists a well-known folklore result
that communication via message-passing can be simulated by communication via
shared memory and vice-versa.
Emulating MP over SM:
The shared address space can be partitioned into disjoint parts, one part being
assigned to each processor. “Send” and “receive” operations can be implemented by
writing to and reading from the destination/sender processor’s address space,
respectively.
Specifically, a separate location can be reserved as the mailbox for each ordered pair
of processes. A Pi–Pj message-passing can be emulated by a write by Pi to the
mailbox and then a read by Pj from the mailbox. In the simplest case, these
mailboxes can be assumed to have unbounded size. The write and read operations
need to be controlled using synchronization primitives to inform the receiver/sender
after the data has been sent/received.
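The mailbox emulation above can be sketched in Python, with threads standing in for processors and a condition variable playing the role of the synchronization primitive (the class and method names are illustrative, not from any standard distributed-systems library):

```python
import threading
from collections import deque

class Mailbox:
    """One mailbox per ordered pair (Pi, Pj): a Pi-Pj message-passing is a
    write by Pi followed by a read by Pj. A condition variable stands in
    for the primitive that informs the receiver that data has been sent."""
    def __init__(self):
        self._buf = deque()               # unbounded, as in the simplest case
        self._cv = threading.Condition()

    def send(self, msg):                  # Pi writes into the mailbox
        with self._cv:
            self._buf.append(msg)
            self._cv.notify()             # wake a blocked receiver

    def receive(self):                    # Pj reads from the mailbox
        with self._cv:
            while not self._buf:          # block until data is available
                self._cv.wait()
            return self._buf.popleft()
```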
Emulating SM over MP:
This involves the use of “send” and “receive” operations for “write” and “read”
operations. Each shared location can be modeled as a separate process; “write” to a
shared location is emulated by sending an update message to the corresponding
owner process; a “read” to a shared location is emulated by sending a query message
to the owner process. As accessing another processor’s memory requires send and
receive operations, this emulation is expensive. Although emulating shared memory
might seem to be more attractive from a programmer’s perspective, it must be
remembered that in a distributed system, it is only an abstraction. Thus, the latencies
involved in read and write operations may be high even when using shared memory
emulation because the read and write operations are implemented by using network-
wide communication under the covers.
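The owner-process emulation can be sketched as follows, again with threads standing in for processes and queues standing in for the network channels (all names here are illustrative):

```python
import queue
import threading

class OwnerProcess(threading.Thread):
    """Models one shared location as a separate process. 'write' and 'read'
    arrive as messages on an inbox queue; a read is answered on a per-request
    reply queue, modelling the send/receive pair of the emulation."""
    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self._value = None

    def run(self):
        while True:
            op, payload, reply = self.inbox.get()
            if op == "write":
                self._value = payload      # apply the update message
            elif op == "read":
                reply.put(self._value)     # answer the query message

def write(owner, value):
    """Emulated 'write': send an update message to the owner process."""
    owner.inbox.put(("write", value, None))

def read(owner):
    """Emulated 'read': send a query message and wait for the reply --
    a full network round-trip in a real distributed system."""
    reply = queue.Queue()
    owner.inbox.put(("read", None, reply))
    return reply.get()
```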
In a MIMD message-passing multicomputer system, each “processor” may be a
tightly coupled multiprocessor system with shared memory. Within the
multiprocessor system, the processors communicate via shared memory. Between
two computers, the communication is by message passing.
--------------------------------------------------------------------------------------------------------------------------------------------
PRIMITIVES FOR DISTRIBUTED COMMUNICATION
Blocking/non-blocking, synchronous/asynchronous primitives:
• Message send and message receive communication primitives are denoted
Send() and Receive(), respectively.
A Send primitive has at least two parameters
o the destination, and
o the buffer in the user space, containing the data to be sent.
Receive primitive has at least two parameters –
o The source from which the data is to be received (this could be a
wildcard), and
o the user buffer into which the data is to be received.
There are two ways of sending data when the Send primitive is invoked – the buffered
option and the unbuffered option.
• The buffered option, which is the standard option, copies the data from the
user buffer to the kernel buffer. The data later gets copied from the kernel
buffer onto the network.
• In the unbuffered option, the data gets copied directly from the user buffer
onto the network. For the Receive primitive, the buffered option is usually
required because the data may already have arrived when the primitive is
invoked, and needs a storage place in the kernel.
Synchronous (send/receive)
• Handshake between sender and receiver
• Send completes when Receive completes
• Receive completes when data copied into buffer
Asynchronous (send)
• Control returns to process when data copied out of user-specified buffer
Blocking (send/receive)
• Control returns to invoking process after processing of primitive (whether
sync or async) completes
Nonblocking (send/receive)
• Control returns to process immediately after invocation
• Send: control returns to the process even before data copied out of user buffer
• Receive: control returns to the process even before data may have arrived
from sender
Non-blocking Primitive
Figure 1.7: A nonblocking send primitive. When the Wait call returns, at least one
of its parameters is posted.
If at the time that Wait() is issued, the processing for the primitive (whether
synchronous or asynchronous) has completed, the Wait returns immediately. The
completion of the processing of the primitive is detectable by checking the value of
handle_k. If the processing of the primitive has not completed, the Wait blocks and
waits for a signal to wake it up. When the processing for the primitive completes,
the communication subsystem software sets the value of handle_k and wakes up
(signals) any process with a Wait call blocked on this handle_k. This is called posting
the completion of the operation.
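A minimal sketch of this posting mechanism, using a threading.Event as the handle and a helper thread as the communication subsystem (the function names and destination buffer are assumptions for illustration):

```python
import threading

def nonblocking_send(dest_buffer, data):
    """Returns a handle immediately. A helper thread stands in for the
    communication subsystem: it copies the data out of the user buffer
    and then 'posts' the handle, waking any Wait() blocked on it."""
    handle = threading.Event()
    def subsystem():
        dest_buffer.append(list(data))    # copy data out of the user buffer
        handle.set()                      # post completion; signal waiters
    threading.Thread(target=subsystem, daemon=True).start()
    return handle

def wait(handle, timeout=None):
    """Wait(): returns immediately if the handle is already posted,
    otherwise blocks until the subsystem posts it."""
    return handle.wait(timeout)
```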
Blocking/non-blocking, synchronous/asynchronous send/receive primitives
Send primitive
◦ synchronous blocking,
◦ synchronous non-blocking,
◦ asynchronous blocking, and
◦ asynchronous non-blocking.
Receive primitive
◦ blocking synchronous, and
◦ non-blocking synchronous versions.
Timing diagram
Blocking synchronous Send
Non-blocking synchronous Send
Blocking asynchronous Send
Non-blocking asynchronous Send
Blocking Receive
Non-blocking Receive
Blocking synchronous Send: The data gets copied from the user buffer to the kernel
buffer and is then sent over the network. After the data is copied to the receiver’s
system buffer and a Receive call has been issued, an acknowledgement back to the
sender causes control to return to the process that invoked the Send operation and
completes the Send.
Non-blocking synchronous Send: Control returns back to the invoking process as
soon as the copy of data from the user buffer to the kernel buffer is initiated. A
parameter in the non-blocking call also gets set with the handle of a location that the
user process can later check for the completion of the synchronous send operation.
The location gets posted after an acknowledgement returns from the receiver. The
user process can keep checking for the completion of the non-blocking synchronous
Send by testing the returned handle, or it can invoke the blocking Wait operation on
the returned handle.
Figure 1.8 Blocking/non-blocking and synchronous/asynchronous primitives [12].
Process Pi is sending and process Pj is receiving. (a) Blocking synchronous Send
and blocking (synchronous) Receive. (b) Non-blocking synchronous Send and
nonblocking (synchronous) Receive. (c) Blocking asynchronous Send. (d) Non-
blocking asynchronous Send.
Blocking asynchronous Send: The user process that invokes the Send is blocked
until the data is copied from the user’s buffer to the kernel buffer.
Non-blocking asynchronous Send: Control returns to the invoking process as soon
as the transfer of the data from the user’s buffer to the kernel buffer is initiated. A
parameter in the non-blocking call also gets set with the handle of a location that
the user process can check later, using the Wait operation, for the completion of the
asynchronous Send operation.
Blocking Receive: The Receive call blocks until the data expected arrives and is
written in the specified user buffer.
Non-blocking Receive: The Receive call will cause the kernel to register the call
and return the handle of a location that the user process can later check for the
completion of the non-blocking Receive operation. The user process can check for
the completion of the non-blocking Receive by invoking the Wait operation on the
returned handle.
Processor synchrony
• Processor synchrony indicates that all the processors execute in lock-step
with their clocks synchronized.
This abstraction is implemented using some form of barrier synchronization to
ensure that no processor begins executing the next step of code until all the
processors have completed executing the previous steps of code assigned to each of
the processors.
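The barrier synchronization described above can be sketched with Python's threading.Barrier, with threads standing in for lock-step processors (the processor function and logging are illustrative):

```python
import threading

N = 3                           # number of lock-step "processors"
barrier = threading.Barrier(N)  # the barrier synchronization primitive
steps_log = []                  # (step, pid) records, appended under the GIL

def processor(pid, steps):
    for step in range(steps):
        steps_log.append((step, pid))  # execute this step of code
        barrier.wait()                 # no one starts step+1 until all finish step

threads = [threading.Thread(target=processor, args=(i, 2)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every thread appends its record before calling barrier.wait(), all step-0 records precede all step-1 records, which is exactly the lock-step guarantee.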
Libraries and standards
• software products (banking, payroll, etc., applications) use proprietary
primitive libraries
• The message-passing interface (MPI) library and the PVM (parallel
virtual machine) library
• socket primitives or socket-like transport layer primitives, invoked (e.g., by
the RPC mechanism) to call a procedure remotely.
• software, libraries for remote method invocation (RMI) and remote object
invocation (ROI).
• CORBA (common object request broker architecture)
• DCOM (distributed component object model)
---------------------------------------------------------------------------------------------------
SYNCHRONOUS VERSUS ASYNCHRONOUS EXECUTIONS
➢ asynchronous execution
◦ no processor synchrony
◦ no bound on the drift rate of processor clocks,
◦ message delays (transmission + propagation times) are finite
◦ no upper bound on the time taken by a process to execute a step
Figure 1.9 An example of an asynchronous execution in a message-passing
system.
A timing diagram is used to illustrate the execution. The arrows denote the
messages; the tail and head of an arrow mark the send and receive events for that
message, respectively.
➢ synchronous execution
• processors are synchronized
• clock drift rate between any two processors is bounded.
• message delivery (transmission + delivery) times are such that they occur
in one logical step or round.
• known upper bound on the time taken by a process to execute a step.
Figure 1.10 An example of a synchronous execution in a message-passing system.
All the messages sent in a round are received within that same round. The arrows
denote the messages.
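The round structure of a synchronous execution can be sketched as a lock-step simulation in which all messages sent in round r are delivered before round r+1 begins (the function and process names are illustrative):

```python
def run_synchronous(processes, rounds):
    """Lock-step simulation: in round r every process consumes its inbox and
    emits (destination, payload) messages; all of them are delivered before
    round r+1 begins, as in a synchronous execution."""
    inboxes = {p: [] for p in processes}
    for r in range(rounds):
        outboxes = {p: processes[p](r, inboxes[p]) for p in processes}
        inboxes = {p: [] for p in processes}       # fresh inboxes for next round
        for sender, msgs in outboxes.items():
            for dest, payload in msgs:
                inboxes[dest].append((sender, payload))
    return inboxes                                 # messages pending after the last round
```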
Emulating an asynchronous system by a synchronous system (A→S)
An asynchronous program (written for an asynchronous system) can be
emulated on a synchronous system fairly trivially as the synchronous system
is a special case of an asynchronous system – all communication finishes
within the same round in which it is initiated.
Emulating a synchronous system by an asynchronous system (S→A)
A synchronous program (written for a synchronous system) can be emulated
on an asynchronous system using a tool called a synchronizer.
System Emulations
Using the emulations shown, any class can be emulated by any other. If
system A can be emulated by system B, denoted A/B, and if a problem is not solvable
in B, then it is also not solvable in A. Likewise, if a problem is solvable in A, it is
also solvable in B. Hence, in a sense, all four classes are equivalent in terms of
“computability” – what can and cannot be computed – in failure-free systems.
Figure 1.11 Emulations among the principal system classes in a failure-free system.
---------------------------------------------------------------------------------------------
DESIGN ISSUES AND CHALLENGES
The primary issues in the design of distributed systems have included providing
access to remote data in the face of failures, file system design, and directory
structure design.
Important design issues and challenges:
(i) having a greater component related to systems design and operating
systems design
(ii) having a greater component related to algorithm design,
(iii) emerging from recent technology advances and/or driven by new
applications.
Challenges: System Perspective
• Communication mechanisms: This task involves designing appropriate
mechanisms for communication among the processes in the network, e.g.,
remote procedure call (RPC), remote object invocation (ROI), and message-
oriented vs. stream-oriented communication.
• Processes: Code migration, process/thread management at clients and servers,
design of software and mobile agents.
Naming: Easy-to-use identifiers are needed to locate resources and processes
transparently and scalably. Naming in mobile systems poses additional challenges.
Synchronization: Mechanisms for synchronization or coordination among the
processes are essential, e.g., mutual exclusion.
Data storage and access
• Schemes for data storage, and implicitly for accessing the data in a fast and
scalable manner across the network are important for efficiency.
• Revisit file system design.
Consistency and replication
To avoid bottlenecks, to provide fast access to data, and to provide scalability,
replication of data objects is highly desirable. This leads to issues of managing the
replicas, and dealing with consistency among the replicas/caches in a distributed
setting.
Fault-tolerance: Fault tolerance requires maintaining correct and efficient operation
in spite of any failures of links, nodes, and processes. Process resilience, reliable
communication, distributed commit, checkpointing and recovery, agreement and
consensus, failure detection, and self-stabilization are some of the mechanisms to
provide fault-tolerance.
Distributed systems security
Distributed systems security involves various aspects of cryptography, secure
channels, access control, key management (generation and distribution),
authorization, and secure group management.
Scalability and modularity of algorithms, data, services
The algorithms, data (objects), and services must be as distributed as possible.
Various techniques such as replication, caching and cache management, and
asynchronous processing help to achieve scalability.
Some experimental systems: Globe, Globus, Grid
API for communications, services: The API for communication and other
specialized services is important for the ease of use and wider adoption of the
distributed systems services by non-technical users.
Transparency: hiding implementation policies from user
• Access: hide differences in data rep across systems, provide uniform
operations to access resources
• Location: locations of resources are transparent
• Migration: relocate resources without renaming
• Relocation: relocate resources as they are being accessed
• Replication: hide replication from the users
• Concurrency: mask the use of shared resources
• Failure: reliable and fault-tolerant operation.
Algorithmic challenges in distributed computing:
• Useful execution models and frameworks: to reason with and design correct
distributed programs
o Interleaving model
o Partial order model
o Input/Output automata
o Temporal Logic of Actions
- Useful for operational reasoning and the design of distributed algorithms.
- provide different degrees of infrastructure for reasoning more formally
with and proving the correctness of distributed programs.
• Dynamic distributed graph algorithms and routing algorithms
o System topology: distributed graph, with only local neighborhood
knowledge
o Graph algorithms: building blocks for group communication, data
dissemination, object location, object search functions.
o Algorithms need to deal with dynamically changing graphs
o Algorithm efficiency: also impacts resource consumption, latency,
traffic, and congestion in the network.
Time and global state
• The challenges pertain to providing accurate physical time, and to providing
a variant of time, called logical time. Logical time captures inter-process
dependencies and tracks relative time progress at each process.
• Global state observation: Due to the inherent distributed nature of the
system, it is not possible for any one process to directly observe a meaningful
global state across all the processes.
• Concurrency measures: concurrency depends on program logic, execution
speeds within logical threads, communication speeds
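Logical time as described above is commonly realized with Lamport's scalar clocks; a minimal sketch (the class and method names are illustrative):

```python
class LamportClock:
    """Scalar logical clock: it ticks on local and send events, and on a
    receive it takes max(local, piggybacked) + 1, so any event that causally
    depends on another gets a strictly larger timestamp."""
    def __init__(self):
        self.time = 0

    def tick(self):                # local or send event
        self.time += 1
        return self.time

    def recv(self, msg_time):      # receive event; msg_time rides on the message
        self.time = max(self.time, msg_time) + 1
        return self.time
```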
Synchronization/coordination mechanisms
Synchronization is essential for the distributed processes to overcome the limited
observation of the system state from the viewpoint of any one process.
• Physical clock synchronization: Physical clocks usually diverge in their
values due to hardware limitations.
• Leader election: All the processes need to agree on which process will
play the role of a distinguished process – called a leader process.
• Mutual exclusion: access to the critical resource(s) has to be coordinated.
• Distributed deadlock detection and resolution: Deadlock detection should
be coordinated to avoid duplicate work, and deadlock resolution should
be coordinated to avoid unnecessary aborts of processes.
• Termination detection: This requires cooperation among the processes
to detect the specific global state of quiescence.
• Garbage collection: collecting garbage (objects no longer in use) requires
coordination among the processes.
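As one illustration of leader election from the list above, a Chang–Roberts-style election on a unidirectional ring can be sketched sequentially (the simulation compresses the message exchanges into loops; all names are illustrative):

```python
def ring_election(ids):
    """Chang-Roberts-style election on a unidirectional ring (sequential
    sketch): each process injects its id as a token; a process forwards
    tokens larger than its own id, swallows smaller ones, and a token that
    returns to its originator identifies the leader (the maximum id)."""
    n = len(ids)
    leader = None
    for start in range(n):
        token = ids[start]
        for hop in range(1, n + 1):
            node = (start + hop) % n
            if token > ids[node]:
                pass             # node forwards the larger token
            elif token < ids[node]:
                token = None     # node swallows the smaller token
                break
            else:
                leader = token   # token returned to its originator: it wins
    return leader
```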
Group communication, multicast, and ordered message delivery
• Group: A group is a collection of processes that share a common context and
collaborate on a common task within an application domain. Specific
algorithms need to be designed to enable efficient group communication and
group management wherein processes can join and leave groups dynamically,
or even fail.
Monitoring distributed events and predicates
• Predicates defined on program variables that are local to different processes
are used for specifying conditions on the global system state, and are useful
for applications such as debugging, sensing the environment, and in industrial
process control.
• An important paradigm for monitoring distributed events is that of event
streaming, wherein streams of relevant events reported from different
processes are examined collectively to detect predicates.
Distributed program design and verification tools
Debugging distributed programs
Debugging sequential programs is hard; debugging distributed programs is that
much harder because of the concurrency in actions and the ensuing uncertainty due
to the large number of possible executions defined by the interleaved concurrent
actions.
Data replication, consistency models, and caching
Replicating data objects and caching them locally provide fast access, but managing
the replicas in the face of updates introduces the problems of ensuring consistency
among the replicas and cached copies. Additionally, placement of the replicas in the
system is also a challenge because resources usually cannot be freely replicated.
World Wide Web design: caching, searching, scheduling
The Web is an example of a widespread distributed system with a direct interface to
the end user, wherein the operations are predominantly read-intensive on most
objects. Object search and navigation on the web are important functions in the
operation of the web, and are very resource-intensive. Designing mechanisms to do
this efficiently and accurately is a great challenge.
Distributed shared memory abstraction
In the middleware layer, the abstraction of a shared address space has to be
implemented by using message-passing. Hence, in terms of overheads, the shared
memory abstraction is not less expensive.
• Wait-free algorithm design: Wait-freedom, which can be informally defined
as the ability of a process to complete its execution irrespective of the actions of
other processes, gained prominence in the design of algorithms to control access
to shared resources in the shared memory abstraction. While wait-free algorithms
are highly desirable, they are also expensive, and designing low-overhead wait-
free algorithms is a challenge.
Mutual exclusion
• Bakery algorithm, semaphores, algorithms based on atomic hardware
primitives, and fast algorithms for contention-free access.
Register constructions
Register constructions deal with the design of registers from scratch, with very
weak assumptions on the accesses allowed to a register. This area is relevant in
light of promising emerging technologies (such as bio-computing and quantum
computing) that can alter the present foundations of computer “hardware” design.
It forms a foundation for future architectures that allow concurrent access even to
primitive units of memory (independent of technology) without any restrictions on
the concurrency permitted.
Consistency models:
For multiple copies of a variable/object, varying degrees of consistency among the
replicas can be allowed. Clearly, a strict definition of consistency (such as in a
uniprocessor system) would be expensive to implement in terms of high latency,
high message overhead, and low concurrency.
Reliable and fault-tolerant distributed systems
• Consensus algorithms: Consensus algorithms allow correctly functioning
processes to reach agreement among themselves in spite of the existence of
some malicious (adversarial) processes whose identities are not known to
the correctly functioning processes.
• Replication and replica management:
Replication is a classical method of providing fault-tolerance. The triple
modular redundancy (TMR) technique has long been used in software as well
as hardware installations.
• Distributed databases, commit: For distributed databases, the traditional
properties of the transaction (A.C.I.D. – atomicity, consistency, isolation,
durability) need to be preserved in the distributed setting.
• Self-stabilizing systems: A self-stabilizing algorithm is any algorithm that is
guaranteed to eventually take the system to a good state even if a bad state
were to arise due to some error. Self-stabilizing algorithms require some in-
built redundancy to track additional variables of the state and do extra work.
Designing efficient self-stabilizing algorithms is a challenge.
• Checkpointing and recovery algorithms: roll back and restart from an earlier
“saved” state. Checkpointing in a distributed environment is difficult because
if the checkpoints at the different processes are not coordinated, the local
checkpoints may become useless because they are inconsistent with the
checkpoints at other processes.
• Failure detectors:
Failure detectors represent a class of algorithms that probabilistically suspect
another process as having failed (such as after timing out after non-receipt of
a message for some time), and then converge on a determination of the
up/down status of the suspected process.
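A timeout-based failure detector of the kind described above can be sketched as follows (the class and method names are illustrative; the `now` parameter makes the sketch testable without real clocks):

```python
import time

class HeartbeatDetector:
    """Timeout-based failure detector sketch: a process is suspected when no
    heartbeat has been received from it within `timeout` seconds. Suspicion
    is probabilistic -- a slow process may be wrongly suspected and later
    revised once a heartbeat arrives."""
    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, pid, now=None):
        """Record a heartbeat message received from process pid."""
        self.last_seen[pid] = time.monotonic() if now is None else now

    def suspected(self, pid, now=None):
        """Suspect pid if we have never heard from it, or not recently."""
        now = time.monotonic() if now is None else now
        last = self.last_seen.get(pid)
        return last is None or now - last > self.timeout
```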
• Load balancing:
Load balancing may be necessary because of a variety of factors such as high
network traffic or high request rate causing the network connection to be a
bottleneck, or high computational load.
The following are some forms of load balancing:
• Data migration: The ability to move data (which may be replicated) around
in the system, based on the access pattern of the users.
• Computation migration: The ability to relocate processes in order to
perform a redistribution of the workload.
• Distributed scheduling: This achieves a better turnaround time for the users
by using idle processing power in the system more efficiently.
• Real-time scheduling:
Real-time scheduling is important for mission-critical applications, to
accomplish the task execution on schedule. The problem becomes more
challenging in a distributed system where a global view of the system state is
absent. On-line or dynamic changes to the schedule are also harder to make
without a global view of the state.
• Performance modeling and analysis:
In large distributed systems, network latency (propagation and transmission
times) and access to shared resources can lead to large delays which must be
minimized.
The following are some example issues that arise in determining the performance:
• Metrics: Appropriate metrics must be defined or identified for measuring the
performance of theoretical distributed algorithms, as well as for
implementations of such algorithms.
• Measurement methods/tools: As a real distributed system is a complex entity
and has to deal with all the difficulties that arise in measuring performance
over a WAN/the Internet, appropriate methodologies and tools must be
developed for measuring the performance metrics.
Applications of distributed computing and newer challenges
• Mobile systems
Mobile systems typically use wireless communication which is based on
electromagnetic waves and utilizes a shared broadcast medium.
• Many issues such as range of transmission and power of transmission come
into play, besides various engineering issues such as battery power
conservation, interfacing with the wired Internet, signal processing, and
interference.
• There are two popular architectures for a mobile network. The first is the base-
station approach, also known as the cellular approach, wherein a cell which
is the geographical region within range of a static but powerful base
transmission station is associated with that base station. All mobile processes
in that cell communicate with the rest of the system via the base station. The
second approach is the ad-hoc network approach where there is no base
station (which essentially acted as a centralized node for its cell). All
responsibility for communication is distributed among the mobile nodes,
wherein mobile nodes have to participate in routing by forwarding packets of
other pairs of communicating nodes.
• Sensor networks:
A sensor is a processor with an electro-mechanical interface that is capable of
sensing physical parameters, such as temperature, velocity, pressure,
humidity, and chemicals. Recent developments in cost-effective hardware
technology have made it possible to deploy very large numbers (of the order of
10^6 or higher) of low-cost sensors. An important paradigm for monitoring
distributed events is that of event streaming.
Sensors may be mobile or static; sensors may communicate wirelessly,
although they may also communicate across a wire when they are statically
installed. Sensors may have to self-configure to form an ad-hoc network,
which introduces a whole new set of challenges, such as position estimation
and time estimation.
• Ubiquitous or pervasive computing
Ubiquitous systems represent a class of computing where the processors
embedded in and seamlessly pervading through the environment perform
application functions in the background, much like in sci-fi movies.
o E.g., intelligent home, smart workplace
The processors may be connected to more powerful networks and processing
resources in the background for processing and collating data.
• Peer-to-peer computing
o Peer-to-peer (P2P) computing represents computing over an
application layer network wherein all interactions among the processors
are at a “peer” level.
o Some of the key challenges include: object storage mechanisms,
efficient object lookup, and retrieval in a scalable manner; dynamic
reconfiguration with nodes as well as objects joining and leaving the
network randomly; replication strategies to expedite object search;
tradeoffs between object size, latency, and table sizes; anonymity,
privacy, and security.
• Publish/subscribe, content distribution
o In a dynamic environment where the information constantly fluctuates
(varying stock prices is a typical example), there needs to be:
(i) an efficient mechanism for distributing this information (publish),
(ii) an efficient mechanism to allow end users to indicate interest in
receiving specific kinds of information (subscribe), and
(iii) an efficient mechanism for aggregating large volumes of published
information and filtering it as per the user’s subscription filter.
o Content distribution refers to a class of mechanisms, primarily in the
web and P2P computing context, whereby specific information which
can be broadly characterized by a set of parameters is to be distributed
to interested processes.
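The publish/subscribe mechanisms described above can be sketched as a toy topic-based broker. This is a minimal illustration, not a real middleware API; the class and method names (`PubSub`, `subscribe`, `publish`) and the stock-price messages are assumptions for the example:

```python
from collections import defaultdict

class PubSub:
    """Toy topic-based publish/subscribe broker (illustrative only)."""
    def __init__(self):
        # topic -> list of (subscription filter, callback)
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback, predicate=lambda msg: True):
        # 'predicate' models the user's subscription filter.
        self.subscribers[topic].append((predicate, callback))

    def publish(self, topic, msg):
        # Deliver to every subscriber whose filter accepts the message.
        for predicate, callback in self.subscribers[topic]:
            if predicate(msg):
                callback(msg)

broker = PubSub()
received = []
# Subscriber interested only in stock prices above 100 (hypothetical filter).
broker.subscribe("stock", received.append, predicate=lambda m: m["price"] > 100)
broker.publish("stock", {"symbol": "ABC", "price": 95})   # filtered out
broker.publish("stock", {"symbol": "XYZ", "price": 120})  # delivered
```

The filter predicate plays the role of the user's subscription filter in step (iii) above, aggregating and filtering published information per subscriber.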
• Distributed agents
Agents are software processes or robots that can move around the system to
do specific tasks for which they are specially programmed.
Challenges in distributed agent systems include coordination mechanisms
among the agents, controlling the mobility of the agents, and their software
design and interfaces. Research in agents is inter-disciplinary: spanning
artificial intelligence, mobile computing, economic market models, software
engineering, and distributed computing.
• Distributed data mining
o Data mining algorithms examine large amounts of data to detect patterns and
trends in the data, to mine or extract useful information.
A traditional example is: examining the purchasing patterns of customers in
order to profile the customers and enhance the efficacy of directed marketing
schemes. The mining can be done by applying database and artificial intelligence
techniques to a data repository. In many situations, the data is necessarily
distributed and cannot be collected in a single repository, as in banking
applications where the data is private and sensitive, or in atmospheric weather
prediction where the data sets are far too massive to collect and process at a
single repository in real-time. In such cases, efficient distributed data mining
algorithms are required.
• Grid computing
Grid computing aims to utilize the idle computing power of machines
connected over the network as a single unified computing resource.
Many challenges in making grid computing a reality include:
o scheduling jobs in such a distributed environment,
o a framework for implementing quality of service and real-time
guarantees,
o security of individual machines as well as of jobs being
executed in this setting.
• Security
o The traditional challenges of security in a distributed setting include:
o confidentiality (ensuring that only authorized processes can access certain
information),
o authentication (ensuring the source of received information and the
identity of the sending process), and
o availability (maintaining allowed access to services despite malicious
actions).
o The goal is to meet these challenges with efficient and scalable solutions.
Issues that complicate these challenges include lack of trust, shared broadcast
media, resource-constrained devices, and lack of fixed structure.
---------------------------------------------------------------------------------------------
A Model of Distributed Computations
• A distributed system consists of a set of processors that are connected by a
communication network.
• The processors do not share a common global memory and communicate
solely by passing messages over the communication network.
• The system can be modeled as a directed graph in which vertices represent the
processes and edges represent unidirectional communication channels.
• A distributed application runs as a collection of processes on a distributed
system.
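The directed-graph model above can be written out minimally. The process names and channel set here are hypothetical; the point is only that a unidirectional channel is a directed edge from one process vertex to another:

```python
# Processes are vertices; unidirectional channels are directed edges.
processes = {"p1", "p2", "p3"}
channels = {("p1", "p2"), ("p2", "p1"), ("p2", "p3")}  # e.g., C12, C21, C23

def can_send(sender, receiver):
    """p_i can send a message directly to p_j only if the edge (channel)
    from p_i to p_j exists in the graph."""
    return (sender, receiver) in channels
```

Note that the graph is directed: here p2 can send to p3, but p3 has no channel back to p2.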
A Distributed Program
o A distributed program is composed of a set of n asynchronous processes, p1,
p2, ..., pi , ..., pn.
o The processes do not share a global memory and communicate solely by
passing messages.
o The processes do not share a global clock that is instantaneously accessible
to these processes.
o Process execution and message transfer are asynchronous.
o Without loss of generality, we assume that each process is running on a
different processor.
o Let Cij denote the channel from process pi to process pj and let mij denote a
message sent by pi to pj.
o The message transmission delay is finite and unpredictable.
A Model of Distributed Executions
o The execution of a process consists of a sequential execution of its actions.
o The actions are atomic and the actions of a process are modeled as three types
of events, namely, internal events, message send events, and message
receive events.
o Let e_i^x denote the x-th event at process p_i.
o For a message m, let send (m) and rec (m) denote its send and receive events,
respectively.
o The occurrence of events changes the states of respective processes and
channels.
o An internal event changes the state of the process at which it occurs.
o A send event changes the state of the process that sends the message and the
state of the channel on which the message is sent.
o A receive event changes the state of the process that receives the message and
the state of the channel on which the message is received.
o The events at a process are linearly ordered by their order of occurrence.
o The execution of process p_i produces a sequence of events e_i^1, e_i^2, ...,
e_i^x, e_i^{x+1}, ... and is denoted by H_i, where
o H_i = (h_i, →_i),
o h_i is the set of events produced by p_i, and
o the binary relation →_i defines a linear order on these events.
o Relation →_i expresses causal dependencies among the events of p_i.
o The send and the receive events signify the flow of information between
processes and establish causal dependency from the sender process to the
receiver process.
o A relation →msg that captures the causal dependency due to message
exchange, is defined as follows. For every message m that is exchanged
between two processes, we have
o send(m) →msg rec (m).
o Relation →msg defines causal dependencies between the pairs of
corresponding send and receive events.
o The evolution of a distributed execution is depicted by a space-time diagram.
o A horizontal line represents the progress of the process; a dot indicates an
event; a slant arrow indicates a message transfer.
o Since we assume that an event execution is atomic (hence, indivisible and
instantaneous), it is justified to denote it as a dot on a process line.
o In Figure 2.1, for process p1, the second event is a message send event,
the third event is an internal event, and the fourth event is a message receive
event.
Figure 2.1: The space-time diagram of a distributed execution.
Causal Precedence Relation
The execution of a distributed application results in a set of distributed events
produced by the processes.
Let H = ∪_i h_i denote the set of events executed in a distributed computation.
A binary relation → on the set H, which expresses causal dependencies between
events in the distributed execution, is defined as follows: for two events
e_i^x and e_j^y,
e_i^x → e_j^y iff (i = j and x < y), or (e_i^x →msg e_j^y), or (there exists an
event e_k^z such that e_i^x → e_k^z and e_k^z → e_j^y).
The causal precedence relation induces an irreflexive partial order on the events of
a distributed computation that is denoted as H=(H, →).
Note that the relation → is nothing but Lamport’s “happens before” relation.
For any two events e_i and e_j, if e_i → e_j, then event e_j is directly or
transitively dependent on event e_i. (Graphically, it means that there exists a
path consisting of message arrows and process-line segments (along increasing
time) in the space-time diagram that starts at e_i and ends at e_j.)
For example, in Figure 2.1, e_1^1 → e_3^3 and e_3^3 → e_2^6.
The relation → denotes flow of information in a distributed computation, and
e_i → e_j dictates that all the information available at e_i is potentially
accessible at e_j.
For example, in Figure 2.1, event e_2^6 has the knowledge of all other events
shown in the figure.
For any two events e_i and e_j, e_i ↛ e_j denotes the fact that event e_j does
not directly or transitively depend on event e_i. That is, event e_i does not
causally affect event e_j. In this case, event e_j is not aware of the execution
of e_i or of any event executed after e_i on the same process.
For example, in Figure 2.1, e_1^3 ↛ e_3^3 and e_2^4 ↛ e_3^1.
Note the following two rules:
For any two events e_i and e_j, e_i ↛ e_j ⇏ e_j ↛ e_i.
For any two events e_i and e_j, e_i → e_j ⇒ e_j ↛ e_i.
Concurrent events
For any two events e_i and e_j, if e_i ↛ e_j and e_j ↛ e_i,
then events e_i and e_j are said to be concurrent (denoted as e_i ǁ e_j).
In the execution of Figure 2.1, e_1^3 ǁ e_3^3 and e_2^4 ǁ e_3^1.
The relation ǁ is not transitive; that is, (e_i ǁ e_j) ∧ (e_j ǁ e_k) ⇏ e_i ǁ e_k.
For example, in Figure 2.1, e_3^3 ǁ e_2^4 and e_2^4 ǁ e_1^5; however, e_3^3 ∦ e_1^5.
For any two events e_i and e_j in a distributed execution,
e_i → e_j, or e_j → e_i, or e_i ǁ e_j.
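The relations above can be sketched computationally: model the events as a directed graph whose edges are the process order →_i and the message relation →msg, so that → becomes reachability, and two events are concurrent when neither reaches the other. The event layout below is illustrative and does not reproduce Figure 2.1 exactly:

```python
# Events are named (process, index). Edges encode both the process order
# (e.g., p1's 1st event precedes its 2nd) and message transfers.
edges = {
    ("p1", 1): [("p1", 2)],
    ("p1", 2): [("p1", 3), ("p2", 2)],  # p1's 2nd event sends a message to p2
    ("p1", 3): [],
    ("p2", 1): [("p2", 2)],
    ("p2", 2): [],
}

def happens_before(a, b):
    """a -> b iff there is a causal path (process lines + messages) from a to b."""
    if a == b:
        return False  # the relation is irreflexive
    stack, seen = [a], set()
    while stack:
        e = stack.pop()
        for nxt in edges.get(e, []):
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

def concurrent(a, b):
    """a || b iff neither event causally precedes the other."""
    return not happens_before(a, b) and not happens_before(b, a)
```

With this layout, `happens_before(("p1", 2), ("p2", 2))` holds via the message edge, while p1's third event and p2's second event are concurrent: neither has a causal path to the other.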
Logical vs. Physical Concurrency
In a distributed computation, two events are logically concurrent if and only if they
do not causally affect each other.
Physical concurrency, on the other hand, has a connotation that the events occur at
the same instant in physical time.
Two or more events may be logically concurrent even though they do not occur at
the same instant in physical time.
However, had processor speeds and message delays been different, the
execution of these events could very well have coincided in physical time.
Whether a set of logically concurrent events coincide in the physical time or not,
does not change the outcome of the computation.
Therefore, even though a set of logically concurrent events may not have occurred
at the same instant in physical time, we can assume that these events occurred at the
same instant in physical time.
Models of Communication Networks
There are several models of the service provided by communication networks,
namely, FIFO, Non-FIFO, and causal ordering.
In the FIFO model, each channel acts as a first-in first-out message queue and thus,
message ordering is preserved by a channel.
In the non-FIFO model, a channel acts like a set in which the sender process adds
messages and the receiver process removes messages from it in a random order.
The “causal ordering” model is based on Lamport’s “happens before” relation.
A system that supports the causal ordering model satisfies the following property:
CO: For any two messages m_ij and m_kj, if send(m_ij) → send(m_kj), then
rec(m_ij) → rec(m_kj).
This property ensures that causally related messages destined to the same
destination are delivered in an order that is consistent with their causality relation.
Causally ordered delivery of messages implies FIFO message delivery. (Note that
CO ⊂ FIFO ⊂ Non-FIFO.)
The causal ordering model considerably simplifies the design of distributed
algorithms because it provides built-in synchronization.
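The FIFO and non-FIFO channel models can be contrasted with a small sketch: a queue for the FIFO channel, and an unordered bag (set-like) for the non-FIFO channel, from which the receiver removes messages in arbitrary order. The class names are hypothetical:

```python
import random
from collections import deque

class FIFOChannel:
    """FIFO model: the channel is a first-in first-out queue,
    so message ordering is preserved."""
    def __init__(self):
        self.queue = deque()
    def send(self, msg):
        self.queue.append(msg)
    def deliver(self):
        return self.queue.popleft()

class NonFIFOChannel:
    """Non-FIFO model: the channel acts like a set; the receiver
    removes messages in an arbitrary (here random) order."""
    def __init__(self):
        self.bag = []
    def send(self, msg):
        self.bag.append(msg)
    def deliver(self):
        return self.bag.pop(random.randrange(len(self.bag)))

fifo = FIFOChannel()
for m in ["m1", "m2", "m3"]:
    fifo.send(m)
assert [fifo.deliver() for _ in range(3)] == ["m1", "m2", "m3"]
```

Causal ordering is a stronger, cross-channel property: it constrains delivery order across all senders to a common destination, which per-channel FIFO queues alone cannot guarantee.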
Global State of a Distributed System
“A collection of the local states of its components, namely, the processes and
the communication channels.”
The state of a process is defined by the contents of processor registers, stacks, local
memory, etc. and depends on the local context of the distributed application.
The state of a channel is given by the set of messages in transit in the channel.
The occurrence of events changes the states of respective processes and channels.
An internal event changes the state of the process at which it occurs.
A send event changes the state of the process that sends the message and the state
of the channel on which the message is sent.
A receive event changes the state of the process that receives the message and
the state of the channel on which the message is received.
Notations
LS_i^x denotes the state of process p_i after the occurrence of its x-th event
e_i^x (LS_i^0 is the initial state).
send(m_ij) ≤ LS_i^x means that the send of message m_ij occurred at p_i no later
than event e_i^x; rec(m_ij) ≰ LS_j^y means that the receive of m_ij has not
occurred at p_j by event e_j^y.
A Channel State
The state of a channel depends upon the states of the processes it connects.
The state of channel C_ij, denoted SC_ij, is the set of messages sent by p_i
that have not yet been received by p_j:
SC_ij = { m_ij | send(m_ij) ≤ LS_i^x ∧ rec(m_ij) ≰ LS_j^y }.
Global State
The global state of a distributed system is a collection of the local states of the
processes and the channels.
Notationally, global state GS is defined as
GS = { ∪_i LS_i, ∪_{j,k} SC_jk },
the collection of the local process states LS_i and the channel states SC_jk.
For a global state to be meaningful, the states of all the components of the distributed
system must be recorded at the same instant.
This will be possible if the local clocks at processes were perfectly synchronized or
if there were a global system clock that can be instantaneously read by the processes.
(However, both are impossible.)
A Consistent Global State
Even if the state of all the components is not recorded at the same instant, such a
state will be meaningful provided every message that is recorded as received is also
recorded as sent.
Basic idea is that a state should not violate causality – an effect should not be present
without its cause. A message cannot be received if it was not sent.
Such states are called consistent global states and are meaningful global states.
Inconsistent global states are not meaningful in the sense that a distributed system
can never be in an inconsistent state.
An Example
Consider the distributed execution of Figure 2.2.
Figure 2.2: The space-time diagram of a distributed execution.
In Figure 2.2:
A global state GS1 is inconsistent because the state of p2 has recorded the
receipt of message m12, whereas the state of p1 has not recorded its send.
A global state GS2 is consistent; all the channels are empty except C21, which
contains message m21.
A global state is transitless iff all channels are recorded as empty in it.
Cuts of a Distributed Computation
“In the space-time diagram of a distributed computation, a cut is a zigzag line
joining one arbitrary point on each process line.”
• A cut slices the space-time diagram, and thus the set of events in the
distributed computation, into a PAST and a FUTURE.
• The PAST contains all the events to the left of the cut and the FUTURE
contains all the events to the right of the cut.
• For a cut C, let PAST(C) and FUTURE(C) denote the set of events in the
PAST and FUTURE of C, respectively.
• Every cut corresponds to a global state and every global state can be
graphically represented as a cut in the computation’s space-time diagram.
• Cuts in a space-time diagram provide a powerful graphical aid in representing
and reasoning about global states of a computation.
Figure 2.3: Illustration of cuts in a distributed execution.
• In a consistent cut, every message received in the PAST of the cut was sent
in the PAST of that cut. (In Figure 2.3, cut C2 is a consistent cut.)
• All messages that cross the cut from the PAST to the FUTURE are in transit
in the corresponding consistent global state.
• A cut is inconsistent if a message crosses the cut from the FUTURE to the
PAST. (In Figure 2.3, cut C1 is an inconsistent cut.)
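The consistency test above can be sketched directly: a cut is consistent iff no message crosses it from the FUTURE to the PAST, i.e., every receive event recorded in the PAST has its send event in the PAST as well. The event and message names below are hypothetical:

```python
def is_consistent_cut(past_events, messages):
    """A cut, represented by its PAST event set, is consistent iff every
    message whose receive event is in the PAST also has its send event
    in the PAST (no FUTURE -> PAST message crossing).

    'messages' maps each message name to its (send_event, receive_event)."""
    for send_ev, recv_ev in messages.values():
        if recv_ev in past_events and send_ev not in past_events:
            return False
    return True

# Hypothetical execution: m1 is sent and received inside the cut;
# m2 is sent inside the cut but received in the FUTURE (in transit).
messages = {"m1": ("e1", "e2"), "m2": ("e3", "e4")}
assert is_consistent_cut({"e1", "e2", "e3"}, messages)  # m2 is simply in transit
assert not is_consistent_cut({"e2"}, messages)          # receive of m1 without its send
```

Messages like m2, which cross the cut from PAST to FUTURE, are exactly the in-transit messages that form the channel states of the corresponding consistent global state.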