Distributed System
Dr. D.S. Kushwaha
Computer Architectures
Computer architectures consisting of
multiple interconnected processors are
basically of two types:
Tightly coupled systems
Loosely coupled systems
Tightly coupled systems
There is a single, system-wide primary
memory (address space) that is shared by all
the processors.
These are also known as parallel processing
systems.
[Figure: tightly coupled system — multiple CPUs connected through interconnection hardware to a single system-wide shared memory]
Loosely coupled systems
The processors do not share memory, and
each processor has its own local memory.
These are also known as distributed
computing systems or simply distributed
systems.
[Figure: loosely coupled system — multiple CPUs, each with its own local memory, connected by a communication network]
Distributed Computing System
A DCS is a collection of independent computers that
appears to its users as a single coherent system,
or
A collection of processors interconnected by a
communication network in which
Each processor has its own local memory and other
peripherals, and
The communication between any two processors of
the system takes place by message passing.
For a particular processor, its own resources are
local, whereas the other processors and their
resources are remote.
Cont…
Together, a processor and its resources are
usually referred to as a node or site or machine
of the distributed computing system.
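The message passing between nodes described above can be sketched with TCP sockets. Everything in this sketch (the loopback address, the one-shot server, the "ack" reply format) is an illustrative assumption, not something taken from the slides:

```python
# Minimal sketch of message passing between two "nodes" of a
# distributed system, modeled as a TCP server and client.
import socket
import threading

def serve_once(sock):
    conn, _ = sock.accept()
    with conn:
        msg = conn.recv(1024)          # receive a message from the peer
        conn.sendall(b"ack: " + msg)   # reply with an acknowledgement
    sock.close()

srv = socket.socket()
srv.bind(("localhost", 0))             # port 0: let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]
t = threading.Thread(target=serve_once, args=(srv,))
t.start()

with socket.socket() as c:
    c.connect(("localhost", port))
    c.sendall(b"hello")                # no shared memory: only messages
    reply = c.recv(1024)
t.join()
```

Note that the two endpoints share no memory at all; the only way information crosses between them is the byte stream on the socket, which is exactly the loosely coupled model.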
A distributed system is often organized as a
middleware layer. Note that the middleware layer
extends over multiple machines.
Distributed Computing System Models
1. Minicomputer Model
2. Workstation model
3. Workstation-server model
4. Processor-pool model
5. Hybrid model
Factors that led to the emergence of
distributed computing system
Inherently distributed applications
Organizations once based in a particular location have gone
global.
Information sharing among distributed users
CSCW
Resource sharing
Such as software libraries, databases, and hardware resources.
Cont…
Better price-performance ratio
Shorter response times and higher throughput
Higher reliability
Reliability refers to the degree of tolerance against errors
and component failures in a system.
Achieved by multiplicity of resources.
Cont…
Extensibility and incremental growth
By adding additional resources to the system as and when
the need arises.
Such systems are termed open distributed systems.
Better flexibility in meeting user’s needs
Distributed Operating system
An OS controls the resources of a computer system and
provides its users with an interface or virtual machine
that is more convenient to use than the bare machine.
Primary tasks of an operating system:
To present users with a virtual machine that is easier to
program than the underlying hardware.
To manage the various resources of the system.
Types of operating systems used for distributed
computing systems:
Network operating systems
Distributed operating systems.
Uniprocessor Operating Systems
Separating applications from operating system code
through a microkernel
Multicomputer Operating Systems
Message Transfer
Network Operating System
Distributed System as Middleware
Middleware and Openness
In an open middleware-based distributed system, the protocols
used by each middleware layer should be the same, as well as
the interfaces they offer to applications.
Comparison between Systems
| Item                    | Distributed OS (multiprocessor) | Distributed OS (multicomputer) | Network OS | Middleware-based OS |
|-------------------------|---------------------------------|--------------------------------|------------|---------------------|
| Degree of transparency  | Very high                       | High                           | Low        | High                |
| Same OS on all nodes    | Yes                             | Yes                            | No         | No                  |
| Number of copies of OS  | 1                               | N                              | N          | N                   |
| Basis for communication | Shared memory                   | Messages                       | Files      | Model specific      |
| Resource management     | Global, central                 | Global, distributed            | Per node   | Per node            |
| Scalability             | No                              | Moderately                     | Yes        | Varies              |
| Openness                | Closed                          | Closed                         | Open       | Open                |
Major Differences
1. System image
In a NOS, the user perceives the DCS as a group of nodes
connected by a communication network; hence the user is aware
of multiple computers.
A DOS hides the existence of multiple computers and provides
a single system image to its users; hence the group of networked
nodes acts as a virtual uniprocessor.
In a NOS, by default the user's job is executed on the machine on
which the user is currently logged in; to run it elsewhere, the
user must log in remotely.
A DOS dynamically allocates jobs to the various machines of
the system for processing.
Cont…
2. AUTONOMY
Different nodes in a NOS may run different operating systems but
communicate with each other using a mutually agreed-upon
communication protocol.
In a DOS, there is a single system-wide OS, and each node of
the DCS runs a part of it (i.e., identical kernels run on all the
nodes of the DCS).
This ensures that the same set of system calls is valid globally.
3. FAULT TOLERANCE
Low in a network OS
High in a distributed OS
A DCS using a NOS is generally referred to as a network system
A DCS using a DOS is generally referred to as a distributed system
Issues
Transparency
Reliability
Flexibility
Performance
Scalability
Heterogeneity
Security
Emulation of existing operating systems
TRANSPARENCY
To make the existence of multiple computers transparent &
Provide a single system image to its users.
ISO (1992) identifies eight types of transparency:
Access Transparency
Location Transparency
Replication Transparency
Failure Transparency
Migration Transparency
Concurrency Transparency
Performance Transparency
Scaling Transparency
Access Transparency
Allows users to access remote resources in the same
way as local ones, i.e., the user interface, which is in the
form of a set of system calls, should not distinguish
between local and remote resources.
Location transparency
Aspects of location transparency
Name Transparency
Name of a resource should not reveal any information
about physical location of resource.
Even movable resources such as files must be
allowed to move without having their names changed.
User Mobility
No matter which machine a user is logged onto, he
should be able to access a resource with the same
name.
Replication Transparency
Replicas are used for better performance and reliability.
Replicated resource and the replication activity should be
transparent to user.
Two important issues related to replication transparency are:
Naming of Replicas
Replication Control
There should be a method to map a user-supplied name of the
resource to an appropriate replica of the resource.
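The naming issue above can be sketched as a small name-to-replica mapping. The replica table and the selection policy below are hypothetical, invented for illustration:

```python
# Sketch of replication transparency: a user-supplied name is mapped
# to one of several replicas without the user ever seeing the copies.
import random

# Illustrative replica table: user-visible name -> replica locations.
replicas = {
    "/docs/report.txt": ["nodeA:/docs/report.txt",
                         "nodeB:/docs/report.txt"],
}

def resolve(name):
    """Map a user-supplied resource name to one replica."""
    copies = replicas.get(name)
    if not copies:
        raise FileNotFoundError(name)
    # Replication control could instead pick the nearest or
    # least-loaded copy; random choice keeps the sketch minimal.
    return random.choice(copies)
```

The user program only ever names `/docs/report.txt`; which physical copy serves the request is decided entirely inside `resolve`.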
Failure Transparency
In partial failures, the system continues to function,
possibly with degraded performance.
Complete failure transparency is not achievable.
Failure of the communication network of a distributed
system normally disrupts the work of its users and is
noticeable by the users.
Migration Transparency
Migration may be done for performance, reliability and
security reasons.
Aim of migration transparency
To ensure that the movement of the object is handled
automatically by the system in a user-transparent manner.
Issues for migration transparency:
Migration decisions
Migration of an object
Migration object as a process
Concurrency transparency
Provides each user the illusion that:
he is the sole user of the system &
other users do not exist in the system.
Properties for providing concurrency transparency:
Event Ordering Property
It ensures that all access requests to various system
resources are properly ordered to provide a consistent view
to all users of the system.
Mutual Exclusion property
At any time, at most one process accesses a shared
resource that must not be used simultaneously by
multiple processes.
Cont…
No starvation policy
It ensures that if every process that is granted a
resource, which must not be used simultaneously by
multiple processes, eventually releases it, every request
for that resource is eventually granted.
No deadlock
It ensures that a situation will never occur in which
competing processes prevent their mutual progress even
though no single one requests more resources than
available in the system.
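One common way to realize the event-ordering property is Lamport's logical clocks; the slides do not name a mechanism, so this is a sketch of one standard option, not the author's method:

```python
# Sketch of Lamport logical clocks: every event gets a timestamp,
# and message receipt merges the sender's clock, so causally
# related events are consistently ordered at every node.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Timestamp attached to an outgoing message."""
        return self.tick()

    def receive(self, msg_time):
        """On receipt, jump past the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t = a.send()      # a sends a message carrying timestamp 1
b.receive(t)      # b's clock becomes 2, later than the send event
```

Because every receive is stamped later than its matching send, all nodes agree that the send happened before the receive, which is the consistent view the event-ordering property asks for.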
Performance transparency
The aim of performance transparency is to allow the
system to be automatically reconfigured to improve
performance, as loads vary dynamically in the system.
A situation in which one processor of the system is
overloaded with jobs while another processor is idle
should not be allowed to occur.
Scaling transparency
Allows the system to expand and scale without
disrupting the activities of the users.
Requires open system architecture and use of scalable
algorithms.
RELIABILITY
Distributed systems are required to be
more reliable than centralized systems due to
The multiple instances of resources.
Failures should be handled, and they are of two types:
Fail-stop failure: the system stops functioning after
changing to a state in which its failure can be
detected.
Byzantine failure: the system continues to function but
produces wrong results.
Cont…
Methods to handle failures in a distributed system:
Fault avoidance
Fault tolerance
Fault detection and recovery
Fault Avoidance
It deals with designing the components of the system
in such a way that the occurrence of faults is
minimized.
For this, reliable hardware components and intensive
software testing are used.
Fault Tolerance
Ability of a system to continue functioning in the event
of partial system failure.
Method for tolerating the faults:
Redundancy techniques:
Avoid single point of failure by replicating critical
hardware and software components,
Additional overhead is needed to maintain multiple
copies of replicated resource & consistency issue.
Distributed control
A highly available distributed file system should have
multiple and independent file servers controlling multiple
and independent storage devices.
Fault detection and recovery
Atomic Transaction
Use of stateless servers
A stateless server does not depend on the history of
previously serviced requests; hence, after a crash, a client
can simply resubmit its request.
Acknowledgements and timeout-based retransmissions
of messages
For IPC between two processes, the system must have ways
to detect lost messages so that they can be retransmitted.
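The ack-and-timeout scheme can be sketched with in-process queues standing in for the network; the flaky receiver, the ack tuple format, and the timeout value are all illustrative assumptions:

```python
# Sketch of timeout-based retransmission: the sender retries until
# it receives an acknowledgement or exhausts its retry budget.
import queue
import threading

channel, acks = queue.Queue(), queue.Queue()  # stand-ins for the network

def send_reliably(msg, retries=3, timeout=0.5):
    """Send msg; wait for an ack; retransmit on timeout."""
    for attempt in range(1, retries + 1):
        channel.put(msg)                       # (re)transmit the message
        try:
            if acks.get(timeout=timeout) == ("ack", msg):
                return attempt                 # transmissions needed
        except queue.Empty:
            continue                           # timed out: retransmit
    raise TimeoutError("no acknowledgement for %r" % (msg,))

def flaky_receiver():
    dropped = False
    while True:
        msg = channel.get()
        if msg is None:                        # shutdown marker
            return
        if not dropped:
            dropped = True                     # lose exactly one message
            continue
        acks.put(("ack", msg))                 # acknowledge receipt

r = threading.Thread(target=flaky_receiver)
r.start()
attempts = send_reliably("m1")                 # first copy is lost
channel.put(None)
r.join()
```

Since the receiver drops exactly the first copy, the sender times out once and succeeds on its second transmission.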
FLEXIBILITY
The design of a distributed operating system should be flexible due
to:
Ease of Modification
Ease of Enhancement
The kernel is the most important factor influencing the design;
it operates in a separate address space that is inaccessible to
user processes.
Commonly used models for kernel design in distributed operating
systems:
Microkernel model
Monolithic model
Monolithic kernel model
Most operating system services such as process management,
memory management, device management, file management,
name management, and inter-process communication are
provided by the kernel.
Result: The kernel has a large monolithic structure.
The large size of the kernel reduces the overall flexibility and
configurability of the resulting OS.
A request may be serviced faster, since no message passing and
no context switching are required while the kernel is performing
the job.
Cont…
[Figure: nodes 1 to n, each running user applications on top of a monolithic kernel (which includes most OS services), connected by network hardware]
Microkernel model
Goal is to keep the kernel as small as possible.
Kernel provides only the minimal facilities necessary for
implementing additional operating system services like:
inter-process communication,
low-level device management, &
some memory management.
All other OS services are implemented as user-level server
processes. So it is easy to modify the design or add new
services.
Each server has its own address space and can be
programmed separately.
Cont…
[Figure: nodes 1 to n, each running user applications and server/manager modules on top of a microkernel (which has only minimal facilities), connected by network hardware]
Cont…
Being modular in nature, the OS is easy to design, implement,
and install.
For adding or changing a service, there is no need to stop
the system and boot a new kernel, as in the case of
monolithic kernel.
Performance penalty:
Each server module runs in its own address space; hence
some form of message-based IPC is required while
performing a job.
Message passing between server processes and the
microkernel requires context switches, resulting in additional
performance overhead.
Cont…
Advantages of microkernel model over monolithic kernel
model:
Flexibility in design, maintenance, and portability.
In practice, the performance penalty is not too severe: the
small overhead involved in exchanging messages is usually
negligible.
PERFORMANCE
Design principles for better performance are:
Batch if possible
Transfer of data in large chunks is more efficient than
transferring individual pages.
Caching whenever possible
Saves a large amount of time and network bandwidth.
Minimize copying of data
While a message is transferred from sender to receiver, it
takes the following path:
From the sender's stack to its message buffer
From the message buffer in the sender's address space to the
message buffer in the kernel's address space
Cont…
Finally from the kernel to the NIC
A similar path is followed on receipt; hence six copy
operations are required.
Minimize Network traffic
Migrate processes closer to the resources they use.
Take advantage of fine-grain parallelism for
multiprocessing
Use of threads for structuring server processes.
Fine-grained concurrency control of simultaneous
accesses by multiple processes to a shared resource for
better performance.
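The "caching whenever possible" principle above can be sketched in a few lines; `remote_read` is a hypothetical stand-in for a network fetch, used only to count round trips:

```python
# Sketch of client-side caching: repeated reads of the same remote
# data are served locally, saving time and network bandwidth.
calls = 0          # counts simulated network round trips

def remote_read(name):
    """Stand-in for fetching data from a remote server."""
    global calls
    calls += 1
    return "data:" + name

cache = {}

def cached_read(name):
    if name not in cache:            # miss: one network round trip
        cache[name] = remote_read(name)
    return cache[name]               # hit: served from local memory

cached_read("f")                     # goes to the "network"
cached_read("f")                     # served from the cache
```

A real cache would also need an invalidation or consistency policy (what happens when the remote copy changes), which this sketch deliberately omits.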
SCALABILITY
Scalability refers to the capability of a system to adapt to
increased service load.
Some principles for designing the scalable distributed
systems are:
Avoid centralized entities
Central file server
Centralized database
Avoid centralized algorithms
Perform most operations on client workstation
Heterogeneity
Dissimilar hardware or software systems.
Some incompatibilities in a heterogeneous distributed
system are:
Internal formatting schemes
Communication protocols and topologies of different networks
Different servers at different nodes
Some form of data translation is necessary for
interaction.
An intermediate standard data format can be used.
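Two common ways to realize an intermediate standard data format are sketched below: a text format (JSON, one possible choice among many) and fixed network byte order for binary fields. The record contents are illustrative:

```python
# Sketch of data translation through an intermediate standard format,
# so that machines with dissimilar internal formats can interoperate.
import json
import struct

record = {"node": 3, "load": 0.75}

# Text route: JSON as the agreed standard format on the wire.
wire = json.dumps(record).encode("utf-8")        # sender encodes
received = json.loads(wire.decode("utf-8"))      # receiver decodes

# Binary route: pack fields in network byte order ("!") so machines
# with different native byte orders interpret the bytes identically.
packed = struct.pack("!Hd", record["node"], record["load"])
node, load = struct.unpack("!Hd", packed)
```

Either way, each node translates only between its internal format and the single standard format, instead of needing a separate translator for every other node type.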
Security
More difficult than in a centralized system because of:
The lack of a single point of control &
The use of insecure networks for data communication.
Requirements for security
It should be possible for the sender of a message to know
that the message was received by the intended receiver.
It should be possible for the receiver of a message to know
that the message was sent by the genuine sender.
Cont…
It should be possible for both the sender and receiver of a
message to be guaranteed that the contents of the message
were not changed while in transit.
Cryptography is the solution.
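As one concrete instance of the cryptographic solution, a message authentication code (HMAC) over a shared secret key lets the receiver check both that the message came from a holder of the key and that its contents were not changed in transit. The key and message below are illustrative:

```python
# Sketch of message integrity and sender authenticity using HMAC.
import hmac
import hashlib

key = b"shared-secret"   # assumed known only to sender and receiver

def protect(msg):
    """Sender: attach an authentication tag to the message."""
    tag = hmac.new(key, msg, hashlib.sha256).digest()
    return msg, tag

def verify(msg, tag):
    """Receiver: recompute the tag and compare in constant time."""
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)

msg, tag = protect(b"transfer 100")
assert verify(msg, tag)                  # genuine message accepted
assert not verify(b"transfer 900", tag)  # tampered message rejected
```

Note that an HMAC alone covers integrity and sender authenticity; confirming to the sender that the intended receiver got the message needs an additional acknowledgement protocol.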