CSCE455/855 Distributed Operating Systems
Introduction
Dr. Ying Lu
Schorr Center 106
Giving credit where credit is due:
Most of the lecture notes are from the textbook
companion website
Some of the lecture notes are based on slides
created by Dr. Zahorjan at Univ. of Washington
and Dr. Konev at Univ. of Liverpool
I have modified them and added new slides
What is an Operating System?
The text:
“an intermediary between the user of a computer
and the computer hardware”
“manages the computer hardware”
“an amazing aspect of operating systems is how
varied they are in accomplishing these tasks …
mainframe operating systems … personal
computer operating systems … operating systems
for handheld computers …”
What is an Operating System?
An operating system (OS) is:
a software layer to abstract away and manage details
of hardware resources
a set of utilities to simplify application development
[Figure: layered view with Applications on top, the OS in the middle, and Hardware at the bottom]
The major OS issues
structure: how is the OS organized?
sharing: how are resources shared across users?
naming: how are resources named (by users or
programs)?
security: how is the integrity of the OS and its
resources ensured?
protection: how is one user/program protected from
another?
performance: how do we make it all go fast?
More OS issues…
concurrency: how are parallel activities (computation
and I/O) created and controlled?
scale: what happens as demands or resources increase?
persistence: how do you make data last longer than
program executions?
reliability: what happens if something goes wrong
(either with hardware or with a program)?
extensibility: can we add new features?
flexibility: are we in the way of new apps?
communication: how do programs exchange
information?
Course Aims
Provide an understanding of the technical issues
involved in the design of modern distributed
systems.
Appreciation of the main principles underlying
distributed systems: processes, communication,
naming, synchronization, consistency, fault
tolerance, and security.
Definition of a Distributed System (1)
A collection of independent computers that
appears to its users as a single coherent
system.
Two aspects of this definition:
1. Hardware: the machines are autonomous.
2. Software: the users think they are dealing with a single system
(which is achieved by software).
Examples of Distributed Systems
Computer world:
University computer network
Grid (distributed computing facilities)
Ordinary life:
WWW, P2P systems (such as Napster)
Banks (Cash machines)
Ticket reservation
Characteristics of a Distributed System (I)
Differences between the various computers
and the way in which they communicate are
hidden from users.
Users and applications can interact with a
distributed system in a consistent and uniform
way, regardless of where and when
interaction takes place.
Characteristics of a Distributed System (II)
As a direct consequence of having
independent computers and hiding their
differences, a distributed system should be
relatively easy to expand or scale.
A distributed system will normally be
continuously available, although perhaps
certain parts may be temporarily out of order.
Definition of a Distributed System (2)
Figure 1-1. A distributed system organized as middleware. The
middleware layer extends over multiple machines, and offers
each application the same interface.
Goals of Distributed Systems
Easily connect users/resources
Exhibit transparency
Support openness
Be scalable
Connecting Users and Resources
Typical resources
Printers, computers, computing power, data
Why share?
Economics
Collaboration, information exchange (groupware)
Problems with sharing
Security
Unwanted collaboration
Transparency in a Distributed System
Transparent distributed system:
Looks to its users as if it were only a single
computer system
Transparency in a Distributed System
Figure 1-2. Different forms of transparency in a
distributed system (ISO, 1995).
Degree of transparency
Distribution transparency is generally preferable, but
is not always a good idea:
It is undesirable to hide the location of a shared
printer from its users.
There is a trade-off between a high degree of
transparency and the performance of a system.
It is impossible to hide the fact that the laws of
physics ("Mother Nature") will not allow a message sent
from a process in San Francisco to reach a process in
Amsterdam in less than approximately 35 ms.
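The roughly 35 ms bound can be sanity-checked with back-of-the-envelope arithmetic. The minimal sketch below computes the great-circle distance between approximate city coordinates (assumed values) using the haversine formula. It then divides that distance by the speed of light. The `fraction_of_c` parameter is a hypothetical knob for slower media such as optical fiber, where signals travel at roughly two-thirds of c.

```python
import math

EARTH_RADIUS_KM = 6371.0            # mean Earth radius
SPEED_OF_LIGHT_KM_S = 299_792.458   # in a vacuum

def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def min_one_way_ms(distance_km: float, fraction_of_c: float = 1.0) -> float:
    """Lower bound on one-way latency if the signal travels at fraction_of_c * c."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * fraction_of_c) * 1000.0

# Approximate coordinates, assumed for illustration.
SAN_FRANCISCO = (37.77, -122.42)
AMSTERDAM = (52.37, 4.90)

if __name__ == "__main__":
    d = great_circle_km(*SAN_FRANCISCO, *AMSTERDAM)
    print(f"distance ~ {d:.0f} km")
    print(f"vacuum bound ~ {min_one_way_ms(d):.1f} ms")
    print(f"fiber bound (2/3 c) ~ {min_one_way_ms(d, 2 / 3):.1f} ms")
```

With these coordinates the distance comes out at roughly 8,800 km. Even light in a vacuum therefore needs about 29 ms for one direction. Signals in fiber travel more slowly, and routers along the path add further delay, so real latencies are higher. No amount of software transparency can hide this physical limit.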
Openness
Open systems
Offer services according to standard rules that
describe the syntax and semantics of these services
Have complete and neutral specifications
Example: network protocols
Advantage of being open:
Interoperability: open systems can work together
Portability: the ability to run an application developed
for one software or hardware platform on another without modification
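The phrase "standard rules that describe the syntax and semantics" can be made concrete with a small example. The minimal sketch below uses Python's `typing.Protocol` as a stand-in for an interface definition. The `KeyValueService` interface and both implementations are hypothetical. The `migrate` function depends only on the interface, so it works with either implementation unchanged, which illustrates portability. It can also move data between the two implementations, which illustrates interoperability.

```python
from typing import Dict, List, Optional, Protocol, Tuple

class KeyValueService(Protocol):
    """Hypothetical 'standard' interface.
    Syntax: the method names and types below.
    Semantics (documented, not enforced): get(k) returns the value of the
    most recent put(k, v), or None if k was never stored."""

    def put(self, key: str, value: str) -> None: ...
    def get(self, key: str) -> Optional[str]: ...

class DictStore:
    """One vendor's implementation: a plain in-memory dictionary."""
    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

class LogStore:
    """Another vendor's implementation: an append-only log where the newest entry wins."""
    def __init__(self) -> None:
        self._log: List[Tuple[str, str]] = []

    def put(self, key: str, value: str) -> None:
        self._log.append((key, value))

    def get(self, key: str) -> Optional[str]:
        for k, v in reversed(self._log):
            if k == key:
                return v
        return None

def migrate(src: KeyValueService, dst: KeyValueService, keys: List[str]) -> None:
    """Application code written only against the interface."""
    for k in keys:
        v = src.get(k)
        if v is not None:
            dst.put(k, v)
```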
Scalability
Along three different dimensions
Size (the number of users and/or processes)
Geographical (maximum distance between
participants)
Administrative (number of administrative
domains)
Centralized Solutions: Obstacles for Achieving Size Scalability
Concept                  Example
Centralized services     A single server for all users
Centralized data         A single on-line telephone book
Centralized algorithms   Doing routing based on complete information
Examples of scalability limitations.
How is scalability accomplished for routing in the Internet?
Characteristics of Decentralized Algorithms
No machine has complete information about the system state
Machines make decisions based only on local information
Failure of one machine does not ruin the algorithm
There is no implicit assumption that a global clock exists
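Distance-vector routing is one classic algorithm with these properties. It is the Bellman-Ford approach used by, for example, RIP. The sketch below simulates it in a single process with synchronous rounds, which is a simplifying assumption. In each round, a node recomputes its distances using only two sources of local information: its own link costs and the vectors its direct neighbours advertised in the previous round. No node ever sees the whole topology. In a real deployment, nodes exchange updates asynchronously without any global clock. The `max_rounds` guard is a hypothetical parameter of this sketch.

```python
from typing import Dict

Links = Dict[str, Dict[str, float]]   # node -> {neighbour: link cost}
Table = Dict[str, Dict[str, float]]   # node -> {destination: best known distance}

def distance_vector(links: Links, max_rounds: int = 100) -> Table:
    """Simulated distance-vector routing with synchronous rounds."""
    # Initially each node knows only the distance to itself.
    table: Table = {n: {n: 0.0} for n in links}
    for _ in range(max_rounds):
        new_table: Table = {}
        for node, neighbours in links.items():
            best = {node: 0.0}
            # Local information only: own link costs plus neighbours' advertised vectors.
            for nb, cost in neighbours.items():
                for dest, dist in table[nb].items():
                    if cost + dist < best.get(dest, float("inf")):
                        best[dest] = cost + dist
            new_table[node] = best
        if new_table == table:   # no node changed its vector: converged
            break
        table = new_table
    return table
```

For example, in the topology A-B (cost 1), B-C (2), A-C (5), C-D (1), node A learns that D is 4 away via B and C. A learns this without ever receiving the full link map.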
Difficulties for Achieving Geographical Scalability
Synchronous communication, which works well in a LAN,
is unsuitable for a WAN because WAN communication
latency is much higher
Communication in a WAN is inherently unreliable and
virtually always point-to-point, whereas LAN
communication is generally highly reliable and based
on broadcasting (e.g., for locating services)
Centralized solutions lead to a waste of network
resources and degrade system performance
Difficulties for Achieving Administrative Scalability
Administrative scalability is hindered by conflicting
resource usage, management, and security policies in
multiple, independent administrative domains
Pitfalls when Developing Distributed Systems
False assumptions made by first-time developers:
• The network is reliable.
• The network is secure.
• The network is homogeneous.
• The topology does not change.
• Latency is zero.
• Bandwidth is infinite.
• Transport cost is zero.
• There is one administrator.
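Dropping the first assumptions on this list ("the network is reliable", "latency is zero") means every remote call needs an explicit failure path. The minimal sketch below assumes that transient failures surface as Python's built-in `ConnectionError`. It retries an operation a bounded number of times, backing off exponentially between attempts. The names `call_with_retries`, `attempts`, `base_delay_s` and `FlakyLink` are illustrative. Blind retrying is only safe when the operation is idempotent.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retries(op: Callable[[], T], attempts: int = 3, base_delay_s: float = 0.01) -> T:
    """Call op(); on ConnectionError wait base_delay_s * 2**k and try again.

    Re-raises the last error once all attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for k in range(attempts):
        try:
            return op()
        except ConnectionError:
            if k == attempts - 1:
                raise  # give up: surface the failure to the caller
            time.sleep(base_delay_s * 2 ** k)
    raise AssertionError("unreachable")

class FlakyLink:
    """Hypothetical remote endpoint that drops the first `failures` requests."""
    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.calls = 0

    def request(self) -> str:
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("message lost")
        return "ok"
```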
Scaling Techniques
Three techniques:
Hiding communication latencies
Try to avoid waiting for responses to remote service
requests as much as possible
Distribution
Replication
Scaling Techniques (1)
(Figure 1-4)
How can communication latencies be hidden?
How can we avoid waiting for the responses to
remote service requests as much as possible?
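One answer is asynchronous communication. Instead of blocking on each request, the client issues its requests, does other work, and handles replies as they arrive. The sketch below uses Python's `asyncio` and models a remote call as a hypothetical `remote_request` coroutine that simply sleeps for `latency_s` seconds. Issuing the calls concurrently hides most of the per-request latency.

```python
import asyncio
import time
from typing import Any, Coroutine, List, Tuple

async def remote_request(item: str, latency_s: float) -> str:
    """Hypothetical stand-in for a remote call; the sleep models network latency."""
    await asyncio.sleep(latency_s)
    return f"reply to {item}"

async def blocking_style(items: List[str], latency_s: float) -> List[str]:
    # Wait for each reply before sending the next request.
    return [await remote_request(i, latency_s) for i in items]

async def async_style(items: List[str], latency_s: float) -> List[str]:
    # Send all requests at once; gather returns replies in request order.
    return list(await asyncio.gather(*(remote_request(i, latency_s) for i in items)))

def timed(coro: Coroutine[Any, Any, List[str]]) -> Tuple[List[str], float]:
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start
```

With four requests and 100 ms of simulated latency, the blocking version takes at least 400 ms. The concurrent version finishes in roughly one latency period. This only helps when the requests do not depend on each other's replies.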
Scaling Techniques (1)
Figure 1-4. The difference between letting a server or a client check
forms as they are being filled.
The code-shipping approach is widely supported on the Web in
the form of Java applets and JavaScript.
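The difference between server-side and client-side form checking can be reduced to counting round trips. In the minimal sketch below, the field names and validation rules in `RULES` are hypothetical. `server_side_submit` asks the server to check each field as it is filled in. `client_side_submit` runs the same rules locally, standing in for the code that would be shipped to the browser, and contacts the server only once, to submit.

```python
import re
from typing import Callable, Dict

# Hypothetical validation rules: the code that would be shipped to the client.
RULES: Dict[str, Callable[[str], bool]] = {
    "name": lambda v: len(v.strip()) > 0,
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "zip": lambda v: v.isdigit() and len(v) == 5,
}

def form_is_valid(form: Dict[str, str]) -> bool:
    return all(RULES[field](value) for field, value in form.items())

class Server:
    """Each method call stands for one request/response over the network."""
    def __init__(self) -> None:
        self.round_trips = 0

    def check_field(self, field: str, value: str) -> bool:
        self.round_trips += 1
        return RULES[field](value)

    def submit(self, form: Dict[str, str]) -> bool:
        self.round_trips += 1
        return form_is_valid(form)  # the server still re-checks: never trust the client

def server_side_submit(server: Server, form: Dict[str, str]) -> bool:
    # Server-side checking: every field is checked remotely, then the form is submitted.
    if not all(server.check_field(f, v) for f, v in form.items()):
        return False
    return server.submit(form)

def client_side_submit(server: Server, form: Dict[str, str]) -> bool:
    # Client-side checking: fields are validated locally; the server is contacted once.
    if not form_is_valid(form):
        return False
    return server.submit(form)
```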
Scaling Techniques (2)
Figure 1-5. An example of dividing the DNS name space into zones.
Example: resolving the name flits.cs.vu.nl
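Resolving flits.cs.vu.nl walks down the zone hierarchy. Each name server knows only its own zone and delegates sub-zones to other servers, so no single server holds the whole name space. The sketch below is a simplified model of iterative resolution and rests on several assumptions. It uses one label per zone and a hypothetical `ZONES` table in place of real name servers. The answer 192.0.2.10 is a made-up documentation address.

```python
from typing import Dict, List, Tuple

# Hypothetical zone data: each zone's server knows either a delegation
# to the server of a sub-zone, or (at the bottom) the final address.
ZONES: Dict[str, Dict[str, str]] = {
    "": {"nl": "ns.nl"},                           # root zone
    "nl": {"vu.nl": "ns.vu.nl"},
    "vu.nl": {"cs.vu.nl": "ns.cs.vu.nl"},
    "cs.vu.nl": {"flits.cs.vu.nl": "192.0.2.10"},  # made-up address
}

def resolve(name: str) -> Tuple[str, List[str]]:
    """Iteratively resolve `name`, returning (answer, zones consulted in order)."""
    labels = name.split(".")
    zone = ""  # start at the root
    consulted: List[str] = []
    for i in range(len(labels) - 1, -1, -1):
        suffix = ".".join(labels[i:])
        consulted.append(zone or ".")
        answer = ZONES[zone][suffix]  # KeyError if the name does not exist
        if i == 0:
            return answer, consulted
        zone = suffix  # follow the delegation one level down
    raise AssertionError("unreachable")
```

For flits.cs.vu.nl, the sketch consults the root, then nl, vu.nl and cs.vu.nl, each of which handles only its own part of the name.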
Scaling Techniques (2)
World Wide Web
An enormous document-based information
system
How is scalability achieved?
By distributing documents across many servers
Scaling Techniques (3)
Replication
Increases availability
Balances the load
Reduces communication latency
But causes consistency problems
Many existing consistency maintenance solutions are
inherently non-scalable
Caching: a special form of replication, driven by the client rather than by the owner of the resource
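Client-driven caching trades consistency for latency. The minimal sketch below defines a hypothetical `ClientCache` class that keeps fetched values for a fixed time-to-live (`ttl_s`, an assumed policy). Within that window, the client answers locally without contacting the server. However, it may return a value the server has since changed, which is exactly the consistency problem that replication introduces. The clock is injectable so that the behaviour can be demonstrated deterministically.

```python
import time
from typing import Any, Callable, Dict, Tuple

class ClientCache:
    """Hypothetical client-side cache with a fixed time-to-live (TTL) per entry."""

    def __init__(self, fetch: Callable[[str], Any], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch    # remote call to the server holding the master copy
        self.ttl_s = ttl_s
        self.clock = clock    # injectable so the example is deterministic
        self.entries: Dict[str, Tuple[Any, float]] = {}
        self.remote_fetches = 0

    def get(self, key: str) -> Any:
        now = self.clock()
        cached = self.entries.get(key)
        if cached is not None and now - cached[1] < self.ttl_s:
            return cached[0]          # served locally: fast, but possibly stale
        value = self.fetch(key)       # miss or expired: go to the server
        self.remote_fetches += 1
        self.entries[key] = (value, now)
        return value
```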
Why Middleware is an Important Component of Distributed Systems
Distributed system
A collection of independent computers that
appears to its users as a single coherent system.
Two aspects of this definition:
1. Hardware: the machines are autonomous.
2. Software: the users think they are dealing with a single system
(which is achieved by software).
Middleware was invented to meet practical
needs for building distributed systems
Cluster Computing Systems
Figure 1-6. An example of a cluster computing
system.
Grid Computing Systems
Figure 1-7. A layered architecture for grid computing systems.
Transaction Processing Systems (1)
Figure 1-8. Example primitives for
transactions.
Transaction Processing Systems (2)
Characteristic properties of transactions (ACID):
• Atomic: To the outside world, the transaction happens
indivisibly.
• Consistent: The transaction does not violate system
invariants.
• Isolated: Concurrent transactions do not interfere with
each other.
• Durable: Once a transaction commits, the changes are
permanent.
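Atomicity and consistency can be observed with any transactional store. The sketch below uses Python's built-in `sqlite3` module, which is a local database, not a distributed one. It also uses a hypothetical `account` table whose CHECK constraint plays the role of a system invariant. Used as a context manager, a `sqlite3` connection commits if the block succeeds and rolls back if the block raises. As a result, a transfer that would overdraw an account leaves both balances untouched.

```python
import sqlite3
from typing import Dict

def make_bank() -> sqlite3.Connection:
    """Hypothetical schema: balances may never go negative (a system invariant)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one transaction; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if the block raises
            # Credit first, so a failed debit must also undo this update.
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # the CHECK constraint was violated
        return False

def balances(conn: sqlite3.Connection) -> Dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM account"))
```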
Transaction Processing Systems (3)
Figure 1-9. A nested transaction.
Transaction Processing Systems (4)
Figure 1-10. The role of a TP monitor in
distributed systems.
Enterprise Application Integration
Figure 1-11. Middleware as a communication facilitator in enterprise
application integration.
Distributed Pervasive Systems
Home Systems
Location/context aware
Electronic Health Care Systems
Security
Sensor Networks
Data storage/aggregation/query
Sensor Networks (2)
Figure 1-13. Organizing a sensor network database, while
storing and processing data (a) only at the operator’s site or …
Sensor Networks (3)
Figure 1-13. Organizing a sensor network database, while
storing and processing data … or (b) only at the sensors.
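The two organizations in Figure 1-13 differ mainly in how much data crosses the network. This simplified sketch uses hypothetical readings and a query for the average value. In option (a), every raw reading is shipped to the operator. In option (b), each sensor reduces its readings to a `(sum, count)` pair before sending anything. This works because an average can be assembled from partial sums. Real sensor networks usually aggregate along a multi-hop routing tree, which this sketch does not model.

```python
from typing import Dict, List, Tuple

Readings = Dict[str, List[float]]  # sensor id -> locally stored readings

def average_at_operator(readings: Readings) -> Tuple[float, int]:
    """(a) Ship every raw reading to the operator; return (average, values sent)."""
    shipped = [r for values in readings.values() for r in values]
    return sum(shipped) / len(shipped), len(shipped)

def average_in_network(readings: Readings) -> Tuple[float, int]:
    """(b) Each sensor sends only (sum, count); return (average, values sent)."""
    partials = [(sum(values), len(values)) for values in readings.values()]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count, 2 * len(partials)
```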