Chapter 3
Process
12/17/24
Introduction
A process is an instance of a computer program that is being executed; in short, a program in execution.
It contains the program code and its current activity.
communication takes place between processes
from the OS perspective, management and scheduling of processes are important
other important issues arise in distributed systems
multithreading to enhance performance
how clients and servers are organized
process or code migration to achieve scalability and to dynamically configure clients and servers
3.1 Threads and their Implementation
threads can be used in both distributed and nondistributed systems
Threads in Nondistributed Systems
a process has an address space (containing program text and data)
and a single thread of control, as well as other resources such as
open files, child processes, accounting information, etc.
Figure: three processes each with one thread; one process with three threads
each thread has its own program counter, registers, stack,
and state; but all threads of a process share address space,
global variables and other resources such as open files, etc.
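The sharing described above can be sketched with Python's standard threading module. This is only an illustration under assumed names (counter, worker, run_threads are not from the slides): each thread keeps its own local variable on its own stack, while the global counter lives in the address space shared by all threads.

```python
import threading

counter = 0                      # global variable: shared by all threads of the process
lock = threading.Lock()          # mutex protecting the shared counter


def worker(n_increments):
    global counter
    local_total = 0              # local variable: lives on this thread's own stack
    for _ in range(n_increments):
        local_total += 1
    with lock:                   # only one thread updates the shared variable at a time
        counter += local_total


def run_threads(n_threads=3, n_increments=1000):
    """Start n_threads threads in one process and return the shared counter."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(n_increments,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Calling run_threads(3, 1000) corresponds to the "one process with three threads" case: all three threads add into the same variable.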
threads are particularly useful to structure large applications into parts
that could be logically executed at the same time.
Allowing multiple threads of execution to take place in the same process
environment is called multithreading
Thread Usage – Why do we need threads?
a word processor has different parts; parts for
interacting with the user
formatting the page as soon as changes are made
timed savings (for auto recovery)
spelling and grammar checking, etc.
1. They simplify the programming model, since many activities are going on at once
2. They are easier to create and destroy than processes, since they do not have any resources attached to them
3. Performance improves by overlapping activities when there is a lot of I/O; i.e., the process as a whole does not block while one thread waits for input or performs a long calculation, say in a spreadsheet
4. Real parallelism is possible in a multiprocessor system
Thread Implementation
threads are usually provided in the form of a thread package
the package contains operations to create and destroy a thread,
operations on synchronization variables such as mutexes and
condition variables
two approaches to constructing a thread package
a. construct a thread library that is executed entirely in user mode (the OS is not aware of threads)
cheap to create and destroy threads; just allocate and free memory
context switching can be done using a few instructions; store and reload only CPU register values
disadvantage: invocation of a blocking system call will block the entire process to which the thread belongs, and thus all other threads in that process
b. implement them in the OS’s kernel
let the kernel be aware of threads and schedule them
expensive for thread operations such as creation and deletion, since each requires a system call
solution: use a hybrid form of user-level and kernel-level threads, called lightweight processes (LWPs)
an LWP runs in the context of a single (heavyweight) process, and there can be several LWPs per process
the system also offers a user-level thread package for operations such as creating and destroying threads and for thread synchronization (mutexes and condition variables)
the thread package can be shared by multiple LWPs
The important point is that the thread package is implemented entirely in user space. In other words, all operations on threads are carried out without intervention of the kernel.
Figure: combining kernel-level lightweight processes and user-level threads
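The synchronization operations a thread package offers (mutexes and condition variables) can be illustrated with a small producer/consumer buffer. This is a minimal sketch using Python's threading.Condition, which bundles a condition variable with its own mutex; the class name BoundedBuffer and the demo function are assumptions made for this example.

```python
import threading
from collections import deque


class BoundedBuffer:
    """A tiny producer/consumer buffer guarded by a mutex and a condition variable."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.cond = threading.Condition()    # condition variable with its own mutex

    def put(self, item):
        with self.cond:                      # acquire the mutex
            while len(self.items) >= self.capacity:
                self.cond.wait()             # release the mutex and sleep until notified
            self.items.append(item)
            self.cond.notify_all()           # wake up threads waiting in get()

    def get(self):
        with self.cond:
            while not self.items:
                self.cond.wait()
            item = self.items.popleft()
            self.cond.notify_all()           # wake up threads waiting in put()
            return item


def demo(n_items=5):
    """One producer (the calling thread) and one consumer thread."""
    buf = BoundedBuffer(capacity=2)
    received = []

    def consumer():
        for _ in range(n_items):
            received.append(buf.get())

    t = threading.Thread(target=consumer)
    t.start()
    for i in range(n_items):
        buf.put(i)
    t.join()
    return received
```

The while loops around wait() re-check the condition after every wake-up, which is the usual way to use a condition variable.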
Threads in Distributed Systems
main contribution of threads in distributed systems is that they allow clients and
servers to be constructed such that communication and local processing can overlap,
resulting in a high level of performance.
Multithreaded Clients
consider a Web browser; fetching the different parts of a page can each be implemented as a separate thread, each opening its own TCP/IP connection to the server or to separate, replicated servers
each thread can display its results as soon as it gets its part of the page
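A multithreaded client of this kind might look roughly as follows. To keep the sketch self-contained, fetch_part is a hypothetical stand-in that only simulates network latency with a short sleep; a real browser would open a connection per part instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_part(name, delay=0.05):
    """Hypothetical stand-in for fetching one part of a page over its own connection."""
    time.sleep(delay)                 # simulates waiting for the server
    return f"<content of {name}>"


def fetch_page(parts):
    """Fetch all parts concurrently, one thread per part."""
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        # map() returns the results in the same order as `parts`.
        return list(pool.map(fetch_part, parts))
```

Because the threads wait for their parts at the same time, the total waiting time is close to that of the slowest part rather than the sum over all parts.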
Multithreaded Servers
servers can be constructed in three ways
a. single-threaded process
it gets a request, examines it, carries it out to completion before getting
the next request
the server is idle while waiting for disk read, i.e., system calls are blocking
b. threads
threads are even more useful for implementing servers
e.g., a file server
the dispatcher thread reads incoming requests from clients and passes each one to an idle worker thread
the worker thread is selected by the server to process the request
Figure: a multithreaded server organized in a dispatcher/worker model
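The dispatcher/worker organization can be sketched with a shared queue. This is an illustration only: handle is a hypothetical request handler, and the requests are plain strings rather than messages read from a network.

```python
import queue
import threading


def handle(request):
    """Hypothetical request handler; a real file server would read from disk here."""
    return request.upper()


def worker(requests, results):
    while True:
        item = requests.get()            # blocks until the dispatcher hands over work
        if item is None:                 # sentinel: no more requests
            break
        req_id, request = item
        results[req_id] = handle(request)


def serve(incoming, n_workers=3):
    """Dispatcher: pass every incoming request to the pool of worker threads."""
    requests = queue.Queue()
    results = {}
    workers = [threading.Thread(target=worker, args=(requests, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for req_id, request in enumerate(incoming):
        requests.put((req_id, request))  # whichever worker is idle picks it up
    for _ in workers:
        requests.put(None)               # one sentinel per worker
    for w in workers:
        w.join()
    return [results[i] for i in range(len(incoming))]
```

A worker that blocks inside handle does not stop the dispatcher or the other workers, which is the point of this model.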
c. finite-state machine
if threads are not available
it gets a request, examines it, tries to fulfill the request from cache, else
sends a request to the file system; but instead of blocking it records the
state of the current request and proceeds to the next request
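The finite-state machine model can be sketched as a single-threaded event loop. The event format used here, ("request", id, filename) and ("disk_reply", id, data), is an assumption made for this example and stands in for nonblocking system calls and their completions.

```python
from collections import deque


def run_fsm_server(events, cache):
    """Single-threaded server that never blocks on the file system."""
    pending = {}        # saved state of requests waiting for the file system
    replies = []        # (request id, data) in the order replies are sent
    queue = deque(events)
    while queue:
        kind, req_id, payload = queue.popleft()
        if kind == "request":
            if payload in cache:
                replies.append((req_id, cache[payload]))   # fulfilled from the cache
            else:
                pending[req_id] = payload                  # record state, move on
        elif kind == "disk_reply":
            filename = pending.pop(req_id)                 # restore the saved state
            cache[filename] = payload
            replies.append((req_id, payload))
    return replies
```

Request 1 below misses the cache and is parked in pending; the server immediately goes on to request 2 and only completes request 1 when the disk reply arrives:

```python
events = [("request", 1, "a.txt"), ("request", 2, "c.txt"),
          ("disk_reply", 1, "AAA"), ("request", 3, "a.txt")]
print(run_fsm_server(events, {"c.txt": "CCC"}))
# [(2, 'CCC'), (1, 'AAA'), (3, 'AAA')]
```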
Summary
Model                     Characteristics
Single-threaded process   No parallelism, blocking system calls
Threads                   Parallelism, blocking system calls (thread only)
Finite-state machine      Parallelism, nonblocking system calls
Table: three ways to construct a server
3.2 Anatomy of Clients
Two issues: user interfaces and client-side software for
distribution transparency
a. User Interfaces
to create a convenient environment for the interaction of a
human user and a remote server; e.g. mobile phones with
simple displays and a set of keys
GUIs are most commonly used
The X Window System (or simply X)
it has the X kernel: the part of the OS that controls the terminal (monitor, keyboard, pointing device such as a mouse) and is hardware dependent
the X kernel contains all terminal-specific device drivers; its interface is made available to applications through a library called Xlib
Figure: the basic organization of the X Window System
b. Client-Side Software for Distribution Transparency
in addition to the user interface, parts of the processing and data
level in a client-server application are executed at the client side
an example is embedded client software for ATMs, cash registers,
etc.
moreover, client software can also include components to achieve
distribution transparency
e.g., replication transparency
assume a distributed system with replicated servers; a client-side proxy can send each request to every replica, transparently collect all responses, and pass a single return value to the client application
Figure: transparent replication of a server using a client-side solution
access transparency and failure transparency can also be
achieved using client-side software
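A client-side proxy for replication transparency could be sketched as below. The replicas are plain Python callables standing in for remote servers, and returning the most common response is one possible way (an assumption of this sketch, not something the slides prescribe) to reduce several responses to a single return value. Skipping replicas that raise ConnectionError also hints at how failure transparency can be handled on the client side.

```python
from collections import Counter


class ReplicatedProxy:
    """Client-side proxy that hides the fact that the server is replicated."""

    def __init__(self, replicas):
        self.replicas = replicas          # callables standing in for remote servers

    def invoke(self, request):
        responses = []
        for replica in self.replicas:     # forward the request to every replica
            try:
                responses.append(replica(request))
            except ConnectionError:
                continue                  # a failed replica is hidden from the application
        if not responses:
            raise ConnectionError("no replica answered")
        # Collapse all responses into one return value (here: the most common one).
        value, _count = Counter(responses).most_common(1)[0]
        return value
```

The client application only calls proxy.invoke(request) and never sees how many replicas exist.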
3.3 Servers and Design Issues
3.3.1 General Design Issues
How to organize servers?
Where do clients contact a server?
Whether and how a server can be interrupted
Whether or not the server is stateless
a. How to organize servers?
Iterative server
the server itself handles the request and returns the
result
Concurrent server
it passes a request to a separate process or thread and
waits for the next incoming request; e.g., a multithreaded
server; or by forking a new process as is done in Unix
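As a rough sketch of a concurrent server, assuming Python's standard socketserver module: ThreadingTCPServer hands each incoming connection to a new thread and goes back to waiting for the next one. Replacing it with socketserver.TCPServer would give an iterative server that handles one request at a time. The handler name EchoHandler and its upper-casing "service" are made up for illustration.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.StreamRequestHandler):
    def handle(self):
        line = self.rfile.readline()          # one request per connection
        self.wfile.write(line.upper())        # the "work": reply in upper case


def start_concurrent_server():
    # Port 0 asks the local OS to pick a free port for this server.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
    server.daemon_threads = True              # handler threads do not block exit
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request(address, text):
    """Minimal client: send one line and read one line back."""
    with socket.create_connection(address, timeout=5) as conn:
        conn.sendall(text.encode() + b"\n")
        return conn.makefile("rb").readline().decode().strip()
```

A client would use it as request(server.server_address, "hello"), and the server is stopped with server.shutdown() followed by server.server_close().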
b. Where do clients contact a server?
using endpoints (ports) at the machine where the server is running; each server listens to a specific endpoint
how do clients know the endpoint of a service?
globally assign endpoints for well-known services; e.g., FTP is on TCP port 21, HTTP is on TCP port 80
for services that do not require a preassigned endpoint, an endpoint can be dynamically assigned by the local OS
IANA (Internet Assigned Numbers Authority) Ranges
IANA divides the port numbers into three ranges
Well-known ports (0 to 1023): assigned and controlled by IANA for standard services, e.g., DNS uses port 53
Registered ports (1024 to 49151): not controlled by IANA, but can be registered with IANA to prevent duplication, e.g., MySQL uses port 3306
Dynamic or ephemeral ports (49152 to 65535): neither controlled nor registered by IANA
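The three IANA ranges can be captured in a few lines. The function name classify_port is made up for this sketch; the boundaries are the standard ones (0 to 1023, 1024 to 49151, 49152 to 65535).

```python
def classify_port(port):
    """Return the IANA range a TCP/UDP port number falls into."""
    if not 0 <= port <= 65535:
        raise ValueError("port must be in 0..65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"
```

For the examples above, classify_port(53) and classify_port(80) give "well-known", and classify_port(3306) gives "registered".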
how can the client learn a dynamically assigned endpoint? two approaches
i. have a daemon running on the server machine and listening to a well-known endpoint; it keeps track of the endpoints of all services on that machine
the client first contacts the daemon, which provides it with the endpoint, and then contacts the specific server
Figure: client-to-server binding using a daemon
ii. use a superserver (such as inetd in UNIX) that listens to all endpoints and forks a process to take care of each incoming request; this avoids having many servers running simultaneously with most of them idle
Figure: client-to-server binding using a superserver
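The two-step binding of approach (i) can be mimicked in memory. This is only a sketch: EndpointDaemon, bind_client and the host name "server-host" are hypothetical, and a real daemon would answer lookups over the network on its own well-known endpoint.

```python
class EndpointDaemon:
    """Keeps track of the endpoint of every service running on its machine."""

    def __init__(self):
        self.table = {}

    def register(self, service, port):
        self.table[service] = port           # a server announces its assigned port

    def lookup(self, service):
        return self.table.get(service)       # None if the service is not registered


def bind_client(daemon, service):
    port = daemon.lookup(service)            # step 1: ask the daemon for the endpoint
    if port is None:
        raise LookupError(f"unknown service: {service}")
    return ("server-host", port)             # step 2: the client contacts this endpoint
```

With a superserver (approach ii), the lookup step disappears for the client: it contacts the known endpoint directly and the superserver starts the right server on demand.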
c. Whether and how a server can be interrupted
for instance, a user may want to interrupt a file transfer, maybe because it was the wrong file
let the user exit the client application; this will break the connection to the server; the server will tear down the connection, assuming that the client has crashed
or
let the client send out-of-band data, i.e., data to be processed by the server before any other data from that client; the server may listen on a separate control endpoint, or the data can be sent on the same connection as urgent data, as in TCP
d. Whether or not the server is stateless
a stateless server does not keep information on the state of its
clients; for instance a Web server
soft state: a server promises to maintain state for a limited time; e.g.,
to keep a client informed about updates; after the time expires, the
client has to poll
a stateful server maintains information about its clients; for instance, a file server that allows a client to keep a local copy of a file and perform update operations on it
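The soft-state idea described above (state that the server promises to keep only for a limited time) can be sketched as a table of leases. The class and method names are assumptions, and time is passed in explicitly as a number of seconds so that the behavior is easy to follow.

```python
class SoftStateServer:
    """Keeps per-client subscriptions only for a limited lease time."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.expiry = {}                      # client id -> time at which the state expires

    def subscribe(self, client, now):
        self.expiry[client] = now + self.lease_seconds

    def clients_to_notify(self, now):
        # Drop expired state first; those clients have to poll from now on.
        self.expiry = {c: t for c, t in self.expiry.items() if t > now}
        return sorted(self.expiry)
```

A stateless server would keep no such table at all, while a stateful server would keep the entries until they are explicitly deleted.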
3.3.2 Server Clusters
a server cluster is a collection of machines connected
through a network (normally a LAN with high bandwidth and
low latency) where each machine runs one or more servers
it is logically organized into three tiers: a logical switch, application (compute) servers, and distributed file or database systems
Figure: the general organization of a three-tiered server cluster
Distributed Servers
a server cluster has a single point of failure: if the logical switch (the single access point) fails, the cluster becomes unavailable
hence, several access points can be provided, with publicly available addresses, leading to a distributed server
e.g., the DNS can return several addresses for the same host
name
3.4 Code Migration
It is the act of moving a piece of code or a process from one machine to another
so far, communication was concerned with passing data
we may also pass programs, even while they are running and even in heterogeneous systems
code migration also involves moving data: when a program migrates while running, its status, pending signals, and other parts of its environment, such as the stack and the program counter, also have to be moved
Reasons for Migrating Code
to improve performance; move processes from heavily-
loaded to lightly-loaded machines (load balancing)
to reduce communication: move a client application that
performs many database operations to a server if the
database resides on the server; then send only results to the
client
to exploit parallelism without parallel programming, e.g., by sending several copies of a search program to different sites
to gain flexibility by dynamically configuring distributed systems: instead of deciding in advance which parts of a multitiered client-server application run where, code is moved to where it is needed
Figure: the principle of dynamically configuring a client to communicate with a server; the client first fetches the necessary software, and then invokes the server
Models for Code Migration
a process consists of three segments: code segment (set of
instructions), resource segment (references to external resources such
as files, printers, ...), and execution segment (to store the current
execution state of a process such as private data, the stack, the
program counter)
Weak Mobility
transfer only the code segment and maybe some initialization data; in this case a program always starts from its initial state, e.g., Java applets
execution can be by the target process (in its own address space, as with Java applets) or by a separate process
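Weak mobility can be imitated in a few lines: only source text and initialization data are shipped, and the target always starts the code from the beginning. In this sketch the "transfer" is just a string passed to a function, and the convention that the shipped code defines an entry point called main is an assumption. Running received code inside the target process, as here, would need a protected environment in practice.

```python
def run_migrated_code(source, init_data):
    """Target side: execute received code from its initial state in a fresh namespace."""
    namespace = {}
    exec(source, namespace)                   # load the code segment
    return namespace["main"](**init_data)     # start at the entry point with the init data


# Sender side: only the code segment (as text) and initialization data are shipped.
CODE = """
def main(numbers):
    return sum(n * n for n in numbers)
"""
```

For example, run_migrated_code(CODE, {"numbers": [1, 2, 3]}) returns 14. No execution state travels with the code, which is what distinguishes this from strong mobility.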
Strong Mobility
transfer the code and execution segments; this makes it possible to migrate a process in execution
can also be supported by remote cloning: an exact copy of the original process is created on a different machine and runs in parallel with the original; in UNIX this is done by forking a child process and letting the child continue on a remote machine
migration can be
sender-initiated: started by the machine where the code resides or is currently running; e.g., uploading programs to a server; may require authentication or that the client be registered
receiver-initiated: started by the target machine; e.g., Java applets; easier to implement
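True strong mobility needs support for capturing the stack and program counter of a running process. As a much simpler stand-in, the sketch below uses application-level checkpointing: the task keeps its execution state in explicit fields, which are serialized on the source machine and restored on the target so that the computation resumes where it stopped. The class CountingTask and its methods are made up for this illustration.

```python
import json


class CountingTask:
    """Adds up 0..limit-1; its execution state is held in explicit fields."""

    def __init__(self, limit, next_i=0, total=0):
        self.limit = limit
        self.next_i = next_i     # plays the role of the program counter
        self.total = total       # private data of the computation

    def step(self, n_steps):
        """Run at most n_steps iterations; return True when the task is finished."""
        for _ in range(n_steps):
            if self.next_i >= self.limit:
                break
            self.total += self.next_i
            self.next_i += 1
        return self.next_i >= self.limit

    def capture(self):
        """Source machine: serialize the execution state for transfer."""
        return json.dumps({"limit": self.limit, "next_i": self.next_i, "total": self.total})

    @classmethod
    def resume(cls, blob):
        """Target machine: rebuild the task and continue from the captured state."""
        return cls(**json.loads(blob))
```

After task.step(4) on one machine, CountingTask.resume(task.capture()) on another continues with iteration 4 instead of starting over, in contrast to weak mobility.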
Summary of models of code migration
Figure: alternatives for code migration
Migration and Local Resources
how to migrate the resource segment
not always possible to move a resource; e.g., a reference to a TCP port held by a process to communicate with other processes
Types of Process-to-Resource Bindings
Binding by identifier (the strongest): a resource is referred to by its identifier; e.g., a URL that refers to a Web page, or an FTP server referred to by its Internet (IP) address
Binding by value (weaker): when only the value of a resource is
needed; in this case another resource can provide the same
value; e.g., standard libraries of programming languages such
as C or Java which are normally locally available, but their
location in the file system may vary from site to site
Binding by type (weakest): a process needs a resource of a
specific type; reference to local devices, such as monitors,
printers, ...
in migrating code, the above bindings cannot change, but the references to resources can
how can a reference be changed? it depends on whether the resource can be moved along with the code, i.e., on the resource-to-machine binding
Types of Resource-to-Machine Bindings
Unattached Resources: can be easily moved with the migrating
program (such as data files associated with the program)
Fastened Resources: such as local databases and complete Web
sites; moving or copying may be possible, but very costly
Fixed Resources: intimately bound to a specific machine or
environment such as local devices and cannot be moved
we have nine combinations to consider
Process-to-resource    Resource-to-machine binding
binding                Unattached        Fastened          Fixed
By identifier          MV (or GR)        GR (or MV)        GR
By value               CP (or MV, GR)    GR (or CP)        GR
By type                RB (or GR, CP)    RB (or GR, CP)    RB (or GR)
Table: actions to be taken with respect to the references to local resources when migrating code to another machine
GR: Establish a global system wide reference
MV: Move the resource
CP: Copy the value of the resource
RB: Rebind process to a locally available resource
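The nine combinations and their actions can be written down as a lookup table. The dictionary below transcribes the table of actions, with the preferred action first and the alternatives after it; the names ACTIONS and preferred_action are made up for this sketch.

```python
# (process-to-resource binding, resource-to-machine binding) -> actions,
# preferred action first, alternatives in parentheses in the table after it.
ACTIONS = {
    ("identifier", "unattached"): ["MV", "GR"],
    ("identifier", "fastened"):   ["GR", "MV"],
    ("identifier", "fixed"):      ["GR"],
    ("value", "unattached"):      ["CP", "MV", "GR"],
    ("value", "fastened"):        ["GR", "CP"],
    ("value", "fixed"):           ["GR"],
    ("type", "unattached"):       ["RB", "GR", "CP"],
    ("type", "fastened"):         ["RB", "GR", "CP"],
    ("type", "fixed"):            ["RB", "GR"],
}


def preferred_action(process_binding, machine_binding):
    """Return the first-choice action for one of the nine combinations."""
    return ACTIONS[(process_binding, machine_binding)][0]
```

For instance, preferred_action("value", "unattached") gives "CP": a data file that is needed only for its value and is not tied to the machine is simply copied.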
Migration in Heterogeneous Systems
distributed systems are constructed on a heterogeneous collection of platforms,
each with its own OS and machine architecture
heterogeneity problems are similar to those of portability
easier in some languages
for scripting languages the source code is interpreted
for Java an intermediary code is generated by the compiler for a virtual machine
in weak mobility
since there is no runtime information, compile the source code for each
potential platform
in strong mobility
difficult to transfer the execution segment since it may contain platform-dependent information such as register values; see the textbook for possible solutions
Thank you!
Q&A