Chapter 3: Naming and Threads
Introduction
A name in a distributed system is a string of
bits or characters used to refer to an entity.
To operate on an entity, it is necessary to
access it, for which we need an access point.
An access point is a special kind of entity in a
distributed system.
An entity can offer more than one access
point (e.g., a person reachable on several
phone numbers).
An entity may change its access points in
the course of time (e.g., moving to another
country).
Cont..
An address is a special kind of name: it
refers to an access point of an entity.
Other types of names are used to uniquely
identify an entity; these are referred to
as identifiers.
A true identifier is a name with the
following properties:
– 1. An identifier refers to at most one entity.
– 2. Each entity is referred to by at most one
identifier.
– 3. An identifier always refers to the same
entity (it is never reused).
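As a hedged illustration of these three properties, a system can mint true identifiers with a generator that never reuses values; the sketch below uses `java.util.UUID` (the class names `IdRegistry`, `register` and `lookup` are illustrative, not from any standard API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch: randomly generated UUIDs used as true identifiers.
// Properties 1 and 3 hold because the map binds each UUID to exactly
// one entity and the binding is never reused; property 2 holds as
// long as each entity is registered only once.
public class IdRegistry {
    private final Map<UUID, String> idToEntity = new HashMap<>();

    public UUID register(String entity) {
        UUID id = UUID.randomUUID();   // fresh identifier, never reused
        idToEntity.put(id, entity);
        return id;
    }

    public String lookup(UUID id) {
        return idToEntity.get(id);     // at most one entity per identifier
    }

    public static void main(String[] args) {
        IdRegistry reg = new IdRegistry();
        UUID a = reg.register("printer-1");
        UUID b = reg.register("printer-2");
        System.out.println(reg.lookup(a) + " / " + reg.lookup(b));
    }
}
```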
Name Spaces
A general naming graph with a single root node
Name Spaces
The general organization of the UNIX file system
implementation on a logical disk of contiguous disk
blocks
Linking and Mounting
The concept of a symbolic link in a naming graph
Linking and Mounting
Mounting remote name spaces through a specific access protocol
Name Space Distribution
An example partitioning of the DNS name space,
including Internet-accessible files, into three layers
Name Space Distribution
Item                              Global      Administrational   Managerial
Geographical scale of network     Worldwide   Organization       Department
Total number of nodes             Few         Many               Vast numbers
Responsiveness to lookups         Seconds     Milliseconds       Immediate
Update propagation                Lazy        Immediate          Immediate
Number of replicas                Many        None or few        None
Is client-side caching applied?   Yes         Yes                Sometimes
A comparison between name servers for implementing nodes from a large-
scale name space partitioned into a global layer, an administrational
layer, and a managerial layer
Implementation of Name Resolution
The principle of iterative name resolution
Implementation of Name Resolution
The principle of recursive name resolution
Implementation of Name Resolution
Server for   Should           Looks up   Passes to     Receives          Returns to
node         resolve                     child         and caches        requester
cs           <ftp>            #<ftp>     --            --                #<ftp>
vu           <cs,ftp>         #<cs>      <ftp>         #<ftp>            #<cs>
                                                                         #<cs,ftp>
nl           <vu,cs,ftp>      #<vu>      <cs,ftp>      #<cs>             #<vu>
                                                       #<cs,ftp>         #<vu,cs>
                                                                         #<vu,cs,ftp>
root         <nl,vu,cs,ftp>   #<nl>      <vu,cs,ftp>   #<vu>             #<nl>
                                                       #<vu,cs>          #<nl,vu>
                                                       #<vu,cs,ftp>      #<nl,vu,cs>
                                                                         #<nl,vu,cs,ftp>
Recursive name resolution of <nl, vu, cs, ftp>. Name servers
cache intermediate results for subsequent lookups
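The recursion in the table can be sketched over a toy naming graph: each "server" resolves the first label of the path itself and recursively asks the child server to resolve the remainder, returning the combined address. This is a minimal sketch only (server names, the string-based addresses `#<...>`, and the assumption that each node has a single child all simplify the real scheme, and caching is omitted):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of recursive name resolution over a toy naming graph.
// Node names (root, nl, vu, cs, ftp) follow the table above; in this
// toy, the name of a child server equals its label.
public class RecursiveResolver {
    // child tables: server name -> (label -> child server name)
    private static final Map<String, Map<String, String>> children = new HashMap<>();
    static {
        children.put("root", Map.of("nl", "nl"));
        children.put("nl", Map.of("vu", "vu"));
        children.put("vu", Map.of("cs", "cs"));
        children.put("cs", Map.of("ftp", "ftp"));
    }

    // A server resolves the first label itself and recursively asks the
    // child server to resolve the remainder, returning the full address.
    public static String resolve(String server, List<String> path) {
        if (path.size() == 1) {
            return "#<" + path.get(0) + ">";   // address of the leaf node
        }
        String child = children.get(server).get(path.get(0));
        String rest = resolve(child, path.subList(1, path.size()));
        // combine: prefix this label to the address returned by the child
        return "#<" + path.get(0) + "," + rest.substring(2);
    }

    public static void main(String[] args) {
        System.out.println(resolve("root", List.of("nl", "vu", "cs", "ftp")));
    }
}
```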
Implementation of Name Resolution
The comparison between recursive and iterative name
resolution with respect to communication costs
The DNS Name Space
Type of record   Associated entity   Description
SOA              Zone                Holds information on the represented zone
A                Host                Contains an IP address of the host this node represents
MX               Domain              Refers to a mail server to handle mail addressed to this node
SRV              Domain              Refers to a server handling a specific service
NS               Zone                Refers to a name server that implements the represented zone
CNAME            Node                Symbolic link with the primary name of the represented node
PTR              Host                Contains the canonical name of a host
HINFO            Host                Holds information on the host this node represents
TXT              Any kind            Contains any entity-specific information considered useful
The most important types of resource records forming
the contents of nodes in the DNS name space
Home-Based Approaches
The principle of Mobile IP
Hierarchical Approaches
Hierarchical organization of a location service into
domains, each having an associated directory node
Java RMI
Java Remote Method Invocation (Java
RMI) is an extension of the Java object
model to support distributed objects.
• Methods of remote Java objects can be invoked from other Java
virtual machines, possibly on different hosts.
RMI uses object serialization to marshal
and unmarshal parameters and results.
• Any serializable object can be used as a
parameter or method return value.
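The serialization round-trip that RMI relies on can be sketched directly with the standard `java.io` streams (`MarshalDemo` and its method names are illustrative; RMI performs the equivalent steps internally):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

// Sketch: the serialization round-trip RMI performs when it marshals a
// parameter on the caller side and unmarshals it on the server side.
// Any Serializable object (here a List of Strings) can cross the wire.
public class MarshalDemo {
    public static byte[] marshal(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);          // flatten the object graph to bytes
        }
        return bytes.toByteArray();
    }

    public static Object unmarshal(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();        // rebuild the object graph
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] wire = marshal(new java.util.ArrayList<>(List.of("a", "b")));
        System.out.println(unmarshal(wire)); // prints [a, b]
    }
}
```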
RMI Architecture & Components
The remote reference module is responsible for
providing addressing to the proxy (stub) object.
The proxy implements the stub and provides
transparency to the client.
The communication module is responsible for
networking.
The dispatcher selects the proper skeleton and
forwards the message to it.
The skeleton unmarshals the request and calls the
remote object.
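These pieces can be wired together in a single JVM as a hedged sketch: `UnicastRemoteObject.exportObject` produces the stub (proxy), the RMI registry plays the role of the binder, and the invocation travels through the proxy to the remote object. The interface name `Echo`, the implementation `EchoServer`, and the binding name "echo" are all illustrative; running both sides in one process is only for demonstration.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Sketch: client and server side of a Java RMI call in one JVM.
public class RmiSketch {
    public interface Echo extends Remote {           // remote interface
        String echo(String s) throws RemoteException;
    }

    public static class EchoServer implements Echo { // remote object
        public String echo(String s) { return "echo: " + s; }
    }

    public static String callViaRmi(String msg) throws Exception {
        Registry registry = LocateRegistry.createRegistry(0); // anonymous port
        EchoServer server = new EchoServer();
        Echo stub = (Echo) UnicastRemoteObject.exportObject(server, 0); // proxy/stub
        registry.rebind("echo", stub);                // server side: register the stub
        Echo proxy = (Echo) registry.lookup("echo");  // client side: obtain the proxy
        String reply = proxy.echo(msg);               // invocation goes through the proxy
        UnicastRemoteObject.unexportObject(server, true);   // clean up so the JVM can exit
        UnicastRemoteObject.unexportObject(registry, true);
        return reply;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callViaRmi("hi"));
    }
}
```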
RMI Architecture and Components
RMI Steps
RMI Implementation
Process and Threads
To execute a program, the OS creates a
number of virtual processors.
Process: a program in execution.
Execution environment:
• an address space
• higher-level resources
• CPU context
Process creation is expensive.
Context switching is expensive (it may also
require memory swapping).
The solution:
• split a process into threads
Cont..
Threads: no attempt to provide
concurrency transparency.
Threads are executed in the same address
space: threads of the same process share
the same execution environment.
A thread context is only the CPU context
(plus some information for thread management).
Little performance overhead, but threads are
harder to use: they require extra effort to
protect against each other.
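The shared address space, and the extra effort it demands, can be sketched with two Java threads updating one counter (the class `SharedCounter` is illustrative). Without the `synchronized` keyword the increments could interleave and lose updates:

```java
// Sketch: two threads in one process sharing the same address space.
// Both threads operate on the very same counter object; synchronized
// blocks protect the shared state against lost updates.
public class SharedCounter {
    private long count = 0;

    public synchronized void increment() { count++; }
    public synchronized long get() { return count; }

    public static long runTwoThreads(int perThread) throws InterruptedException {
        SharedCounter c = new SharedCounter();
        Runnable work = () -> { for (int i = 0; i < perThread; i++) c.increment(); };
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();     // both threads see the same counter
        t1.join();  t2.join();
        return c.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTwoThreads(100_000)); // prints 200000
    }
}
```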
More on process creation
Two issues
Creation of an execution environment
(and an initial thread within it)
• Share
• Copy
• Copy-on-Write
Choice of target host (in distributed
systems)
• Transfer Policy (local or remote host)
• Location Policy (to which node to transfer)
• Load sharing (sender vs receiver initiated)
Introduction: Processes vs Threads
Creating a new thread within an existing
process is cheaper than creating a new
process (by a factor of ~10-20).
Traditional Unix processes
Child processes are created from a parent
process using the fork system call.
Drawbacks:
• fork is expensive: memory is copied from the parent
to its children.
• Inter-process communication is needed to pass
information between parent and children.
Cont..
Switching to a different thread within the
same process is cheaper than switching
between threads belonging to different
processes (by a factor of 5-50).
Threads within a process can share data
and other resources conveniently and
efficiently compared with separate
processes.
Threads within a process are not
protected from one another.
Thread implementation
Threads can be implemented:
in the OS kernel (Windows NT, Solaris, Mach)
at user level (e.g., by a thread library: C
threads, pthreads), or in the language (Java)
Advantages of user-level threads:
lightweight: no system calls needed
the scheduler can be tailored to the application
low cost enables more threads to be employed
Disadvantages:
not pre-emptive; threads belonging to different processes
cannot be scheduled under a single scheme
cannot exploit multiple processors
a blocking system call (e.g., on a page fault) blocks the
process and thus all of its threads
User-space solution
User-level threads have nothing to do with
the kernel, so all thread operations can be
handled completely within a single process;
implementations can therefore be extremely
efficient.
All services provided by the kernel are
done on behalf of the process in which a
thread resides.
In practice we want to use threads when
there are many external events: threads
should block on a per-event basis, but the
kernel cannot distinguish threads and so
blocks the whole process.
Kernel solution
The whole idea is to have the kernel contain
the implementation of a thread package.
Operations that block a thread are no
longer a problem: the kernel schedules
another available thread within the same
process.
Handling external events is simple: the
kernel (which catches all events) schedules
the thread associated with the event.
The big problem is the loss of efficiency:
each thread operation requires a trap to
the kernel.
Conclusion: mix user-level and kernel-level threads.
Cont..
Use lightweight processes (LWPs) that can execute
user-level threads.
An LWP runs in the context of a single (heavyweight)
process, together with a user-level thread package
(operations to create, destroy and synchronize threads).
Threads are assigned to LWPs (hidden from the
programmer).
When an LWP is created (through a system call), it gets
its own stack and executes a scheduling routine that
searches for a runnable thread.
If there are several LWPs, each executes the scheduler;
they share a thread table (holding the current set of
threads), with synchronization among them done in user space.
Cont.. (LWP)
When a thread calls a blocking user-level operation
(e.g., blocks on a mutex or a condition variable), it calls
the scheduling routine; when another runnable thread
is found, a context switch is made to that thread, which
is then bound to the same LWP (the LWP that is
executing the thread need not be informed).
When a user-level thread makes a blocking system call,
the LWP that is executing that thread blocks, and the
thread remains bound to the LWP.
The kernel can simply schedule another LWP that has a
runnable thread bound to it; that thread can in turn
switch to any other runnable thread currently in user space.
When there are no threads to schedule, an LWP may
remain idle, and may even be removed (destroyed) by
the kernel.
Multithreaded Servers
Why threads?
In a single-threaded process, whenever a
blocking system call is executed, the
process as a whole is blocked.
Threads allow a program to exploit a
multiprocessor system (assign each thread
to a different CPU).
Example
Each request takes on average 2 msecs of
processing and 8 msecs of I/O delay (no
caching).
Maximum server throughput (measured as
client requests handled per second):
Single thread
– Turnaround time for each request: 2 + 8 = 10
msecs
– Throughput: 100 req/sec
Two threads (if disk requests are serialized)
– Turnaround time for each request: 8 msecs
– Throughput: 125 req/sec
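The arithmetic behind these two figures can be worked out directly: with one thread the CPU and I/O times add up, while with two threads the CPU work of one request overlaps the I/O of another, so the serialized disk (8 msecs) becomes the bottleneck. The class and method names below are illustrative:

```java
// Worked version of the throughput example above.
public class Throughput {
    public static int singleThreaded(int cpuMs, int ioMs) {
        return 1000 / (cpuMs + ioMs);        // 1000 / (2 + 8) = 100 req/sec
    }

    public static int twoThreadsSerializedDisk(int cpuMs, int ioMs) {
        return 1000 / Math.max(cpuMs, ioMs); // disk-bound: 1000 / 8 = 125 req/sec
    }

    public static void main(String[] args) {
        System.out.println(singleThreaded(2, 8));           // prints 100
        System.out.println(twoThreadsSerializedDisk(2, 8)); // prints 125
    }
}
```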
Cont..
A pool of "worker" threads processes the
requests.
One I/O (dispatcher) thread receives
requests from a collection of ports and
places them on a shared request queue
for retrieval by the workers.
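A hedged sketch of this dispatcher/worker organization, using a `BlockingQueue` as the shared request queue (requests are plain strings here, and the class name `WorkerPool` is illustrative):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;

// Sketch: one dispatcher thread places requests on a shared queue;
// a pool of worker threads takes them off and processes them.
public class WorkerPool {
    public static ConcurrentLinkedQueue<String> process(String[] requests, int workers)
            throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        ConcurrentLinkedQueue<String> done = new ConcurrentLinkedQueue<>();
        CountDownLatch latch = new CountDownLatch(requests.length);

        for (int i = 0; i < workers; i++) {               // worker threads
            Thread w = new Thread(() -> {
                try {
                    while (true) {
                        String req = queue.take();        // blocks until work arrives
                        done.add("handled " + req);
                        latch.countDown();
                    }
                } catch (InterruptedException e) { /* pool shutdown */ }
            });
            w.setDaemon(true);                            // dies with the JVM
            w.start();
        }

        Thread dispatcher = new Thread(() -> {            // dispatcher thread
            for (String r : requests) {
                try { queue.put(r); } catch (InterruptedException e) { return; }
            }
        });
        dispatcher.start();

        latch.await();                                    // all requests handled
        return done;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(process(new String[]{"a", "b", "c"}, 2).size()); // prints 3
    }
}
```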
Improve Performance
Starting a thread to handle an incoming
request is much cheaper than starting a
new process.
A single-threaded server prohibits simply
scaling the server to a multiprocessor
system.
Threads hide network latency: the server
can react to the next request while the
previous one is being answered.
Better Structure
Most servers have high I/O demands;
using simple, well-understood blocking
calls simplifies the overall structure.
Multithreaded programs tend to be
smaller and easier to understand because
of the simplified flow of control.
Multithreaded Clients
Multithreaded Web client
The Web browser scans an incoming HTML page
and finds that more files need to be fetched.
Each file is fetched by a separate thread, each
issuing a (blocking) HTTP request.
As files come in, the browser displays them.
Multiple RPCs
A client performs several RPCs at the same time,
each one in a different thread,
and then waits until all results have been returned.
Note: if the RPCs go to different servers, we may
get a linear speed-up compared to doing the RPCs
one after the other.
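A hedged sketch of issuing several (simulated) RPCs concurrently and then collecting all replies; `fetch()` stands in for a real remote call, and all names here are illustrative:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Sketch: several simulated RPCs issued concurrently, then the client
// waits until every result has been returned.
public class ParallelRpc {
    static String fetch(String server) {
        try { Thread.sleep(50); }              // simulate network + server delay
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return "reply from " + server;
    }

    public static List<String> callAll(List<String> servers) {
        List<CompletableFuture<String>> calls = servers.stream()
                .map(s -> CompletableFuture.supplyAsync(() -> fetch(s))) // one call per task
                .collect(Collectors.toList());
        return calls.stream()
                .map(CompletableFuture::join)   // wait for every reply
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(callAll(List.of("s1", "s2", "s3")));
    }
}
```

Because all the simulated delays overlap, the total wait is roughly one call's delay rather than the sum, which is the linear speed-up the slide mentions.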
Thread Functions
Thread(ThreadGroup group, Runnable target,
String name)
• Creates a new thread in the SUSPENDED state.
setPriority(int newPriority), getPriority()
• Set and return the thread's priority.
run(): A thread executes the run() method of its
target object.
start(): Changes the state of the thread from
SUSPENDED to RUNNABLE.
sleep(int millisecs): Causes the thread to enter
the SUSPENDED state for the specified time.
yield(): Enters the READY state and invokes the
scheduler.
destroy(): Destroys the thread.
Java thread lifetimes
A new thread is created in the same JVM as
its creator, in the SUSPENDED state.
start() makes it runnable
• It executes the run() method of an object
designated in its constructor.
A thread ends its life when it returns
from run() or when its destroy() method
is called.
Threads execute on top of the OS.
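This lifetime can be sketched with the standard java.lang.Thread API; note that the real JVM's state names (NEW, TERMINATED) differ slightly from the SUSPENDED/RUNNABLE terminology above, and the class name `Lifecycle` is illustrative:

```java
// Sketch of the lifetime described above: a thread is created, start()
// makes it runnable, it executes the run() method of the Runnable given
// to its constructor, and it ends when run() returns.
public class Lifecycle {
    public static String runOnce() throws InterruptedException {
        StringBuilder log = new StringBuilder();
        Thread t = new Thread(() -> log.append("run() executed"));
        log.append("state before start: ").append(t.getState()).append("; ");
        t.start();          // NEW -> RUNNABLE; the JVM schedules run()
        t.join();           // wait until run() returns and t terminates
        log.append("; state after join: ").append(t.getState());
        return log.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runOnce());
    }
}
```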
Servers
Basic model: a server is a process that
waits for incoming service requests at a
specific transport address.
Iterative server: the server itself
handles the request.
Concurrent server: does not handle
the request itself, but passes it to a
separate thread or another process, and
then immediately waits for the next
request.
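A hedged sketch of a concurrent server: the accept loop only hands each connection to a fresh thread and immediately waits for the next request. The one-line echo "service", the class name `ConcurrentServer`, and running client and server in one JVM are all illustrative simplifications:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch: a concurrent server. The main loop only accepts connections
// and passes each one to a separate thread.
public class ConcurrentServer {
    public static int start() throws Exception {
        ServerSocket listener = new ServerSocket(0);        // OS-assigned endpoint
        Thread acceptLoop = new Thread(() -> {
            try {
                while (true) {
                    Socket conn = listener.accept();        // wait for a request
                    new Thread(() -> handle(conn)).start(); // pass to a thread
                }
            } catch (Exception e) { /* listener closed */ }
        });
        acceptLoop.setDaemon(true);
        acceptLoop.start();
        return listener.getLocalPort();
    }

    static void handle(Socket conn) {
        try (conn;
             BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
             PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
            out.println("echo: " + in.readLine());          // the "service"
        } catch (Exception e) { /* connection dropped */ }
    }

    public static void main(String[] args) throws Exception {
        int port = start();
        try (Socket c = new Socket("localhost", port);
             PrintWriter out = new PrintWriter(c.getOutputStream(), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(c.getInputStream()))) {
            out.println("hello");
            System.out.println(in.readLine()); // prints "echo: hello"
        }
    }
}
```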
Endpoints
Clients send requests to an endpoint (port)
at the machine where the server is running.
There is one server per endpoint.
Cont..
Stateless server: does not keep information
on the state of its clients, and can change its
own state without informing its clients (e.g., a
Web server).
Examples:
Don't record whether a file has been opened (simply close it
again after access).
Don't keep track of your clients.
Clients and servers are completely independent.
State inconsistencies due to client or server
crashes are reduced.
There is a possible loss of performance because, e.g.,
the server cannot anticipate client behaviour (think
of prefetching file blocks).
Cont..
Stateful server: maintains information
about its clients.
Examples:
Record that a file has been opened, so that
prefetching can be done.
Know which data a client has cached, and
allow clients to keep local copies of shared
data.
The performance of stateful servers can
be extremely high, provided clients are
allowed to keep local copies.
Cookies?
Code Migration
Reasons for Migrating Code
Performance
• Load balancing
• Process data close to where they reside
• Parallelism (e.g., web search)
Flexibility/dynamic configuration
• Dynamically downloading client-side software
Models for Code Migration
What is moved?
The three segments of a process:
Code segment: the part that contains
the set of instructions that make up the
program that is being executed.
Resource segment: references to
external resources needed by the process
(e.g., files, devices, other processes)
Execution segment: the current
execution state of the process (program
counter, stack, etc)
Cont..
Weak mobility: move only the code
segment (plus perhaps some
initialization data).
Execution always starts from the initial state.
Example: Java applets
Variants: code shipping (push) vs. code fetching (pull)
Strong mobility: also move the
execution segment.
The process resumes execution from where
it left off.
Harder to implement.
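Weak mobility can be sketched with Java serialization: only the task's initialization data migrates, and execution restarts from the initial state at the receiver. This is a simplification. Java serialization ships field values, not bytecode, so the receiving JVM must already have the class available (much as an applet's code is fetched separately). The names `WeakMobility` and `Task` are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch of weak mobility: code identity plus initialization data move,
// and execution always starts from the task's initial state.
public class WeakMobility {
    public static class Task implements Serializable, Runnable {
        final int n;                       // initialization data that migrates
        transient long result;             // execution state does NOT migrate
        public Task(int n) { this.n = n; }
        public void run() {                // always restarts from the beginning
            result = 0;
            for (int i = 1; i <= n; i++) result += i;
        }
    }

    public static long migrateAndRun(Task t) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);            // "send" the task
        }
        Task arrived;
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            arrived = (Task) in.readObject(); // "receive" at the target machine
        }
        arrived.run();                     // execution starts from the initial state
        return arrived.result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(migrateAndRun(new Task(100))); // prints 5050
    }
}
```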
Cont..
Who initiates the movement?
Sender-initiated: migration is initiated
at the machine where the code currently
resides or is being executed.
Examples: uploading programs, sending
programs across the Internet.
Simpler to implement.
Receiver-initiated: the initiative for
migration is taken by the target machine.
Example: Java applets
The End
Thanks