Chapter 2: ARCHITECTURES
Temesgen.H
Architectures
Distributed systems are complex.
In order to manage their intrinsic complexity,
distributed systems should be organized
properly.
Organization is mostly expressed in terms of the system's
software components.
There are two ways to look at the organization of
distributed systems:
Software architecture – logical organization (of
software components and interconnections)
System architecture – physical realization (the
instantiation of software components on real
machines)
Architectural style
An architectural style is formulated in terms of
Components,
The way that components are connected to each other,
The data exchanged between components, and finally
How these elements are jointly configured into a system.
A component is a modular unit with well-defined
interfaces that is replaceable within its environment.
A connector is a mechanism that mediates
communication, coordination, or cooperation
among components.
It allows for the flow of control between components
E.g., facilities for remote procedure call, message passing, or
streaming data.
Types of Architectural Styles
Common architectural styles of distributed
systems
• Layered architectures
• Object-based architectures
• Data-centered architectures
• Event-based architectures
The common technique:
Organize your system into logically different
components, and distribute those components over
the various machines.
Goal:
achieving (at a reasonable level) distribution
transparency.
Layered architectural style
It is a hierarchical organization
Components are organized in a layered fashion
Components of a layer make down-calls to components of the layer
below
Only in exceptional cases is an up-call made to a higher-level component
Each layer exposes an interface to be used by the layers above
“Multi-level client-server”
Each layer acts as a
Server: service provider to layers “above”
Client: service consumer of layer(s) “below”
Communication protocol stacks are typical examples
OSI Reference model
TCP/IP
Figure 2-1. (a) The layered architectural style.
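The down-call structure described above can be sketched in a few lines. This is a minimal, single-process sketch: the three layer classes and the "doubling" service are hypothetical, and in a real distributed system each down-call could cross a machine boundary.

```python
class DataLayer:
    """Lowest layer: stores raw values."""

    def __init__(self):
        self._store = {"x": 10}

    def read(self, key):
        return self._store[key]


class ProcessingLayer:
    """Middle layer: server to the layer above, client of the layer below."""

    def __init__(self, below):
        self.below = below

    def doubled(self, key):
        return 2 * self.below.read(key)  # down-call to the data layer


class InterfaceLayer:
    """Top layer: formats results for the user."""

    def __init__(self, below):
        self.below = below

    def show(self, key):
        return f"{key} doubled is {self.below.doubled(key)}"  # down-call


stack = InterfaceLayer(ProcessingLayer(DataLayer()))
print(stack.show("x"))  # x doubled is 20
```

Each class knows only the interface of the layer directly below it, so any layer can be replaced without touching the layers above, as long as its interface stays the same.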
Layered Architecture: Example
Object-Based Architectures
Components are objects
Objects can easily be replaced as long as their interface is not changed
It is less structured, and hence a relatively loose organization
The calling object might not run on the same machine as the
called object
Connectors are remote procedure calls (RPC) and remote method invocations (RMI)
Notes:
Layered and object-based styles are the most important styles for distributed
systems today
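A small sketch of the object-based style, assuming a hypothetical `Counter` interface. The caller depends only on the interface, so implementations can be swapped freely. `CounterProxy` stands in for the client-side part of an RMI connector; here it simply forwards the call locally, whereas a real RMI proxy would marshal the call and send it to the machine hosting the object.

```python
from abc import ABC, abstractmethod


class Counter(ABC):
    """The interface that callers depend on."""

    @abstractmethod
    def increment(self, amount: int) -> int: ...


class InMemoryCounter(Counter):
    def __init__(self):
        self.value = 0

    def increment(self, amount):
        self.value += amount
        return self.value


class LoggingCounter(Counter):
    """A drop-in replacement: same interface, different internals."""

    def __init__(self):
        self.value = 0
        self.log = []

    def increment(self, amount):
        self.log.append(amount)
        self.value += amount
        return self.value


class CounterProxy(Counter):
    """Client-side stand-in; forwards locally in this sketch."""

    def __init__(self, target: Counter):
        self._target = target

    def increment(self, amount):
        return self._target.increment(amount)


def client(counter: Counter) -> int:
    counter.increment(5)
    return counter.increment(3)


print(client(CounterProxy(InMemoryCounter())))  # 8
print(client(CounterProxy(LoggingCounter())))   # 8
```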
Data-Centered Architectures
Access and update of a data store is the main purpose of the
system.
Processes communicate/exchange information primarily by reading
and modifying data in some shared repository (e.g., a database or a
distributed file system).
For example, web-based distributed systems are largely data-centric
Components:
Data store,
Components that interact with the store
Connectors:
Queries
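A minimal sketch of the data-centered style: two hypothetical components (`order_taker` and `shipper`) never call each other; they communicate only through queries on a shared store. An in-memory SQLite database stands in for the shared repository here, which is an assumption made for the demo.

```python
import sqlite3

# The shared repository (stand-in for a shared database server or DFS).
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, status TEXT)")


def order_taker(item):
    """Component that only writes to the store; it never calls the shipper."""
    store.execute("INSERT INTO orders (item, status) VALUES (?, 'new')", (item,))
    store.commit()


def shipper():
    """Component that reads new orders from the store and updates them."""
    rows = store.execute(
        "SELECT id, item FROM orders WHERE status = 'new' ORDER BY id"
    ).fetchall()
    for order_id, _ in rows:
        store.execute("UPDATE orders SET status = 'shipped' WHERE id = ?", (order_id,))
    store.commit()
    return [item for _, item in rows]


order_taker("book")
order_taker("pen")
print(shipper())  # ['book', 'pen']
```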
Event-based Architecture
Event-based architecture supports publish-subscribe
communication
Publisher: components that announce data to be shared
Subscriber: components that register their interest in
published data.
Decouples sender and receiver (asynchronous
communication)
The two parties do not need to be active at the same time
An event can be considered as “a significant change in
state”
Components:
Can be an instance of a class or simply a module.
Connectors:
Event buses
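The publish-subscribe interaction can be sketched with a toy in-process event bus. The `EventBus` class and its methods are hypothetical, not a real library API. Each subscriber gets its own mailbox, so a publisher never waits for subscribers to process an event and does not know who (if anyone) receives it.

```python
from collections import defaultdict, deque


class EventBus:
    """Connector that routes published events to subscribers by topic."""

    def __init__(self):
        self._mailboxes = defaultdict(list)  # topic -> list of subscriber mailboxes

    def subscribe(self, topic):
        """Register interest in a topic; returns the subscriber's mailbox."""
        mailbox = deque()
        self._mailboxes[topic].append(mailbox)
        return mailbox

    def publish(self, topic, event):
        """Announce an event; returns how many subscribers it reached."""
        mailboxes = self._mailboxes.get(topic, [])
        for mailbox in mailboxes:
            mailbox.append(event)
        return len(mailboxes)


bus = EventBus()
alerts = bus.subscribe("temperature")
bus.publish("temperature", {"celsius": 21.5})
bus.publish("humidity", {"percent": 40})  # nobody subscribed: dropped
# The subscriber processes its events whenever it is ready.
while alerts:
    print(alerts.popleft())  # {'celsius': 21.5}
```

A production event bus would deliver across machines and usually persist events, but the decoupling idea is the same.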
System Architectures
The choice of software components, their interactions,
and their placement leads to an instance of a
software architecture, also called a system
architecture.
System architectures are of three types:
Centralized - most components located on a
single machine
Decentralized - most machines have
approximately the same functionality
Hybrid - some combination
Centralized Architecture
In the basic client-server model, processes in a distributed system
are divided into two (possibly overlapping) groups.
Server: a process implementing a specific service, e.g., a file
server
Client: a process that requests a service from a server
Clients and servers can be on different machines
Clients use services following a request/reply model
Figure: General interaction between a client and a server.
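A minimal sketch of the request/reply interaction using Python's standard `socket` module. For a self-contained demo both ends run in one process, connected by `socket.socketpair()`; in a real deployment the server would listen on a network address. The `UPPER` operation is a hypothetical service.

```python
import socket
import threading


def recv_until_eof(conn: socket.socket) -> bytes:
    """Read from a stream socket until the peer shuts down its sending side."""
    chunks = []
    while True:
        data = conn.recv(1024)
        if not data:
            return b"".join(chunks)
        chunks.append(data)


def server(conn: socket.socket) -> None:
    """Server: wait for one request, perform the service, send the reply."""
    with conn:
        op, _, arg = recv_until_eof(conn).decode().partition(" ")
        reply = arg.upper() if op == "UPPER" else "ERROR unknown operation"
        conn.sendall(reply.encode())


def client(conn: socket.socket, request: str) -> str:
    """Client: send a request, then block until the reply arrives."""
    with conn:
        conn.sendall(request.encode())
        conn.shutdown(socket.SHUT_WR)  # signal the end of the request
        return recv_until_eof(conn).decode()


client_end, server_end = socket.socketpair()
worker = threading.Thread(target=server, args=(server_end,))
worker.start()
reply = client(client_end, "UPPER hello")
worker.join()
print(reply)  # HELLO
```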
Cont...
Communication between a client and a server can be
implemented by:
A connectionless protocol (e.g., UDP) when the underlying network is fairly
reliable, as in local-area networks
A connection-oriented protocol (e.g., TCP) in wide-area networks
Connectionless communication is efficient
The client simply packages a message for the server, identifying the service
it wants, along with the necessary input data
However, it is hard for the sender to detect whether the message was
received successfully
Failure of any sort means no reply
Possible causes:
The request message was lost
The reply message was lost
The server failed before, during, or after performing the service
Cont...
A typical way to handle a lost request in connectionless
communication:
Retransmission (resending the request)
Good for idempotent operations, i.e., operations that can be
repeated more than once without harm, e.g., “return the
current value of X”
Not good for non-idempotent operations such as “increase the
value of X by 100”
Because the operation may end up being performed twice
In this case, reporting an error is more appropriate than resending
For these reasons, many distributed systems use connection-oriented protocols
Less suitable in LANs because of their relatively low performance
However, they fit the unreliable WAN environment
For example, virtually all Internet applications are based on reliable TCP/IP
connections
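The danger of retransmitting a non-idempotent request can be shown with a simulated lossy network. Everything here is hypothetical: `Server`, `call_with_retry`, and the `drop_replies` parameter that tells the simulation how many replies to lose. The point is that when a reply is lost, the client cannot distinguish this from a lost request, so it resends, and the server executes the operation again.

```python
class Server:
    """Holds a variable X with one idempotent and one non-idempotent operation."""

    def __init__(self):
        self.x = 0

    def handle(self, op):
        if op == "read":        # idempotent: return the current value of X
            return self.x
        if op == "add100":      # not idempotent: increase the value of X by 100
            self.x += 100
            return self.x
        raise ValueError(f"unknown operation: {op}")


def call_with_retry(server, op, drop_replies, max_attempts=3):
    """Client that resends its request until a reply arrives."""
    for _ in range(max_attempts):
        result = server.handle(op)   # the request arrives and is executed...
        if drop_replies > 0:         # ...but the reply is lost; the client times out
            drop_replies -= 1
            continue
        return result
    raise TimeoutError("no reply after retransmissions")


s = Server()
print(call_with_retry(s, "add100", drop_replies=1))  # 200: X was increased twice
print(call_with_retry(s, "read", drop_replies=1))    # 200: repeating a read is harmless
```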
Application Layering
The client-server model has been subject to many debates and
controversies
One issue was how to draw a clear distinction between a client and a server
Many client-server applications can be divided into three levels:
The user-interface level
The processing level
The data level
Figure: The simplified organization of an Internet search engine into three different layers.
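The search-engine organization can be sketched as three functions, one per level, each calling only the level below it. The document collection and the keyword-count ranking are hypothetical simplifications; a real engine uses a large index and far more elaborate ranking.

```python
# Data level: a tiny, hypothetical document collection.
DOCUMENTS = {
    1: "distributed systems are complex",
    2: "peer to peer systems use overlay networks",
    3: "client server systems use request reply",
}


def data_level(keyword):
    """Data level: return {doc_id: text} for documents containing the keyword."""
    return {d: text for d, text in DOCUMENTS.items() if keyword in text.split()}


def processing_level(query):
    """Processing level: score documents by matching keywords and rank them."""
    scores, texts = {}, {}
    for keyword in query.lower().split():
        for doc_id, text in data_level(keyword).items():
            scores[doc_id] = scores.get(doc_id, 0) + 1
            texts[doc_id] = text
    ranked = sorted(scores, key=lambda d: (-scores[d], d))  # best first, ties by id
    return [texts[d] for d in ranked]


def user_interface_level(query):
    """User-interface level: format the ranked results for display."""
    return [f"{rank}. {text}" for rank, text in enumerate(processing_level(query), start=1)]


for line in user_interface_level("systems use"):
    print(line)
```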
Logical Architecture vs. Physical Architecture
Layer and tier are roughly equivalent terms, but
Layer typically implies software and
Tier is more likely to refer to hardware.
Logical organization is not physical organization.
Physical architecture may or may not match the logical
architecture.
Logically separate components might reside on a single machine
or on different machines
Clients and servers could be placed on the same node, or
be distributed according to several different topologies.
Single-Tier Architecture: dumb terminal/mainframe
configuration
Two-Tier Architecture: client/single server configuration
Three-Tier Architecture: each layer on separate machine
Two-tier and three-tier are the most common
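To make the logical/physical distinction concrete, the same three logical layers can be mapped onto different machines. The machine names below are hypothetical; the helper counts how many layer-to-layer calls would have to cross the network in each configuration.

```python
LAYERS = ["user-interface", "processing", "data"]

# Hypothetical deployments: which machine hosts each logical layer.
DEPLOYMENTS = {
    "single-tier": {"user-interface": "mainframe", "processing": "mainframe", "data": "mainframe"},
    "two-tier": {"user-interface": "client", "processing": "server", "data": "server"},
    "three-tier": {"user-interface": "client", "processing": "app-server", "data": "db-server"},
}


def network_boundaries(placement):
    """Count adjacent layer pairs that sit on different machines."""
    return sum(
        placement[upper] != placement[lower]
        for upper, lower in zip(LAYERS, LAYERS[1:])
    )


for name, placement in DEPLOYMENTS.items():
    machines = len(set(placement.values()))
    print(f"{name}: {machines} machine(s), {network_boundaries(placement)} network boundary(ies)")
```

The logical architecture (three layers) is identical in every case; only the physical placement changes.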
Two-Tiered Architecture
Where are the three application-layers placed?
On the client machines, or on the server machines?
A range of possible solutions:
Thin client: the client machine implements only (part of) the
user-interface level
The server machine implements the rest, i.e., the processing
and data levels
Pros: easier to manage, more reliable; client machines do not
need to be as large and powerful
Con: perceived performance loss at the client
Fat client: all of the user interface, the application processing, and
some of the data reside on the client
Pros: reduces the workload on the server;
More scalable
Cons: harder for system administrators to manage;
Less secure
Other solutions lie between thin client and fat client
Two-tiered Architectures
Figure: Alternative client-server organizations, ranging from thin client to fat client.
Three-tiered
The server tier in a two-tiered architecture is becoming more and
more distributed
A single server is no longer adequate for modern information
systems
This leads to a three-tiered architecture
A server may act as a client (e.g., an application server that
sends requests to a database server)
Three-tiered: each of the three layers runs on a
separate machine.
Decentralized Architectures
Placing logically different components on
different machines is called vertical distribution (VD)
The user-interface, processing, and data levels
are on different machines
It is similar to the concept of vertical fragmentation
in distributed databases, where
Tables are split column-wise and the columns are distributed on different
machines
The advantage of VD is that each machine can be
tailored to a specific type of function
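The vertical-fragmentation analogy can be sketched directly. The employee table and column groups are hypothetical. Each fragment keeps the primary key so the original rows can be reconstructed by joining the fragments.

```python
EMPLOYEES = [
    {"id": 1, "name": "Abebe", "dept": "Sales", "salary": 5000},
    {"id": 2, "name": "Sara", "dept": "IT", "salary": 7000},
]


def vertical_fragments(rows, column_groups, key="id"):
    """Split rows column-wise; every fragment keeps the key column."""
    return [
        [{key: row[key], **{c: row[c] for c in cols}} for row in rows]
        for cols in column_groups
    ]


def rejoin(fragments, key="id"):
    """Reconstruct full rows by merging fragments on the key."""
    merged = {}
    for fragment in fragments:
        for row in fragment:
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


machine_a, machine_b = vertical_fragments(EMPLOYEES, [["name", "dept"], ["salary"]])
print(machine_a)  # [{'id': 1, 'name': 'Abebe', 'dept': 'Sales'}, {'id': 2, 'name': 'Sara', 'dept': 'IT'}]
print(machine_b)  # [{'id': 1, 'salary': 5000}, {'id': 2, 'salary': 7000}]
```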
Cont…
An alternative to vertical distribution is horizontal distribution
A client or server may be physically split up into logically
equivalent parts
Each part operates on its own share of the complete data
set
This balances the workload
This is similar to horizontal
fragmentation in distributed databases, where
Tables are split row-wise, and subsets of rows are distributed
onto different machines
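Horizontal fragmentation can be sketched by assigning each row to a machine based on its key. The city table is hypothetical, and the partitioning rule (integer key modulo the number of machines) is one simple choice among many; range partitioning is a common alternative.

```python
def horizontal_fragments(rows, n_machines, key="id"):
    """Split rows among machines by key modulo n_machines (integer keys assumed)."""
    machines = [[] for _ in range(n_machines)]
    for row in rows:
        machines[row[key] % n_machines].append(row)
    return machines


def lookup(machines, key_value, key="id"):
    """Route a lookup straight to the one machine that holds the row."""
    machine = machines[key_value % len(machines)]
    return next((row for row in machine if row[key] == key_value), None)


CITIES = ["Addis Ababa", "Adama", "Bahir Dar", "Hawassa", "Mekelle", "Gondar"]
rows = [{"id": i, "city": c} for i, c in enumerate(CITIES, start=1)]
machines = horizontal_fragments(rows, 3)
for index, part in enumerate(machines):
    print(index, [row["id"] for row in part])
# 0 [3, 6]
# 1 [1, 4]
# 2 [2, 5]
print(lookup(machines, 4))  # {'id': 4, 'city': 'Hawassa'}
```

Each machine stores and serves only its own share of the rows, which is what spreads the workload.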
Peer-to-peer systems are a class of modern
architectures that support horizontal
distribution.
The functions that need to be carried out are represented
by every process that constitutes the distributed system
Peer-to-peer systems
P2P systems partition tasks or workloads
among peers
Often, the processes that constitute the system
are all equal
Nodes act as both client and server;
Much of the interaction is symmetric.
Advantages of peer-to-peer systems:
Better load balancing
More resistant to denial-of-service attacks
Disadvantage: harder to manage than client-server systems.
Overlay network
Nodes of a P2P distributed system are connected
using an overlay network
An overlay network is a network that is built on top of another network
Its nodes are formed by the processes of the distributed system.
Overlay networks in P2P systems:
Define the structure between nodes in the system.
Allow nodes to route requests to locations that may not
be known at the time of the request.
The main question for a peer-to-peer system is
how to organize the processes in an overlay network
Their organization can be:
Unstructured P2P
Structured P2P
Hybrid P2P
Unstructured P2P architecture
Relies largely on randomized algorithms to
construct the overlay network
Each node has a list of neighbors, which is constructed in a more
or less random way
One challenge is how to efficiently locate a
needed data item
The two common approaches are:
Flooding
Random walk
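A minimal sketch of how an unstructured overlay might be built with a randomized algorithm. The function name and parameters are hypothetical. Each node picks `degree` random peers, and links are treated as symmetric, so some nodes end up with more neighbors than others.

```python
import random


def build_random_overlay(n_nodes, degree, seed=None):
    """Each node links to `degree` randomly chosen peers (requires degree < n_nodes)."""
    rng = random.Random(seed)
    neighbors = {node: set() for node in range(n_nodes)}
    for node in range(n_nodes):
        candidates = [peer for peer in range(n_nodes) if peer != node]
        for peer in rng.sample(candidates, degree):
            neighbors[node].add(peer)   # node -> peer
            neighbors[peer].add(node)   # peer -> node (symmetric link)
    return neighbors


overlay = build_random_overlay(n_nodes=8, degree=2, seed=42)
for node, peers in overlay.items():
    print(node, sorted(peers))
```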
Cont…
Flooding:
Issuing node u passes a request for data d to all of its
neighbors.
A receiving node v ignores the request if it has
seen it before. Otherwise, v searches locally
for d.
If d is found, v returns it; otherwise, v forwards the
request to its own neighbors (recursively)
However, this approach causes high
signaling traffic over the network
Flooding may be limited by a time-to-live (TTL): a maximum number
of hops.
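Flooding with a TTL can be simulated sequentially as a breadth-first search over the overlay. The overlay, the stored item, and the function are hypothetical. Every forwarded copy of the request is counted as a message, including duplicates that the receiver ignores, which shows where the high signaling cost comes from.

```python
from collections import deque

OVERLAY = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D", "E"],
    "D": ["B", "C", "F"],
    "E": ["C", "F"],
    "F": ["D", "E"],
}
DATA = {"F": {"song.mp3"}}  # which node stores which items


def flood(overlay, data, start, item, ttl):
    """Return (node holding item or None, number of messages sent)."""
    seen = {start}
    queue = deque([(start, ttl)])
    messages = 0
    while queue:
        node, hops_left = queue.popleft()
        if item in data.get(node, set()):
            return node, messages
        if hops_left == 0:          # TTL expired: do not forward further
            continue
        for peer in overlay[node]:
            messages += 1           # every forwarded copy costs a message
            if peer in seen:        # duplicate request: the receiver ignores it
                continue
            seen.add(peer)
            queue.append((peer, hops_left - 1))
    return None, messages


print(flood(OVERLAY, DATA, "A", "song.mp3", ttl=3))  # ('F', 12)
print(flood(OVERLAY, DATA, "A", "song.mp3", ttl=2))  # (None, 7)
```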
Cont…
Random walk:
Issuing node u passes a request for d to a
randomly chosen neighbor, v.
If v does not have d, it forwards the request to one
of its own randomly chosen neighbors, and so on.
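A random walk can be sketched on the same kind of overlay. Again the overlay, the data, and the `max_steps` cutoff are hypothetical. Only one copy of the request is in flight at any time, so far fewer messages are sent than with flooding, at the cost of a possibly longer search.

```python
import random

OVERLAY = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D", "E"],
    "D": ["B", "C", "F"],
    "E": ["C", "F"],
    "F": ["D", "E"],
}
DATA = {"F": {"song.mp3"}}


def random_walk(overlay, data, start, item, max_steps, seed=None):
    """Return (node holding item or None, number of messages sent)."""
    rng = random.Random(seed)
    node, messages = start, 0
    while True:
        if item in data.get(node, set()):
            return node, messages
        if messages == max_steps:          # give up after max_steps forwards
            return None, messages
        node = rng.choice(overlay[node])   # forward to one randomly chosen neighbor
        messages += 1


print(random_walk(OVERLAY, DATA, "A", "song.mp3", max_steps=20, seed=1))
```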
Structured P2P
Nodes are organized following a specific distributed
data structure.
The most common one is the distributed hash table
(DHT)
In such systems, each data item is uniquely
associated with a key, in turn used as an index.
Each node is responsible for storing the data
associated with a subset of these keys
The P2P system as a whole is thus responsible for storing (key, value) pairs
Looking up data d with key k means routing the request
to the node responsible for k (in Chord, the node with the smallest identifier ≥ k).
Example:
Chord
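A toy, Chord-style DHT sketch. The `ToyDHT` class, the 5-bit identifier space, and the node identifiers are assumptions for illustration. Keys are hashed with SHA-1 onto the identifier ring, and each key is stored at succ(k), the node with the smallest identifier ≥ k, wrapping around the ring. This sketch computes succ(k) with global knowledge of all nodes; real Chord instead routes the lookup hop by hop using per-node finger tables.

```python
import bisect
import hashlib

M = 5                 # identifier space: 0 .. 2**M - 1 (hypothetical small size)
RING = 2 ** M


def chord_id(name: str) -> int:
    """Map a key onto the identifier ring using SHA-1."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING


class ToyDHT:
    def __init__(self, node_ids):
        self.nodes = sorted(set(node_ids))
        self.storage = {n: {} for n in self.nodes}

    def successor(self, key_id: int) -> int:
        """succ(k): smallest node identifier >= k, wrapping around the ring."""
        i = bisect.bisect_left(self.nodes, key_id)
        return self.nodes[i % len(self.nodes)]

    def put(self, key: str, value) -> None:
        self.storage[self.successor(chord_id(key))][key] = value

    def get(self, key: str):
        return self.storage[self.successor(chord_id(key))].get(key)


dht = ToyDHT([1, 4, 9, 11, 14, 18, 20, 21, 28])
print(dht.successor(26))  # 28
print(dht.successor(30))  # 1 (wraps around the ring)
dht.put("song.mp3", "peer-42")
print(dht.get("song.mp3"))  # peer-42
```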
Hybrid Architectures
Many distributed systems require properties from both
client-server and peer-to-peer architectures.
They therefore combine features of centralized and
decentralized architectures, resulting in hybrid architectures.
Some nodes are assigned special functions in a well-organized fashion
Examples
Edge-server systems: servers are placed at the edge of the network,
such as the boundary between an enterprise network and the Internet
E.g., ISPs, which act as servers to their clients but cooperate
with other edge servers to host shared content
Collaborative distributed systems:
E.g., BitTorrent, which supports parallel downloading and
uploading of chunks of a file.
A client first interacts with a client-server system to download the torrent
file, and then operates in a decentralized manner.
End of Chapter 2