Unit 1 - Distributed System

Characteristics of Distributed System:

• Resource Sharing: The ability to use any hardware, software, or data anywhere in the
system.
• Openness: Concerned with extensions and improvements to the system, i.e., how openly
the software is developed and shared with others.
• Concurrency: Naturally present in distributed systems: the same activity or functionality
can be performed concurrently by separate users at remote locations. Every local system
has its own independent operating system and resources.
• Scalability: The system can grow in scale: as the number of processors increases, it can
accommodate more users while keeping the system responsive.
• Fault tolerance: Concerns the reliability of the system: if there is a failure in hardware
or software, the system continues to operate properly without degrading its performance.
• Transparency: Hides the complexity of the distributed system from users and application
programs, presenting the whole as a single coherent system.

Properties:
• Scalability:
The ability to expand a system by adding more nodes to handle increased workload without
significant performance degradation.
• Fault Tolerance:
The system's capability to continue functioning even when individual components fail,
achieved through redundancy and error handling mechanisms.
• Availability:
The system remains accessible to users even during partial failures, ensuring continuous
service.
• Consistency:
Maintaining data integrity across multiple nodes, ensuring consistent access to information
regardless of which node is accessed.
• Concurrency:
Multiple operations can be executed simultaneously on different nodes within the system.
• Transparency:
Users should not be aware of the underlying distributed nature of the system, experiencing a
seamless interaction.
• Reliability:
The system is dependable and can be relied upon to perform its intended functions consistently.

TYPES OF DISTRIBUTED SYSTEMS

Before starting to discuss the principles of distributed systems, let us first take a closer look at the
various types of distributed systems. In the following we make a distinction between distributed
computing systems, distributed information systems, and distributed pervasive systems:
• Distributed Computing Systems
o Cluster Computing Systems
o Grid Computing Systems
• Distributed Information Systems
o Transaction Processing Systems
o Enterprise Application Integration
• Distributed Pervasive Systems
o Home Systems
o Electronic Health Care Systems
o Sensor Networks

1. Distributed Computing Systems


An important class of distributed systems is the one used for high-performance computing tasks.
Roughly speaking, one can make a distinction between two subgroups. In cluster computing the
underlying hardware consists of a collection of similar workstations or PCs, closely connected by means
of a high-speed local-area network. In addition, each node runs the same operating system.
The situation becomes quite different in the case of grid computing. This subgroup consists of
distributed systems that are often constructed as a federation of computer systems, where each system
may fall under a different administrative domain, and may be very different when it comes to hardware,
software, and deployed network technology.
a. Cluster Computing Systems
Cluster computing systems became popular when the price/performance ratio of personal computers
and workstations improved. At a certain point, it became financially and technically attractive to build
a supercomputer using off-the-shelf technology by simply hooking up a collection of relatively simple
computers in a high-speed network. In virtually all cases, cluster computing is used for parallel
programming in which a single (compute intensive) program is run in parallel on multiple machines.
b. Grid Computing Systems
A characteristic feature of cluster computing is its homogeneity. In most cases, the computers in a cluster
are largely the same: they all have the same operating system and are all connected through the same
network. In contrast, grid computing systems have a high degree of heterogeneity: no assumptions are
made concerning hardware, operating systems, networks, administrative domains, security policies, etc.

2. Distributed Information Systems


Another important class of distributed systems is found in organizations that were confronted with a
wealth of networked applications, but for which interoperability turned out to be a painful experience.
Many of the existing middleware solutions are the result of working with an infrastructure in which it
was easier to integrate applications into an enterprise-wide information system.
a. Transaction Processing Systems
BEGIN_TRANSACTION and END_TRANSACTION are used to delimit the scope of a transaction.
The operations between them form the body of the transaction. The characteristic feature of a
transaction is either all of these operations are executed or none are executed. These may be system
calls, library procedures, or bracketing statements in a language, depending on the implementation.
This all-or-nothing property of transactions is one of the four characteristic properties that transactions
have. More specifically, transactions are:
1. Atomic: To the outside world, the transaction happens indivisibly.
2. Consistent: The transaction does not violate system invariants.
3. Isolated: Concurrent transactions do not interfere with each other.
4. Durable: Once a transaction commits, the changes are permanent.
These properties are often referred to by their initial letters: ACID. So far, transactions have been
defined on a single database. A nested transaction is constructed from a number of subtransactions, as
shown in Fig. 1-9. The top-level transaction may fork off children that run in parallel with one another, on different
machines, to gain performance or simplify programming. Each of these children may also execute one
or more subtransactions, or fork off its own children.
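
As a concrete illustration of this all-or-nothing behavior, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table and the transfer amounts are invented for the example; they are not part of the text above.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
    conn.commit()

    try:
        # BEGIN_TRANSACTION: sqlite3 implicitly opens a transaction on the
        # first modifying statement
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'bob'")
        conn.commit()    # END_TRANSACTION: both updates become durable together
    except sqlite3.Error:
        conn.rollback()  # ABORT_TRANSACTION: neither update takes effect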

b. Enterprise Application Integration


The main idea was that existing applications could directly exchange information, as shown in Fig. 1-11.
3. Distributed Pervasive Systems
a. Home Systems
An increasingly popular type of pervasive system, though perhaps the least constrained, is the
system built around a home network. These systems generally consist of one or more personal
computers, but more importantly integrate typical consumer electronics such as TVs, audio and video
equipment, gaming devices, (smart) phones, PDAs, and other personal wearables into a single system.
In addition, we can expect that all kinds of devices such as kitchen appliances, surveillance cameras,
clocks, controllers for lighting, and so on, will all be hooked up into a single distributed system.

b. Electronic Health Care Systems


Another important and upcoming class of pervasive systems are those related to (personal) electronic
health care. With the increasing cost of medical treatment, new devices are being developed to monitor
the well-being of individuals and to automatically contact physicians when needed. In many of these
systems, a major goal is to prevent people from being hospitalized.
c. Sensor Networks
Our last example of pervasive systems is sensor networks. These networks in many cases form part of
the enabling technology for pervasiveness and we see that many solutions for sensor networks return in
pervasive applications. What makes sensor networks interesting from a distributed system's perspective
is that in virtually all cases they are used for processing information.

GOALS
Just because it is possible to build distributed systems does not necessarily mean that it is a
good idea. After all, with current technology it is also possible to put four floppy disk drives
on a personal computer. It is just that doing so would be pointless. In this section we discuss
four important goals that should be met to make building a distributed system worth the effort.
A distributed system should make resources easily accessible; it should reasonably hide the
fact that resources are distributed across a network; it should be open; and it should be scalable.
1. Making Resources Accessible
The main goal of a distributed system is to make it easy for the users to access remote resources,
and to share them in a controlled and efficient way. Resources can be just about anything, but
typical examples include things like printers, computers, storage facilities, data, files, Web
pages, and networks, to name just a few. There are many reasons for wanting to share resources.
One obvious reason is economics. For example, it is cheaper to let a printer be shared by several
users in a small office than to buy and maintain a separate printer for each user. Likewise, it
makes economic sense to share costly resources such as supercomputers, high-performance storage
systems, imagesetters, and other expensive peripherals.
2. Distribution Transparency
An important goal of a distributed system is to hide the fact that its processes and resources are
physically distributed across multiple computers. A distributed system that is able to present
itself to users and applications as if it were only a single computer system is said to be
transparent.
Types of Transparency
The concept of transparency can be applied to several aspects of a distributed
system, the most important ones of which are shown in Fig. 1-2.

3. Openness
Another important goal of distributed systems is openness. An open distributed system is a
system that offers services according to standard rules that describe the syntax and semantics
of those services. For example, in computer networks, standard rules govern the format,
contents, and meaning of messages sent and received. Such rules are formalized in protocols.
In distributed systems, services are generally specified through interfaces, which are often
described in an Interface Definition Language (IDL). Interface definitions written in an IDL
nearly always capture only the syntax of services. In other words, they specify precisely the
names of the functions that are available together with types of the parameters, return values,
possible exceptions that can be raised, and so on. The hard part is specifying precisely what
those services do, that is, the semantics of interfaces. In practice, such specifications are always
given in an informal way by means of natural language.
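
As a rough illustration of what an interface definition captures, here is a sketch using Python's abc module. The FileService interface and its operations are hypothetical; a real IDL definition would in addition be language-neutral.

    from abc import ABC, abstractmethod

    class FileService(ABC):
        # The definition fixes names, parameter types, and return types (the
        # syntax of the service), but says nothing about what read and write
        # actually do (the semantics).
        @abstractmethod
        def read(self, path: str, offset: int, length: int) -> bytes: ...

        @abstractmethod
        def write(self, path: str, offset: int, data: bytes) -> int: ...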

4. Scalability
Worldwide connectivity through the Internet is rapidly becoming as common as being able to
send a postcard to anyone anywhere around the world. With this in mind, scalability is one of
the most important design goals for developers of distributed systems.
Scalability Problems
When a system needs to scale, very different types of problems need to be solved. Let us first
consider scaling with respect to size. If more users or resources need to be supported, we are
often confronted with the limitations of centralized services, data, and algorithms
Only decentralized algorithms should be
used. These algorithms generally have the following characteristics, which distinguish
them from centralized algorithms:
1. No machine has complete information about the system state.
2. Machines make decisions based only on local information.
3. Failure of one machine does not ruin the algorithm.
4. There is no implicit assumption that a global clock exists.
Scaling Techniques
One important scaling technique is hiding communication latencies by moving computation from
servers to clients. This approach of shipping code is now widely supported by the Web in the
form of Java applets and JavaScript.

Another important scaling technique is distribution.


Distribution involves taking a component, splitting it into smaller parts, and subsequently
spreading those parts across the system. An excellent example of distribution is the Internet
Domain Name System (DNS). The DNS name space is hierarchically organized into a tree of
domains, which are divided into nonoverlapping zones, as shown in Fig. 1-5. The names in
each zone are handled by a single name server. Without going into too many details, one can
think of each path name as being the name of a host in the Internet, and thus associated with
the network address of that host.
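
From the client's point of view, a lookup against this distributed name space is a single call, as the short sketch below with Python's standard socket module shows; the host name is just an example.

    import socket

    # The client asks by name; the hierarchy of delegated name servers that
    # actually answers the query stays completely hidden.
    address = socket.gethostbyname("www.example.com")
    print(address)    # e.g. an address like 93.184.216.34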

Pitfalls
It should be clear by now that developing distributed systems can be a formidable task. As we
will see many times throughout this book, there are so many issues to consider at the same time
that it seems that only complexity can be the result. Nevertheless, by following a number of
design principles, distributed systems can be developed that strongly adhere to the goals we set
out in this chapter. Many principles follow the basic rules of decent software engineering and
will not be repeated here.
However, distributed systems differ from traditional software because components are
dispersed across a network. Not taking this dispersion into account during design time is what
makes so many systems needlessly complex and results in mistakes that need to be patched
later on. Peter Deutsch, then at Sun Microsystems, formulated these mistakes as the following
false assumptions that everyone makes when developing a distributed application for the first
time:
1. The network is reliable.
2. The network is secure.
3. The network is homogeneous.
4. The topology does not change.
5. Latency is zero.
6. Bandwidth is infinite.
7. Transport cost is zero.
8. There is one administrator.

Types of Transparency in Distributed System


In distributed systems, transparency plays a pivotal role in abstracting complexities and
enhancing user experience by hiding system intricacies. This article explores various types of
transparency—ranging from location and access to failure and security—essential for seamless
operation and efficient management in distributed computing environments. Understanding
these transparency types illuminates how distributed systems achieve reliability, scalability,
and maintainability.

1. Location Transparency
Location transparency refers to the ability to access distributed resources without knowing their
physical or network locations. It hides the details of where resources are located, providing a
uniform interface for accessing them.
• Importance: Enhances system flexibility and scalability by allowing resources to be
relocated or replicated without affecting applications.
• Examples:
o DNS (Domain Name System): Maps domain names to IP addresses, providing
location transparency for web services.
o Virtual Machines (VMs): Abstract hardware details, allowing applications to
run without knowledge of the underlying physical servers.
2. Access Transparency
Access transparency ensures that users and applications can access distributed resources
uniformly, regardless of the distribution of those resources across the network.
• Significance: Simplifies application development and maintenance by providing a
consistent method for accessing distributed services and data.
• Methods:
o Remote Procedure Call (RPC): Allows a program to call procedures located
on remote systems as if they were local.
o Message Queues: Enable asynchronous communication between distributed
components without exposing the underlying communication mechanism.
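To make the RPC method above concrete, here is a minimal sketch using Python's standard xmlrpc modules. The service name, port, and file layout are arbitrary choices for the example.

    # server.py -- run this first, in one process
    from xmlrpc.server import SimpleXMLRPCServer

    def add(a, b):                       # an ordinary local function...
        return a + b

    srv = SimpleXMLRPCServer(("localhost", 8000))
    srv.register_function(add, "add")    # ...exposed as a remote procedure
    srv.serve_forever()

    # client.py -- run in a second process
    import xmlrpc.client

    proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
    print(proxy.add(2, 3))               # reads like a local call, executes remotely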
3. Concurrency Transparency
Concurrency transparency hides the complexities of concurrent access to shared resources in
distributed systems from the application developer. It ensures that concurrent operations do not
interfere with each other.
• Challenges: Managing synchronization, consistency, and deadlock avoidance in a
distributed environment where multiple processes or threads may access shared
resources simultaneously.
• Techniques:
o Locking Mechanisms: Ensure mutual exclusion to prevent simultaneous
access to critical sections of code or data.
o Transaction Management: Guarantees atomicity, consistency, isolation, and
durability (ACID properties) across distributed transactions.
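The locking technique is the simplest of these; the sketch below uses Python's threading module to show how a lock prevents lost updates. The counter and the number of threads are arbitrary.

    import threading

    counter = 0
    lock = threading.Lock()

    def deposit():
        global counter
        for _ in range(100_000):
            with lock:          # only one thread at a time enters the critical section
                counter += 1

    threads = [threading.Thread(target=deposit) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)              # 400000 -- without the lock, updates would be lost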
4. Replication Transparency
Replication transparency ensures that clients interact with a set of replicated resources as if
they were a single resource. It hides the presence of replicas and manages consistency among
them.
• Strategies: Maintaining consistency through techniques like primary-backup
replication, where one replica (primary) handles updates and others (backups) replicate
changes.
• Applications:
o Content Delivery Networks (CDNs): Replicate content across geographically
distributed servers to reduce latency and improve availability.
o Database Replication: Copies data across multiple database instances to
enhance fault tolerance and scalability.
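A toy version of the primary-backup strategy mentioned above might look as follows; the classes and the in-memory dictionaries standing in for real data stores are assumptions made for illustration.

    class Replica:
        def __init__(self):
            self.store = {}

    class Primary(Replica):
        def __init__(self, backups):
            super().__init__()
            self.backups = backups

        def write(self, key, value):
            self.store[key] = value
            for b in self.backups:       # the primary forwards every update
                b.store[key] = value

    primary = Primary([Replica(), Replica()])
    primary.write("x", 1)                # the client sees one logical store
    print(primary.backups[0].store)      # {'x': 1} -- the replicas stay in sync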
5. Failure Transparency
Failure transparency ensures that the occurrence of failures in a distributed system does not
disrupt service availability or correctness. It involves mechanisms for fault detection, recovery,
and resilience.
• Approaches:
o Heartbeating: Periodically checks the availability of nodes or services to detect
failures.
o Replication and Redundancy: Uses redundant components or data replicas to
continue operation despite failures.
• Examples:
o Load Balancers: Distribute traffic across healthy servers and remove failed
ones from the pool automatically.
o Automatic Failover: Redirects requests to backup resources or nodes when
primary resources fail.
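The heartbeating approach can be sketched in a few lines; the node names, timestamps, and timeout threshold below are invented for illustration.

    import time

    TIMEOUT = 5.0                     # seconds of silence before a node is suspected
    last_seen = {"node-a": time.time(),          # fresh heartbeat
                 "node-b": time.time() - 10}     # stale heartbeat

    def failed_nodes():
        now = time.time()
        return [n for n, t in last_seen.items() if now - t > TIMEOUT]

    print(failed_nodes())             # ['node-b'] -- a candidate for failover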
6. Performance Transparency
Performance transparency ensures consistent performance levels across distributed nodes
despite variations in workload, network conditions, or hardware capabilities.
• Challenges: Optimizing resource allocation and workload distribution to maintain
predictable performance levels across distributed systems.
• Strategies:
o Load Balancing: Distributes incoming traffic evenly across multiple servers to
optimize resource utilization and response times.
o Caching: Stores frequently accessed data closer to clients or within nodes to
reduce latency and improve responsiveness.
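Caching as described can be approximated with Python's standard functools.lru_cache; the fetch_profile function and its data are hypothetical stand-ins for an expensive remote call.

    from functools import lru_cache

    @lru_cache(maxsize=1024)
    def fetch_profile(user_id: int) -> dict:
        # imagine a slow remote database or HTTP call here
        return {"id": user_id}

    fetch_profile(42)    # first call: goes to the (simulated) remote store
    fetch_profile(42)    # repeat call: answered from local memory, far faster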
7. Security Transparency
Security transparency ensures that security mechanisms and protocols are integrated into a
distributed system seamlessly, protecting data and resources from unauthorized access or
breaches.
• Importance: Ensures confidentiality, integrity, and availability of data and services in
distributed environments.
• Techniques:
o Encryption: Secures data at rest and in transit using cryptographic algorithms
to prevent eavesdropping or tampering.
o Access Control: Manages permissions and authentication to restrict access to
sensitive resources based on user roles and policies.
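As a tiny illustration of encrypting data at rest or in transit, here is a sketch that assumes the third-party cryptography package is installed (pip install cryptography); the payload is invented, and a real deployment would manage the key through a key-management service.

    from cryptography.fernet import Fernet

    key = Fernet.generate_key()          # shared secret; protect and distribute carefully
    f = Fernet(key)

    token = f.encrypt(b"account=12345")  # ciphertext is safe to store or transmit
    print(f.decrypt(token))              # only holders of the key recover the plaintext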
8. Management Transparency
Management transparency simplifies the monitoring, control, and administration of distributed
systems by providing unified visibility and control over distributed resources.
• Methods: Utilizes automation, monitoring tools, and centralized management
interfaces to streamline operations and reduce administrative overhead.
• Examples:
o Cloud Management Platforms (CMPs): Provide unified interfaces for
provisioning, monitoring, and managing cloud resources across multiple
providers.
o Configuration Management Tools: Automate deployment, configuration, and
updates of software and infrastructure components in distributed environments.
These types of transparency are essential for designing robust, scalable, and maintainable
distributed systems, ensuring seamless operation, optimal performance, and enhanced security
in cloud computing and other distributed computing environments.
2.1 ARCHITECTURAL STYLES

Architectural styles in distributed systems define how components interact and are structured to
achieve scalability, reliability, and efficiency. This article explores key architectural styles—
including Peer-to-Peer, SOA, and others—highlighting their concepts, advantages, and
applications in building robust distributed systems.

Using components and connectors, we can come to various configurations, which, in turn, have
been classified into architectural styles. Several styles have by now been identified, of which the
most important ones for distributed systems are:

1. Layered architectures
2. Object-based architectures
3. Data-centered architectures
4. Event-based architectures

1. Layered Architecture in Distributed Systems

Layered Architecture in distributed systems organizes the system into hierarchical layers, each
with specific functions and responsibilities. This design pattern helps manage complexity and
promotes separation of concerns. Here’s a detailed explanation:
• In a layered architecture, the system is divided into distinct layers, where each layer
provides specific services and interacts only with adjacent layers.
• This separation helps in managing and scaling the system more effectively.

Layers and Their Functions

• Presentation Layer
o Function: Handles user interaction and presentation of data. It is responsible for
user interfaces and client-side interactions.
o Responsibilities: Rendering data, accepting user inputs, and sending requests to
the underlying layers.
• Application Layer
o Function: Contains the business logic and application-specific functionality.
o Responsibilities: Processes requests from the presentation layer, executes
business rules, and provides responses back to the presentation layer.
• Data Layer
o Function: Manages data storage, retrieval, and persistence, typically through
databases.
o Responsibilities: Executes queries, maintains data integrity, and supplies data
to the application layer.

Advantages of Layered Architecture in Distributed System

• Separation of Concerns: Each layer focuses on a specific aspect of the system, making it
easier to develop, test, and maintain.
• Modularity: Changes in one layer do not necessarily affect others, allowing for more
flexible updates and enhancements.
• Reusability: Layers can be reused across different applications or services within the same
system.
• Scalability: Different layers can be scaled independently to handle increased load or
performance requirements.

Disadvantages of Layered Architecture in Distributed System

• Performance Overhead: Each layer introduces additional overhead due to data passing
and processing between layers.
• Complexity: Managing interactions between layers and ensuring proper integration can be
complex, particularly in large-scale systems.
• Rigidity: The strict separation of concerns might lead to rigidity, where changes in the
system’s requirements could require substantial modifications across multiple layers.

Examples of Layered Architecture in Distributed System


• Web Applications: A common example includes web applications with a presentation layer
(user interface), application layer (business logic), and data access layer (database
interactions).
• Enterprise Systems: Large enterprise systems often use layered architecture to separate
user interfaces, business logic, and data management.
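
The division of labor described above can be sketched in a few lines of Python; the function names and the in-memory store are purely illustrative stand-ins for real UI, business-logic, and database code.

    DATA = {1: {"name": "Alice"}}                 # stand-in for the data layer's database

    def data_layer_get_user(user_id):             # data access layer
        return DATA.get(user_id)

    def app_layer_greeting(user_id):              # application (business logic) layer
        user = data_layer_get_user(user_id)
        return f"Hello, {user['name']}!" if user else "Unknown user"

    def presentation_layer(user_id):              # presentation layer: formats output
        print(app_layer_greeting(user_id))

    presentation_layer(1)                         # each layer talks only to the layer below it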

2. Peer-to-Peer (P2P) Architecture in Distributed Systems

Peer-to-Peer (P2P) Architecture is a decentralized network design where each node, or “peer,”
acts as both a client and a server, contributing resources and services to the network. This
architecture contrasts with traditional client-server models, where nodes have distinct roles as
clients or servers.
• In a P2P architecture, all nodes (peers) are equal participants in the network, each capable
of initiating and receiving requests.
• Peers collaborate to share resources, such as files or computational power, without relying
on a central server.
Key Features of Peer-to-Peer (P2P) Architecture in Distributed Systems

• Decentralization
o Function: There is no central server or authority. Each peer operates
independently and communicates directly with other peers.
o Advantages: Reduces single points of failure and avoids central bottlenecks,
enhancing robustness and fault tolerance.
• Resource Sharing
o Function: Peers share resources such as processing power, storage space, or
data with other peers.
o Advantages: Increases resource availability and utilization across the network.

• Scalability
o Function: The network can scale easily by adding more peers. Each new peer
contributes additional resources and capacity.
o Advantages: The system can handle growth in demand without requiring
significant changes to the underlying infrastructure.
• Self-Organization
o Function: Peers organize themselves and manage network connections
dynamically, adapting to changes such as peer arrivals and departures.
o Advantages: Facilitates network management and resilience without central
coordination.

Advantages of Peer-to-Peer (P2P) Architecture in Distributed Systems

• Fault Tolerance: The decentralized nature ensures that the failure of one or several peers
does not bring down the entire network.
• Cost Efficiency: Eliminates the need for expensive central servers and infrastructure by
leveraging existing resources of the peers.
• Scalability: Easily accommodates a growing number of peers, as each new peer enhances
the network’s capacity.
Disadvantages of Peer-to-Peer (P2P) Architecture in Distributed Systems

• Security: Decentralization can make it challenging to enforce security policies and manage
malicious activity, as there is no central authority to oversee or control the network.
• Performance Variability: The quality of services can vary depending on the peers’
resources and their availability, leading to inconsistent performance.
• Complexity: Managing connections, data consistency, and network coordination without
central control can be complex and may require sophisticated protocols.

Examples of Peer-to-Peer (P2P) Architecture in Distributed Systems


• File Sharing Networks: Systems like BitTorrent allow users to share and download files
from multiple peers, with each peer contributing to the upload and download processes.
• Decentralized Applications (DApps): Applications that run on decentralized networks,
leveraging P2P architecture for tasks like data storage and computation.

3. Data-Centric Architecture in Distributed Systems

Data-Centric Architecture is an architectural style that focuses on the central management and
utilization of data. In this approach, data is treated as a critical asset, and the system is
designed around data management, storage, and retrieval processes rather than just the
application logic or user interfaces.
• The core idea of Data-Centric Architecture is to design systems where data is the primary
concern, and various components or services are organized to support efficient data
management and manipulation.
• Data is centrally managed and accessed by multiple applications or services, ensuring
consistency and coherence across the system.

Key Principles of Data-Centric Architecture in Distributed Systems

• Centralized Data Management:


o Function: Data is managed and stored in a central repository or database,
making it accessible to various applications and services.
o Principle: Ensures data consistency and integrity by maintaining a single source
of truth.
• Data Abstraction:
o Function: Abstracts the data from the application logic, allowing different
services or applications to interact with data through well-defined interfaces.
o Principle: Simplifies data access and manipulation while hiding the underlying
complexity.
• Data Normalization:
o Function: Organizes data in a structured manner, often using normalization
techniques to reduce redundancy and improve data integrity.
o Principle: Enhances data quality and reduces data anomalies by ensuring
consistent data storage.
• Data Integration:
o Function: Integrates data from various sources and systems to provide a unified
view and enable comprehensive data analysis.
o Principle: Supports interoperability and facilitates comprehensive data analysis
across diverse data sources.
• Scalability and Performance:
o Function: Designs the data storage and management systems to handle
increasing volumes of data efficiently.
o Principle: Ensures the system can scale to accommodate growing data needs
while maintaining performance.

Advantages and Disadvantages of Data-Centric Architecture in Distributed Systems


• Advantages:
o Consistency: Centralized data management helps maintain a single source of
truth, ensuring data consistency across the system.
o Integration: Facilitates easy integration of data from various sources, providing
a unified view and enabling better decision-making.
o Data Quality: Data normalization and abstraction help improve data quality and
reduce redundancy, leading to more accurate and reliable information.
o Efficiency: Centralized management can optimize data access and retrieval
processes, improving overall system efficiency.

• Disadvantages:
o Single Point of Failure: Centralized data repositories can become a bottleneck
or single point of failure, potentially impacting system reliability.
o Performance Overhead: Managing large volumes of centralized data can
introduce performance overhead, requiring robust infrastructure and
optimization strategies.
o Complexity: Designing and managing a centralized data system can be
complex, especially when dealing with large and diverse datasets.
o Scalability Challenges: Scaling centralized data systems to accommodate
increasing data volumes and access demands can be challenging and may
require significant infrastructure investment.

Examples of Data-Centric Architecture in Distributed Systems


• Relational Databases: Systems like MySQL, PostgreSQL, and Oracle use Data-Centric
Architecture to manage and store structured data efficiently, providing consistent access
and integration across applications.
• Data Warehouses: Platforms such as Amazon Redshift and Google BigQuery are designed
to centralize and analyze large volumes of data from various sources, enabling complex
queries and data analysis.
• Enterprise Resource Planning (ERP) Systems: ERP systems like SAP and Oracle ERP
integrate various business functions (e.g., finance, HR, supply chain) around a centralized
data repository to support enterprise-wide operations and decision-making.

4. Event-Based Architecture in Distributed Systems


Event-Driven Architecture (EDA) is an architectural pattern where the flow of data and control
in a system is driven by events. Components in an EDA system communicate by producing
and consuming events, which represent state changes or actions within the system.

Key Principles of Event-Based Architecture in Distributed Systems


• Event Producers: Components or services that generate events to signal state changes or
actions.
• Event Consumers: Components or services that listen for and react to events, processing
them as needed.
• Event Channels: Mechanisms for transmitting events between producers and consumers,
such as message queues or event streams.
• Loose Coupling: Producers and consumers are decoupled, interacting through events
rather than direct calls, allowing for more flexible system interactions.
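
These four roles appear even in a minimal in-process event bus, sketched below; the topic name and handlers are invented, and a production system would use an external broker (a message queue or event stream) as the channel.

    from collections import defaultdict

    subscribers = defaultdict(list)               # topic -> list of consumer handlers

    def subscribe(topic, handler):                # consumers register interest
        subscribers[topic].append(handler)

    def publish(topic, event):                    # producers emit events...
        for handler in subscribers[topic]:        # ...never calling consumers directly
            handler(event)

    subscribe("order.created", lambda e: print("billing saw", e))
    subscribe("order.created", lambda e: print("shipping saw", e))
    publish("order.created", {"order_id": 1})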

Advantages and Disadvantages of Event-Based Architecture in Distributed Systems


• Advantages:
o Scalability: Supports scalable and responsive systems by decoupling event
producers from consumers.
o Flexibility: Allows for dynamic and real-time processing of events, adapting to
changing conditions.
o Responsiveness: Enables systems to react immediately to events, improving
responsiveness and user experience.
• Disadvantages:
o Complexity: Managing event flow, ensuring reliable delivery, and handling
event processing can be complex.
o Event Ordering: Ensuring correct processing order of events can be
challenging, especially in distributed systems.
o Debugging and Testing: Troubleshooting issues in an event-driven system can
be difficult due to its asynchronous and distributed nature.
Examples and Use Cases of Event-Based Architecture in Distributed Systems
• Real-Time Analytics: Systems like stock trading platforms use EDA to process and respond
to market events in real time.
• IoT Systems: Internet of Things (IoT) applications use EDA to manage and respond to data
from various sensors and devices.
• Fraud Detection: Financial institutions use EDA to detect and respond to suspicious
activities or anomalies in real time.

2.2 SYSTEM ARCHITECTURES

Now that we have briefly discussed some common architectural styles, let us take a look at how
many distributed systems are actually organized by considering where software components are
placed. Deciding on software components, their interaction, and their placement leads to an
instance of a software architecture, also called a system architecture (Bass et al., 2003). We will
discuss centralized and decentralized organizations, as well as various hybrid forms.

2.2.1 Centralized Architectures

The centralized architecture is one in which every node is connected to a central coordination
system, and any information the nodes wish to exchange is shared through that system. A
centralized architecture does not automatically require that all functions be in a single
place or circuit, but rather that most parts are grouped together and none are repeated
elsewhere, as would be the case in a distributed architecture.
It consists of the following types of architecture:
• Client-server
• Application Layering

Client Server
Processes in a distributed system are split into two (potentially overlapping) groups in the
fundamental client-server architecture. A server is a program that provides a particular service,
such as a database service or a file system service. A client is a process that requests a
service from a server by sending it a request and subsequently waiting for the server's reply.
This client-server interaction, also known as request-reply behavior, is shown in the figure below:
When the underlying network is reasonably dependable, as it is in many local-area networks,
communication between a client and a server can be implemented using a straightforward
connection-less protocol. In these circumstances, a client simply bundles a message for the
server specifying the service they want along with the relevant input data when they make a
service request. After that, the server receives the message. The latter, on the other hand, will
always await an incoming request, process it after that, and then package the outcomes in a
reply message that is then provided to the client.
Efficiency is a clear benefit of using a connectionless protocol. The request/reply protocol just
sketched works as long as messages do not get lost or damaged. It is unfortunately
not easy to make the protocol robust against occasional transmission errors. When no reply
message is received, our only option is to perhaps allow the client to resubmit the request.
However, there is a problem with the client’s ability to determine if the original request
message was lost or if the transmission of the reply failed.
A reliable connection-oriented protocol is used as an alternative by many client-server systems.
Due to its relatively poor performance, this method is not totally suitable for local-area
networks, but it is ideal for wide-area networks where communication is inherently unreliable.
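
A bare-bones version of this connectionless request-reply exchange can be written with Python's standard socket module. The address, port, and messages are arbitrary, and the client's timeout illustrates the retransmission dilemma just described.

    import socket

    # server -- run in one process
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("localhost", 9000))
    request, client_addr = server.recvfrom(1024)   # always awaits an incoming request
    server.sendto(b"reply to " + request, client_addr)

    # client -- run in a second process
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)           # if no reply arrives, resending is the only option,
    client.sendto(b"get time", ("localhost", 9000))
    print(client.recvfrom(1024)[0])  # and we cannot tell whether the request or the
                                     # reply was the message that got lost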

Application Layering
However, given that many client-server applications are intended to provide user access to
databases, many have urged a distinction between the three levels below, effectively adhering
to the layered architectural approach we described previously:
• The user-interface level


• The processing level
• The data level
The user-interface level contains everything required to directly interface with the user, such
as display management. The processing level typically contains the applications, while the data
level manages the actual data that is being acted on.
The User Interface Level
The user-interface level is often implemented by clients. Programs that let users interact with
applications make up this level. The sophistication level across application programs differs
significantly. A character-based screen is the most basic user-interface application.
Typically, mainframe environments have employed this kind of interface. One hardly ever
speaks of a client-server setup in situations where the mainframe manages every aspect of
interaction, including the keyboard and monitor.
The Processing Level
This is the middle part of the architecture. This is a logical part of the system where all the
processing actions are performed on the user interaction and the data level. It processes the
requests from the user interface and performs certain operations.
The Data Level
The data level in the client-server model contains the programs that maintain the actual data on
which the applications operate. An important property of this level is that data are often
persistent, that is, even if no application is running, data will be stored somewhere for the next
use. In its simplest form, the data level consists of a file system, but it is more common to use a
full-fledged database. In the client-server model, the data level is typically implemented on the
server side.

Consider an Internet search engine. The user interface of a search engine is very simple: a user
types in a string of keywords and is subsequently presented with a list of titles of Web pages.
The back end is formed by a huge database of Web pages that have been prefetched and
indexed. The core of the search engine is a program that transforms the user’s string of
keywords into one or more database queries. It subsequently ranks the results into a list and
transforms that list into a series of HTML pages. Within the client-server model, this
information retrieval part is typically placed at the processing level.
Multitiered Architectures
Multitiered Architectures in Distributed Systems explains how complex computer systems are
organized into different layers or tiers to improve performance and manageability. Each tier
has a specific role, such as handling user interactions, processing data, or storing information.
• By dividing tasks among these tiers, systems can run more efficiently, be more secure, and
handle more users at once.
• This architecture is widely used in modern applications like web services, where front-end
interfaces, business logic, and databases are separated to enhance functionality and
scalability.

Tiered Architecture in Distributed System


Tiered architecture, or multitiered architecture, is a design approach used in software
development to separate functionalities into distinct layers or tiers. Each tier has a specific role,
allowing for better organization, scalability, and system maintainability. Here's an overview of
the common tiers in a multitiered architecture:

1. Presentation Tier:
The presentation tier, also known as the user interface tier, is responsible for presenting
information to users and accepting user inputs. Its main purpose is to handle user interactions
and display data in a human-readable format. This tier provides the interface through which
users interact with the application.
Components:
• User Interfaces: This includes web browsers, mobile applications, desktop applications, or
any other means through which users interact with the system.
• UI Components: Components such as forms, buttons, menus, and other graphical elements
that enable user interaction.
• Presentation Logic: Code responsible for controlling the behavior and appearance of the
user interface.
Example: Consider an e-commerce website. The presentation tier would include the web
pages users see when browsing products, adding items to their cart, and completing their
purchases. It encompasses the visual design, layout, and interactive elements like buttons and
forms.

2. Application Tier:
The application tier, also referred to as the business logic tier or middle tier, contains the core
logic and functionality of the application. It processes user requests, implements business
rules, performs computations, and coordinates the application's overall behavior. This tier
acts as an intermediary between the presentation tier and the data tier.
Components:
• Application Servers: These servers execute the application's code and handle business
logic tasks.
• APIs (Application Programming Interfaces): Interfaces that allow different parts of the
application to communicate with each other or with external systems.
• Business Logic: Code responsible for implementing the application's rules, workflows, and
algorithms.
• Middleware: Software components that facilitate communication and integration between
different parts of the system.
Example: In an online banking system, the application tier would manage tasks such as
validating user credentials, processing transactions, checking account balances, and generating
reports. It ensures that business rules are enforced, transactions are secure, and data integrity is
maintained.
3. Data Tier:
The data tier, also known as the persistence tier or backend tier, is responsible for managing
data storage, retrieval, and manipulation. It stores the application's data in a structured
format, making it accessible to other tiers as needed. This tier ensures data integrity, security,
and efficiency in data operations.
Components:
• Databases: Systems such as relational databases, NoSQL databases, or file systems used to
store and organize data.
• Data Access Layer: Components responsible for interacting with the database, executing
queries, and handling data retrieval and manipulation.
• Data Models: Structures that represent the organization and relationships of the data stored
in the database.
Example: In a social media platform, the data tier would manage user profiles, posts,
comments, and other content. It would include databases to store user information,
relationships between users, posts, comments, media files, and any other relevant data. The
data tier ensures that data is stored securely, retrieved efficiently, and remains consistent across
the application.
Communication Between Tiers
Communication between tiers in a multitiered architecture is essential for the overall
functionality and performance of the system. Here's how communication typically occurs
between the presentation, application, and data tiers:
1. Presentation Tier to Application Tier:
• User Requests: Users interact with the presentation tier by submitting requests through the
user interface.
• HTTP Requests: In web-based applications, user requests are typically sent over HTTP
(Hypertext Transfer Protocol) to the application tier.
• API Calls: The presentation tier may make API (Application Programming Interface) calls
to the application tier to retrieve data, submit forms, or perform other actions.
• Data Transfer: Data is transferred between the presentation tier and the application tier in
a format such as JSON (JavaScript Object Notation) or XML (eXtensible Markup
Language).
2. Application Tier to Data Tier:
• Business Logic Execution: The application tier executes business logic and processes user
requests.
• Data Access: When data is required, the application tier interacts with the data tier to
retrieve or update information stored in the database.
• Database Queries: The application tier constructs and sends queries to the data tier to
fetch or modify data.
• Data Manipulation: Upon receiving data from the data tier, the application tier may
perform additional processing or transformation before sending it back to the presentation
tier.
3. Data Tier to Application Tier:
• Query Processing: The data tier processes incoming queries from the application tier, such
as SELECT, INSERT, UPDATE, or DELETE operations.
• Data Retrieval: If the query involves data retrieval, the data tier accesses the database,
retrieves the requested data, and prepares it for transmission.
• Data Serialization: Data is serialized into a format suitable for transmission, such as JSON
or XML, before being sent back to the application tier.
• Response Sending: The data tier sends the response containing the requested data or the
result of the operation back to the application tier.
4. Application Tier to Presentation Tier:
• Data Processing: The application tier processes the data received from the data tier or user
inputs.
• Response Generation: Based on the processed data and business logic, the application tier
generates a response to be sent back to the presentation tier.
• Rendering: In web applications, the application tier may generate HTML (Hypertext
Markup Language) or other markup language to be rendered by the browser.
• Response Sending: The application tier sends the response back to the presentation tier
over the network.
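As an illustration of steps 1 and 4, here is a sketch of a presentation-tier component calling a hypothetical application-tier HTTP endpoint with a JSON payload, using only Python's standard library; the URL and field names are invented for the example.

    import json
    import urllib.request

    payload = json.dumps({"action": "add_to_cart", "item_id": 42}).encode()
    req = urllib.request.Request(
        "http://app-tier.example/api/cart",       # hypothetical application-tier endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:     # the response carries the generated result
        print(json.load(resp))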
Scalability and Load Balancing in Multitiered Architectures
Scalability and load balancing are crucial considerations in multitiered architectures to ensure
that systems can handle increasing user loads while maintaining performance and reliability.
Here's how these concepts are applied in such architectures:
1. Scalability in Multitiered Architectures:
Scalability refers to the ability of a system to handle growing amounts of work by adding
resources or scaling out horizontally without negatively impacting performance or user
experience.
Types of Scalability:
• Vertical Scalability: Involves adding more resources, such as CPU, memory, or storage, to
a single server or instance. However, there is a limit to how much a single server can scale
vertically.
• Horizontal Scalability: Involves adding more instances of servers or nodes to distribute
the workload across multiple machines. This approach allows for virtually unlimited
scalability by adding more servers as needed.
Scalability in Each Tier:
• Presentation Tier: Scalability can be achieved by deploying multiple instances of web
servers or load balancers to handle increasing user requests.
• Application Tier: Applications can be designed to scale horizontally by deploying
multiple instances of application servers or microservices and using technologies like
containerization and orchestration (e.g., Docker and Kubernetes).
• Data Tier: Database scalability can be achieved through techniques like database sharding,
replication, or using distributed database systems to distribute data across multiple nodes.
2. Load Balancing in Multitiered Architectures:
Load balancing involves distributing incoming network traffic across multiple servers or
resources to optimize resource utilization, maximize throughput, minimize response time, and
ensure high availability.
Types of Load Balancers:
• Hardware Load Balancers: Dedicated physical appliances designed to distribute traffic
across servers. They offer high performance and scalability but can be expensive.
• Software Load Balancers: Implemented as software solutions that run on standard server
hardware or virtual machines. They provide flexibility and can be deployed in cloud
environments.
• DNS Load Balancing: Distributes traffic by resolving domain names to multiple IP
addresses, allowing DNS servers to direct clients to different servers based on predefined
policies.
Load Balancing Strategies:
• Round Robin: Distributes incoming requests equally among servers in a cyclic manner.
• Least Connections: Routes new requests to the server with the fewest active connections,
aiming to distribute the load evenly.
• IP Hash: Assigns requests to servers based on the client's IP address, ensuring that requests
from the same client are consistently routed to the same server.
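Each of the three strategies above boils down to a small selection function; the sketch below is a toy model in which the server list, connection counts, and client address are made-up examples.

    import hashlib
    import itertools

    servers = ["s1", "s2", "s3"]

    rr = itertools.cycle(servers)            # Round Robin: hand out servers in cyclic order
    def round_robin():
        return next(rr)

    active = {"s1": 12, "s2": 3, "s3": 7}    # Least Connections: pick the least-loaded server
    def least_connections():
        return min(active, key=active.get)

    def ip_hash(client_ip):                  # IP Hash: same client, same server, every time
        digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
        return servers[digest % len(servers)]

    print(round_robin(), least_connections(), ip_hash("203.0.113.7"))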
Fault Tolerance and Reliability in Multitiered Architectures
Fault tolerance and reliability are critical aspects of multitiered architectures, ensuring that
systems remain available and responsive even in the face of failures or errors. Here's how these
concepts are applied in such architectures:
1. Fault Tolerance in Multitiered Architectures:
Fault tolerance is the ability of a system to continue operating properly in the event of the
failure of some of its components. It involves designing systems to anticipate and recover from
failures gracefully without causing a complete system outage.
Techniques for Fault Tolerance:
• Redundancy: Introducing redundancy by replicating critical components or data across
multiple servers or locations. This ensures that if one component fails, another can take
over seamlessly.
• Failover: Implementing mechanisms to detect failures automatically and redirect traffic or
operations to backup components or systems. This minimizes downtime and ensures
continuity of service.
• Isolation: Isolating components to contain the impact of failures and prevent them from
spreading to other parts of the system. This can be achieved through techniques like
containerization or microservices architecture.
• Graceful Degradation: Designing systems to gracefully degrade performance or
functionality in the event of failures, rather than crashing or becoming unavailable entirely.
This ensures that users can still access essential features even under degraded conditions.
2. Reliability in Multitiered Architectures:
Reliability refers to the ability of a system to consistently perform its intended functions
accurately and without failure over a specified period. It involves building systems that can
withstand various types of stresses and environmental conditions without experiencing
unexpected failures.
Factors Affecting Reliability:
• Robust Design: Designing systems with robust architecture, well-defined interfaces, and
clear error handling mechanisms to minimize the likelihood of failures.
• Redundancy and Backup Systems: Implementing redundant components, backup
systems, and data replication to ensure continuous operation even in the face of hardware
or software failures.
• Monitoring and Alerting: Deploying monitoring tools and systems to continuously
monitor the health and performance of the system, detect anomalies or failures, and trigger
alerts for timely intervention.
• Regular Testing and Maintenance: Conducting regular testing, maintenance, and updates
to identify and address potential points of failure before they can affect system reliability.
Technologies for Improving Fault Tolerance and Reliability:
• Clustering: Creating clusters of servers or nodes that work together to provide fault
tolerance and high availability by automatically redistributing workloads and resources in
the event of failures.
• Load Balancing: Distributing incoming traffic across multiple servers to prevent
overloading and ensure that no single server becomes a single point of failure.
• Data Replication: Replicating data across multiple servers or data centers to ensure data
availability and integrity even in the event of hardware failures or disasters.
• Automatic Failover: Implementing automated failover mechanisms to detect and respond
to failures quickly, minimizing downtime and ensuring uninterrupted service.
Use Cases of Multitiered Architectures in Distributed System
Multitiered architectures are widely used in distributed systems across various industries and
applications to achieve scalability, reliability, and maintainability. Here are some common use
cases:
1. E-commerce Platforms:
• Presentation Tier: Web interfaces or mobile apps where users browse products, add items
to carts, and make purchases.
• Application Tier: Handles business logic such as inventory management, order
processing, and user authentication.
• Data Tier: Manages product catalogs, customer profiles, order histories, and transaction
data in databases.
• Use Case: E-commerce platforms like Amazon or eBay employ multitiered architectures to
efficiently handle a large number of users, transactions, and product listings while ensuring
scalability and reliability.
2. Social Media Platforms:
• Presentation Tier: User interfaces for posting content, interacting with friends, and
exploring feeds.
• Application Tier: Manages user authentication, friend connections, content
recommendation algorithms, and privacy settings.
• Data Tier: Stores user profiles, posts, comments, media files, and social graphs in
databases or distributed storage systems.
• Use Case: Social media platforms like Facebook, Twitter, and Instagram utilize multitiered
architectures to handle millions of active users, interactions, and content updates while
delivering personalized experiences and maintaining data consistency.
3. Healthcare Information Systems:
• Presentation Tier: Interfaces for healthcare providers to access patient records, schedule
appointments, and view medical images.
• Application Tier: Manages patient data, medical histories, treatment plans, and regulatory
compliance.
• Data Tier: Stores electronic health records (EHRs), diagnostic reports, medication
histories, and medical imaging data in secure databases.
• Use Case: Healthcare organizations leverage multitiered architectures to ensure the
confidentiality, availability, and integrity of patient information while facilitating seamless
communication and collaboration among healthcare professionals.

2.2.2 Decentralized Architecture in Distributed System

Decentralized architecture in distributed systems means that the control and data are
distributed and not necessarily controlled from a central point. This architecture increases
system dependability, expansion potential, and error resilience due to the lack of specific
critical points and its balanced load. Decentralized systems are commonly found in P2P networks,
blockchain technologies, and large-scale Internet services.
Decentralized architecture means a conceptual design of a system’s components as elements
that are mutually integrated and act according to the general principles of a system without a
specific coordinative center.
• In such systems, the control and data are present in different nodes and each node is
independent and can make decisions independently.
• This architecture improves the system's scalability, fault tolerance, and robustness by
eliminating a single centralized point of failure and letting peers work together.
Decentralization is common practice in blockchain networks, P2P systems, and distributed
ledger technologies.

Key Concepts in Decentralized Systems


Below are the key concepts of decentralized systems:
• Peer-to-Peer (P2P) Networks: Distributed networks in which the communicating entities
have equal capabilities; each peer is on an equal footing and uses the resources of other
peers directly, without going through a central hub. Examples include BitTorrent and
blockchain networks.
• Consensus Mechanisms: Procedures used to solve the agreement problem in distributed
computing, so that the participating processes or systems converge on a single value or
state. In decentralized systems they keep the shared state consistent, safe, and
trustworthy without a central authority.
• Decentralized Autonomous Organizations (DAOs): Organizations governed by rules encoded
in programs that are transparent, open to modification by the organization's members, and
not controlled by a central authority. These rules take the form of smart contracts,
digital contracts that live on a blockchain, and DAOs built on them can be fully automated.
• Data Replication: Writing copies of the data to several nodes to guarantee its
availability and integrity. Together with consensus, replication provides redundancy and
fault tolerance within decentralized applications.
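
The following toy sketch combines the last two ideas: writes go to several replicas and count as committed once a majority acknowledges them. The in-memory dictionaries standing in for nodes, and the simulated failed node, are assumptions made for illustration.

    replicas = [{}, {}, {}]
    up = [True, True, False]              # pretend the third node has failed
    MAJORITY = len(replicas) // 2 + 1

    def write(key, version, value):
        acks = 0
        for node, alive in zip(replicas, up):
            if alive:                     # in a real system: a remote message plus ack
                node[key] = (version, value)
                acks += 1
        return acks >= MAJORITY           # committed despite the failed node

    def read(key):
        answers = [n[key] for n, alive in zip(replicas, up) if alive and key in n]
        return max(answers)[1] if answers else None   # newest version wins

    print(write("x", 1, "hello"))         # True -- 2 of 3 acknowledgements is a majority
    print(read("x"))                      # 'hello'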

Benefits of Decentralized Architecture


Below are the benefits of decentralized architecture in distributed systems:
• Scalability: Decentralized systems can scale up simply by adding more nodes, without
significant alterations to their structure. This lets the system serve larger loads and
more users without a considerable impact on performance.
• Fault Tolerance and Resilience: These systems are less vulnerable to node failures than
centralized systems. If one node fails, the others continue to work, keeping the
application highly available and reliable.
• Enhanced Security: Such systems typically rely on strong encryption and consensus
mechanisms, making it hard for an attacker to penetrate them. The distribution of data
and control across many nodes further strengthens security.
• Resource Optimization: Decentralized systems take advantage of multiple nodes, resulting
in better utilization of resources such as computational power, storage, and bandwidth.
This can cut costs and improve performance.
• Reduced Latency: Decentralized systems can reduce response times by placing data and
services closer to their users, which decreases latency.
Challenges of Decentralized Architecture
Below are the challenges of decentralized architecture in distributed systems:
• Complexity in Design and Implementation: Decentralized systems require coordinated,
reliable, and consistent behavior across distributed nodes, which makes their design
inherently complex. Implementing them means dealing with intricate algorithms and
protocols, and this is not easy.
• Data Consistency: It is often challenging to ensure that data created and stored on nodes at
different locations remains consistent. Keeping the same view of the data on all nodes, let
alone in real time, typically requires complicated consensus work.
• Network Latency and Bandwidth: Communication between nodes can add latency and
consume a lot of bandwidth. This can hurt performance, particularly for applications that
depend heavily on real-time processing.
• Resource Management: Coordinating resources such as storage, processing power, and
energy across distributed nodes is difficult. Balancing the load and preventing any one
node from being overloaded requires special approaches.
• Governance and Coordination: Because decentralized systems lack a central controlling
body, it is often hard to enforce policies, manage updates, and resolve conflicts within the
system. This can make it a struggle to sustain the integrity and consistency of the system.

Use Cases of Decentralized Architecture


Below are the use cases of decentralized architecture in distributed system:
• Decentralized Autonomous Organizations (DAOs): Smart contracts on blockchain
networks give DAOs decentralized control and management. Governance is
straightforward because participants make decisions and manage resources themselves.
• Supply Chain Management: Decentralized systems can increase transparency and
traceability in supply chains while improving resistance to failures. Recording the
movement of products on a blockchain makes it possible to verify their provenance.
• Content Distribution Networks (CDNs): Decentralized CDNs spread content across many
nodes to improve delivery times and reduce the strain on any single server. Theta and
Filecoin are, to some degree, examples of decentralized CDNs.
• IoT Networks: Decentralization can improve the robustness and capacity of IoT networks
by eliminating centralized bottlenecks for data processing and storage.
• Decentralized Identity Management: Services such as uPort and Sovrin provide
decentralized approaches to identity assurance where the user retains control over their
identifiers and associated information without the need for a central entity.
• Gaming and Virtual Worlds: Games such as Decentraland and Axie Infinity let players
own, use, and trade in-game digital assets without a central hub, putting the games'
economies in the hands of the players.
Important Decentralized Algorithms and Protocols
Below are some important decentralized algorithms and protocols in distributed systems:
1. Consensus Algorithms:
• Proof of Work (PoW): Used in cryptocurrencies such as Bitcoin, PoW requires miners
(nodes) to solve hard computational puzzles to confirm transactions and add new blocks to
the blockchain. The process secures the network and prevents double spending, but it is
costly in terms of energy consumption (a toy mining loop is sketched after this sublist).
• Proof of Stake (PoS): Used by Ethereum 2.0 and other networks, PoS chooses validators
based on the number of tokens they hold and are willing to stake. PoS consumes far less
energy than PoW while aiming to offer a comparable level of security.
• Delegated Proof of Stake (DPoS): A variant of PoS in which ordinary token owners elect a
small number of delegates who validate transactions and maintain the blockchain.
Concentrating validation in a few elected nodes improves scalability and throughput.
• Practical Byzantine Fault Tolerance (PBFT): An approach to reaching consensus in
distributed systems even in the presence of malicious nodes. It is applied in systems such as
Hyperledger Fabric to offer high throughput and low latency.
• Raft: An algorithm used in distributed systems for replicating a log. It is designed to be
understandable and is employed in systems such as etcd and Consul.
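
As a rough illustration of the PoW idea above, the following C sketch searches for a nonce
that drives a hash of the block contents below a target. It uses a toy FNV-1a hash in place
of a real cryptographic hash such as SHA-256, so it shows the mechanism rather than real
security; the block contents and target value are invented for the example.

    #include <stdio.h>
    #include <stdint.h>

    /* Toy 64-bit FNV-1a hash standing in for a cryptographic hash:
       good enough to illustrate mining, useless for actual security. */
    static uint64_t fnv1a(const char *data, size_t len) {
        uint64_t h = 0xcbf29ce484222325ULL;
        for (size_t i = 0; i < len; i++) {
            h ^= (unsigned char)data[i];
            h *= 0x100000001b3ULL;
        }
        return h;
    }

    int main(void) {
        const char *block = "previous-hash|transactions";
        uint64_t target = (uint64_t)1 << 44;   /* smaller target = more work */
        unsigned long long nonce = 0;
        char buf[256];

        /* Proof of work: try nonces until the hash of (block, nonce)
           falls below the target. Finding the nonce takes many
           attempts; verifying it takes a single hash. */
        for (;;) {
            int n = snprintf(buf, sizeof buf, "%s|%llu", block, nonce);
            if (fnv1a(buf, (size_t)n) < target)
                break;
            nonce++;
        }
        printf("found nonce %llu\n", nonce);
        return 0;
    }

Lowering the target makes the expected number of attempts grow, which is exactly how real
PoW networks tune their difficulty.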
2. Routing Protocols:
• Chord: A distributed hash table (DHT) lookup protocol that locates nodes and data in a
P2P network using consistent hashing. Lookups complete in O(log N) hops, and the
protocol is widely admired for its simplicity (a minimal lookup sketch follows this sublist).
• Kademlia: A DHT protocol employed in peer-to-peer systems such as BitTorrent. Nodes
are organized into an overlay network using an XOR distance metric, which allows data to
be located and routed to efficiently.
• Pastry: A scalable, distributed object location and routing substrate for Internet
applications. It is a DHT that routes requests for distributed data in a highly efficient manner.
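
The following is a minimal C sketch of the successor rule that Chord-style DHTs are built
on: hash a key onto a ring of identifiers and assign it to the first node whose identifier is
greater than or equal to it, wrapping around. Real Chord resolves this in O(log N) hops
using finger tables; this sketch simply scans a small sorted list, and the hash function,
node identifiers, and key names are all made up for illustration.

    #include <stdio.h>

    #define RING_BITS 8                  /* identifier space: 0..255 */
    #define NNODES 4

    /* Toy hash mapping a name onto the identifier ring. */
    static unsigned ring_hash(const char *s) {
        unsigned h = 0;
        while (*s) h = h * 31 + (unsigned char)*s++;
        return h % (1u << RING_BITS);
    }

    /* Node identifiers on the ring, kept sorted for the scan below. */
    static unsigned node_id[NNODES] = {10, 80, 150, 220};

    /* Chord assigns a key to its successor: the first node whose id
       is >= the key, wrapping around the ring. Real Chord finds this
       in O(log N) hops via finger tables; this sketch just scans. */
    static unsigned successor(unsigned key) {
        for (int i = 0; i < NNODES; i++)
            if (node_id[i] >= key)
                return node_id[i];
        return node_id[0];               /* wrap around the ring */
    }

    int main(void) {
        const char *keys[] = {"movie.mp4", "song.mp3", "doc.pdf"};
        for (int i = 0; i < 3; i++) {
            unsigned k = ring_hash(keys[i]);
            printf("key \"%s\" -> id %u -> node %u\n",
                   keys[i], k, successor(k));
        }
        return 0;
    }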
3. Consensus Mechanisms for Distributed Databases:
• Paxos: A family of protocols for reaching agreement in a network of processors that may
be unreliable. Paxos is commonly applied in distributed databases and systems to
guarantee that replicas converge on a single, correct state.
• Raft: A consensus algorithm developed as an alternative to Paxos but designed to be
markedly easier to understand. It is used in systems such as etcd and Consul for leader
election and log replication.
4. Blockchain Protocols:
• Bitcoin Protocol: The first blockchain implementation, in which consensus in the network
is reached through PoW.
• Ethereum Protocol: Expands the blockchain concept with smart contracts and
decentralized applications (dApps). Ethereum initially employed PoW but planned to
change over to PoS with Ethereum 2.0.
• Ripple Protocol: Implements the Ripple Protocol Consensus Algorithm (RPCA), which
enables quick and secure transaction processing, especially for the financial sector.
5. Communication Protocols:
• Gossip Protocols: Used for spreading information across a decentralized network. Each
node periodically exchanges information with randomly chosen peers, so updates
propagate quickly and reliably without central coordination.
• SCP (Stellar Consensus Protocol): Used in the Stellar network, SCP provides leaderless
consensus with modular trust, which makes it suitable for financial applications.
Design Principles for Decentralized Architecture
Below are the design principles for decentralized architecture:
• Decentralization: Avoid creating central points: data, control, and processing should all
be distributed across nodes, and every node should be able to operate on its own.
• Scalability: Design the architecture so that it can accommodate growth in the number of
nodes and transactions. Prefer horizontal scaling, that is, adding more nodes to the
system.
• Consistency and Consensus: Use consensus algorithms such as Paxos or Raft, or
blockchain-based mechanisms such as PoW or PoS, to make sure all nodes agree on the
state of the system.
• Security: Use cryptography to protect messages in transit, stored data, and transactions.
Incorporate measures that limit the impact of malicious behavior.
• Transparency and Auditability: All transactions and state changes should be visible and
traceable by every node. Record each transaction in a tamper-resistant ledger such as a
blockchain.
Best Practices for Decentralized Architecture
Below are the best practices for decentralized architecture:
• Node Distribution: Place nodes in different geographical areas to increase redundancy
and reduce response times. Use a cloud provider and/or a mix of on-premises and cloud
nodes.
• Data Partitioning and Replication: Shard data to distribute load between nodes, and
replicate each data set for availability and fault tolerance. Sharding and Distributed Hash
Tables (DHTs) are the usual techniques.
• Consensus Mechanism Choice: Select the consensus protocol that matches your
requirements: for instance, PoW for maximum security, PoS for low energy consumption,
and Raft or Paxos as practical choices for distributed databases.
• Regular Audits and Monitoring: Periodically check the system's performance, its
vulnerabilities, and its compliance with organizational standards. Employ logging,
alerting, and auditing tools to monitor system health and detect early signs of error.
• Smart Contract Security: For blockchain-based systems, test smart contracts rigorously to
uncover vulnerabilities. Apply formal verification and engage independent auditors to
improve contract quality.
Tools and Frameworks for Decentralized Architecture
Below are some tools and framework for decentralized architecture:
• Blockchain Platforms:
o Ethereum: A decentralized platform for launching and running smart
contracts as well as decentralized applications. Supports Solidity for contract
development.
o Hyperledger Fabric: An enterprise-grade permissioned blockchain platform for
building applications. It supports pluggable consensus and chaincode, its term
for smart contracts.
o Corda: A blockchain solution for enterprises that focuses on providing privacy
and the ability to scale.
• Consensus Libraries:
o Raft: An understandable consensus algorithm for managing a replicated log.
Example implementations include etcd and HashiCorp Consul.
• Distributed Storage:
o IPFS (InterPlanetary File System): A protocol and peer-to-peer network for
storing and sharing hypermedia in a decentralized file system.
o Apache Cassandra: A highly scalable NoSQL database for distributed data
management. It replicates data across the nodes of a cluster.
• Decentralized Communication:
o libp2p: A modular networking stack for building peer-to-peer applications. It is
used in IPFS and widely adopted by other decentralized projects.
o Whisper: A messaging protocol used in the Ethereum ecosystem that provides
messaging while ensuring the anonymity and security of the participants.

2.2.3 Hybrid Data Center Architecture

So far, we have focused on client-server architectures and a number of peer-to-peer architectures.


Many distributed systems combine architectural features, as we already saw with superpeer
networks. In this section we take a look at some specific classes of distributed systems in which
client-server solutions are combined with decentralized architectures.

Edge-Server Systems

An important class of distributed systems organized according to a hybrid architecture is
formed by edge-server systems.
These systems are deployed on the Internet where servers are placed "at the edge" of the
network. This edge is formed by the boundary between enterprise networks and the actual
Internet, for example, as provided by an Internet Service Provider (ISP). Likewise, where end
users at home connect to the Internet through their ISP, the ISP can be considered as residing at
the edge of the Internet.
Hybrid architectures integrate on-premises data centers with cloud resources, offering a
balance between security and flexibility.
• Advantages:
o Balance of Security and Scalability: Combines the control of on-premises
systems with the scalability of the cloud.
o Tailored Solutions: Can be customized to meet specific business needs.
• Disadvantages:
o Complex Management: Coordinating between cloud and on-premises systems
can complicate operations and integration.

Collaborative Distributed Systems

• Hybrid structures are notably deployed in collaborative distributed systems. The main
issue in many of these systems is getting started, for which a traditional client-server
scheme is often deployed. Once a node has joined the system, it can use a fully
decentralized scheme for collaboration. To make matters concrete, let us first consider the
BitTorrent file-sharing system (Cohen, 2003). BitTorrent is a peer-to-peer file
downloading system; its principal working is shown in Fig. 2-14. The basic idea is that
when an end user is looking for a file, he downloads chunks of the file from other users
until the downloaded chunks can be assembled, yielding the complete file.
• An important design goal was to ensure collaboration. In most file-sharing systems, a
significant fraction of participants merely download files and otherwise contribute close
to nothing (Adar and Huberman, 2000; Saroiu et al., 2003; Yang et al., 2005). To this
end, a file can be downloaded only when the downloading client is providing content to
someone else. We will return to this "tit-for-tat" behavior shortly; a simplified sketch of
it follows.
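
The C sketch below illustrates the tit-for-tat idea in miniature: a client periodically
unchokes (uploads to) the few peers from which it is currently downloading fastest, so its
upload capacity rewards peers that reciprocate. The peer rates and slot count are invented
for the example; real BitTorrent clients also add a periodic "optimistic unchoke" of a
random peer, as noted in the comments.

    #include <stdio.h>
    #include <stdbool.h>

    #define NPEERS 6
    #define UNCHOKE_SLOTS 3              /* peers we upload to at a time */

    /* Measured download rate from each peer (bytes/s); in BitTorrent
       these are re-evaluated every few seconds. Values are made up. */
    static int rate[NPEERS] = {500, 20, 900, 0, 300, 750};

    int main(void) {
        bool unchoked[NPEERS] = {false};

        /* Tit-for-tat: repeatedly pick the fastest still-choked peer
           and unchoke it. (Real clients also "optimistically unchoke"
           one random peer so newcomers can prove themselves.) */
        for (int slot = 0; slot < UNCHOKE_SLOTS; slot++) {
            int best = -1;
            for (int p = 0; p < NPEERS; p++)
                if (!unchoked[p] && (best < 0 || rate[p] > rate[best]))
                    best = p;
            unchoked[best] = true;
            printf("unchoke peer %d (downloads from it at %d B/s)\n",
                   best, rate[best]);
        }
        return 0;
    }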
2.3 ARCHITECTURES VERSUS MIDDLEWARE

When considering the architectural issues we have discussed so far, a question that comes to
mind is where middleware fits in.
An important purpose is to provide a degree of distribution transparency, that is, to a certain
extent hiding the distribution of data, processing, and control from applications. What is
commonly seen in practice is that middleware systems actually follow a specific architectural
style.
For example, many middleware solutions have adopted an object-based architectural style, such
as CORBA (OMG, 2004a). Others, like TIB/Rendezvous (TIBCO, 2005), provide middleware
that follows an event-based architectural style. In later chapters, we will come across more examples
of architectural styles. Having middleware modeled according to a specific architectural style has
the benefit that designing applications may become simpler. However, an obvious drawback is
that the middleware may no longer be optimal for what an application developer had in mind.
For example, CORBA initially offered only objects that could be invoked by remote clients.
Later, it was felt that having only this form of interaction was too restrictive, so other
interaction patterns such as messaging were added. Obviously, adding new features can easily
lead to bloated middleware solutions. In addition, although middleware is meant to provide
distribution transparency, it is generally felt that specific solutions should be adaptable to
application requirements. One solution to this problem is to make several versions of a
middleware system, each tailored to a specific class of applications. An approach that is
generally considered better is to make middleware systems easy to configure, adapt, and
customize as needed by an application. As a result, systems are now being developed in which
a stricter separation between policies and mechanisms is made. This has led to several
mechanisms by which the behavior of middleware can be modified (Sadjadi and McKinley,
2003). Let us take a look at some of the commonly followed approaches.

2.3.1 Interceptors
Conceptually, an interceptor is nothing but a software construct that will break the usual flow of
control and allow other (application specific) code to be executed. To make interceptors generic
may require a substantial implementation effort, as illustrated in Schmidt et al. (2000), and it is
unclear whether in such cases generality should be preferred over restricted applicability and
simplicity. Also, in many cases having only limited interception facilities will improve
management of the software and the distributed system as a whole. To make matters concrete,
consider interception as supported in many object-based distributed systems. The basic idea is
simple: an object A can call a method that belongs to an object B, while the latter resides on a
different machine than A. As we explain in detail later in the book, such a remote-object
invocation is carried out in three steps:
1. Object A is offered a local interface that is exactly the same as the interface offered by object
B. A simply calls the method available in that interface.
2. The call by A is transformed into a generic object invocation, made possible through a general
object-invocation interface offered by the middleware at the machine where A resides.
3. Finally, the generic object invocation is transformed into a message that is sent through the
transport-level network interface as offered by A's local operating system.
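
The following C sketch illustrates the interception idea in miniature: the middleware reaches
its send step through a function pointer, so application-specific code with the same signature
can be slotted in to break the usual flow of control, for example at step 2 above. The names
(send_hook, logging_interceptor) are invented for the example; an actual middleware would
offer a registration interface rather than a bare pointer.

    #include <stdio.h>

    /* The middleware's normal path: turn an invocation into a message
       and hand it to the transport layer (stubbed out here). */
    static void do_send(const char *object, const char *method) {
        printf("sending invocation of %s.%s\n", object, method);
    }

    /* An interceptor has the same signature as the step it replaces,
       so the middleware can call it without knowing it is there. */
    typedef void (*send_fn)(const char *object, const char *method);
    static send_fn send_hook = do_send;

    /* Application-specific interceptor: here it just logs and
       forwards, but it could equally replicate the call to several
       servers or reroute it. */
    static void logging_interceptor(const char *object, const char *method) {
        printf("[intercept] before %s.%s\n", object, method);
        do_send(object, method);
        printf("[intercept] after %s.%s\n", object, method);
    }

    int main(void) {
        send_hook("B", "compute");       /* normal flow of control */
        send_hook = logging_interceptor; /* install the interceptor */
        send_hook("B", "compute");       /* same call, now intercepted */
        return 0;
    }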

2.3.2 General Approaches to Adaptive Software


What interceptors actually offer is a means to adapt the middleware. The need for adaptation
comes from the fact that the environment in which distributed applications are executed changes
continuously. Changes include those resulting from mobility, a strong variance in the quality-of-
service of networks, failing hardware, and battery drainage, amongst others. Rather than making
applications responsible for reacting to changes, this task is placed in the middleware. These
strong influences from the environment have brought many designers of middleware to consider
the construction of adaptive software. However, adaptive software has not been as successful as
anticipated. As many researchers and developers consider it to be an important aspect of modern
distributed systems, let us briefly pay some attention to it. McKinley et al. (2004) distinguish
three basic techniques for achieving software adaptation:
1. Separation of concerns
2. Computational reflection
3. Component-based design

2.4 SELF -MANAGEMENT IN DISTRIBUTED SYSTEMS


Distributed systems, and notably their associated middleware, need to provide general solutions
toward shielding applications from the undesirable features inherent to networking, so that they
can support as many applications as possible. On the other hand, full distribution transparency is
not what most applications actually want, resulting in application-specific solutions that need to
be supported as well. We have argued that, for this reason, distributed systems should be
adaptive, but notably when it comes to adapting their execution behavior and not the software
components they comprise.
When adaptation needs to be done automatically, we see a strong interplay between system
architectures and software architectures. On the one hand, we need to organize the components
of a distributed system such that monitoring and adjustments can be done, while on the other
hand we need to decide where the processes are to be executed that handle the adaptation.
In this section we pay explicit attention to organizing distributed systems as high-level
feedback-control systems that allow automatic adaptation to changes. This phenomenon is also
known as autonomic computing (Kephart, 2003) or self-star systems (Babaoglu et al., 2005).
The latter name indicates the variety of automatic adaptations being captured: self-managing,
self-healing, self-configuring, self-optimizing, and so on. We resort simply to the name
self-managing systems to cover its many variants.
2.4.1 The Feedback Control Model

There are many different views on self-managing systems, but what most have in common
(either explicitly or implicitly) is the assumption that adaptations take place by means of one or
more feedback control loops. Accordingly, systems organized by means of such loops are
referred to as feedback control systems. Feedback control has long been applied in various
engineering fields, and its mathematical foundations are gradually finding their way into
computing systems as well (Hellerstein et al., 2004; Diao et al., 2005). For self-managing systems,
the architectural issues are initially the most interesting. The basic idea behind this organization
is quite simple, as shown in Fig. 2-16.

The core of a feedback control system is formed by the components that need to be managed.
These components are assumed to be driven through controllable input parameters, but their
behavior may be influenced by all kinds of uncontrollable input, also known as disturbance or
noise input. Although disturbance will often come from the environment in which a distributed
system is executing, it may well be the case that unanticipated component interaction causes
unexpected behavior.
There are essentially three elements that form the feedback control loop. First, the system itself
needs to be monitored, which requires that various aspects of the system be measured. In
many cases, measuring behavior is easier said than done. For example, round-trip delays in the
Internet may vary wildly, and also depend on what exactly is being measured; accurately
estimating a delay may be difficult indeed. Matters are further complicated when a
node A needs to estimate the latency between two other nodes B and C, without being able to
intrude on either node. For reasons such as these, a feedback control loop
generally contains a logical metric estimation component.
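
A minimal C sketch of such a loop is given below: a noisy delay measurement (the metric
estimation) is compared against a reference value, and the error is fed back into a
controllable input parameter. The model of how delay depends on capacity, and all the
constants, are invented purely for illustration; the point is the monitor-compare-adjust cycle.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        double target_delay = 100.0;     /* reference input: desired ms */
        double capacity = 10.0;          /* controllable input parameter */
        double gain = 0.05;              /* how aggressively we correct */

        srand(42);
        for (int round = 0; round < 10; round++) {
            /* Metric estimation: measured delay falls as capacity
               grows, plus uncontrollable noise (disturbance) from
               the environment. */
            double noise = (rand() % 21) - 10;        /* -10..+10 ms */
            double measured = 2000.0 / capacity + noise;

            /* Analysis + adjustment: compare against the reference
               and feed the error back into the controllable input. */
            double error = measured - target_delay;
            capacity += gain * error;
            if (capacity < 1.0) capacity = 1.0;

            printf("round %d: measured %.1f ms, capacity -> %.2f\n",
                   round, measured, capacity);
        }
        return 0;
    }

In this toy model the loop converges toward the capacity at which the measured delay matches
the reference, despite the noise, which is the essential behavior a self-managing system relies on.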

2.4.2 Example: Systems Monitoring with Astrolabe


As our first example, we consider Astrolabe (Van Renesse et al., 2003), a system that
can support general monitoring of very large distributed systems. In the context of self-managing
systems, Astrolabe is to be positioned as a general tool for observing system behavior. Its output
can be used to feed into an analysis component for deciding on corrective actions.
Astrolabe organizes a large collection of hosts into a hierarchy of zones. The lowest-level zones
consist of just a single host; these are subsequently grouped into zones of increasing size. The
top-level zone covers all hosts. Every host runs an Astrolabe process, called an agent, that
collects information on the zones in which that host is contained. The agent also communicates
with other agents with the aim of spreading zone information across the entire system.
Chapter 4: Communication
In this chapter, we look at three widely-used models for communication: Remote Procedure Call (RPC),
Message-Oriented Middleware (MOM), and data streaming. We also discuss the general problem of
sending data to multiple receivers, called multicasting.

Our first model for communication in distributed systems is the remote procedure call (RPC). An RPC
aims at hiding most of the intricacies of message passing, and is ideal for client-server applications.

We first recapitulate some of the fundamental issues related to communication.

4.1.1 Layered Protocols

To make it easier to deal with the numerous levels and issues involved in communication, the
International Standards Organization (ISO) developed a reference model that clearly identifies the
various levels involved, gives them standard names, and points out which level should do which
job. This model is called the Open Systems Interconnection Reference Model (Day and
Zimmerman, 1983), usually abbreviated as ISO OSI or sometimes just the OSI model.

The OSI model is designed to allow open systems to communicate. An open system is one that is
prepared to communicate with any other open system by using standard rules that govern the
format, contents, and meaning of the messages sent and received. These rules are formalized in
what are called protocols.

In the OSI model, communication is divided up into seven levels or layers, as shown in Fig. 4-1.
Each layer deals with one specific aspect of the communication. In this way, the problem can be
divided up into manageable pieces, each of which can be solved independent of the others. Each
layer provides an interface to the one above it.
Lower-Level Protocols

We start with discussing the three lowest layers of the OSI protocol suite. Together, these layers
implement the basic functions that encompass a computer network.

The physical layer protocol deals with standardizing the electrical, mechanical, and signalling
interfaces so that when one machine sends a 0 bit it is actually received as a 0 bit and not a 1 bit.
Many physical layer standards have been developed (for different media), for example, the RS-232-
C standard for serial communication lines.

Real communication networks are subject to errors, so some mechanism is needed to detect
and correct them. This is the main task of the data link layer. It groups the bits into units,
sometimes called frames, and sees that each frame is correctly received. The data link layer
does its work by putting a special bit pattern at the start and end of each frame to mark them,
as well as computing a checksum by adding up all the bytes in the frame in a certain way. The
data link layer appends the checksum to the frame. When the frame arrives, the receiver
recomputes the checksum from the data and compares the result to the checksum following the
frame. If the two agree, the frame is considered correct and is accepted. If they disagree, the
receiver asks the sender to retransmit it. Frames are assigned sequence numbers (in the header),
so everyone can tell which is which.

The question of how to choose the best path is called routing, and is essentially the primary task
of the network layer. The problem is complicated by the fact that the shortest route is not always
the best route. What really matters is the amount of delay on a given route, which, in turn, is
related to the amount of traffic and the number of messages queued up for transmission over the
various lines. The delay can thus change over the course of time. Some routing algorithms try to
adapt to changing loads, whereas others are content to make decisions based on long-term
averages. At present, the most widely used network protocol is the connectionless IP (Internet
Protocol), which is part of the Internet protocol suite. An IP packet (the technical term for a
message in the network layer) can be sent without any setup. Each IP packet is routed to its
destination independent of all others. No internal path is selected and remembered.

Transport Protocols

The transport layer forms the last part of what could be called a basic network protocol stack, in
the sense that it implements all those services that are not provided at the interface of the network
layer, but which are reasonably needed to build network applications. Packets can be lost on the
way from the sender to the receiver. Although some applications can handle their own error
recovery, others prefer a reliable connection. The job of the transport layer is to provide this
service. The idea is that the application layer should be able to deliver a message to the transport
layer with the expectation that it will be delivered without loss. Upon receiving a message from
the application layer, the transport layer breaks it into pieces small enough for transmission,
assigns each one a sequence number, and then sends them all.

The Internet transport protocol is called TCP (Transmission Control Protocol) and is described in
detail in Comer (2006). The combination TCP/IP is now used as a de facto standard for network
communication. The Internet protocol suite also supports a connectionless transport protocol
called UDP (User Datagram Protocol), which is essentially just IP with some minor additions.
User programs that do not need a connection-oriented protocol normally use UDP. Additional
transport protocols are regularly proposed. For example, to support real-time data transfer, the
Real-time Transport Protocol (RTP) has been defined. RTP is a framework protocol in the sense
that it specifies packet formats for real-time data without providing the actual mechanisms for
guaranteeing data delivery.

Higher- Level Protocols

The session layer is essentially an enhanced version of the transport layer. It provides dialog
control, to keep track of which party is currently talking, and it provides synchronization facilities.
The latter are useful to allow users to insert checkpoints into long transfers, so that in the event of
a crash, it is necessary to go back only to the last checkpoint, rather than all the way back to the
beginning.

In the presentation layer it is possible to define records containing fields like these and then have
the sender notify the receiver that a message contains a particular record in a certain format. This
makes it easier for machines with different internal representations to communicate with each
other.

The OSI application layer was originally intended to contain a collection of standard network
applications such as those for electronic mail, file transfer, and terminal emulation. By now, it has
become the container for all applications and protocols that in one way or the other do not fit into
one of the underlying layers. From the perspective of the OSI reference model, virtually all
distributed systems are just applications.

Middleware Protocols

Middleware is an application that logically lives (mostly) in the application layer, but which contains
many general-purpose protocols that warrant their own layers, independent of other, more
specific applications. There are numerous protocols to support a variety of middleware services.
Authentication protocols are not closely tied to any specific application, but instead, can be
integrated into a middleware system as a general service. Likewise, authorization protocols by
which authenticated users and processes are granted access only to those resources for which
they have authorization tend to have a general, application-independent nature. Middleware
communication protocols support high-level communication services.
4.1.2 Types of Communication

To understand the various alternatives in communication that middleware can offer to
applications, we view the middleware as an additional service in client-server computing, as shown
in Fig. 4-4. Consider, for example, an electronic mail system. In principle, the core of the mail
delivery system can be seen as a middleware communication service. Each host runs a user agent
allowing users to compose, send, and receive e-mail. A sending user agent passes such mail to the
mail delivery system, expecting it, in turn, to eventually deliver the mail to the intended recipient.
Likewise, the user agent at the receiver's side connects to the mail delivery system to see whether
any mail has come in. If so, the messages are transferred to the user agent so that they can be
displayed and read by the user.

An electronic mail system is a typical example in which communication is persistent. With
persistent communication, a message that has been submitted for transmission is stored by the
communication middleware as long as it takes to deliver it to the receiver. In this case, the
middleware will store the message at one or several of the storage facilities shown in Fig. 4-4.

In contrast, with transient communication, a message is stored by the communication system only
as long as the sending and receiving applications are executing. More precisely, in terms of Fig. 4-
4, if the middleware cannot deliver a message due to a transmission interrupt, or because the
recipient is currently not active, it will simply be discarded. Typically, all transport-level
communication services offer only transient communication. In this case, the communication
system consists of traditional store-and-forward routers. If a router cannot deliver a message to the
next one or to the destination host, it will simply drop the message.

Besides being persistent or transient, communication can also be asynchronous or synchronous.
The characteristic feature of asynchronous communication is that a sender continues immediately
after it has submitted its message for transmission. This means that the message is (temporarily)
stored immediately by the middleware upon submission. With synchronous communication, the
sender is blocked until its request is known to be accepted.
Various combinations of persistence and synchronization occur in practice. Besides persistence
and synchronization, we should also make a distinction between discrete and streaming
communication. The examples so far all fall in the category of discrete communication: the parties
communicate by messages, each message forming a complete unit of information. In contrast,
streaming involves sending multiple messages, one after the other, where the messages are
related to each other by the order they are sent, or because there is a temporal relationship. We
return to streaming communication extensively below.

4.2 REMOTE PROCEDURE CALL

Birrell and Nelson (1984) introduced Remote Procedure Call (RPC), a simpler way to handle
communication in distributed systems. Instead of sending explicit messages, RPC lets a program
on one machine call a procedure on another machine. The caller pauses, and the procedure runs
on the remote machine. Data is passed through parameters, and results are returned seamlessly,
hiding the complexity of message passing from the programmer.

However, RPC has challenges. The caller and remote procedure run in different memory spaces,
making parameter passing tricky, especially between different machines. Crashes on either
machine also create complications. Despite these issues, RPC is widely used in distributed systems
because it simplifies communication.

4.2.1 Basic RPC Operation

1. Conventional Procedure Call (Single Machine):

- Example: count = read(fd, buf, nbytes);

- `fd`: File identifier.

- `buf`: Array to store data.

- `nbytes`: Number of bytes to read.

- Before the call, the stack is in its initial state.

- The caller pushes parameters onto the stack in reverse order (last parameter first).

- After the procedure runs, the return value is placed in a register, the return address is removed,
and control goes back to the caller.

- The caller removes parameters, restoring the stack to its original state.

2. Remote Procedure Call (RPC):

- Splits the procedure call into two parts:

- Client: Runs on one machine (initiates the call).

- Server: Runs on another machine (executes the procedure).

- The client sends parameters to the server.


- The server executes the procedure and sends the result back to the client.

- No direct stack manipulation across machines; communication happens behind the scenes.

Client and Server Stubs

RPC aims to make remote procedure calls look like local ones, keeping the process transparent.
For example, when a program reads data from a file, the `read` call works the same whether it’s
local or remote. In a local system, `read` is a library procedure that interacts with the operating
system. With RPC, a client stub replaces the local `read`. It looks the same to the programmer but
works differently: instead of fetching data locally, it packs the parameters into a message, sends it
to the server, and waits for a reply. This hides the complexity of remote communication.

When the message reaches the server, the server stub takes over. It unpacks the parameters
and calls the server procedure as if it were a local call. The server performs its task (e.g., filling a
buffer with data) and returns the result to the server stub. The stub then packs the result into a
message and sends it back to the client. On the client side, the client stub receives the
message, unpacks the result, and passes it to the caller. To the client, the remote call feels like a
local procedure call, hiding all the complexity of message passing. This transparency is the key
strength of RPC.

To summarize, a remote procedure call occurs in the following steps:

1. The client procedure calls the client stub in the normal way.

2. The client stub builds a message and calls the local operating system.

3. The client's OS sends the message to the remote OS.

4. The remote OS gives the message to the server stub.

5. The server stub unpacks the parameters and calls the server.

6. The server does the work and returns the result to the stub.

7. The server stub packs it in a message and calls its local OS.

8. The server's OS sends the message to the client's OS.

9. The client's OS gives the message to the client stub.

10. The stub unpacks the result and returns it to the client.

4.2.2 Parameter Passing

Passing Value Parameters

Packing parameters into a message is called parameter marshalling. For example, a remote
procedure `add(i, j)` takes two integers and returns their sum. The client stub packs the parameters
and procedure name into a message. When the message reaches the server, the server stub
identifies the procedure and makes the call. After the server computes the result, the server stub
packs it into a message and sends it back to the client stub, which unpacks it and returns the result
to the client. This works well if the client and server machines are identical, but problems arise if
they use different data representations (e.g., ASCII vs. EBCDIC, little-endian vs. big-endian). These
differences can cause errors in interpreting parameters, requiring additional steps to ensure
compatibility.
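
The C sketch below shows one conventional fix for the byte-order problem in the `add(i, j)`
example: the client stub marshals everything into network byte order (big-endian) with htonl,
and the server stub converts back with ntohl, so little- and big-endian machines interpret the
bytes identically. The message layout (a procedure number followed by the two parameters)
is invented for the example; htonl and ntohl are standard POSIX calls.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>               /* htonl / ntohl (POSIX) */

    /* Client-stub side: marshal a call to add(i, j) as a procedure
       number plus two parameters, all in network byte order. */
    static size_t marshal_add(uint8_t *buf, int32_t i, int32_t j) {
        uint32_t w[3] = { htonl(1 /* proc #1 = add */),
                          htonl((uint32_t)i), htonl((uint32_t)j) };
        memcpy(buf, w, sizeof w);
        return sizeof w;
    }

    /* Server-stub side: unpack in the same agreed-upon order and
       invoke the actual procedure. */
    static int32_t unmarshal_and_call(const uint8_t *buf) {
        uint32_t w[3];
        memcpy(w, buf, sizeof w);
        int32_t i = (int32_t)ntohl(w[1]);
        int32_t j = (int32_t)ntohl(w[2]);
        return i + j;                    /* the actual add() */
    }

    int main(void) {
        uint8_t msg[12];
        marshal_add(msg, 4, 7);          /* message "sent" in memory */
        printf("add(4, 7) = %d\n", unmarshal_and_call(msg));
        return 0;
    }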

Passing Reference Parameters

Passing pointers or references in RPC is challenging because pointers are only valid within the
address space of the process using them. For example, a buffer address on the client (like 1000)
won’t work on the server. One solution is to copy the data (e.g., an array) into the message and
send it to the server. The server stub uses a local pointer to access the data, and changes are sent
back to the client. This replaces call-by-reference with copy/restore, which works well for simple
cases like arrays. An optimization skips unnecessary copies if the stubs know whether the data is
input or output. However, handling complex data structures (e.g., graphs) remains difficult, as it
may require special code or additional client-server communication.

4.3 MESSAGE-ORIENTED COMMUNICATION

Remote procedure calls (RPC) and remote object invocations improve access transparency in
distributed systems by hiding communication details. However, they aren’t always suitable,
especially when the receiver isn’t active or when synchronous communication (blocking the client)
isn’t ideal. In such cases, messaging is used as an alternative. Messaging systems can work
synchronously (both parties active) or asynchronously (via message queues), allowing
communication even if the other party isn’t running at the time. This flexibility makes messaging
a powerful tool for distributed systems.

4.3.1 Message-Oriented Transient Communication

Many distributed systems and applications use the basic messaging model provided by the
transport layer. To understand messaging in middleware, we first look at transport-level sockets,
like Berkeley Sockets, which simplify network programming. Sockets act as communication
endpoints, allowing applications to send and receive data over a network. They abstract the details
of the underlying transport protocol, making it easier to write portable applications. For example,
in TCP, servers typically use four key primitives:

1. Socket: Creates a new communication endpoint for a specific protocol.

2. Bind: Associates a local address (e.g., IP and port) with the socket.

3. Listen: Prepares the socket to accept incoming connections.


4. Accept: Waits for and accepts a connection from a client.

These steps allow servers to receive messages on specific addresses and ports, while the operating
system manages the resources needed for communication. Sockets and similar interfaces, like XTI,
standardize network programming, making it easier to build distributed systems.
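
The four primitives line up directly with a minimal TCP server, as the following POSIX C
sketch shows; the port number and reply text are arbitrary choices for the example.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void) {
        /* 1. Socket: create a TCP communication endpoint. */
        int srv = socket(AF_INET, SOCK_STREAM, 0);

        /* 2. Bind: attach a local IP address and port to it. */
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(9000);
        if (bind(srv, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("bind");
            return 1;
        }

        /* 3. Listen: let the OS queue incoming connection requests. */
        listen(srv, 5);

        /* 4. Accept: block until a client connects, then serve it. */
        int conn = accept(srv, NULL, NULL);
        const char *msg = "hello from the server\n";
        write(conn, msg, strlen(msg));
        close(conn);
        close(srv);
        return 0;
    }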

The Message-Passing Interface (MPI): MPI was developed for high-performance multicomputers to
enable efficient parallel applications. Unlike sockets, MPI offers higher-level, optimized
communication primitives tailored for the fast interconnection networks used in server clusters.
Sockets were insufficient due to their low-level abstraction and reliance on general-purpose
protocols like TCP/IP. Proprietary communication libraries emerged, but their incompatibility
caused portability issues. MPI was created as a standard to solve this, focusing on transient
communication within known groups of processes. Each process is identified by a (group ID,
process ID) pair, replacing traditional addresses. MPI supports transient asynchronous
communication, such as with MPI_Bsend, where messages are copied to a local buffer and
transmitted when the receiver is ready, allowing the sender to continue without waiting. This
makes MPI ideal for parallel computing, though it assumes failures like crashes are fatal and
does not handle automatic recovery.

4.4 STREAM-ORIENTED COMMUNICATION

So far, we’ve focused on communication involving independent, complete units of information,


like procedure calls or messages in queuing systems. Timing doesn’t affect correctness in these
cases. However, some communication, like audio or video streams, is time-sensitive. For example,
a CD-quality audio stream requires playing 44,100 samples per second in exact order and timing.
Playing them too fast or slow distorts the sound. This section explores how distributed systems
can support time-dependent data, such as audio and video streams, ensuring proper timing and
synchronization. Network protocols and multimedia systems address these challenges, with
research also focusing on processing data streams efficiently.

4.4.1 Support for Continuous Media

Support for time-dependent information, like audio or video, is called continuous media.
It relies on correct timing between data items, such as playing audio samples or displaying
video frames at precise intervals. In contrast, discrete media, like text or images, doesn't
depend on timing for interpretation. Continuous media requires maintaining temporal
relationships (e.g., 44,100 audio samples per second or 30-40 ms per video frame), while
discrete media, such as text or still images, doesn't. This distinction is key for handling different
types of data in distributed systems.
Data Stream

Distributed systems use data streams to handle time-dependent information, which can be
discrete (like files or TCP connections) or continuous (like audio or video). Timing is critical for
continuous streams, leading to three transmission modes:

1. Asynchronous: No timing constraints (e.g., file transfers).

2. Synchronous: Maximum delay per data unit (e.g., sensor data).

3. Isochronous: Strict timing with bounded delay (crucial for audio/video).

Streams can be simple (a single sequence) or complex (multiple synchronized substreams, like
stereo audio or movies with video, audio, and subtitles). Synchronization is vital for complex
streams to ensure proper playback.

For streaming stored data (not live), systems use a client-server architecture, focusing on
compression to save storage and bandwidth, quality control, and synchronization. These
elements are key for effective multimedia streaming.

4.4.2 Streams and Quality of Service

Timing (and other non-functional) requirements are generally expressed as Quality of Service
(QoS) requirements.

From an application's perspective, in many cases it boils down to specifying a few important
properties (Halsall, 2001):

1. The required bit rate at which data should be transported.


2. The maximum delay until a session has been set up (i.e., when an application
can start sending data).
3. The maximum end-to-end delay (i.e., how long it will take until a data unit
makes it to a recipient).
4. The maximum delay variance, or jitter.
5. The maximum round-trip delay.
4.4.3 Stream Synchronization
Multimedia systems require synchronization between streams to maintain proper timing.
There are two types:

1. Discrete and Continuous Streams: For example, syncing audio with slides in a presentation.

2. Continuous Streams: Like syncing video and audio in a movie (lip-sync) or stereo audio
channels (within 20 µsec for clear sound).

Synchronization happens at the level of data units. For CD-quality audio, synchronization could
occur every 23 µsec (44,100 samples/sec). For video and audio sync, coarser units (e.g., 33
msec for NTSC video) work well, grouping audio samples into larger chunks (e.g., 1,470
samples). Larger units (40-80 msec) are often acceptable in practice.
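
The numbers in this section follow from simple arithmetic, worked out in the short C program
below: at 44,100 samples per second one sample lasts about 23 µsec, one NTSC frame lasts
about 33 msec, and grouping the audio that belongs with one video frame yields 1,470 samples.

    #include <stdio.h>

    int main(void) {
        double sample_rate = 44100.0;    /* CD-quality audio, samples/s */
        double frame_rate = 30.0;        /* NTSC video, frames/s */

        /* Per-sample synchronization would mean acting every ~23 us;
           grouping the samples that belong with one video frame gives
           a much coarser, more practical unit of ~33 ms. */
        double us_per_sample = 1e6 / sample_rate;
        double ms_per_frame = 1e3 / frame_rate;
        double samples_per_frame = sample_rate / frame_rate;

        printf("one sample every %.1f us\n", us_per_sample);
        printf("one frame every %.1f ms\n", ms_per_frame);
        printf("%.0f audio samples per video frame\n", samples_per_frame);
        return 0;
    }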

4.5 MULTICAST COMMUNICATION

An important topic in communication in distributed systems is the support for sending data to
multiple receivers, also known as multicast communication.

4.5.1 Application-Level Multicasting

Application-level multicasting involves nodes forming an overlay network to share information
without using network routers. This means connections between nodes may cross multiple
physical links, making routing less efficient than network-level solutions. Overlay networks can
be organized as trees, with one path between nodes, or meshes, which offer multiple paths
and better robustness in case of failures. In systems like Scribe, a node starts a multicast
session by creating a unique identifier and finding a root node. Other nodes join by sending
lookup requests to the root, passing through intermediate nodes that become forwarders. If
a node is already a forwarder, it adds the new node as a child without contacting the root
again. This creates a multicast tree with two types of nodes: pure forwarders and nodes that
explicitly joined. Multicasting works by sending messages toward the root, which then
distributes them along the tree.

Overlay Construction

From the high-level description given above, it should be clear that although building a tree
by itself is not that difficult once we have organized the nodes into an overlay, building an
efficient tree may be a different story. Note that in our description so far, the selection of nodes
that participate in the tree does not take into account any performance metrics: it is purely
based on the (logical) routing of messages through the overlay.
To understand the problem at hand, take a look at Fig. 4-31 which shows a small set of four
nodes that are organized in a simple overlay network, with node A forming the root of a
multicast tree. The costs for traversing a physical link are also shown. The quality of an
application-level multicast tree is generally measured by three different metrics: link stress,
stretch, and tree cost. Link stress is defined per link and counts how often a packet crosses the
same link.

The stretch or Relative Delay Penalty (RDP) measures the ratio in the delay between two
nodes in the overlay, and the delay that those two nodes would experience in the underlying
network. Finally, the tree cost is a global metric, generally related to minimizing the aggregated
link costs.

4.5.2 Gossip-Based Data Dissemination

The main goal of these epidemic protocols is to rapidly propagate information among a large
collection of nodes using only local information. In other words, there is no central component
by which information dissemination is coordinated.

Information Dissemination Models

Epidemic algorithms are inspired by how diseases spread, but instead of infections, they
spread information in distributed systems. The goal is to ensure that all nodes receive updates
quickly. A node with new data is "infected," while one without is "susceptible." Once a node
receives the data but stops sharing, it is "removed."

A common model, anti-entropy, involves nodes randomly exchanging updates in three ways:
pushing updates, pulling updates, or both (push-pull). The push-only method is slow when
many nodes are already infected, as they struggle to find susceptible nodes. The pull-based
approach works better, as susceptible nodes actively seek updates.

Push-pull is the fastest and ensures updates spread in O(log(N)) rounds, making it scalable. A
variation, gossiping, mimics real-life rumor spreading. Nodes share updates randomly but may
stop if they find others already updated, just like people lose interest in repeating old gossip.
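
The claim that push-pull spreads an update in O(log(N)) rounds is easy to check with a small
simulation. The C sketch below uses an invented network size and a trivial random-peer
choice; with 1,024 nodes it typically infects everyone in roughly a dozen rounds, consistent
with the logarithmic bound.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define N 1024                       /* number of nodes */

    int main(void) {
        char infected[N] = {0};
        infected[0] = 1;                 /* one node holds the update */
        int count = 1, round = 0;

        srand(1);
        while (count < N) {
            char next[N];
            memcpy(next, infected, N);
            /* Anti-entropy, push-pull: each node contacts one random
               peer per round; if either of the two has the update,
               both end up with it. */
            for (int i = 0; i < N; i++) {
                int peer = rand() % N;
                if (infected[i] || infected[peer])
                    next[i] = next[peer] = 1;
            }
            memcpy(infected, next, N);
            count = 0;
            for (int i = 0; i < N; i++)
                count += infected[i];
            round++;
            printf("round %2d: %4d of %d nodes infected\n",
                   round, count, N);
        }
        return 0;
    }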

Removing Data

Epidemic algorithms are extremely good for spreading updates. However, they have a rather
strange side effect: spreading the deletion of a data item is hard. The essence of the problem
lies in the fact that deletion of a data item destroys all information on that item. Consequently,
when a data item is simply removed from a node, that node will eventually receive old copies
of the data item and interpret those as updates on something it did not have before.

Applications

To finalize this presentation, let us take a look at some interesting applications of epidemic
protocols. Gossiping can be used to discover nodes that have a few outgoing wide-area links,
to subsequently apply directional gossiping. Another interesting application area is simply
collecting, or actually aggregating information.
