Distributed System
Definition:
A distributed system is a collection of autonomous computer systems that
are physically separated but connected by a computer network and equipped
with distributed system software. The autonomous computers communicate
with one another to share resources and files and to perform the tasks
assigned to them.
“A distributed system is a system whose components are located on
different networked computers, which communicate and coordinate their
actions by passing messages to one another” — Wikipedia.
Types of Distributed Systems:
There are many models and architectures of distributed systems in use today.
Client-server systems, the most traditional and simple type of
distributed system, involve a multitude of networked computers that
interact with a central server for data storage, processing, or another
common goal.
Peer-to-peer networks distribute workloads among hundreds or
thousands of computers all running the same software.
Middleware is software that sits between two different applications and
provides services and benefits to both.
A three-tier system uses a distinct layer and server for each program
function: a presentation layer, an application layer, and a data layer.
Information about the client is kept in the middle (application) tier rather
than on the client itself. Three-tier systems are most commonly used in web
and online applications.
N-tier, also known as a multitier distributed system, generalizes the
three-tier model: as the name suggests, it may contain any number of tiers,
each handling a separate function. N-tier systems are commonly used in web
applications and data systems.
Real-World Distributed System – World Wide Web:
Definition:
The World Wide Web, also known as the Web, is a collection of websites or
web pages stored on web servers and connected to local computers through the
Internet. These websites contain text pages, digital images, audio, video, etc.
Users can access the content of these sites from any part of the world over the
Internet using devices such as computers, laptops, and cell phones.
Why the World Wide Web Is a Distributed System:
The web is one of the fastest-growing Internet information systems, with new
resources added constantly. It relies on a set of protocols, conventions, and
software to operate, and it is a distributed system for delivering linked
documents over the Internet.
It is called a distributed system because information can reside on different
computers around the world. The web uses hypertext to create links from one
resource to another.
Characteristics of the World Wide Web as a Distributed System:
Decentralized Architecture: The web is built on a decentralized architecture,
where information is stored on servers located all around the world. These servers
are connected through the internet, and users access information by sending
requests to these servers.
Fault Tolerance: Because the web is distributed, it is more resilient to failures
than a centralized system. If one server goes down, the rest of the web keeps
working, and content that is replicated on other servers (for example, mirrors
or CDN nodes) remains accessible. This fault tolerance is essential for ensuring
continuous access to information.
Scalability: The distributed nature of the web allows it to scale effectively to
accommodate a large number of users and requests. New servers can be added to
the network to handle increased traffic without significantly affecting
performance.
Algorithms Used in the World Wide Web:
Message Passing System
Message passing describes how a message is sent from one end to the other,
whether in a client-server model or from one node to another.
Message passing in the World Wide Web refers to the communication protocol
used by web servers and clients to exchange data. In the context of the web,
message passing typically involves HTTP (Hypertext Transfer Protocol), which
is the foundation of data communication on the web. When a client (such as a
web browser) requests a resource from a server (such as a website), it sends an
HTTP request message. The server then processes the request and sends back an
HTTP response message containing the requested resource, such as a web page
or a file.
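To make the request/response exchange concrete, here is a minimal sketch using
only the Python standard library: a tiny local HTTP server answers a GET request
sent by a client. The handler class `HelloHandler`, the helper `fetch`, and the
plain-text response body are hypothetical, chosen for illustration; a real web
server would return files or generated pages instead.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Server side: turns each HTTP GET request into an HTTP response message."""

    def do_GET(self) -> None:
        body = f"You requested {self.path}".encode("utf-8")
        self.send_response(200)  # status line, e.g. "HTTP/1.0 200 OK"
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # the response body carrying the "resource"

    def log_message(self, format: str, *args) -> None:
        pass  # keep the example's output quiet


def fetch(path: str) -> tuple[int, str]:
    """Client side: start a throwaway server, send one GET request, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Bypass any proxy set in the environment so the request goes straight to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"http://127.0.0.1:{port}{path}") as response:
            return response.status, response.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch("/index.html"))  # (200, 'You requested /index.html')
```

The client sends a request message and the server replies with a status code,
headers, and a body, which is the same pattern a browser and a website follow.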
The World Wide Web uses several types of message passing, including:
Synchronous Message Passing: Synchronous message passing in the World
Wide Web involves the exchange of messages between client and server in real-
time, where both parties are actively engaged in the communication process. This
synchronous interaction occurs when a client sends a request to a server and waits
for a response before proceeding further.
Asynchronous Message Passing: Asynchronous message passing in the World
Wide Web involves the exchange of messages between client and server without
requiring immediate or direct responses. This paradigm enables non-blocking
communication, where a client can continue other tasks while waiting for a
response from the server.
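The difference between the two styles can be sketched in Python, assuming a
simulated server whose response takes a fixed `delay`: `time.sleep` stands in for
a blocking network round trip and `asyncio.sleep` for a non-blocking one. All
function names are hypothetical. The synchronous client waits for each response
before sending the next request, while the asynchronous client sends all requests
at once, so the waiting periods overlap.

```python
import asyncio
import time


def blocking_request(request: str, delay: float) -> str:
    # Simulated round trip: the calling thread can do nothing else meanwhile.
    time.sleep(delay)
    return f"response to {request}"


async def non_blocking_request(request: str, delay: float) -> str:
    # Simulated round trip that yields control while "waiting" for the server.
    await asyncio.sleep(delay)
    return f"response to {request}"


def synchronous_client(requests: list[str], delay: float = 0.1):
    # Send one request, wait for its response, then send the next one.
    start = time.perf_counter()
    responses = [blocking_request(r, delay) for r in requests]
    return responses, time.perf_counter() - start


async def asynchronous_client(requests: list[str], delay: float = 0.1):
    # Send every request without waiting; the client is free until it awaits.
    start = time.perf_counter()
    tasks = [asyncio.create_task(non_blocking_request(r, delay)) for r in requests]
    # Other client-side work (e.g. rendering a page skeleton) could run here.
    responses = await asyncio.gather(*tasks)  # results come back in request order
    return list(responses), time.perf_counter() - start


if __name__ == "__main__":
    pages = ["/index.html", "/style.css", "/app.js"]
    print(synchronous_client(pages))                # total time is about 3 x delay
    print(asyncio.run(asynchronous_client(pages)))  # total time is about 1 x delay
```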
Domain Name System:
The Domain Name System (DNS), whose servers are often called domain name
servers, maps domain names to IP addresses (and, through reverse lookups, IP
addresses back to domain names). The Domain Name System therefore plays an
important role in the development of the World Wide Web as a distributed system.
The Domain Name System (DNS) is the phonebook of the Internet. Humans
access information online through domain names, like cricbuzz.com or espn.com.
Web browsers interact through Internet Protocol (IP) addresses. DNS translates
domain names to IP addresses so browsers can load Internet resources.
The process of DNS resolution involves converting a hostname (such as
www.example.com) into a computer-friendly IP address (such as 192.168.1.1).
An IP address is given to each device on the Internet, and that address is necessary
to find the appropriate Internet device - like a street address is used to find a
particular home. When a user wants to load a webpage, a translation must occur
between what a user types into their web browser (example.com) and the
machine-friendly address necessary to locate the example.com webpage.
DNS lookup, or DNS resolution, is the process by which devices and applications
translate human-readable domain names into the corresponding IP addresses that
computers use to communicate over the web.
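From an application's point of view, resolution is usually a single call to the
operating system's resolver. A minimal Python sketch using the standard
`socket.getaddrinfo` function might look like this; the wrapper name
`resolve_ipv4` is hypothetical. Resolving a public name such as example.com
requires network access, the addresses returned depend on live DNS data, and an
unknown name raises `socket.gaierror`.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Ask the operating system's resolver for the IPv4 addresses of hostname."""
    results = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    addresses = []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    for _family, _type, _proto, _canonname, sockaddr in results:
        if sockaddr[0] not in addresses:  # the same IP is listed once per socket type
            addresses.append(sockaddr[0])
    return addresses


if __name__ == "__main__":
    print(resolve_ipv4("localhost"))    # a loopback address, normally from the local hosts file
    print(resolve_ipv4("example.com"))  # requires network access; answer comes from live DNS
```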
DNS servers are responsible for translating the domain name into the
corresponding IP address of the web server hosting the website. Here is the list
of the main servers involved in loading a webpage:
Local DNS Resolver
Root DNS Servers
Top-Level Domain (TLD) DNS Servers
Authoritative DNS Servers
Web Server (not a DNS server itself, but the host at the resolved IP address
that finally serves the page)
This hierarchical system of DNS servers ensures that when you type a domain
name into your web browser, it can be translated into the correct IP address,
allowing you to access the desired webpage on the internet.
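The walk down the hierarchy can be simulated in a few lines of Python. This is a
simplified sketch, not a real DNS client: the zone tables are hypothetical
in-memory dictionaries, each level simply hands back the next server to ask (real
servers return referral records), and cache entries never expire (real resolvers
honor TTLs). The address 203.0.113.10 comes from a range reserved for
documentation. The class name `LocalResolver` is also hypothetical.

```python
# Hypothetical zone data standing in for each level of the DNS hierarchy.
ROOT_ZONE = {"com.": "tld-server-com"}
TLD_ZONES = {"tld-server-com": {"example.com.": "ns1.example.com"}}
AUTHORITATIVE_ZONES = {"ns1.example.com": {"www.example.com.": "203.0.113.10"}}


class LocalResolver:
    """Iterative resolution: root -> TLD -> authoritative, with a local cache."""

    def __init__(self) -> None:
        self.cache: dict[str, str] = {}
        self.trace: list[tuple[str, str]] = []  # which level answered at each step

    def resolve(self, hostname: str) -> str:
        fqdn = hostname.rstrip(".") + "."  # fully qualified name, e.g. "www.example.com."
        if fqdn in self.cache:
            self.trace.append(("cache", fqdn))
            return self.cache[fqdn]
        labels = fqdn.rstrip(".").split(".")
        tld = labels[-1] + "."                # "com."
        domain = ".".join(labels[-2:]) + "."  # "example.com."
        tld_server = ROOT_ZONE[tld]           # root points to the TLD server
        self.trace.append(("root", tld_server))
        auth_server = TLD_ZONES[tld_server][domain]  # TLD points to the authoritative server
        self.trace.append(("tld", auth_server))
        ip = AUTHORITATIVE_ZONES[auth_server][fqdn]  # authoritative server gives the answer
        self.trace.append(("authoritative", ip))
        self.cache[fqdn] = ip
        return ip


if __name__ == "__main__":
    resolver = LocalResolver()
    ip = resolver.resolve("www.example.com")
    print(ip, resolver.trace)  # the browser would now contact the web server at this IP
```

A second lookup for the same name is answered from the local cache without
contacting the root, TLD, or authoritative servers again.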
Page Rank:
PageRank is a way of measuring the importance of website pages. PageRank
works by counting the number and quality of links to a page to determine a
rough estimate of how important the website is. The underlying assumption is
that more important websites are likely to receive more links from other
websites.
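One common normalized formulation of PageRank, not stated in the text above but
widely used, gives each page a rank that combines a small constant share with
the rank flowing in from the pages that link to it:

```latex
PR(p) = \frac{1 - d}{N} + d \sum_{q \in B_p} \frac{PR(q)}{L(q)}
```

Here N is the total number of pages, B_p is the set of pages that link to p,
L(q) is the number of outgoing links on page q, and d is a damping factor, often
set to 0.85. A link from a page with high rank and few outgoing links therefore
counts for more than a link from a low-ranked page with many links.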
Distributed PageRank is an algorithm used to calculate the importance or ranking
of web pages in a distributed computing environment. It is an extension of the
original PageRank algorithm, which was developed by Larry Page and Sergey
Brin at Google.
The distributed PageRank algorithm typically follows these steps:
1. Partitioning: The web graph is divided into smaller subsets or partitions,
and each partition is assigned to a different machine in the distributed
system. This partitioning allows the computation to be distributed across
multiple machines, enabling parallel processing.
2. Iterative Computation: Each machine processes its assigned portion of the
web graph independently. Initially, each page is assigned an equal
probability of being visited (equal PageRank value). In each iteration, the
machines exchange information about the links between pages and update
the PageRank values based on the incoming links.
3. Communication and Synchronization: To ensure consistency and
accuracy, the machines periodically exchange information about the
PageRank values and links between pages. This communication allows the
distributed system to converge towards a stable and accurate ranking.
4. Convergence: The iterative computation continues until the PageRank
values stabilize, indicating that the algorithm has converged. Typically, a
convergence criterion is defined to determine when the computation can
be considered complete.
The distributed PageRank algorithm aims to distribute the computational load
across multiple machines, allowing for efficient processing of large-scale web
graphs. By dividing the graph into partitions and updating the PageRank values
iteratively, the algorithm calculates the importance of web pages in a distributed
and parallel manner.
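The partition, compute, exchange, and converge steps described above can be
simulated on a single machine in Python. This is a sketch under stated
assumptions, not Google's implementation: the "workers" are just lists of pages,
message passing is modeled as per-worker inboxes, partitioning is a simple
round-robin, every link target is assumed to be a page in the graph, and every
page is assumed to have at least one outgoing link (dangling pages are rejected
rather than handled). The update uses the common normalized PageRank rule with
damping factor `d`; the function names are hypothetical.

```python
from collections import defaultdict


def partition(pages, num_workers):
    """Step 1: split the pages among workers (round-robin over sorted names)."""
    parts = [[] for _ in range(num_workers)]
    for i, page in enumerate(sorted(pages)):
        parts[i % num_workers].append(page)
    return parts


def distributed_pagerank(graph, num_workers=2, d=0.85, tol=1e-10, max_iter=200):
    """Simulated distributed PageRank; graph maps each page to the pages it links to."""
    if any(not targets for targets in graph.values()):
        raise ValueError("this sketch assumes every page has at least one outgoing link")
    n = len(graph)
    parts = partition(graph, num_workers)
    owner = {page: w for w, part in enumerate(parts) for page in part}
    # Step 2: every page starts with the same rank, stored only on its owning worker.
    ranks = [{page: 1.0 / n for page in part} for part in parts]

    for iteration in range(1, max_iter + 1):
        # Each worker splits its pages' rank over their outgoing links and
        # "sends" each share to the inbox of the worker that owns the target page.
        inboxes = [defaultdict(float) for _ in parts]
        for w, part in enumerate(parts):
            for page in part:
                share = ranks[w][page] / len(graph[page])
                for target in graph[page]:
                    inboxes[owner[target]][target] += share
        # Step 3: once all messages are delivered (a synchronization barrier),
        # each worker updates only the pages it owns.
        new_ranks = [
            {page: (1 - d) / n + d * inboxes[w][page] for page in part}
            for w, part in enumerate(parts)
        ]
        # Step 4: global convergence check on the largest change on any worker.
        delta = max(
            abs(new_ranks[w][page] - ranks[w][page])
            for w, part in enumerate(parts)
            for page in part
        )
        ranks = new_ranks
        if delta < tol:
            break

    merged = {page: rank for worker_ranks in ranks for page, rank in worker_ranks.items()}
    return merged, iteration


if __name__ == "__main__":
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    ranks, iterations = distributed_pagerank(web, num_workers=3)
    print(ranks, iterations)
```

Because each worker only reads its own ranks and its own inbox, the result does
not depend on how many workers the graph is split across; only the amount of
message traffic changes.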
Caching Algorithm:
Caching of Web documents improves the response time perceived by the clients.
Cache algorithms play a central role in reducing response time by selecting
which subset of documents to cache so that an appropriate performance metric,
such as the hit rate, is maximized. At the same time, the cache must take extra
steps to guarantee some form of consistency for the cached data: cache
algorithms enforce guarantees about how stale the documents they store may
become. Most of the published work on Web cache design either considers cache
consistency algorithms separately from cache replacement algorithms or
concentrates on only one of the two.
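The text does not name specific algorithms, so as an assumption here is a sketch
that combines one common replacement policy, least-recently-used (LRU) eviction,
with a simple consistency rule, a fixed time-to-live (TTL) after which a document
is treated as stale and must be fetched again. The class name `LRUTTLCache` is
hypothetical, and the clock is injectable so the behavior can be tested without
waiting.

```python
import time
from collections import OrderedDict


class LRUTTLCache:
    """Replacement: least recently used. Consistency: entries expire after ttl seconds."""

    def __init__(self, capacity, ttl, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl
        self.clock = clock
        self._data = OrderedDict()  # url -> (document, time_stored); oldest use first

    def get(self, url):
        entry = self._data.get(url)
        if entry is None:
            return None  # miss: caller must fetch from the origin server
        document, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            del self._data[url]  # too stale to serve: treat as a miss
            return None
        self._data.move_to_end(url)  # mark as most recently used
        return document

    def put(self, url, document):
        self._data[url] = (document, self.clock())
        self._data.move_to_end(url)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used document
```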
Caching in the World Wide Web is implemented through a hierarchical system
of servers and clients, where frequently accessed resources are stored temporarily
closer to the end user, reducing latency and server load. When a client requests a
resource, it first checks its local cache; if the resource is not found, it checks
intermediary caches like proxy servers or content delivery networks (CDNs),
which may have a cached copy. If the resource is still not found, the request is
forwarded to the origin server. This system optimizes performance by minimizing
the need for repeated requests to the origin server, improving efficiency and
responsiveness across the web.
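The lookup order described above (local cache, then intermediary caches, then
the origin server) can be sketched in Python with plain dictionaries standing in
for each cache level. The function name `hierarchical_get` and the level names
are hypothetical, and the sketch assumes that a response is copied into every
nearer cache on its way back to the client.

```python
def hierarchical_get(url, levels, origin):
    """Look up url in each cache level, nearest first, then fall back to the origin.

    levels: list of (name, cache_dict) pairs, e.g. browser cache, then proxy or CDN.
    origin: dict standing in for the origin server's content.
    Returns (document, name_of_level_that_served_it).
    """
    for i, (name, cache) in enumerate(levels):
        if url in cache:
            document = cache[url]
            # Copy the hit into the nearer levels so the next request stops earlier.
            for _, nearer in levels[:i]:
                nearer[url] = document
            return document, name
    document = origin[url]  # missed everywhere: forward the request to the origin server
    for _, cache in levels:
        cache[url] = document  # populate every level on the way back
    return document, "origin"


if __name__ == "__main__":
    browser, cdn = {}, {"/logo.png": "PNG bytes"}
    origin = {"/index.html": "<html>...</html>", "/logo.png": "PNG bytes"}
    levels = [("browser", browser), ("cdn", cdn)]
    print(hierarchical_get("/index.html", levels, origin))  # served by the origin
    print(hierarchical_get("/index.html", levels, origin))  # now served by the browser cache
```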
Distributed caching involves storing data across multiple machines or nodes, often
in a network. This type of caching is essential for applications that need to scale
across multiple servers or are distributed geographically. Distributed caching
ensures that data is available close to where it’s needed, even if the original data
source is remote or under heavy load. For example, an e-commerce website that
uses distributed caching can store product details on multiple cache servers
located in different regions. When a user accesses the website, the system
retrieves product details from the nearest cache server, ensuring faster
response times and a better user experience.
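The e-commerce example can be sketched in Python as follows. Everything here is
hypothetical and simplified: `CacheNode`, `nearest_node`, and `get_product` are
illustrative names, "nearest" is chosen from a fixed table of assumed latencies
per user region, the database is a plain dictionary, and cached entries are
never invalidated.

```python
from dataclasses import dataclass, field


@dataclass
class CacheNode:
    region: str
    latency_ms: dict[str, float]  # assumed latency from each user region to this node
    store: dict = field(default_factory=dict)


def nearest_node(nodes, user_region):
    """Pick the cache server with the lowest latency for this user's region."""
    return min(nodes, key=lambda node: node.latency_ms[user_region])


def get_product(product_id, nodes, user_region, database):
    """Return (details, region_that_served, was_cache_hit)."""
    node = nearest_node(nodes, user_region)
    if product_id in node.store:
        return node.store[product_id], node.region, True
    details = database[product_id]  # cache miss: read from the remote source of truth
    node.store[product_id] = details  # keep a copy close to users in this region
    return details, node.region, False


if __name__ == "__main__":
    nodes = [
        CacheNode("eu", {"eu": 10, "us": 90}),
        CacheNode("us", {"eu": 90, "us": 10}),
    ]
    database = {"sku-1": {"name": "Desk lamp", "price": 25}}
    print(get_product("sku-1", nodes, "eu", database))  # miss, then stored on the "eu" node
    print(get_product("sku-1", nodes, "eu", database))  # hit on the "eu" node
```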