Module 1 Notes

Distributed System (BCS515D)

MODULE-1

1. CHARACTERIZATION OF DISTRIBUTED SYSTEMS: Introduction, Focus on resource sharing,


Challenges.

2. REMOTE INVOCATION: Introduction, Request-reply protocols, Remote procedure call,


Introduction to Remote Method Invocation.

Introduction

A distributed system is a collection of autonomous computer systems that are physically separated but are connected by a centralized computer network equipped with distributed system software. The autonomous computers communicate with one another by sharing resources and files and by performing the tasks assigned to them.

OR

A distributed system is a collection of computer programs that utilize computational resources


across multiple, separate computation nodes to achieve a common, shared goal. Also known as
distributed computing or distributed databases, it relies on separate nodes to communicate and
synchronize over a common network. These nodes typically represent separate physical hardware
devices but can also represent separate software processes, or other recursive encapsulated
systems. Distributed systems aim to remove bottlenecks or central points of failure from a system.

Types of Distributed Systems

• There are many models and architectures of distributed systems in use today.

• Internet: A vast interconnected collection of computer networks of many different types. Messages are passed by employing a common means of communication (the Internet Protocol). Note that the Web is not the same thing as the Internet.

• Intranets: An intranet is a private network contained within an enterprise. It may consist of many interlinked local area networks and may also use leased lines in a wide area network. It is separately administered and enforces local security policies. It is typically connected to the Internet via a router and uses a firewall to protect the intranet by preventing unauthorized messages from leaving or entering. Some intranets are isolated from the Internet entirely. Users in an intranet share data by means of file services.

• Client-server systems: The most traditional and simple type of distributed system. It involves a multitude of networked computers that interact with a central server for data storage, processing, or some other common goal.

GMIT, Dept., AIML


• Peer-to-peer networks: They distribute workloads among hundreds or thousands of computers
all running the same software.

• Cell phone networks: It is an advanced distributed system, sharing workloads among handsets,
switching systems, and internet-based devices.

Example of a Distributed System

Any social media platform can serve as an example: its headquarters hosts the centralized computer network, and the computer systems that users access to consume its services act as the autonomous systems in the distributed system architecture.

Distributed System Software: This software enables computers to coordinate their activities and to share resources such as hardware, software, and data.

Database: It is used to store the data processed by each node/system of the distributed system connected to the centralized network.

Working of a Distributed System

Each autonomous system runs a common application that can have its own data, which is shared through the centralized database system.

• To transfer data to the autonomous systems, the centralized system must have a middleware service and must be connected to a network.

• Middleware services provide capabilities that are not present by default in the local systems or the centralized system, by acting as an interface between them. Systems communicate and manage data using middleware components.

• The data transferred through the database is divided into segments or modules and shared with the autonomous systems for processing.

• The processed data is then transferred back to the centralized system through the network and stored in the database.

Characteristics of Distributed System

1. Resource Sharing: It is the ability to use any Hardware, Software, or Data anywhere in the
System.

2. Openness: It is concerned with Extensions and improvements in the system (i.e., How openly
the software is developed and shared with others)

3. Concurrency: Concurrency is naturally present in distributed systems: the same activity or functionality can be performed simultaneously by separate users at remote locations. Every local system has its own independent operating system and resources.

4. Scalability: The system can grow in scale, adding more processors and serving more users while maintaining or improving its responsiveness.

5. Fault tolerance: This concerns the reliability of the system: if there is a failure in hardware or software, the system continues to operate properly without degrading its performance.

6. Transparency: The complexity of the distributed system is hidden from users and application programs, so that the system appears to them as a single coherent whole.

7. Heterogeneity: Networks, computer hardware, operating systems, programming languages,


and developer implementations can all vary and differ among dispersed system components.

Advantages of Distributed System

• Applications in Distributed Systems are Inherently Distributed Applications.

• Information in Distributed Systems is shared among geographically distributed users.

• Resource Sharing (Autonomous systems can share resources from remote locations).

• It has a better price/performance ratio and greater flexibility.

• It has shorter response time and higher throughput.

• It has higher reliability and availability against component failure.

• It has extensibility so that systems can be extended in more remote locations and also
incremental growth.

Disadvantages of Distributed System

• Mature, general-purpose software for building distributed systems is still limited.

• Security poses a problem: because resources are shared among multiple systems, data is easier to access improperly.

• Network saturation may hinder data transfer; if there is lag in the network, users will have trouble accessing data.

• In comparison to a single user system, the database associated with distributed systems is much
more complex and challenging to manage.

• If every node in a distributed system tries to send data at once, the network may become
overloaded.

Use cases of Distributed System

• Finance and Commerce: Amazon, eBay, Online Banking, E-Commerce websites.

• Information Society: Search Engines, Wikipedia, Social Networking, Cloud Computing.

• Cloud Technologies: AWS, Salesforce, Microsoft Azure, SAP.

• Entertainment: Online Gaming, Music, YouTube.

• Healthcare: Online patient records, Health Informatics.

• Education: E-learning.

• Transport and logistics: GPS, Google Maps.

• Environment Management: Sensor technologies.

Resource Sharing in Distributed System

Resource sharing in distributed systems is very important for optimizing performance, reducing
redundancy, and enhancing collaboration across networked environments. By enabling multiple
users and applications to access and utilize shared resources such as data, storage, and computing
power, distributed systems improve efficiency and scalability.

Importance of Resource Sharing in Distributed Systems

Resource sharing in distributed systems is of paramount importance for several reasons:

Efficiency and Cost Savings: By sharing resources like storage, computing power, and data,
distributed systems maximize utilization and minimize waste, leading to significant cost
reductions.

Scalability: Distributed systems can easily scale by adding more nodes, which share the workload
and resources, ensuring the system can handle increased demand without a loss in performance.

Reliability and Redundancy: Resource sharing enhances system reliability and fault tolerance. If
one node fails, other nodes can take over, ensuring continuous operation.

Collaboration and Innovation: Resource sharing facilitates collaboration among geographically


dispersed teams, fostering innovation by providing access to shared tools, data, and
computational resources.

Load Balancing: Efficient distribution of workloads across multiple nodes prevents any single node
from becoming a bottleneck, ensuring balanced performance and preventing overloads.

Types of Resources in Distributed Systems

In distributed systems, resources are diverse and can be broadly categorized into several types:

Computational Resources: These include CPU cycles and processing power, which are shared
among multiple users and applications to perform various computations and processing tasks.

Storage Resources: Distributed storage systems allow data to be stored across multiple nodes,
ensuring data availability, redundancy, and efficient access.

Memory Resources: Memory can be distributed and shared across nodes, allowing applications
to utilize a larger pool of memory than what is available on a single machine.

Network Resources: These include bandwidth and network interfaces, which facilitate
communication and data transfer between nodes in a distributed system.

Data Resources: Shared databases, files, and data streams that are accessible by multiple users
and applications for reading and writing operations.

Peripheral Devices: Devices such as printers, scanners, and specialized hardware that can be
accessed remotely within the distributed network.

FOCUS ON RESOURCE SHARING

Resource Sharing Mechanisms

Resource sharing in distributed systems is facilitated through various mechanisms designed to


optimize utilization, enhance collaboration, and ensure efficiency. Some common mechanisms
include:

1. Client-Server Architecture:

A classic model where clients request services or resources from centralized servers. This
architecture centralizes resources and services, providing efficient access but potentially leading
to scalability and reliability challenges.

Client-server architecture is a fundamental concept in system design where a network involves


multiple clients and a server. Clients are devices or programs that request services or resources,
while the server is a powerful machine providing these resources or services. This architecture
allows efficient data management and resource sharing, making it popular in web applications,
databases, and other network-based systems. By separating roles and distributing tasks, client-
server architecture enhances performance, scalability, and security.

Advantages: Centralized management simplifies resource allocation and access control. It is


suitable for applications where clients primarily consume services or resources from centralized
servers.

Use Cases: Web applications, databases, and enterprise systems where centralized control and
management are critical.

2. Peer-to-Peer (P2P) Networks: Distributed networks where each node can act as both a client
and a server. P2P networks facilitate direct resource sharing between nodes without reliance on
centralized servers, promoting decentralized and scalable resource access.


Peer-to-peer (P2P) architecture is a decentralized computing model in which network participants share resources directly with each other without the need for a centralized server. In a P2P network, each node acts as both a client and a server, enabling distributed sharing of files, data, and computing resources.

Advantages: Decentralized nature facilitates direct resource sharing between peers without
dependency on centralized servers, enhancing scalability and fault tolerance.

Use Cases: File sharing, content distribution networks (CDNs), and collaborative computing
environments.

3. Distributed File Systems: Storage systems that distribute files across multiple nodes, ensuring
redundancy and fault tolerance while allowing efficient access to shared data.

A Distributed File System (DFS) is a file system distributed across multiple file servers or multiple locations. It allows programs to access or store remote files as they do local ones, so programmers can access files from any networked computer.

Advantages: Distributes file storage across multiple nodes, providing redundancy, fault tolerance,
and efficient access to shared data.

Use Cases: Large-scale data storage and retrieval systems, such as Hadoop Distributed File System
(HDFS) for big data processing.

4. Load Balancing: Mechanisms that distribute workload across multiple nodes to optimize
resource usage and prevent overload on individual nodes, thereby improving performance and
scalability.

A load balancer is a networking device or software application that distributes incoming traffic among servers to provide high availability, efficient utilization of servers, and high performance. A load balancer works as a "traffic cop" sitting in front of your servers and routing client requests across all of them. It distributes the set of requested operations (database write requests, cache queries) effectively across multiple servers and ensures that no single server bears too many requests.

Advantages: Improves system responsiveness and scalability by preventing bottlenecks.

Use Cases: Online shopping, online booking.
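As a sketch of the round-robin idea described above (the server names here are hypothetical, chosen only for illustration):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Distributes incoming requests evenly across a fixed pool of servers."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._next = cycle(self._servers)  # endless round-robin iterator

    def route(self, request):
        """Pick the next server in rotation and hand it the request."""
        server = next(self._next)
        return server, request

balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
# Six requests are spread evenly across the three servers
assignments = [balancer.route(f"req-{i}")[0] for i in range(6)]
```

Real load balancers also weigh server load and health, but the rotation above already prevents any single server from bearing all requests.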

5. Virtualization: Techniques such as virtual machines (VMs) and containers that abstract physical
resources, enabling efficient resource allocation and utilization across distributed environments.

A key idea in modern system design is virtualization, which provides a productive and adaptable
method of making use of hardware resources. Through the creation of virtualized versions of
physical components such as networks, storage, and servers, we can operate several separate
environments on a single physical machine or throughout a distributed system.

The framework and techniques used to create and manage virtual instances of computer
resources, such as hardware platforms, operating systems, storage devices, and network
resources, are referred to as virtualization architecture in system design. It makes it possible for
several virtualized instances to operate on a single physical machine, which enhances scalability,
flexibility, and cost-effectiveness while also facilitating effective resource utilization.

Advantages: Easy to access and maintain, reduces component usage, and avoids unnecessary error occurrence.

Use Cases: organizations, IT companies.

6. Caching: Storing frequently accessed data closer to users or applications to reduce latency and
improve responsiveness, enhancing overall system performance.

Caching is a system design concept that involves storing frequently accessed data in a location
that is easily and quickly accessible. The purpose of caching is to improve the performance and
efficiency of a system by reducing the amount of time it takes to access frequently accessed data.

Typically, a web application stores data in a database. When a client requests some data, it is fetched from the database and returned to the user. Reading data from the database requires network calls and I/O operations, which is a time-consuming process. A cache reduces the network calls to the database and speeds up the performance of the system.

Advantages: High-speed data access, reduced load on the backing database.

Use Cases: System devices, Mobile Applications.
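A minimal sketch of the caching idea above, where `fetch_from_db` is a hypothetical stand-in for the slow database read:

```python
cache = {}
db_calls = 0

def fetch_from_db(key):
    """Stand-in for a slow database read (network call + I/O)."""
    global db_calls
    db_calls += 1
    return f"value-of-{key}"

def get(key):
    """Serve from the cache when possible; fall back to the database once."""
    if key not in cache:
        cache[key] = fetch_from_db(key)  # cache miss: one real DB read
    return cache[key]                    # cache hit: no network call

first = get("user:42")    # miss: goes to the database
second = get("user:42")   # hit: served from the cache, no DB call
```

After the two reads, the database has been contacted only once; every later read of the same key avoids the network round trip entirely.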

Resource Sharing and the Web

• Equipment is shared to reduce cost. Data shared in databases or on web pages are high-level resources, which are more significant to users than the server or servers that provide them.

• Patterns of resource sharing vary widely in their scope and in how closely users work together:

• Search engine: users need no contact with one another.

• Computer-Supported Cooperative Working (CSCW): users cooperate directly and share resources. Mechanisms to coordinate users' actions are determined by the pattern of sharing and the geographic distribution of the users.

• For effective sharing, each resource must be managed by a program that offers a communication interface enabling the resource to be accessed and updated reliably and consistently.

• A server is a running program (a process) on a networked computer that accepts requests from programs running on other computers to perform a service, and responds appropriately. The requesting processes are referred to as clients.

• An executing web browser is a client. It communicates with a web server to request web pages
from it.

• When a client invokes an operation upon the server, this is called remote invocation.

• Resources may be encapsulated as objects and accessed by client objects. In this case a client
object invokes a method upon a server object.

The World Wide Web (WWW)


• The WWW is an evolving system for publishing and accessing resources and services across the Internet. The Web is an open system: its operation is based on freely published communication standards and document standards.

• Key feature: Web provides a hypertext structure among the documents that it stores. The
documents contain links - references to other documents or resources. The structures of links can
be arbitrarily complex and the set of resources that can be added is unlimited.

• Three main standard technological components:

i. HTML (Hypertext Markup Language): specifies the contents and layout of web pages.

ii. URL (Uniform Resource Locator): identifies a resource so that a browser can find it.

iii. HTTP (Hypertext Transfer Protocol): defines the rules by which browsers and other types of client interact with web servers.

Challenges in Resource Sharing in Distributed System


Resource sharing in distributed systems presents several challenges that need to be addressed to
ensure efficient operation and optimal performance:

• Consistency and Coherency: Ensuring that shared resources such as data or files remain
consistent across distributed nodes despite concurrent accesses and updates.

• Concurrency Control: Managing simultaneous access and updates to shared resources to


prevent conflicts and maintain data integrity.

• Fault Tolerance: Ensuring resource availability and continuity of service in the event of node
failures or network partitions.

• Scalability: Efficiently managing and scaling resources to accommodate increasing demands
without compromising performance.

• Load Balancing: Distributing workload and resource usage evenly across distributed nodes to
prevent bottlenecks and optimize resource utilization.

• Security and Privacy: Safeguarding shared resources against unauthorized access, data
breaches, and ensuring privacy compliance.

• Communication Overhead: Minimizing overhead and latency associated with communication


between distributed nodes accessing shared resources.

• Synchronization: Coordinating activities and maintaining synchronization between distributed


nodes to ensure consistent and coherent resource access.

Request-reply communication

Request-reply communication is synchronous because the client process blocks until the reply
arrives from the server. It can also be reliable because the reply from the server is effectively
an acknowledgement to the client.
Asynchronous request-reply communication is an alternative that may be useful in situations
where clients can afford to retrieve replies later.

Operations of the request-reply protocol


The request-reply protocol:
• The protocol is based on a trio of communication primitives: doOperation, getRequest and sendReply.
• This request-reply protocol matches requests to replies. It may be designed to provide
certain delivery guarantees. If UDP datagrams are used, the delivery guarantees must
be provided by the request-reply protocol, which may use the server reply message as
an acknowledgement of the client request message.
• The doOperation method is used by clients to invoke remote operations. Its
arguments specify the remote server and which operation to invoke, together with
additional information (arguments) required by the operation. Its result is a byte array
containing the reply.
• getRequest is used by a server process to acquire service requests.
• sendReply is used to send the reply message to the client. When the reply message is
received by the client the original doOperation is unblocked and execution of the
client program continues.
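The three primitives can be sketched in-process, with a pair of queues standing in for the network (a simplification: no real UDP sockets, and the `add` operation is a hypothetical example service):

```python
import threading
from queue import Queue

request_channel = Queue()   # client -> server messages
reply_channel = Queue()     # server -> client messages

def do_operation(operation, *args):
    """Client side: send the request, then block until the reply arrives."""
    request_channel.put((operation, args))
    return reply_channel.get()      # blocking receive makes the call synchronous

def get_request():
    """Server side: block until a service request arrives."""
    return request_channel.get()

def send_reply(result):
    """Server side: send the reply; this unblocks the client's do_operation."""
    reply_channel.put(result)

def server_loop():
    op, args = get_request()
    if op == "add":
        send_reply(args[0] + args[1])

threading.Thread(target=server_loop, daemon=True).start()
result = do_operation("add", 2, 3)   # blocks until the server replies with 5
```

The blocking `reply_channel.get()` is what makes request-reply communication synchronous: the client cannot proceed until `send_reply` runs on the server side.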


Request-reply message structure
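The figure that normally accompanies this heading (in Coulouris et al.'s presentation) shows a message with five fields; a sketch of that structure as a Python dataclass, with field names adapted from the textbook:

```python
from dataclasses import dataclass

@dataclass
class Message:
    """Request-reply message structure (after Coulouris et al.)."""
    message_type: int      # 0 = Request, 1 = Reply
    request_id: int        # matches a reply to its request
    remote_reference: str  # identifies the remote object/server
    operation_id: int      # which operation to invoke
    arguments: bytes       # marshalled arguments (or marshalled result)

# A hypothetical request to operation 4 on a file server:
req = Message(message_type=0, request_id=1,
              remote_reference="fileserver-1", operation_id=4,
              arguments=b"f.txt")
```

A reply reuses the same structure with `message_type = 1` and the same `request_id`, which is how the client matches it to the outstanding request.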

1. Message Identifiers

• Purpose: Essential for managing messages to provide properties like reliable delivery or
request-reply communication. They allow unique referencing of each message.

• Composition: A unique identifier consists of two parts:

1. RequestID: A sequence number (integer) incremented by the sending process for


each new message.

2. Sender Identifier: A unique identifier for the sender process (e.g., its IP address
and port number).

• Uniqueness: The combination ensures the identifier is unique across the entire
distributed system.

• Sequence Wrap-around: When the requestId reaches its maximum value (e.g., 2³² - 1), it
resets to zero. A key assumption is that no message identifier is still in use by the time the
sequence numbers are reused.
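The identifier scheme above can be sketched as follows (the sender identifier here is a hypothetical IP/port pair):

```python
MAX_REQUEST_ID = 2**32 - 1  # after this value the sequence wraps to zero

class MessageIdGenerator:
    """Produces system-wide unique message identifiers as
    (requestId, senderId) pairs, senderId being e.g. (IP address, port)."""

    def __init__(self, sender_id, start=0):
        self.sender_id = sender_id
        self.request_id = start

    def next_id(self):
        ident = (self.request_id, self.sender_id)
        # Wrap-around: assumes no identifier is still in use by the time
        # sequence numbers are reused.
        self.request_id = (self.request_id + 1) % (MAX_REQUEST_ID + 1)
        return ident

gen = MessageIdGenerator(("10.0.0.5", 5000), start=MAX_REQUEST_ID)
first = gen.next_id()    # requestId at its maximum value
second = gen.next_id()   # requestId has wrapped around to 0
```

Because the sender identifier is part of every message identifier, two different processes can safely use the same requestId values without collision.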

2. Failure Model

• Underlying Protocol: If implemented over UDP, the request-reply primitives inherit UDP's
failure characteristics.

• Types of Failures:

o Omission Failures: Messages (requests or replies) can be lost.

o No Ordering Guarantees: Messages are not guaranteed to arrive in the order they
were sent.

o Process Failures: Processes are assumed to have crash failures (they stop and do
nothing) rather than Byzantine failures (behaving arbitrarily).

• Handling Failures: The doOperation primitive uses a timeout mechanism to detect


missing replies.

3. Handling Timeouts & Lost Messages

• Simple Approach (Not Recommended): Return an immediate failure to the client upon
timeout. This is problematic because the operation might have been executed on the
server, but the reply was simply lost.

• Standard Approach (Retransmission): The client (doOperation) retransmits the request


message repeatedly until:

o It receives a reply, or

o It becomes clear the server is not responding (not just a lost message).

o Eventually, it returns an exception to the client if no reply is received.
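The retransmission loop can be sketched as follows, where `send_request` is a hypothetical helper standing in for one send-and-wait attempt that returns None on timeout:

```python
class ServerUnavailable(Exception):
    """Raised when no reply arrives after all retransmissions."""

def do_operation_with_retries(send_request, max_retries=3):
    """Retransmit the request until a reply arrives or the server is
    presumed to have failed."""
    for attempt in range(max_retries + 1):
        reply = send_request()      # one attempt: send, then wait for reply
        if reply is not None:
            return reply            # reply received: done
        # timeout: fall through and retransmit
    raise ServerUnavailable("no reply after retransmissions")

# Simulate a network that loses the first two replies:
attempts = iter([None, None, "reply!"])
result = do_operation_with_retries(lambda: next(attempts))
```

Note the contrast with the "simple approach": instead of failing on the first timeout, the client keeps retransmitting, and only raises an exception once it is clear the server is not responding.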

4. The Problem of Duplicate Requests & Idempotency

• Scenario: If a server receives a duplicate request (due to client retransmission) after it has
already sent the reply, it might execute the operation again.

• Idempotent Operations:

o Definition: An operation that can be performed multiple times with the exact
same effect as if it was performed once.

o Example: Adding an element to a set. Doing it once or multiple times results in the
same final state of the set.

o Non-Example: Appending an item to a list. Each execution will extend the list
further.

o Implication: Servers with only idempotent operations do not need special


mechanisms to handle duplicate requests.
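The set-versus-list contrast above can be demonstrated directly:

```python
# Idempotent: adding an element to a set.
# Repeating the operation leaves the set in the same final state.
s = set()
for _ in range(3):
    s.add("x")          # same result whether done once or three times

# Not idempotent: appending an item to a list.
# Each repeated execution (e.g. from a retransmitted request) extends it.
lst = []
for _ in range(3):
    lst.append("x")     # duplicates accumulate on every retry
```

A server exposing only operations like `add` above can simply re-execute duplicates; a server exposing operations like `append` needs duplicate filtering or a reply history.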

5. Using a History to Avoid Re-execution

• Purpose: For non-idempotent operations, a server can avoid re-executing an operation by


storing (caching) the results of previous requests in a history.

• What is a History? A data structure that stores records of reply messages that have
already been sent.

o Each entry contains: Request Identifier, Reply Message, and Client Identifier.

• Problem: Memory Management. The history can grow very large.

• Optimization: Since a client typically makes only one request at a time, a new request can
be interpreted as an acknowledgment for the previous reply. Therefore, a server often
only needs to keep the last reply sent to each client.

• Cleanup: History entries are discarded after a limited time to handle cases where a client
terminates without sending another request (which would have acted as an
acknowledgment).
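The "keep only the last reply per client" optimization can be sketched as follows (the client and request identifiers are hypothetical):

```python
class ReplyHistory:
    """Keeps only the last reply sent to each client, so that a
    retransmitted request can be answered without re-executing it."""

    def __init__(self):
        self._last = {}   # client_id -> (request_id, reply)

    def record(self, client_id, request_id, reply):
        # A new request implicitly acknowledges the previous reply,
        # so overwriting the old entry per client is safe.
        self._last[client_id] = (request_id, reply)

    def lookup(self, client_id, request_id):
        """Return the cached reply for a duplicate request, else None."""
        entry = self._last.get(client_id)
        if entry and entry[0] == request_id:
            return entry[1]
        return None

history = ReplyHistory()
history.record("clientA", 7, "result-7")
dup = history.lookup("clientA", 7)      # duplicate: cached reply returned
fresh = history.lookup("clientA", 8)    # new request: None, must execute
```

A production server would also attach an expiry time to each entry, implementing the cleanup rule described above.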

6. Styles of Exchange Protocols

Three protocols, producing differing behaviours, are commonly used: the request (R) protocol, where the client needs no reply and no confirmation; the request-reply (RR) protocol, which suits most client-server exchanges because the reply doubles as an acknowledgement; and the request-reply-acknowledge (RRA) protocol, where the client additionally acknowledges the reply so the server can discard its history entry.

Remote Procedure Call:


1. What is RPC

• Remote Procedure Call (RPC): an abstraction that lets a program call a procedure on
another machine/process as if it were a local call.

• Goal: make distributed programming look like conventional programming by hiding


marshalling, message-passing, and network details (i.e., distribution transparency).

• Introduced by Birrell & Nelson (1984); foundational for many distributed middleware
systems.

2. RPC architecture & call flow (step-by-step)

RPC systems provide stubs on client and server sides. The runtime and stubs handle network
details.

Typical sequence for a synchronous RPC call:

1. Client calls a client stub function just like a local call.

2. Client stub:

o Marshals (serializes) parameters into a message.

o Calls RPC runtime to send the request over the network.

3. Network transports request to server host.

4. Server runtime / server stub:

o Receives request, unmarshals parameters.

o Invokes the actual procedure in server process.

o Marshals the result into a reply message.

5. Reply sent back to client runtime.

6. Client stub unmarshals reply and returns result to the caller.
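The six steps above can be sketched in-process, with `json` standing in for a marshalling format and a direct function call standing in for the network transport (all names are hypothetical):

```python
import json

# ---- server side -------------------------------------------------
def add(a, b):                  # the actual remote procedure
    return a + b

PROCEDURES = {"add": add}

def server_stub(request_bytes):
    """Steps 4: unmarshal parameters, invoke the procedure,
    marshal the result into a reply message."""
    request = json.loads(request_bytes)
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode()

# ---- client side -------------------------------------------------
def client_stub(proc, *args):
    """Step 1-2: looks like a local call; marshals the parameters.
    Step 6: unmarshals the reply and returns the result."""
    request = json.dumps({"proc": proc, "args": args}).encode()
    reply_bytes = server_stub(request)   # stands in for steps 3 and 5
    return json.loads(reply_bytes)["result"]

total = client_stub("add", 2, 3)   # caller sees an ordinary function call
```

The caller never touches `json` or the message bytes: that is exactly the distribution transparency the stubs are meant to provide.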

Design issues for RPC

Before looking at the implementation of RPC systems, we look at three issues that are important
in understanding this concept:

• the style of programming promoted by RPC – programming with interfaces;

• the call semantics associated with RPC;

• the key issue of transparency and how it relates to remote procedure calls.

Programming with interfaces • Most modern programming languages provide a means of


organizing a program as a set of modules that can communicate with one another.
Communication between modules can be by means of procedure calls between modules or by
direct access to the variables in another module. In order to control the possible interactions
between modules, an explicit interface is defined for each module. The interface of a module
specifies the procedures and the variables that can be accessed from other modules. Modules
are implemented so as to hide all the information about them except that which is available
through its interface. So long as its interface remains the same, the implementation may be
changed without affecting the users of the module.

Interfaces in distributed systems: In a distributed program, the modules can run in separate
processes. In the client-server model, in particular, each server provides a set of procedures that
are available for use by clients. For example, a file server would provide procedures for reading
and writing files. The term service interface is used to refer to the specification of the procedures
offered by a server, defining the types of the arguments of each of the procedures.

There are a number of benefits to programming with interfaces in distributed systems, stemming
from the important separation between interface and implementation:

• As with any form of modular programming, programmers are concerned only with the
abstraction offered by the service interface and need not be aware of implementation details.

• Extrapolating to (potentially heterogeneous) distributed systems, programmers also do not


need to know the programming language or underlying platform used to implement the service
(an important step towards managing heterogeneity in distributed systems).

• This approach provides natural support for software evolution, in that implementations can change as long as the interface (the external view) remains the same. More correctly, the interface can also change as long as it remains compatible with the original.

RPC call semantics • Request-reply protocols were discussed in Section 5.2, where we showed
that doOperation can be implemented in different ways to provide different delivery guarantees.
The main choices are:

➢ Retry request message: Controls whether to retransmit the request message until
either a reply is received or the server is assumed to have failed.
➢ Duplicate filtering: Controls when retransmissions are used and whether to filter out
duplicate requests at the server.

➢ Retransmission of results: Controls whether to keep a history of result messages to
enable lost results to be retransmitted without re-executing the operations at the
server.

Combinations of these choices lead to a variety of possible semantics for the reliability of
remote invocations as seen by the invoker. Note that for local procedure calls, the
semantics are exactly once, meaning that every procedure is executed exactly once
(except in the case of process failure). The choices of RPC invocation semantics are defined
as follows.

Maybe semantics: With maybe semantics, the remote procedure call may be executed
once or not at all. Maybe semantics arises when no fault-tolerance measures are applied
and can suffer from the following types of failure:

• omission failures if the request or result message is lost;

• crash failures when the server containing the remote operation fails.

If the result message has not been received after a timeout and there are no retries, it is
uncertain whether the procedure has been executed. If the request message was lost,
then the procedure will not have been executed. On the other hand, the procedure may
have been executed and the result message lost. A crash failure may occur either before
or after the procedure is executed. Moreover, in an asynchronous system, the result of
executing the procedure may arrive after the timeout. Maybe semantics is useful only for
applications in which occasional failed calls are acceptable.

At-least-once semantics: With at-least-once semantics, the invoker receives either a


result, in which case the invoker knows that the procedure was executed at least once, or
an exception informing it that no result was received. At-least-once semantics can be
achieved by the retransmission of request messages, which masks the omission failures
of the request or result message. At-least-once semantics can suffer from the following
types of failure:

• crash failures when the server containing the remote procedure fails;

• arbitrary failures – in cases when the request message is retransmitted, the remote
server may receive it and execute the procedure more than once, possibly causing wrong
values to be stored or returned.

At-most-once semantics: With at-most-once semantics, the caller receives either a result,
in which case the caller knows that the procedure was executed exactly once, or an
exception informing it that no result was received, in which case the procedure will have
been executed either once or not at all. At-most-once semantics can be achieved by using
all of the fault-tolerance measures (retransmission of requests, duplicate filtering and
retransmission or re-execution of results).

Remote Method Invocation:


Remote method invocation (RMI) is closely related to RPC but extended into the world of
distributed objects. In RMI, a calling object can invoke a method in a potentially remote
object. As with RPC, the underlying details are generally hidden from the user. The
commonalities between RMI and RPC are as follows:

• They both support programming with interfaces, with the resultant benefits that stem
from this approach.

• They are both typically constructed on top of request-reply protocols and can offer a
range of call semantics such as at-least-once and at-most-once.

• They both offer a similar level of transparency – that is, local and remote calls employ
the same syntax but remote interfaces typically expose the distributed nature of the
underlying call, for example by supporting remote exceptions.

Design issues for RMI

RMI shares the same design issues as RPC in terms of programming with interfaces, call
semantics and level of transparency. The key added design issue relates to the object
model and, in particular, achieving the transition from objects to distributed objects. We
first describe the conventional, single-image object model and then describe the
distributed object model.

The object model: An object-oriented program, for example in Java or C++, consists of a
collection of interacting objects, each of which consists of a set of data and a set of
methods. An object communicates with other objects by invoking their methods,
generally passing arguments and receiving results. Objects can encapsulate their data and
the code of their methods.

Object references: Objects can be accessed via object references. For example, in Java, a
variable that appears to hold an object actually holds a reference to that object. To invoke
a method in an object, the object reference and method name are given, together with
any necessary arguments. The object whose method is invoked is sometimes called the
target and sometimes the receiver. Object references are first-class values, meaning that
they may, for example, be assigned to variables, passed as arguments and returned as
results of methods.
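A short illustrative sketch of references as first-class values (the `Counter` and `RefDemo` names are invented for this example): the same reference is held in a variable, passed as an argument and returned as a result, and all of these names denote one target object.

```java
// A variable of type Counter holds a reference to a Counter object, not the object itself.
class Counter {
    private int n = 0;
    void increment() { n++; }
    int value() { return n; }
}

class RefDemo {
    // The reference is passed as an argument; inside the method, 'target' is the receiver.
    static Counter bump(Counter target) {
        target.increment();
        return target;               // references can also be returned as results
    }
}
```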

Interfaces: An interface provides a definition of the signatures of a set of methods (that
is, the types of their arguments, return values and exceptions) without specifying their
implementation. An object will provide a particular interface if its class contains code that
implements the methods of that interface. In Java, a class may implement several
interfaces, and the methods of an interface may be implemented by any class. An interface
also defines types that can be used to declare the type of variables or of the parameters
and return values of methods. Note that interfaces do not have constructors.

Actions: An action in an object-oriented program is initiated by an object invoking a
method in another object. An invocation can include additional information (arguments) needed
to carry out the method. The receiver executes the appropriate method and then returns
control to the invoking object, sometimes supplying a result.

Exceptions: Programs can encounter many sorts of errors and unexpected conditions of
varying seriousness. During the execution of a method, many different problems may be
discovered: for example, inconsistent values in the object’s variables, or failures in
attempts to read or write to files or network sockets. When programmers need to insert
tests in their code to deal with all possible unusual or erroneous cases, this detracts from
the clarity of the normal case. Exceptions provide a clean way to deal with error conditions
without complicating the code. In addition, each method heading explicitly lists as
exceptions the error conditions it might encounter, allowing users of the method to deal
with them. A block of code may be defined to throw an exception whenever particular
unexpected conditions or errors arise. This means that control passes to another block of
code that catches the exception. Control does not return to the place where the exception
was thrown.

Garbage collection: It is necessary to provide a means of freeing the space occupied by
objects when they are no longer needed. A language such as Java can detect automatically
when an object is no longer accessible; it then recovers the space and makes it available
for allocation to other objects. This process is called garbage collection. When a
language (for example, C++) does not support garbage collection, the programmer has to
cope with the freeing of space allocated to objects. This can be a major source of errors.

Implementation of RMI
Remote Method Invocation (RMI) enables an object in one process (client) to invoke
methods of an object in another process (server), as if it were local. Several components
work together to make this possible.


1. Communication Module

• Responsible for transferring request and reply messages between client and server.
• Implements the request-reply protocol.
• Contents of messages:
o Message type (request or reply)
o Request ID (unique ID for matching reply)
o Remote object reference (target object)
• Ensures the required invocation semantics (e.g., at-most-once).
• On the server side:
o Uses the request’s remote object reference to find the correct local object.
o Passes the request to the dispatcher.

2. Remote Reference Module

• Manages translation between local and remote object references.


• Maintains a Remote Object Table containing:
o Remote objects in the server.
o Proxies in the client.
• Responsibilities:
o Create remote object references when objects are shared.
o Translate remote references into local references (either a proxy on the client or
servant on the server).
o When unknown references arrive, it creates a new proxy and adds it to the table.
• Works closely with marshalling and unmarshalling during parameter passing.

3. Servants

• A servant is the actual implementation of a remote object.


• Located in the server process.
• Created when remote objects are instantiated.
• Handle incoming requests after skeletons forward them.
• Destroyed when no longer needed (garbage collected).

4. RMI Software (Middleware Layer)

This layer sits between the application objects and the communication modules.
(a) Proxy (Client Side)
• Acts as a local representative of the remote object.
• Responsibilities:
o Hides complexity of communication.
o Marshals arguments + operation ID + target reference → into request message.
o Sends message to server and waits for reply.
o Unmarshals results and returns them to the client.
• Each remote object has one proxy per client process.
• Makes invocation transparent (client sees it like a normal method call).

(b) Dispatcher (Server Side)


• Each class of remote object has one dispatcher.
• Responsibilities:
o Receives request messages from the communication module.
o Uses operation ID to find the correct method.
o Passes the request to the skeleton.

(c) Skeleton (Server Side)


• Each class of remote object has a skeleton.
• Responsibilities:
o Unmarshals arguments from request.
o Calls the correct method on the servant.
o Collects results/exceptions → marshals them → sends back to proxy.

5. Generation of Proxy, Dispatcher, Skeleton

• Typically generated automatically by an interface compiler.

• Examples:
o CORBA: Defined in IDL → compiler generates proxy/dispatcher/skeleton in
Java/C++.
o Java RMI:
▪ Remote methods defined in a Java interface.
▪ rmic compiler generates proxy, dispatcher, skeleton classes automatically.

6. Dynamic Invocation

• Problem: Sometimes the remote interface is not known at compile time.


• Dynamic Invocation Interface (DII):
o Client provides: remote reference + method name + arguments.
o Sends request using a generic doOperation() method.
o Waits for results.
• Less convenient than proxies, but useful when:
o Interfaces are not predictable at design time.
o Example: Shared whiteboard (can add new shapes at runtime).
• CORBA supports DII via Interface Repository.
• Java RMI solves this with dynamic class downloading.

7. Server and Client Programs

• Server program contains:


o Servant classes (implementation of remote objects).
o Dispatcher and skeletons.
o Initialization code (e.g., main()) to create servants and register them with binder.
• Client program contains:
o Proxy classes for remote objects.
o Uses binder to lookup remote references.

8. Factory Methods

• Remote objects cannot have constructors invoked remotely.


• Solution: Use factory methods (normal remote methods) that create and return new
remote objects.
• Example: A “factory object” can create servants for clients on demand.

Distributed Garbage Collection:
• In a distributed system, objects may live on one machine (server) but be referenced by
another machine (client).
• Normal garbage collection (like in Java) works locally, but in a distributed setup we need
to ensure that objects are not deleted while still being referenced remotely.
• Goal of DGC:
o If any client or local object still has a reference → object must stay alive.
o If no one holds a reference (neither local nor remote) → object can be garbage
collected.

How Java’s Distributed Garbage Collection Works

1. Tracking Holders

• Each server process keeps track of which clients are holding references to its remote
objects.
• For object B, the server maintains B.holders → a set of clients that have a proxy for B.
• This info is stored in the remote object table.

2. Adding a Reference

• When a client C first receives a remote reference to object B:


1. Client sends addRef(B) to the server.
2. Server adds C to B.holders.
3. Client creates a proxy for B to use locally.

3. Removing a Reference

• When client C’s garbage collector sees that proxy B is no longer needed:
1. Client sends removeRef(B) to the server.
2. Server removes C from B.holders.
3. Client deletes its proxy.

GMIT, Dept., AIML
