CHAPTER – 1
INTRODUCTION TO DISTRIBUTED SYSTEMS
• A distributed system is a network of independent
computers that presents itself as a single, coherent system
to users.
• Despite being composed of multiple autonomous
components, these systems collaborate to solve problems
that cannot be addressed by individual entities.
Key characteristics include:
• Autonomy: Each component operates independently.
• Coherence: Users perceive the system as a unified whole,
even though it consists of interconnected parts.
• The underlying idea is much older than computing: independent
entities have always needed to cooperate to achieve common goals,
and distributed systems apply that principle to computers.
Distributed systems exhibit several distinctive features:
• No Shared Memory: Communication between components
occurs through message passing rather than shared memory,
ensuring isolation and independence.
• Heterogeneity: Each component operates on its own local
operating system, allowing for a variety of hardware and software
environments to coexist.
• Multiple Independent Resources: From a hardware perspective,
distributed systems consist of several independent yet cooperating
resources.
• Single Unified System: From a software perspective, users
experience the system as a cohesive entity despite its distributed
nature.
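The "No Shared Memory" feature above can be illustrated with a minimal sketch. It assumes a single Python process in which a thread stands in for a remote component; the names `Component`, `inbox`, and `demo` are hypothetical and chosen only for this example. The component's counter is private, and the only way to change or read it is to send a message.

```python
import queue
import threading


class Component:
    """A component with private state; its inbox is the only way in."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()  # incoming messages
        self._counter = 0           # private state, never touched by others

    def run(self):
        # Handle messages one at a time until a "stop" message arrives.
        while True:
            kind, payload, reply_to = self.inbox.get()
            if kind == "stop":
                break
            if kind == "add":
                self._counter += payload
            elif kind == "get":
                reply_to.put(self._counter)  # the answer is a message too


def demo():
    counter = Component("counter")
    worker = threading.Thread(target=counter.run)
    worker.start()

    replies = queue.Queue()
    counter.inbox.put(("add", 5, None))
    counter.inbox.put(("add", 7, None))
    counter.inbox.put(("get", None, replies))
    total = replies.get(timeout=5)

    counter.inbox.put(("stop", None, None))
    worker.join()
    return total
```

Calling `demo()` returns 12. In a real distributed system the queue would be replaced by a network channel, but the discipline is the same: state is reached through messages, not through shared variables.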
Examples
Common applications of distributed systems include:
• Web Services: Encompassing various online applications.
• File Sharing: Facilitating the distribution and access of files
across networks.
• Scientific Computing: Leveraging multiple resources for
complex calculations.
• Peer-to-Peer Networks: Enabling direct communication and
resource sharing among users.
Why Distributed Systems?
Distributed systems are implemented for several key reasons:
• Single-System Image: They provide a unified interface by hiding
internal structures and communication complexities from users.
• Easily Expandable: Users can seamlessly add new computers without
disruption to the system.
• Continuous Availability: The system can maintain functionality despite
failures in individual components, as other components can take over.
• Middleware Support: Middleware serves as an intermediary layer,
facilitating communication and data management across different
components.
• Parallel Activities: Autonomous components can execute tasks
concurrently, enhancing efficiency.
• Message Passing Communication: The system relies on message
passing rather than shared memory, promoting independence among
components.
• Resource Sharing: Distributed systems enable the sharing of resources
such as printers, databases, and other services.
• Lack of Global State Knowledge: No single process possesses
knowledge of the entire system's current state, so components must
coordinate using only local information and the messages they exchange.
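The "Continuous Availability" point in the list above can be sketched as a simple failover loop. This is an illustration under stated assumptions, not a production design: `Replica`, `ReplicaDown`, and `call_with_failover` are hypothetical names, and a failure is simulated with a flag rather than detected over a network.

```python
class ReplicaDown(Exception):
    """Raised when a (simulated) replica cannot serve a request."""


class Replica:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive

    def handle(self, request):
        if not self.alive:
            raise ReplicaDown(self.name)
        return f"{self.name} served {request}"


def call_with_failover(replicas, request):
    """Try each replica in order and return the first successful reply."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except ReplicaDown:
            continue  # this component failed; let another one take over
    raise RuntimeError("all replicas are down")
```

With `[Replica("a", alive=False), Replica("b")]`, a request is transparently served by `b`; the caller only notices a problem when every replica has failed.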
Fig. 1.1: A distributed system organized as middleware.
The middleware layer extends over multiple machines, and offers
each application the same interface.
Middleware in Distributed Systems
• To accommodate heterogeneous computers and networks
while presenting a unified system view, distributed
systems utilize a layer of software known as middleware.
This middleware functions as an intermediary between:
• Higher-Level Layer: Comprising users and applications.
• Lower-Level Layer: Consisting of operating systems and
basic communication facilities.
Middleware in Distributed Systems
• Unified Interface: As Fig. 1.1 shows, middleware offers a
consistent interface for applications, facilitating
communication among components of a distributed
application and between different applications.
• Support for Distribution: For instance, an application B
can be distributed across multiple computers, such as
computers 2 and 3, while maintaining a cohesive
operational experience.
• Abstraction of Differences: Middleware abstracts the
underlying hardware and operating system differences,
allowing applications to operate without needing to
manage these variances directly.
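The "Unified Interface" and "Abstraction of Differences" points above can be sketched as follows. Everything here is hypothetical and simplified: two simulated nodes use different native message formats (JSON and a `key=value` text format), and a small `Middleware` class gives the application one `add` call that works on both.

```python
import json


class JsonNode:
    """Hypothetical node whose native message format is JSON text."""

    def receive(self, raw):
        msg = json.loads(raw)
        return json.dumps({"result": sum(msg["args"])})


class KeyValueNode:
    """Hypothetical node whose native format is 'key=value;key=value'."""

    def receive(self, raw):
        fields = dict(part.split("=") for part in raw.split(";"))
        total = sum(int(x) for x in fields["args"].split(","))
        return f"result={total}"


class JsonAdapter:
    """Translates the common call into the JSON node's format."""

    def __init__(self, node):
        self.node = node

    def add(self, numbers):
        reply = self.node.receive(json.dumps({"op": "add", "args": numbers}))
        return json.loads(reply)["result"]


class KeyValueAdapter:
    """Translates the common call into the key=value node's format."""

    def __init__(self, node):
        self.node = node

    def add(self, numbers):
        raw = "op=add;args=" + ",".join(str(n) for n in numbers)
        return int(self.node.receive(raw).split("=")[1])


class Middleware:
    """Offers every application the same interface on every machine."""

    def __init__(self):
        self._adapters = {}

    def register(self, name, adapter):
        self._adapters[name] = adapter

    def add(self, name, numbers):
        return self._adapters[name].add(numbers)


middleware = Middleware()
middleware.register("computer2", JsonAdapter(JsonNode()))
middleware.register("computer3", KeyValueAdapter(KeyValueNode()))
```

The application calls `middleware.add("computer2", [1, 2, 3])` and `middleware.add("computer3", [1, 2, 3])` in exactly the same way; the format differences stay inside the adapters.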
Role of Middleware in Distributed Systems
• The middleware is the distributed software that drives the
distributed system, while providing transparency of heterogeneity
at the platform level.
• In some early research systems, middleware tried to provide the
illusion that a collection of separate machines was a single computer.
– E.g., the NOW project's GLUNIX middleware
• Today:
– clustering software allows independent computers to work
together closely
– middleware also supports seamless access to remote services,
without trying to look like a general-purpose OS
Examples of Middleware
• CORBA (Common Object Request Broker Architecture)
• DCOM (Distributed Component Object Model), largely
superseded by .NET technologies
• Sun’s ONC RPC (Remote Procedure Call)
• RMI (Remote Method Invocation)
• SOAP (Simple Object Access Protocol)
Standards
• Middleware is often built on standards such as the Object
Management Group's (OMG) CORBA and the RPC
mechanism.
• RPC lets a program call a procedure in much the same way
as a local procedure, even though the procedure code may
reside on a different machine; a message is sent across the
network to invoke it, and the result is returned in a reply.
• This architecture simplifies communication and
interaction between distributed applications, enhancing
their functionality and usability.
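The RPC mechanism described above can be sketched with a hand-rolled client stub and server dispatcher. This is an illustration only and does not follow ONC RPC or CORBA wire formats: JSON is assumed as the message encoding, and `ClientStub`, `server_dispatch`, and `PROCEDURES` are hypothetical names. For simplicity the "network" is a direct function call that carries bytes from client to server.

```python
import json


def add(a, b):
    """The procedure that lives on the server side."""
    return a + b


PROCEDURES = {"add": add}


def server_dispatch(request_bytes):
    """Server side: unmarshal the request, run the procedure, marshal the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    procedure = PROCEDURES[request["procedure"]]
    result = procedure(*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


class ClientStub:
    """Client side: makes a remote procedure look like a local method."""

    def __init__(self, send):
        # `send` carries request bytes to the server and returns reply bytes.
        self._send = send

    def __getattr__(self, name):
        def remote_call(*args):
            request = json.dumps({"procedure": name, "args": list(args)})
            reply = self._send(request.encode("utf-8"))
            return json.loads(reply.decode("utf-8"))["result"]

        return remote_call


# Stand-in transport: hand the bytes straight to the server function.
remote = ClientStub(send=server_dispatch)
```

The caller writes `remote.add(2, 3)` and gets 5, just as with a local call; the marshalling into a message and the reply handling are hidden in the stub. Replacing `send` with a function that writes to a socket would move the procedure to another machine without changing the calling code.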
Goals of Distributed Computing Systems
• Distributed computing systems have evolved since the 1970s,
alongside the development of the Internet and ARPANET.
• Initially, the focus was on issues like remote data access, file
system design, and directory structures.
• As technology has advanced, new challenges have emerged,
particularly with the rise of high-speed internet and distributed
applications.
Goals of Distributed Computing Systems
• Resource Accessibility: A primary goal is to ensure that resources
are easily accessible to users, regardless of their physical location.
• Transparency: The system should effectively hide the
complexities of resource distribution across the network, providing
a seamless experience for users.
• Openness: The architecture should be open, allowing for
integration with other systems and flexibility in adding new
components.
• Scalability: The system must be scalable, meaning it can efficiently
handle increased loads and accommodate growth without
significant reconfiguration.
Types of Distributed Systems
• Distributed Computing Systems
– Clusters
– Grids
– Clouds
• Distributed Information Systems
– Transaction Processing Systems
– Enterprise Application Integration
• Distributed Embedded/Pervasive Systems
– Home systems
– Sensor networks
Distributed Computing Systems
Cluster Computing
• A collection of similar processors (PCs, workstations) running the same
operating system, connected by a high-speed LAN.
• Provides parallel computing capabilities using inexpensive PC hardware
• Can replace large massively parallel processors (MPPs)
Cluster Types & Uses
• High Performance Clusters (HPC)
– run large parallel programs
– Scientific, military, engineering apps; e.g., weather modeling
• Load Balancing Clusters
– Front end processor distributes incoming requests
– Server farms (e.g., at banks or popular web sites)
• High Availability Clusters (HA)
– Provide redundancy through backup systems
– May be more fault tolerant than large mainframes
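The front-end processor of a load balancing cluster, mentioned in the list above, can be sketched with a round-robin policy. Round-robin is only one possible policy and is assumed here for simplicity; `FrontEnd` and the server names are hypothetical.

```python
import itertools


class FrontEnd:
    """Distributes incoming requests over back-end servers in turn."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def dispatch(self, request):
        # Pick the next server in rotation and pair it with the request.
        server = next(self._cycle)
        return server, request
```

With servers `["s1", "s2", "s3"]`, five consecutive requests go to s1, s2, s3, s1, s2. A real front end would also track server health and load, which this sketch leaves out.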
Grid Computing Systems
• Modeled loosely on the electrical grid.
• Highly heterogeneous with respect to hardware, software, networks,
security policies, etc.
• Grids support virtual organizations: a collaboration of users who pool
resources (servers, storage, databases) and share them.
• Grid software is concerned with managing sharing across administrative
domains.
• Similar to clusters but processors are more loosely coupled, tend to be
heterogeneous, and are not all in a central location.
• Can handle workloads similar to those on supercomputers, but grid
computers connect over a network (typically the Internet), whereas
supercomputers' CPUs connect to a high-speed internal bus/network.
• Problems are broken up into parts and distributed across multiple
computers in the grid; this works best when the parts need little
communication with one another.
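The last point, breaking a problem into parts that need little communication, can be sketched as a split, compute, and combine pattern. This is a local stand-in under stated assumptions: a thread pool plays the role of the grid's computers, and `split`, `sum_of_squares`, and `grid_sum_of_squares` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Cut data into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """The work done independently on one part of the problem."""
    return sum(x * x for x in chunk)


def grid_sum_of_squares(data, workers=4):
    chunks = split(data, workers)
    # Each chunk is processed without talking to the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    # The only communication is collecting one number per chunk.
    return sum(partials)
```

For `list(range(10))` with three workers, the chunks have 4, 3, and 3 elements and the combined result is 285. Problems whose parts must exchange data constantly fit this pattern poorly, which is why loosely coupled grids favor independent parts.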
Cloud Computing
• Provides scalable services as a utility over the Internet.
• Often built on a computer grid
• Users buy services from the cloud
– By contrast, grid users may develop and run their own software
Distributed Information Systems
• Business-oriented
• Systems to make a number of separate network applications
interoperable and build “enterprise-wide information systems”.
• Two types discussed here:
– Transaction processing systems
– Enterprise application integration (EAI)
Transaction Processing Systems
• Provide a highly structured client-server approach for database
applications
• Transactions are the communication model
• Obey the ACID properties:
– Atomic: all or nothing
– Consistent: invariants are preserved
– Isolated (serializable): concurrent transactions do not interfere
– Durable: once a transaction commits, its changes are permanent
Example primitives for transactions
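As a rough illustration of what such primitives might look like, the sketch below implements begin, read, write, commit, and abort on an in-memory store. The primitive names and the `TransactionalStore` and `transfer` helpers are assumptions made for this example, not taken from a specific system. The sketch demonstrates atomicity (all or nothing) and the preservation of an invariant; it does not address isolation between concurrent transactions or durability across crashes.

```python
class TransactionalStore:
    """In-memory key-value store with one open transaction at a time."""

    def __init__(self, data=None):
        self._committed = dict(data or {})
        self._working = None  # None means no transaction is open

    def begin(self):
        if self._working is not None:
            raise RuntimeError("a transaction is already open")
        self._working = dict(self._committed)  # private working copy

    def read(self, key):
        source = self._working if self._working is not None else self._committed
        return source[key]

    def write(self, key, value):
        if self._working is None:
            raise RuntimeError("no open transaction")
        self._working[key] = value

    def commit(self):
        if self._working is None:
            raise RuntimeError("no open transaction")
        self._committed = self._working  # all writes become visible at once
        self._working = None

    def abort(self):
        self._working = None  # discard every write of the transaction


def transfer(store, src, dst, amount):
    """Move `amount` from src to dst; either both writes happen or neither."""
    store.begin()
    try:
        store.write(src, store.read(src) - amount)
        if store.read(src) < 0:
            raise ValueError("insufficient funds")  # invariant: no negative balance
        store.write(dst, store.read(dst) + amount)
        store.commit()
        return True
    except ValueError:
        store.abort()
        return False
```

Starting from `{"a": 100, "b": 0}`, `transfer(store, "a", "b", 30)` commits and leaves 70 and 30, while `transfer(store, "a", "b", 500)` aborts and leaves both balances unchanged.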
Transactions
• Transaction processing may be centralized (traditional client/server
system) or distributed.
• A distributed database is one in which the data storage is
distributed across separate processors.
• A nested transaction is a transaction within another transaction (a
sub-transaction)
– Example: a transaction may ask for two things (e.g., airline
reservation info + hotel info) which would spawn two nested
transactions
• The primary (parent) transaction waits for the results.
– While children are active, the parent may only abort, commit, or
spawn other children
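The nested-transaction example above (airline and hotel reservations) can be sketched as a parent that spawns children and waits for their results. The policy assumed here, that the parent aborts as soon as any child fails, is a simplification chosen for illustration; `run_parent` and the child names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parent(children):
    """Spawn one sub-transaction per entry and wait for all results.

    `children` maps a name to a callable that returns a result or raises.
    """
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(work) for name, work in children.items()}
        results = {}
        for name, future in futures.items():
            try:
                results[name] = future.result()  # the parent waits here
            except Exception as error:
                # A child failed: the parent aborts and discards all results.
                return {"status": "aborted", "failed": name, "reason": str(error)}
    return {"status": "committed", "results": results}
```

If both the flight and hotel children succeed, the parent commits with both results; if the hotel child raises an error, the parent reports an abort and the flight result is not kept.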
Enterprise Application Integration
• Less structured than transaction-based systems
• Enterprise application (EA) components communicate directly
– Enterprise applications are things like HR data, inventory
programs
– May use different OSs, different DBs but need to interoperate
sometimes.
• Communication mechanisms to support this include Common
Object Request Broker Architecture (CORBA), Remote Procedure
Call (RPC), and Remote Method Invocation (RMI)
Distributed Pervasive Systems
• The first two types of systems are characterized by their
stability: nodes and network connections are more or less
fixed
• This type of system is likely to incorporate small,
battery-powered, mobile devices
– Home systems
– Electronic health care systems – patient monitoring
– Sensor networks – data collection, surveillance