
Introduction To Distributed Systems

Uploaded by

Manasa H

CITY ENGINEERING COLLEGE

BENGALURU

MANASA M
Assistant Professor
DEPARTMENT OF INFORMATION SCIENCE
AND ENGINEERING
DISTRIBUTED SYSTEMS
SEMESTER : 5
Course code: BCS515D
CIE MARKS : 50 (20 OUT OF 50)
SEE MARKS: 50 (18 OUT OF 50)
TOTAL MARKS : 100 (40 OUT OF 100)
CREDITS : 03
TOTAL HOURS OF PEDAGOGY : 3
EXAMINATION TYPE(SEE) : THEORY
Textbook:
George Coulouris, Jean Dollimore and Tim Kindberg, “Distributed Systems Concepts and
Design”, Fifth Edition, Pearson Education, 2012.

Web links and Video Lectures (e-Resources):


[Link]

Continuous Internal Evaluation:


• For the Assignment component of the CIE, there are 25 marks and for the Internal
Assessment Test component, there are 25 marks.
• The first test will be administered after 40-50% of the syllabus has been covered, and
the second test will be administered after 85-90% of the syllabus has been covered.
• Any two of the assignment methods mentioned in the 220B2.4 may be used. If an assignment is
project-based, then only one assignment shall be planned for the course. If two assignments are
planned, the teacher should not conduct both of them at the end of the semester.
• For the course, CIE marks will be based on a scaled-down sum of two tests and other
methods of assessment.
• Semester-End Examination:
• Theory SEE will be conducted by University as per the scheduled
timetable, with common question papers for the course (duration 03
hours).
• The question paper will have ten questions. Each question is set for 20
marks.
• There will be 2 questions from each module. Each of the two questions
under a module (with a maximum of 3 sub-questions), should have a
mix of topics under that module.
• The students have to answer 5 full questions, selecting one full question
from each module.
• Marks scored shall be proportionally reduced to 50 marks.
The Internal Assessment Test question paper is designed to cover the
different levels of Bloom’s taxonomy, as per the outcomes defined for the
course.

Level | Cognitive Process | Example Action Verbs
1 | Remember | define, list, memorize, recall, identify
2 | Understand | explain, describe, classify, summarize
3 | Apply | solve, use, implement, demonstrate
4 | Analyze | compare, contrast, differentiate, test
5 | Evaluate | critique, defend, support, judge
6 | Create | design, construct, develop, formulate
Course outcome (Course Skill Set)
At the end of the course, the student will be able to:
CO1: Identify the goals and challenges of distributed systems
CO2: Demonstrate the remote invocation techniques for communication
CO3: Describe the architecture of distributed file systems and name
services
CO4: Apply clock synchronization algorithms to monitor and order the
events
CO5: Analyze the performance of mutual exclusion, election and
consensus algorithms
CO6: Illustrate the fundamental concepts and algorithms related to
distributed transactions and replication
Syllabus
Module-1
• CHARACTERIZATION OF DISTRIBUTED SYSTEMS:
Introduction, Focus on resource sharing, Challenges.
• REMOTE INVOCATION:
Introduction, Request-reply protocols, Remote procedure call, Introduction to Remote Method
Invocation.
Textbook: Chapter - 1.1, 1.4, 1.5, 5.1–5.5

Module-2
• DISTRIBUTED FILE SYSTEMS:
Introduction, File service architecture.
• NAME SERVICES:
Introduction, Name services and the Domain Name System, Directory services.
Textbook: Chapter - 12.1, 12.2, 13.1–13.3
Module-3
• TIME AND GLOBAL STATES:
Introduction, Clocks, events and process states, Synchronizing Physical clocks, Logical time and logical
clocks, Global states.
Textbook: Chapter – 14.1 - 14.5

Module-4
• COORDINATION AND AGREEMENT:
Introduction, Distributed mutual exclusion, Elections, Coordination and agreement in group
communication, Consensus and related problems.
Textbook: Chapter - 15.1–15.5

Module-5
• DISTRIBUTED TRANSACTIONS:
Introduction, Flat and nested distributed transactions, Atomic commit protocols, Concurrency control in
distributed transactions, Distributed deadlocks, Transaction recovery.
• REPLICATION:
Introduction.
Introduction to Distributed Systems
• A distributed system is a collection of independent computers (called nodes or
processes) that appears to its users as a single coherent system.
• These computers communicate and coordinate with each other by passing messages over
a network.
• A DS is a collection of autonomous computers linked by a computer network and equipped
with distributed system software.
• Instead of doing all the work on one computer, the work is divided among multiple
computers, which:
    • Share resources
    • Coordinate tasks
    • Communicate via a network (like the Internet)
• Distributed system software lets the computers share the resources of the system:
hardware, software, and data.
Feature / Focus | Distributed Systems
Main Focus | High-level design & communication in DS
Key Topics | Consistency, Fault Tolerance, Scalability
Layer of Study | Application and middleware layers
Example Systems | Google Cloud, Apache Kafka
Academic Treatment | Networking, distributed algorithms


Key Features of Distributed Systems:
• Multiple Autonomous Nodes
Each computer (node) operates independently and has its own local memory and
processor.
• Communication via Messages
Nodes interact through network-based communication — usually using message passing.
• No Shared Memory
Unlike parallel systems, distributed systems do not share memory; everything is done over the
network.
• Transparency
Distributed systems aim to hide the complexity of the underlying system from users (e.g.,
location transparency, access transparency).
• Fault Tolerance
The system continues to work even if one or more nodes fail.
• Scalability
It can easily grow by adding more nodes without much change to the architecture.
• Examples of Distributed Systems:
• 🌐 The Internet
• 📧 Email systems
• Cloud computing platforms (like AWS, Google Cloud)
• 📱 Distributed apps like WhatsApp or Google Docs
• Distributed databases (like Cassandra, MongoDB in cluster mode)

Relation to Computer system components


Hardware Components: Computers/nodes, processors, storage devices, network devices.
Operating System: Manages the hardware resources on each node.
Middleware: A software layer between the OS and the application.
Role in DS:
• Provides a uniform interface to hide distribution details.
• Manages communication, security, naming, synchronization, and more.
• Examples: CORBA, Java RMI, Apache Kafka.
Network: Connects all the components and enables message passing and resource sharing.
Application Layer: The services users interact with; runs on top of the middleware.
Characteristics of DS (Challenges)
• Concurrency: Concurrent program execution is the norm; many programs share
resources such as web pages or files at the same time. The capacity of the
system to handle shared resources can be increased by adding more resources.
• No Global Clock: Different programs or computers need to work together.
To coordinate actions, they often rely on time. There is no single, accurate
global clock that all computers in a DS can follow perfectly.
• Independent Failure: Components can fail separately in a DS.
Types of Failures
1) Network Failure
2) Computer Crash
3) Slow Communication
No. | Characteristic | Description | Simple Example
1 | Resource Sharing | Components (computers, printers, databases) share resources across the system. | Multiple computers access a central database or shared printer.
2 | Concurrency | Multiple processes run at the same time on different machines. | 1000 users book tickets on IRCTC at the same time.
3 | Scalability | System can grow easily (add more machines) without breaking down. | Amazon handles millions of users during sales by adding servers.
4 | Fault Tolerance | If one part fails, the system still works correctly. | If one Google server fails, another takes over without you noticing.
5 | Transparency | The user doesn’t know how many machines there are or where the services are located. | You search on Google; it looks simple, but many servers work behind the scenes.
6 | Openness | System follows standard protocols and can integrate with other systems easily. | Your Android phone connects to a smart TV or cloud storage.
7 | Heterogeneity | Different types of hardware, OS, and networks can be used together. | Windows laptop, Android phone, and MacBook all access the same cloud drive.
8 | Security | Authentication, authorization, and encryption are used to protect data and access. | Online banking uses OTP and encryption for secure access.
9 | Latency | Time taken for data to travel between nodes; affects system performance. | Watching YouTube may buffer if the server is far away or the network is slow.
10 | Location Independence | Users don’t need to know where a resource (file, service) is stored. | You open Google Drive; it doesn’t show which data center stores your file.
Focus on Resource Sharing
• In distributed systems, multiple users or applications share hardware or software resources over a
network.
• There are two levels of sharing:
Low-level sharing: Sharing of hardware (like printers, disks) to save cost.
High-level sharing: Sharing of software services or data, which is more important to users.

•Technically, client and server refer to processes (programs), not computers.


•Client: A program that makes requests (like a browser).
•Server: A program that responds to those requests (like a web server).
•In this approach, requests are sent in messages from clients to a server and replies are sent in
messages from the server to the clients. When the client sends a request for an operation to
be carried out, we say that the client invokes an operation upon the server.
•Remote Invocation: A complete interaction between client and server, from the point when
the client sends its request to the point when it receives the server's reply (e.g., your
browser fetching a webpage from a web server).
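The request-reply pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the message format (one JSON object per line), the operation names "read" and "write", and the in-memory FILES store are all hypothetical, and a local socket pair stands in for a real network connection.

```python
import json
import socket
import threading

# Hypothetical resource managed by the server process: a tiny in-memory file store.
FILES = {"notes.txt": "hello"}

def handle(request):
    """Carry out one operation on the resource and build the reply message."""
    op, args = request["op"], request["args"]
    if op == "read":
        name = args[0]
        if name in FILES:
            return {"ok": True, "result": FILES[name]}
        return {"ok": False, "error": "no such file"}
    if op == "write":
        name, data = args
        FILES[name] = data
        return {"ok": True, "result": len(data)}
    return {"ok": False, "error": "unknown operation"}

def serve(conn):
    """Server process: read one JSON request per line, send one JSON reply per line."""
    with conn, conn.makefile("r", encoding="utf-8") as reader, \
            conn.makefile("w", encoding="utf-8") as writer:
        for line in reader:  # the loop ends when the client closes the connection
            writer.write(json.dumps(handle(json.loads(line))) + "\n")
            writer.flush()

class Client:
    """Client process: each invoke() is one complete request-reply exchange."""

    def __init__(self, conn):
        self._conn = conn
        self._reader = conn.makefile("r", encoding="utf-8")
        self._writer = conn.makefile("w", encoding="utf-8")

    def invoke(self, op, *args):
        self._writer.write(json.dumps({"op": op, "args": list(args)}) + "\n")
        self._writer.flush()
        return json.loads(self._reader.readline())

    def close(self):
        self._reader.close()
        self._writer.close()
        self._conn.close()

def demo():
    client_sock, server_sock = socket.socketpair()
    server = threading.Thread(target=serve, args=(server_sock,))
    server.start()
    client = Client(client_sock)
    replies = [
        client.invoke("write", "a.txt", "abc"),
        client.invoke("read", "a.txt"),
        client.invoke("read", "missing.txt"),
    ]
    client.close()
    server.join()
    return replies
```

Calling demo() returns the three reply messages in order. The client never touches FILES directly; it only sends requests and receives replies, which is the sense in which it "invokes an operation upon the server".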
Different Patterns of sharing
Type of Sharing | Example | Details
Global Sharing | Search engine | Millions of users use it independently; no coordination needed.
Computer Supported Cooperative Working (CSCW) | Team editing shared document | Users work together, need coordination (e.g., Google Docs).

A service is a software component that manages a resource and provides operations (APIs) to use it.
Service | Operations
File Service | Read, write, delete files
Printing Service | Send document, check queue
Payment Service | Pay, refund, check balance
•In distributed systems, resources are hidden ("encapsulated") inside the computer that owns them.
•To access a resource, you must go through a communication interface — a structured way to
send/receive messages.
•This ensures reliable and consistent access.
In object-oriented distributed systems:
•Resources are modeled as objects.
•A client object sends a message (invokes a method) on a server object.
•This keeps the system modular and organized.
Heterogeneity: Challenges
Heterogeneity refers to variety and differences in the components that make up distributed systems.
Applies to:
•Networks
•Hardware
•Operating systems
•Programming languages
•Implementations by different developers

Network Heterogeneity
•The internet includes different types of networks (e.g., Ethernet, wireless, etc.).
•Despite differences, Internet protocols (like TCP/IP) allow communication across all networks.
•Each computer uses an appropriate implementation of Internet protocols based on its network type.
Hardware Heterogeneity
•Different computers may represent data types differently (e.g., byte order in integers).
•These differences must be resolved when programs communicate across hardware types.
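The byte-order difference mentioned above can be made concrete with Python's standard struct module. This is a small sketch; the value 0x12345678 is an arbitrary example, and "!" selects network (big-endian) byte order as the agreed wire format.

```python
import struct

value = 0x12345678

big = struct.pack(">I", value)     # big-endian: most significant byte first
little = struct.pack("<I", value)  # little-endian: least significant byte first

# A receiver that assumes the wrong byte order reads a different number.
misread = struct.unpack("<I", big)[0]

# If both sides agree on one wire order (here network byte order, "!"),
# the value survives the trip regardless of each machine's native order.
wire = struct.pack("!I", value)
decoded = struct.unpack("!I", wire)[0]

print(big.hex(), little.hex(), hex(misread), hex(decoded))
```

The same four bytes are interpreted as 0x12345678 or 0x78563412 depending on the assumed order, which is why communicating programs must convert to an agreed representation.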
Operating System Heterogeneity
•All OSes must implement Internet protocols, but:
• They may offer different APIs (e.g., UNIX vs Windows).
• This affects how programs exchange messages.
Programming Language Heterogeneity
• Programming languages use different data representations (characters, arrays, structures).
• Programs in different languages can’t directly communicate unless translation or standardization
is applied.
•Programs from different developers must follow common standards to communicate.
Examples of standards:
•Internet protocols
•Data formats
•Interface definitions
Middleware
•Middleware is a software layer that:
•Hides heterogeneity in networks, OS, hardware, and languages.
•Provides a programming abstraction for developers.
Example middleware:
•CORBA (Common Object Request Broker Architecture)
•Java RMI (Remote Method Invocation), which supports only a single programming language (Java).
•Most middleware is implemented over the Internet protocols.
• Middleware provides a uniform computational model for programmers of servers and distributed applications.
Middleware supports:
• Remote object invocation
• Remote event notification
• Remote SQL access
• Distributed transaction processing
Example: CORBA allows objects on different machines to invoke each other’s methods
seamlessly.
Heterogeneity and Mobile Code
•Mobile code: code that can be moved from one computer to another and run at the destination.
•Example: Java applets
•Problem: Executable code is often platform-specific.
Virtual Machine
Compilers generate code for a virtual machine, not for specific hardware.
•Example:
• Java compiler → produces bytecode for the Java Virtual Machine (JVM)
• JVM must be implemented on each type of system.
•JavaScript in web pages is a modern and common example of mobile code.
•It runs inside browsers on different devices, enabling dynamic content.
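The virtual machine idea can be illustrated with Python itself, used here only as an analogy to the JVM: source text is compiled to bytecode for the Python virtual machine and then executed by whatever machine receives it. The payload string and the function name area are hypothetical, and executing received code like this is unsafe without the protections discussed under the security of mobile code.

```python
# Sender side: the mobile code travels as plain text (hypothetical payload).
payload = "def area(w, h):\n    return w * h\n"

def run_mobile_code(source, func_name, *args):
    """Receiver side: compile to bytecode for the local VM, then run it."""
    code = compile(source, "<mobile>", "exec")  # bytecode, not machine code
    namespace = {}
    exec(code, namespace)  # unsafe for untrusted code; shown for illustration only
    return namespace[func_name](*args)

result = run_mobile_code(payload, "area", 3, 4)
print(result)
```

The payload contains nothing specific to the receiver's hardware; only the virtual machine that executes the bytecode has to be implemented for each platform.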
Openness
• Openness refers to how easily a system can be extended or reimplemented.
• In distributed systems, this means new resource-sharing services can be added and used by various
client programs.
• Openness depends on publishing and documenting key software interfaces.
• These published interfaces allow different developers to understand how to interact with or extend
the system.
• While similar to standardization, publishing doesn't always follow formal (slow) standards processes.
•Distributed systems can be complex due to components developed by different people or organizations
•Designers must manage this complexity to ensure the system remains extensible and maintainable.
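•Internet protocols are described in RFCs (Requests for Comments) (e.g., for file transfer, email, telnet).
•RFCs are public documents that help developers build interoperable Internet systems. The series
began in 1969, and the specifications of the Internet communication protocols were published in it
in the early 1980s.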
•RFCs include both specifications and discussions and are available at [Link].
•Another example of publishing standards is W3C which develops standards for Web technologies
(e.g., HTML, CSS).
•Website: [Link]
•Open Distributed systems can be extended in two main ways:
•Hardware level: Adding more computers to the network.
•Software level: Adding new services or improving existing ones.
•Goal: Allow applications to share resources seamlessly.
•Open systems promote independence from specific vendors.
•Developers can choose components from different sources as long as they follow the published
interfaces.

A further benefit that is often cited for open systems is their independence from individual vendors.
To summarize:
• Open systems are characterized by the fact that their key interfaces are published.
• Open distributed systems are based on the provision of a uniform communication mechanism and
published interfaces for access to shared resources.
• Open distributed systems can be constructed from heterogeneous hardware and software, possibly
from different vendors. But the conformance of each component to the published standard must be
carefully tested and verified if the system is to work correctly.
Security in Distributed Systems
•In distributed systems, valuable data (like hospital records, banking info, etc.) is shared across multiple
computers.
•This makes security very important to protect against:
•Unauthorized access
•Tampering
•Disruption of services
Three Key Components of Security
• Confidentiality
→ Prevent unauthorized people from seeing the data.
• Integrity
→ Prevent unauthorized modifications to data.
• Availability
→ Ensure that authorized users can access the data when needed.
Security Risks in Internet and Intranets
• The Internet connects programs on different computers freely, which also creates security risks.
• Even within an intranet (private network), internal users can misuse data.
• Firewalls help by limiting outside access, but:
• They don’t control internal misuse.
• They don’t protect external systems that lack firewalls.
• Two Major Security Challenges
• 1. Secure Communication
• Ensuring data is safely sent over the network.
• Requires:
• Encryption (to hide data from attackers)
• Authentication (to verify identity of sender/receiver)
• 2. Correct Identification
• Systems must:
• Verify that the user is who they claim to be (e.g., doctor, customer).
• Ensure users are interacting with the real system (e.g., real bank website).
• Encryption and digital certificates are used to achieve this.
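Message authentication, one part of secure communication, can be sketched with Python's standard hmac module. This is a minimal illustration, assuming sender and receiver already share a secret key (SHARED_KEY is a made-up demo value). It protects integrity and authenticates the sender, but it does not hide the message contents; confidentiality still needs encryption.

```python
import hashlib
import hmac

# Hypothetical shared secret; real keys must never be hard-coded like this.
SHARED_KEY = b"demo-key-known-to-both-sides"

def sign(message: bytes) -> bytes:
    """Sender: compute an authentication tag over the message."""
    return hmac.new(SHARED_KEY, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes) -> bool:
    """Receiver: recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(message), tag)

msg = b"transfer 100 to account 42"
tag = sign(msg)

print(verify(msg, tag))                            # genuine message
print(verify(b"transfer 900 to account 42", tag))  # tampered message
```

A message altered in transit no longer matches its tag, and an attacker without the key cannot produce a valid tag for a forged message.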
• Unresolved Security Issues
1. Denial of Service (DoS) Attacks
• Attackers flood a service with too many requests.
• Genuine users can’t access the service.
• No perfect solution yet:
• Current actions: Trace and punish attackers (after the attack).
• Future focus: Improve network management to prevent such attacks.
2. Security of Mobile Code
• Mobile code = Programs sent across networks (e.g., email attachments, web
applets).
• Risks:
• Code might appear safe but could steal data or crash systems.
• Example: An email attachment that looks like a photo viewer may secretly steal
files.
• Needs special security measures.
Scalability
• A scalable distributed system continues to function efficiently even as the number of users and resources grows — from small
local networks (intranet) to the massive global Internet.
Key Challenges in Designing Scalable Distributed Systems
• Controlling Cost of Physical Resources
• When more users join, more servers or hardware might be needed.
• The quantity of resources required should grow at most in proportion to the number of users n, that is, it should be O(n).
• Example: If 1 file server can serve 20 users, then 2 servers should serve 40 users.
• Controlling Performance Loss
• More users and data = harder to maintain speed.
• Using hierarchical structures (like a tree) is better than linear ones (like a list).
• Lookup time in a hierarchical structure grows as O(log n) with the size n of the data, versus O(n) for a linear one, so the performance loss stays manageable.
• Preventing Software Limitations
• IP Address Limit Example: Internet originally used 32-bit IP addresses, but those started running out.
• A newer system (IPv6) with 128-bit addresses is being adopted, but it requires updating a lot of software.
• Avoiding Performance Bottlenecks
• Example: The old Internet name system had one big file with all domain names – worked for few computers, but not for millions.
• The Domain Name System (DNS) solved this by dividing the data across many local servers.
• Frequent Resource Access
• Some web pages or files are accessed very often.
• Techniques like:
• Caching: Store frequently used data closer to the user.
• Replication: Copy data to multiple servers.
• These help improve performance and reduce server overload.
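The effect of caching on server load can be sketched as follows. The class name CachingClient and the fetch function are hypothetical, and the sketch deliberately ignores cache invalidation, so it would serve stale data if the origin changed.

```python
class CachingClient:
    """Keeps copies of fetched data so repeated requests skip the origin server."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # function that contacts the (slow) origin
        self._cache = {}
        self.origin_requests = 0         # how often the origin was actually contacted

    def get(self, key):
        if key not in self._cache:       # cache miss: go to the origin once
            self.origin_requests += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]          # cache hit: served locally

def slow_origin(page):
    # Stand-in for a remote web server (assumption for this example).
    return "<html>" + page + "</html>"

client = CachingClient(slow_origin)
pages = [client.get("index"), client.get("index"), client.get("about")]
print(client.origin_requests)
```

Three requests reach the client, but only two reach the origin, because the second request for "index" is answered from the local copy.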
Failure Handling
• Faults can occur in hardware or software.
• Programs may produce incorrect results or may stop before they have completed the intended computation.
• Failures in a DS are partial: some components fail while others continue.
• This makes handling failures in a DS particularly difficult.
Techniques for dealing with failures in a DS:
1. Detecting Failures:
Some failures can be detected. For example, checksums can be used to detect
corrupted data in a message or a file.
Other failures, such as a remote crashed server in the Internet, are hard or
even impossible to detect.
The challenge is to manage in the presence of failures that cannot be
detected but may be suspected.
2. Masking Failures: Some detected failures can be hidden or made less severe.
Examples
Messages can be retransmitted when they fail to arrive.
File data can be written to a pair of disks so that if one is corrupted, the other
may still be correct.
3. Tolerating Failures:
Example
When a web browser cannot contact a web server, it does not make the user
wait forever while it keeps on trying; it informs the user about the problem.
4. Recovery from failures:
Recovery involves the design of software so that the state of permanent data
can be recovered or ‘rolled back’ after a server has crashed.
5. Redundancy: Services can be made to tolerate failures by the use of
redundant components.
Consider the following examples:
1. There should always be at least two different routes between any two
routers in the Internet.
2. In the Domain Name System, every name table is replicated in at least
two different servers.
3. A database may be replicated in several servers to ensure that the
data remains accessible after the failure of any single server; the servers
can be designed to detect faults in their peers; when a fault is detected
in one server, clients are redirected to the remaining servers.
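Detection, masking and tolerance can be combined in one small sketch: a checksum detects corrupted messages, retransmission masks lost or corrupted ones, and a retry limit stops the sender from trying forever. The frame layout (4-byte CRC32 followed by the payload), the simulated channel, and its loss and corruption probabilities are all assumptions made for this example.

```python
import random
import zlib

def make_frame(payload: bytes) -> bytes:
    """Prefix the payload with its CRC32 checksum."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload

def check_frame(frame: bytes):
    """Return the payload if the checksum matches, otherwise None."""
    expected = int.from_bytes(frame[:4], "big")
    payload = frame[4:]
    return payload if zlib.crc32(payload) == expected else None

def unreliable_channel(frame, rng, loss=0.3, corrupt=0.3):
    """Simulated network: may drop the frame or damage its last byte."""
    r = rng.random()
    if r < loss:
        return None                      # message lost
    if r < loss + corrupt:
        damaged = bytearray(frame)
        damaged[-1] ^= 0xFF              # flip every bit of the last byte
        return bytes(damaged)
    return frame

def send_with_retry(payload, rng, max_attempts=10):
    """Retransmit until an intact frame arrives, or give up after max_attempts."""
    frame = make_frame(payload)
    for attempt in range(1, max_attempts + 1):
        received = unreliable_channel(frame, rng)
        if received is not None:
            data = check_frame(received)  # detect corruption
            if data is not None:
                return data, attempt
    raise TimeoutError("gave up after %d attempts" % max_attempts)

data, attempts = send_with_retry(b"hello", random.Random(0), max_attempts=50)
print(data, attempts)
```

The caller either gets the original payload back or a TimeoutError, so a failure is reported instead of being silently ignored or retried without end.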
Concurrency
• Assume that each resource is encapsulated as an object and that invocations are executed in
concurrent threads. In this case it is possible that several threads may be
executing concurrently within an object, in which case their operations on the
object may conflict with one another and produce inconsistent results.
• Example: if two concurrent bids at an auction are ‘Smith: $122’ and ‘Jones: $111’,
and the corresponding operations are interleaved without any control, then they
might get stored as ‘Smith: $111’ and ‘Jones: $122’.
• The moral of this story is that any object that represents a shared resource in a
distributed system must be responsible for ensuring that it operates correctly in a
concurrent environment.
• This applies not only to servers but also to objects in applications.
• For an object to be safe in a concurrent environment, its operations must be
synchronized in such a way that its data remains consistent.
• This can be achieved by standard techniques such as semaphores, which are used in most
operating systems.
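The auction example can be sketched with a binary semaphore guarding the shared object. The Auction class and its bid method are hypothetical names; the point is only that the check and the update of the bid record happen as one uninterrupted step.

```python
import threading

class Auction:
    """Shared resource whose operations are synchronized with a semaphore."""

    def __init__(self):
        self._mutex = threading.Semaphore(1)  # binary semaphore: one thread at a time
        self.best_bidder = None
        self.best_amount = 0

    def bid(self, bidder, amount):
        with self._mutex:                     # acquire; released automatically on exit
            if amount > self.best_amount:
                # Name and amount are updated together, so they cannot be
                # interleaved with another thread's update.
                self.best_bidder = bidder
                self.best_amount = amount
                return True
            return False

def demo():
    auction = Auction()
    bids = [("Smith", 122), ("Jones", 111)]
    threads = [threading.Thread(target=auction.bid, args=b) for b in bids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return auction.best_bidder, auction.best_amount

print(demo())
```

Whichever thread runs first, the stored record is always the consistent pair (Smith, 122) and never a mix of one bidder's name with the other's amount.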
Transparency
• The ANSA Reference Manual [ANSA 1989] and the International Organization for Standardization’s
Reference Model for Open Distributed Processing (RM-ODP) [ISO 1992] identify eight forms of
transparency:
• Access transparency enables local and remote resources to be accessed using identical operations.
• Location transparency enables resources to be accessed without knowledge of their physical or
network location (for example, which building or IP address).
• Concurrency transparency enables several processes to operate concurrently using shared resources
without interference between them.
• Replication transparency enables multiple instances of resources to be used to increase reliability and
performance without knowledge of the replicas by users or application programmers.
• Failure transparency enables the concealment of faults, allowing users and application programs to
complete their tasks despite the failure of hardware or software components.
• Mobility transparency allows the movement of resources and clients within a system without affecting
the operation of users or programs.
• Performance transparency allows the system to be reconfigured to improve performance as loads vary.
• Scaling transparency allows the system and applications to expand in scale without change to the
system structure or the application algorithms.
Quality of Service
• Reliability – Can the system be trusted to work without unexpected failures?
• Security – Can it prevent unauthorized access or damage?
• Performance – How quickly and efficiently can it handle tasks?
Additionally,
• Adaptability – Can it adjust to changing conditions (different hardware setups, varying resource availability,
etc.)?
Performance & Timeliness
• Originally, performance meant:
• Responsiveness – How fast the system reacts to requests.
• Throughput – How much work it can do in a given time.
• Now, especially for time-critical applications (e.g., multimedia), performance also means:
• Meeting deadlines – Ensuring tasks happen within strict time limits.
• Example:
• In a video streaming service, each video frame must be displayed on time (say, 30 frames per second).
• If frames are late, the video stutters or freezes.
• Achieving QoS means having enough computing power and network bandwidth available at the right time.
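The deadline idea in the video example can be worked through numerically. At 30 frames per second each frame has a budget of 1000 / 30, about 33.3 ms. The sketch below assumes, for illustration, that frame i must arrive before the end of its own slot, and the arrival times are made-up sample data.

```python
FPS = 30
FRAME_BUDGET_MS = 1000 / FPS  # about 33.3 ms per frame

def late_frames(arrival_ms, start_ms=0.0):
    """Return the indices of frames that arrive after their display deadline."""
    late = []
    for i, arrived in enumerate(arrival_ms):
        deadline = start_ms + (i + 1) * FRAME_BUDGET_MS  # end of frame i's slot
        if arrived > deadline:
            late.append(i)
    return late

# Hypothetical arrival times in milliseconds for four consecutive frames.
print(late_frames([30, 60, 110, 130]))
```

The deadlines are roughly 33.3, 66.7, 100 and 133.3 ms, so only the third frame (index 2, arriving at 110 ms) misses its deadline and would cause a visible stutter.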
Thank You
