Distributed Systems Unit 2

The document discusses namespaces in distributed systems, highlighting their features such as uniqueness, transparency, and hierarchy. It explains naming conventions, including names, identifiers, and addresses, and introduces election algorithms like the Bully and Ring algorithms for selecting coordinators. Additionally, it covers clock synchronization methods, both physical and logical, to ensure time consistency across distributed systems.

Uploaded by

ashrithaadepu3

Name Space

In distributed systems, a namespace is a logical grouping or hierarchy used to


organize and uniquely identify resources, such as files, objects, services, or
processes, across multiple nodes in a distributed environment.
Key Features of a Namespace in Distributed Systems:
1. Uniqueness: Each resource is given a unique identifier within the
namespace, preventing conflicts.
2. Transparency: Users and applications can reference resources without
needing to know their exact physical locations.
3. Hierarchy: Many distributed namespaces follow a hierarchical structure
(e.g., file paths in a distributed file system)
Examples of Namespaces in Distributed Systems:
Distributed File Systems (e.g., HDFS, NFS, Google File System), DNS
(Domain Name System), Cloud Storage (e.g., Amazon S3, Google Cloud
Storage).

2) Define names, identifiers, and addresses.
Entities in distributed systems are referred to using names, identifiers, and addresses. Names are human-friendly, while identifiers are unique and immutable. Addresses define an entity’s location and may change dynamically: an entity is reached through an access point, the name of an access point is its address, and access points can be reassigned. Location-independent names are names that do not depend on addresses; they are easy to use because they keep referring to the same entity even when it moves. Apart from addresses, there are special names called identifiers, which are used to uniquely identify an entity. An identifier possesses the following characteristics:
1. An identifier refers to at most one entity.
2. Each entity is referred to by at most one identifier.
3. An identifier is never reassigned to another entity.
In computers, addresses and identifiers are expressed as strings of bits, which are easily processed by machines. In addition to these, there are human-friendly names, which are designed for use by humans.
Conventional naming services manage both static and dynamic entities. The basic idea is to access an entity by performing a search (name-resolution) operation. Three different types of names can be assigned to an entity:
1. Human-friendly names
2. Identifiers
3. Addresses
The Domain Name System (DNS) is structured into global, administrative, and local layers. High-performance caching and server cloning techniques improve efficiency.
Naming in Distributed Systems
In Wide Area Networks (WANs), clients locate hosts via name servers (e.g., ftp.cs.vu.nl). If an entity moves within the same domain, only the local DNS record needs updating. If it moves to another domain, the host name should remain unchanged to avoid broken symbolic links and network inefficiencies.
Problems with entity movement:
• If a remote host is updated, its address must also be updated in the DNS
database.
• Changing an entity’s name affects dependent systems, leading to search
inefficiencies.
To optimize search efficiency, the process is divided into two steps:
1. Identify the new machine’s name.
2. Search for the associated address in the DNS.
Challenges with mobile entities:
• Mobile entities require dynamic naming mechanisms to track location
changes.
• Traditional DNS is not suitable for frequently moving entities.
• Global name uniqueness must be maintained for entity consistency.
A naming service returns a stored identifier, which remains unchanged and can
be fetched locally. A location service helps track dynamic entities by mapping
identifiers to their current addresses.
Replica Management
A replica is a copy of data stored in multiple locations to improve availability,
reliability, and performance in a distributed system.
For example:
• A backup copy of a database stored on another server.
Key issues:
o Decide where, when, and by whom replicas should be placed.
o Which mechanisms to use for keeping the replicas consistent.
Placement Problem:
Replica management involves two key placement decisions:
1. Placing Replica Servers:
• This is about finding the best locations to place servers that store copies
of data.
2. Placing Content:
• This is about deciding which servers should store specific content for
better access.
Replica-Server Placement
The goal is to find the best K locations from N possible locations for placing
replica servers.
Methods:
• Minimizing Distance to Clients:
o Choose the first location that minimizes the average distance to
clients.
o Then, select the next best location.
• Using Autonomous Systems (AS):
o Select the K largest ASes (an AS is a network managed by one organization).
o In each, place the server at the best-connected host.
Content Replication and Placement
A replica is a copy of data stored on a server. There are different types of
replicas:
1. Permanent Replicas:
o Always present on certain servers.
2. Server-Initiated Replicas:
o Created dynamically by a server when needed.
3. Client-Initiated Replicas (Client Caches):
o Created by a client to store recently accessed data.
Logical Organization of Replicas
Different replicas are arranged into three concentric rings, with permanent
replicas in the center and temporary replicas on the outer layers.

Permanent Replicas
Examples:
• The original set of replicas that form a distributed data store.
• A website's files replicated across multiple servers at one location.
• Databases spread across different locations, common in federated
databases.
How Requests Are Handled:
• Requests are forwarded to one of the available servers using strategies
like round-robin.

Server-Initiated Replicas
• Created to improve performance by reducing load on main servers.
Example:
• Web hosting services use replica servers to handle high traffic

How It Works:
1. Tracking Access Requests:
o Servers track how often and from where files are accessed.
2. Deciding Actions Based on Requests:
o If a file is rarely requested, its replica is deleted.
o If a file is frequently requested, it is replicated.
o If a file is moderately requested, it may be moved to a better
location.
Client-Initiated Replicas (Client Caches)
• A cache is a temporary local copy of data stored by a client for faster
access.
How It Works:
• Instead of fetching the same data repeatedly, a client stores it locally to
reduce network delays.
Types of Cache Placement:
1. Traditional File Systems:
o No shared caches since files are usually not shared.
2. LAN Caches:
o Shared cache servers for users on the same local network.
3. WAN Caches:
o Cache servers are placed in a wide-area network (like the internet).
o Clients find the nearest cache server and request copies of
frequently used data.

3. What are election algorithms? Explain Bully algorithm.


• Election algorithms are designed to choose a special process (a coordinator, initiator, etc.).
• An election algorithm chooses one process from a group of processes to act as coordinator.
• If the coordinator process crashes for some reason, then a new coordinator is elected on another processor.
• Every active process in the system has a unique priority number.
• The process with the highest priority is chosen as the new coordinator.
Two types of election algorithms:
➢ The Bully Algorithm
➢ A Ring Algorithm

Election by Bullying
The bully election algorithm is a distributed algorithm used for electing a
coordinator (leader) in a system of processes. Each process in the system has
an associated priority (or weight), and the process with the highest priority
should always be elected as the coordinator.
Principle
1. Each process has a unique priority (weight). The process with the
highest priority must be the coordinator.
2. Any process can start an election if it detects that the coordinator has
failed.
3. The goal is to find the heaviest (highest-priority) process that is still
running and make it the new coordinator.
How the Bully Algorithm Works
The algorithm works in the following way:
1. Election Initiation:
o If a process P notices that the coordinator has crashed, it starts an
election.
o P sends an ELECTION message to all processes with a higher
priority.
2. Response Handling:
o If a process Pheavy (with a higher priority than P) receives the
election message:
▪ It responds with a TAKE-OVER message, telling P to stop
its election.
▪ Now, P is out of the race.
▪ Pheavy now starts its own election by sending ELECTION
messages to processes with higher priority than itself.
3. Determining the Winner:
o The process that does not receive any TAKE-OVER message is
the highest-priority process still alive.
o It declares itself the winner by sending a VICTORY message to
all other processes.
Example of the Bully Algorithm
Scenario: Process 7 (Coordinator) Crashes
• Consider a system of eight processes, numbered 0 to 7.
• Process 7 was previously the coordinator but crashed.
• Now, an election must be held.
Step-by-Step Execution
Step 1: Process 4 Notices the Coordinator Has Crashed
• Process 4 detects that process 7 is not responding.
• Process 4 starts an election by sending an ELECTION message to all
higher-numbered processes (5, 6, and 7).
Step 2: Responses from Higher Processes
• Processes 5 and 6 are still running and respond to 4, telling it to stop
because they have a higher priority.
Step 3: Processes 5 and 6 Hold Their Own Elections
• Process 5 now starts its own election, sending an ELECTION message
to processes 6 and 7.
• Process 6 responds to 5 and tells it to stop.
Step 4: Process 6 Wins
• Process 6 now starts an election.
• Since no higher-priority process is alive, process 6 wins.
• Process 6 sends a VICTORY message to all other processes, announcing
itself as the new coordinator.

Key Properties of the Bully Algorithm


1. The process with the highest priority always wins.
2. If a process recovers from failure, it starts a new election.
3. Multiple processes can start elections simultaneously, but only one
winner emerges.
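The election flow above can be sketched as a small simulation. This is a minimal sketch, not a message-passing implementation: process IDs double as priorities, and a hypothetical `alive` set stands in for failure detection.

```python
# Minimal sketch of a Bully election: IDs double as priorities,
# and the `alive` set stands in for failure detection.

def bully_election(initiator, alive, n=8):
    """Return the ID of the process elected coordinator."""
    # An ELECTION message goes to every process with a higher ID.
    higher = [p for p in range(initiator + 1, n) if p in alive]
    if not higher:
        # Nobody answered with TAKE-OVER: the initiator wins and
        # would broadcast a VICTORY message to all other processes.
        return initiator
    # Each live higher process replies TAKE-OVER and runs its own
    # election; effectively the highest live process wins.
    return bully_election(max(higher), alive, n)

# Process 7 (the old coordinator) has crashed; process 4 starts:
print(bully_election(4, alive={0, 1, 2, 3, 4, 5, 6}))  # -> 6
```

This mirrors the example: 4 is bullied by 5 and 6, 5 is bullied by 6, and 6, hearing no TAKE-OVER, wins.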
Ring-Based Election Algorithm
Another election algorithm is the Ring-Based Election Algorithm, which works
by organizing processes in a logical ring.
Principle
1. Each process has a unique priority (weight).
2. Processes form a logical ring, meaning each process has a successor.
3. The highest-priority process should be elected as the new coordinator.
How the Ring-Based Algorithm Works
1. Election Initiation:
o Any process can start an election by sending an ELECTION
message to its successor.
o If the successor is down, the message moves to the next
successor.
2. Passing the Message:
o The election message circulates through the ring.
o Each process adds itself to the message to announce its presence.
3. Choosing the Winner:
o When the message returns to the initiator, it contains a list of all
active processes.
o The process with the highest priority is chosen as the new
coordinator.
4. Announcing the Coordinator:
o The initiator sends a COORDINATOR message around the ring,
telling everyone who the new leader is.
Example: Simultaneous Elections
Scenario: Processes 2 and 5 Notice That Process 7 Has Crashed
• Suppose process 7 was the coordinator, but it crashes.
• Now, processes 2 and 5 both detect the failure and start an election.
Step-by-Step Execution
1. Process 2 starts an election and sends an ELECTION message.
2. Process 5 also starts an election independently.
3. Both ELECTION messages circulate through the ring.
4. When each message completes a full circuit, it contains the same list of
processes.
5. Both process 2 and process 5 create a COORDINATOR message.
6. When the COORDINATOR messages make a second circuit, they
cancel each other out.
7. Eventually, only one COORDINATOR message remains, and the
highest-priority process is elected as the new leader.
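The circulation of the ELECTION message can be sketched as follows, assuming the same eight-process ring with IDs as priorities (a sketch only; crashed successors are simply skipped rather than timed out).

```python
# Sketch of one circulation of the ELECTION message: each live
# process appends its ID; crashed successors are skipped.

def ring_election(initiator, alive, n=8):
    """Return the ID chosen as coordinator after one circuit."""
    members = []
    p = initiator
    while True:
        if p in alive:
            members.append(p)      # process adds itself to the message
        p = (p + 1) % n            # pass to the successor in the ring
        if p == initiator:
            break                  # message is back at the initiator
    return max(members)            # highest ID in the list wins

# Process 7 crashed; processes 2 and 5 both start elections:
print(ring_election(2, alive={0, 1, 2, 3, 4, 5, 6}))  # -> 6
print(ring_election(5, alive={0, 1, 2, 3, 4, 5, 6}))  # -> 6 (same winner)
```

Both circulating messages collect the same set of live processes, which is why simultaneous elections converge on the same coordinator.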


4. Explain clock synchronization algorithms (physical and logical).
▪ Clock synchronization in distributed systems refers to the process of
ensuring that all clocks across various nodes or computers in the system
are set to the same time or at least have their times closely aligned.
▪ Types of clock synchronization:

1. Physical clock synchronization


➢ UTC(Universal Coordinated Time)
➢ GPS(Global Positioning System)
➢ Clock synchronization algorithms
▪ Network Time Protocol
▪ Berkeley Algorithm
➢ RBS(Reference broadcast synchronization)
2. Logical Clock Synchronization
➢ Lamport’s logical clocks
➢ Vector Clocks

Physical clock

• A physical clock in a computer is an electronic device that keeps time by


counting regular oscillations.
• This timekeeping is fundamental to ensure actions and operations are
coordinated and sequenced properly.
• Physical clocks rely on stable oscillation sources, such as quartz crystals
or atomic transitions, to keep time accurately.

Types of Physical Clocks: There are several types of physical clocks, including
quartz and atomic clocks.

➢ Quartz clocks use a quartz crystal oscillator to maintain time with high precision.
E.g., quartz clocks are used in computers, wristwatches, and wall clocks.
➢ Atomic clocks, used in more critical applications, rely on the frequency of electron transitions in atoms.
1) UTC (Universal Coordinated Time):
• UTC is the worldwide reference time standard, maintained using atomic clocks; physical clocks are ultimately synchronized against it.

2) Global Positioning System (GPS):

• The Global Positioning System (GPS) is a satellite-based navigation system.
• GPS receivers get time signals from multiple satellites.
• The receiver uses these signals to calculate the precise time.

3) Clock synchronization algorithms

i)NTP (Network Time Protocol):

• NTP (Network Time Protocol) is a protocol designed to synchronize the


clocks of computers over a network.
• It ensures that the time across distributed systems, such as servers in a
data center or devices in a network, remains consistent and accurate.

Clock Synchronization Principles

1. Every machine asks a time server for the accurate time at least once every δ/(2ρ) seconds, where ρ is the maximum clock drift rate and δ is the maximum skew tolerated (Network Time Protocol).
2. Alternatively, let the time server scan all machines periodically, calculate an average, and inform each machine how it should adjust its time relative to its present time.
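Principle 1 corresponds to NTP/Cristian-style polling. The offset a client computes from the four standard timestamps can be sketched as below (the timestamps are made-up milliseconds, chosen purely for illustration):

```python
# Offset a client computes from the four classic NTP timestamps:
# T1 = client send, T2 = server receive, T3 = server send, T4 = client receive.

def ntp_offset(t1, t2, t3, t4):
    delay = (t4 - t1) - (t3 - t2)          # round-trip network delay
    offset = ((t2 - t1) + (t3 - t4)) / 2   # estimated clock offset
    return offset, delay

# Made-up numbers: client clock 10 ms behind the server, 4 ms
# one-way delay, 1 ms server processing time.
offset, delay = ntp_offset(t1=100, t2=114, t3=115, t4=109)
print(offset, delay)  # -> 10.0 8
```

The offset estimate is exact when the network delay is symmetric, which is the standard NTP assumption.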

ii)Berkeley Algorithm:
• The server (actually, a time daemon) is active, polling every machine from
time to time to ask what time it is there.
• Based on the answers, it computes an average time and tells all the other
machines to advance their clocks to the new time or slow their clocks
down until some specified reduction has been achieved.
• The time daemon's time must be set manually by the operator periodically

How Berkeley works:

1. An individual node is chosen as the master node (time daemon) from a pool of nodes in the network.

2. The master node periodically queries all the other nodes (slave nodes) for their local time.

3. Each slave node responds with its current time. The master node then calculates the time differences between its own clock and each of the slave clocks.

4. The master computes an average of these time differences and sends the necessary adjustments back to the slave nodes.

5. Each slave node adjusts its clock according to the master’s instructions, thereby synchronizing the entire network.
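The averaging in steps 3–5 can be sketched in a few lines. The times are illustrative only, expressed as seconds past an arbitrary reference:

```python
# The master averages all clocks (its own included) and sends each
# node the amount by which to adjust its clock.

def berkeley_adjustments(master_time, slave_times):
    clocks = [master_time] + slave_times
    avg = sum(clocks) / len(clocks)
    return [avg - c for c in clocks]   # adjustment per node, master first

# Master at 180 s, slaves at 170 s and 205 s:
print(berkeley_adjustments(180, [170, 205]))  # -> [5.0, 15.0, -20.0]
```

Note the fast slave receives a negative adjustment; in practice such a clock is slowed down gradually rather than set backward.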
4)Reference Broadcast Synchronization(RBS):

• Reference Broadcast Synchronization (RBS) is a technique used in clock


synchronization, particularly in wireless sensor networks.
• RBS is quite different from other proposals for two reasons:
o The protocol does not assume that any single node has an accurate account of the actual time available.
o It lets only the receivers synchronize, keeping the sender out of the loop.

Logical Clocks:

• If two machines do not interact, there is no need to synchronize their clocks.
• Processes agree on the order in which events occur rather than on the exact time at which they occurred.
• Logical clocks are a method for tracking the order of events in a distributed system without requiring synchronized physical clocks across all nodes.
• Logical clocks can only advance forward, never backward.
• Computers generally obtain logical time using interrupts to update a software clock.
• Logical clock synchronization algorithms:
➢ Lamport’s logical clocks.
➢ Vector clocks.

Lamport’s Logical clocks:

• To synchronize logical clocks, Lamport defined a relation called happens-before.
• The happened-before relation on the set of events in a distributed system:
o If a and b are two events in the same process, and a comes before b, then a → b.
o If a is the sending of a message, and b is the receipt of that message, then a → b.
o If a → b and b → c, then a → c (transitive relation).
• Assign a “clock” value C to each event:

P1: If a and b are two events in the same process, and a → b, then we demand that C(a) < C(b).

P2: If a corresponds to sending a message m, and b to the receipt of that message, then also C(a) < C(b).
• If a and b are two events in different processes that do not exchange messages, then neither a → b nor b → a is true; these events are concurrent.
• In Lamport’s scheme, instead of synchronizing clocks, event ordering is used.


• Each process (P₁, P₂, P₃) has its own logical clock.
• The clocks are increasing at different rates.
• Events (messages) are exchanged between processes, but without
synchronization, the timestamps may not respect causality.

Step-by-Step Breakdown

1. P₁ sends message m1 to P₂
o P₁'s clock at the time of sending: 6
o P₂ receives m1 and records the event at its clock value: 16
o This ordering happens to be consistent (the receiving clock, 16, is greater than the sending clock, 6), but no rule is yet enforcing consistency.
2. P₂ sends message m2 to P₃
o P₂'s clock at sending: 24
o P₃ receives m2 and records the event at 40
o Again, the clock values differ but are not yet synchronized.
3. P₃ sends message m3 to P₂
o P₃'s clock at sending: 66
o P₂ records receipt at 64
o This is wrong because it implies that the message is received
before it was sent.
4. P₂ sends message m4 back to P₁
o P₂'s clock at sending: 72
o P₁ receives m4 and records it at 54, which again implies incorrect ordering.

Image 2: After Applying Lamport’s Logical Clock Algorithm

Lamport’s algorithm corrects the clocks by enforcing a simple rule:

When a message is received, set the local clock to the maximum of its current value and the sender’s timestamp + 1, so that a receive is always ordered after the corresponding send.

Step-by-Step Fixing Process

1. P₁ sends m1 to P₂
o P₁’s clock: 6
o P₂’s clock (16) is already greater than 6, so no adjustment is needed.
2. P₂ sends m2 to P₃
o P₂’s clock before sending: 24
o P₃’s clock (40) is already greater than 24, so no adjustment is needed.
3. P₃ sends m3 to P₂
o P₃’s clock: 66
o P₂’s clock reads 64, so it is updated to 67 (one greater than 66).
4. P₂ sends m4 to P₁
o P₂’s clock before sending: 72
o P₁’s clock reads 54, so it is updated to 73 (one greater than 72).
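The correction rule can be sketched in a few lines (a minimal sketch; a real implementation attaches the timestamp to every outgoing message):

```python
# Each process keeps a counter; it increments before every local
# event or send, and on receive jumps past the sender's timestamp.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):                       # local event or message send
        self.time += 1
        return self.time

    def receive(self, msg_ts):            # msg_ts = sender's timestamp
        self.time = max(self.time, msg_ts + 1)
        return self.time

p2 = LamportClock()
p2.time = 64                  # P2's clock when m3 (sent at 66) arrives
print(p2.receive(66))         # -> 67, one greater than the send time
```

The `max` guarantees the clock never moves backward, which preserves property P1 while enforcing P2.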
Vector Clocks Explained in Simple Words

Vector Clocks are used in distributed systems to keep track of the causal
order of events (i.e., which events happened before others).

How Vector Clocks Work

1. Each process maintains a vector of timestamps.


2. When a process performs an event, it increments its own clock in the
vector.
3. When a process sends a message, it attaches its vector clock to the
message.
4. When a process receives a message, it:
o Compares its own vector with the sender’s vector.
o Updates its own vector using the maximum values for each
position.
o Increments its own clock before executing the next event.
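The four rules above can be sketched as follows (a minimal two-process sketch; `pid` indexes a process's own slot in the vector):

```python
# Each of n processes keeps a vector; entry i counts events at process i.

class VectorClock:
    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n

    def tick(self):                       # local event / before a send
        self.v[self.pid] += 1
        return list(self.v)               # copy attached to the message

    def receive(self, msg_v):             # element-wise max, then count
        self.v = [max(a, b) for a, b in zip(self.v, msg_v)]
        self.v[self.pid] += 1
        return list(self.v)

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
m = p0.tick()             # P0 has an event and sends: vector [1, 0]
print(p1.receive(m))      # -> [1, 1]
```

Comparing two vectors element-wise reveals whether one event causally precedes the other or whether they are concurrent, which a single Lamport counter cannot do.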
5. Mutual Exclusion with algorithms.
• Processes often need simultaneous access to the same resource.
• We need to grant mutually exclusive access to resources.
Problem: A number of processes in a distributed system want exclusive access
to some resource.
Basic solutions:
➢ Centralized algorithm
➢ Decentralized algorithm(peer-to-peer)
➢ Distributed algorithm(no topology imposed)
➢ Token Ring algorithm
Centralized Algorithm

. One process is elected as the coordinator.
. To access a shared resource, a process sends a REQUEST message to the coordinator; the coordinator replies OK if the resource is free, otherwise it queues the request.
. When the holder finishes, it sends a RELEASE message, and the coordinator grants the resource to the next queued request.
Decentralized algorithm

. There is no single coordinator.
. Multiple processes coordinate to manage access.
. Usually involves voting or quorum-based decision-making.

Distributed algorithm
· When a process wants to access a shared resource, it builds a message
containing the name of the resource, its process number, and the current
(logical) time.
· It then sends the message to all other processes, conceptually including itself.
· When a process receives a request message from another process, the action it
takes depends on its own state with respect to the resource named in the
message.
· Three different cases have to be clearly distinguished:
1. If the receiver is not accessing the resource and does not want to access it,
it sends back an OK message to the sender.
2. If the receiver already has access to the resource, it simply does not reply.
Instead, it queues the request.
3. If the receiver wants to access the resource as well but has not yet done
so, it compares the timestamp of the incoming message with the one
contained in the message that it has sent everyone.
· The lowest one wins.
a. If the incoming message has a lower timestamp, the receiver sends
back an OK message.
b. If its own message has a lower timestamp, the receiver queues the
incoming request and sends nothing.
· After sending out requests asking permission, a process waits until everyone
else has given permission – then proceeds.
· When finished, it sends OK messages to all processes on its queue and deletes
them all from the queue.
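The three receiver cases can be sketched as a single decision function. This is a sketch only; a real implementation also needs the request broadcast, OK counting, and the deferred-reply queue. Requests are modeled as hypothetical (timestamp, pid) pairs so ties break deterministically.

```python
# Decision a receiver makes for an incoming REQUEST; requests are
# (lamport_timestamp, pid) pairs so ties break deterministically.

def on_request(my_state, my_request, incoming):
    """my_state is 'FREE', 'USING', or 'WANTED'."""
    if my_state == "FREE":
        return "OK"        # case 1: not using, not interested
    if my_state == "USING":
        return "QUEUE"     # case 2: in the critical region, defer reply
    # case 3: both want the resource; the lower pair (older request) wins
    return "OK" if incoming < my_request else "QUEUE"

print(on_request("WANTED", my_request=(8, 2), incoming=(5, 1)))  # -> OK
print(on_request("WANTED", my_request=(5, 1), incoming=(8, 2)))  # -> QUEUE
```

Because every process applies the same rule with totally ordered timestamps, exactly one contender collects OKs from everyone else.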
Token Ring algorithm
Organize processes in a logical ring, and let a token be passed between them.
· The one that holds the token is allowed to enter the critical region (if it wants
to)
(a) An unordered group of processes on a network.
(b) A logical ring constructed in software.
· When the ring is initialized, process 0 is given a token.
· The token circulates around the ring.
· It is passed from process k to process k+1 (modulo the ring size) in point-to-
point messages.
· When a process acquires the token from its neighbor, it checks to see if it needs to access the shared resource.
o If so, the process goes ahead, does all the work it needs to, and releases the resource.
o After it has finished, it passes the token along the ring.
o It is not permitted to immediately enter the resource again using the same token.
· If a process is handed the token by its neighbor and is not interested in the resource, it just passes the token along.
o As a consequence, when no process needs the resource, the token just circulates at high speed around the ring.
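One circuit of the token can be sketched as below; the `want` set is a hypothetical stand-in for which processes currently need the resource.

```python
# One circuit of the token: each holder may enter the critical
# region once, then must pass the token to its successor.

def token_ring_round(want, n=6, start=0):
    """Return the order in which processes use the resource."""
    used = []
    for i in range(n):
        p = (start + i) % n       # token moves k -> k+1 (mod n)
        if p in want:             # a holder that needs the resource enters
            used.append(p)
    return used

print(token_ring_round(want={1, 4}, n=6))  # -> [1, 4]
```

Mutual exclusion follows trivially: there is only one token, so at most one process can be in the critical region at a time.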

6. Flat naming
A naming system maintains a name-to-address binding to resolve names to addresses. In a distributed system, the implementation of a naming system is often distributed across multiple machines.
1. Broadcasting and Multicasting
(a) Broadcasting
• Sends data packets to every system in a network.
• Commonly used in local area networks (LANs) and wireless networks.
• Uses the Address Resolution Protocol (ARP) to determine the entity's
physical address.
• Drawback: Inefficient in large networks due to bandwidth usage.
(b) Multicasting
• Sends packets only to specific groups of systems.
• Used for locating peer-to-peer systems or groups of entities.
• Advantage: Reduces network congestion compared to broadcasting.
2. Forwarding Pointers
• When an entity changes location, a chain of forwarding pointers is
created, linking the old location to the new one.

When the object moves from A to B, it does not delete its presence entirely
from A.
Instead, it leaves behind a client stub in A.
A server stub is installed in B, referring to the actual object.
• A client stub acts as a local placeholder for the object.
• A server stub is the remote reference that points to the new location.
Since the client stub remains in A, from the client’s perspective, nothing
changes—it still interacts with the same object as before, making migration
completely transparent to the client.
When a client makes a request to the object (which it believes is still in A),
the client stub in A forwards the request to the new location (B).
The server stub in B receives the request and executes it on the actual object.
The result is then sent back to the client via the same forwarding chain.
The client does not need to know the object's real location; it only interacts
with the stub in A, which automatically redirects the request.
3.Home-based approach
Used for locating mobile entities in wide-area networks.
Maintains a static home location for each entity, which acts as a reference
point.
Communication follows these steps:
1. A packet is sent to the home location.
2. The home location determines the current location and forwards the
packet.

4.Hierarchical Approach
The hierarchical approach is a structured method for locating entities in a
distributed system. Unlike flat naming, where entities have random
identifiers without location information, a hierarchical naming system
organizes entities in a structured tree-like format.
It organizes entities in a tree-like structure, similar to DNS (Domain
Name System) or file systems.
Each entity has a structured name that contains location information.
The search process follows a top-down approach, reducing lookup time
and making it scalable.

Consistency protocols
Consistency protocols are used in distributed systems to ensure that all nodes
or replicas have a consistent view of data. They help maintain data integrity
when multiple processes access and modify shared data.
1. Primary-based Protocols
• In these protocols, a single primary server is responsible for handling all
updates to a particular data item.
• Secondary servers (or backup copies) rely on the primary to receive
updates.
• This approach ensures strong consistency but may introduce bottlenecks.
Types of Primary-based Protocols
(a) Remote-Write Protocols
• All write operations are forwarded to a single fixed server (remote
server).
• The data item is permanently placed on a specific server, and only that
server can modify it.
• Other clients or servers must send requests to this designated server for
updates.
Example Process:
1. A client (C₁) wants to update data.
2. C₁ sends a write request to a remote server (S).
3. S performs the update and sends an acknowledgment to C₁.
4. If another client (C₂) wants to read the data, it must request S.

(b) Local-Write Protocols


• The primary server is not fixed; instead, updates occur locally where
the data is accessed.
• The updated version is then propagated to other replicas.
Example Process:
1. Client (C₁) sends a write request to a local server (S₁).
2. S₁ updates its local copy and forwards the update to other servers.
3. If another client (C₂) wants to read data, it can request from its local
server (S₂).
2. Replicated-Write Protocols
• Unlike primary-based protocols, multiple copies of a data item exist,
and multiple servers can accept write operations.
• These protocols ensure consistency through synchronization mechanisms.
Types of Replicated-Write Protocols
(a) Active Replication Protocols
• All replicas receive and process updates simultaneously.
• Every replica executes operations independently to reach a consistent
state.
• Often used in fault-tolerant distributed systems.
Example Process:
1. A client sends a write request.
2. The request is broadcast to all replicas.
3. Each replica independently updates its copy.
4. If a read request is made, any replica can provide the latest value.

(b) Quorum-based Protocols


• A subset (quorum) of replicas must approve a write operation before
it is considered complete.
• This prevents conflicts between simultaneous updates.
• First introduced in Gifford's algorithm, quorum-based protocols define a read quorum (R) and a write quorum (W) to maintain consistency.
Example Rules:
• Read-write conflict prevention: Ensuring that R + W > total replicas
(N).
• Write-write conflict prevention: Ensuring that W > N/2.
Example Process:
1. A client wants to update data.
2. It must obtain approval from at least W replicas.
3. If another client wants to read, it must check at least R replicas.
4. If R + W > N, then at least one replica will always have the latest version.
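Gifford's two rules can be checked mechanically; for example, with N = 12 replicas:

```python
# Gifford's constraints: R + W > N blocks read-write conflicts,
# W > N/2 blocks write-write conflicts.

def valid_quorum(n, r, w):
    return (r + w > n) and (w > n / 2)

print(valid_quorum(n=12, r=4, w=9))   # -> True
print(valid_quorum(n=12, r=6, w=6))   # -> False (6 + 6 is not > 12)
```

The overlap guaranteed by R + W > N is what ensures every read quorum contains at least one replica holding the latest write.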
3. Cache-Coherence Protocols
• Used in multiprocessor systems and distributed caches to ensure
consistency across multiple cached copies of a data item.
• Cache-coherence protocols prevent stale or inconsistent data usage.
Types of Cache-Coherence Protocols
(a) Coherence Detection Protocols
• Detects inconsistencies in cached data and resolves them.
• Two main methods:
1. At compile time: Static analysis checks for potential
inconsistencies.
2. At runtime: The system verifies data freshness before allowing
access.
(b) Coherence Enforcement Protocols
• Ensures that updates to data are reflected across all caches
immediately.
• Two primary techniques:
1. Invalidation-based: When a cache updates a value, all other
caches holding that value invalidate their copy.
2. Update-based: The new value is broadcast to all caches to keep
them synchronized.

7.Structured Naming in Distributed System


. In distributed systems (like cloud storage or networked computers), we need a way to name files, folders, and other resources.
. Instead of using simple names, we use structured names that are organized in a meaningful way.
. Structured names make it easy to locate and manage files across multiple computers.
Naming in distributed systems is often represented using a directed graph (a diagram with nodes and arrows showing connections).

• This graph has two types of nodes:


1. Leaf Node:
▪ It is a node with no outgoing connections (edges).
▪ It is like a file in a system that does not contain other files.
2. Directory Node:
▪ It has multiple outgoing connections.
▪ It acts like a folder (directory) that contains other files or
folders.
3. Example of Naming Graph
• In the diagram, G1 is the root node (top-most directory).
• From G1, there are multiple paths leading to other nodes.
• To find a specific node, a path name is used, which consists of labels
representing edges (connections).
• A path name can be:
o Absolute: If it starts from the root node.
o Relative: If it starts from any other node.
4. Global vs. Local Names
• Global Names: Refer to resources in the entire system (like a website
URL).
• Local Names: Are understood only in a specific location (like a file name
inside a folder).
5. Naming in File Systems
• The naming structure in file systems is similar to naming graphs.
• In file systems:
o The root node is the main directory.
o Folders (directories) act as directory nodes.
o Files act as leaf nodes.
• Paths are written using "/" (e.g., /home/user/documents/file.txt).
6. Hierarchical Naming
• Naming graphs are often arranged hierarchically (like a tree structure).
• Each file/folder can be found using its absolute path.
7. Directed Acyclic Graph (DAG)
• Some systems use Directed Acyclic Graphs (DAG) instead of simple
trees.
• A DAG allows multiple parent nodes, meaning a file can be accessed
from different paths.
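The naming graph and its path-name resolution can be sketched in a few lines. In this illustrative model (names like `G1`, `report`, and `resolve` are assumptions for the example), directory nodes are dicts mapping edge labels to child nodes, leaf nodes are plain values, and one leaf is reachable via two different paths, as a DAG allows:

```python
# Minimal sketch of a naming graph as a DAG: directory nodes are dicts
# mapping edge labels to children; leaf nodes are plain strings.

report = "leaf node: contents of report.txt"

G1 = {                                      # root directory node
    "home": {
        "alice": {"docs": {"report.txt": report}},
    },
    "shared": {"report.txt": report},       # same leaf, second parent
}

def resolve(root, path):
    """Resolve an absolute path name label by label from the root node."""
    node = root
    for label in path.strip("/").split("/"):
        node = node[label]                  # follow the labelled edge
    return node

# Two different absolute path names lead to the same leaf node:
p1 = resolve(G1, "/home/alice/docs/report.txt")
p2 = resolve(G1, "/shared/report.txt")
print(p1 is p2)                             # True: a DAG allows multiple parents
```

In a tree, every node except the root has exactly one parent, so each file has exactly one absolute path; the DAG relaxes this, which is how hard links work in UNIX-like file systems.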
• Naming in the UNIX File System
• In UNIX (like Linux), files and folders are organized using a naming
system similar to the naming graph.
• A UNIX file system consists of blocks, which are divided into three main
parts.

8. Data-Centric and Client-Centric Consistency Models


In distributed systems, multiple clients interact with shared data stored across
different servers. Consistency models define how data is updated and accessed
across these distributed nodes. The two main types of consistency models are:
1. Data-Centric Consistency
2. Client-Centric Consistency
Data-Centric Consistency
A consistency model is like a contract between a data storage system (e.g., a
distributed database) and the processes that use it. It defines rules for how
updates (writes) and reads behave across multiple systems.
If all processes follow these rules, the system behaves as expected.
Types of Consistency Models (From Strict to Weakest)
1. Strict Consistency (Linearizability) – The strongest model
2. Sequential Consistency
3. Causal Consistency
4. Eventual Consistency / FIFO Consistency – The weakest model
Strict Consistency (Most Strict Model)
Strict consistency means that every read operation always returns the most
recent write, no matter where or when it happens.
For example:
• If Process P1 writes x = 10 at time T1, and
• Process P2 tries to read x at time T2 (T2 > T1),
o Then P2 must see x = 10, because that was the most recent write.

Sequential Consistency
Sequential consistency is a relaxed consistency model used in distributed
systems and multiprocessor architectures. It ensures that:
1. All operations (reads and writes) across multiple processes appear to
execute in a single, sequential order.
2. Each process executes its operations in the order specified by its
program.
It is weaker than strict consistency but easier to implement.
The execution results should be as if all operations were performed in some
sequential order.
Different executions can have different sequential orders, but all processes
must see operations in a consistent sequence.
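The two rules above can be checked mechanically: a global order is sequentially consistent only if every process's operations appear in it in program order. A minimal sketch (operation labels and the helper `respects_program_order` are illustrative, not a standard API):

```python
# Minimal sketch: sequential consistency requires one global order that
# preserves each process's program order (operations here are just labels).
from itertools import permutations

def respects_program_order(global_order, program_orders):
    """True iff every process's ops appear in global_order in program order."""
    pos = {op: i for i, op in enumerate(global_order)}
    return all(
        pos[a] < pos[b]
        for ops in program_orders
        for a, b in zip(ops, ops[1:])
    )

# P1 issues W(x)a then W(x)b; P2 issues R(x) then R(y).
p1 = ["W(x)a", "W(x)b"]
p2 = ["R(x)", "R(y)"]

print(respects_program_order(["W(x)a", "R(x)", "W(x)b", "R(y)"], [p1, p2]))  # True
print(respects_program_order(["W(x)b", "W(x)a", "R(x)", "R(y)"], [p1, p2]))  # False

# Several interleavings are valid, as long as each keeps per-process order:
valid = [o for o in permutations(p1 + p2) if respects_program_order(o, [p1, p2])]
print(len(valid))                           # 6 valid interleavings of 2+2 ops
```

This also illustrates why different executions may legally observe different orders: any of the six interleavings is acceptable, as long as all processes agree on the same one.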

Causal Consistency
It distinguishes between causally related events and concurrent events.
If event B is caused or influenced by an earlier event A, then all processes in
the system must see event A before event B.
P1: W(x)a
• Process P1 writes the value a to x.
P2: R(x)a → W(x)b
• P2 reads the value a (written by P1).
• P2 then writes a new value b to x.
P3: R(x)b → R(x)a
• P3 first reads b and then reads a, which violates causal consistency:
the write of b is causally dependent on the write of a, so P3 should
have seen a before b.
P4: R(x)a → R(x)b
• P4 reads the values in the correct order (a before b), so it satisfies
causal consistency.
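The P3/P4 scenario reduces to a simple check: given the causal order "a happens before b" established by P2's read-then-write, a process's read sequence is acceptable only if it never observes b and then the older a. A minimal sketch (the function name and tuple encoding are assumptions for illustration):

```python
# Minimal sketch: with the causal order "a happens-before b" from the
# example, a read sequence is causally consistent only if it never
# observes b and then the older value a.

def causally_consistent(reads, causal_order=("a", "b")):
    """reads: the values of x a process observed, in order."""
    earlier, later = causal_order
    seen_later = False
    for v in reads:
        if v == later:
            seen_later = True
        elif v == earlier and seen_later:
            return False                    # observed b, then the older a
    return True

print(causally_consistent(["b", "a"]))      # P3: False, violates causality
print(causally_consistent(["a", "b"]))      # P4: True
```

Had the two writes been concurrent (no read linking them), both orders would have been acceptable; the violation exists only because P2's write of b was causally influenced by reading a.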
Weak Consistency
• Not all applications need to see all changes to data instantly.
• Weak Consistency allows some flexibility, making it useful for systems
where speed is more important than strict order.
o Sequential Consistency for Sync Variables: Sync operations happen
in a fixed order.
o Delayed Updates: Until all previous writes are completed, no new
sync operations can happen.
o Controlled Reads and Writes: Before reading or writing data, all
previous sync operations must be completed.
Client-Centric Consistency Models
Client-centric consistency models aim to provide a reasonable level of
consistency from the perspective of an individual client, without
enforcing strict synchronization across the entire system.
1) Eventual Consistency: Guarantees that, if no new updates are made to
a particular data item, all replicas will eventually converge to the same
value. However, this model does not provide guarantees on when
convergence will happen.
o If you edit a document on your phone, the changes will show up on
your laptop after a short delay.
2) Monotonic Reads: Ensures that if a client has read a particular version
of a data item, they will never see an older version in subsequent reads.

3) Monotonic Writes: Guarantees that a client's writes will be applied in
the order they were issued, ensuring a predictable sequence of updates.
4) Read Your Writes: Ensures that once a client has written a value, they
will always read that value (or a more recent one) on subsequent reads.

5) Writes Follow Reads: Guarantees that if a client reads a value and then
writes a new value based on that read, any subsequent reads will reflect
that dependency.
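A common way to implement monotonic reads and read-your-writes is version tracking: each write gets a monotonically increasing version, and each client remembers the highest version it has observed. A minimal sketch under those assumptions (the `Replica`/`Client` classes and the "refuse stale replica" policy are illustrative; real systems typically redirect to a fresher replica instead):

```python
# Minimal sketch: per-item versions plus a per-client "highest version
# seen" counter give monotonic reads and read-your-writes.

class Replica:
    def __init__(self):
        self.value, self.version = None, 0

class Client:
    def __init__(self):
        self.last_seen = 0                  # highest version read or written

    def read(self, replica):
        # Refuse replicas older than what this client already observed,
        # so reads never go backwards in time for this client.
        if replica.version < self.last_seen:
            raise RuntimeError("replica too stale for this client")
        self.last_seen = replica.version
        return replica.value

    def write(self, replica, value):
        replica.value = value
        replica.version += 1
        self.last_seen = replica.version

fresh, stale = Replica(), Replica()
c = Client()
c.write(fresh, "v1")                        # fresh is now at version 1
print(c.read(fresh))                        # read-your-writes: sees "v1"
try:
    c.read(stale)                           # stale is still at version 0
except RuntimeError as e:
    print(e)                                # monotonic reads: stale rejected
```

Note that none of this requires global synchronization: only the client's own counter and the replica's version are compared, which is exactly the "per-client" perspective these models take.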