Clock Synchronization in Centralized Systems
A process makes a kernel call to get the time. A process that asks for the time later will always receive a higher (or equal) value, so there is no ambiguity about the order of events and their times.
Clock Synchronization in Distributed Systems
Distributed systems lack a global time. If each machine uses its own local clock, a program such as make that accesses files on different machines may see inconsistent timestamps (e.g. an output file that appears older than the source it was built from). Using the clock of one designated machine instead does not help: communication delay causes the same problem.
The rules for enforcing correct interaction are implemented in the form of synchronization mechanisms. The synchronization-related issues are:
Clock synchronization, event ordering, mutual exclusion, election algorithms, and deadlock.
Clock Synchronization
Physical Clocks
Synchronize the physical clocks of the machines so that the time difference between them is limited to a very small value. Physical clocks keep the time of day and are consistent across systems.
Logical Clocks
In a distributed computation, associating an event with an absolute real time is not essential; we only need an unambiguous order of events. A logical clock keeps track of event ordering among related (causal) events.
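As an illustration only (not part of the original notes), a minimal Lamport logical clock can be sketched as follows; the class and method names are ours.

```python
class LamportClock:
    """Minimal Lamport logical clock (illustrative names)."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        # Any local event advances the clock by one tick.
        self.time += 1
        return self.time

    def send_event(self):
        # The timestamp returned here is attached to the outgoing message.
        self.time += 1
        return self.time

    def receive_event(self, msg_timestamp):
        # On receipt, jump past the sender's timestamp to preserve causality.
        self.time = max(self.time, msg_timestamp) + 1
        return self.time


# Example: a send on clock A happens-before the matching receive on clock B.
a, b = LamportClock(), LamportClock()
t_send = a.send_event()           # e.g. 1
t_recv = b.receive_event(t_send)  # e.g. 2 > 1
assert t_send < t_recv
```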
Computer clock
A computer clock usually consists of three parts:
A quartz crystal that oscillates at a well-defined frequency.
A counter register that keeps track of the oscillations of the quartz crystal; it is decremented by 1 for each oscillation. When its value reaches zero, an interrupt called a clock tick is generated and the register is reinitialized from the constant register.
A constant register that stores a constant value such as 60 or 100.
The computer clock is kept in step with real time by storing two extra values in the system: a fixed starting date and time, and the number of ticks since then. At every clock tick the interrupt service routine increments the number of ticks, keeping the clock running.
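For illustration only (not from the original notes), the sketch below simulates how the counter and constant registers produce clock ticks and how the software clock is kept as a start time plus a tick count; the frequencies and constants chosen are arbitrary.

```python
# Hypothetical simulation of a computer clock: the oscillator decrements the
# counter register; when it reaches zero a "clock tick" interrupt fires and
# the counter is reloaded from the constant register.

OSCILLATIONS_PER_SECOND = 6_000   # arbitrary crystal frequency for the demo
CONSTANT_REGISTER = 100           # 6_000 / 100 = 60 ticks per simulated second

counter_register = CONSTANT_REGISTER
ticks_since_boot = 0              # incremented by the tick interrupt handler
BOOT_TIME = 1_700_000_000         # fixed starting time (seconds since epoch)

def clock_tick_isr():
    """Interrupt service routine: keeps the software clock running."""
    global ticks_since_boot
    ticks_since_boot += 1

for _ in range(OSCILLATIONS_PER_SECOND):      # simulate one second of oscillations
    counter_register -= 1
    if counter_register == 0:
        clock_tick_isr()
        counter_register = CONSTANT_REGISTER  # reload from the constant register

current_time = BOOT_TIME + ticks_since_boot / 60  # 60 ticks per second
print(ticks_since_boot, current_time)
```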
Mutual Exclusion: A Centralized Algorithm
a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.
b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply, and process 2 waits.
c) When process 1 exits the critical region, it tells the coordinator, which then replies to process 2.
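A minimal single-machine sketch of the coordinator logic described above (the class and method names are ours, and real implementations would exchange messages over a network):

```python
from collections import deque

class Coordinator:
    """Centralized mutual exclusion: grant, queue, and release (sketch)."""

    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def request(self, process):
        if self.holder is None:
            self.holder = process
            return "GRANTED"          # coordinator replies immediately
        self.waiting.append(process)  # no reply: the requester blocks
        return "QUEUED"

    def release(self, process):
        assert process == self.holder
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder            # next process (if any) now gets its reply


c = Coordinator()
print(c.request("P1"))   # GRANTED
print(c.request("P2"))   # QUEUED (P2 waits)
print(c.release("P1"))   # P2 -- the coordinator now replies to P2
```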
Distributed algorithm
Entering the critical section
Send a request to all partners (who must be known) and await an acknowledgement from each of them.
Response to a request
Reply OK if not interested itself, or if the incoming request is smaller (earlier) than its own request; otherwise queue the request.
Ordering
Total event order (e.g. based on Lamport's clocks).
A Distributed Algorithm
a) Two processes want to enter the same critical region at the same moment. b) Process 0 has the lowest timestamp, so it wins. c) When process 0 is done, it sends an OK as well, so process 2 can now enter the critical region.
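A simplified sketch of the request/reply rule used by this distributed (Ricart-Agrawala style) algorithm; message passing is abstracted away and the field names are illustrative.

```python
class Process:
    """Ricart-Agrawala style request handling (simplified, no real messaging)."""

    def __init__(self, pid):
        self.pid = pid
        self.requesting = False
        self.my_request = None      # (lamport_timestamp, pid) when requesting
        self.deferred = []          # requests to answer after leaving the CS

    def want_cs(self, timestamp):
        self.requesting = True
        self.my_request = (timestamp, self.pid)   # total order via (time, pid)

    def on_request(self, their_request):
        # Reply OK if not interested, or if their request precedes ours;
        # otherwise defer the reply until we leave the critical section.
        if not self.requesting or their_request < self.my_request:
            return "OK"
        self.deferred.append(their_request)
        return None

    def exit_cs(self):
        self.requesting, self.my_request = False, None
        released, self.deferred = self.deferred, []
        return released             # send OK to everyone we deferred


p0, p2 = Process(0), Process(2)
p0.want_cs(timestamp=8)
p2.want_cs(timestamp=12)
print(p0.on_request((12, 2)))  # None: p0's request (8, 0) wins, reply deferred
print(p2.on_request((8, 0)))   # OK: p2 loses the tie-break
print(p0.exit_cs())            # [(12, 2)] -- deferred OK sent, p2 may now enter
```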
A Token Ring Algorithm
a) An unordered group of processes on a network.
b) A logical ring constructed in software.
c) When the ring is initialized, process 0 is given the token.
d) A process entering its critical section holds the token; on exit it releases the token and forwards it to its neighbor.
e) It is not permitted to enter a second critical section using the same token.
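A small sketch of token circulation on the logical ring described above (a single-process simulation with illustrative names, not a networked implementation):

```python
class TokenRing:
    """Token-based mutual exclusion on a logical ring (simulation sketch)."""

    def __init__(self, n):
        self.n = n            # processes 0 .. n-1 arranged in a logical ring
        self.token_at = 0     # the ring is initialized with the token at process 0
        self.in_cs = None

    def try_enter_cs(self, pid):
        # Only the token holder may enter its critical section.
        if self.token_at == pid and self.in_cs is None:
            self.in_cs = pid
            return True
        return False

    def exit_cs(self, pid):
        assert self.in_cs == pid
        self.in_cs = None
        self.token_at = (pid + 1) % self.n   # forward the token to the neighbor


ring = TokenRing(4)
print(ring.try_enter_cs(2))  # False: process 2 does not hold the token
print(ring.try_enter_cs(0))  # True: process 0 holds the token
ring.exit_cs(0)              # token forwarded to process 1
print(ring.token_at)         # 1
```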
Failures
Process failure
A process failure causes a break in the ring, so the ring has to be reconfigured: when the failure is detected, the token is passed to the next alive process in the ring; when the failed process recovers, it informs its predecessor in the ring.
Token Loss
Mechanism for loss detection and regeneration: a monitor frequently sends a "Who has the token?" message around the ring. The process holding the token appends its ID and forwards the message. If the monitor receives the message back without any ID, it concludes that the token has been lost, regenerates it, and circulates it in the ring.
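The loss-detection rule above can be sketched as follows; the probe format and function names are assumptions made for illustration.

```python
def monitor_probe(ring_members, token_holder=None):
    """Circulate a hypothetical 'Who has the token?' probe around the ring.

    Each member forwards the probe; the token holder appends its ID.
    If the probe returns with no ID, the monitor regenerates the token.
    """
    probe = {"question": "Who has the token?", "holder_id": None}
    for member in ring_members:
        if member == token_holder:
            probe["holder_id"] = member   # token holder appends its ID
    if probe["holder_id"] is None:
        return "token lost: regenerate and circulate a new token"
    return f"token is held by process {probe['holder_id']}"


print(monitor_probe([0, 1, 2, 3], token_holder=2))     # held by process 2
print(monitor_probe([0, 1, 2, 3], token_holder=None))  # token lost: regenerate...
```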
Major failure
The monitor may crash, or the "Who has the token?" message may be lost.
So use multiple monitors, and conduct an election to decide which one regenerates the token, to avoid multiple tokens being generated.
Comparison
Algorithm      Messages per entry/exit   Delay before entry (in message times)   Problems
Centralized    3                         2                                       Coordinator crash
Distributed    2(n-1)                    2(n-1)                                  Crash of any process
Token ring     1 to ∞                    0 to n-1                                Lost token, process crash
A comparison of three mutual exclusion algorithms.
Election Algorithms
Many distributed algorithms need one process to act as coordinator
It doesn't matter which process does the job; we just need to pick one.
Election algorithms: techniques to pick a unique coordinator (also known as leader election). Examples: taking over the role of a failed process, picking a master in the Berkeley clock synchronization algorithm. Types of election algorithms: the bully and ring algorithms.
Bully Algorithm
Each process has a unique numerical ID. Processes know the IDs and addresses of every other process. Communication is assumed reliable. Key idea: select the process with the highest ID. A process initiates an election if it has just recovered from failure or if the coordinator has failed. There are three message types: ELECTION, OK, and I AM COORDINATOR. Several processes can initiate an election simultaneously. O(n²) messages are required with n processes.
Bully Algorithm Details
Any process P can initiate an election. P sends ELECTION messages to all processes with higher IDs and awaits OK messages. If no OK messages arrive, P becomes coordinator and sends I AM COORDINATOR messages to all processes with lower IDs. If it receives an OK, it drops out and waits for an I AM COORDINATOR. If a process receives an ELECTION message, it returns an OK and starts its own election. If a process receives an I AM COORDINATOR message, it treats the sender as coordinator.
The Bully Algorithm
The bully election algorithm: process 4 holds an election; processes 5 and 6 respond, telling 4 to stop; now 5 and 6 each hold an election; process 6 tells 5 to stop; process 6 wins and tells everyone.
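A compact sketch of the bully rule (higher IDs bully lower ones); processes and message delivery are simulated in one address space rather than over a network.

```python
def bully_election(initiator, alive_ids):
    """Return the elected coordinator when `initiator` starts an election.

    Sketch only: every alive process with a higher ID answers OK and takes
    over the election, so the highest alive ID always ends up coordinator.
    """
    higher = [pid for pid in alive_ids if pid > initiator]
    if not higher:
        return initiator                      # no OK received: initiator wins
    # Each higher process that answered OK starts its own election;
    # the recursion bottoms out at the highest alive ID.
    return max(bully_election(pid, alive_ids) for pid in higher)


# Processes 0..6 are alive, but the old coordinator (7) has crashed.
alive = [0, 1, 2, 3, 4, 5, 6]
print(bully_election(4, alive))   # 6: process 6 wins and tells everyone
```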
A Ring Algorithm
Processes are ordered in a logical ring
each process knows its successor; no token is involved
Any process noticing that the coordinator is not responding
sends an election message to its successor
if the successor is down, the message is sent to the next member; this requires each process to have full knowledge of the ring
receiving process adds its number to the message and passes it along
A Ring Algorithm
When message gets back to election initiator
it changes the message type to COORDINATOR and circulates it to all members
note that members now have complete (and ordered) list of members
coordinator is highest process number
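A sketch of the ring election just described: the election message accumulates member IDs as it circulates, and the highest becomes coordinator (the function and variable names are illustrative).

```python
def ring_election(ring, initiator):
    """Ring-based election: circulate a message that collects live member IDs.

    `ring` is the ordered list of process IDs; crashed processes are skipped
    by simply omitting them from the list in this sketch.
    """
    start = ring.index(initiator)
    members = []
    # Pass the ELECTION message around until it returns to the initiator;
    # each receiving process appends its own ID.
    for i in range(len(ring)):
        members.append(ring[(start + i) % len(ring)])
    coordinator = max(members)             # highest process number wins
    # The initiator now circulates a COORDINATOR message with the full list.
    return coordinator, members


coord, members = ring_election([2, 5, 1, 6, 3], initiator=1)
print(coord)    # 6
print(members)  # complete, ordered (ring-order) list of members starting at 1
```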
Deadlocks in Distributed Systems
Deadlocks in distributed systems are similar to deadlocks in single-processor systems, only worse. They are harder to avoid, prevent, or even detect, and they are hard to cure when tracked down, because all the relevant information is scattered over many machines. Deadlocks are sometimes classified into the following types: communication deadlocks (processes competing for buffers for send/receive) and resource deadlocks (exclusive access to I/O devices, files, locks, and other resources such as printers, tape drives, and tables). We treat everything as a resource, so we only consider resource deadlocks.
Reusable/consumable resources
reusable resource
The number of units is constant. A unit is either free or allocated; there is no sharing. A process requests, acquires, and releases units. Examples: memory, devices, files, tables.
consumable resources
The number of units varies at runtime: a process may create new units, and a process may consume units. Examples: messages, signals.
Resources
Preemptable resources
can be taken away from a process with no ill effects
Example: swapping a process out of main memory.
Nonpreemptable resources
will cause the process to fail if taken away
Example: a printer requested by more than one process; it cannot be taken away in the middle of a print job.
The sequence of events required for a process to use a resource is:
Request, initiated by the client. The process must wait if the request is denied; the requesting process may be blocked, or may fail with an error code.
Allocate, initiated by the system.
Release, initiated by the client.
When multiple processes go through these events concurrently, deadlock can happen.
Deadlocks
Suppose a process holds resource A and requests resource B
while at the same time another process holds B and requests A; both are blocked and remain so
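The classic two-resource case above can be reproduced with two locks; the sketch below shows the lock-ordering pattern that produces the wait (a timeout is used only so the demo terminates, and the names are illustrative).

```python
import threading

# Resource A and resource B, modelled as locks.
lock_a, lock_b = threading.Lock(), threading.Lock()

def worker(first, second, results, name):
    with first:
        # Holding one resource while requesting the other: hold-and-wait.
        got_second = second.acquire(timeout=1.0)  # timeout keeps the demo from hanging
        results[name] = got_second
        if got_second:
            second.release()

results = {}
t1 = threading.Thread(target=worker, args=(lock_a, lock_b, results, "P1"))
t2 = threading.Thread(target=worker, args=(lock_b, lock_a, results, "P2"))
t1.start(); t2.start()
t1.join(); t2.join()
# Often {'P1': False, 'P2': False}: each holds one lock and waits for the other
# (timing-dependent; without the timeout both threads would block forever).
print(results)
```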
Four Necessary Conditions for Deadlock
All four conditions must hold for there to be a deadlock: Mutual exclusion condition
If a resource is held by a process, any other process that needs that resource must wait until it is released
Hold and wait condition
a process holding resources can request additional resources without releasing the ones it already holds
No preemption condition
previously granted resources cannot be forcibly taken away
Circular wait condition
there must be a circular chain of two or more processes, each waiting for a resource held by the next member of the chain
Deadlock Handling Strategies
There are three strategies for handling deadlocks: deadlock prevention, deadlock avoidance, and deadlock detection and recovery. Deadlock prevention is commonly achieved either by having a process acquire all the resources it needs simultaneously before it begins executing, or by preempting a process that holds a needed resource. This approach is highly inefficient and impractical in distributed systems.
Deadlock Handling Strategies
In the deadlock avoidance approach, a resource is granted to a process only if the resulting global system state is safe. However, due to several problems, deadlock avoidance is impractical in distributed systems. Deadlock detection requires examining the status of process-resource interactions for the presence of a cyclic wait. Deadlock detection seems to be the best approach to handle deadlocks in distributed systems.
Deadlock avoidance
Deadlock avoidance - dynamically grants a resource to a process if the resulting state is safe. A state is safe if there is at least one execution sequence that allows processes to run to completion.
In this approach, the OS may delay granting a resource request, even when the resources are available, because granting it immediately would put the system in an unsafe state in which deadlock may occur later.
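A small sketch of a safe-state check in the style implied here (single resource type, banker's-algorithm flavour); the numbers and variable names are illustrative, not taken from the figures below.

```python
def is_safe(available, allocated, still_needs):
    """Return True if some execution order lets every process run to completion.

    available:   free units of the (single) resource type
    allocated:   units currently held, per process
    still_needs: additional units each process may still request
    """
    finished = [False] * len(allocated)
    free = available
    progress = True
    while progress:
        progress = False
        for i, done in enumerate(finished):
            if not done and still_needs[i] <= free:
                free += allocated[i]      # process i can finish and release its units
                finished[i] = True
                progress = True
    return all(finished)


# Safe: with 3 units free, B can finish first, then C, then A.
print(is_safe(available=3, allocated=[3, 2, 2], still_needs=[6, 2, 5]))   # True
# Unsafe: with only 1 unit free, no process is guaranteed to finish.
print(is_safe(available=1, allocated=[4, 2, 3], still_needs=[4, 3, 6]))   # False
```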
Safe and Unsafe States (1)
Demonstration that the state in (a) is safe.
Safe and Unsafe States (2)
Demonstration that the state in (b) is not safe.