Consensus and Agreement Algorithms : Problem Definition
+” Processes/Sites in distributed systems often compete .as well as cooperate to
achieve a common goal. Mutual Trust/agreement is very much required
In distributed data bases; there may be a situation where data managers have to
decide “Whether to commit or Abort the Transaction”. When there is no failure,
reaching an agreement is easy.
However, in case of failures, processes must exchange their values with other
processes and relay the values received from others several times to isolate the
effect of faulty processor:
Agreement protocols helps to reach an agreemient in, presence of failures.
Examples : Agreeing whether to commit or to abort a transaction in a distributed
database management system. Agreeing on a common clock value in a distributed
system. i
[Link] absence of failures or faulty processors, values (that is to be decided) can be
exchanged. A vote can be taken aiid’ decision/ agreement can be made based on :
majority, minimum vote, mean, etc.
Presence of failure : Processors can fail or misbehave intentionally.
‘of message exchanges might be needed before agreement can be reached.
Several roundsate ee ene 1
ERE system Wan alone mn
* Agreement problems have been studied under following system model
1. ‘n’ processors in the system with at most m of them being faulty.
2. Processors can exchange messages directly (no need to go through another
processor) (
3 Receiver knows the identity of the sender. i ~
4 Coniminnication miedium is reliable : messages are deliveréd without errors.
(
ERIEI Synchronous vs Asynchronous Computation
‘Synchronous Computation (
1. Processes run in lock step manner, oF in rounds. (
2 Step of synchronous computation is called round
3 In each step/round i, processes receive messages that were sent in the previous (
step/round i-1.
(
4. Processes then do some computation and send out messages, that will be
received in step/round i + 1.
5: lesoige deligg c' Haw prose owt tones te hdc sd
6. Agreement protocols to be discussed assume synchronous computation. '
Asynchronous Computation '
i Cita dove nak peocead ek ta
2. Process can send receive messages and perform computation at any time.
Model of Processor Failures
"+ Processor'can fail int three modes : -
1. Crash fault : Processor stops and never resumes operation,
2 Omission fault : Processor Omits to send message to some processors
3. Malicious fault : Also known as Byzantine faults. Processor may send fictitious
values /message to other processes to confuse them. Tough to detect/correct.
Authenticated vs Non-authenticated Messages
+ There are two types of messages
1. Authenticated
2. Non-authenticated messagesAuthenticated Mossages ;
* Alto known as signed message
* Processor can not forge/change a reccived message.
+ Processor can ve
+ Wis easier to reach on an agreement inv this case
“Non-authenticated Messages :
+ Also known as Oral Message.
ify the authenticity of the message.
* Processor ‘can forge/change a received message and daims to have received it
from others.
Processor can not verify the authenticity of the message in this cas
ERIE] Pertormance Aspects of Agreement Protocols
+ Fellowing metrics are used
1. Time : No of rounds needed to reach an agreement
2 Message traffic : Number of messages exchanged to reich an agreement.
3. Storage overhead :,Amount of information that needs to stored at processors
during execution of the protocol.
1. Compare synchronous vs asynchronous computation,
2 Explain model of processor fiers
are Seni reoron revo wen (SynchoorOS 5y Sis
enemy, they must decide upon a common plan of action. Some of the
may be traitors, trying to pr
generals
ent the loyal generals from reaching agreement.”
+ Three or more generals are agree
{to attack or to retreat. Once the commander is
iseues the order, lieutenants to the commander are to decide to attack or retreat
+ But the one or more of the
rals may be treacherous, ie. faulty
+ If the commander is treacherous, he
Proposes attacking to one general and
retreating to another.+ Wa lieutenant is treacherous, he tells one of his peers that the commander told
tem to attack and another that they are to retreat
+ Sourwe processor broadcasts its values to others. Solution must meet following
adjectives :
Agreement: All non-faulty processors agree on. thi sime vale
; Validity : If source is nonfaulty, then the comution agreed Value must bé the value
supplied by the source processor.
+ "WK source is faulty then all non - faulty’ processors can agree
value", “Value agreed upon by faulty processors is irelevant”
+ Fg 4121 shows Byzantine agreement
any common
Fig. 4.124
* Ne lution for. three processes can handle single ta. Ina system with m
‘aay processes agreement can be achieved onl if there ate vt] (more then 2/3)
functioning correctly.
ERE consensus Problem
\ Frery processor broadcaits ts initial value to all ther processors
Intl values may be different for different processor
‘very processor has its own initial value:
All non faulty processor must a
igre on a single common value.
* Winital value of non-faulty
Processors is di
(2 agree on any common value.
* Nalue agreed upon by faulty processors is irrelevant
Atseement: All non-aulty processors agree on the same single value.
72 Validity +1 initial value of eve
Agreed value by all non faulty
W inlal value of nonfaully processors is dilecar then all non
PeeSOR AN agree on any common. value
Processors is irrelevant.
———
ferent then all non - faulty processors
‘ery non-faulty processor is v , then the common
Processors must be ¥.
> faulty
*. "Value agreed upon by faulty4.10 AGREEMENT IN A FAILURE -FREE SYSTEM:
In a failure-tree system, consensus can be reached by collecting information from the
different. processes, arriving at a decision, and distributing this decision in the system. A
distributed mechanism would have each process broadcast its values to others, and each process
computes the same function on the values received. The decision can be reached by using an
application specific function
+ Algorithms to collect the initial values and then distribute the decision may be based
, on the token circulation on a logical ring, or the three-phase tree-based broadcast
converge cast: broadcast, or direct communication with all nodes.
* Inasynchronous system, this can be done simply in a constant number of rounds
i + Further, common knowledge of the decision value can be obtained using an
additional round
«In an asynchronous system, consensus can similarly be reached in a constant number
' of message hops.
A-~ +. Further, concurrent common knowledge of the consensus value can also be attained
© 4,11 AGREEMENT IN(MESSAGE PASSING)SYNCHRONOUS SYSTEMS WITH
FAILURES:
\ 411.1 Consensus algorithm for rash failures (synchronous system):
+ Consensus algorithm for crash failures message passing synchronous system
— « The consensus algorithm for n processes where up to f processes where f
0) checkpoint of process Pp is assigned a sequence number i and is denote’
by Cy : :
We also assume that each process F, takes an initial checkpoint Cp 9 immediately
before execution begins and ends with a virtual checkpoint that represents the las!
state attained before: termination. ;
The i" checkpoint interval ‘of process Pp denotes all the computation performed
between its i" and (i+1)": checkpoint; including the it" checkpoint but not the!
(i+1)"™ checkpoint.Issues in Failure Recovery :
Recovery refers to restoring a system to its normal operational ste. Once 4 ee
has occu, it resent thatthe proves where the failure happened can
recover to a correct state
Fundamental to fault tolerance is the recovery from an error.
Resources are_allocated to executing, processes in a computer. For example : a
process has memory allocated to it and a process may have locked shared
resources, such as files and memory.
Following are some solution on process recovery,
Reclaim resources allocated to process
Undo modification made to databases and
Restatt the process
(Or restart process from point of failure and resume execution
distributed process recovery, undo effect’of interactions of failed process with
other cooperating processes.
peepee
Basic Concept
7 System is combination of hardware and software components. These components
provide a specified service.
Failure of 3 sysiem Sccurs when the system does ‘not perform its service in the
manner specified.
‘An erroneous state ofthe system is a state which could lead to a system failure by
2 sequence of valid slate transitions.
A system is said to “fail” when it cannot meet it
about by the existence of “errors” in the system,
A system is said to have a failure if the service it deliv
{rom compliance with the system specification for a specif
Fig. 44.1 shows concept of fault and recovery.
's promises. A failure is brought
fers to the user deviates
ied period of time.
77 System failure : System does not meet require
stole saci ‘ments, ice does not perform itsLeads 10
Fig. 44:1 Concept af recovery
77 Exoncous system state State which could lead ta system file by a sequence
of valid sate transitions
eA Bre the pat of the sytem state which ifr rom its ited vane
27 Fault: Anomalous physical condition, eg. design erors; manufacturing, problems,
damage, external disturbances =
| 1 ges Eror b Fault cb
EE] checkpoint Based Recovery
+ The basic idea behind checkpoint-recovér isthe saving and cestoration-of-sysiom
state By saving the current state of the system periodically or before ertical code
Sections, it provides the baseline information needed for the restoration of lost
slate in the event of a system failure:
+ While the cost of checkpointxecovery cai be high, by using techniques tke
memory exclusion, and by designing a system to have as small a critical state as
Possible may minimize the cost of checkpointing enough to be useful in even cost
sensitive embedded applications.
+ When a system is checkpointed, the state of the entre system is saved ty”
nonvolatile storage, —_
+ The checkpointing mechanism takes a snapshot of the system state and stores the
4ata on some non-volatile storage inediumn.
+ Cheatly, the cos
saved and the |
of a checkpoint will vary with the amount of state required to beim the event of a system failure, the intemal state of the system can be restore
can continui® service from the point at which its state was last saved.
Typically this involves restarting the failed task or system, and providing
Parameter indicating that there js state to be recovered.
Depending on the task complexity, the amount of state, and the bandwidth to the
Storage device this process could take from a fraction. of a second to mar
seconds.
This technique provides protection against the ransent fault suc typical
‘upon state restoration the system Will continue processing in an identical manneg
as it did previously.
This will tolerate any transient fault, however if the fault was caused by a desig
error, then the system will continue to fail and recover endlessly. In some cases,
this may be the most important type of fault to guard against, but not in everff
case,
1. Uncoordinated Checkpointing i
*+ Esch process has autonomiy in deciding when to take checkpoints ‘
+ Advantages : The lower runtime overhead during normal execution
+ Disadvantages e
a Domino effect during a recovery
Bb. Recovery from a failure is slow because processes need to iterate to find a
Consistent set of checkpoints ;
¢ Each process maintains multiple checkpoints and periodically invoke a
garbage collection algorithin
4. Not suitable for application with frequent output commits
+ The. processes record the dependencies among their checkpoints caused by
message exchange during failure-free-operation
record the dependencies among their checkpoints caused by message exchange
during failure free operation.
Direct dependency tracking technique
+ Assume each process P, starts its execution with an initia ckckpoint C, o
+ I, is a checkpoint interval and itis an interval between Cy xy and C,,,.
+ When P} receives a message m during I, y it records the dependency from I
1y which is later saved onto stable storage when By takes Cy.”
‘
<
t
q
+ Inorder to determine a consistent global checkpoint during recovery, the processes {
«
4
‘
‘
r+ When failure occurs, the recovering, process initiates rollback by broad casting, &
dependency request message to collect al the dependency information maintained
by each process.
2. Coordinated Checkpointing
“*, Cooidinated checkpointing, simplifies faihure recovery and eliminates domino
elles in case of failures by Fiseiving a corsistent global checkpoint on stable
orage
: tv the “approach sulférs fom high overhead associated with the
checkpointing process
+ Two approaches are used to reduce the overhead: fis s fo minimize the nurter
of synchronization messages and the numberof checkpoints, the other is to make
the checkpooting process nonblosking
Blocking Checkpointing
+ After a process takes local checkpoint, 0 prevent orphan messages, it _semains
‘blocked until the entire checkpointing activity is complete.
+ Fig 451 shows blocking checkpointing
Fig, 45.1 Blocking checkpointing
+ When a process takes a checkpoint, it engages a protocol fo coordinate with other
processes to also checkpoint
a) Coordinator takes a checkpoint; broadcasts a message to all processes
bb) Braces rectves this message and halts execution: takes tentative checkpoint
©) Coordinator receives acknowledgement (rom all processes; broadcasts comunit
message 19 end protocal
4d) Process receives commit message, removes old permanent checkpoint and
makes tentative chekpoini permanent) Processes resume execution
+ Disadvantages : The comp
‘Non blocking Checkpointing
‘+ The processes need not stop their execution while taking checkpoints.
2 +. Key issue with coordinated checkpointing: Being able lo prevent a process from
receiving application messages that could make the checkpoint inconsistent
| Problem can be avoided by preceding, the first postcheckpoint message on each
channel by a checkpoint request, forcing each process to take a checkpoint upon
receiving the first checkpoint-request message
J+A fundamental problem in coordinated checkpointing isto prevent a process from
receiving application messages that could make the checkpoint inconsistent,
ly fig. 452 shows non-blocking checkpoint.
ion is blocked during the checkpointing
nate Iaiitoe aioe
esipoint request Checkpoint request checkpont
Po Gs Po Com Pe Gun
's fs Cy Fee
® © @
Fig. 43.2 Non-blockinig checkpoint
+ The Chandy-Lamport-algorithin is the nonblocking algorithm for coordinated
sbeckpointing, .
Example of Coordinated Checkpointing -
2) Checkpoint inconsistency + the process Py sent message “in” after receiving a
checkpoint request from the checkpoint coordinator. Assume message “m*
reaches process [before the checkpoint request, This situation results in. an
inconsistent checkpoint since checkpoint C),« shows the receipt of message “m™
fooen Py, while checkpoint Co, dogs not show m being sent from Py
) Solution with FIFO channels : If channels are FIFO, this problem can be
avoided by preceding, the first post-checkpoint message on each channel by a
checkpoint request, forcing each process 10 take a checkpoint before receiving
the first post-checkpoint message.cmmuncaninduced Checkooining
Baa ese ends Cicapeting (CK) protese e Papas Oo
ns ates Ge ee
Fein nahh reciente ri
Sas eee ess
+ It axoids doming cect, while allowing processes to take some oftheir checkpoints
=
£ Eaeeee ete ined ected feed ns lin i
teat Wola! poland SMe arate ee
= Eigen av lt ott pa yg lain oa tats cad ws
all ons ous Geet see
+ Comimunication induced — checkpointing . piggybacks protocol and related
information on each application’ message
+ The receiver of each application message uses the piggybacked information to
determine if thas to take a forced checkpoint to advance the global recovery line.
The forced checkpoint must be takeir before the application may process thé
contents of the message.. In contrast with coordinated checkpointing, no special
coordination messages are exchanged.
* Two_types of | communication induced’ checkpointing are model-based