Time and State in
Distributed Systems
2023
TIME AND STATE IN DISTRIBUTED SYSTEMS
1. Time in Distributed Systems
2. Lamport’s Logical Clocks
3. Vector Clocks
4. Causal Ordering of Messages
5. Global States and their Consistency
6. Cuts of a Distributed Computation
7. Recording of a Global State (“Snapshot”)
2
Time in Distributed Systems
Because each machine in a distributed system has its own clock,
there is no notion of global physical time.
The n crystals on the n computers will run at slightly different
rates, causing the clocks gradually to get out of synchronization
and give different values.
Problems:
Time triggered systems: systems in which activities are
scheduled to occur at predefined moments in time.
If activities are to be coordinated over a distributed system,
we need a coherent notion of time.
Example: time-triggered real-time systems
Maintaining the consistency of distributed data is often based
on the time when a certain modification has been performed.
Example: a make program
3
Time in Distributed Systems
The make-program example
When the programmer has finished changing some source files,
she starts make
make examines the times at which object and source files were
last modified, and decides which sources have to be (re)compiled
previous version of
modified version of
Although P.c is modified after P.o has been generated,
because of the clock drift the time assigned to P.c is smaller.
P.c will not be recompiled for the new version!
4
Time in Distributed Systems
Solutions:
Synchronization of physical clocks
Computer clocks are synchronized with one another
to an achievable, known, degree of accuracy
within the bounds of this accuracy, we can coordinate activities
on different computers using each computer’s local clock.
Clock synchronization is needed for distributed real-time systems.
Logical clocks
In many applications we are not interested in the physical time at which
events occur; what is important is the relative order of events.
The make-program is such an example.
In such situations we do not need synchronized physical clocks.
Relative ordering is based on a virtual notion of time - logical time.
Logical time is implemented using logical clocks.
5
Lamport’s Logical Clocks
The order of events occurring at different processes is critical for many
distributed applications.
Example: P.o_created and P.c_created in make-program example.
Ordering can be based on two simple situations:
1. If two events occurred in the same process, then they occurred
in the order observed following the respective process;
2. Whenever a message is sent between processes, the event of
sending the message occurred before the event of receiving it.
Ordering by Lamport is based on happened-before relation (denoted ““):
a b, if a and b are events in the same process
and a occurred before b;
a b, if a is the event of sending a message m in a process,
and b is the event of the same message m being received by
another process.
If a b and b c, then a c (the relation is transitive).
6
Lamport’s Logical Clocks
If a b, we say that event a causally affects event b.
The two events are causally related.
There are events which are not related by the happened-before relation.
If both a e and e a are false,
then a and e are concurrent events:
we write a || e.
Example:
a b, c d, e f,
b c, d f
a c, a d, a f,
b d, b f, ...
a || e, c || e, ...
7
Lamport’s Logical Clocks
Using physical clocks, the happened-before relation cannot be captured.
It is possible that b c and at the same time Tb > Tc
(where Tb is the physical time of event b).
Logical clocks can be used to capture the happened-before relation.
A logical clock is a monotonically increasing software counter.
There is a logical clock CPi at each process Pi in the system.
The value of the logical clock is used to assign timestamps to events.
CPi(a) is the timestamp of event a in process Pi.
There is no relationship between a logical clock and any physical clock.
To capture the happened-before relation,
logical clocks have to be implemented so that:
if a b, then C(a) < C(b)
8
Lamport’s Logical Clocks
Implementation of logical clocks is performed using the following rules for
updating the clocks and transmitting their values in messages:
[R1]: Each event issued at process Pi is timestamped with the value obtained
after incrementing the local clock CPi: CPi := CPi + 1.
[R2]: a) If a is the event of sending a message m from process Pi,
then the timestamp tm = CPi(a) is included in m
(CPi(a) is the logical clock value obtained after applying rule R1).
b) On receiving message m by process Pj,
its logical clock CPj is updated as follows:
CPj := max ( CPj, tm ).
c) The new value of CPj is used to timestamp the event
of receiving message m by Pj (applying rule
R1).
If a and b are events in the same process and a occurred before b,
then a b, and (by R1) C(a) < C(b).
If a is the event of sending a message m in a process,
and b is the event of the same message m being received by another process,
then a b, and (by R2) C(a) < C(b).
If a b and b c, then a c, and (by induction) C(a) < C(c).
9
Lamport’s Logical Clocks
Example
(0) 1 2
(0) 3 4
(0) 1 5
10
Problems with Lamport’s Logical Clocks (1)
Lamport’s logical clocks impose only a partial order on the set of
events
pairs of distinct events generated by different processes can
have identical timestamp.
For certain applications a total ordering is needed;
they consider that no two events can occur at the same time.
In order to enforce total ordering,
a global logical timestamp is introduced:
The global logical timestamp of an event a occurring at process
Pi, with logical timestamp CPi(a), is a pair ( CPi(a), i ),
where i is an identifier of process Pi
We define
(CPi(a), i) < (CPj(b), j) if and only if CPi(a) < CPj(b),
or CPi(a) = CPj(b) and i < j.
= lexical order on pairs 11
Lamport’s Logical Clocks
Example
(0) 1 2
(0) 3 4
(0) 1 5
12
Lamport’s Logical Clocks
with Global Logical Timestamps
Example
(0) (1,1) (2,1)
(0) (3,2) (4,2)
(0) (1,3) (5,3)
13
Problems with Lamport’s Logical Clocks (2)
Lamport’s logical clocks are not powerful enough to perform a causal
ordering of events.
We have seen earlier:
if a b, then C(a) < C(b).
However, the reverse is not always true:
if C(a) < C(b), then a b is not necessarily true.
(it is only guaranteed that b a is not true).
C(e) < C(b),
however there is no causal
relation from event e to event b.
By just looking at the timestamps
of the events, we cannot say
whether two events are causally
related or not.
If C(x) < C(y),
14
it might be that x y or x || y
Problems with Lamport’s Logical Clocks
We want messages received by P3 to be processed in their causal order.
Can we use the associated timestamp for this purpose?
Process P3 receives messages MA, MB, MC, and MD.
send(MA) send(MC), send(MA) send(MB),
send(MB) || send(MC), send(MA) || send(MD),
send(MB) || send(MD), send(MC) || send(MD).
15
Problems with Lamport’s Logical Clocks
send(MA) send(MC), send(MA) send(MB)
process MA before MC and MB
But, P3 needs not wait for MB and MD in order to process them before MC;
similarly, the delivery of MB is not needed to be delayed after that of MD.
By processing the messages in order of their timestamp, all happened-before
relations will be correctly enforced, but additional, unneeded, delays will be
introduced (due to enforcement of ordering
16
where, in fact, not needed).
Vector Clocks
Vector clocks give the ability to decide whether two events are
causally related or not by simply looking at their timestamp.
Each process Pi has a clock Cv
Pi
CvPi is an integer vector of length n
n is the number of processes
The value of CvPi is used to assign timestamps to events in
process Pi.
Cv (a) is the timestamp of event a in process P .
Pi i
v v
C Pi[i], the ith entry of C , corresponds to an event counter in P
Pi i
simply, counts the events in Pi
Cv [j], for j i, is P ’s "best guess" of the local event counter at P
Pi i j
CvPi[j] indicates the value of the local event counter of Pj
at the occurrence of the last event at Pj which is in a
happened-before relation to the current event at Pi.
17
Vector Clocks Example (n=3)
18
Vector Clocks
19
Vector Clocks
Implementation of vector clocks is performed using the following rules
for updating the clocks and transmitting their values in messages:
[R1]: Each event issued at process Pi is timestamped with the
value of the vector clock CvPi obtained after incrementing
the corresponding element CvPi[i]: CvPi[i] := CvPi[i] + 1.
[R2]: (a) If a is the event of sending a message m from process Pi,
then the timestamp tm = CvPi(a) is included in m
(CvPi(a) is the vector clock value obtained after applying rule R1).
(b) On receiving message m by process Pj,
its vector clock Cv Pj is updated as follows:
v v
k in { 1, 2,.., n }, C [k] := max ( C [k], t [k] )
Pj Pj m
(c) The new value of CvPj is used to timestamp the event
of receiving message m by Pj (applying rule R1).
20
Vector Clocks
For any two vector timestamps u and v, we have:
u = v if and only if i, u[i] = v[i]
u < v if and only if i, u[i] < v[i]
u < v if and only if u < v u v)
u || v if and only if ¬(u < v) ¬(v < u)
Two events a and b are causally related if and only if Cv(a) < Cv(b) Cv(b) < Cv(a).
Otherwise, the events are concurrent.
21
Vector Clocks
Vector clocks have the property which we missed for Lamport’s
clocks:
a b if and only if Cv(a) < Cv(b).
Thus, by just looking at the timestamps of the events,
we can tell whether two events are causally related or not.
Vector clocks can be used for
causal ordering of events/messages.
22
Global States
The problem is how to collect and record
a consistent global state in a distributed system.
“State” is application-specific
Example use cases:
Monitoring of a distributed shared data structure
Bank account example
Distributed garbage collection
Progress monitoring for dynamic load balancing
Distributed deadlock detection
Why a problem?
Because there is no global clock (no coherent notion of
time) and no shared memory!
23
Global States
In general, a global state consists of
a set of local states and
a set of states of the communication channels.
The state of a communication channel in a consistent global
state should be the sequence of messages sent along the
channel before the sender’s state was recorded,
excluding the sequence of messages received along the
channel before the receiver’s state was recorded.
It is difficult to record channel states to ensure the above rule
global states are very often recorded without using
channel states.
24
Formal Definition (1)
LSi is the local state of process Pi.
Beside other information, the local state also includes a
record of all messages sent and received by the process.
We consider the global state GS of a system
as the collection of the local states of its processes:
GS = ( LS1, LS2, ..., LSn ).
A certain global state can be consistent or not!
25
Formal Definition (2)
send(mk ) denotes the event of sending message mk from P to P
ij ij i j
rec(mk ) denotes the event of receiving message mk by P .
ij ij j
send(mk ) LS if and only if the sending event occurred before
ij i
the local state was recorded;
rec(mk ) LS if and only if the receiving event occurred before
ij j
the local state was recorded.
transit(LSi, LSj) = { mk | send(mk ) LS rec(mk ) LS }
ij ij i ij j
inconsistent(LSi, LSj) = { mk | send(mk ) LS rec(mk ) LS }
ij ij i ij j
26
Example
( LS11, LS22, LS32 ) is ...
inconsistent
( LS12, LS23, LS33 ) is ...
consistent
( LS11, LS21, LS31 ) is ...
strongly consistent
27
Cuts of a Distributed Computation
A cut is a graphical representation of a global state.
A consistent cut is a graphical representation of a consistent
global state.
A cut of a distributed computation is a set
Ct = { c1, c2, ..., cn}, where ci is the cut event at process Pi.
A cut event is the event of recording a local state of the process.
Example:
{ c1, c2, c3 } is a cut
28
Global State Recording
(Chandy-Lamport Algorithm)
The algorithm records:
a collection of local states,
which give a consistent global state of the system, and
the state of the channels,
which is consistent with the collected global state.
Such a recorded "view" of the system is called a snapshot.
We assume here that
processes are connected through one-directional channels and
message delivery is FIFO.
the graph of processes and channels is strongly connected
(there exists a path between any two processes).
The algorithm is based on the use of a special message, the
snapshot token, in order to control the state collection process.
39
Global State Recording
How to collect a global state?
Pi
A process P i records its local state LS i
and later sends a message m to Pj Pj
LSj at Pj has to be recorded before Pj has received m.
The channel state SChij of the channel Chij consists of all
messages that process Pi sent before recording LSi and which have
not been received by Pj when recording LSj.
A snapshot is started at the request of a particular process Pi,
for example, when Pi suspects a deadlock because of long delay in
accessing a resource.
Pi then records its state LSi and, before sending any other
message, it sends a token to every Pj that Pi communicates with.
When Pj receives a token from Pi, and this is the first time it
received a token, it must record its state before it receives the next
message from Pi.
After recording its state, Pj sends a token to every process it
communicates with, before sending them any other message.
30
Global State Recording
31
Global State Recording
32
Global State Recording
33
Global State Recording
34
Global State Recording
Rule for sender Pi:
/* performed by the initiating process
and by any other process at the reception of the first token */
[SR1]: Pi records its state.
[SR2]: Pi sends a token on each of its outgoing channels.
Rule for receiver Pj:
/* executed whenever Pj receives a token from another process Pi
on channel Chij */
[RR1]: if Pj has not yet recorded its state then
Record the state of the channel: SChij := Ø.
Follow the "Rule for sender".
else
Record the state of the channel: SChij := M,
where M is the set of messages that Pj received from Pi
after Pj recorded its state and
before Pj received the token on Chij.
end if. 35