Lahore Garrison University
Parallel and Distributed Computing
Session Fall 2024
Lecture – 05 Week – 03
Preamble
Introduction to transactions
The Transaction Model
A.C.I.D
Types of Transactions
Nested Transactions vs. Distributed Transactions
Write-Ahead Log
Concurrency Control
Serializability
Lesson Plan
Fault Tolerance Basic Concepts
Dependability Basic Concepts
Fault and Types of fault
Failure Models
Failure Masking by Redundancy
Agreement in Faulty Systems
Fault Tolerance Basic Concepts
Fault tolerance means dealing successfully with partial failure within a distributed system.
Being fault tolerant is strongly related to what are called dependable systems.
Dependability implies the following:
1. Availability
2. Reliability
3. Safety
4. Maintainability
Dependability Basic Concepts
Availability – the system is ready to be used immediately.
Reliability – the system can run continuously without failure.
Safety – if the system fails, nothing catastrophic will happen.
Maintainability – when the system fails, it can be repaired easily and quickly (sometimes without its users noticing the failure).
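Availability and reliability are easy to confuse, so a small numeric sketch may help. The two systems below are hypothetical: system A is down for one millisecond in every hour, and system B never crashes but is switched off for two weeks each year. The function name `availability` and the "longest failure-free run" figures are illustrative assumptions, not definitions taken from the slides.

```python
def availability(uptime: float, downtime: float) -> float:
    """Fraction of total time during which the system is ready for use."""
    return uptime / (uptime + downtime)

HOUR = 3600.0          # seconds
DAY = 24 * HOUR

# Hypothetical system A: down for 1 ms in every hour.
a_avail = availability(HOUR - 0.001, 0.001)
a_longest_run = HOUR - 0.001        # it can never run a full hour without failing

# Hypothetical system B: never crashes, but is off for 14 days per year.
b_avail = availability(351 * DAY, 14 * DAY)
b_longest_run = 351 * DAY           # it runs failure-free between shutdowns

print(f"A: availability {a_avail:.6%}, longest run {a_longest_run / HOUR:.2f} hours")
print(f"B: availability {b_avail:.6%}, longest run {b_longest_run / DAY:.0f} days")
```

System A is almost always available yet cannot run for long without failing, so it is highly available but unreliable. System B is the reverse: reliable while it runs, but less available over the year.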
Phases of Fault Tolerance
Fault Detection
Fault Diagnosis
Evidence Generation
Assessment
Recovery
Fault and Types of Fault
A system is said to “fail” when it cannot meet its promises.
A failure is brought about by the existence
of “errors” in the system.
The cause of an error is a “fault”.
There are three main types of “fault”:
1. Transient Fault – appears once, then disappears.
2. Intermittent Fault – occurs, vanishes, then reappears, following no real pattern (the worst kind).
3. Permanent Fault – once it occurs, only replacing or repairing the faulty component will allow the distributed system to function normally.
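The three fault types above differ only in when the fault is active over time. The toy model below is a sketch under stated assumptions: time is a sequence of discrete steps, every fault first appears at step `onset`, and an intermittent fault recurs with an arbitrary probability `p`. The names `FaultType`, `fault_active`, and `timeline` are hypothetical and chosen for illustration only.

```python
import random
from enum import Enum

class FaultType(Enum):
    TRANSIENT = "transient"
    INTERMITTENT = "intermittent"
    PERMANENT = "permanent"

def fault_active(kind, step, onset, rng, p=0.3):
    """Return True if a fault of this kind is active at the given step."""
    if step < onset:
        return False                      # the fault has not occurred yet
    if kind is FaultType.TRANSIENT:
        return step == onset              # appears once, then disappears
    if kind is FaultType.INTERMITTENT:
        # Appears at onset, then comes and goes with no fixed pattern.
        return step == onset or rng.random() < p
    return True                           # permanent: stays until repaired

def timeline(kind, steps=10, onset=3, seed=0):
    """List of booleans: whether the fault is active at each step."""
    rng = random.Random(seed)             # seeded so runs are repeatable
    return [fault_active(kind, t, onset, rng) for t in range(steps)]

for kind in FaultType:
    marks = "".join("X" if active else "." for active in timeline(kind))
    print(f"{kind.value:<13}{marks}")
```

Each printed row marks the steps at which the fault is active with an `X`.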
Failure Models
Failure Masking by Redundancy
Strategy: hide the occurrence of failure from other processes using
redundancy.
Three main types:
1. Information Redundancy – add extra bits to allow for error detection/recovery
(e.g., Hamming codes and the like).
2. Time Redundancy – perform an operation and, if need be, perform it again. Think
about how transactions work (BEGIN/END/COMMIT/ABORT): an aborted transaction can simply be redone.
3. Physical Redundancy – add extra (duplicate) hardware and/or software to the
system.
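The three kinds of redundancy listed above can each be sketched in a few lines. This is a minimal illustration, not a production implementation: information redundancy is shown with a Hamming(7,4) code that corrects one flipped bit, time redundancy with a simple retry loop, and physical redundancy with majority voting over replica outputs (the idea behind triple modular redundancy). All function names are hypothetical.

```python
from collections import Counter

# Information redundancy: Hamming(7,4) adds 3 parity bits to 4 data bits.
def hamming_encode(data):
    """Encode [d1, d2, d3, d4] as a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s4   # 0 means no error detected
    if error_pos:
        c[error_pos - 1] ^= 1          # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]]

# Time redundancy: perform the operation again if it fails.
def retry(operation, attempts=3):
    """Call operation() up to `attempts` times; re-raise the last failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise

# Physical redundancy: run replicas and let the majority decide.
def majority_vote(replica_outputs):
    """Return the value produced by more than half of the replicas."""
    value, count = Counter(replica_outputs).most_common(1)[0]
    if count <= len(replica_outputs) // 2:
        raise ValueError("no majority among replicas")
    return value

# Small demonstration with made-up values.
word = hamming_encode([1, 0, 1, 1])
word[4] ^= 1                            # simulate one bit flipped in transit
print("recovered data bits:", hamming_decode(word))
print("voted result:", majority_vote([42, 42, 7]))
```

Retrying only masks faults that go away by themselves, so it suits transient and intermittent faults. A permanent fault needs physical redundancy or repair.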
Agreement in Faulty Systems
Possible cases:
1. Synchronous (lock-step) versus asynchronous systems.
2. Communication delay is bounded (by a globally known, predetermined maximum
time) or not.
3. Message delivery is ordered (in real time) or not.
4. Message transmission is done through unicasting or multicasting.
Lesson Review
Fault Tolerance Basic Concepts
Dependability Basic Concepts
Fault and Types of fault
Failure Models
Failure Masking by Redundancy
Agreement in Faulty Systems
Next Lesson Preview
Flynn’s Classification: Hardware dimensions of memory and control
SIMD (Single Instruction Multiple Data)
MIMD (Multiple Instruction Multiple Data)
Comparison between SIMD and MIMD
References
The following reference material was consulted for this topic.
Textbooks:
Distributed Systems: Principles and Paradigms, A. S. Tanenbaum and M.
van Steen, Prentice Hall, 2nd Edition, 2007.
Distributed and Cloud Computing: Clusters, Grids, Clouds, and the
Future Internet, K. Hwang, J. Dongarra and G. C. Fox, Elsevier, 1st Edition.
Google Search Engine