PDC Lecture 5 Fault Tolerance

Lahore Garrison University


Parallel and Distributed Computing
Session Fall 2024

Lecture – 05 Week – 03
Preamble

• Introduction to Transactions
• The Transaction Model
• A.C.I.D.
• Types of Transactions
• Nested Transactions vs. Distributed Transactions
• Write-Ahead Log
• Concurrency Control
• Serializability

Lesson Plan

• Fault Tolerance Basic Concepts
• Dependability Basic Concepts
• Fault and Types of Fault
• Failure Models
• Failure Masking by Redundancy
• Agreement in Faulty Systems

Fault Tolerance Basic Concepts

• Fault tolerance means dealing successfully with partial failure within a distributed system.
• Being fault tolerant is strongly related to what are called dependable systems.
• Dependability implies the following:
1. Availability
2. Reliability
3. Safety
4. Maintainability
Dependability Basic Concepts

• Availability – the system is ready to be used immediately.
• Reliability – the system can run continuously without failure.
• Safety – if a system fails, nothing catastrophic will happen.
• Maintainability – when a system fails, it can be repaired easily and quickly (sometimes without its users noticing the failure).
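The difference between availability and reliability can be made concrete with a small calculation. The sketch below is illustrative only. It assumes the common steady-state formula availability = MTTF / (MTTF + MTTR) and an exponential (constant failure rate) model for reliability. The names MTTF (mean time to failure) and MTTR (mean time to repair), and both function names, are assumptions introduced here and do not appear on the slides.

```python
import math


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is ready for use (steady state).

    Assumes availability = MTTF / (MTTF + MTTR).
    """
    return mttf_hours / (mttf_hours + mttr_hours)


def reliability(t_hours: float, mttf_hours: float) -> float:
    """Probability of running t hours with no failure at all.

    Assumes a constant failure rate, i.e. R(t) = exp(-t / MTTF).
    """
    return math.exp(-t_hours / mttf_hours)


# Hypothetical system: fails on average once per hour,
# but recovers after one millisecond.
one_ms_in_hours = 0.001 / 3600
print(availability(1.0, one_ms_in_hours))   # very close to 1
print(reliability(24.0, 1.0))               # very close to 0
```

Under these assumptions the hypothetical system is almost always available yet very unlikely to run for a full day without failing, which is why the two properties are listed separately.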

Phases of Fault Tolerance

• Fault Detection
• Fault Diagnosis
• Evidence Generation
• Assessment
• Recovery

Fault and Types of Fault

• A system is said to “fail” when it cannot meet its promises.
• A failure is brought about by the existence of “errors” in the system.
• The cause of an error is a “fault”.
• There are three main types of “fault”:
1. Transient Fault – appears once, then disappears.
2. Intermittent Fault – occurs, vanishes, then reappears, following no real pattern (the worst kind).
3. Permanent Fault – once it occurs, only the replacement or repair of the faulty component will allow the distributed system (DS) to function normally.
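The three fault types can be told apart by how a fault shows up over repeated probes of a component. The following is a rough heuristic sketch, not a method from the lecture. The function name `classify_fault` and the idea of a fixed observation window are assumptions, and a finite window can never prove that a fault is truly permanent or truly gone.

```python
def classify_fault(observations):
    """Classify a fault from a list of probe results.

    observations: list of bools, True means the fault was seen at that probe.
    Returns "none", "transient", "intermittent", or "permanent".
    """
    if True not in observations:
        return "none"

    # Permanent: once the fault appears, every later probe still sees it.
    first = observations.index(True)
    if all(observations[first:]):
        return "permanent"

    # Count separate fault episodes (a False -> True transition starts one).
    previous = [False] + observations
    episodes = sum(
        1 for prev, cur in zip(previous, observations) if cur and not prev
    )
    # One episode that went away: transient. Several: intermittent.
    return "transient" if episodes == 1 else "intermittent"


print(classify_fault([False, True, False, False]))        # transient
print(classify_fault([False, True, False, True, False]))  # intermittent
print(classify_fault([False, True, True, True]))          # permanent
```

In practice the window length matters: a fault seen only at the last probe is reported as permanent here simply because no later evidence exists.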

Failure Models

Failure Masking by Redundancy

• Strategy: hide the occurrence of failure from other processes using redundancy.
• Three main types:
1. Information Redundancy – add extra bits to allow for error detection/recovery (e.g., Hamming codes and the like).
2. Time Redundancy – perform an operation and, if need be, perform it again. Think about how transactions work (BEGIN/END/COMMIT/ABORT).
3. Physical Redundancy – add extra (duplicate) hardware and/or software to the system.
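Each of the three kinds of redundancy can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not material from the lecture: a Hamming(7,4) code stands in for information redundancy, a simple retry loop for time redundancy, and majority voting over replica outputs (as in triple modular redundancy) for physical redundancy. All function names are hypothetical.

```python
from collections import Counter


# --- Information redundancy: Hamming(7,4), corrects any single-bit error ---
def hamming74_encode(data_bits):
    """Encode 4 data bits into 7 bits by adding 3 parity bits."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    # Bit positions 1..7: p1 p2 d1 p4 d2 d3 d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


# --- Time redundancy: perform the operation again if it fails ---
def retry(operation, attempts=3):
    """Call operation() up to `attempts` times; re-raise the last error."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
    raise last_error


# --- Physical redundancy: vote over the outputs of duplicate components ---
def majority_vote(replica_outputs):
    """Return the value produced by a strict majority of replicas."""
    value, count = Counter(replica_outputs).most_common(1)[0]
    if count * 2 <= len(replica_outputs):
        raise ValueError("no majority among replicas")
    return value


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                         # simulate one corrupted bit
print(hamming74_decode(word))        # [1, 0, 1, 1]
print(majority_vote([42, 42, 7]))    # 42
```

Retrying only masks transient faults, and voting with three replicas only masks a single faulty replica, so the choice of technique depends on which faults are expected.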

Agreement in Faulty Systems

• Possible cases:
1. Synchronous (lock-step) versus asynchronous systems.
2. Communication delay is bounded (by a globally known, predetermined maximum time) or not.
3. Message delivery is ordered (in real time) or not.
4. Message transmission is done through unicasting or multicasting.
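Since each of the four cases above is a two-way choice, they combine into 16 distinct system models. The sketch below simply enumerates them. The `SystemModel` type and its field names are assumptions made for illustration, and the code makes no claim about which combinations allow agreement to be reached.

```python
from itertools import product
from typing import NamedTuple


class SystemModel(NamedTuple):
    synchronous: bool       # lock-step (True) or asynchronous (False)
    bounded_delay: bool     # communication delay bounded or not
    ordered_delivery: bool  # messages delivered in order or not
    multicast: bool         # multicasting (True) or unicasting (False)


def all_system_models():
    """Return every combination of the four binary properties."""
    return [SystemModel(*flags) for flags in product([True, False], repeat=4)]


models = all_system_models()
print(len(models))  # 16
for model in models[:2]:
    print(model)
```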

Lesson Review

• Fault Tolerance Basic Concepts
• Dependability Basic Concepts
• Fault and Types of Fault
• Failure Models
• Failure Masking by Redundancy
• Agreement in Faulty Systems

Next Lesson Preview

• Flynn’s Classification: Hardware dimensions of memory and control
• SIMD (Single Instruction Multiple Data)
• MIMD (Multiple Instruction Multiple Data)
• Comparison between SIMD and MIMD

References

• To cover this topic, different reference material has been used for consultation.
• Textbook:
Distributed Systems: Principles and Paradigms, A. S. Tanenbaum and M. van Steen, Prentice Hall, 2nd Edition, 2007.
Distributed and Cloud Computing: Clusters, Grids, Clouds, and the Future Internet, K. Hwang, J. Dongarra and G. C. Fox, Elsevier, 1st Ed.
• Google Search Engine

