PDC Lecture 5 Fault Tolerance

Lahore Garrison University

Parallel and Distributed Computing
Session Fall 2024

Lecture – 05 Week – 03
Preamble

• Introduction to transactions
• The Transaction Model
• A.C.I.D.
• Types of Transactions
• Nested Transactions vs. Distributed Transactions
• Write-Ahead Log
• Concurrency Control
• Serializability

Lesson Plan

• Fault Tolerance Basic Concepts
• Dependability Basic Concepts
• Fault and Types of Fault
• Failure Models
• Failure Masking by Redundancy
• Agreement in Faulty Systems

Fault Tolerance Basic Concepts

• Fault tolerance means dealing successfully with partial failure within a distributed system.
• Fault tolerance is strongly related to the notion of dependable systems.
• Dependability implies the following:
1. Availability
2. Reliability
3. Safety
4. Maintainability
Dependability Basic Concepts

• Availability – the system is ready to be used immediately.
• Reliability – the system can run continuously without failure.
• Safety – if the system fails, nothing catastrophic will happen.
• Maintainability – when the system fails, it can be repaired easily and quickly (sometimes without its users noticing the failure).
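
Availability is often quantified as the long-run fraction of time a system is up. The sketch below uses the common steady-state ratio MTTF / (MTTF + MTTR). This formula and the names `availability` and `downtime_hours_per_year` are illustrative assumptions; they do not appear in the slides.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: expected uptime divided by total time.

    mttf_hours: mean time to failure (assumed metric, not from the slides)
    mttr_hours: mean time to repair (assumed metric, not from the slides)
    """
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected downtime in a 365-day year for a given availability."""
    return (1.0 - avail) * 365 * 24


if __name__ == "__main__":
    a = availability(mttf_hours=999.0, mttr_hours=1.0)
    print(f"availability = {a:.4f}")
    print(f"expected downtime = {downtime_hours_per_year(a):.2f} hours/year")
```

Shortening the repair time raises availability even when the failure rate stays the same, which is one way maintainability feeds into availability.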

Phases of Fault Tolerance

• Fault Detection
• Fault Diagnosis
• Evidence Generation
• Assessment
• Recovery

Fault and Types of Fault

• A system is said to “fail” when it cannot meet its promises.
• A failure is brought about by the existence of “errors” in the system.
• The cause of an error is a “fault”.
• There are three main types of fault:
1. Transient fault – appears once, then disappears.
2. Intermittent fault – occurs, vanishes, then reappears, following no real pattern (the worst kind).
3. Permanent fault – once it occurs, only the replacement or repair of the faulty component will allow the distributed system (DS) to function normally.
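
The three fault types can be told apart by how a component behaves when the same request is repeated. The following is a minimal sketch under assumed names: `Component`, `call_with_retry`, and the three fault patterns are hypothetical and only model the definitions above. A bounded retry gets past a transient fault, and past this particular intermittent pattern, but never past a permanent fault.

```python
from typing import Callable


class Component:
    """Hypothetical component whose faults depend on the call number."""

    def __init__(self, is_faulty: Callable[[int], bool]) -> None:
        self.is_faulty = is_faulty
        self.calls = 0

    def request(self) -> str:
        self.calls += 1
        if self.is_faulty(self.calls):
            raise RuntimeError(f"fault on call {self.calls}")
        return "ok"


def transient(call_number: int) -> bool:
    """Appears once (on the first call), then disappears."""
    return call_number == 1


def intermittent(call_number: int) -> bool:
    """Comes and goes irregularly; this fixed set is only an example."""
    return call_number in {1, 2, 5}


def permanent(call_number: int) -> bool:
    """Persists until the component is repaired or replaced."""
    return True


def call_with_retry(operation: Callable[[], str], attempts: int = 3) -> str:
    """Run the operation, repeating it up to `attempts` times on failure."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except RuntimeError as exc:
            last_error = exc
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


if __name__ == "__main__":
    for pattern in (transient, intermittent, permanent):
        component = Component(pattern)
        try:
            result = call_with_retry(component.request)
        except RuntimeError as exc:
            result = str(exc)
        print(pattern.__name__, "->", result, f"({component.calls} calls)")
```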

Failure Models

Failure Masking by Redundancy

• Strategy: hide the occurrence of failure from other processes using redundancy.
• Three main types:
1. Information redundancy – add extra bits to allow for error detection and recovery (e.g., Hamming codes).
2. Time redundancy – perform an operation and, if need be, perform it again. Think about how transactions work (BEGIN/END/COMMIT/ABORT): an aborted transaction can simply be redone.
3. Physical redundancy – add extra (duplicate) hardware and/or software to the system.
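
As an illustration of information redundancy, here is a minimal sketch of a Hamming(7,4) code. It adds three parity bits to four data bits so that any single flipped bit can be located and corrected. The function names and the bit layout (parity bits at positions 1, 2 and 4) are assumptions chosen for this example, not taken from the slides.

```python
from typing import List, Tuple


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword: List[int]) -> Tuple[List[int], int]:
    """Correct at most one flipped bit and return (data bits, error position).

    The error position is 1-based; 0 means no error was detected.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4  # syndrome names the faulty position
    if error_position:
        c[error_position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], error_position


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    corrupted = list(word)
    corrupted[4] ^= 1  # simulate a single-bit fault in transit
    print(hamming74_decode(corrupted))
```

This code corrects one flipped bit per codeword. Two or more flipped bits in the same codeword are miscorrected, so stronger codes are needed when multiple errors are likely.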

Agreement in Faulty Systems

• Possible cases:
1. Synchronous (lock-step) versus asynchronous systems.
2. Communication delay is bounded (by a globally known, predetermined maximum time) or not.
3. Message delivery is ordered (in real time) or not.
4. Message transmission is done through unicasting or multicasting.
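
Each of the four cases above is a two-way choice, so together they describe 2^4 = 16 distinct system models in which agreement can be studied. The sketch below only enumerates those combinations and makes no claim about which of them permit agreement. The dictionary keys and value labels are illustrative names, not terminology fixed by the slides.

```python
from itertools import product
from typing import Dict, List

# Four independent two-way choices, one per case in the list above.
DIMENSIONS = {
    "process behaviour": ("synchronous", "asynchronous"),
    "communication delay": ("bounded", "unbounded"),
    "message ordering": ("ordered", "unordered"),
    "transmission": ("unicast", "multicast"),
}


def system_models() -> List[Dict[str, str]]:
    """Return every combination of the four choices as a dictionary."""
    names = list(DIMENSIONS)
    return [dict(zip(names, combo)) for combo in product(*DIMENSIONS.values())]


if __name__ == "__main__":
    models = system_models()
    print(len(models), "system models")
    for model in models:
        print(model)
```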

Lesson Review

• Fault Tolerance Basic Concepts
• Dependability Basic Concepts
• Fault and Types of Fault
• Failure Models
• Failure Masking by Redundancy
• Agreement in Faulty Systems

Next Lesson Preview

• Flynn’s Classification: Hardware dimensions of memory and control
• SIMD (Single Instruction Multiple Data)
• MIMD (Multiple Instruction Multiple Data)
• Comparison between SIMD and MIMD

References

• To cover this topic, different reference materials have been consulted.
• Textbooks:
Distributed Systems: Principles and Paradigms, A. S. Tanenbaum and M. van Steen, Prentice Hall, 2nd Edition, 2007.
Distributed and Cloud Computing: Clusters, Grids, Clouds, and the Future Internet, K. Hwang, J. Dongarra and G. C. Fox, Elsevier, 1st Edition.
• Google Search Engine
