Basic Concepts
Reliability, MTTF, Availability, etc.
CprE 545: Fault Tolerant Systems (G. Manimaran)
Definitions
Reliability of a system is defined to be the probability
that the given system will perform its required function
under specified conditions for a specified period of
time.
MTBF (Mean Time Between Failures): Average time a
system will run between failures. The MTBF is usually
expressed in hours. This metric is more useful to the
user than the reliability measure.
Approaches to increase the reliability of a system
Increasing reliability of a system
First approach:
1. Worst case design
2. Using high quality components
3. Strict quality control procedures
Second approach:
1. Redundancy
2. Typically employed
3. Less expensive
Reliability expressions
Exponential Failure Law:
Reliability of a system is often modeled as:
$R(t) = e^{-\lambda t}$
where $\lambda$ is the failure rate, expressed as
percentage failures per 1000 hours or as failures
per hour.
When the product $\lambda t$ is small,
$R(t) \approx 1 - \lambda t$
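The exponential failure law and its small-product approximation can be compared numerically. The following is a minimal sketch; the failure rate of 1e-4 failures per hour and the function names are assumptions chosen only for illustration.

```python
import math

def reliability(failure_rate: float, t: float) -> float:
    """Exponential failure law: R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * t)

def reliability_linear(failure_rate: float, t: float) -> float:
    """First-order approximation R(t) ~ 1 - lambda * t (small lambda * t only)."""
    return 1.0 - failure_rate * t

lam = 1e-4  # assumed failure rate in failures per hour (illustrative value)
for hours in (10, 100, 1000, 10000):
    exact = reliability(lam, hours)
    approx = reliability_linear(lam, hours)
    print(f"t={hours:>6} h  exact={exact:.6f}  approx={approx:.6f}")
```

The two columns agree closely while lambda * t is well below 1 and drift apart as it approaches 1, which is why the linear form should be used only for short mission times.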
Relation between MTBF and the Failure rate
MTBF is the average time a system will run between
failures and is given by:
$\text{MTBF} = \int_0^\infty R(t)\,dt = \int_0^\infty e^{-\lambda t}\,dt = 1/\lambda$
In other words, the MTBF of a system is the
reciprocal of the failure rate.
If $\lambda$ is the number of failures per hour, the MTBF
is expressed in hours.
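As a sanity check on the integral, the area under the exponential reliability curve can be approximated numerically and compared with the reciprocal of the failure rate. This is only a sketch: the trapezoidal rule, the truncation point of 40 time constants, and the function name are assumptions made for illustration.

```python
import math

def mtbf_numeric(failure_rate: float, steps: int = 200_000) -> float:
    """Trapezoidal estimate of the integral of exp(-lambda * t) over [0, 40/lambda]."""
    upper = 40.0 / failure_rate          # the tail beyond this point is negligible
    h = upper / steps                    # width of each trapezoid
    total = 0.5 * (1.0 + math.exp(-failure_rate * upper))  # end points, half weight
    for i in range(1, steps):
        total += math.exp(-failure_rate * i * h)           # interior points
    return total * h

lam = 8e-4  # assumed failure rate in failures per hour
print(mtbf_numeric(lam), 1.0 / lam)  # the two values should nearly coincide
```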
A simple example
A system has 4000 components, each with a failure rate of
0.02% per 1000 hours. Calculate $\lambda$ and MTBF.
$\lambda = (0.02 / 100) \times (1 / 1000) \times 4000 = 8 \times 10^{-4}$
failures/hour
$\text{MTBF} = 1 / (8 \times 10^{-4}) = 1250$ hours
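The arithmetic of this example can be reproduced in a few lines. The sketch below assumes, as the calculation implicitly does, that the component failure rates simply add (every component is needed and failures are independent); the function name is hypothetical.

```python
def system_failure_rate(n_components: int, percent_per_1000h: float) -> float:
    """Convert a per-component rate in % per 1000 hours to a system rate per hour."""
    per_hour = (percent_per_1000h / 100.0) / 1000.0  # fraction failing per hour
    return n_components * per_hour                   # rates add across components

lam = system_failure_rate(4000, 0.02)
mtbf = 1.0 / lam
print(f"lambda = {lam:.1e} failures/hour, MTBF = {mtbf:.0f} hours")
```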
Relation between Reliability and MTBF
For small $\lambda t$:
$R(t) \approx 1 - \lambda t = 1 - t / \text{MTBF}$
Therefore,
$\text{MTBF} \approx t / (1 - R(t))$
[Figure: reliability R(t) plotted against time t. The R(t) axis runs from 0 to 1.0 with 0.36 marked; the time axis is marked at 1 MTBF and 2 MTBF.]
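The linear relation MTBF = t / (1 - R(t)) is an approximation, so it can help to compare it with the value obtained by inverting the exponential law directly. The sketch below uses hypothetical function names; it also evaluates R at t = 1 MTBF, which gives e^{-1}, about 0.37, and appears to correspond to the 0.36 mark on the plot.

```python
import math

def mtbf_linear(t: float, r: float) -> float:
    """MTBF from the small-lambda*t approximation: MTBF = t / (1 - R(t))."""
    return t / (1.0 - r)

def mtbf_exact(t: float, r: float) -> float:
    """MTBF from inverting R(t) = exp(-t / MTBF): MTBF = -t / ln R(t)."""
    return -t / math.log(r)

for r in (0.99, 0.9, 0.5):
    print(f"R={r}: linear={mtbf_linear(1.0, r):.3f}  exact={mtbf_exact(1.0, r):.3f}")

# Reliability remaining after one MTBF under the exponential law.
print(f"R(1 MTBF) = {math.exp(-1.0):.3f}")
```

The two estimates are close when R(t) is near 1 and diverge as R(t) drops, so the linear form is best kept for high-reliability periods.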
An example
A first generation computer contains 10000
components, each with $\lambda = 0.5\%/(1000\ \text{hours})$. What is
the period of 99% reliability?
$\text{MTBF} = t / (1 - R(t)) = t / (1 - 0.99)$
$t = \text{MTBF} \times 0.01 = 0.01 / \lambda_{av}$
where $\lambda_{av}$ is the average failure rate,
$N$ = No. of components = 10000,
$\lambda$ = failure rate of a component
= 0.5% / (1000 hours) = 0.005/1000 = $5 \times 10^{-6}$ per hour.
Therefore, $\lambda_{av} = N\lambda = 10000 \times 5 \times 10^{-6} = 5 \times 10^{-2}$ per hour
Therefore, $t = 0.01 / (5 \times 10^{-2}) = 0.2$ hours = 12 minutes
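The same calculation can be scripted, which also makes it easy to compare the linear approximation with the result of inverting the exponential law. This is a sketch under the example's own numbers; the variable names are illustrative.

```python
import math

n_components = 10_000
lam_component = (0.5 / 100.0) / 1000.0    # 0.5% per 1000 hours, as failures per hour
lam_av = n_components * lam_component     # system failure rate, per hour

target = 0.99                             # required reliability
t_linear = (1.0 - target) / lam_av        # from R(t) ~ 1 - lambda * t, in hours
t_exact = -math.log(target) / lam_av      # from R(t) = exp(-lambda * t), in hours

print(f"linear: {t_linear * 60:.2f} minutes")
print(f"exact:  {t_exact * 60:.2f} minutes")
```

Both forms give roughly 12 minutes, since the product of failure rate and time is only about 0.01 here.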
Reliability for different configurations
1. Series Configuration
Overall reliability = $R_o = R \times R \times \cdots \times R = R^N$
2. Parallel Configuration
$R_o = 1 - (\text{probability that all of the components fail})$
$R_o = 1 - (1 - R)^N$
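The series and parallel formulas translate directly into code. The sketch below assumes N identical, independent components of reliability R, as the formulas do; the function names and the sample value R = 0.9 are illustrative.

```python
def series_reliability(r: float, n: int) -> float:
    """All n components must work: R_o = R^N."""
    return r ** n

def parallel_reliability(r: float, n: int) -> float:
    """At least one of n components must work: R_o = 1 - (1 - R)^N."""
    return 1.0 - (1.0 - r) ** n

r = 0.9  # assumed component reliability
for n in (1, 2, 3):
    print(f"N={n}: series={series_reliability(r, n):.4f}  "
          f"parallel={parallel_reliability(r, n):.4f}")
```

Adding components lowers reliability in series and raises it in parallel, which is the basic argument for redundancy.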
3. Hybrid Configuration
[Figure: block diagram of a hybrid configuration of components with reliability R]
Overall reliability = $R_o$ = ?
4. Triple Modular Redundancy (TMR)
[Figure: TMR block diagram with modules feeding a voting element]
Overall reliability = $R_o = \binom{3}{2} R^2 (1-R) + R^3$
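The TMR expression can be evaluated for a few module reliabilities to see when triplication helps. The sketch below follows the formula as given, which contains no term for the voter, so a perfectly reliable voter is an assumption here; the function name is hypothetical.

```python
from math import comb

def tmr_reliability(r: float) -> float:
    """Exactly two of three modules work, or all three work (voter assumed perfect)."""
    return comb(3, 2) * r**2 * (1.0 - r) + r**3

for r in (0.4, 0.5, 0.9, 0.99):
    print(f"R={r}: TMR={tmr_reliability(r):.6f}")
```

TMR improves on a single module only when the module reliability exceeds 0.5; at exactly 0.5 the two are equal, and below that the voted system is worse.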
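Reliability calculation: a more complicated example
[Figure: block diagram of the system, with components including A, B, C, D, E and F]
$R = R_C R_{S2} + (1 - R_C) R_{S1}$
where S1 is the system assuming C is faulty and S2 is the system assuming C is fault free.
[Figure: reduced block diagrams S1 and S2]
$R_{S1}$ can be calculated using the parallel and series formulae.
S2 needs further reduction:
$R_{S2} = R_E R_{S3} + (1 - R_E) R_{S4}$
where S3 is S2 assuming E is fault free and S4 is S2 assuming E is faulty.
[Figure: reduced block diagrams S3 and S4]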
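The decomposition R = R_C R_S2 + (1 - R_C) R_S1 can be illustrated on a smaller network. The sketch below uses a hypothetical five-component bridge (paths A-B and D-E, with C linking the two midpoints); this is an assumed topology chosen for illustration, not the system in the slide's diagram. The result is checked against brute-force enumeration of all component states.

```python
from itertools import product

def series(*rs: float) -> float:
    out = 1.0
    for r in rs:
        out *= r
    return out

def parallel(*rs: float) -> float:
    fail = 1.0
    for r in rs:
        fail *= (1.0 - r)
    return 1.0 - fail

def bridge_by_decomposition(r_a, r_b, r_c, r_d, r_e):
    # C fault free: the midpoints merge, leaving (A || D) in series with (B || E).
    r_c_ok = series(parallel(r_a, r_d), parallel(r_b, r_e))
    # C faulty: two independent paths, A-B and D-E, in parallel.
    r_c_bad = parallel(series(r_a, r_b), series(r_d, r_e))
    return r_c * r_c_ok + (1.0 - r_c) * r_c_bad

def bridge_by_enumeration(r_a, r_b, r_c, r_d, r_e):
    probs = (r_a, r_b, r_c, r_d, r_e)
    total = 0.0
    for state in product((True, False), repeat=5):
        a, b, c, d, e = state
        # The bridge works if any source-to-sink path has all components up.
        works = (a and b) or (d and e) or (a and c and e) or (d and c and b)
        if works:
            p = 1.0
            for up, r in zip(state, probs):
                p *= r if up else (1.0 - r)
            total += p
    return total

print(bridge_by_decomposition(0.9, 0.9, 0.9, 0.9, 0.9))
print(bridge_by_enumeration(0.9, 0.9, 0.9, 0.9, 0.9))
```

In this small case both conditional systems reduce immediately to series and parallel blocks; in a larger network one of them may need a second decomposition on another component.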
Maintainability
Maintainability of a system is the probability of
isolating and repairing a fault in the system within a
given time.
Maintainability is given by:
$M(t) = 1 - e^{-\mu t}$
where $\mu$ is the repair rate
and $t$ is the permissible time constraint for the
maintenance action.
$\mu = 1/(\text{Mean Time To Repair}) = 1/\text{MTTR}$
$M(t) = 1 - e^{-t/\text{MTTR}}$
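The maintainability expression can be evaluated for a few time limits. This is a minimal sketch; the MTTR of 2 hours and the function name are assumptions chosen for illustration.

```python
import math

def maintainability(t: float, mttr: float) -> float:
    """Probability that a repair completes within time t: M(t) = 1 - exp(-t / MTTR)."""
    return 1.0 - math.exp(-t / mttr)

mttr = 2.0  # assumed mean time to repair, in hours
for t in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"t={t:>4} h  M(t)={maintainability(t, mttr):.4f}")
```

Note that allowing exactly one MTTR gives only about a 63% chance of completing the repair in time.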
Availability
Availability of a system is the probability that the system will be
functioning according to expectations at any time during its
scheduled working period.
Availability = System up-time / (System up-time + System down-time)
System down-time = No. of failures $\times$ MTTR
System down-time = System up-time $\times$ $\lambda$ $\times$ MTTR
Therefore,
Availability = System up-time / (System up-time + (System up-time $\times$ $\lambda$ $\times$ MTTR))
= 1 / (1 + ($\lambda$ $\times$ MTTR))
Since $\lambda = 1/\text{MTBF}$,
Availability = MTBF / (MTBF + MTTR)
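The two forms of the availability expression can be checked against each other. The sketch below uses an MTBF of 1250 hours and an assumed MTTR of 2 hours; both function names are hypothetical.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

def availability_from_rate(failure_rate: float, mttr: float) -> float:
    """Equivalent form: Availability = 1 / (1 + lambda * MTTR)."""
    return 1.0 / (1.0 + failure_rate * mttr)

mtbf = 1250.0  # hours
mttr = 2.0     # assumed mean time to repair, in hours
print(f"{availability(mtbf, mttr):.6f}")
print(f"{availability_from_rate(1.0 / mtbf, mttr):.6f}")
```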