IS05 IS205 REE Unit 3 Reliability Prediction Models
In a series configuration, a failure of any component results in the failure of the entire system.
In most cases, when considering complete systems at their basic subsystem level, it is found that these
are arranged reliability-wise in a series configuration. For example, a personal computer may consist
of four basic subsystems: the motherboard, the hard drive, the power supply and the processor. These
are reliability-wise in series and a failure of any of these subsystems will cause a system failure. In
other words, all of the units in a series system must succeed for the system to succeed.
The reliability of the system is the probability that unit 1 succeeds and unit 2 succeeds and all of the
other units in the system succeed. So all n units must succeed for the system to succeed. The reliability
of the system is then given by:
Rs = P(X1 ∩ X2 ∩ ... ∩ Xn) = P(X1)·P(X2|X1)·P(X3|X1X2) ··· P(Xn|X1X2...Xn−1)
where Xi is the event that unit i succeeds, i = 1, 2, ..., n. If the units are statistically
independent, this reduces to Rs = R1·R2···Rn, the product of the unit reliabilities.
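As a numerical sketch, the product rule for independent units can be computed directly; the four subsystem reliabilities below are illustrative values, not taken from the text:

```python
# Series system: all units must succeed, so (assuming statistically
# independent units) the system reliability is the product of the
# individual unit reliabilities.
def series_reliability(rels):
    r = 1.0
    for ri in rels:
        r *= ri
    return r

# Illustrative values for the four PC subsystems mentioned above.
print(series_reliability([0.99, 0.98, 0.97, 0.96]))  # ≈ 0.9035
```

Note that the system reliability is lower than that of the weakest component, as expected for a series configuration.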
This conclusion can also be illustrated graphically, as shown in the following figure. Note the slight
difference in the slopes of the three lines. The difference in these slopes represents the difference in
the effect of each of the components on the overall system reliability. In other words, the system
reliability's rate of change with respect to each component's change in reliability is different. This
observation will be explored further when the importance measures of components are considered in
later chapters. The rate of change of the system's reliability with respect to each of the components is
also plotted. It can be seen that Component 1 has the steepest slope, which indicates that an increase
in the reliability of Component 1 will result in a higher increase in the reliability of the system. In
other words, Component 1 has a higher reliability importance.
Effect of Number of Components in a Series System
The number of components is another concern in systems with components connected reliability-wise
in series. As the number of components connected in series increases, the system's reliability
decreases. The following figure illustrates the effect of the number of components arranged reliability-
wise in series on the system's reliability for different component reliability values. This figure also
demonstrates the dramatic effect that the number of components has on the system's reliability,
particularly when the component reliability is low. In other words, in order to achieve a high system
reliability, the component reliability must be high also, especially for systems with many components
arranged reliability-wise in series.
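The decay described above can be tabulated in a few lines; the component reliabilities and counts below are illustrative, not taken from the figure:

```python
# System reliability r**n for n identical independent components in
# series, showing the rapid decay when component reliability is modest.
for r in (0.999, 0.99, 0.95):
    row = [round(r ** n, 4) for n in (1, 10, 50, 100)]
    print(r, row)
```

Even 95%-reliable components give a system reliability below 8% once fifty of them are placed in series, which is the dramatic effect the figure illustrates.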
In a simple parallel system, as shown in the figure on the right, at least one of the units must succeed
for the system to succeed. Units in parallel are also referred to as redundant units. Redundancy is a
very important aspect of system design and reliability in that adding redundancy is one of several
methods of improving system reliability. It is widely used in the aerospace industry and generally
used in mission critical systems. Other example applications include RAID computer hard drive
systems, brake systems and support cables in bridges.
The probability of failure, or unreliability, for a system with n statistically independent parallel
components is the probability that unit 1 fails and unit 2 fails and all of the other units in the system
fail. So in a parallel system, all n units must fail for the system to fail. Put another way, if unit 1
succeeds or unit 2 succeeds or any of the n units succeeds, then the system succeeds. The unreliability
of the system is then given by:
Qs = P(X1 ∩ X2 ∩ ... ∩ Xn) = P(X1)·P(X2|X1)·P(X3|X1X2) ··· P(Xn|X1X2...Xn−1)
where Xi is now the event that unit i fails. For statistically independent units this reduces to
Qs = Q1·Q2···Qn, and the system reliability is Rs = 1 − Qs.
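For independent units, the complement rule above can be sketched as follows; the three 90% unit reliabilities are illustrative:

```python
# Parallel system: it fails only if every unit fails, so (for
# independent units) Qs is the product of the unit unreliabilities
# and Rs = 1 - Qs.
def parallel_reliability(rels):
    q = 1.0
    for ri in rels:
        q *= (1.0 - ri)
    return 1.0 - q

# Three 90%-reliable redundant units give a system reliability near 0.999.
print(parallel_reliability([0.9, 0.9, 0.9]))
```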
This conclusion can also be illustrated graphically, as shown in the following plot.
Effect of Number of Components in a Parallel System
In the case of the parallel configuration, the number of components has the opposite effect of the one
observed for the series configuration. For a parallel configuration, as the number of
components/subsystems increases, the system's reliability increases.
The following plot illustrates that a high system reliability can be achieved with low-reliability
components, provided that there are a sufficient number of components in parallel. Note that this plot
is the mirror image of the one above that presents the effect of the number of components in a series
configuration.
What is the reliability of the system if R1 = 99.5%, R2 = 98.7% and R3 = 97.3% at 100 hours?
First, the reliability of the series segment consisting of Units 1 and 2 is calculated:
R1,2 = R1 · R2 = 0.9950 · 0.9870 = 0.982065, or 98.2065%
The reliability of the overall system is then calculated by treating Units 1 and 2 as one unit with a
reliability of 98.2065% connected in parallel with Unit 3. Therefore:
Rs = 1 − [(1 − 0.982065) · (1 − 0.973000)]
   = 1 − 0.000484245
   = 0.999515755, or 99.95%
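The hand calculation above can be checked numerically:

```python
# Units 1 and 2 in series, that pair in parallel with unit 3.
r1, r2, r3 = 0.995, 0.987, 0.973

r12 = r1 * r2                        # series segment: 0.982065
rs = 1 - (1 - r12) * (1 - r3)        # parallel with unit 3

print(round(r12, 6), round(rs, 6))   # 0.982065 0.999516
```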
1 INTRODUCTION
1.1 Before any reliability analyses can be carried out on a system there must be knowledge of
the operational relationships of the various elements comprising that system. The reliability of a
system cannot be improved or even evaluated unless there is a thorough understanding of how
each of its elements functions and how these functions affect system operation. The accurate
representation of these relationships is an integral part of this understanding and is particularly
important for meaningful predictions, apportionments and assessments. A Reliability Block
Diagram (RBD) provides a method of representing this information in a form which is easy to
comprehend because it is simple and has visual impact.
1.2 It should be noted that a system or equipment might require more than one RBD to describe
it. This is particularly true of equipment which is capable of performing several functions, or
which experiences several different operating states during a deployment. An RBD may be required for
each particular condition. In fact, the approach should be to produce an RBD for a function in a
particular operating state, rather than for a piece of hardware.
2.1.1 For a system comprising a number of identifiable items of hardware the reliability block
diagram is a means of showing the functional relationship between the items, and indicates which
ones must operate successfully for the system to accomplish its intended function (or functions).
An example of an RBD is shown in Figure 1.
2.1.2 It is assumed here that both the system and its elements are in one of two states; either
up or down. Hence, in an RBD, each element may be looked on as a switch that is closed when
the element is up and open when the element is down. The system will be up only when a path
exists between the input and output nodes.
2.1.3 In the system shown in Figure 1, the system will be down when:
a) element a is down; or
b) elements b or c, and element d are down.
(The groups will not necessarily be independent and elements may be duplicated).
2.2.1 The simplest form of a system for reliability analysis is one where the elements are
connected in series. In this type of system, if one or more of the elements are down then the system
is down. For example, consider the power train of a motor car, comprising engine (e), gearbox
and drive links (g), and two wheels (w1, w2). A failure of any element results in failure of the
system. The RBD of the system is:
[Figure 2: RBD of the power train: e, g, w1 and w2 connected in series]
Note that an RBD only depicts functional relationships between elements, so that, although it may
be convenient to arrange elements in the same order in which their functions are performed, it is
not necessary to do so.
2.3.1 There is redundancy in a system if not all of its constituent elements are required to be up
for successful operation of the system. An ‘m/n redundant group’ is a group of n items where only
(any) m of them have to be up for the group to be considered up. This chapter considers two forms
of redundancy, namely active and standby redundancy.
2.4.1 A group of elements are said to be in active redundancy if all elements in the group are
active when the system is operating, but it is not necessary for all elements to be up for the group
to be up. Redundancy appears in an RBD as parallel routes. For example, consider the rear
suspension of a lorry comprising four wheels on either side as illustrated in Figure 3.
[Figure 3: lorry rear suspension with wheels w11 to w14 on one side and w21 to w24 on the other]
2.4.2 Suppose the load can be supported by three out of four wheels on each side. Then the
RBD for the off-side wheels is as shown in Figure 4.
[Figure 4: 3/4 active redundancy group of wheels w11 to w14]
2.4.3 For all eight wheels considered as a system the RBD is a series combination of active
redundancy groups, as shown in Figure 5.
[Figure 5: two 3/4 active redundancy groups in series: wheels w11 to w14 and w21 to w24]
2.5.1 Elements are said to be in an ‘m/n standby redundant group’ when only m of the
elements are required to be in an active state and the remainder are in a passive state. When an
active element fails, one of the passive elements is switched on in its place. The failure
time distributions of the elements depend on whether they are in an active or passive state (the
failure rate of an element in an active state is generally much larger than its failure
rate when it is in a passive state). An example of a standby redundancy group is a car with a
spare wheel (assuming that the car with a burst tyre is not regarded as a system failure when
there is a working spare). The RBD of this system is shown in Figure 6.
[Figure 6: 4/5 standby redundancy group of wheels w1 to w5, with the spare w5 in the passive state]
2.5.2 An important feature of most standby redundancy groups is that failures of elements in
the passive state will not be detected until the passive element is required to replace a failed
active element. Thus, even elements with a low failure rate in the passive state can have a
critical effect on system reliability unless the ‘spare’ is regularly and comprehensively tested.
In addition, if an automatic switching mechanism is used to bring in the passive elements, the
mechanism itself may have its own failure time distribution.
3.1 A Method
3.1.1 Where a system has a single function, such as in the examples described above, it is
generally a straightforward matter to construct the system RBD. Normally such systems will
comprise a group of elements in series or, at the most, a number of groups with redundancy
themselves connected in series, (e.g. Figure 4). However, where a system has to perform more
than one function or where the same function may be performed by more than one set of
elements, the construction of the RBD may become much more complex. In these situations, it
is advisable to follow a set procedure when drawing the RBD.
3.1.2 The first step is to define the functions that the system is to perform, and the operating
states (e.g. standby, full power, etc.). If the motor car example is taken again, several different
functions may be required, e.g.:
a) ability to carry two passengers;
b) ability to carry luggage; and
c) ability to tow a caravan.
3.1.3 The next step is to decide which of these functions is the minimum required for
successful operation of the system. If, in the example above, it is decided that all three functions
are required, then the functions should be represented as shown in Figure 7.
Figure 7: Representation of Series Function
If any one of the three functions is sufficient for the operation of the system, then the system
should be represented as shown in Figure 8.
3.1.4 Intermediate configurations are also possible, of course. For example, if passenger-
carrying was the prime requirement and either of the other two functions was sufficient for
system operation, then the initial RBD would be as shown in Figure 9.
3.1.5 The next step is to associate with each function those elements that are required to
perform the function. It may be that there are some elements which are common to all functions
and they should be separated out first, e.g. in the example above, the engine, steering, brakes,
transmission and four wheels are clearly necessary for all three functions. Function a could be
stated as requiring two out of three passenger seats, function b as requiring either a roof rack
or boot, and function c as requiring a towbar. The full RBD for the case shown in Figure 9
would then be as shown in Figure 10.
[Figure 10: Engine, Brakes, Steering and Transmission in series with a 4/5 wheel group (w1 to
w5) and a 2/3 passenger-seat group (Px), followed by a parallel group comprising the roof
rack/boot branch and the towbar branch]
3.2.2 When developing RBDs it is not always a matter of simple inspection to determine the
conditions that represent successful operation of a system, the system up states, or alternatively
the system down states. In these cases, techniques such as Truth Tables or Boolean Algebra
should be used. The Boolean analysis in PtDCh1 shows how to construct and simplify an RBD.
For simple systems, the standard Boolean reduction identities for series and parallel
combinations are useful when minimising an RBD.
3.2.3 When simplifying an RBD take care to ensure that the routing conditions in the diagram
are preserved. This problem arises because the node-block-node construction is not directional. It
is possible to have routes in both directions through blocks not connected to the input or output
node. An example of this is the crossover network shown below in which routes pass through
element ‘e’ in both directions.
3.2.4 If the ‘e’ block is directional, i.e. ‘ceb’ is not a system up state, then the RBD must either
have a downward arrow on the ‘e’ route or, preferably, be drawn as:
3.2.5 If the ‘e’ block is non-directional, i.e. either ‘aed’ or ‘ceb’ is a system up state, then the
RBD must either have a bi-directional arrow on the ‘e’ route or, preferably, be drawn as:
[Figures: the redrawn RBDs for the crossover network, with element ‘e’ duplicated on the
parallel routes]
3.2.6 The process of constructing an RBD for a complex system may be summarised as follows:
a) Specify the function(s) to be analysed, and the operating state (e.g. standby, full power,
etc.).
b) Specify the minimum requirements for the system to operate successfully in terms of
the functions of the system.
c) Draw a system RBD in terms of the system functions.
d) Specify the elements that comprise the system functions.
e) Draw a system RBD in terms of the system elements.
f) Simplify the RBD but if in doubt leave it; the analysis will still be correct.
4 QUANTIFYING AN RBD
4.1 General
4.1.1 RBDs can be used to evaluate the system availability or system reliability given the
availability, reliability etc. of the individual elements.
4.1.2 The quantitative use of RBDs to derive R&M parameters is detailed in PtDCh6, which
provides formulae and methods for calculating the System Reliability (R), Availability (A), Mean
Time To Failure (MTTF), Mean Time Between Failures (MTBF), and Mean Time To Repair
(MTTR), when the Reliability and/or MTBF and/or MTTR (as appropriate) of the items
comprising the System are known.
4.2.1 The assessment of Availability, Reliability and Maintainability (R&M) parameters for
complex RBDs is a difficult task and is generally beyond the scope of manual or analytical
calculation in all but the simplest situations. For example, if queuing for repair occurs or if it is
required to assess the sensitivity of the system R&M parameters to the reliability and repair time
characteristics of elements of the system, then the mathematical difficulty and computational load
will be such that help from a computer program is essential.
4.2.2 Computer based analytical aids, such as RAM4, provide the solution to this problem.
4.3.1 Programs such as RAM4 are designed to be capable of assessing a variety of systems of
mixed repairable and non-repairable elements, including those with standby redundancy. They
may include the capability to model logistic delays, common mode failures and repair teams with
specific skills, and may also perform two common types of R&M study, sensitivity analysis and
apportionment.
4.3.2 For a system under specified mission conditions RAM4 calculates the mean and
distributions of the following system parameters:
• Availability.
• System Failure Rate.
• Mean Time Between Failures (MTBF).
• Mean Time To Repair (MTTR).
• Mean Time to First Failure (MTFF).
4.3.3 In addition to producing overall single-figure measures of these system reliability
parameters, RAM4 produces the distributions and confidence intervals about the
calculated means. RAM4 also produces:
• an analysis of R&M parameters for elements of the system and for sub-systems (groups
of elements).
• a repair queue analysis giving statistics for waiting times and queue lengths.
• an automatic Sensitivity Analysis.
• distributions of the main system R&M parameters displayed graphically as histograms.
• a graphical display of results by colour shading of the RBD in a Criticality Analysis.
• detailed results tabulated in an output file which can be viewed and output to a printer.
4.3.4 These outputs enable the factors influencing the overall system reliability to be identified.
It must however be recognised that the R&M assessment of a complex system will always produce
results which require careful study.
5.1.1 There are some assumptions in the RBD method of representing complex systems which
mean that some system analyses are less straightforward than indicated previously. Problem areas
resulting from these assumptions are described below.
5.1.2 Failure Modes. An item of equipment might have more than one mode of failure in which
case the effect on the system of a failure in that equipment will depend upon which failure mode
occurs.
5.1.3 To clarify this point, consider two electronic capacitors in a circuit connected in series, as
shown in the following schematic.
[Schematic: capacitors C1 and C2 connected in series]
5.1.4 Two failure modes are possible for each capacitor, namely open circuit and short circuit.
Considering the open circuit case first, output will be lost if either one of the capacitors goes open
circuit, so a series configuration will be shown for this case, see Figure 14.
[Figure 14: C1 and C2 in series]
5.1.5 Considering the short circuit case, output will be lost only if both capacitors go short
circuit, so a parallel configuration is shown for this case, see Figure 15.
[Figure 15: C1 and C2 in parallel]
5.1.6 From a reliability point of view the RBD for this system might be drawn as shown in
Figure 16 and the failure rates by mode assigned to each element.
[Figure 16: the series pair ‘C1 fails open’ – ‘C2 fails open’ in parallel with the branches
‘C1 fails short’ and ‘C2 fails short’]
5.1.7 However, it can be seen that because C1 and C2 both appear twice in the figure the
elements are not independent of each other; the operation and failure of each element is
conditional. For instance, the failure of C1 due to a short is conditional upon there being no failure
due to an open circuit.
5.1.8 To evaluate an RBD there must be only one failure mode represented for each element.
For elements with more than one failure mode, separate RBDs must be drawn using each failure
mode.
5.1.9 Active Redundancy. In practice, if an element fails in a m/n active redundancy group, the
failure time distributions of the remaining elements are altered. The failure of an element may
mean, for example, that additional stress is placed on the remaining elements, which reduces their
MTBF. This would be the case for the lorry example described in section 2.4.
5.1.10 Multi-mode Missions. Often the requirements of a system change in successive phases
of mission although the basic physical configuration of the system remains the same. Some system
components may not operate during every phase of the mission. Thus component failures may or
may not affect system failure, depending on phase and operational requirements at the time. The
consequence for the RBD is that different phases might represent different reliability
configurations: elements in series might become redundancy items and vice versa; furthermore,
functional definitions may change. This situation cannot be depicted accurately using RBDs and
approximations may have to be made.
5.1.11 Consider the case of a system power supply consisting of two batteries. If in the early
phase of the mission the power demands are light, either one of the batteries is sufficient. However,
if in the latter stage of the mission demand is heavy both batteries must be operational to meet the
demand. The RBD for the battery sub-system in the early mission phase would be a parallel
arrangement of the two batteries, while for the latter phase, when both are required, a series
arrangement is needed.
5.1.12 Obviously both configurations cannot be included in the same RBD, so which one can
be used as the best approximation for the whole mission? From a consideration of the mission
requirements it is clear that the failure of either battery in the early phase will subsequently result
in system failure in the later phase due to insufficient power, so the series arrangement would be
the better one. (This conclusion assumes that repair action during the mission would be either
impossible or very lengthy. If repair times were short it would complicate the situation further).
In many cases approximations such as this may be possible without significantly affecting the
analysis results, whereas for other cases it may prove necessary to construct a separate RBD for
each mission phase.
5.1.13 Common Cause Failures. Due allowance should be made in the RBD for the effect of
common cause failures. When such failures occur, all the elements affected by the common
mode event fail. A discussion on common cause failures is presented in PtCCh28.
STANDBY MODEL
The Standby Model evaluates improved reliability when backup replacements are switched on
when failures occur.
A Standby Model refers to the case in which a key component (or assembly) has an identical backup
component in an "off" state until needed. When the original component fails, a switch turns on the
"standby" backup component and the system continues to operate.
In the simple case, assume the non-standby part of the system has CDF F(t) and there are
(n−1) identical backup units that will operate in sequence until the last one fails. At that point, the
system finally fails. The system CDF Fn(t) is then the n-fold convolution of F(t) with itself:
Fn(t) = ∫₀ᵗ Fn−1(t − u) f(u) du, with F1(t) = F(t),
where f(t) is the PDF F′(t). In general, convolutions are solved numerically. However, for the special
case when F(t) is the exponential model, the above integrations can be solved in closed form.
The total system lifetime is the sum of n identically distributed random lifetimes, each having
CDF F(t).
F2(t) = 1 − e^(−λt) − λt·e^(−λt)
f2(t) = λ²t·e^(−λt), and
fn(t) = λⁿtⁿ⁻¹e^(−λt) / (n−1)!
and the PDF fn(t) is the well-known gamma distribution.
Example: An unmanned space probe sent out to explore the solar system has an onboard computer
with reliability characterized by the exponential distribution with a Mean Time To Failure (MTTF)
of 1/λ = 30 months (a constant failure rate of 1/30 = 0.033 fails per month). The probability of
surviving a two year mission is only exp(−24/30) = 0.45. If, however, a second computer is included
in the probe in a standby mode, the reliability at 24 months (using the above formula for F2) becomes
0.8 × 0.449 + 0.449 = 0.81. The failure rate at 24 months (f2/(1−F2)) reduces to
[(24/900) × 0.449]/0.81 = 0.015 fails per month. At 12 months the failure rate is only 0.0095 fails per
month, which is less than 1/3 of the failure rate calculated for the non-standby case.
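The figures in this example can be reproduced from the closed-form expressions for F2 and f2 given above:

```python
import math

lam = 1.0 / 30.0    # onboard computer failure rate, per month
t = 24.0            # two-year mission

r1 = math.exp(-lam * t)                              # single computer: ≈ 0.449
F2 = 1 - math.exp(-lam * t) - lam * t * math.exp(-lam * t)
r2 = 1 - F2                                          # = (1 + λt)·e^(−λt) ≈ 0.81
f2 = lam**2 * t * math.exp(-lam * t)                 # standby-pair PDF at t
h2 = f2 / (1 - F2)                                   # failure rate ≈ 0.015/month

print(round(r1, 3), round(r2, 3), round(h2, 4))
```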
Standby units (as the example shows) are an effective way of increasing reliability and reducing failure
rates, especially during the early stages of product life. Their improvement effect is similar to, but
greater than, that of parallel redundancy. The drawback, from a practical standpoint, is the expense of
extra components that are not needed for basic functionality.
M/N CONFIGURATION
m-out-of-n SYSTEMS
Simple series and parallel representations are often inadequate to describe real
systems. A first generalization, which includes series and parallel systems as extreme cases, is that of
“m-out-of-n” systems. These systems fail if m or more out of n components fail. The case m = 1
corresponds to series systems, the case m = n to parallel systems. Again, analysis is simpler if the
components fail independently with the same probability P. Then, the probability of failure can be
calculated from the binomial distribution: Let M be the number of failed elements. M has binomial
distribution with parameters n and P, hence its probability mass function is given by
pM;n(m) = C(n, m) · Pᵐ(1 − P)ⁿ⁻ᵐ     (Eq. 5)
where C(n, m) = n!/(m!(n − m)!) is the binomial coefficient. The probability of failure of the
system is
P[M ≥ m] = 1 − FM;n(m − 1)     (Eq. 6)
where FM;n(m) = P[M ≤ m] is the cumulative distribution function of M.
Example 1 Consider the case of a car with one spare tire. The car will become impaired if 2 (or more)
tires are flat. In a conservative approximation, one may assume that all 5 tires are simultaneously used
and subject to punctures. Then the probability of not completing a trip is given by Eqs. 5 and 6, with
n = 5, m = 2, and P = probability of puncture of a single tire during the trip.
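Summing the binomial probabilities over m or more failures gives the failure probability directly; a sketch for the spare-tyre case, with P = 0.01 as an illustrative puncture probability:

```python
from math import comb

def prob_system_failure(n, m, p):
    """Probability that m or more of n i.i.d. components
    (each failing with probability p) fail."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(m, n + 1))

# Spare-tyre example: the trip fails if 2 or more of 5 tyres puncture.
print(prob_system_failure(5, 2, 0.01))   # ≈ 9.8e-4
```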
Problem 1.2 Compare the probability of completing a car trip in the cases without spare tire and with
1 spare tire by using Eq. 6 with (n = 4, m = 1) and (n = 5, m = 2). Make the comparison for P = 0.001,
0.01, 0.1. Comment on the results.
Example 2 In order to fly, an airplane needs at least half of its engines to be functioning. Suppose
that, during any given flight, engines fail independently, with probability P. Would you be safer in an
airplane with 1, 2, 3 or 4 engines?
Under the condition of independent and equally likely failures, the number of nonfunctioning engines
at the end of a generic flight, M, has binomial distribution with probability mass function in Eq. 5. The
probability Pn that an airplane with n engines is unable to fly is therefore
Pn = P[M > n/2] = 1 − FM;n(⌊n/2⌋)     (Eq. 7)
Problem 1.3
Plot the probabilities P1, P2, P3, and P4 in Eq. 7 as functions of P, for P in the range [10⁻⁴, 10⁻¹].
Comment on the results.
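A short script along these lines computes Pn for several engine counts; it assumes, per the statement above, that the plane is grounded when more than half of its engines fail:

```python
from math import comb

def p_unable_to_fly(n, p):
    """P[an n-engine plane cannot fly]: more than half the engines fail,
    failures independent with probability p each."""
    m_min = n // 2 + 1   # smallest failure count that grounds the plane
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(m_min, n + 1))

for p in (1e-4, 1e-2, 1e-1):
    print(p, [round(p_unable_to_fly(n, p), 8) for n in (1, 2, 3, 4)])
```

Tabulating the values shows the perhaps surprising fact that the safest engine count depends on P: for large P, a 3-engine plane is grounded more often than a 2-engine one.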
The previous analysis rests on the assumption that airplane engines fail independently. This is a rather
unrealistic assumption, because in many cases a single “common cause” may induce simultaneous or
serial failure of several engines. A more sophisticated but still relatively simple model is as follows.
Suppose that potentially damaging events occur at Poisson times during a flight, with mean rate λ.
When one such event occurs, each engine of the airplane fails with probability p, independently of the
other engines. Also, failure or survival of an engine in different potentially damaging events are
assumed to be independent events. Notice that, in this case, engine failures are conditionally
independent given a potentially damaging event. However, unconditionally (during a generic flight),
engine failures are probabilistically dependent (failure of one engine makes it more probable that other
engines also failed, because an engine failure indicates that at least one damaging event occurred
during the flight). For example, engine failures may be caused by malfunctions of the electrical
system or by encounters with bird flocks. When any such event occurs, it is likely that several engines
are damaged.
Before we analyze airplane reliability through this revised model, we recall the following property of
Poisson processes. Consider a primary Poisson process {ti} with rate parameter λ. A secondary process
{t’i} is obtained by “independently thinning” {ti}. This
3. MORE COMPLICATED SYSTEM CONFIGURATIONS
In even more complex cases, series and parallel connections are intermixed. For example, in
assembling a car, it is necessary that a large number of components be simultaneously available (this
is a series system and is highly vulnerable because the unavailability of just one component may force
an assembly plant to shut down, as one knows well from labor strikes). To increase system reliability,
car manufacturers rely on several alternative part suppliers and part-producing plants, which are used
“in parallel”. This corresponds to a scheme with several subsets of components (sub-systems). The
sub-systems are connected in series, but have an internal parallel structure, as illustrated by the
following scheme.
The following is an example of reliability analysis for a moderately complex system, which includes
series and parallel connections, as well as m-out-of-n sub-systems.
Example Consider a system composed of a heater (R1), two pumps (R2 and R3), and 5 turbines (R4
through R8). The two pumps work in parallel, meaning that the pump subsystem operates if at least
one of the pumps operates. The turbine sub-system operates if at least 3 turbines operate. The heater,
pump, and turbine sub-systems are connected in series, meaning that they must all properly work for
the whole system to perform adequately.
The reliability of the system is defined as the probability that the system does not fail between
scheduled maintenances. We denote by Wi the event “component i is working properly”. To calculate
system reliability, we first consider the reliability of each subsystem separately.
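A sketch of the calculation, assuming independent components and hypothetical reliability values (the source does not give numbers, and identical turbines are assumed for simplicity):

```python
from math import comb

def k_out_of_n(k, n, r):
    """P[at least k of n i.i.d. components with reliability r work]."""
    return sum(comb(n, j) * r**j * (1 - r)**(n - j)
               for j in range(k, n + 1))

# Hypothetical component reliabilities.
r_heater = 0.95
r_pump = 0.90      # two pumps in parallel: at least one must work
r_turbine = 0.85   # five turbines: at least three must work

r_pumps = 1 - (1 - r_pump) ** 2
r_turbines = k_out_of_n(3, 5, r_turbine)
r_system = r_heater * r_pumps * r_turbines   # sub-systems in series
print(round(r_system, 4))
```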
Problem 1.5
Turbines are very expensive. Therefore, there are financial incentives to remove one of them. How
would removal of turbine R4, which is the least reliable one, affect the reliability of the whole system?
Many physical and non-physical systems work “in parallel” in a sense different from the case
considered above. Rather than just having multiple elements connected in parallel and requiring that
at least one works, the elements share the total applied “load” or demand. Parallel systems of this type
may function in different ways, depending on the characteristics of the components. The following are
examples of two different types:
• A first example is a power company that interchangeably uses several power generating plants
to meet total demand. In this case, the system fails if the demand exceeds the combined
capacity of the power plants; i.e., the system capacity is the sum of the capacities of its
components;
• A second example is a rope made of several bundles that share the applied load. If the bundles
are “ductile” (e.g., they are made of mild steel), they are able to redistribute the load among
themselves. In this case the strength of the rope is the sum of the bundle strengths and the
behavior of the system is of the same type as that of the power company in the previous
example. However, if the bundles are brittle (e.g. they are made of glass), then each bundle
will break as soon as its capacity is reached. The load must then be carried by the surviving
bundles. In this case, the strength of the rope is less than the sum of the strengths of the
individual components.
For a system of the first type, the probability of failure is
Pf = P[C1 + C2 + ... + Cn < D]
where D is the demand and C1, ..., Cn are the capacities of the components. We shall see later in the
course how to evaluate probabilities of this type when D and the Ci are random variables.
Systems of the second type are more complicated to analyze, because the capacity of the system is a
complicated nonlinear function of the component capacities. A way to evaluate reliability in this and
other complicated cases is to use Monte Carlo simulation.
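A minimal Monte Carlo sketch for a system of the first (capacity-sharing) type; the uniform distributions for the component capacities and the demand are illustrative assumptions:

```python
import random

random.seed(1)

def mc_failure_prob(n_units, trials=100_000):
    """Monte Carlo estimate of P[C1 + ... + Cn < D]: the system fails
    when total capacity falls short of demand. Capacities ~ U(80, 120)
    and demand ~ U(150, 250) are illustrative choices."""
    failures = 0
    for _ in range(trials):
        total_capacity = sum(random.uniform(80, 120) for _ in range(n_units))
        demand = random.uniform(150, 250)
        if total_capacity < demand:
            failures += 1
    return failures / trials

print(mc_failure_prob(2))   # two plants: capacity ~200 vs demand ~200
```

The same simulation skeleton extends to the brittle-bundle case by replacing the capacity sum with the nonlinear load-redistribution rule.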
APPLICATION OF BAYES’ THEOREM
Bayes' Theorem
Bayes’ theorem describes the probability of occurrence of an event in the light of a related
condition; it is a statement about conditional probability, and is also known as the formula for the
probability of “causes”. For example, if a blue ball has been drawn from one of three different bags
of balls, where each bag contains red, blue and black balls, the probability that it came from the
second bag is a conditional probability of this kind. The following sections give the statement of
Bayes’ theorem, its formula, and some practice problems.
Bayes Theorem Statement
Let E1, E2,…,En be a set of events associated with a sample space S, where all the events E1, E2,…,
En have nonzero probability of occurrence and they form a partition of S. Let A be any event
associated with S, then according to Bayes theorem,
[latex]P(E_i│A)~=~\frac{P(E_i)P(A│E_i)}{\sum\limits_{k=1}^{n}P(E_k)P(A| E_k)}[/latex]
for any i = 1, 2, 3, ..., n.
P(A|B) = P(A∩B)/P(B)
where P(A|B) is the probability of event A occurring given that event B has already occurred,
P(A ∩ B) is the probability of both event A and event B occurring, and
P(B) is the probability of event B.
Practice Problems
Solve the following problems using Bayes Theorem.
1. A bag contains 5 red and 5 black balls. A ball is drawn at random, its colour is noted, and the
ball is returned to the bag together with 2 additional balls of the same colour. A ball is then
drawn at random from the bag. What is the probability that the second ball drawn from the
bag is red?
2. Of the students in a college, 60% reside in the hostel and 40% are day scholars. The previous
year's results report that 30% of all students who stay in the hostel scored an A grade and
20% of day scholars scored an A grade. At the end of the year, one student is chosen at
random and is found to have an A grade. What is the probability that the student is a
hosteller?
3. From a pack of 52 cards, one card is lost. From the remaining cards of the pack, two cards are
drawn and both are found to be diamonds. What is the probability that the lost card is a
diamond?
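As a numerical check on the formula above, practice problem 2 can be worked directly. A minimal sketch (the helper name `bayes_posterior` is ours):

```python
def bayes_posterior(priors, likelihoods, i):
    """Bayes' theorem: P(E_i|A) = P(E_i)P(A|E_i) / sum_k P(E_k)P(A|E_k)."""
    total = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[i] * likelihoods[i] / total

# Practice problem 2: E1 = hosteller (prior 0.6), E2 = day scholar (0.4);
# A = "scored an A grade", with P(A|E1) = 0.3 and P(A|E2) = 0.2.
posterior = bayes_posterior([0.6, 0.4], [0.3, 0.2], 0)
print(round(posterior, 4))  # 0.18 / 0.26 = 9/13, about 0.6923
```

So a randomly chosen A-grade student is a hosteller with probability 9/13.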
Reliability can be given as a probabilistic performance index on operation conditions and redundancy
of components in a system, as well as tolerance of possible failures. For the Fault Tolerant Control
(FTC) systems, reliability has always been a subjective concern. It is natural to make the ultimate goal
of FTC to enhance system reliability. However, quantitative measures of reliability are lacking in this
context because standard reliability assessment techniques are not geared toward the redundancy in
control systems [1]. Conventional reliability analysis deals with series-parallel or network structures,
but few methods address the functional and dynamic relations involved in a control system. Hence, in
hybrid FTC systems, a linkage between the low-level control/diagnosis subsystems and the high-level
decision/supervision module is missing [2]. In this work, we attempt to
develop a method to describe the operating status of control systems in terms of reliability. Herein we
are only interested in evaluating the reliability of the overall control system rather than of individual
components, and the reliability of each individual component is assumed to be known a priori. In fact,
given a system, one cannot always expect to improve the overall system reliability by using more
reliable parts. On the other hand, even if some components fail, the system reliability may still be
maintained at a certain level; this fact reflects the fundamental philosophy of FTC systems. How to
quantitatively assess system-level reliability
in this context is still an open problem. There have been some investigations on this issue in the control
community. In [3], the signal flow graph was adopted for failure mode analysis, but in this approach
the control system was treated as a static system without considering the dynamics; fault tree analysis
was used in [4], [5], but no control objective or dynamics were considered; a functional-reliability
modelling method was employed in [6], but only a mean loss criterion instead of reliability
was calculated; recent results were reported in [1], where an approximate Markov model was used to
evaluate reliability and a criterion based on coverage was employed to bridge the control action and
system reliability. In this paper, we develop a procedure for reliability assessment of the control system
by extending the tie/cutset methods, which have been established for reliability analysis of networked
systems. In the proposed method, the required functions of the system are related to control
performance or control objectives. Furthermore, the procedure can easily cope with the change of the
operating conditions of the system components and the number of performance requirements. When
such change occurs, we only need to update the cut/tie set model and then re-calculate the reliability.
The remainder of this paper is organized as follows: In section II, the basic concepts about probability
and reliability evaluation for network structures are briefly reviewed; the proposed methods are
presented in section III, followed by an example to illustrate the main procedures. The results in the
example show that simply changing loop gains in the control system can change the system
reliability. Section IV draws the conclusions.
where Ti, i = 1, 2, 3, 4, is the i-th tie set. The above expression can be decomposed into Pr(Ti) which
represents the probability that all the components in Ti work. Please note that the definitions of Pr(Ci)
and Pr(Ti) are complementary. The minimal cut/tie sets can be generated by standard algorithms, such
as multiplication of the connection matrices, so the reliability evaluation methods based on minimal
cut/tie set can be easily implemented on computers [7].
D. Reliability calculation
Assume that all the minimal cut sets for a control system are identified as:
For the active FTC systems, the component failure can be diagnosed by a Fault
Detection & Isolation (FDI) scheme on-line. Based on the FDI results, the reliability value can be
updated by modifying cut/tie sets. For instance, if fk is detected by FDI and we do not consider the
false alarm, the new tie set or cut set is derived according to the rules below:
Tnew = {Ti | Ti ∈ T, fk ∉ Ti}, (7)
Cnew = {Ci′ | Ci′ = Ci − {fk}, Ci ∈ C, fk ∈ Ci; or Ci′ = Ci, Ci ∈ C, fk ∉ Ci}. (8)
The updated reliability can be computed from these new sets. If the false alarm rate of the FDI is
considered, for instance when fk is detected by the FDI with probability P̂k while the original
probability is Pk, then Pk is replaced by P̂k, while the cut sets and tie sets remain unchanged.
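Assuming perfect FDI (no false alarms), the update rules (7) and (8) can be sketched as follows; the component labels and sets below are hypothetical:

```python
def update_sets_after_fdi(tie_sets, cut_sets, fk):
    """Update minimal tie/cut sets after FDI confirms failure fk,
    following rules (7) and (8):
    - drop every tie set containing fk (that success path is gone);
    - remove fk from every cut set containing it (fk has already
      failed, so fewer further failures are needed to break the system);
    - keep cut sets not containing fk unchanged."""
    t_new = [t for t in tie_sets if fk not in t]
    c_new = [c - {fk} if fk in c else set(c) for c in cut_sets]
    return t_new, c_new

# Hypothetical sets over basic failures f1, f2, f3
T = [{"f1", "f2"}, {"f3"}]
C = [{"f1", "f3"}, {"f2", "f3"}]
print(update_sets_after_fdi(T, C, "f3"))
```

After f3 is diagnosed, the tie set {f3} disappears and both cut sets shrink to single-event sets, so the updated reliability can be recomputed directly from the new sets.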
To illustrate the above method, let us look at an example.
E. Cascade cut set and tie set
In FTC systems, if some components suffer from failures, the requirements are usually relaxed
by giving up certain performance. The system performance will be degraded but ‘gracefully’. In this
case, this is done by removing some elements of the objective set O = {O1, ··· , Ono}, so that a new
objective set O′ is obtained. Instead of repeating the whole reliability evaluation procedure when the
objective set is changed, a simpler method is to search the cut/tie sets for each single objective Oi in O,
i = 1 ∼ no, and then derive the cut/tie sets for the entire set O. We call this method the cascade cut/tie set,
which can reduce the computation during the re-evaluation when the objective set changes. Suppose
the basic cut/tie sets for each objective Oi, i = 1 ∼ no, are given below as
which can be easily proved by examining the definitions of cut/tie sets and those two set
operations. The reliability can be calculated based on C or T. This method offers flexibility when
evaluating reliability under various objectives and can be implemented as a recursive procedure
when the control objectives change.
F. Searching process for cut/tie sets
There are two approaches, namely simulation-based and model-based. If the control objectives are
given in the form of transient characteristics, we can search for the cut/tie sets through off-line
simulation. If the control objectives are related to, or can be mapped to, the system models
(parameters), one can find the sets by examining the system characteristics from the system models.
In fact, this is the philosophy behind control system analysis. The main procedure is given as follows.
1. Transform the system into the standard set up.
2. For all expected failure scenarios, i.e. all possible combinations of basic failures, evaluate
whether the objective sets are satisfied, by simulation or by analyzing the system models.
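Step 2 of the procedure can be sketched as a brute-force search over failure combinations, with a toy success predicate standing in for the simulation or model-based check:

```python
from itertools import combinations

def find_minimal_cut_sets(components, system_ok):
    """Enumerate failure combinations in increasing size and keep
    those whose failure alone breaks the system. `system_ok(failed)`
    stands in for the simulation/model check of the objective set."""
    cuts = []
    for r in range(1, len(components) + 1):
        for combo in combinations(components, r):
            failed = set(combo)
            if any(c <= failed for c in cuts):
                continue  # a smaller cut set already covers this combo
            if not system_ok(failed):
                cuts.append(failed)
    return cuts

# Toy system: works iff "a" works AND at least one of "b", "c" works
ok = lambda failed: "a" not in failed and not {"b", "c"} <= failed
print(find_minimal_cut_sets(["a", "b", "c"], ok))
# minimal cut sets: {'a'} and {'b', 'c'}
```

Enumerating combinations by increasing size guarantees that every retained set is minimal, since any superset of an already-found cut set is skipped.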
MARKOV ANALYSIS
Markov analysis is a method used to forecast the value of a variable whose predicted value is
influenced only by its current state, and not by any prior activity. In essence, it predicts a random
variable based solely upon the current circumstances surrounding the variable.
Markov analysis is often used for predicting behaviors and decisions within large groups of people. It
was named after Russian mathematician Andrei Andreyevich Markov, who pioneered the study of
stochastic processes, which are processes that involve the operation of chance. Markov first applied
this method to predict the movements of gas particles trapped in a container.
KEY TAKEAWAYS
• Markov analysis is a method used to forecast the value of a variable whose predicted value is
influenced only by its current state.
• The primary advantages of Markov analysis are simplicity and out-of-sample forecasting
accuracy.
• Markov analysis is not very useful for explaining events, and it cannot be the true model of the
underlying situation in most cases.
• Markov analysis is useful for financial speculators, especially momentum investors.
Understanding Markov Analysis
The Markov analysis process involves defining the likelihood of a future action, given the current state
of a variable. Once the probabilities of future actions at each state are determined, a decision tree can
be drawn, and the likelihood of a result can be calculated.
Markov analysis has several practical applications in the business world. It is often employed to predict
the number of defective pieces that will come off an assembly line, given the operating status of the
machines on the line. It can also be used to predict the proportion of a company's accounts
receivable (AR) that will become bad debts.
Companies may also use Markov analysis to forecast future brand loyalty of current customers and
the outcome of these consumer decisions on a company's market share. Some stock price and option
price forecasting methods incorporate Markov analysis, too.
Unfortunately, Markov analysis is not very useful for explaining events, and it cannot be the true
model of the underlying situation in most cases. Yes, it is relatively easy to estimate conditional
probabilities based on the current state. However, that often tells one little about why something
happened.
Markov analysis is a valuable tool for making predictions, but it does not provide explanations.
In engineering, it is quite clear that knowing the probability that a machine will break down does not
explain why it broke down. More importantly, a machine does not really break down based on a
probability that is a function of whether or not it broke down today. In reality, a machine might break
down because its gears need to be lubricated more frequently.
In finance, Markov analysis faces the same limitations, but fixing problems is complicated by our
relative lack of knowledge about financial markets. Markov analysis is much more useful for
estimating the portion of debts that will default than it is for screening out bad credit risks in the first
place.
Markov analysis also allows the speculator to estimate that the probability the stock will outperform
the market for both of the next two days is 0.6 * 0.6 = 0.36 or 36%, given the stock beat the market
today. By using leverage and pyramiding, speculators attempt to amplify the potential profits from
this type of Markov analysis.
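The two-day calculation above generalizes to a transition matrix. In the minimal sketch below, only the 0.6 probability of beating the market again tomorrow comes from the text; the remaining transition entries are assumed for illustration:

```python
def step(dist, P):
    """One Markov step: the new distribution is dist * P, where P[i][j]
    is the probability of moving from state i to state j."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# States: 0 = stock beats the market, 1 = stock lags the market.
# Only P[0][0] = 0.6 comes from the text; the rest is assumed.
P = [[0.6, 0.4],
     [0.3, 0.7]]

# Probability of beating the market on BOTH of the next two days,
# given a beat today, uses only the 0 -> 0 transition:
print(0.6 * 0.6)  # 0.36

# Full distribution two days out, starting from "beat the market today":
dist = step(step([1.0, 0.0], P), P)
print([round(x, 2) for x in dist])  # [0.48, 0.52]
```

The 36% streak probability and the two-day distribution both depend only on today's state, which is exactly the Markov property described above.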
1. Fault Tree Analysis Summary:
Fault-Tree Analysis (FTA) is a graphical binary logic top-down technique that is used to
describe how a specific unwanted event in a system may be caused by the effects of a single failure
or combination of failures.
The specific unwanted event, such as an accident or explosion, is known
as the “top event”, whose definition is critical to the success of this type of analysis. The fault tree
is then constructed by relating sequences of events which individually or in combination could lead
to the top event. The linkages between faults are represented by Boolean logic gates, such as AND
or OR gates, which serve to permit or inhibit the flow of fault logic up the tree. These symbols
denote the relationship of the input events required for the output event. The tree is constructed by
deducing in turn the preconditions for the top event and then successively considering the next level
of events, and the next, until the basic causes are identified.
A fault tree can be used quantitatively to permit frequency or probability of the top event to be
calculated or it can be used qualitatively to identify combinations of basic events that are sufficient
to cause the top event; these are known as ‘cut sets’. Cut sets are identified using the technique
“Minimal Cut Set Analysis” (Lees 1996) which assigns a unique label to every base event on the
tree and shows all possible ways in which these can combine to lead to the major hazard event.
These are often shown as letter combinations, for example, A, AB, ABCD, CDFGH. These are
known as Single Event Cut Sets, Two Event Cut Sets, etc.
The significance of these is that single or two event cuts imply no or little safeguarding between
the initiating event and the top event, whereas 4 and 5 event cut sets do have multiple redundancy.
There are rules of thumb appropriate for major hazards: single or 2-event cut sets require
additional mitigation/safeguarding, whereas 5-event cut sets and higher are probably adequate.
Three and 4 event cut sets may require additional evaluation. Factors for evaluation include both
the number of safeguards and their quality or reliability.
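Once the minimal cut sets and basic-event probabilities are known, a first-order (rare-event) bound on the top-event probability can be computed. A sketch assuming independent basic events; the event labels and probabilities are hypothetical:

```python
def top_event_prob(cut_sets, p):
    """Rare-event upper bound on the top-event probability from
    minimal cut sets, assuming independent basic events:
    P(top) <= sum over cut sets of the product of event probabilities."""
    total = 0.0
    for cut in cut_sets:
        q = 1.0
        for event in cut:
            q *= p[event]
        total += q
    return total

# Hypothetical tree: a single-event cut {A} and a two-event cut {C, D}
p = {"A": 1e-4, "C": 1e-2, "D": 2e-2}
print(top_event_prob([{"A"}, {"C", "D"}], p))  # about 3e-4
```

Notice that the single-event cut set {A} contributes as much risk as the two-event set despite its much smaller event probability, which is why single-event cut sets attract the most scrutiny.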
Unlike FMEA, the technique has the flexibility to allow the consideration of human errors, as well
as permitting the modelling of equipment failures and external conditions, which can lead to an
accident.
FTA is generally applicable for almost every type of system-level risk assessment application, but
is used most effectively to address the fundamental causes of specific identified accidents likely to
be dominated by relatively complex combinations of events. It can be used to determine the root
causes that could lead to an accident so enabling preventative or mitigative measures to be
identified reducing the likelihood of the event.
1.1.3. Advantages, disadvantages and limitations to the defence sector or the particular domain
Advantages
• The technique is widely used and well accepted and can be used for cross-discipline system
analysis
• It is suitable for considering the many hazards that arise from a combination of adverse
circumstances
• It allows for the identification of common mode or common cause failures which may not
be apparent when considering sub-systems in isolation
• It is often the only technique that can generate credible likelihoods for novel, complex
systems
• Human errors can be included in the analysis
• It can be used both qualitatively and quantitatively depending on what is required from the
analysis
• It provides a clear and logical form of presentation to non-specialist users, provided an
appropriate level of detail of the tree is used.
Disadvantages
• The diagrammatic format discourages analysts from stating explicitly the assumptions
and conditional probabilities for each gate.
• This can be overcome by careful back-up text documentation.
• FTA can become time-consuming and complicated for large systems
• The technique examines only one specific top event.
• Additional FTAs must be developed to analyse other top events
• Analysts may overlook failure modes and fail to recognise common cause failures (i.e.
a single fault affecting two or more safeguards) unless they have a high level of expertise
and work jointly with the operator
• Manual FTA assumes all events are independent; however, the more sophisticated
computer software packages can cater for dependent combinations of events
• Due to its wide use there can be a temptation to read across data from ARM or ILS
projects where, for example, the fault-tree technique has been used. As a consequence,
the safety perspective can be lost if human error has been excluded and the focus has
been solely on determining faults and not on more far-reaching safety issues
In series configurations, the overall system reliability decreases as the number of components increases, and it is dominated by the least reliable component. In parallel configurations, the system reliability increases with more components, and it is heavily influenced by the most reliable component, as the system fails only if all components fail simultaneously.
In series configurations, as the number of components increases, overall system reliability decreases, necessitating higher individual component reliability to maintain acceptable system reliability. In parallel configurations, increasing the number of components enhances system reliability, allowing for the use of lower reliability components while achieving high system reliability. These relationships guide design strategies: in series systems, the focus is on enhancing individual component reliability, while in parallel systems, the emphasis is on increasing redundancy.
Cut/tie set methods support reliability evaluation in networked systems by transforming dynamic system structures into manageable reliability block diagrams. Cut sets identify ways a system can fail, while tie sets define conditions under which the system functions. These methods allow for systematic analysis and update of system reliability, even as requirements or conditions change, thereby aiding in consistent evaluation and improvement of system reliability.
In an independent series system, the system reliability is the product of the reliabilities of all components, meaning Rs = R1 × R2 × ... × Rn. For an independent parallel system, the system unreliability is the product of the unreliabilities of all components, so the reliability is Rs = 1 - ((1-R1) × (1-R2) × ... × (1-Rn)). This reflects the fundamental difference where series systems require all components to succeed for system success, while parallel systems require only one component to succeed.
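These two formulas can be checked directly; a small sketch with illustrative component reliabilities:

```python
from math import prod

def series_reliability(rs):
    """All components must succeed: Rs = R1 * R2 * ... * Rn."""
    return prod(rs)

def parallel_reliability(rs):
    """System fails only if all components fail: Rs = 1 - prod(1 - Ri)."""
    return 1.0 - prod(1.0 - r for r in rs)

r = [0.9, 0.8, 0.95]
print(round(series_reliability(r), 4))    # 0.684
print(round(parallel_reliability(r), 4))  # 0.999
```

The same three components give a series reliability well below the weakest component but a parallel reliability above the strongest one, illustrating the contrast described above.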
In a series configuration, increasing the reliability of the least reliable component results in the greatest improvement in the overall system reliability because the system's reliability is limited by its weakest component. Conversely, in a parallel configuration, increasing the reliability of the most reliable component yields the greatest improvement because even if one component fails, the most reliable component is likely to function longer, enhancing the overall system reliability.
Redundancy in parallel systems is critical because it allows for failure of individual components without compromising the entire system's functionality. This is particularly important in mission-critical applications, such as aerospace and RAID systems, where system failure can have severe consequences. Adding redundancy improves overall system reliability by mitigating the impact of single component failures.
The concept of minimal cut and tie sets is crucial because it reduces the complexity of reliability assessment by focusing on the most critical failure and success paths, respectively. This simplification facilitates efficient analysis and recalibration of reliability models in dynamic control systems, enabling accurate evaluation and targeted improvement of reliability without exhaustive recalculations.
High system reliability can be achieved with low-reliability components when these components are configured in parallel. In parallel configurations, because only one component needs to succeed for the entire system to succeed, adding more components increases the probability that at least one component will function properly, thereby increasing overall system reliability. This concept is helpful in designing systems where using components with low individual reliability is more feasible or economical.
The reliability block diagram is a graphical representation that shows how components are connected, either in series, parallel, or a combination, and how these connections affect overall system performance. By clearly outlining the relationships between component functionalities and system outcomes, it helps in visualizing potential weak points and the impact of configuration changes on reliability, thus facilitating the assessment and optimization of system performance.
Failure Detection and Isolation (FDI) enhances reliability assessment by identifying component failures in real-time, allowing for immediate adjustment of cut/tie sets to reflect these failures. This real-time adjustment ensures that reliability calculations remain accurate under changing conditions and helps to maintain or restore system reliability rapidly by diagnosing failures and potentially redirecting or relaxing system requirements in response.