2017 Reliability Engineering - Theory and Practice PDFDrive


1 Basic Concepts, Quality & Reliability (RAMS) Assurance of Complex Equipment & Systems

Considering that complex equipment and systems are generally repairable, contain
redundancy, and must be safe, the term reliability often stands for reliability,
availability, maintainability, and safety. RAMS is used in this book to point this out
wherever necessary. The purpose of reliability engineering is to develop methods and
tools to assess RAMS figures of components, equipment & systems, as well as to
support development and production engineers in building in these characteristics.
To be cost and time effective, reliability (RAMS) engineering must be integrated
in the project activities, support quality assurance & concurrent engineering
efforts, and be performed without bureaucracy. This chapter introduces basic concepts,
shows their relationships, and discusses the tasks necessary to assure quality
and reliability (RAMS) of complex equipment & systems with high quality and
reliability (RAMS) requirements. A comprehensive list of definitions with comments is
given in Appendix A1. Standards for quality and reliability (RAMS) assurance are
discussed in Appendix A2. Refinements of management aspects are in Appendices
A3 - A5. Risk management is considered in Sections 1.2.7 and 6.11.

1.1 Introduction
Until the 1950's, quality targets were deemed to have been reached when the item
considered was found to be free of defects and systematic failures at the time it left
the manufacturer. +) The growing complexity of equipment and systems, as well as
the rapidly increasing cost incurred by loss of operation as a consequence of failures,
have brought the aspects of reliability, availability, maintainability, and safety to the
forefront. The expectation today is that complex equipment & systems are not only
free from defects and systematic failures at time t = 0 (when starting operation), but
also perform the required function failure-free for a stated time interval and have a
fail-safe behavior, i. e. preserve safety, in case of critical (or catastrophic) failures.
However, the question of whether a given item will operate without failures during
a stated period of time cannot be answered by yes or no, based on an acceptance test.
__________________
+) No distinction is made in this book between defect and nonconformity (except pp. 388, 394, 395);
however, defect has a legal connotation and has to be used when dealing with product liability.

© Springer-Verlag GmbH Deutschland 2017


A. Birolini, Reliability Engineering, DOI 10.1007/978-3-662-54209-5_1

Experience shows that only a probability for this occurrence can be given. This
probability is a measure of the item's reliability and can be interpreted as follows:
if n statistically identical and independent items are put into operation at
time t = 0 to perform a given mission and n̄ ≤ n of them accomplish it
successfully, then the ratio n̄ / n is a random variable which converges for
increasing n to the true value of the reliability (Eq. (A6.146) on p. 472).
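The convergence of n̄ / n can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: failure-free times are taken as exponentially distributed (a model introduced later in Section 1.2.3), and the failure rate, mission duration, and function name are hypothetical values chosen for the example.

```python
import math
import random

def simulate_success_ratio(n, failure_rate, mission_time, seed=0):
    """Return n_bar / n, the fraction of n simulated items that survive the mission.

    Assumption: failure-free times are exponentially distributed.
    """
    rng = random.Random(seed)
    n_bar = sum(1 for _ in range(n) if rng.expovariate(failure_rate) > mission_time)
    return n_bar / n

FAILURE_RATE = 1e-3   # assumed constant failure rate in 1/h (illustrative)
MISSION_TIME = 100.0  # assumed mission duration in h (illustrative)
true_reliability = math.exp(-FAILURE_RATE * MISSION_TIME)

for n in (10, 1_000, 100_000):
    ratio = simulate_success_ratio(n, FAILURE_RATE, MISSION_TIME)
    print(f"n = {n:>6}: n_bar/n = {ratio:.4f} (true R = {true_reliability:.4f})")
```

For small n the ratio scatters noticeably around the true value; the scatter shrinks as n grows, which is why a single acceptance test cannot answer the reliability question with yes or no.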
Performance parameters as well as reliability, availability, maintainability, and safety
have to be built in during design & development and retained during production and
operation of the item. After the introduction of some important concepts in Section
1.2, Section 1.3 gives basic tasks and rules for quality and reliability (RAMS) as-
surance of complex equipment & systems (refinements are in Appendices A1-A5).

1.2 Basic Concepts


This section introduces important concepts used in reliability (RAMS) engineering
and shows their relationships (see Appendix A1 for a more complete list).

1.2.1 Reliability
Reliability is a characteristic of the item, expressed by the probability that it will
perform its required function under given conditions for a stated time interval.
It is generally designated by R . From a qualitative point of view, reliability can
also be defined as the ability of the item to remain functional. Quantitatively,
reliability specifies the probability that no operational interruption will occur
during a stated time interval. This does not mean that redundant parts may not fail;
such parts can fail and be repaired on-line (i. e. without operational interruption at
item (system) level). The concept of reliability thus applies to nonrepairable as well
as to repairable items (Chapters 2 and 6, respectively). To make sense, a numerical
statement on reliability (e. g. R = 0.9 ) must be accompanied by a clear definition of
the required function, environmental, operation & maintenance conditions,
as well as the mission duration and the state of the item at mission begin
(often tacitly assumed new or as-good-as-new).
An item is a functional or structural unit of arbitrary complexity (e. g. component
(part, device), assembly, equipment, subsystem, system) that can be considered as an
entity for investigations. +) It may consist of hardware, software, or both, and may
also include human resources. Assuming ideal human aspects & logistic support,
technical system should be preferred; however, system is often used for simplicity.
__________________
+) System refers in this book (as often in practice) to the highest integration level of the item considered.

The required function specifies the item's task; i. e. for given inputs, the item outputs
have to be constrained within specified tolerance bands (performance parameters
should always be given with tolerances). The definition of the required function is
the starting point for any reliability analysis, as it defines failures.
Operating conditions have an important influence on reliability, and must be
specified with care. Experience shows, for instance, that the failure rate of
semiconductor devices will double for an operating temperature increase of 10 to 20 °C.
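The doubling rule quoted above can be turned into a rough scaling calculation. The sketch below assumes a pure power-of-two law around a reference temperature, which is a simplification made for illustration only; the reference failure rate, the temperatures, and the function name are hypothetical.

```python
def failure_rate_at(temp_c, ref_rate, ref_temp_c, doubling_step_c):
    """Scale a reference failure rate, doubling it every `doubling_step_c` degrees C.

    Assumption: a simple 2 ** (delta_T / step) law, used here only as a rule of thumb.
    """
    return ref_rate * 2.0 ** ((temp_c - ref_temp_c) / doubling_step_c)

REF_RATE = 1e-8    # assumed failure rate in 1/h at the reference temperature
REF_TEMP_C = 40.0  # assumed reference temperature in degrees C

for step in (10.0, 20.0):  # the text gives 10 to 20 degrees C per doubling
    rate = failure_rate_at(70.0, REF_RATE, REF_TEMP_C, step)
    print(f"doubling every {step:.0f} C: lambda(70 C) = {rate:.2e} 1/h")
```

The spread between the two results shows how sensitive a prediction is to the assumed doubling interval, which is one reason operating conditions have to be specified with care.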
The required function and / or operating conditions can be time dependent. In
these cases, a mission profile has to be defined and all reliability figures will be
referred to it. A representative mission profile and the corresponding reliability
targets should be given in the item's specifications.
Often the mission duration is considered as a parameter t; the reliability function
R(t) is then the probability that no failure at item (system) level will occur in (0, t];
however, the item's condition at t = 0 influences final results, and to take care of
this, reliability figures at system level will have in this book (starting from
Section 2.2.6, except Chapter 7 & Appendix A6 to simplify notation) indices
Si (e. g. R_Si(t)), where S stands for system (footnote on p. 2) and i is the
state Z_i entered at t = 0 (up state for reliability), with i = 0 for system new or
as-good-as-new, as often tacitly assumed at t = 0 (see the footnote on p. 512).
A distinction between predicted and estimated reliability is important. The first
is calculated on the basis of the item’s reliability structure and the failure & repair
rates of its components (Chapters 2 & 6); the second is obtained from a statistical
evaluation of reliability tests or from field data by stated conditions (Chapter 7).
The concept of reliability can be extended to processes and services, although
human aspects can lead to modeling difficulties (Sections 1.2.7, 5.2.5, 6.10, 6.11).

1.2.2 Failure
A failure occurs when the item stops performing its required function. As simple as
this definition is, it can become difficult to apply it to complex items. The
failure-free time, used in this book for failure-free operating time,
is generally a random variable. It is often reasonably long; but it can be very short,
for instance because of a failure caused by a transient event at turn-on. A general
assumption in investigating failure-free times is that at t = 0 the item is new or
as-good-as-new and free of defects & systematic failures. Besides their frequency,
failures should be classified according to the mode, cause, effect, and mechanism:
1. Mode: The mode of a failure is the symptom (local effect) by which a failure is
observed; e. g. open, short, drift, functional faults for electronic, and brittle
fracture, creep, buckling, fatigue for mechanical components or parts.
2. Cause: The cause of a failure can be intrinsic, due to weaknesses in the item
and / or wear-out, or extrinsic, due to errors, misuse or mishandling during the
design, production, or use. Extrinsic causes often lead to systematic failures,
which are deterministic and should be considered like defects (dynamic defects
in software quality). Defects are present at t = 0; failures always appear in
time, even if the time to failure is short, as it can be with systematic or early failures.
3. Effect: The effect (consequence) of a failure can be different if considered on
the item itself or at higher level. A usual classification is: non relevant, minor,
major, critical (affecting safety). Since a failure can also cause further failures,
distinction between primary and secondary failure is important.
4. Mechanism: Failure mechanism is the physical, chemical, or other process
that leads to a failure (see Table 3.5 on p. 103 for some examples).
Failures can also be classified as sudden or gradual. As failure is not the only cause
for the item being down, the general term used to define the down state (not caused
by preventive maintenance, other planned actions, or lack of external resources)
is fault. Fault is a state of the item and can be due to a defect or a failure.

1.2.3 Failure Rate, MTTF, MTBF


The failure rate plays an important role in reliability analysis. This Section intro-
duces it heuristically, see Appendix A6.5 for an analytical derivation.
Let us assume that n statistically identical, new, and independent items are put
into operation at time t = 0 under the same conditions, and at the time t a subset
ν( t ) of these items have not yet failed. ν(t ) is a right continuous decreasing step
function (Fig. 1.1). t1 ,..., t n , measured from t = 0 , are the observed failure-free
times (operating times to failure) of the n items considered. They are independent
realizations of a random variable τ, hereafter identified as failure-free time, and
must not be confused with arbitrary points on the time axis (t1*, t2*, ...). The quantity
$$\hat{E}[\tau] = (t_1 + \ldots + t_n) / n \tag{1.1}$$
is the empirical mean (empirical expected value) of τ. +) For n → ∞, Ê[τ] converges
to the true mean E[τ] = MTTF given by Eq. (1.8) (Eq. (A6.147)). The function
$$\hat{R}(t) = \nu(t) / n \tag{1.2}$$
is the empirical reliability function, which converges to R(t) for n → ∞ (Eq. (A8.5)).
For an arbitrary time interval (t, t + δt], the empirical failure rate is defined as
$$\hat{\lambda}(t) = \frac{\nu(t) - \nu(t + \delta t)}{\nu(t) \, \delta t} . \tag{1.3}$$
λ̂(t) δt is the ratio of the items failed in the interval (t, t + δt] to the number of items
still operating (or surviving) at time t. Applying Eq. (1.2) to Eq. (1.3) yields
__________________
+) Empirical quantities are statistical estimates, marked with ˆ in this book.
Figure 1.1 Number ν(t) of (nonrepairable) items still operating at time t (step function starting at n for t = 0 and decreasing by one at each failure time t1, t2, t3, ...)

$$\hat{\lambda}(t) = \frac{\hat{R}(t) - \hat{R}(t + \delta t)}{\delta t \, \hat{R}(t)} . \tag{1.4}$$
For R(t) derivable, n → ∞ & δt → 0, λ̂(t) converges to the (instantaneous) failure rate
$$\lambda(t) = \frac{- \, dR(t) / dt}{R(t)} . \tag{1.5}$$

Considering R(0) = 1 (at t = 0 all items are new), Eq. (1.5) leads to
$$R(t) = e^{-\int_0^t \lambda(x) \, dx} , \quad (\text{for } R(0) = 1) . \tag{1.6}$$
The failure rate λ( t ) defines thus completely the reliability function R (t ) of a
nonrepairable item. However,
considering Eq. (2.10) on p. 39 or Eq. (A6.25) on p. 441, λ( t ) can also be
defined for repairable items which are as-good-as-new after repair, taking
instead of t the variable x starting by 0 after each repair (renewal), as for
interarrival times (see e. g. pp. 41, 390); extension which is necessary when
investigating repairable systems in Chapter 6 (see e. g. Fig. 6.8 on p. 197).
In this case, for an item (system) with more than one element, as-good-as-new at
item (system) level is only given either if all elements have constant (time inde-
pendent) failure rates or at each failure also all not failed elements with time
dependent failure rate are renewed to as-good-as-new (see pp. 390 - 91, 442, 443).
If a repairable item (system) cannot be restored to be as-good-as-new after repair,
failure intensity z ( t ) must be used (pp. 389, 390 - 91, 540). Use of hazard rate or
force of mortality for λ( t ) should be avoided.
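The empirical quantities of Eqs. (1.1) to (1.3) can be computed directly from a list of observed failure-free times. The following is a minimal sketch; the function names and the eight failure-free times are hypothetical and serve only to illustrate the definitions.

```python
def empirical_mean(failure_times):
    """Eq. (1.1): arithmetic mean of the observed failure-free times."""
    return sum(failure_times) / len(failure_times)

def surviving(failure_times, t):
    """nu(t): number of items that have not yet failed at time t (right continuous)."""
    return sum(1 for ti in failure_times if ti > t)

def empirical_reliability(failure_times, t):
    """Eq. (1.2): fraction of the n items still operating at time t."""
    return surviving(failure_times, t) / len(failure_times)

def empirical_failure_rate(failure_times, t, dt):
    """Eq. (1.3): items failed in (t, t + dt] per surviving item and per unit time."""
    nu_t = surviving(failure_times, t)
    if nu_t == 0:
        raise ValueError("no item is still operating at time t")
    return (nu_t - surviving(failure_times, t + dt)) / (nu_t * dt)

# Hypothetical failure-free times in hours (illustrative data, not from the book).
times_h = [120.0, 340.0, 560.0, 610.0, 900.0, 1250.0, 1400.0, 2100.0]

print("empirical mean    :", empirical_mean(times_h), "h")
print("R_hat(500 h)      :", empirical_reliability(times_h, 500.0))
print("lambda_hat(500 h) :", empirical_failure_rate(times_h, 500.0, 500.0), "1/h")
```

With so few items the estimates are coarse; the convergence statements in the text apply for n → ∞ and δt → 0.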
In many practical applications, λ(x) = λ can be assumed. Eq. (1.6) then yields
$$R(t) = e^{-\lambda t} , \quad (\text{for } \lambda(x) = \lambda, \; t > 0, \; R(0) = 1) , \tag{1.7}$$
and the failure-free time τ > 0 is exponentially distributed (F(t) = Pr{τ ≤ t} = 1 − e^(−λt));
for this, and only in this case, the failure rate λ can be estimated by
λ̂ = k / T, where T is a given (fixed) cumulative operating time and k ≥ 0
the total number of failures during T (Eqs. (7.28) & (A8.46)).
The mean (expected value) of the failure-free time τ > 0 is given by (Eq. (A6.38))
$$MTTF = E[\tau] = \int_0^\infty R(t) \, dt , \quad (\text{for } MTTF < \infty) , \tag{1.8}$$
where MTTF stands for mean time to failure. For λ(x) = λ it follows MTTF = 1/λ.
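The relation MTTF = 1/λ for a constant failure rate can be checked numerically from Eqs. (1.7) and (1.8). The sketch below integrates R(t) with the trapezoidal rule over a truncated interval; the truncation point, the step count, the numerical values, and the function names are assumptions chosen for the illustration.

```python
import math

def reliability(t, lam):
    """Eq. (1.7): R(t) = exp(-lam * t) for a constant failure rate lam."""
    return math.exp(-lam * t)

def mttf_numeric(lam, horizon_factor=50.0, steps=200_000):
    """Eq. (1.8) by the trapezoidal rule, truncating the integral at horizon_factor / lam."""
    upper = horizon_factor / lam
    h = upper / steps
    total = 0.5 * (reliability(0.0, lam) + reliability(upper, lam))
    total += sum(reliability(i * h, lam) for i in range(1, steps))
    return total * h

def estimate_failure_rate(k, cumulative_time):
    """lambda_hat = k / T; meaningful only if the failure rate is constant."""
    return k / cumulative_time

LAM = 2e-4  # assumed constant failure rate in 1/h (illustrative)

print("MTTF by integration:", mttf_numeric(LAM), "h")
print("MTTF = 1 / lambda  :", 1.0 / LAM, "h")
print("lambda_hat for k = 4 failures in T = 20000 h:", estimate_failure_rate(4, 20_000.0), "1/h")
```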
A constant (time independent) failure rate λ is often considered also for repairable
items (pp. 390 - 91, Chapter 6). Assuming item (system) as-good-as-new after
each repair (renewal), consecutive failure-free times are then independent random
variables, exponentially distributed with parameter λ and mean 1/λ. In practice,
$$1/\lambda \equiv MTBF \quad (\text{for } \lambda(x) = \lambda, \; x \text{ starting by 0 after each repair / renewal}) \tag{1.9}$$
is often tacitly used, where MTBF stands for mean operating time between failures,
expressing a figure applicable to repairable items (systems). Considering thus
the common usage of MTBF, the statistical estimate $\widehat{MTBF}$ = T / k used in
practical applications (see e. g. [1.22, A2.5, A2.6 (HDBK-781)]) but valid only
for λ(x) = λ (p. 330), and to avoid misuses, MTBF should be confined to repairable
items with λ(x) = λ, i. e. to MTBF ≡ 1/λ as in this book (pp. 392 - 93).
However, at component level MTBF = 10⁸ h for λ = 10⁻⁸ h⁻¹ has no practical significance.
Moreover, for systems described by Markov or semi-Markov processes in steady-state
or for t → ∞, MUT_S (system mean up time) applies, and the use of MTBF_S
instead of MUT_S should be avoided (see pp. 393, 394 & 279 and e. g. Eq. (6.95)).
The failure rate of a large population of statistically identical and independent
items often exhibits a typical bathtub curve (Fig. 1.2) with the following 3 phases:
1. Early failures: λ(t) decreases (in general) rapidly with time; failures in this
phase are attributable to randomly distributed weaknesses in materials,
components, or production processes.
2. Failures with constant (or nearly so) failure rate: λ(t) is approximately
constant; failures in this period are Poisson distributed and often sudden.
3. Wear-out failures: λ(t) increases with time; failures in this period are attributable
to aging, wear-out, fatigue, etc. (e. g. corrosion or electromigration).
Early failures are not deterministic and appear in general randomly distributed in
time and over the items. During the early failure period, λ(t) does not necessarily
decrease as in Fig. 1.2; in some cases it can oscillate. To eliminate early failures,
Figure 1.2 Typical shape for the failure rate of a large population of statistically identical and independent (nonrepairable) items (dashed is a possible shift for a higher stress, e. g. ambient temperature); λ(t) is plotted versus t for the stress levels θ1 and θ2 > θ1, with the phases 1., 2. (λ(t) ≈ λ), and 3. marked

burn-in or environmental stress screening is used (Chapter 8). However,
early failures must be distinguished from defects and systematic failures,
which are present at t = 0, deterministic, caused by errors or mistakes,
and whose elimination requires a change in design, production process,
operational procedure, documentation or other.
The length of the early failure period varies in practice from a few hours to some 1,000 h. The
presence of a period with constant (or nearly so) failure rate λ (t ) ≈ λ is realistic for
many equipment & systems, and useful for calculations. The memoryless property,
which characterizes this period, leads to exponentially distributed failure-free times
and to time-homogeneous Markov processes for the time behavior of repairable sys-
tems if also constant repair rates can be assumed (Chapter 6). An increasing failure
rate after a given operating time ( > 10 years for many electronic equipment) is typi-
cal for most items and appears because of degradation phenomena due to wear-out.
A possible explanation for the shape of λ(t) given in Fig. 1.2 is that the population
contains n p_f weak elements and n (1 − p_f) good ones. The distribution of
the failure-free time can then be expressed by F(t) = p_f F_1(t) + (1 − p_f) F_2(t), where
F_1(t) can be a gamma (β < 1) and F_2(t) a shifted Weibull (β > 1) distribution (Eqs.
(A6.34), (A6.96), (A6.97)), see also pp. 350, 367 & 483 for alternative possibilities.
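The mixture F(t) = p_f F_1(t) + (1 − p_f) F_2(t) can be evaluated numerically to see how a decreasing and later increasing failure rate arises. In the sketch below, F_1 is taken as a Weibull distribution with β < 1 instead of the gamma distribution named in the text, purely to keep the example within the Python standard library; this substitution, the parametrization 1 − exp(−(λ(t − ψ))^β), and all parameter values are assumptions made for illustration.

```python
import math

def weibull_cdf(t, lam, beta, shift=0.0):
    """Distribution function 1 - exp(-(lam * (t - shift)) ** beta), zero up to the shift."""
    if t <= shift:
        return 0.0
    return 1.0 - math.exp(-((lam * (t - shift)) ** beta))

def weibull_pdf(t, lam, beta, shift=0.0):
    """Density belonging to weibull_cdf (derivative with respect to t)."""
    if t <= shift:
        return 0.0
    x = lam * (t - shift)
    return beta * lam * x ** (beta - 1.0) * math.exp(-(x ** beta))

def mixture_failure_rate(t, p_f, weak, good):
    """lambda(t) = f(t) / (1 - F(t)) for F = p_f * F1 + (1 - p_f) * F2."""
    f = p_f * weibull_pdf(t, *weak) + (1.0 - p_f) * weibull_pdf(t, *good)
    big_f = p_f * weibull_cdf(t, *weak) + (1.0 - p_f) * weibull_cdf(t, *good)
    return f / (1.0 - big_f)

P_F = 0.05                   # assumed fraction of weak items
WEAK = (1e-3, 0.5, 0.0)      # assumed (lam in 1/h, beta < 1, shift in h) for F1
GOOD = (1e-5, 3.0, 20000.0)  # assumed (lam in 1/h, beta > 1, shift in h) for F2

for t in (10.0, 1_000.0, 10_000.0, 60_000.0, 100_000.0):
    rate = mixture_failure_rate(t, P_F, WEAK, GOOD)
    print(f"t = {t:>8.0f} h: lambda(t) = {rate:.2e} 1/h")
```

With these values the failure rate falls while the weak items fail, stays low for a while, and rises once the shifted wear-out distribution of the good items sets in. Other parameter choices change the depth and length of the low region.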
The failure rate strongly depends upon the item's operating conditions, see e. g.
Figs. 2.4 - 2.6 and Table 2.3. Typical figures for λ are 10⁻¹⁰ to 10⁻⁷ h⁻¹ for
electronic components at 40 °C, doubling for a temperature increase of 10 to 20 °C.
From Eqs. (1.3) - (1.5) one recognizes that for an item new at t = 0 and δ t → 0 ,
λ ( t ) δ t is the conditional probability for failure in (t , t +δ t ] given that the item has
not failed in (0 , t ] . Thus, λ( t ) is not a density per Eq. (A6.23) and must be clearly
distinguished from the density f (t ) of the failure-free time ( f( t ) δ t is the un-
conditional probability for failure in (t , t +δ t ] ), from the failure intensity z (t ) of an
arbitrary point process (Eqs. (A7.228)), and from the intensity h(t) or m(t) of a renewal
or Poisson process (Eqs. (A7.24), (A7.193)); this also for the case of a homogeneous
Poisson process, see pp. 390 - 91, 442, 482, 540 for further considerations.
The concept of failure rate applied to humans yields a shape as in Fig. 1.2.

1.2.4 Maintenance, Maintainability


Maintenance defines the set of actions performed on the item to retain it in or to
restore it to a specified state. Maintenance deals thus with preventive maintenance,
carried out at predetermined intervals e. g. to reduce wear-out failures, and corrective
maintenance, carried out at failure and intended to bring the item to a state in which
it can perform the required function. Aim of a preventive maintenance must also be
to detect & repair hidden failures & defects (e. g. undetected failures in redundant
elements). Corrective maintenance, also known as repair, includes detection, lo-
calization, correction, and checkout. By neglecting logistic & administrative delays,
repair is used in this book for restoration (pp. 113 & 387).
To simplify calculations, it is generally assumed that
the element (in the reliability block diagram) for which a maintenance
action has been performed, is as-good-as-new after maintenance (the
case of minimal repair is critically discussed on pp. 138 - 39, 535 -36);
this assumption is valid with respect to the state Z_i considered (at system level)
if and only if Z_i is a regeneration state (footnote on p. 386).
Maintainability is a characteristic of the item, expressed by the probability that
a preventive maintenance or a repair of the item will be performed within a stated
time interval for given procedures and resources (skill level of personnel, spare
parts, test facilities, etc.). From a qualitative point of view, maintainability can also
be defined as the ability of the item to be retained in or restored to a specified state.
The mean of the repair time is denoted by MTTR (mean time to repair (restoration)),
that of a preventive maintenance by MTTPM. Maintainability has to be built into
complex equipment & systems during design and development by realizing a
maintenance concept (Section 4.2). Due to the increasing maintenance cost, main-
tainability aspects have grown in importance. However, maintainability achieved in
the field largely depends on resources available for maintenance and on the correct
installation of the equipment or system, i. e. on logistic support and accessibility.

1.2.5 Logistic Support


Logistic support designates all actions undertaken to provide effective and eco-
nomical use of the item during its operating phase. To be effective, logistic support
should be integrated in the maintenance concept of the item, incl. after-sales service.
An emerging aspect related to maintenance and logistic support is that of
obsolescence management; i. e. how to assure functionality over a long operating
period (e. g. 20 years) when technology is rapidly evolving and components neces-
sary for maintenance are no longer manufactured. Care has to be given here to de-
sign aspects, to assure interchangeability during the equipment’s useful life without
important redesign (standardization has been started [1.5, 1.11, A2.6 (IEC 62402)]).

1.2.6 Availability
Availability is a broad term, expressing the ratio of delivered to expected service. It
is often designated by A and used for the asymptotic & steady-state value of the
point & average availability (PA = AA ) . Point availability (PA( t )) is a characteristic
of the item expressed by the probability that the item will perform its required func-
tion under given conditions at a stated instant of time t. From a qualitative point of
view, point availability can also be defined as the ability of the item to perform its
required function under given conditions at a stated instant of time (dependability).
Availability calculation is often difficult, as human aspects & logistic support
have to be considered in addition to reliability and maintainability (Fig. 1.3). Ideal
human aspects & logistic support are often assumed, yielding the intrinsic
availability. In this book, availability is generally used for intrinsic availability.
Further assumptions for calculations are continuous operation and complete renewal
of the repaired element (assumed as-good-as-new after repair, see p. 386). In this
case, the point availability PA( t ) of the one-item structure rapidly converges to an
asymptotic & steady-state value, given by
PA = MTTF / (MTTF + MTTR ) . (1.10)
PA is also the asymptotic & steady-state value of the average availability (AA), giving
the mean percentage of time during which the item performs its required function
(PA_S(t), PA_S & AA_S are used on p. 185 for the one-item structure, with S for
system as per footnote on p. 2). For systems described by Markov or semi-Markov
processes, MUT_S & MDT_S are used instead of MTTF & MTTR (pp. 279, 393 - 94,
516, 525). Other availability measures can be defined, e. g. mission, work-mission,
overall availability (Sections 6.2.1.5, 6.8.2). Application specific figures are also
known, see e. g. [6.12]. In contrast to reliability analyses for which no failure at
system level is allowed (only redundant parts can fail and be repaired on-line),
availability analyses allow failures at system level.
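Equation (1.10) lends itself to a quick numerical illustration. The sketch below computes the steady-state point availability of a one-item structure and the corresponding mean down time per year; the MTTF and MTTR values and the function name are hypothetical, and continuous operation with as-good-as-new repair is assumed as in the text.

```python
def steady_state_availability(mttf, mttr):
    """Eq. (1.10): PA = MTTF / (MTTF + MTTR) for a one-item structure."""
    if mttf <= 0 or mttr < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf / (mttf + mttr)

HOURS_PER_YEAR = 8760.0  # continuous operation assumed

pa = steady_state_availability(mttf=5000.0, mttr=4.0)  # illustrative values in hours
mean_downtime_per_year = (1.0 - pa) * HOURS_PER_YEAR

print(f"PA = {pa:.6f}")
print(f"mean down time per year = {mean_downtime_per_year:.2f} h")
```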

1.2.7 Safety, Risk, Risk Acceptance


Safety is the ability of the item to cause neither injury to persons, nor unacceptable
consequences to material, environment or other, during its use. The following 2 aspects,
safety when the item functions and is operated correctly,
and safety when the item (or a part of it) has failed,
must be considered. The first, dealing with accident prevention, is covered by inter-
national standards; the 2nd, dealing with technical safety, is investigated in 5 steps,
identify critical failures & hazards, their causes, their effects, classify
effects, and investigate possibilities to avoid causes or mitigate effects,
using similar tools as for reliability (Sections 2.6, 6.10, 6.11). However, while
safety assurance examines measures which can bring the item in a safe state at
failure (fail-safe procedure), reliability assurance aims to minimize the number of
failures. Moreover, for technical safety, effects of external events (human errors,
natural catastrophes, attacks, etc.) are important and must be considered carefully
(Sections 6.10, 6.11). The safety level of the item also influences the number of
product liability claims. However, increasing safety can reduce reliability.
Closely related to the concept of safety are those of risk, risk management, and
risk acceptance. Experience shows that risk problems are generally interdisciplinary
and have to be solved in close cooperation between engineers and sociolo-
gists, in particular, to find common solutions to controversial questions.
For risk evaluation, a weighting between probability of occurrence and effect (con-
sequence) of a given accident / disaster is often used, and the multiplicative rule is
one among other possibilities (see e. g. [1.3, 2.82]). Also it is necessary to consider
the different causes (machine, machine & human, human) and effects (location,
time, involved people, effect duration) of an accident / disaster. Statistical tools can
support risk assessment. However, although the behavior of a homogeneous human
population is often known, experience shows that the reaction of a single person can
become unpredictable. Similar difficulties also arise in the evaluation of rare events
in complex systems. Risk analysis and risk mitigation are generally performed with
tools given in Sections 2.6, 6.8, 6.9; see also Sections 6.10 & 6.11 for new models.
Basically, considerations on risk and risk acceptance should take into account
that the probability p_1 for a given accident / disaster which can be caused by one of
n statistically identical and independent items (systems), each of them with
occurrence probability p, is for n p small (n → ∞, p → 0) nearly equal to n p as per
$$p_1 = n p (1 - p)^{n-1} \approx n p \, e^{-n p} \approx n p (1 - n p) \approx n p . \tag{1.11}$$
Equation (1.11) follows from the binomial distribution and Poisson approximation
(Eqs. (A6.120), (A6.129)). It also applies with n p = λ_tot T to the case in which one
assumes that the accident / disaster occurs randomly in the interval (0, T], caused by
one of n independent items (systems) with failure rates λ_1, …, λ_n (λ_tot = λ_1 + … + λ_n).
This is because the sum of n independent Poisson processes is again a Poisson process
(Eq. (7.27)) and the probability λ_tot T e^(−λ_tot T) for one failure in the interval (0, T]
is nearly equal to λ_tot T. Thus, for n p << 1 or λ_tot T << 1 it holds that
$$p_1 \approx n p \approx (\lambda_1 + \ldots + \lambda_n) T . \tag{1.12}$$
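The quality of the approximations in Eqs. (1.11) and (1.12) can be checked numerically. The sketch below compares the exact binomial expression with the three approximations for one illustrative pair (n, p), and evaluates the Poisson expression for a hypothetical set of failure rates; all numbers and function names are assumptions chosen for the example.

```python
import math

def p_exactly_one(n, p):
    """Left side of Eq. (1.11): binomial probability that exactly one of n items causes the event."""
    return n * p * (1.0 - p) ** (n - 1)

def approximations(n, p):
    """The three successive approximations of Eq. (1.11)."""
    n_p = n * p
    return {
        "poisson": n_p * math.exp(-n_p),
        "first_order": n_p * (1.0 - n_p),
        "n_p": n_p,
    }

def p_one_failure(failure_rates, t_total):
    """Probability lambda_tot * T * exp(-lambda_tot * T) for one failure in (0, T]."""
    lam_tot = sum(failure_rates)
    return lam_tot * t_total * math.exp(-lam_tot * t_total)

N, P = 1_000, 1e-6  # illustrative values with n p = 1e-3 << 1
print("exact:", p_exactly_one(N, P))
for name, value in approximations(N, P).items():
    print(f"{name}:", value)

RATES = [1e-7, 2e-7, 5e-7]  # hypothetical failure rates in 1/h
T = 1_000.0                 # hypothetical interval length in h
print("one failure in (0, T]:", p_one_failure(RATES, T), "vs lambda_tot * T =", sum(RATES) * T)
```

As n p grows toward 1 the approximations drift apart, so the shortcut p_1 ≈ n p should be used only when n p << 1 holds.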
Also by assuming a reduction of the individual occurrence probability p (or of λ i ),
one recognizes that in the future it will be necessary either
to accept greater risks p1 or to keep the spread of high-risk technologies
under tighter control, similar for environmental stresses caused by mankind.
Aspects of ecologically acceptable production, use, disposal & recycling of products
should become subject for international regulations (sustainable development).
In the context of a product development, risks related to feasibility and time to
market within the given cost constraints must also be considered during all development
phases (feasibility checks in Fig. 1.6 and Tables A3.3 & 5.3).
Risk management for repairable systems is considered carefully in Section 6.11,
human reliability in Section 6.10. However, mandatory for risk management are
also psychological aspects related to risk awareness and safety communication.
As long as a danger is not perceived, people often do not react. Since
safe behavior presupposes risk awareness, communication is an important
tool to prevent the risk related to a system from being underestimated.

1.2.8 Quality
Quality is understood as the degree to which a set of inherent characteristics fulfills
specified or expected requirements. This definition, given also in the ISO 9000: 2000
family [A1.6], follows closely the traditional definition of quality, expressed by
fitness for use, and applies to products and services as well.

1.2.9 Cost and System Effectiveness


All previously introduced concepts are interrelated. Their relationship is best shown
through the concept of cost effectiveness, as given in Fig. 1.3. Cost effectiveness is
a measure of the ability of the item to meet a service demand of stated quantitative
characteristics, with the best possible usefulness to life-cycle cost ratio. It is often
also referred to as system effectiveness. Figure 1.3 deals essentially with technical
and cost aspects. Some management aspects are considered in Appendices A2 - A5.
From Fig. 1.3, one recognizes the central role of quality assurance, bringing
together all assurance activities (Section 1.3.3), and of dependability (collective
term for availability performance and its influencing factors).
As shown in Fig. 1.3, life-cycle cost (LCC) is the sum of cost for acquisition,
operation, maintenance, and disposal of the item. For complex systems, higher
reliability leads in general to higher acquisition cost and lower operating cost, so
that the optimum of life-cycle cost seldom lies at extremely low or high reliability
figures. For such a system, per year operating & maintenance cost often exceeds
10% of acquisition cost, and experience shows that up to 80% of the life-cycle cost is
frequently generated by decisions early in the design phase. To be complete, life-
cycle cost should also take into account current and deferred damage to the
environment caused by production, use, and disposal of the item. Life-cycle cost
optimization falls within the framework of cost effectiveness or systems engineering.
It can be positively influenced by concurrent engineering [1.16, 1.22]. Figure 1.4
shows an example of the influence of the attainment level of quality and reliability
targets on the sum of quality and reliability assurance cost for two systems with
different mission profiles [2.2 (1986)], see Example 1.1 for an introduction.

Example 1.1
An assembly contains n independent components each with a defective probability p . Let ck be
the cost to replace k defective components. Determine (i) the mean (expected value) C(i ) of the
total replacement cost (no defective components are allowed in the assembly) and (ii) the mean
of the total cost (test and replacement) C (ii ) if the components are submitted to an incoming
inspection which reduces defective percentage from p to p0 (test cost ct per component).
Solution
(i) The solution makes use of the binomial distribution (Appendix A6.10.7) and question (i) is
also solved in Example A6.19 on p. 466. The probability of having exactly k defective
components in a sample of size n is given by (Eq. (A6.120))

    p_k = \binom{n}{k} \, p^k (1 - p)^{n-k} .    (1.13)

The mean C(i ) of the total cost (deferred cost) caused by the defective components follows
then from the weighted sum

    C_{(i)} = \sum_{k=1}^{n} c_k \, p_k = \sum_{k=1}^{n} c_k \binom{n}{k} p^k (1 - p)^{n-k} .    (1.14)

(ii) To the cost caused by the defective components, calculated from Eq. (1.14) with p0 instead
of p, one must add the incoming inspection cost n ct

    C_{(ii)} = n \, c_t + \sum_{k=1}^{n} c_k \binom{n}{k} p_0^k (1 - p_0)^{n-k} .    (1.15)

The difference between C(i ) and C(ii ) gives the gain (or loss) obtained by introducing the incom-
ing inspection, allowing thus a cost optimization (see also Section 8.4 for a deeper discussion).
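
As a numerical illustration of Example 1.1, the following Python sketch evaluates Eqs. (1.13) to (1.15). The function names, the linear cost model c_k = k · c, and all numerical values are assumptions made here for illustration only; they are not taken from the example.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k defective components among n, Eq. (1.13)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mean_replacement_cost(n, p, cost_of_k):
    """Mean total replacement cost, Eq. (1.14); cost_of_k(k) returns c_k."""
    return sum(cost_of_k(k) * binomial_pmf(k, n, p) for k in range(1, n + 1))


def mean_cost_with_inspection(n, p0, c_t, cost_of_k):
    """Mean total cost (test and replacement), Eq. (1.15)."""
    return n * c_t + mean_replacement_cost(n, p0, cost_of_k)


def linear_cost(k):
    """Assumed cost model: replacing k components costs k times 20 units."""
    return 20.0 * k


# Hypothetical figures, chosen only for illustration.
n, p, p0, c_t = 100, 0.01, 0.001, 0.05

c_i = mean_replacement_cost(n, p, linear_cost)
c_ii = mean_cost_with_inspection(n, p0, c_t, linear_cost)
gain = c_i - c_ii  # positive: the incoming inspection pays off
```

With the assumed linear cost model the sums reduce to 20 n p and 20 n p0, so that C(i) = 20, C(ii) = 5 + 2 = 7, and the gain is 13 cost units. A nonlinear c_k can be passed in the same way through the cost_of_k argument.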

Using Eq. (A7.42) instead of (A6.120), similar considerations to those in
Example 1.1 yield for the mean (expected value) of the total repair cost C cm t
during the cumulative operating time T of an item with failure rate λ and cost ccm
per repair

    C_{cm t} = \lambda \, T \, c_{cm} ,    (1.16)

with λ T = mean value of the number of failures during T (Eq. (A7.42)).


From the above considerations, the following equation expressing the mean C
of the sum of the cost for quality assurance and for the assurance of reliability,
maintainability, and logistic support of a series structure during an operating time T
can be obtained

    C = C_q + C_r + C_{cm} + C_{pm} + C_l + \frac{T}{MUT_S} \, c_{cm} + (1 - OA_S) \, T \, c_{off} + n_d \, c_d .    (1.17)

Thereby, q is used for quality, r for reliability, cm for corrective maintenance, pm
for preventive maintenance, l for logistic support, off for down time & d for defects.
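
The structure of Eq. (1.17) can be made explicit with a short Python sketch. It assumes that MUT_S is the system mean up time (taken as 1/λ), OA_S the steady-state overall availability, and n_d the number of hidden defects discovered in the field, as defined in the text following Fig. 1.3. The function name and all numerical values are hypothetical.

```python
def mean_total_cost(C_q, C_r, C_cm, C_pm, C_l,
                    T, MUT_S, OA_S, c_cm, c_off, n_d, c_d):
    """Mean total cost C per Eq. (1.17); a series structure is assumed."""
    assurance = C_q + C_r + C_cm + C_pm + C_l   # first five terms
    repair = (T / MUT_S) * c_cm                 # T / MUT_S = lambda * T, cf. Eq. (1.16)
    downtime = (1.0 - OA_S) * T * c_off         # cost of down time during T
    hidden_defects = n_d * c_d                  # cost of defects found in the field
    return assurance + repair + downtime + hidden_defects


# Hypothetical values (cost in arbitrary units, time in hours).
example_cost = mean_total_cost(
    C_q=100.0, C_r=200.0, C_cm=50.0, C_pm=30.0, C_l=20.0,
    T=50_000.0, MUT_S=5_000.0, OA_S=0.99,
    c_cm=2.0, c_off=0.1, n_d=3, c_d=5.0,
)
```

For these assumed values the four contributions are 400, 20, 50, and 15 cost units, giving a total of 485.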
1.2 Basic Concepts 13

[Figure 1.3, upper part: hierarchy diagram]
Cost Effectiveness (System Effectiveness)
  Life-Cycle Cost (LCC): Acquisition; Operation, Maintenance; Disposal
  Operational Effectiveness: Capability; Operational Availability (Dependability); Safety and Risk Management
  Lower-level boxes: Intrinsic Availability; Reliability; Maintainability; Useful Life; Human Factors;
  Logistic Support; Injury to Persons; Damage to Property; Damage to Environment

[Figure 1.3, lower part: Cost Effectiveness Assurance (System Effectiveness Assurance)]
Capability and Life-Cycle Cost:
  • Design, development, evaluation
  • Production (hardware)
  • Cost analyses (Life-cycle costs, VE, VA)
Quality Assurance (Hardw. & Softw.):
  • Configuration management
  • Quality testing (incl. reliability, maintainability, and safety tests)
  • Quality control during production (hardware)
  • Quality data reporting system
  • Software quality
Reliability Engineering:
  • Reliability targets
  • Required function
  • Environm. cond.
  • Parts & materials
  • Derating
  • Screening
  • Redundancy
  • FMEA, FTA, etc.
  • Design guidelines
  • Rel. block diagr.
  • Rel. prediction
  • Design reviews
Maintainability Engineering:
  • Maintainability targets
  • Maintenance concept
  • Partitioning in LRUs
  • Faults detection and localization
  • Design guidelines
  • Maintainability analysis
  • Design reviews
Safety and Human-Factors Engineering:
  • Safety targets
  • Design guidelines (incl. human aspects)
  • Safety analysis (FMEA/FMECA, FTA, etc.)
  • Risk management
  • Design reviews
Logistic Support:
  • Maintenance concept
  • Customer/User documentation
  • Spare parts provisioning
  • Tools and test equipment for maintenance
  • After sales service

Figure 1.3 Cost Effectiveness (System Effectiveness) for complex equipment & systems with high
quality and reliability (RAMS) requirements (see Appendices A1 - A5 for definitions & management
aspects, and pp. 395 & 404 for the concept of product assurance as used in space & railway fields;
dependability can be used instead of operational availability, for a qualitative meaning)

MUTS and OA S are the system mean up (operating) time between failures, assumed
here = 1 / λ , +) and the system steady-state overall availability (Eq. (6.196) with Tpm
instead of TPM ). T is the total system operating time (useful life) and nd the number
of hidden defects discovered in the field. C q , C r , C cm , C pm & C l are the cost for
quality assurance and for the assurance of reliability, repairability, serviceability,
and logistic support, respectively. ccm , coff , and cd are the cost per repair, per hour
down time, and per hidden defect (preventive maintenance cost are scheduled cost,
considered here as a part of Cpm ). The first five terms in Eq. (1.17) represent a part
of the acquisition cost, the last three terms are deferred cost occurring during field
operation. ++) A model for investigating the cost C according to Eq. (1.17) was de-
veloped in [2.2 (1986)], by assuming Cq , C r , Ccm , C pm , C l , MUTS , OA S , T , ccm ,
coff , cd , and nd as parameters and investigating the variation of the total cost given
by Eq. (1.17) as a function of the level of attainment of the specified targets, i. e. by
introducing the variables g_q = QA / QA_g , g_r = MUT_S / MUT_Sg , g_cm = MDT_Sg / MDT_S ,
g_pm = MTTPM_Sg / MTTPM_S , and g_l = MLD_Sg / MLD_S , where the subscript g denotes the
specified target for the corresponding quantity. A power relationship

    C_i = C_{ig} \, g_i^{m_i}    (1.18)

was assumed between the actual cost C i , the cost C ig to reach the specified target
(goal) of the considered quantity, and the level of attainment of the specified target
( 0 < m l < 1 , other m i >1). The following relationship between number of hidden
defects discovered in the field and ratio Cq / Cqg was also included in the model

    n_d = \frac{1}{(C_q / C_{qg})^{m_d}} - 1 = \frac{1}{g_q^{m_q m_d}} - 1 .    (1.19)

The final equation for the cost C as function of the variables gq , gr , gcm , g pm , and
gl follows then as (using Eq. (6.196) for OA S ) +)
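
    C = C_{qg} \, g_q^{m_q} + C_{rg} \, g_r^{m_r} + C_{cmg} \, g_{cm}^{m_{cm}} + C_{pmg} \, g_{pm}^{m_{pm}} + C_{lg} \, g_l^{m_l} + \frac{T \, c_{cm}}{g_r \, MUT_{Sg}}
        + \left( 1 - \frac{1}{1 + \frac{1}{g_r g_{cm}} \cdot \frac{MDT_{Sg}}{MUT_{Sg}} + \frac{1}{g_r g_l} \cdot \frac{MLD_{Sg}}{MUT_{Sg}} + \frac{MTTPM_{Sg}}{g_{pm} T_{pm}}} \right) T \, c_{off}
        + \left( \frac{1}{g_q^{m_q m_d}} - 1 \right) c_d .  ++)    (1.20)
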
The relative cost C / Cg given in Fig. 1.4 is obtained by dividing C by the value Cg
from Eq. (1.20) with all gi =1. Extensive analyses with different values for m i , C ig ,
MUTSg , MDTSg , MLDSg , MTTPMSg , T pm , T , ccm , coff , and cd , have shown that
the value C / Cg is only moderately sensitive to the parameters m i .
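
The following Python sketch evaluates Eq. (1.20) and the relative cost C / Cg for given levels of attainment gi. It is not the model implementation of [2.2 (1986)]; the class and function names and all parameter values are assumptions chosen for illustration, with 0 < m_l < 1 and the other m_i > 1 as stated above (a value for m_d is not given in the text and is assumed here).

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    # Cost to reach each specified target (hypothetical units).
    C_qg: float
    C_rg: float
    C_cmg: float
    C_pmg: float
    C_lg: float
    # Specified targets and time parameters (hours).
    MUT_Sg: float
    MDT_Sg: float
    MLD_Sg: float
    MTTPM_Sg: float
    T_pm: float
    T: float
    # Cost per repair, per hour of down time, per hidden defect.
    c_cm: float
    c_off: float
    c_d: float
    # Exponents of the power relationship, Eq. (1.18), and of Eq. (1.19).
    m_q: float
    m_r: float
    m_cm: float
    m_pm: float
    m_l: float
    m_d: float


def total_cost(p, g_q=1.0, g_r=1.0, g_cm=1.0, g_pm=1.0, g_l=1.0):
    """Cost C per Eq. (1.20) as a function of the levels of attainment."""
    assurance = (p.C_qg * g_q**p.m_q + p.C_rg * g_r**p.m_r
                 + p.C_cmg * g_cm**p.m_cm + p.C_pmg * g_pm**p.m_pm
                 + p.C_lg * g_l**p.m_l)
    repair = p.T * p.c_cm / (g_r * p.MUT_Sg)
    down_ratio = (p.MDT_Sg / (g_r * g_cm * p.MUT_Sg)
                  + p.MLD_Sg / (g_r * g_l * p.MUT_Sg)
                  + p.MTTPM_Sg / (g_pm * p.T_pm))
    oa_s = 1.0 / (1.0 + down_ratio)             # overall availability term
    downtime = (1.0 - oa_s) * p.T * p.c_off
    n_d = 1.0 / g_q ** (p.m_q * p.m_d) - 1.0    # Eq. (1.19), used as printed
    return assurance + repair + downtime + n_d * p.c_d


def relative_cost(p, **g):
    """C / Cg, with Cg obtained from Eq. (1.20) for all g_i = 1."""
    return total_cost(p, **g) / total_cost(p)


# Hypothetical parameter set.
params = CostModel(
    C_qg=100.0, C_rg=200.0, C_cmg=50.0, C_pmg=30.0, C_lg=20.0,
    MUT_Sg=5_000.0, MDT_Sg=4.0, MLD_Sg=1.0, MTTPM_Sg=2.0, T_pm=1_000.0,
    T=50_000.0, c_cm=2.0, c_off=0.1, c_d=5.0,
    m_q=2.0, m_r=2.0, m_cm=1.5, m_pm=1.5, m_l=0.5, m_d=1.5,
)
```

Curves of the kind shown in Fig. 1.4 can be sketched by evaluating relative_cost(params, g_q=x) or relative_cost(params, g_r=x) over a range of x while the remaining gi are kept at 1.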
__________________
+) Equations (1.17) and (1.20) hold for series structures; for systems with redundancy, elaboration is
more laborious and can be performed taking care of the remarks given on pp. 121 - 24.
++) If repair costs differ for each element, T \sum_i \lambda_i c_{cm_i} can be used instead of T c_cm / MUT_S = λ T c_cm .

[Figure 1.4: two side-by-side plots, one per system. Vertical axis: relative cost C/Cg (scale 0 to 5);
horizontal axis: gq , gr (scale 0 to 2); curves: C/Cg = f(gq) and C/Cg = f(gr).]

Figure 1.4 Basic shape of the relative cost C / Cg per Eq. (1.20) as function of gq = QA / QAg and
g r = MUTS / MUTSg (quality assurance and reliability assurance as in Fig. 1.3) for two complex sys-
tems with different mission profiles (the specified targets gq =1 and gr =1 are dashed)

1.2.10 Product Liability


Product liability is the onus on a producer or others to make restitution for loss
related to personal injury, property damage, or other harm caused by the product.
The manufacturer has to specify a safe operational mode for the product in the user
documentation. In legal documents related to product liability, the term product
often indicates hardware only and the term defective product is in general used
instead of defective or failed product. Those responsible in a product liability
claim are all people involved in the design, production, sale, and maintenance of
the product (item), including suppliers. Often, strict liability is applied, and the
manufacturer has to demonstrate that the product was free from defects when it
left the production line (see footnote on p. 1). This holds in the USA and
increasingly in Europe [1.10]. However, in Europe the causality between damage
and defect has still to be demonstrated by the user (see p. 395 for further
considerations).
The rapid increase of product liability claims (in the USA alone, 50,000 in
1970 and over one million in 1990) cannot be ignored by manufacturers.
Although this situation has probably been influenced by the peculiarities of US
legal procedures, configuration management, safety analysis (in particular
causes-to-effects analysis like FMEA / FMECA, FTA, ETA), and considerations on risk
management are important to increase safety and avoid product liability claims
(see Sections 1.2.7, 2.6, 5.2.5, 6.9 and Appendix A.3.3 for general considerations,
and Sections 6.10 & 6.11 for new models for human reliability & risk management).

1.2.11 Historical Development


Methods and procedures for quality assurance and reliability (RAMS) engineering
have been developed extensively over the last 60 years. For indicative purpose,
Table 1.1 summarizes major steps of this development and Fig. 1.5 shows the
approximate distribution of the effort between quality assurance and reliability
(RAMS) engineering during the same period of time. Because of the rapid progress
of microelectronics, considerations on redundancy, fault-tolerance, test strategy,
and software quality gain in importance. A skillful, allegorical presentation of
the story of reliability is in [1.25], see Appendix A2.4 for some considerations.

Table 1.1 Historical development of quality assurance and reliability (RAMS) engineering

before 1940 Quality attributes & characteristics are defined. In-process & final tests are carried out,
usually in the production area. The concept of quality of manufacture is introduced.
1940 - 50 Defects (nonconformities) and failures are systematically collected and analyzed.
Corrective actions are carried out. Statistical quality control is developed. It is re-
cognized that quality must be built into an item, quality of design becomes important.
1950 - 60 Quality assurance is recognized as a means for developing and manufacturing an
item with a specified quality level. Preventive measures (actions) are added to tests
and corrective actions. It is recognized that correct short-term functioning does not
also signify reliability. Design reviews and systematic analysis of failures (failure
data and failure modes / mechanisms), performed often in the research & development
area, lead to important reliability improvements.
1960 - 70 Difficulties with respect to reproducibility and change control, as well as interfacing
problems during the integration phase, require a refinement of the concept of
configuration management. Reliability engineering is recognized as a means for
developing and manufacturing an item with specified reliability. Reliability predic-
tion, estimation & demonstration tools are developed. It is recognized that reliability
cannot easily be demonstrated at an acceptance test. Instead of a reliability figure
( λ or MTBF =1 / λ ) , contractual requirements are for a reliability assurance program.
Maintainability, availability, and logistic support become important.
1970 - 80 Due to the increasing complexity and cost for maintenance of equipment and sys-
tems, the aspects of man-machine interface and life-cycle cost become important.
Customers require demonstration of reliability and maintainability during the
warranty period. Quality and reliability assurance activities are made project specific
and carried out in close cooperation with all engineers involved in a project.
Concepts like product assurance, cost effectiveness and systems engineering are intro-
duced. Product liability and human reliability become important.
1980 - 90 Testability is required. Test and screening strategies are developed to reduce testing
cost and warranty services. Because of the rapid progress in microelectronics, greater
possibilities are available for redundant and fault tolerant structures. Software
quality becomes important.
after 1990 The necessity to further shorten the development time leads to the concept of concur-
rent engineering. Total Quality Management appears as a refinement to Quality As-
surance. RAMS is used for reliability, availability, maintainability & safety, reliability
engineering for RAMS engineering. Performance based contracts are stipulated for
systems with high RAMS requirements. Faced with increasing safety and sustainability
problems, risk management and ethical aspects become important.
1.3 Basic Tasks & Rules for Quality & Reliability (RAMS) Assurance of Complex Eq. & Systems 17

[Figure 1.5: diagram. Vertical axis: relative effort [%] (0 to 100); horizontal axis: year (1950 to 2010).
Labeled areas: Quality assurance; Quality testing, quality control, quality data reporting system;
Configuration management; Software quality; Reliability (RAMS) engineering; Reliability (RAMS) analysis;
Fault causes / modes / effects / mechanisms analysis; Systems engineering (part).]

Figure 1.5 Approximate distribution of the effort between quality assurance and reliability (RAMS)
engineering for complex equipment & systems with high quality and reliability (RAMS) requirements

1.3 Basic Tasks & Rules for Quality and Reliability (RAMS) Assurance of Complex Equip. & Systems

This section deals with some important considerations on the organization of quality
and reliability assurance in the case of complex repairable equipment and systems
with high quality and reliability requirements. In this context, the term reliability
appears for reliability, availability, maintainability, and safety (RAMS). This minor
part of the book aims to support managers in answering the question of how to
specify and realize high reliability (RAMS) targets for equipment and systems.
Refinements are in Appendix A3 for complex equipment and systems for which
tailoring is not mandatory, with considerations on quality management and total
quality management (TQM) as well. As a general rule, quality assurance and relia-
bility (RAMS) engineering must avoid bureaucracy, be integrated in project activi-
ties, and support quality management & concurrent engineering efforts, as per TQM .

1.3.1 Quality and Reliability (RAMS) Assurance Tasks


Experience shows that
besides the prevention of defects and systematic failures, which remains
a main task of quality assurance, development and production of complex
repairable equipment and systems with high reliability (RAMS) targets
requires specific activities during all life-cycle phases of the item.
Figure 1.6 shows the life-cycle phases and Table 1.2 gives the main tasks for quality &
reliability (RAMS) assurance. Depicted in Table 1.2 is the period of time over which
tasks have to be performed. Within a project, tasks of Table 1.2 must be refined in a
project-specific quality and reliability (RAMS) assurance program (Appendix A3).

Table 1.2 Main tasks for quality and reliability (RAMS) assurance of complex equipment & systems
with high quality and reliability requirements (the bar height is a measure of the relative effort)

Main tasks for quality and reliability (RAMS) assurance of complex equipment and systems, conforming to TQM
(see Table A3.2 for greater details and a possible task assignment; software quality appears in
tasks 4, 8-11, 14-16, see also Section 5.3)

Column headings (effort bars not reproduced): Project-independent; Specific during: Conception,
Definition, Design & Devel., Evaluation; Production; Use

1. Customer and market requirements

2. Preliminary analyses

3. Quality and reliability aspects in specs, quotations, contracts, etc.

4. Quality and reliability (RAMS) assurance program

5. Reliability and maintainability analyses

6. Safety and human aspects/factors analyses

7. Selection and qualification of components and materials

8. Supplier selection and qualification

9. Project-dependent procedures and work instructions

10. Configuration management

11. Prototype qualification tests

12. Quality control during production

13. In-process tests

14. Final and acceptance tests

15. Quality data reporting system

16. Logistic support

17. Coordination and monitoring (quality and reliability (RAMS))

18. Quality costs

19. Concepts, methods, general procedures (quality & reliability (RAMS))

20. Motivation and training (quality and reliability (RAMS))



[Figure 1.6: life-cycle diagram, reconstructed from the extracted labels]
Phases: Conception, Definition, Design, Development, Evaluation; Production (Manufacturing); Use;
Disposal, Recycling
Sub-phases: Preliminary study, Conception; Definition, Design, Full development, Prototype qualification;
Pilot production; Series production; Installation, Operation
Bullet lists, from left to right:
  1. • Idea, market requirements • Evaluation of delivered equipment and systems
     • Proposal for preliminary study
  2. • Feasibility check • System specifications • Interface definition • Proposal for the design phase
  3. • Feasibility check • Revised system specifications • Qualified and released prototypes
     • Technical documentation • Proposal for pilot production
  4. • Feasibility check • Production documentation • Qualified production processes
     • Qualified and released first series item • Proposal for series production
  5. • Series item • Customer documentation • Logistic support plan • Spare part provisioning
     • Risk management plan

Figure 1.6 Basic life-cycle phases of complex equipment and systems (the output of a given
phase is the input to the next phase), see Tab. 5.3 on p. 161 for software

1.3.2 Basic Quality and Reliability (RAMS) Assurance Rules


Performance, dependability, cost, and time to market are key factors for today's
products and services. However, facing increasing sustainability problems, ethical
aspects have to be considered (Appendix A2.4). Basic rules for a suitable quality
and reliability (intended as RAMS , i. e. reliability, availability, maintainability &
safety) assurance, optimized also with respect to cost and time schedule, can thus
be summarized as follows:
1. Quality and reliability (RAMS) targets should be so high as to satisfy
real customer needs,
→ build products satisfying real customer's needs and expectations,
while respecting sustainability during the whole product's life cycle.
2. Activities for quality & reliability (RAMS) assurance should be performed con-
tinuously throughout all project phases, from definition to operating phase,
→ don't change the project manager before ending the pilot production.
3. Activities must be performed in close cooperation between all engineers
involved in the project (Table A3.2),
→ use TQM and concurrent engineering approaches.
4. Quality and reliability (RAMS) assurance activities should be monitored by a
central quality & reliability assurance department (Q & RA) which cooperates
actively in all project phases (Fig. 1.7 and Table A3.2)
→ establish an efficient and independent quality & reliability assurance
department (Q & RA) active in the projects.

Figure 1.7 shows a basic organization which can embody the above rules and sat-
isfy requirements of quality management standards (Appendix A2). As shown in
Table A3.2, the assignment of quality and reliability (RAMS) assurance tasks should
be such, that every engineer in a project bears his / her own responsibilities (as per
TQM). So, for instance, a design engineer should be responsible for all aspects of
his / her own product (e. g. an assembly) including reliability, maintainability and
safety aspects, the production department should be able to manufacture and test
such an item within its own competence, and the quality and reliability (RAMS)
assurance department (Q & RA in Fig. 1.7) should be responsible,
• within a project, for the
• formulation of preliminary quality and reliability (RAMS) targets,
• preparation of guidelines & working documents (quality & rel.(RAMS) aspects),
• coordination of activities belonging to quality & reliability (RAMS) assurance,
• reliability (RAMS) analyses at system level (footnote on p. 2),
• planning and evaluation of qualification, testing and screening of components and
material (quality and reliability (RAMS) aspects),
• release of manufacturing processes (quality and reliability (RAMS) aspects),
• operation of the quality data reporting system (Fig. 1.8),
• acceptance testing with customers;

• at company level, for the


• preparation and updating of the company's Quality Assurance Handbook,
• development and operation of the quality data reporting system (Fig. 1.8),
• preparation of periodic reports on quality cost and quality & reliability (RAMS)
level of delivered products;
see Tab. A3.2 (pp. 410 -13) for a possible task assignment. This central quality &
reliability (RAMS) department should not be too small (credibility) nor too large
(sluggishness).

[Figure 1.7: organization chart with the boxes Management, Marketing, R&D, Production, Central Q&RA, and QC.]

Figure 1.7 Basic organizational structure for quality and reliability (RAMS) assurance in a company
producing complex equipment and systems with high quality and reliability (RAMS) requirements
(connecting lines indicate close cooperation; A denotes assurance, C control and tests during pro-
duction, Q quality, R reliability (RAMS))

1.3.3 Elements of a Quality Assurance System


As stated in Section 1.3.1, many of the tasks associated with quality assurance
(in the sense of quality management as per TQM) are interdisciplinary. In order
to have a minimum impact on cost and time schedules, their solution requires the
concurrent efforts (close cooperation) between all engineers involved in a project.
To improve coordination, it is useful to group the quality assurance
activities as follows (see also Fig. 1.3 and Appendix A3.3):
1. Configuration Management: Procedure to specify, describe, audit & release the
configuration of the item, as well as to control it during modifications or
changes. Configuration management is an important tool for quality assurance.
It can be subdivided into configuration identification, auditing (design
reviews), control, and accounting (Appendix A3.3.6).
2. Quality Tests: Tests to verify whether the item conforms to specified require-
ments. Quality tests include incoming inspections, as well as qualification
tests, production tests, and acceptance tests. They also cover reliability, main-
tainability, safety, and software aspects. To be cost effective, quality tests
must be coordinated and integrated into a test and screening strategy.
3. Quality Control During Production: Control (monitoring) of the production
processes and procedures to reach a stated quality of manufacturing.
4. Quality Data Reporting System (FRACAS): A system to collect, analyze &
correct all defects and failures (faults) occurring during tests of the item,
as well as to evaluate and feed back the corresponding quality and reliability
(RAMS) data (Fig. 1.8, p. 22, Appendix A5). Such a system is generally
computer-aided. Analysis must be traced to the cause, to avoid repetition of the
same problem, and be pursued at least during the warranty period.
5. Software quality: Procedures and tools to specify, develop, and test software
(appears in tasks 4, 8 -11, 14 -16 of Tables 1.2 on p. 18 and A3.2 on pp. 410 -13,
see Section 5.3 (pp. 159 - 68) for practical considerations).
Configuration management spans from the definition up to the operating phase
(Appendices A3 & A4). Quality tests encompass technical and statistical aspects
(Chapters 3, 7, and 8). The concept of a quality data reporting system is depicted
in Fig. 1.8 (see Appendix A5 for basic requirements). Table 1.3 shows an example
of data reporting sheets for PCBs evaluation.
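
To illustrate how entries of a sheet such as Table 1.3a could be derived from collected fault data, the following Python sketch aggregates individual fault records per PCB type. The record layout, the names, and the sample data are assumptions made here; in particular, "faults per PCB" is taken relative to the number of PCBs tested, which the table does not specify.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultRecord:
    pcb_type: str   # PCB designation
    serial: str     # serial number of the affected PCB
    category: str   # "assembling", "soldering", "board", or "component"


def summarize(pcb_type, n_tested, records):
    """Summary row for one PCB type, in the spirit of Table 1.3a."""
    faults = [r for r in records if r.pcb_type == pcb_type]
    faulty_boards = {r.serial for r in faults}
    return {
        "tested": n_tested,
        "with_faults": len(faulty_boards),
        "percent_with_faults": 100.0 * len(faulty_boards) / n_tested,
        "by_category": dict(Counter(r.category for r in faults)),
        "total_faults": len(faults),
        "faults_per_pcb": len(faults) / n_tested,  # assumed: per tested PCB
    }


# Hypothetical fault records for one reporting period.
records = [
    FaultRecord("A", "A-001", "soldering"),
    FaultRecord("A", "A-001", "component"),
    FaultRecord("A", "A-002", "soldering"),
    FaultRecord("B", "B-001", "board"),
]
summary = summarize("A", n_tested=50, records=records)
```

For the sample data, PCB type A shows 2 faulty boards out of 50 tested (4 %) and 3 faults in total, of which 2 are soldering faults. The columns for measures and cost would be added after the cause analysis.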
The quality and reliability (RAMS) assurance system must be described in a com-
pany's Quality Assurance Handbook supported by the company management. For
a company producing complex equipment and systems with high quality & reliability
(RAMS) requirements, a possible content is: • General, • Project Organization,
• Quality Assurance (Management) system, • Quality & Reliability (RAMS) Assur-
ance Program, • Reliability Engineering, • Maintainability Eng., • Safety & Human
Eng., • Software Quality Assurance, • Logistic Support, • Motivation & Training.
[Figure 1.8: flow diagram, reconstructed from the extracted labels]
Life-cycle stages: Preliminary study; Definition; Design; Full scale development; Prototype qualification;
Production; Acceptance tests; Use / Operation
Data handling boxes for defects and failures: Collection; Compression; Storage; Processing;
Analysis (Mode, Cause, Effect, Mechanism); Actions / Measures (short term, long term)
Feedback paths: internal feedback loop in the production; short feedback loop (corrective actions);
medium feedback loop (corrective actions and preventive measures); long feedback loop (preventive
measures); information feedback to company management (targets), project management, and customer


Figure 1.8 Basic concept for a quality data reporting system



Table 1.3 Example of data reporting sheets for PCBs (populated printed circuit boards) evaluation,
from a quality data reporting system (see also footnote on p. 1)

a) Defects and failures (faults) at PCB level
Period: . . . .
Columns: PCB | No. of PCBs (tested; with faults; %) | Rough classification (assembling; soldering;
board; component) | No. of faults (total; per PCB) | Measures (short term; long term) |
Cost (production; Q A; other areas)

b) Defects and failures (faults) at component level
Period: . . . . PCB: . . . . No. of PCBs: . . . .
Columns: Component type | Manufacturer | No. of components (same type; same application) |
Number of faults | % | No. of faults per place of occurrence (incoming inspection; in-process test;
final test; warranty period)

c) Cause analysis for defects and failures (faults) due to components
Period: . . . .
Columns: Component | PCB | Cause (systematic; inherent failure; not identified) |
Percent defective (%) (observed; predicted) | Failure rate (10^-9 h^-1) (observed; predicted) |
Measures (short term; long term)

d) Correlation between components and PCBs
Period: . . . .
Matrix of components versus PCBs

1.3.4 Motivation and Training


A cost effective quality and reliability (RAMS) assurance / management can be
achieved if every engineer involved in a project is made responsible for his / her
assigned activities (e. g. as per Table A3.2). Figure 1.9 shows a comprehensive,
practice oriented, motivation and training program in a company producing com-
plex equipment and systems with high quality and reliability (RAMS) requirements.

Basic training
  Title: Quality Management and Reliability (RAMS) Engineering
  Aim: Introduction to tasks, methods, and organization of the company's quality
       and reliability (RAMS) assurance
  Participants: Top and middle management, project managers, selected engineers
  Duration: 4 h (seminar with discussion)
  Documentation: ca. 30 pp.

Advanced training
  Title: Methods of Reliability (RAMS) Engineering
  Aim: Learning the methods used in reliability (RAMS) assurance
  Participants: Project managers, engineers from marketing & production,
                selected engineers from development
  Duration: 8 h (seminar with discussion)
  Document.: ca. 40 pp.

Advanced training
  Title: Reliability (RAMS) Engineering
  Aim: Learning the techniques used in reliability (RAMS) engineering
       (applications oriented and company specific)
  Participants: Design engineers, Q&RA specialists, selected engineers
                from marketing and production
  Duration: 24 h (course with exercises)
  Document.: ca. 150 pp.

Special training
  Title: Special Topics*
  Aim: Learning special tools and techniques
  Participants: Q&RA specialists, selected engineers from development and production
  Duration: 4 to 16 h per topic
  Document.: 10 to 20 pp. per topic
  * Examples: Statistical Quality Control, Test and Screening Strategies, Software
    Quality, Testability, Reliability and Availability of Complex Repairable Systems,
    Fault Tolerant Systems with Hardware and Software, Mechanical Reliability, Failure
    Mechanisms and Failure Analysis, etc.

Figure 1.9 Example for a practice-oriented training and motivation program in a company
producing complex equipment and systems with high quality and reliability (RAMS) requirements
