Reliability Engineering Methods and Applications


Reliability Engineering

Advanced Research in Reliability and System Assurance Engineering


Series Editor: Mangey Ram, Professor, Graphic Era (Deemed to be University),
Dehradun, India

Modeling and Simulation Based Analysis in Reliability Engineering


Edited by Mangey Ram

Reliability Engineering
Theory and Applications
Edited by Ilia Vonta and Mangey Ram

System Reliability Management


Solutions and Technologies
Edited by Adarsh Anand and Mangey Ram

Reliability Engineering
Methods and Applications
Edited by Mangey Ram

For more information about this series, please visit:
https://www.crcpress.com/Reliability-Engineering-Theory-and-Applications/Vonta-Ram/p/book/9780815355175
Reliability Engineering
Methods and Applications

Edited by
Mangey Ram
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2020 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper

International Standard Book Number-13: 978-1-138-59385-5 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts
have been made to publish reliable data and information, but the author and publisher cannot assume
responsibility for the validity of all materials or the consequences of their use. The authors and publishers
have attempted to trace the copyright holders of all material reproduced in this publication and apologize
to copyright holders if permission to publish in this form has not been obtained. If any copyright material
has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC),
222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that
provides licenses and registration for a variety of users. For organizations that have been granted
a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.

Library of Congress Cataloging‑in‑Publication Data

Names: Ram, Mangey, editor.


Title: Reliability engineering : methods and applications / edited by Mangey Ram.
Other titles: Reliability engineering (CRC Press : 2019)
Description: Boca Raton, FL : CRC Press/Taylor & Francis Group, 2018.
Series: Advanced research in reliability and system assurance engineering | Includes
bibliographical references and index.
Identifiers: LCCN 2019023663 (print) | LCCN 2019023664 (ebook) | ISBN
9781138593855 (hardback) | ISBN 9780429488009 (ebook)
Subjects: LCSH: Reliability (Engineering)
Classification: LCC TA169 .R439522 2019 (print) | LCC TA169 (ebook) | DDC
620/.00452--dc23
LC record available at https://lccn.loc.gov/2019023663

Visit the Taylor & Francis Web site at


http://www.taylorandfrancis.com

and the CRC Press Web site at


http://www.crcpress.com
Contents
Preface
Acknowledgments
Editor
Contributors

Chapter 1 Preventive Maintenance Modeling: State of the Art
Sylwia Werbińska-Wojciechowska

Chapter 2 Inspection Maintenance Modeling for Technical Systems: An Overview
Sylwia Werbińska-Wojciechowska

Chapter 3 Application of Stochastic Processes in Degradation Modeling: An Overview
Shah Limon, Ameneh Forouzandeh Shahraki, and Om Prakash Yadav

Chapter 4 Building a Semi-automatic Design for Reliability Survey with Semantic Pattern Recognition
Christian Spreafico and Davide Russo

Chapter 5 Markov Chains and Stochastic Petri Nets for Availability and Reliability Modeling
Paulo Romero Martins Maciel, Jamilson Ramalho Dantas, and Rubens de Souza Matos Júnior

Chapter 6 An Overview of Fault Tree Analysis and Its Application in Dual Purposed Cask Reliability in an Accident Scenario
Maritza Rodriguez Gual, Rogerio Pimenta Morão, Luiz Leite da Silva, Edson Ribeiro, Claudio Cunha Lopes, and Vagner de Oliveira

Chapter 7 An Overview on Failure Rates in Maintenance Policies
Xufeng Zhao and Toshio Nakagawa

Chapter 8 Accelerated Life Tests with Competing Failure Modes: An Overview
Kanchan Jain and Preeti Wanti Srivastava

Chapter 9 European Reliability Standards
Miguel Angel Navas, Carlos Sancho, and Jose Carpio

Chapter 10 Time-Variant Reliability Analysis Methods for Dynamic Structures
Zhonglai Wang and Shui Yu

Chapter 11 Latent Variable Models in Reliability
Laurent Bordes

Chapter 12 Expanded Failure Modes and Effects Analysis: A Different Approach for System Reliability Assessment
Perdomo Ojeda Manuel, Rivero Oliva Jesús, and Salomón Llanes Jesús

Chapter 13 Reliability Assessment and Probabilistic Data Analysis of Vehicle Components and Systems
Zhigang Wei

Chapter 14 Maintenance Policy Analysis of a Marine Power Generating Multi-state System
Thomas Markopoulos and Agapios N. Platis

Chapter 15 Vulnerability Discovery and Patch Modeling: State of the Art
Avinash K. Shrivastava, P. K. Kapur, and Misbah Anjum

Chapter 16 Signature Reliability Evaluations: An Overview of Different Systems
Akshay Kumar, Mangey Ram, and S. B. Singh

Index
Preface
The theory, methods, and applications of reliability analysis have developed
significantly over the last 60 years and have been recognized in many publications.
Awareness of the importance of each reliability measure of a system, and of the
fields in which it applies, is therefore essential for a reliability specialist.
This book, Reliability Engineering: Methods and Applications, is a collection of
different models, methods, and unique approaches for dealing with the different
technological aspects of reliability engineering. Earlier approaches and models have
been studied in depth to bring out better and more advanced system reliability
techniques for different phases of the working of the components. Scope for future
developments and research is also suggested.
The main areas studied follow under different chapters:
Chapter 1 provides the review and analysis of preventive maintenance modeling
issues. The discussed preventive maintenance models are classified into two main
groups for one-unit and multi-unit systems.
Chapter 2 provides a literature review of the most commonly used optimal
inspection maintenance models. The appropriate inspection strategy depends on the
complexity of the system (single- or multi-stage, etc.) and on the requirements of
quality, production, minimum costs, and reducing the frequency of failures.
Chapter 3 presents the application of stochastic processes in degradation modeling
to assess product/system performances. Among the continuous stochastic processes,
the Wiener, Gamma, and inverse Gaussian processes are discussed and applied for
degradation modeling of engineering systems using accelerated degradation data.
Chapter  4 presents a novel approach for analysis of Failure Modes and Effect
Analysis (FMEA)-related documents through a semi-automatic procedure involving
semantic tools. The aim of this work is reducing the time of analysis and improving
the level of detail of the analysis through the introduction of an increased number of
considered features and relations among them.
Chapter 5 studies the reliability and availability modeling of a system through
Markov chains and stochastic Petri nets.
Chapter 6 describes the fault tree analysis technique for the calculation of
reliability and risk measurement in the transportation of radioactive materials.
This study aims at reducing the risk of environmental contamination caused by
human errors.
Chapter 7 surveys the failure rate functions of replacement times in random and
periodic replacement models, together with their properties, to support a
theoretical understanding of complex maintenance models.
Chapter 8 highlights the design of accelerated life tests with competing failure
modes which give rise to competing risk analysis. This design helps in the prediction
of the product reliability accurately, quickly, and economically.


Chapter 9 presents an analysis, classification, and orientation of content to
encourage researchers, organizations, and professionals to use IEC standards as
applicable procedures and/or as reference guides. These standards provide methods
and mathematical metrics known worldwide.
Chapter  10 discusses the time-variant reliability analysis methods for real-life
dynamic structures under uncertainties and vibratory systems having high nonlinear
performance. These methods satisfy the accuracy requirements by considering the
time correlation.
Chapter 11 presents a few reliability or survival analysis models involving latent
variables. The latent variable model considers missing information, heterogeneity of
observations, measurement errors, etc.
Chapter 12 highlights the failure mode and effects analysis technique that esti-
mates the system reliability when the components are dependent on each other and
there is common cause failure as in redundant systems using the logical algorithm.
Chapter 13 provides an overview of the current state-of-the-art reliability assess-
ment approaches, including testing and probabilistic data analysis approaches, for
vehicle components and systems, vehicle exhaust components, and systems. The new
concepts include a fatigue S-N curve transformation technique and a variable trans-
formation technique in a damage-cycle diagram.
Chapter  14 is an attempt to develop a semi-Markov model of a ship’s electric
power generation system and use multi-state systems theory to develop an alterna-
tive aspect of maintenance policy, indicating the importance of the human capital
management relating to its cost management optimization.
Chapter 15 discusses the quantitative models proposed in the software security
literature, called vulnerability discovery models, for predicting the total number of
vulnerabilities detected, identified, or discovered during the operational phase of
the software. This work also describes the modeling framework of the vulnerability
discovery models and vulnerability patching models.
Chapter 16 discusses the signature and related measures, such as mean time to
failure, expected cost, and the Barlow-Proschan index, obtained with the help of the
reliability function, the universal generating function, and Owen’s method for a
coherent system with independent and identically distributed elements.
Through this book, engineers and academicians will gain knowledge and help
in understanding reliability engineering. The book gives readers a broad overview
of the past, current, and future trends of reliability methods and applications.

Mangey Ram
Graphic Era (Deemed to be University), India
Acknowledgments
The Editor acknowledges CRC Press for this opportunity and professional support.
My special thanks to Ms. Cindy Renee Carelli, Executive Editor, CRC Press/
Taylor & Francis Group for the excellent support she provided me to complete this
book. Thanks to Ms. Erin Harris, Editorial Assistant to Mrs. Cindy Renee Carelli,
for her follow-up and aid. Also, I would like to thank all the chapter authors and
reviewers for their availability for this work.

Mangey Ram
Graphic Era (Deemed to be University), India

Editor
Dr. Mangey Ram received a PhD degree with a major in Mathematics and a minor in
Computer Science from G. B. Pant University of Agriculture and Technology,
Pantnagar, India. He has been a Faculty Member for over 11 years and has taught
several core courses in pure and applied mathematics at undergraduate, postgradu-
ate, and doctorate levels. He is currently a Professor at Graphic Era (Deemed to be
University), Dehradun, India. Before joining Graphic Era, he was a Deputy Manager
(Probationary Officer) with Syndicate Bank for a short period. He is Editor-in-Chief
of International Journal of Mathematical, Engineering and Management Sciences
and the guest editor and member of the editorial board of various journals. He is
a regular reviewer for international journals, including IEEE, Elsevier, Springer,
Emerald, John Wiley, Taylor  & Francis, and many other publishers. He  has pub-
lished 150-plus research publications in IEEE, Taylor & Francis, Springer, Elsevier,
Emerald, World Scientific, and many other national and international journals of
repute and presented his works at national and international conferences. His fields
of research are reliability theory and applied mathematics. Dr. Ram is a Senior
Member of the IEEE, life member of Operational Research Society of India, Society
for Reliability Engineering, Quality and Operations Management in India, Indian
Society of Industrial and Applied Mathematics, member of International Association
of Engineers in Hong Kong, and Emerald Literati Network in the UK. He has been
a member of the organizing committee of a number of international and national
conferences, seminars, and workshops. He was conferred with the Young Scientist
Award by the Uttarakhand State Council for Science and Technology, Dehradun,
in 2009. He was awarded the Best Faculty Award in 2011; the Research Excellence
Award in 2015; and the Outstanding Researcher Award in 2018 for his significant
contribution in academics and research at Graphic Era (Deemed to be University),
Dehradun, India.

Contributors

Misbah Anjum
Amity Institute of Information Technology
Amity University
Noida, India

Laurent Bordes
Laboratory of Mathematics and its Applications - IPRA, UMR 5142
University of Pau and Pays Adour - CNRS - E2S UPPA
Pau, France

Jose Carpio
Department of Electrical, Electronic and Control Engineering
Spanish National Distance Education University
Madrid, Spain

Jamilson Ramalho Dantas
Departamento de Ciência da Computação
Centro de Informática da UFPE - CIN Recife
Pernambuco, Brasil
and
Departamento de Ciência da Computação
Universidade Federal do Vale do São Francisco - UNIVASF Campus Salgueiro
Salgueiro, Pernambuco, Brasil

Maritza Rodriguez Gual
Department of Reactor Technology Service (SETRE)
Centro de Desenvolvimento da Tecnologia Nuclear - CDTN
Belo Horizonte, Brazil

Kanchan Jain
Department of Statistics
Panjab University
Chandigarh, India

Rivero Oliva Jesús
Departamento de Engenharia Nuclear
Universidade Federal do Rio de Janeiro (UFRJ)
Rio de Janeiro, Brazil

Salomón Llanes Jesús
GAMMA SA
La Habana, Cuba

P. K. Kapur
Amity Centre for Interdisciplinary Research
Amity University
Noida, India

Akshay Kumar
Department of Mathematics
Graphic Era Hill University
Dehradun, India

Shah Limon
Industrial & Manufacturing Engineering
North Dakota State University
Fargo, North Dakota

Claudio Cunha Lopes
Department of Reactor Technology Service (SETRE)
Centro de Desenvolvimento da Tecnologia Nuclear - CDTN
Belo Horizonte, Brazil

Paulo Romero Martins Maciel
Departamento de Ciência da Computação
Centro de Informática da UFPE - CIN Recife
Pernambuco, Brasil

Perdomo Ojeda Manuel
Instituto Superior de Tecnologías y Ciencias Aplicadas
Universidad de La Habana (UH)
La Habana, Cuba

Thomas Markopoulos
Department of Financial and Management Engineering
University of the Aegean
Chios, Greece

Rubens de Souza Matos Júnior
Coordenadoria de Informática
Instituto Federal de Educação, Ciência e Tecnologia de Sergipe, IFS
Lagarto, Sergipe, Brasil

Rogerio Pimenta Morão
Department of Reactor Technology Service (SETRE)
Centro de Desenvolvimento da Tecnologia Nuclear - CDTN
Belo Horizonte, Brazil

Toshio Nakagawa
Department of Business Administration
Aichi Institute of Technology
Toyota, Japan

Miguel Angel Navas
Department of Electrical, Electronic and Control Engineering
Spanish National Distance Education University
Madrid, Spain

Vagner de Oliveira
Department of Reactor Technology Service (SETRE)
Centro de Desenvolvimento da Tecnologia Nuclear - CDTN
Belo Horizonte, Brazil

Agapios N. Platis
Department of Financial and Management Engineering
University of the Aegean
Chios, Greece

Mangey Ram
Department of Mathematics; Computer Science & Engineering
Graphic Era (Deemed to be University)
Dehradun, India

Edson Ribeiro
Centro de Desenvolvimento da Tecnologia Nuclear - CDTN
Belo Horizonte, Brazil

Davide Russo
Department of Management, Information and Production Engineering
University of Bergamo
Bergamo, Italy

Carlos Sancho
Department of Electrical, Electronic and Control Engineering
Spanish National Distance Education University
Madrid, Spain

Ameneh Forouzandeh Shahraki
Civil & Industrial Engineering
North Dakota State University
Fargo, North Dakota

Avinash K. Shrivastava
Department: QT, IT and Operations
International Management Institute
Kolkata, West Bengal, India

Luiz Leite da Silva
Department of Reactor Technology Service (SETRE)
Centro de Desenvolvimento da Tecnologia Nuclear - CDTN
Belo Horizonte, Brazil

S. B. Singh
Department of Mathematics, Statistics & Computer Science
G. B. Pant University of Agriculture & Technology
Pantnagar, India

Christian Spreafico
Department of Management, Information and Production Engineering
University of Bergamo
Dalmine, Italy

Preeti Wanti Srivastava
Department of Operational Research
University of Delhi
New Delhi, India

Zhonglai Wang
School of Mechanical and Electrical Engineering
University of Electronic Science and Technology of China
Chengdu, China

Zhigang Wei
Tenneco Inc.
Grass Lake, Michigan

Sylwia Werbińska-Wojciechowska
Department of Operation and Maintenance of Logistic, Transportation and Hydraulic Systems
Faculty of Mechanical Engineering
Wroclaw University of Science and Technology
Wrocław, Poland

Om Prakash Yadav
Civil & Industrial Engineering
North Dakota State University
Fargo, North Dakota

Shui Yu
School of Mechanical and Electrical Engineering
University of Electronic Science and Technology of China
Chengdu, China

Xufeng Zhao
College of Economics and Management
Nanjing University of Aeronautics and Astronautics
Nanjing, China
1 Preventive Maintenance Modeling
State of the Art
Sylwia Werbińska-Wojciechowska

CONTENTS
1.1 Introduction
1.2 Preventive Maintenance Modeling for Single-Unit Systems
1.3 Preventive Maintenance Modeling for Multi-unit Systems
1.4 Conclusions and Directions for Further Research
References

1.1 INTRODUCTION
Preventive maintenance (PM) is an important part of facilities management in many
of today’s companies. The goal of a successful PM program is to establish consistent
practices designed to improve the performance and safety of the operated equipment.
This type of maintenance strategy has recently been applied widely in many technical
systems, such as production, transport, or critical infrastructure systems.
Many studies have been devoted to PM modeling since the 1960s. One of the first
surveys of maintenance policies for stochastically failing equipment—where PM
models are under investigation—is given in [1]. In this work, the author investigated
PM for known and uncertain distributions of time to failure. Pierskalla and Voelker [2]
prepared another excellent survey of maintenance models for proper scheduling and
optimizing maintenance actions, which Valdez-Flores and Feldman [3] updated later.
Other valuable surveys summarize the research and practice in this area in different
ways (e.g., [4–18]). In turn, the comparison between time-based maintenance and
condition-based maintenance is the authors’ area of interest, e.g., in works [19,20].
In this chapter, the author reviews and summarizes recent PM policies developed
and presented in the literature. The adopted classification of maintenance models is
based on developments given in [15–18]. It comprises two main groups of maintenance
strategies: those for single-unit systems and those for multi-unit systems. The main
scheme for the classification of PM models for technical systems is presented in
Figure 1.1.


[Figure 1.1: classification diagram. Preventive maintenance (PM) for technical systems divides into two branches.
PM for single-unit systems: age-based PM policies; periodic PM policies; sequential PM policies; failure limit policies; repair limit policies (repair cost limit policies, repair time limit policies); extended PM models for single-unit systems.
PM for multi-unit systems: basic models for systems without components dependence; basic models for systems with components dependence (group maintenance policy, opportunistic maintenance policy, cannibalization maintenance); hybrid PM models (inspection maintenance modeling, spare parts provisioning policy, dynamic reliability maintenance).]

FIGURE 1.1 The classification for preventive maintenance models for technical system.
(Own contribution based on Wang, H., European Journal of Operational Research, 139,
469–489, 2002; Werbińska-Wojciechowska, S., Technical System Maintenance, Delay-time-based
modeling, Springer, London, UK, 2019; Werbińska-Wojciechowska, S., Multicomponent
technical systems maintenance models: State of art (in Polish), in Siergiejczyk, M. (ed.),
Technical Systems Maintenance Problems: Monograph (in Polish), Publication House of
Warsaw University of Technology, Warsaw, Poland, pp. 25–57, 2014.)

Many well-known research papers focus on PM models dedicated to optimizing the
performance of single-unit systems. The best-known maintenance models for
single-unit systems are age-dependent PM and periodic PM models. In these
areas, the most frequently used replacement models are based on age replacement
and block replacement policies. The basic references in this area are [3,15,22,23].
Comparisons of maintenance policies are presented, e.g., in works [24–29].
According to Cho and Parlar [4], “multi component maintenance models are concerned
with optimal maintenance policies for a system consisting of several units
of machines or many pieces of equipment, which may or may not depend on each
other.” In 1986, Thomas [30] presented a classification of optimal maintenance
strategies for multi-unit systems. He focuses on models that are based on
one of three types of dependence that can occur between system elements: economic,
failure, and structural. According to the author, economic dependence implies that
an opportunity for a group replacement of several components costs less than separate
replacements of the individual components. Stochastic dependence, also called
failure or probabilistic dependence, occurs if the condition of components influences
the lifetime distribution of other components. Structural dependence means that
components structurally form a part, so that maintenance of a failed component implies
maintenance of working components. These definitions are adopted in this chapter.
Literature reviews compatible with the research findings given in [30] are provided,
e.g., in works [5,31–33]. A more comprehensive discussion of maintenance from
an application point of view can be found in [34,35]. For other recent references, see,
e.g., [8,18,23]. A detailed review of the most commonly used PM policies for single-
and multi-unit systems is presented in subchapters 1.2 and 1.3.

1.2 PREVENTIVE MAINTENANCE MODELING FOR SINGLE-UNIT SYSTEMS
First, the PM models for single-unit systems are investigated. Here a unit may be
perceived as a component, an assembly, a subsystem, or even the whole system
(treated as a complex system). The main classification for maintenance models of
such systems is given in Figure 1.2. Comparisons of different PM
policies are given in works [22,24,25,28,29,36–38].
One of the most commonly used PM policies for single-unit systems is the age
replacement policy (ARP), which was developed in the early 1960s [39]. Under this
policy, a unit is always replaced at its age T or at failure, whichever occurs first [40].
The issues of ARP modeling have been extensively studied in the literature since
the 1990s. The main extensions developed for this maintenance policy concern
minimal repair, imperfect maintenance performance, shock modeling, or the
implementation of inspection actions. Accordingly, in the known maintenance models,
the PM at T and the corrective maintenance (CM) at failure may each be minimal,
imperfect, or perfect. The main optimization criteria are based on the maintenance
cost structure. In the case of the simple ARP, the expected cost per unit of time
over an infinite time span is given as [39,41]:

$$C(T) = \frac{c_r F(T) + c_p \bar{F}(T)}{\int_0^T \bar{F}(t)\,dt} \qquad (1.1)$$

where:
$C(T)$ is the long-run expected cost per unit time
$c_p$ is the cost of preventive replacement of a unit
$c_r$ is the cost of failed unit replacement
$F(t)$ is the probability distribution function of system/unit lifetime: $\bar{F}(t) = 1 - F(t)$
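
As a numerical illustration of Eq. (1.1), the sketch below evaluates C(T) by trapezoidal integration of the survival function and searches a grid of candidate ages for the cheapest one. The Weibull lifetime, the cost values, the grid, and all function names are illustrative assumptions, not taken from the cited works.

```python
import math


def weibull_cdf(t, shape, scale):
    """F(t) for a Weibull lifetime (assumed here; shape > 1 models wear-out)."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def arp_cost_rate(T, c_r, c_p, cdf, steps=2000):
    """Long-run expected cost per unit time C(T) from Eq. (1.1)."""
    if T <= 0:
        raise ValueError("T must be positive")
    h = T / steps
    # Trapezoidal rule for the integral of the survival function 1 - F(t) on [0, T].
    survival = [1.0 - cdf(i * h) for i in range(steps + 1)]
    mean_cycle = h * (sum(survival) - 0.5 * (survival[0] + survival[-1]))
    F_T = cdf(T)
    return (c_r * F_T + c_p * (1.0 - F_T)) / mean_cycle


def best_replacement_age(c_r, c_p, cdf, grid):
    """Grid search for the candidate age T that minimizes C(T)."""
    return min(grid, key=lambda T: arp_cost_rate(T, c_r, c_p, cdf))


if __name__ == "__main__":
    def lifetime_cdf(t):
        # Hypothetical unit: Weibull with shape 2 and scale 100 time units.
        return weibull_cdf(t, shape=2.0, scale=100.0)

    candidate_ages = [5.0 * k for k in range(1, 61)]  # 5, 10, ..., 300
    T_star = best_replacement_age(10.0, 1.0, lifetime_cdf, candidate_ages)
    print(T_star, arp_cost_rate(T_star, 10.0, 1.0, lifetime_cdf))
```

A grid search is used only to keep the sketch dependency-free. For an exponential lifetime the failure rate is constant, so C(T) decreases in T and preventive replacement at a finite age brings no benefit.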

[Figure 1.2: classification diagram. Panels: ARP models for single-unit systems; BRP models for single-unit systems; sequential PM models for single-unit systems; limit PM models for single-unit systems, including the repair-time limit policy and the repair-cost limit policy. Attributes listed in the panels include minimal repair implementation, perfect/imperfect repair, shock modeling, cost/availability/reliability constraints, inspection policy, finite/infinite time horizon, new/used unit maintenance modeling, negligible/non-negligible downtime, dynamic reliability models, hybrid models, and mixed PM models.]

FIGURE 1.2 The classification for PM models for single-unit systems.



The first investigated group of ARP models applies to the implementation of minimal
repair. Minimal repair is defined herein as “the repair that put the failed item back
into operation with no significant effect on its remaining life time” [39]. A simple
ARP model with minimal repair is given in [42], where the author investigates a
one-unit system that is replaced at the first failure after age T. All failures that
happen before age T are minimally repaired. The model is based on the optimization of
the mean cost rate function. This model is extended in [43,44], where
the authors develop the ARP with minimal repair and general random repair cost.
This research is continued in [45], where the author introduces
a model for determining the optimal number of minimal repairs before replacement.
The main assumptions are compatible with [43,44] and incorporate minimal
repair, replacement, and general random repair cost.
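
The policy of replacing at the first failure after age T, with minimal repair before T, can be made concrete with a renewal-reward argument. The expression below is a sketch under stated assumptions rather than the formula of [42]: repairs and replacements are taken as instantaneous, each minimal repair costs a hypothetical $c_m$, each replacement costs $c_r$, and $h(t)$ denotes the failure rate of the lifetime distribution $F$. Under minimal repair the failures before T form a nonhomogeneous Poisson process with intensity $h(t)$, so the expected number of minimal repairs is the cumulative hazard $H(T)$, and the expected time from T to the next failure is the mean residual life at age T.

```latex
C_m(T) = \frac{c_m H(T) + c_r}{T + \dfrac{1}{\bar{F}(T)} \displaystyle\int_T^{\infty} \bar{F}(t)\,dt},
\qquad
H(T) = \int_0^T h(t)\,dt = -\ln \bar{F}(T).
```

For an exponential lifetime with rate $\lambda$ this reduces to $C_m(T) = (c_m \lambda T + c_r)/(T + 1/\lambda)$.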
A similar problem is analyzed later in [46], where the authors investigate PM with
Bayesian imperfect repair. In the given PM model, a failure that occurs at
unit age T_y < T can be either minimally repaired or perfectly repaired with random
probabilities. The expected cost per unit time is investigated for the infinite-horizon
case and the one-replacement-cycle case.
A Bayesian approach to determining the optimal replacement strategy is also given
in [47]. In this paper, the authors present a fully Bayesian analysis
of the optimal replacement problem for the block replacement protocol with
minimal repair and the simple age replacement protocol. The optimal replacement
strategies are obtained by maximizing the expected utility with uncertainty analysis.
The ARP with minimal repair is usually investigated using maintenance cost
criteria for optimization. However, a few PM models
are developed based on availability optimization. For example, in [48] the authors
investigate the steady-state availability of an imperfect repair model for repairable
two-state items. The authors use renewal theory to provide analytical solutions for
single- and multi-component systems.
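
To show what a renewal-theory availability result looks like in the simplest setting, the sketch below treats a two-state item as an alternating renewal process with perfect repair, for which the steady-state availability is the mean up time divided by the mean cycle length. This is a simplification and not the imperfect repair model of [48]; the exponential up and down times in the simulation and all names are illustrative assumptions.

```python
import random


def steady_state_availability(mean_up, mean_down):
    """Limiting availability of an alternating renewal process (perfect repair)."""
    return mean_up / (mean_up + mean_down)


def simulate_availability(mean_up, mean_down, cycles=20000, seed=0):
    """Monte Carlo check: fraction of time spent up over many up/down cycles.

    Exponential up and down times are assumed purely for illustration; the
    analytical result depends only on the two means.
    """
    rng = random.Random(seed)
    up = sum(rng.expovariate(1.0 / mean_up) for _ in range(cycles))
    down = sum(rng.expovariate(1.0 / mean_down) for _ in range(cycles))
    return up / (up + down)


if __name__ == "__main__":
    print(steady_state_availability(90.0, 10.0))
    print(simulate_availability(90.0, 10.0))
```

The simulated fraction of up time should approach the analytical value as the number of cycles grows.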
In another work [49], the author introduces an ARP with non-negligible downtimes.
In this work, the author develops sufficient conditions for the existence of a
global minimum of the asymptotic expected cost rate under the ARP.
Periodic testing or inspections are introduced into the ARP in [50], where the author
considers components whose failures can occur randomly but are detected only by
periodic testing or inspections. The developed model includes finite repair and
maintenance times and cost contributions due
to inspection (or testing), repair, maintenance, and loss of production (or accidents).
The analytical solution encompasses general cost rate and unavailability equations.
The continuation of inspection maintenance and PM optimization problems is given
in [51], where the authors focus on the issues of random failure and replacement time
implementation.
In [52], the authors introduce replacement policies for a unit that is running suc-
cessive works with cycle times. In the paper, three replacement policies are defined
that are scheduled at continuous and discrete times:

• Continuous age replacement: The unit is replaced before failure at a planned time T
• Discrete age replacement: The unit is replaced before failure at completion of the
  Nwc-th working cycle
• Age replacement with overtime: The unit is replaced before failure at the first
  completion of some working cycle over the planned time T

Analytical equations of the expected cost rate with numerical solutions are provided.
The authors also present the comparison of given replacement policies.
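
The three rules above differ only in what triggers the planned replacement. The following is a minimal sketch, assuming that the working-cycle durations are known and that "over the planned time T" means the first cycle completion at or after T. The function name and its inputs are illustrative and are not taken from [52].

```python
from itertools import accumulate

def planned_replacement_times(cycle_lengths, T, n_cycles):
    """Planned replacement time under each of the three policies.

    cycle_lengths: durations of successive working cycles (hypothetical input)
    T: planned replacement time
    n_cycles: number of working cycles after which the unit is replaced
    """
    completions = list(accumulate(cycle_lengths))  # completion time of each cycle
    continuous = T                                 # replace at the planned time T
    discrete = completions[n_cycles - 1]           # replace when cycle n_cycles ends
    # Overtime: first cycle completion at or after T (None if no cycle reaches T)
    overtime = next((c for c in completions if c >= T), None)
    return continuous, discrete, overtime

print(planned_replacement_times([2.0, 3.0, 1.5, 4.0], T=6.0, n_cycles=2))
```

In each case the planned time applies only if the unit has not failed earlier.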
Another extension of ARP modeling is given in [53], where the authors investigate
the problem of PM uncertainty by assuming that the quality of PM actions is a random
variable with a defined probability distribution. Following this, the authors analyze an
age reduction PM model and a failure rate PM model. Under the age reduction PM
model, it is assumed that each PM reduces operational stress to the existing time units
previous to the PM intervention, where the restoration interval is less than or equal to
the PM interval. The optimization criterion is also based on the maintenance cost structure.
The issues of warranty policy are investigated in [54]. The author in this work
investigates a general age-replacement model that incorporates minimal repair,
planned replacement, and unplanned replacement for a product under a renewing
free-replacement warranty policy. The main assumptions of the ARP are compatible
with [43,44]. The authors assume that all the product failures that cause minimal repair
can be detected instantly and repaired instantaneously by a user. Thus, it is assumed
in this study that the user of the product should be responsible for all minimal repairs
before and after the warranty expires. Following this, for the product with an increasing
failure rate function, the authors show that a unique optimal replacement age exists
such that the long-run expected cost rate is minimized. The authors also compare
analytically the optimal replacement ages for products with and without warranty.
The  warranty policy problem is analyzed in  [55], where the authors propose
an age-dependent failure-repair model to analyze the warranty costs of products.
In  this paper, the authors consider four typical warranty policies (fixed warranty,
renewing warranty, mixture of minimal and age-reducing repairs, and partial rebate
warranty).
The last group of ARP models applies to PM strategies based on shock models.
The simple age-based policy with a shock model is presented in [56]. In this work,
the authors introduce three main cumulative damage models: (1) a model in which a
unit is subjected to shocks and suffers some damage due to shocks, (2) a model that
includes periodic inspections, and (3) a model in which the amount of damage
increases linearly with time. For the defined shock models, optimal replacement
policies are derived by minimizing the expected cost rate.
The extension of the given models is presented in [57], where the authors study
the mean residual life of a technical object as a measure used in the age replacement
model assessment. The analytical solution is supplied with a new U-statistic test pro-
cedure for testing the hypothesis that the life is exponentially distributed against the
alternative that the life distribution has a renewal-increasing mean residual property.
Another development of general replacement models of systems subject to shocks
is presented in [58], where the authors distinguish between fatal and nonfatal shocks.
A fatal shock causes total breakdown of the system, which is then replaced,
whereas a nonfatal shock weakens the system and makes it more expensive to run.
Following this, the authors focus on finding the optimal T that minimizes the long-
run expected cost per unit time.
Another extension of the ARP with shock models is the introduction of minimal
repair. Following this, in [59] the authors extend the generalized replacement
policy given in [58] by introducing minimal repair of minor failures. Moreover, in the
given PM model, the cost of minimal repair of the system is age dependent.
Later, in [60], the authors introduce an extended ARP policy with minimal repairs
and a cumulative damage model implementation. Under the developed maintenance
policy, the fatal shocks are removed by minimal repairs and the minor shocks increase
the system failure rate by a certain amount. Without external shocks, the failure rate
of the system also increases with age due to the aging process. The optimality criterion
is again the long-run expected cost per unit time. This model is extended
later in [61], where the authors consider the ARP with minimal repair for an extended
cumulative damage model with maintenance at each shock. According to the devel-
oped PM policy, when the total damage does not exceed a predetermined failure level,
the system undergoes maintenance at each shock. When the total damage has reached
a given failure level, the system fails and undergoes minimal repair at each failure.
The system is replaced at periodic times T or at the Nth failure, whichever occurs first.
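
The replacement rule "at time T or at the Nth failure, whichever occurs first" can be illustrated by simulation. The sketch below is not the model of [61]. It assumes that failures under minimal repair follow a non-homogeneous Poisson process with a power-law cumulative intensity (t/eta)^beta, and all names and parameter values are hypothetical.

```python
import random

def nth_failure_time(n, eta, beta, rng):
    """Time of the n-th failure under minimal repair (assumed power-law NHPP).

    For cumulative intensity (t / eta) ** beta, the transformed failure times
    form a unit-rate Poisson process, so the n-th failure time equals
    eta * G ** (1 / beta), where G is a sum of n independent Exp(1) draws.
    """
    g = sum(rng.expovariate(1.0) for _ in range(n))
    return eta * g ** (1.0 / beta)

def replacement_time(T, N, eta, beta, rng):
    """Replace at the planned time T or at the N-th failure, whichever is first."""
    return min(T, nth_failure_time(N, eta, beta, rng))

rng = random.Random(42)
cycles = [replacement_time(T=1.5, N=3, eta=2.0, beta=2.0, rng=rng) for _ in range(10000)]
print("mean cycle length:", sum(cycles) / len(cycles))
```

Averaging such cycle lengths together with the costs incurred in each cycle gives a simulation estimate of the long-run cost per unit time.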
To sum up, many authors usually discuss ARPs of single-unit systems analyti-
cally. The main models that address this maintenance strategy also should be sup-
plemented by works that investigate the problem of ARP modeling with the use of
semi-Markov processes (see, e.g.,  [62,63]), TTT-plotting (see, e.g.,  [64]), heuristic
models (see, e.g.,  [65]), or approximate methods implementation (see, e.g.,  [66]).
The authors in [67] introduce the new stochastic order for ARP based on the com-
parison of the Laplace transform of the time to failure for two different lifetime
distributions. The comparison of ARP models for a finite horizon case based on a
renewal process application and a negative exponential and Weibull failure-time dis-
tribution is presented in [68]. The additional interesting problems in ARP modeling
may be connected with spare provisioning policy implementation (see, e.g., [69]) or
multi-state systems investigation (see, e.g., [62,70,71]).
A brief overview of the given ARPs is presented in Table 1.1.
Another popular PM policy for single-unit systems is the block replacement policy
(BRP). Under this policy, all units in a system are replaced at periodic times kT,
where k = 1, 2, 3, and so on, regardless of their individual age. The maintenance
problem is usually to find the optimal cycle length T that either minimizes total
maintenance and operational costs or maximizes system availability. The simple BRP,
when the maintenance times are negligible, is based on the optimization of the
expected long-run maintenance cost per unit time as a function of T, given as [72]:

$$C(T) = \frac{c_r N(T) + c_p}{T} \tag{1.2}$$

where:
N(t) is the expected number of failures (renewals) in the time interval (0, t)
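
Equation (1.2) can be evaluated numerically once N(T) is available. The sketch below approximates N(t) by discretizing the renewal equation N(t) = F(t) + ∫ N(t − x) dF(x) and then scans a grid of T values. It assumes a Weibull lifetime distribution and reads c_r as the cost of a failure replacement and c_p as the cost of a planned replacement. The function names, the distribution, and the numerical values are illustrative only.

```python
import math

def weibull_cdf(t, shape, scale):
    """Lifetime distribution F(t); the Weibull form is an illustrative assumption."""
    return 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0

def renewal_function(t_max, shape, scale, steps=500):
    """Approximate N(t) on a grid from N(t) = F(t) + int_0^t N(t - x) dF(x)."""
    h = t_max / steps
    F = [weibull_cdf(i * h, shape, scale) for i in range(steps + 1)]
    N = [0.0] * (steps + 1)
    for i in range(1, steps + 1):
        # Stieltjes sum over the increments of F; only earlier N values are used
        conv = sum(N[i - j] * (F[j] - F[j - 1]) for j in range(1, i + 1))
        N[i] = F[i] + conv
    return h, N

def brp_cost_rates(t_max, shape, scale, c_r, c_p, steps=500):
    """Return (T, C(T)) pairs with C(T) = (c_r * N(T) + c_p) / T, as in Eq. (1.2)."""
    h, N = renewal_function(t_max, shape, scale, steps)
    return [(i * h, (c_r * N[i] + c_p) / (i * h)) for i in range(1, steps + 1)]

# Increasing failure rate (shape 2); a failure replacement costs ten times more.
rates = brp_cost_rates(t_max=3.0, shape=2.0, scale=1.0, c_r=10.0, c_p=1.0)
best_T, best_cost = min(rates, key=lambda pair: pair[1])
print(f"approximate optimal T = {best_T:.2f}, cost rate = {best_cost:.2f}")
```

With exponential lifetimes (shape 1) the cost rate decreases in T on the whole grid, which matches the usual observation that planned replacement does not pay off without aging.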
TABLE 1.1
Summary of PM Policies for Single-Unit Systems

| Type of Maintenance Policy | Planning Horizon | Optimality Criterion | Modeling Method | Typical References |
|---|---|---|---|---|
| ARP | Infinite (∞) | The long-run expected cost per time unit | Bayesian approach | [47] |
| ARP | Infinite (∞) | The long-run expected cost per unit time, availability function | Analytical | [38] |
| ARP | Infinite (∞) | The long-run expected cost per time unit | Analytical | [39,40–42,44,53,54,60,118] |
| ARP | Infinite (∞) | The expected cost rate | Analytical | [45,49,51,56,59,61,66,119] |
| ARP | Infinite (∞) | The mean cost rate | Analytical | [120] |
| ARP | Infinite (∞) | The total cost rate, the expected unavailability | Analytical | [50] |
| ARP | Infinite (∞) | The expected replacement cost rate | Analytical | [52] |
| ARP | Infinite (∞) | The expected warranty cost | Analytical | [55] |
| ARP | Infinite (∞) | The steady-state availability function | Analytical | [48] |
| ARP | Infinite (∞) | The survival function | Analytical | [121] |
| ARP | Infinite (∞) | The mean time to failure | Analytical (Laplace transform) | [67] |
| ARP | Infinite (∞) | The long-run expected cost per unit time, availability, lifetime, and reliability functions | Multi-attribute value model | [122] |
| ARP | Infinite (∞) | The expected long-run cost rate | Heuristic model | [65] |
| ARP | Infinite (∞) | The expected long-run cost rate | Semi-Markov decision process | [63] |
| ARP | Infinite (∞) | The expected long-run cost rate | Semi-Markov process | [62] |
| ARP | Infinite (∞) | The long-run average cost per unit time | Proportional hazard model and TTT-plotting | [64] |
| ARP | Infinite (∞) | The total system costs | Simulation model | [69] |
| ARP | Infinite (∞) | State-age-dependent policy | Multi-phase Markovian model | [71] |
| ARP | Infinite (∞) | Mean residual life | Analytical/simulation | [57] |
| ARP | Infinite (∞) | The expected cost of operating the system over a time interval | Analytical | [123] |
| ARP | Infinite (∞) | The expected long-run cost per unit time, the total discounted cost | Analytical | [78] |
| ARP | Infinite (∞)/finite | The expected cost rate per unit time | Analytical | [46] |
| ARP | Infinite (∞)/finite | The long-run expected cost per unit time | Analytical | [43,58,124] |
| ARP | Finite | Expected cumulative cost | Analytical | [68] |
| ARP | Finite | Customer’s expected discounted maintenance cost | Continuous-time Markov process | [70] |
| BRP | Infinite (∞) | The long-run expected cost per time unit | Analytical | [72,74–80,83,125–127] |
| BRP | Infinite (∞) | The long-run expected cost per time unit | Analytical/semi-Markov processes | [81] |
| BRP | Finite | The long-run expected cost per time unit | Analytical | [7] |
| Sequential PM policy | Infinite (∞) | Mean maintenance costs | Analytical | [41] |
| Sequential PM policy | Infinite (∞) | Expected cost rate | Analytical | [88] |
| Sequential PM policy | Infinite (∞) | Expected costs per unit time | Analytical | [90] |
| Sequential PM policy | Infinite (∞) | Total expected maintenance costs | Genetic algorithm | [92] |
| Sequential PM policy | Infinite (∞) | Mean cost rate | Bayesian approach | [93] |
| Sequential PM policy | Infinite (∞)/finite | Expected cost rate till replacement | Analytical | [89] |
| Sequential PM policy | Finite | Expected cost till replacement | Analytical | [7] |
| Sequential PM policy | Finite | Expected profit | Genetic algorithm | [91] |
| Failure limit policy (Failure rate through wear/accumulated damage or stress) | Infinite (∞) | Total expected long-run cost per unit time | Analytical | [94] |
| Failure limit policy (Failure rate) | Infinite (∞) | Cost rate | Analytical | [95,96] |
| Failure limit policy (Failure rate) | Infinite (∞) | Availability function | Analytical | [128] |
| Failure limit policy (Degradation ratio) | Infinite (∞) | Total expected long-run cost per unit time/availability function | Analytical | [129] |
| Failure limit policy (Failure rate) | Infinite (∞) | Unit-cost life of a system | Genetic algorithms | [98] |
| Failure limit policy (Age) | Finite | Total costs function | Analytical (branching algorithm) | [97] |
| Repair-time limit policy | Infinite (∞) | Expected cost per unit time | Markov renewal process | [101] |
| Repair-time limit policy | Infinite (∞) | Expected cost per unit time | Analytical | [100] |
| Repair-time limit policy | Infinite (∞) | The total expected costs per unit time | Graphical approach (TTT) | [102,107,112] |
| Repair-time limit policy | Infinite (∞) | The expected total discounted cost | Graphical approach | [106] |
| Repair-time limit policy | Infinite (∞) | The expected cost per unit time | Lorenz curve | [105] |
| Repair-time limit policy | Infinite (∞) | The long-run average profit rate/the total discounted profit | Analytical/nonparametric algorithms | [104] |
| Repair-cost limit policy | Infinite (∞) | Cost rate | Analytical | [108,110,111] |
| Repair-cost limit policy | Infinite (∞) | Mean cost rate | Analytical | [109,115] |
| Repair-cost limit policy | Infinite (∞) | Mean cost rate | Markov renewal process | [117] |
| Repair-cost limit policy | Infinite (∞) | The long-term cost per unit time | Analytical | [113] |
| Repair-cost limit policy | Infinite (∞) | The long-run total maintenance cost rate | Analytical | [114] |
| Repair-cost limit policy | Infinite (∞) | Total expected cost per unit time | Graphical approach (TTT) | [103] |
| Repair-cost limit policy | Infinite (∞) | The expected average cost per unit time | Optimal stopping theory | [130] |
| Repair-cost limit policy | Infinite (∞) | The long-run average expected maintenance cost per unit time | Semi-Markov decision process | [131] |
| Repair-cost limit policy | Finite | The expected cost of servicing | Analytical | [132] |

The  main advantage of this policy is its simplicity. However, the main drawback
of simple block replacement policy is that at planned replacement times practically
new items might be replaced and a major portion of the useful life of these units is
wasted. Thus, to overcome this disadvantage, various modifications have been intro-
duced in the literature. The main extensions for the simple BRP include minimal
repair implementation, finite/infinite time horizon, shock modeling use, and inspec-
tion maintenance performance.
The introduction of minimal repair was first analyzed in the 1970s
(see, e.g., [41,73]). Later, in [74], the author considers a BRP with minimal repair at
failure for a used unit of age Tax. In the given model, the item is preventively replaced
by a new one at times kT, k = 1, 2, 3, and so on. If the system fails in [(k−1)T, kT−Δδ],
then the item is either replaced by a new one or repaired minimally. If the failure
occurs in [kT−Δδ, kT], then the item is either replaced by a used one with age varying
from Δδ to T or repaired minimally. The choice is random with age-dependent
probability. The cost structure is also age-dependent. For the given assumptions, the
author defines the expected long-run cost per unit time function. This maintenance
model is extended later in [75] for single and multi-unit cases.
An interesting model is introduced in [76], where the authors investigate an optimal
maintenance model for repairable systems under two types of failures with different
maintenance costs. The model assumes that periodic visual inspections are
performed to detect potential failures of type I. For the given assumptions, the
total expected costs are estimated.
The presented models are developed for an infinite time span. In [7], finite
replacement models are considered. Given that the working time of a unit is
a specified value Two, the long-run expected costs per unit time
are estimated.
Another extension of the simple BRP applies to shock modeling implementation.
For example, in [77] the authors investigate a system subjected to shocks, which occur
independently and according to a Poisson process with intensity rate λs. Each shock
is either nonlethal with probability ps or lethal with probability (1−ps).
Later, the extension of the given model is presented in [78]. In the given paper, the author
analyzes a system subject to shocks that arrive according to a non-homogeneous
Poisson (NHP) process. As shocks occur, the system has two types of failures:

• Type I (minor) failure: Removed by minimal repair
• Type II (catastrophic) failure: Removed by unplanned replacement

The probability of the type II failure is dependent on the number of shocks suffered
since the last replacement. The author derives the expressions for the expected
long-run cost per unit time and the total α-discounted cost for each policy. This model
is later extended in [79], where the authors consider a BRP model for a system
subjected to shock occurrence and with minimal repair at failure for a used unit of age
Tax. The proposed solution was based on assumptions given in [74].
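
The homogeneous shock model of [77] described above (Poisson shocks, each nonlethal with a fixed probability) is easy to simulate. The sketch below is only an illustration with hypothetical names and parameter values. By the thinning property of the Poisson process, the time to the first lethal shock is exponential with rate λs(1 − ps), which the simulation should reproduce on average.

```python
import random

def time_to_lethal_shock(rate, p_nonlethal, rng):
    """Simulate shocks until the first lethal one.

    rate: intensity of the Poisson shock process
    p_nonlethal: probability that a shock is nonlethal
    Returns (time of the lethal shock, number of nonlethal shocks before it).
    """
    t = 0.0
    n_nonlethal = 0
    while True:
        t += rng.expovariate(rate)        # waiting time to the next shock
        if rng.random() >= p_nonlethal:   # lethal with probability 1 - p_nonlethal
            return t, n_nonlethal
        n_nonlethal += 1

rng = random.Random(7)
runs = [time_to_lethal_shock(rate=2.0, p_nonlethal=0.75, rng=rng) for _ in range(20000)]
mean_time = sum(t for t, _ in runs) / len(runs)
print("simulated mean:", mean_time, "theoretical mean:", 1.0 / (2.0 * (1.0 - 0.75)))
```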
The time-dependent cost structure is investigated in [80], where the authors
determine a replacement time for a system using a counting process with
unit jump size.
To sum up, many authors discuss BRPs of single-unit systems due to their sim-
plicity. The  main models that address this maintenance strategy also should be
supplemented by works that investigate the problem of imperfect maintenance (see,
e.g., [81,82]), joint preventive maintenance with production inventory control policy
(see, e.g., [83]), risk at failure investigation (see, e.g., [84]), or estimation issues (see,
e.g.,  [72]). The  examples of BRP implementation apply to transportation systems
maintenance (see, e.g.,  [85]), aircraft component maintenance (see, e.g.,  [86]), or
preventive maintenance for milling assemblies (see, e.g., [87]). The quick overview
of the given BRPs is presented in Table 1.1.
Another PM policy applied in the area of maintenance of single-unit systems
is sequential PM policy. Under this PM policy a unit is preventively maintained at
unequal time intervals. The unequal time interval usually is related to the age of the
system or is predetermined as in periodic maintenance policies [15].
One of the first works where the author considers sequential PM policy is [88].
In this work, the sequential preventive maintenance for a system with minimal repair
at failure is investigated. The policy assumes that the system is replaced at constant
time intervals and at the Nth failure. This model is later investigated in [7], where the
author proposes the simple sequential PM policy with imperfect maintenance for a
finite time span.
Another interesting model of the sequential PM policy is presented in [89], where
the authors introduce a shock model and a cumulative damage model. In this article,
two replacement policies are developed—a periodic PM and a sequential PM pol-
icy with minimal repair at failure and imperfect PM. The solutions are obtained for
finite and infinite time spans. These problems are investigated later in [90], where
the authors adopt improvement factors in the hazard rate function for modeling the
imperfect PM performance. The  model is presented for an infinite time-horizon.
The main characteristic of the given model is connected with considering the age-
dependent minimal repair cost and the stochastic failure type.
In [91], the authors present a sequential imperfect PM policy for a degradation
system. This model extends assumptions given in [88]. The developed model is based
on maximal/equal cumulative-hazard rate constraints. The optimization is obtained
using a genetic algorithm. Later, the random adjustment-reduction maintenance
model with imperfect maintenance policy for a finite time span is presented in [92].
The authors again use a genetic algorithm.
The Bayesian approach implementation in the sequential PM problem is presented
in [93]. The authors determine the optimal PM schedules for a hybrid sequential PM
policy, where the age reduction PM model and the hazard rate PM model are com-
bined. Under such a hybrid PM model, each PM action reduces the effective age of
the system to a certain value and also adjusts the slope of the hazard rate (slows down
the degradation process of the maintained system).
Sequential PM policies are practical for most units that need more frequent main-
tenance with increasing age. The quick overview of the main known sequential PM
models is given in Table 1.1.
The last group of PM policies applies to predefined limit level policies. The first
of these, the failure limit policy, depends on the failure model assumed for operated
units. Under this policy, PM is performed only when the defined state variable, which
describes the state of the unit at age T (e.g., failure rate), reaches a predetermined
level and failures that occur are repaired.
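
As a simple illustration of the failure limit rule, the sketch below takes the failure rate as the monitored state variable and assumes a Weibull hazard with shape greater than 1, so that the hazard increases with age. The PM age is then the age at which the hazard reaches the predetermined level. The distribution, function names, and numbers are assumptions made for the example.

```python
def weibull_hazard(t, shape, scale):
    """Failure rate h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def age_at_hazard_limit(limit, shape, scale):
    """Age at which the hazard first reaches `limit` (requires shape > 1)."""
    if shape <= 1:
        raise ValueError("the hazard must be increasing (shape > 1)")
    # Invert h(t) = limit for t
    return scale * (limit * scale / shape) ** (1.0 / (shape - 1))

pm_age = age_at_hazard_limit(limit=0.1, shape=2.0, scale=10.0)
print("perform PM at age", pm_age)
print("hazard at that age", weibull_hazard(pm_age, 2.0, 10.0))
```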
One of the first works that investigates the optimal replacement model with the use
of the failure limit policy is in [94]. The author in this work presents the replacement
policy based on the failure model defined for an operating unit. In this model, a unit
state at age T is defined by a random variable. The replacement is performed either at
failure or when the unit state reaches or exceeds a given level, whichever occurs first.
Model optimization is based on the average long-run cost per unit time estimation.
This problem is investigated later in [95], where the author introduces a PM
model with a monotone hazard function affected by system degradation. The author
develops a hazard model and achieves a cost optimization of system operation.
The imperfect repair in failure limit policy is introduced in [96]. The authors in
their work consider two types of PM (simple PM and preventive replacement) and
two types of corrective maintenance (minimal repair and corrective replacement).
The developed cost-rate model is based on adjustment of the failure rate after simple
PM with the use of a concept of improvement factor. The expected costs are the sum
of average costs of both types of PM and average cost of downtime. This problem is
pursued further in [97]. The authors in their work propose a cost model for two
types of PM (as in [96]) and one type of corrective maintenance (corrective replace-
ment) that considers inflationary trends over a finite time horizon.
The PM scheduling for a system with deteriorated components also is analyzed
in [98]. The authors consider a PM policy compatible with those presented in [97],
but the degraded behavior of maintained components is modeled by a dynamic reli-
ability equation. The optimal solution, based on unit-cost life estimation, is obtained
with the use of genetic algorithms.
Another example of PM modeling under the failure limit policy is presented
in [99], where the authors focus on system availability optimization. In the presented
model system failure rate is reduced after each PM and depends on age and on the
number of performed PM actions.
Maintenance models under the failure limit policy are summarized in
Table 1.1.
The second group of PM policies based on predefined limit levels comprises repair
limit policies. In the known literature, there are two types of repair limit policies: a
repair cost limit policy and a repair time limit policy [13]. Under the repair cost limit
policy, when a unit fails, a repair cost is estimated and repair is undertaken if the
estimated cost is less than a predetermined limit. Otherwise, the unit is replaced.
Under the repair time limit policy, the decision variable is the time of repair. If the
time of corrective repair is greater than the specified time Trmax, the unit is replaced.
Otherwise, the unit is repaired [15,100].
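
Both repair limit policies reduce to a threshold rule applied at each failure: the repair cost limit policy compares the estimated repair cost with a cost limit, and the repair time limit policy compares the repair time with a time limit. A minimal sketch follows, with hypothetical function names and example values. How ties at the limit are handled is an assumption.

```python
def repair_cost_limit_decision(estimated_cost, cost_limit):
    """Repair if the estimated cost is below the limit; otherwise replace."""
    return "repair" if estimated_cost < cost_limit else "replace"

def repair_time_limit_decision(repair_time, time_limit):
    """Replace if the repair takes longer than the limit; otherwise repair."""
    return "replace" if repair_time > time_limit else "repair"

def replacement_fraction(values, limit, decide):
    """Share of failures that end in replacement for observed costs or times."""
    decisions = [decide(v, limit) for v in values]
    return decisions.count("replace") / len(decisions)

costs = [120.0, 480.0, 950.0, 300.0]  # hypothetical repair-cost estimates
print(replacement_fraction(costs, 500.0, repair_cost_limit_decision))
```

Lowering the limit raises the share of failures that lead to replacement, and this trade-off is what the cost-rate models cited here optimize.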
The first models on repair limit policies are presented in [100,101]. The modeling
methods are based on Markov renewal process use. Later, in [102], the authors dis-
cuss the optimal repair limit replacement policy based on a graphical approach with
the use of the Total Time on Test (TTT) concept. This graphical approach is used
in [103] to determine the optimal repair limit replacement policy.
Another extension of the simple repair time limit policy is the introduction of
imperfect maintenance. Known models of this kind are presented in [104–107].
The implemented modeling methods are based on using the TTT concept and Lorenz
statistics.
The second type of repair limit policy relies on estimating the repair cost at system
failure and is known as the repair-cost limit policy. One of the first studies that
investigates a general maintenance model with replacements and minimal repair as a
base for repair limit replacement policy is  [108]. The  author presents three basic
maintenance policies (based on age-dependent PM and periodic PM) and two basic
repair limit replacement policies. In the first repair-cost limit replacement policy, the
author assumes that a system is replaced by the new one if the random repair cost
exceeds a given repair cost limit; otherwise, it is minimally repaired. This problem
is later investigated in [109], in which the minimal repairs follow a non-homogeneous
Poisson process (NHPP).
The problem of imperfect maintenance is introduced in [110], whereas in [111]
the authors investigate the problem of imperfect estimation of repair cost (imperfect
inspection case).
The implementation of a graphical method (TTT concept) in the repair-cost limit
replacement problem with imperfect repair is presented in  [112]. In  the presented
model, the authors introduce the imperfect repair (according to [110]) and a lead time
for failed unit replacement. The solution is based on the assumption of negligible
replacement time and uses the renewal reward process.
The  cumulative damage model for systems subjected to shocks is presented
in [113]. The author introduces a periodical replacement policy with the concept of
repair cost limit under a cumulative damage model and solves it analytically for an
infinite time span.
Another interesting approach to the repair-cost limit replacement policies is pre-
sented in [114]. The author proposes the total repair-cost limit replacement policy,
where a system is replaced by the new one as soon as its total repair cost reaches
or exceeds a given level. The presented problem is later investigated and extended
in [115,116], where the authors introduce two types of failures (repairable and non-
repairable) and propose a mixed maintenance policy similar to the one presented
in [117].
The current repair limit policies and their extensions are summarized in Table 1.1.

1.3 PREVENTIVE MAINTENANCE MODELING FOR MULTI-UNIT SYSTEMS
In this subchapter, the PM models for multi-unit systems are investigated. In this
research area, models can be divided into those for systems with component
dependence and those for systems without it. For systems without component
dependence, simple age- and block-maintenance models can be implemented.
When component dependence can be identified in a system, three main types of
maintenance policies may be used:

• Group maintenance policy
• Opportunistic maintenance policy
• Cannibalization maintenance

First, the group maintenance policies may be used. Under such a policy, a group of
items is replaced at the same time to take advantage of economies of scale.
Opportunity-based replacement models are based on the rule that replacement is
performed when an opportunity arises, such as scheduled downtime, planned
shutdown of the machines, or failure of a system in close proximity to the
item of interest.
In the situation when one machine is inoperative due to lack of components and
at the same time one or more other machines are inoperative due to the lack of dif-
ferent components, maintenance personnel may cannibalize operative components
from one or more machines to repair the other or others. This practice is common in
systems that are composed of sufficiently identical component parts (see, e.g., [34]).
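
The effect of cannibalization can be shown with a small counting example. The sketch below assumes a fleet of identical machines in which each machine needs exactly one working component of every required type. The number of machines that can be made operative is then limited by the scarcest component type. The data layout and function names are hypothetical.

```python
def operable_without_cannibalization(machines, required):
    """Count machines that already hold every required component."""
    return sum(required <= working for working in machines)

def operable_with_cannibalization(machines, required):
    """Count machines that can be completed by pooling working components.

    machines: list of sets, the working component types on each machine
    required: set of component types a machine needs (one of each, by assumption)
    """
    counts = {c: sum(c in working for working in machines) for c in required}
    return min(counts.values())

fleet = [{"pump", "motor"}, {"pump", "valve"}, {"motor", "valve"}]
needed = {"pump", "motor", "valve"}
print(operable_without_cannibalization(fleet, needed))
print(operable_with_cannibalization(fleet, needed))
```

In this fleet every machine lacks one component, yet stripping one machine is enough to complete the other two.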
The main classification of these types of PM models is given in Figure 1.3.
A detailed review of the most commonly used maintenance policies follows.
First, maintenance policies for multi-unit systems without component dependence
are reviewed. In these systems two PM policies usually are used—ARP and BRP.
One of the first works that applies the simple age replacement policy
is [133]. The author proposes the simple ARP model for an nk-out-of-n
warm stand-by system, where the lifetime of components is exponentially distrib-
uted. The  optimal maintenance policy for n failure-independent but non-identical
machines in series is given in [134]. The solution is obtained with the use of nonlin-
ear programming models.
The  maintenance models with the use of ARP for multi-unit systems mostly
implement minimal repair, a shock-modeling approach, and hybrid PM.
The minimal repair is introduced in [135]. In this paper, the model assumes that a
system is replaced at age T. When the system fails before age T, it is either replaced
or minimally repaired depending on the random repair cost at failure. The  model
considers finite and infinite time spans and is solved using a Bayesian approach.

[Figure 1.3: classification diagram. PM for multi-unit systems is divided into basic
models for systems without component dependence (ARP models, BRP models) and basic
models for systems with component dependence (opportunistic maintenance models,
group maintenance models, cannibalization maintenance).]

FIGURE 1.3  The classification for PM models for multi-unit systems.


Another interesting extension of the simple ARP is shock-modeling implementation.
This problem is investigated in [136,137]. In [136], the authors introduce a
maintenance model for a two-unit system subjected to shocks and with a failure
rate interaction. The two types of shocks (minor and catastrophic) stem from a non-
homogeneous pure birth process and their occurrence is dependent on the number of
shocks that have occurred since the last replacement. In [137], this model is extended
by a spare parts availability investigation.
The hybrid ARP applies mostly to opportunity-based maintenance implementa-
tion. This problem is investigated in [138], where maintenance opportunities arise
according to a Poisson process. The problem of opportunity-based ARP also is inves-
tigated in [139–141].
In the available literature, ARP models can be found that apply to a repair priority
problem (see [142]), a machine repair problem (see [143]), or production systems
maintenance (see [144]). The quick overview of the given ARPs is presented in Table 1.2.
The second group of PM policies for multi-unit systems without economic depen-
dence applies to BRPs. Various BRPs are investigated in [145]. The author analyzes
a two-unit system in a series reliability structure.
The  maintenance problems of a two-unit parallel system also are investigated
in  [146]. In  this article, the authors introduce a replacement model with minimal
repair at minor failure. The  analyzed system is based on structural dependence.
The significant development of this model is given in [147], where the authors focus
on periodic replacement for an n-unit parallel system subject to common cause shock
failures. In this model, two types of failures are considered:

• Independent failures of one component in the system
• Failures of many components of the system at the same time, not necessarily
  independent

The summary of optimum replacement policies for an n-unit system in parallel is given
in [148]. The authors compare four replacement policies—a simple BRP and a mixed
BRP. This work is the basis for other authors to introduce many extensions of the BRPs
for multi-unit systems. The analysis of a system with non-identical components is given
in [149]. Imperfect maintenance is introduced in [150]. Moreover, the periodic replace-
ment with minimal repair at failure for a multi-unit system is considered in [151]. In this
work, the author investigates a simple model of BRP with minimal repair, when repair
costs depend on system age and the number of performed minimal repairs.
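To make the idea of periodic replacement with minimal repair concrete, the sketch below uses the classical simplified setting rather than the age- and count-dependent repair costs of [151]. It assumes a constant minimal repair cost `c_m`, a replacement cost `c_r`, and a Weibull cumulative hazard H(T) = (T/scale)^shape, so that the expected cost rate is C(T) = (c_r + c_m H(T)) / T. Setting dC/dT = 0 gives T* = scale · (c_r / (c_m (shape − 1)))^(1/shape) for shape > 1. All names and numbers are illustrative.

```python
def cost_rate(period, c_r, c_m, shape, scale):
    """Expected cost per unit time for replacement every `period` time units.

    Assumes minimal repair at each failure (constant cost c_m), so the expected
    number of failures in a period equals the Weibull cumulative hazard.
    """
    expected_failures = (period / scale) ** shape
    return (c_r + c_m * expected_failures) / period


def optimal_period(c_r, c_m, shape, scale):
    """Closed-form minimizer of cost_rate; requires an increasing hazard rate."""
    if shape <= 1:
        raise ValueError("a finite optimum requires shape > 1")
    return scale * (c_r / (c_m * (shape - 1))) ** (1.0 / shape)


if __name__ == "__main__":
    t_star = optimal_period(c_r=10.0, c_m=2.0, shape=2.0, scale=100.0)
    print(f"T* = {t_star:.1f}, cost rate = {cost_rate(t_star, 10.0, 2.0, 2.0, 100.0):.4f}")
```

The extensions discussed above replace the constant `c_m` with a cost that varies with system age or with the number of repairs already performed, which generally removes the closed form and calls for numerical optimization.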
The problem of minimal repair performance is investigated in [152], where the
authors introduce a periodical inspection for a two-unit parallel system. This model
considers the detection capacity of inspections (perfect/imperfect), minimal repairs,
and failure interactions to examine dependence between subsystems. The  investi-
gation is continued in  [153], where the authors examine issues analyzed in  [152]
and [150].
TABLE 1.2
Summary of Age and Block Replacement Policies for Multi-unit Systems
Type of Maintenance Policy | Planning Horizon | Optimality Criterion | Modeling Method | Typical References
ARP | Infinite (∞) | The expected long-run costs per unit time | Analytical | [133,138–141,144]
ARP | Infinite (∞) | The expected long-run costs per unit time | Nonlinear programming | [134]
ARP | Infinite (∞) | The expected cost rate | Analytical | [136,137,143]
ARP | Infinite (∞) | Average loss rate | Renewal process/geometric process/Markov process | [142]
ARP | Infinite (∞)/finite | The expected long-run costs per unit time | Renewal reward theory/Bayesian approach | [119]
BRP | Infinite (∞) | The expected long-run cost per unit time | Analytical/simulation | [145,149]
BRP | Infinite (∞) | The expected long-run cost per unit time | Analytical (hybrid PM) | [152,157]
BRP | Infinite (∞) | The expected long-run cost per unit time | Analytical (expected and critical value models) | [155]
BRP | Infinite (∞) | The expected long-run cost per unit time | Markov processes | [158]
BRP | Infinite (∞) | The expected long-run cost per unit time | Embedded Markov chain | [153]
BRP | Infinite (∞) | The expected long-run cost per unit time | Analytical | [75,146–151]
BRP | Infinite (∞) | The expected long-run cost per unit time, system availability | Analytical | [160]
BRP | Infinite (∞) | System availability | Analytical | [150,161]
BRP | Infinite (∞) | System availability and reliability | Analytical (matrix Laplace transformations) | [156]
BRP | Infinite (∞) | Total operating and servicing cost | Branch and price algorithm | [154]
BRP | Infinite (∞) | System reliability | Simulation | [162]

The main maintenance models focus on optimization of the cycle length T between
preventive maintenance actions. A number of research works also deal with the
problem of cyclically scheduling maintenance activities assuming a fixed cycle
length. In [154], the authors formulate a maintenance scheduling problem to maintain a
set of machines for a given cycle length T. The study presents a completely deterministic
approach to decide, for each period t within the cycle of length T, which machine to
service (if any) such that total servicing costs and operating costs are minimized.
The solution is obtained with the use of a branch and price algorithm. Other interesting
maintenance problems concern the uncertain lifetime of system units (see [155]),
repairable and non-repairable failures of a system (see [156]), lives of heterogeneous
components of a system (see [157]), an ergodic Markov environment (see [158]),
or nearly optimal and optimal PM assessment for real-life systems (see [128,159]).
A quick overview of the given BRPs is presented in Table 1.2.
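The cyclic scheduling problem described for [154] can be illustrated on a toy instance. The sketch below is not the branch and price algorithm of that paper: it simply enumerates every cyclic schedule, which is only feasible for very small instances. The cost structure is an assumption made for the example: the operating cost of a machine in a period is taken to be proportional to the number of periods since its last service, at most one machine is serviced per period, and a machine that is never serviced has unbounded cost.

```python
from itertools import product


def cycle_cost(schedule, operating_cost, servicing_cost):
    """Total cost of one cycle; schedule[t] is the machine serviced in period t or None."""
    horizon = len(schedule)
    total = 0.0
    for machine in range(len(operating_cost)):
        serviced_in = [t for t, m in enumerate(schedule) if m == machine]
        if not serviced_in:
            return float("inf")  # assumed: an unserviced machine is never acceptable
        total += servicing_cost[machine] * len(serviced_in)
        for t in range(horizon):
            # Periods elapsed since the most recent service, wrapping around the cycle.
            periods_since_service = min((t - s) % horizon for s in serviced_in)
            total += operating_cost[machine] * periods_since_service
    return total


def best_cyclic_schedule(operating_cost, servicing_cost, horizon):
    """Exhaustive search over all schedules of the given cycle length."""
    choices = [None] + list(range(len(operating_cost)))
    best = min(product(choices, repeat=horizon),
               key=lambda s: cycle_cost(s, operating_cost, servicing_cost))
    return best, cycle_cost(best, operating_cost, servicing_cost)


if __name__ == "__main__":
    schedule, cost = best_cyclic_schedule([2, 1], [5, 3], horizon=4)
    print("schedule:", schedule, "cost per cycle:", cost)
```

The search space grows as (n + 1)^T for n machines, which is one reason the literature turns to decomposition methods for realistic instance sizes.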
For  technical systems, where component dependence can be defined, group
maintenance policies may be used to optimize system performance. This mainte-
nance policy is based on the performance of a maintenance activity for a group of
components. According to [15], the group maintenance is performed either when a
fixed time interval is expired or when a fixed number of units have failed, which-
ever comes first. The  main classification of group replacement policies includes
two main groups of models—static maintenance models and dynamic maintenance
models.
In the group of static maintenance models, four main classes of group replacement
policies can be defined. A T-age policy assumes that system replacement is
performed after every T units of time. An m-failure policy calls for replacing
the system at the time of the mth failure. The (m, T)-policy combines features of the
T-age policy and the m-failure policy: under such a policy, system replacement is
performed at the time of the mth failure or at time T, whichever occurs first.
The T-policy refers to the assumptions of block replacement.
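The difference between the T-age, m-failure, and (m, T) rules can be seen by applying them to the same set of unit failure times. The following minimal sketch assumes that the failure times within one cycle are already known (for example, drawn in a simulation); the function name and the policy labels are illustrative only, and the T-policy is not modeled.

```python
import math


def group_replacement_time(failure_times, policy, T=None, m=None):
    """Return the time at which the whole system is replaced in one cycle.

    failure_times: failure instants of the individual units in this cycle.
    policy: "T-age", "m-failure", or "(m, T)".
    """
    ordered = sorted(failure_times)
    # Time of the m-th failure; infinite if fewer than m units fail.
    if m is not None and 1 <= m <= len(ordered):
        mth_failure = ordered[m - 1]
    else:
        mth_failure = math.inf
    if policy == "T-age":
        return T
    if policy == "m-failure":
        return mth_failure
    if policy == "(m, T)":
        return min(mth_failure, T)
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    failures = [3.0, 7.5, 1.2, 9.0]
    print(group_replacement_time(failures, "T-age", T=4.0))
    print(group_replacement_time(failures, "m-failure", m=2))
    print(group_replacement_time(failures, "(m, T)", T=2.5, m=2))
```

Under the (m, T) rule the replacement instant is never later than under either of the two simpler rules, which is why it is often preferred when both a time limit and a failure count limit are meaningful.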
The presented classes of maintenance models are based on the assumption that a
failure distribution of a system is known with certainty. However, in practice the
failure distribution of a system is usually unknown or known with uncertain parameters.
For this case, Bayesian group replacement policies have been proposed.
Considering the planning aspect, group maintenance models can be classified as
stationary or dynamic. In stationary models, a long-term stable situation is assumed
during which the rules for maintenance do not change over the planning horizon.
The models in this overview mostly apply to this type. However, stationary models
cannot incorporate dynamically changing information during operational process
performance, such as a varying deterioration of components or unexpected
opportunities.
To consider such short-term circumstances, dynamic models have been proposed
that can adapt the long-term plan according to information becoming available in the
short term. This situation yields a dynamic grouping policy [163].
The main extensions of the group maintenance apply to minimal repair perfor-
mance, shock modeling, or periodic inspection implementation.
Additional replacement problems that are investigated in grouping maintenance
models apply to risk management (see [164]), continuous deteriorating process imple-
mentation (see  [165]), or joint optimization of production scheduling (see  [166]).
In [164], the author analyzes the correlation among potential human error, grouping
maintenance, and major accident risk. In [165], the authors introduce a novel
stochastic Petri-net and genetic algorithm-based approach to solve maintenance
modeling and optimization problems. The authors in [166] present a Bayesian approach to
develop a joint optimization model connecting group PM with production schedul-
ing of a series system.
Group maintenance models are investigated widely in the literature. A review is
presented in Table 1.3.
TABLE 1.3
Summary of Group Maintenance Policies for Deteriorating Multi-unit Systems
Planning Horizon | Type of Group Maintenance | Optimality Criterion | Modeling Method | Typical References
Infinite (∞) | Static (T-policy) | The long-run cost per unit time | Analytical | [167,168]
 | | The expected cost per unit time | | [169]
 | | The expected cost rate | | [148]
 | | System maintenance cost in a unit time | | [170]
 | | Stationary availability | | [161]
 | | Expected discounted cost to go | Control theory of jump process/dynamic programming | [171]
 | Static (T-age policy) | The long run expected cost per unit of time | Analytical | [172,173]
 | Static (T-age policy, m-failure policy, (m, T)-policy) | The expected cost per unit time | | [174,175]
 | Static | The long run expected cost per unit of time | Bayesian approach | [164,176,177]
 | | The long-run average maintenance cost per unit time | Markov processes | [35]
 | | | Discrete-time Markov decision chains/simulation | [178]
 | | Total maintenance possession time and cost | Petri-net and GA-based approach | [165]
 | | Total maintenance costs | Random-key genetic algorithm | [166]
Finite rolling horizon | Dynamic | The long-term tentative plan | Dynamic programming | [179]
 | | The economic profit of group | Heuristic approach based on genetic algorithm and MULTIFIT algorithm | [180]
 | | The economic profit of group | Heuristic approach based on GA | [181]
 | | Penalty cost function, total maintenance cost savings over the scheduling interval | Analytical | [163]

Another group of maintenance policies for multi-unit systems with component
dependence is opportunity-based maintenance. During performance processes of a
multi-unit system, some maintenance opportunities may occur due to breakdowns
of units in a series configuration. In most cases opportunities cannot be predicted in
advance and, because of their random occurrence, opportunistic maintenance mod-
els can be used for effective maintenance planning. Types of opportunistic mainte-
nance policies considered in this chapter are based mainly on [182] and include four
main groups of maintenance policies:

• Age-based opportunity maintenance models
• Failure-based opportunity maintenance models
• Opportunity and condition-based maintenance models
• Mixed PM models that consider implementation of different types of
  maintenance policies

The detailed classification and review of the given opportunity-based maintenance
policies is presented in Table 1.4.
The main extensions of opportunity-based maintenance models apply to minimal
repair performance, imperfect maintenance implementation, data uncertainty inves-
tigation, finite horizon case, or shock modeling. The  main applications are main-
tenance of production systems (see  [183–185]) or offshore wind turbine systems
(see [186]).
A few papers deal with an opportunistic maintenance policy under a multi-criteria
perspective. The main research studies apply to production system performance
(see [187]) and a power plant (see [188]).
Worth mentioning also is a group of risk-based opportunistic maintenance
models. This  modeling problem is considered in  [189]. The authors develop a
reliability model for a system that releases signals as it degrades. These released
signals are used to inform opportunistic maintenance. They assume that system
vulnerability to shock occurrence is dependent on its deterioration level. The risk-based
opportunistic maintenance model is also analyzed in [190], where the authors present
a model that uses risk evaluation of system shutdown caused by component failure.
The proposed approach is based on the analysis of fault coupling features of a complex
mechanical system considering age and risk factors.
TABLE 1.4
Summary of Opportunity-Based Maintenance Policies for Deteriorating Multi-unit Systems
Planning Horizon | Maintenance Model | Optimality Criterion | Modeling Method | Typical References
Infinite (∞) | Age-based | Expected total discounted time/expected total discounted value of good time minus costs, total discounted good time vs cost ratio | Analytical | [203]
 | | Cost rate | | [168]
 | | Expected long-run cost per unit time | Analytical (deterministic problem) | [204]
 | | Optimal production stops | Odds algorithm-based approach | [198]
 | | One-step cost function | Discrete-time Markov chain | [205]
 | | Total expected maintenance cost per unit per day | Simulation | [206]
 | | Total maintenance cost | | [207]
 | | The expected cost per unit time | Monte Carlo simulation | [184]
 | | | MC simulation and Bootstrap technique | [208]
Finite | | Total maintenance cost in a given time period | Shortest path algorithm | [209]
 | | The total maintenance cost | Linear programming | [194]
 | | The cumulative maintenance cost in a given time horizon | Monte Carlo simulation | [210]
 | | The average cost per unit time | Heuristic approach | [211]
Infinite (∞) | Failure-based | Expected system cost rate | Analytical | [212]
 | | The long-run mean cost rate | | [213]
 | | Long-run expected system maintenance cost per unit time | | [214]
 | | Number of failures | Analytical/coupling technique | [215]
 | | The total maintenance cost rate | Dynamic simulation | [190]
 | | Signals of failure state and degradation state of a component | Signal model/simulation | [189]
 | | System availability | MAM | [188]
Finite | | The expected total maintenance cost | Analytical | [185]
 | | The total maintenance cost | Simulation | [216]
 | | | Genetic algorithm | [217]
 | | | MAM-APB model | [187]
 | | Survival function | Expert judgment | [202]
 | | The total maintenance cost | Genetic algorithm | [196]
Infinite (∞) | Condition-based | The long-run expected maintenance cost rate | Simulation | [218]
 | | The long-run average maintenance cost rate | Markov decision process | [219]
 | | The long-run average maintenance cost per blade and per time unit | Analytical | [220]
 | | | | [186]
Finite | | Cumulative OM cost saving | | [191]
 | | The long-term average maintenance cost | | [192]
 | | The expected total cost per unit time | Dynamic Bayesian networks | [201]
Infinite (∞) | Mixed PM | Joint stationary probability | A deterioration state space partition method | [182]
 | | Optimal total cost | Discrete-event simulation model | [200]
Finite | | The expected cost incurred in a cycle | Analytical | [221]
 | | The total maintenance cost per unit time | | [193]
 | | Average net benefit over failure replacement policy | Genetic algorithm | [195]
 | | The expected maintenance cost | Dynamic programming | [197]
– | | Components proximity measure | Fuzzy approach | [199]

In this research area, the issues of dynamic opportunistic maintenance policy
optimization are analyzed. For example, in [191], the authors develop a dynamic
opportunistic maintenance policy for a continuously monitored multi-unit series
system with imperfect maintenance. The model is based on short-term optimization.
It is assumed also that a unit’s hazard rate distribution in the current maintenance
cycle can be directly derived through condition-based predictive maintenance.
This problem is later investigated in [192], where the authors present a dynamic
opportunistic condition-based maintenance strategy that is based on real-time
predictions of the remaining useful life of components with stochastic and economic
dependencies.
In  [193], the authors propose a dynamic opportunistic PM optimization policy
for multi-unit series systems that integrates two PM techniques: periodic PM and
sequential PM policies. Whenever one unit reaches its reliability threshold level, the
whole system has to stop and at that time PM opportunities arise for other units of
the system. The optimal PM policy is determined by maximizing the cost saving for
short-term cumulative opportunistic maintenance of the whole system.
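The threshold mechanism described above can be sketched as follows. This is not the optimization model of [193], which selects units by maximizing cost savings; here a simpler stand-in rule is used, in which a unit joins the PM group if its reliability at the stop time is at or below a second, hypothetical threshold `r_opportunistic`. Weibull reliability functions and all parameter values are assumptions made for the example.

```python
import math


def time_to_threshold(shape, scale, r_threshold):
    """Age at which a Weibull unit's reliability drops to r_threshold."""
    return scale * (-math.log(r_threshold)) ** (1.0 / shape)


def reliability(t, shape, scale):
    """Weibull reliability R(t) = exp(-(t/scale)^shape)."""
    return math.exp(-((t / scale) ** shape))


def pm_opportunities(units, r_threshold, r_opportunistic):
    """Find the unit that stops the system and the units maintained with it.

    units: mapping name -> (shape, scale), all units assumed new at time 0.
    """
    stop_times = {name: time_to_threshold(shape, scale, r_threshold)
                  for name, (shape, scale) in units.items()}
    trigger = min(stop_times, key=stop_times.get)
    t_stop = stop_times[trigger]
    # Stand-in grouping rule: take the opportunity for units that are already
    # close to their own threshold when the system stops.
    grouped = [name for name, (shape, scale) in units.items()
               if name != trigger and reliability(t_stop, shape, scale) <= r_opportunistic]
    return trigger, t_stop, grouped


if __name__ == "__main__":
    fleet = {"A": (2.0, 100.0), "B": (2.0, 110.0), "C": (2.0, 400.0)}
    print(pm_opportunities(fleet, r_threshold=0.8, r_opportunistic=0.9))
```

In a cost-based formulation the second threshold would be replaced by a comparison between the downtime cost saved by joint maintenance and the penalty of maintaining a unit earlier than planned.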
Moreover, some research studies are based on the implementation of lin-
ear programming (see  [194]), genetic algorithms (see  [195,196]), dynamic pro-
gramming (see  [197]), theory of optimal stopping (see  [198]), fuzzy modeling
approach (see [199]), and simulations (see [200]). A generalized modeling method
for maintenance optimization of single- and multi-unit systems is given in [182].
Moreover, a Bayesian perspective in opportunistic maintenance is investigated
in  [201], where the authors propose a PM policy for multi-component systems
based on dynamic Bayesian networks (DBN)—Hazard and Operability Study
(HAZOP) model. The use of expert judgment to parameterize a model for degra-
dation, maintenance, and repair is provided in [202].
The last group of PM models for multi-unit systems with component dependence
applies to cannibalization maintenance. Cannibalization in maintenance occurs
“when a failed unit in a system is replaced with a functioning component from another
system that is failed for some other reason” [222]. The key issue in cannibalization
is how to use the components of failed units to maximize the number of working
units. Thus, cannibalization actions often are used in systems with large costs
associated with the maintenance and operation of their critical components (e.g., critical
infrastructures, transport systems, and production systems).
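The key issue stated above, maximizing the number of working units by reusing components of failed units, has a simple form under strong simplifying assumptions. The sketch below assumes that every unit needs exactly one component of each type, that components of the same type are fully interchangeable between units, and that swaps are instantaneous and free. Under these assumptions the number of complete units that can be assembled equals the smallest count of working components over all types. The data layout and function names are hypothetical.

```python
def working_units_without_cannibalization(fleet):
    """Count units whose components are all functioning."""
    return sum(1 for unit in fleet if all(unit.values()))


def working_units_with_cannibalization(fleet):
    """Maximum number of complete units after pooling working components.

    fleet: list of dicts mapping component type -> True if the component works.
    The scarcest component type limits how many units can be completed.
    """
    if not fleet:
        return 0
    component_types = fleet[0].keys()
    return min(sum(1 for unit in fleet if unit[ctype]) for ctype in component_types)


if __name__ == "__main__":
    fleet = [
        {"engine": True, "pump": False},
        {"engine": False, "pump": True},
        {"engine": True, "pump": True},
    ]
    print(working_units_without_cannibalization(fleet))
    print(working_units_with_cannibalization(fleet))
```

The models reviewed in this section relax these assumptions, for example by adding repair capacity, spare inventories, or the labor cost of each cannibalization action.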
In the recent literature, a significant amount of research is available on the use
of mathematical modeling to analyze the effects of cannibalization. For a literature
survey, see [18,223,224].
Following [222,225], this research can be separated into three main
approaches [18]:

• Reliability-based models
• Inventory-based maintenance models
• Simulation (queuing) maintenance models

The detailed classification and review of the given cannibalization maintenance
policies is presented in Table 1.5.
TABLE 1.5
Summary of Cannibalization Maintenance Policies for Deteriorating Multi-unit Systems
Optimality Criterion | Approach | Modeling Method | Typical References
System minimum condition | Reliability-based | Analytical (allocation model) | [226]
Cannibalized structure function | | | [227]
Four measures: expected system state, defectives per failed machine, MTTCF(a), total cannibalizations | | Analytical (allocation model)/simulation | [228]
The survival function of number of units of equipment available for use at the end of given time period | | Analytical | [229]
System reliability for mission | | Nonlinear programming | [225]
Total profit resulting from a component reusing | | Simulation | [230]
Reasons for product returns | | Case study | [223]
Expected number of inoperative machines | | Markov process | [34]
The average total maintenance investments | Simulation-based | A closed-network, discrete-event simulation | [222]
Average total maintenance costs/average fleet readiness | | | [231]
NORS rate | Inventory-based | NORS model | [232]
Optimal portfolio, optimal stock level | | Allocation problem – heuristic approach | [233]
The expected availability objective function | | DRIVE model | [224]
Aircraft availability | | Analytical (AAM model) | [234]
Cannibalization rates | | Analytical | [235]
Cannibalization rates | | Performance indicators analysis | [236]
Product cannibalization | | Statistical data analysis | [237]
e.g., Inter-Squadron cannibalization | | Balanced Scorecard | [238]

(a) MTTCF – Mean time to complete failure

1.4  CONCLUSIONS AND DIRECTIONS FOR FURTHER RESEARCH


In this chapter, the literature is reviewed on the most commonly used preventive
maintenance models for single- and multi-unit systems. The literature was selected
using Google Scholar as a search engine and the ScienceDirect, JSTOR, SpringerLink,
and SAGE Journals databases. The author primarily searched the relevant literature based on
keywords, abstracts, and titles. The following main terms and/or a combination of
them were used for searching the literature: preventive maintenance, maintenance
model, time-based maintenance.
The selection methodology was based on searching for the defined keywords,
and later choosing the models that satisfy the main reviewing criteria. For example,
when searching for the keyword preventive maintenance in a Google search, there
were about 260 million hits. In the ScienceDirect database, this keyword had about
68,440  hits. Comparing the obtained search results to the main required criteria,
such as age-based maintenance model, block-based maintenance model, mainte-
nance optimization for multi-unit system, and periodic maintenance, the author
focused on the most frequently used inspection models published from 1964 to 2015.
Preventive maintenance issues have been investigated by various researchers and
practitioners for over 60  years. Thus, it is impossible to present all of the known
models that appeared during the period under consideration. The following are just a
few of the problems that are investigated in the literature but omitted from this
chapter:

• Spare part optimization issues (see [239,240])
• Data uncertainty (see [241,242])
• Maintenance decision-making issues (see [243])

Moreover, the literature overview leads to the following main conclusions:

• The most commonly used mathematical methods for analyzing maintenance
scheduling problems include applied probability theory, renewal reward
processes, and Markov decision theory. When the functional relationship
between the system’s input and output parameters cannot be described
analytically, various maintenance models have been developed that apply
linear and nonlinear programming, dynamic programming, simulation pro-
cesses, genetic algorithms, Bayesian approach, and heuristic approaches,
which were only mentioned in the presented overview.
• The investigated maintenance models usually are based on a cost criterion to
obtain the optimal maintenance parameters. However, maintenance actions
are focused on improving system dependability. Thus, for complex systems, where
various types of components have different maintenance cost and different
reliability importance in the system, it is more appropriate to analyze the opti-
mal maintenance policy under cost and reliability constraints simultaneously.
• Many maintenance models consider the grouping of maintenance activities
on a long-term basis with an infinite horizon. In practice, planning horizons
are usually finite for a number of reasons: information is only available
over the short term, a modification of the system changes the maintenance
problem completely, and some events are unpredictable.
• In most of the existing literature on maintenance theory, the maintenance time
is assumed to be negligible. This assumption makes availability modeling
impossible or unrealistic.
• Most maintenance models are based on the assumption of fully available
logistic support when it is needed. Thus, in the modeling approach, it is
assumed that whenever a system component is to be replaced, a new com-
ponent is immediately available. However, considering real life situations,
the number of spare parts is usually limited and the procurement lead-time
is non-negligible. This  situation implies that the maintenance policy and
spare provisioning policy should be modeled and optimized jointly.
• Another problem applies to data availability and reliability. Maintenance
and replacement decisions are based on the information available, such as
the failure data of the equipment under consideration, maintenance per-
formance times, and type and number of necessary support resources.
Sufficient data rarely exist for estimating parameters in a complex model,
and if data do exist, they are often unreliable. This  situation makes the
application of mathematical models to support maintenance and replace-
ment decisions less obvious.

In summary, traditional PM programs often require very time-consuming, manual
data and rely heavily on “tribal knowledge” estimates or require in-depth knowledge
and analysis of each individual piece of equipment on an ongoing basis to stay
up-to-date. Thus, based on the author's main conclusions and following the global trends
in maintenance (see [244,245] for recent reports), future interest will most likely
focus on more advanced maintenance optimization models that are based
on the use of digital technologies.

REFERENCES
1. McCall, J. J. (1965). Maintenance policies for stochastically failing equipment: A survey.
Management Science 11(5): 493–524.
2. Pierskalla, W. P. and Voelker, J. A. (1976). A survey of maintenance models: The con-
trol and surveillance of deteriorating systems. Naval Research Logistics Quarterly 23:
353–388.
3. Valdez-Flores, C. and Feldman, R. (1989). A survey of preventive maintenance mod-
els for stochastically deteriorating single-unit systems. Naval Research Logistics 36:
419–446.
4. Cho, I. D. and Parlar, M. (1991). A survey of maintenance models for multi-unit sys-
tems. European Journal of Operational Research 51(1): 1–23.
5. Dekker, R., Wildeman, R. E., and Van Der Duyn Schouten, F. A. (1997). A  review
of multi-component maintenance models with economic dependence. Mathematical
Methods of Operations Research 45: 411–435.
6. Mazzuchi, T. A., Van Noortwijk, J. M., and Kallen, M. J. (2007). Maintenance optimi-
zation. Technical Report, TR-2007-9.
7. Nakagawa, T. and Mizutani, S. (2009). A  summary of maintenance policies for a
finite interval. Reliability Engineering and System Safety 94: 89–96. doi:10.1016/
j.ress.2007.04.004.
8. Nicolai, R. P. and Dekker, R. (2007). A review of multi-component maintenance models.
In: Aven, T. and Vinnem, J. M. (eds.) Risk, Reliability and Societal Safety: Proceedings
of European Safety and Reliability Conference ESREL 2007, Stavanger, Norway, June
25–27, 2007, Leiden, the Netherlands: Taylor & Francis Group: pp. 289–296.
9. Nowakowski, T. and Werbińska, S. (2009). On problems of multi-component system
maintenance modelling. International Journal of Automation and Computing 6(4):
364–378.
10. Pham, H. and Wang, H. (1996). Imperfect maintenance. European Journal of
Operational Research 94: 425–438.
11. Pophaley, M. and Ways, R. K. (2010). Plant maintenance management practices in
automobile industries: A  retrospective and literature review. Journal of Industrial
Engineering and Management 3(3): 512–541. doi:10.3926/jiem..v3n3.p512-541.
12. Popova, E. and Popova, I. (2014). Replacement strategies. Wiley StatsRef: Statistics
Reference Online.
13. Sarkar, A., Behera, D. K., and Kumar, S. (2012). Maintenance policies of single
and multi-unit systems in the past and present. International Journal of Current
Engineering and Technology 2(1): 196–205.
14. Vasili, M., Hond, T. S., Ismail, N., and Vasili, M. (2011). Maintenance optimization
models: A review and analysis. In: Proceedings of the 2011 International Conference
on Industrial Engineering and Operations Management, January 22–24, 2011, Kuala
Lumpur, Malaysia: pp. 1131–1138.
15. Wang, H. (2002). A survey of maintenance policies of deteriorating systems. European
Journal of Operational Research 139(3): 469–489. doi:10.1016/S0377-2217(01)00197-7.
16. Wang, H. and Pham, H. (2003). Optimal imperfect maintenance models. In: Pham,
H. (ed.) Handbook of Reliability Engineering, London, UK: Springer-Verlag London
Limited: pp. 397–414.
17. Wang, H. and Pham, H. (1997). A survey of reliability and availability evaluation of
complex networks using Monte Carlo techniques. Microelectronics Reliability 37(2):
187–209. doi:10.1016/S0026-2714(96)00058-3.
18. Werbińska-Wojciechowska, S. (2019). Technical System Maintenance. Delay-Time-
Based Modelling. London, UK: Springer.
19. Ahmad, R. and Kamaruddin, S. (2012). An overview of time-based and condition-
based maintenance in industrial application. Computers and Industrial Engineering
63: 135–149. doi:10.1016/j.cie.2012.02.002.
20. Geurts, J. H. J. (1983). Optimal age replacement versus condition based replacement:
Some theoretical and practical considerations. Journal of Quality Technology 15(4):
171–179.
21. Werbińska-Wojciechowska, S. (2014). Multicomponent technical systems mainte-
nance models: State of art (in Polish). In: Siergiejczyk, M. (ed.) Technical Systems
Maintenance Problems: Monograph (in Polish), Warsaw, Poland: Publication House of
Warsaw University of Technology: pp. 25–57.
22. Barlow, R. E. and Proschan, F. (1964). Comparison of replacement policies, and
renewal theory implications. The  Annals of Mathematical Statistics 35(2): 577–589.
doi:10.1214/aoms/1177703557.
23. Wang, H. and Pham, H. (2006). Reliability and Optimal Maintenance, London, UK:
Springer-Verlag.
24. Aven, T. and Dekker, R. (1997). A  useful framework for optimal replacement
models. Reliability Engineering and System Safety 58(1): 61–67. doi:10.1016/
S0951-8320(97)00055-0.
25. Block, H. W., Langberg, N. A., and Savits, T. H. (1990). Maintenance comparisons:
Block policies. Journal of Applied Probability 27: 649–657. doi:10.2307/3214548.
26. Block, H. W., Langberg, N. A., and Savits, T. H. (1990). Comparisons for maintenance
policies involving complete and minimal repair. Lecture Notes-Monograph Series
16(Topics in Statistical Dependence): 57–68.
27. Christer, A. H. and Keddie, E. (1985). Experience with a stochastic replacement model.
Journal of Operational Research Society 36(1): 25–34.
28. Frostig, E. (2003). Comparison of maintenance policies with monotone failure rate dis-
tributions. Applied Stochastic Models in Business and Industry 19: 51–65. doi:10.1002/
asmb.485.
29. Langberg, N. A. (1988). Comparisons of replacement policies. Journal of Applied
Probability 25: 780–788.
30. Thomas, L. C. (1986). A survey of maintenance and replacement models for
maintainability and reliability of multi-item systems. Reliability Engineering 16(4): 297–309.
31. Aboulfath, F. (1995). Optimal maintenance schedules for a fleet of vehicles under the
constraint of the single repair facility. MSc Thesis. Toronto, ON: University of Toronto.
32. Nicolai, R. P. and Dekker, R. (2006). Optimal maintenance of multicomponent sys-
tems: A review. Economic Institute Report.
33. Lamberts, S. W. J. and Nicolai, R. P. (2008). Maintenance Models for Systems Sub-
ject to Measurable Deterioration. Rotterdam, the Netherlands: Rozenberg Publishers,
University Dissertations.
34. Fisher, W. W. (1990). Markov process modelling of a maintenance system with spares,
repair, cannibalization and manpower constraints. Mathematical Computer Modelling
13(7): 119–125.
35. Gurler, U. and Kaya, A. (2002). A  maintenance policy for a system with multi-state
components: An approximate solution. Reliability Engineering and System Safety 76:
117–127. doi:10.1016/S0951-8320(01)00125-9.
36. Block, H. W., Langberg, N. A., and Savits, T. H. (1993). Repair replacement policies.
Journal of Applied Probability 30: 194–206. doi:10.2307/3214632.
37. Park, M. and Pham, H. (2016). Cost models for age replacement policies and block
replacement policies under warranty. Applied Mathematical Modelling 40(9–10):
5689–5702. doi:10.1016/j.apm.2016.01.022.
38. Scarf, P. A., Dwight, R., and Al-Musrati, A. (2005). On reliability criteria and the
implied cost of failure for a maintained component. Reliability Engineering and System
Safety 89: 199–207. doi:10.1016/j.ress.2004.08.019.
39. Chowdhury, C. H. (1988). A systematic survey of the maintenance models. Periodica
Polytechnica. Mechanical Engineering 32(3–4): 253–274.
40. Glasser, G. J. (1967). The age replacement problem. Technometrics 9(1): 83–91.
41. Rakoczy, A. and Żółtowski, J. (1977). About the issues on technical object renewal
principles definition (in Polish). In: Proceedings of Winter School on Reliability,
Szczyrk, Poland: pp. 175–191.
42. Yun, W. Y. (1989). An age replacement policy with increasing minimal repair cost.
Microelectronics Reliability 29(2): 153–157.
43. Sheu, S.-H. (1991). A general age replacement model with minimal repair and general
random repair cost. Microelectronics Reliability 31(5): 1009–1017.
44. Sheu, S.-H. and Liou, C.-T. (1992). An age replacement policy with minimal repair and
general random repair cost. Microelectronics Reliability 32(9): 1283–1289.
45. Sheu, S.-H. (1993). A generalized model for determining optimal number of minimal
repairs before replacement. European Journal of Operational Research 69: 38–49.
46. Lim, J. H., Qu, J., and Zuo, M. J. (2016). Age replacement policy based on imperfect
repair with random probability. Reliability Engineering and System Safety 149: 24–33.
doi:10.1016/j.ress.2015.10.020.
47. Mazzuchi, T. A. and Soyer, R. (1996). A Bayesian perspective on some replacement
strategies. Reliability Engineering and System Safety 51: 295–303.
48. Cha, J. H. and Kim, J. J. (2002). On the existence of the steady state availability of
imperfect repair model. Sankhya: The Indian Journal of Statistics 64, series B. Pt.
1: 76–81.
49. Dagpunar, J. S. (1994). Some necessary and sufficient conditions for age replacement
with non-zero downtimes. Journal of Operational Research Society 45(2): 225–229.
50. Vaurio, J. K. (1999). Availability and cost functions for periodically inspected pre-
ventively maintained units. Reliability Engineering and System Safety 63: 133–140.
doi:10.1016/S0951-8320(98)00030-1.
51. Nakagawa, T., Zhao, X., and Yun, W. Y. (2011). Optimal age replacement and inspection
policies with random failure and replacement times. International Journal of Reliability,
Quality and Safety Engineering 18(5): 405–416. doi:10.1142/S0218539311004159.
52. Zhao, X., Mizutani, S., and Nakagawa, T. (2015). Which is better for replacement poli-
cies with continuous or discrete scheduled times? European Journal of Operational
Research 242: 477–486. doi:10.1016/j.ejor.2014.11.018.
53. Wu, S. and Clements-Croome, D. (2005). Preventive maintenance models with random
maintenance quantity. Reliability Engineering and System Safety 90: 99–105.
54. Chien, Y.-H. (2008). A  general age-replacement model with minimal repair under
renewing free-replacement warranty. European Journal of Operational Research 186:
1046–1058. doi:10.1016/j.ejor.2007.02.030.
55. Dimitrov, B., Chukova, S., and Khalil, Z. (2004). Warranty costs: An age-dependent
failure/repair model. Naval Research Logistics 51(7): 959–976. doi:10.1002/nav.20037.
56. Ito, K. and Nakagawa, T. (2011). Comparison of three cumulative damage models.
Quality Technology and Quantitative Management 8(1): 57–66.
doi:10.1080/16843703.2011.11673246.
57. Sepehrifar, M. B., Khorshidian, K., and Jamshidian, A. R. (2015). On renewal
increasing mean residual life distributions: An age replacement model with hypoth-
esis testing application. Statistics and Probability Letters 96: 117–122. doi:10.1016/
j.spl.2014.09.009.
58. Sheu, S.-H. (1992). A general replacement of a system subject to shocks. Microelectronics
Reliability 32(5): 657–662.
59. Sheu, S.-H., Griffith, W. S., and Nakagawa, T. (1995). Extended optimal replace-
ment model with random repair cost. European Journal of Operational Research 85:
636–649.
60. Lai, M.-T. and Leu, B.-Y. (1996). An economic discrete replacement policy for a shock
damage model with minimal repairs. Microelectronics Reliability 36(10): 1347–1355.
61. Qian, C., Nakamura, S., and Nakagawa, T. (2003). Replacement and minimal repair pol-
icies for a cumulative damage model with maintenance. Computers and Mathematics
with Applications 46: 1111–1118.
62. Lam, C. T. and Yeh, R. H. (1994). Optimal replacement policies for multi-state deterio-
rating systems. Naval Research Logistics 41(3): 303–315.
63. Segawa, Y., Ohnishi, M., and Ibaraki, T. (1992). Optimal minimal-repair and replace-
ment problem with age dependent cost structure. Computers and Mathematics with
Applications 24(1/2): 91–101.
64. Kumar, D. and Westberg, U. (1997). Maintenance scheduling under age replacement
policy using proportional hazards model and TTT-plotting. European Journal of
Operational Research 99: 507–515.
65. Mahdavi, M. and Mahdavi, M. (2009). Optimization of age replacement policy using
reliability based heuristic model. Journal of Scientific and Industrial Research 68:
668–673.
66. Zhao, X., Al-Khalifa, K. N., and Nakagawa, T. (2015). Approximate method for opti-
mal replacement, maintenance, and inspection policies. Reliability Engineering and
System Safety 144: 68–73. doi:10.1016/j.ress.2015.07.005.
67. Kayid, M., Izadkhah, S., and Alshami, S. (2016). Laplace transform ordering of time
to failure in age replacement models. Journal of the Korean Statistical Society 45(1):
101–113.
68. Christer, A. H. (1986). Comments on finite-period applications of age-based replace-
ment models. IMA Journal of Mathematics in Management 1: 111–124.
69. Kabir, A. B. M. Z. and Farrash, S. H. A. (1996). Simulation of an integrated age replacement
and spare provisioning policy using SLAM. Reliability Engineering and System
Safety 52: 129–138.
70. Wu, S. and Zuo, M. J. (2010). Linear and nonlinear preventive maintenance models.
IEEE Transactions on Reliability 59(1): 242–249. doi:10.1109/TR.2010.2041972.
71. Yeh, R. H. (1997). State-age-dependent maintenance policies for deteriorating systems
with Erlang sojourn time distributions. Reliability Engineering and System Safety 58:
55–60.
72. Crowell, J. I. and Sen, P. K. (1989). Estimation of optimal block replacement policies.
Mimeo series/the Institute of Statistics, the Consolidated University of North Carolina,
Department of Statistics, available at: stat.ncsu.edu.
73. Rakoczy, A. (1980). Simulation method for technical object’s optimal preventive main-
tenance time assessment (in Polish). In: Proceedings of Winter School on Reliability,
Szczyrk, Poland: pp. 143–152.
74. Sheu, S.-H. (1994). Extended block replacement policy with used item and general ran-
dom minimal repair cost. European Journal of Operational Research 79(3): 405–416.
75. Sheu, S.-H. (1991). Periodic replacement with minimal repair at failure and general ran-
dom repair cost for a multi-unit system. Microelectronics Reliability 31(5): 1019–1025.
76. Colosimo, E. A., Santos, W. B., Gilardoni, G. L., and Motta, S. B. (2006). Optimal
maintenance time for repairable systems under two types of failures. In: Soares, C. G.
and Zio, E. (eds.) Safety and Reliability for Managing Risk: Proceedings of European
Safety and Reliability Conference ESREL 2006, Estoril, Portugal, September 18–22,
2006, Leiden, the Netherlands: Taylor & Francis Group.
77. Lai, M.-T. and Yuan, J. (1993). Cost-optimal periodical replacement policy for a system
subjected to shock damage. Microelectronics Reliability 33(8): 1159–1168.
78. Sheu, S.-H. (1998). A  generalized age and block replacement of a system subject to
shocks. European Journal of Operational Research 108: 345–362.
79. Sheu, S.-H. and Griffith, W. S. (2002). Extended block replacement policy with shock
models and used items. European Journal of Operational Research 140: 50–60.
doi:10.1016/S0377-2217(01)00224-7.
80. Abdel-Hameed, M. (1986). Optimum replacement of a system subject to shocks.
Journal of Applied Probability 23: 107–114.
81. Abdel-Hameed, M. (1995). Inspection, maintenance and replacement models. Computers
and Operations Research 22(4): 435–441. doi:10.1016/0305-0548(94)00051-9.
82. Zhao, X., Qian, C., and Nakagawa, T. (2017). Comparisons of replacement policies with
periodic times and repair numbers. Reliability Engineering and System Safety 168:
161–170. doi:10.1016/j.ress.2017.05.015.
83. Berthaut, F., Gharbi, A., and Dhouib, K. (2011). Joint modified block replacement and
production/inventory control policy for a failure-prone manufacturing cell. Omega 39:
642–654. doi:10.1016/j.omega.2011.01.006.
84. Drobiszewski, J. and Smalko, Z. (2006). The equable maintenance strategy. Journal of
KONBiN 2: 375–383.
85. Pilch, R., Smolnik, M., Szybka, J., and Wiązania, G. (2014). Concept of preventive
maintenance strategy for a chosen example of public transport vehicles (in Polish). In:
Siergiejczyk, M. (ed.) Maintenance Problems of Technical Systems, Warsaw, Poland:
Publication House of Warsaw University of Science and Technology: pp. 171–182.
86. Kustroń, K. and Cieślak, Ł. (2012). The  optimization of replacement time for non-
repairable aircraft component. Journal of KONBiN 2(22): 45–58.
87. Pilch, R. (2017). Determination of preventive maintenance time for milling assemblies
used in coal mills. Journal of Machine Construction and Maintenance 1(104): 81–86.
88. Nakagawa, T. (1986). Periodic and sequential preventive maintenance policies. Journal
of Applied Probability 23: 536–542.
89. Nakagawa, T. and Mizutani, S. (2008). Periodic and sequential imperfect preven-
tive maintenance policies for cumulative damage models. In: Pham, H. (ed.) Recent
Advances in Reliability and Quality in Design, London, UK: Springer.
90. Sheu, S.-H., Chang, C. C., and Chen, Y.-L. (2012). An extended sequential imper-
fect preventive maintenance model with improvement factors. Communications in
Statistics: Theory and Methods 41(7): 1269–1283. doi:10.1080/03610926.2010.542852.
91. Liu, Y., Li, Y., Huang, H.-Z., and Kuang, Y. (2011). An optimal sequential preventive
maintenance policy under stochastic maintenance quality. Structure and Infrastructure
Engineering: Maintenance, Management, Life-Cycle Design and Performance 7(4):
315–322.
92. Peng, W., Liu, Y., Zhang, X., and Huang, H.-Z. (2015). Sequential preventive main-
tenance policies with consideration of random adjustment-reduction features.
Eksploatacja i Niezawodnosc: Maintenance and Reliability 17(2): 306–313.
93. Kim, H. S., Sub Kwon, Y., and Park, D. H. (2006). Bayesian method on sequential
preventive maintenance problem. The  Korean Communications in Statistics 13(1):
191–204.
94. Bergman, B. (1978). Optimal replacement under a general failure model. Advances in
Applied Probability 10: 431–451.
95. Canfield, R. V. (1986). Cost optimization of periodic preventive maintenance. IEEE
Transactions on Reliability R-35(1): 78–81. doi:10.1109/TR.1986.4335355.
96. Lie, C. H. and Chun, Y. H. (1986). An algorithm for preventive maintenance policy.
IEEE Transactions on Reliability R-35(1): 71–75.
97. Jayabalan, V. and Chaudhuri, D. (1992). Cost optimization of maintenance scheduling
for a system with assured reliability. IEEE Transactions on Reliability 41(1): 21–25.
doi:10.1109/24.126665.
98. Tsai, Y.-T., Wang, K.-S., and Teng, H.-Y. (2001). Optimizing preventive maintenance
for mechanical components using genetic algorithms. Reliability Engineering and
System Safety 74: 89–97. doi:10.1016/S0951-8320(01)00065-5.
99. Chan, J.-K. and Shaw, L. (1993). Modeling repairable systems with failure rates that
depend on age and maintenance. IEEE Transactions on Reliability 42(4): 566–571.
doi:10.1109/24.273583.
100. Nakagawa, T. and Osaki, S. (1974). The  optimum repair limit replacement policies.
Operational Research Quarterly 25(2): 311–317.
101. Okumoto, K. and Osaki, S. (1976). Repair limit replacement policies with lead time.
Zeitschrift fur Operations Research 20: 133–142.
102. Koshimae, H., Dohi, T., Kaio, N., and Osaki, S. (1996). Graphical/statistical approach to
repair limit replacement policies. Journal of the Operations Research Society of Japan 39(2): 230–246.
103. Dohi, T., Kaio, N., and Osaki, S. (2000). A  graphical method to repair-cost limit
replacement policies with imperfect repair. Mathematical and Computer Modelling 31:
99–106. doi:10.1016/S0895-7177(00)00076-5.
104. Dohi, T., Ashioka, A., Kaio, N., and Osaki, S. (2006). Statistical estimation algorithms
for repairs-time limit replacement scheduling under earning rate criteria. Computers
and Mathematics with Applications 51: 345–356. doi:10.1016/j.camwa.2005.11.004.
105. Dohi, T., Ashioka, A., Kaio, N., and Osaki, S. (2003). The optimal repair-time limit
replacement policy with imperfect repair: Lorenz transform approach. Mathematical
and Computer Modelling 38: 1169–1176. doi:10.1016/S0895-7177(03)90117-8.
106. Dohi, T., Kaio, N., and Osaki, S. (2003). A  new graphical method to estimate the
optimal repair-time limit with incomplete repair and discounting. Computers and
Mathematics with Applications 46: 999–1007. doi:10.1016/S0898-1221(03)90114-3.
107. Dohi, T., Matsushima, N., Kaio, N., and Osaki, S. (1996). Nonparametric repair-limit
replacement policies with imperfect repair. European Journal of Operational Research
96: 260–273.
108. Beichelt, F. (1992). A  general maintenance model and its application to repair
limit replacement policies. Microelectronics Reliability 32(8): 1185–1196.
doi:10.1016/0026-2714(92)90036-K.
109. Bai, D. S. and Yun, W. Y. (1986). An age replacement policy with minimal repair cost
limit. IEEE Transactions on Reliability R-35(4): 452–454.
110. Yun, W. Y. and Bai, D. S. (1987). Cost limit replacement policy under imperfect repair.
Reliability Engineering 19: 23–28.
111. Yun, W. Y. and Bai, D. S. (1988). Repair cost limit replacement policy under imperfect
inspection. Reliability Engineering and System Safety 23: 59–64.
112. Dohi, T., Takeita, K., and Osaki, S. (2000). Graphical method for determining/estimating
optimal repair-limit replacement policies. International Journal of Reliability, Quality
and Safety Engineering 7(1): 43–60.
113. Lai, M.-T. (2014). Optimal replacement period with repair cost limit and cumulative
damage model. Eksploatacja i Niezawodnosc: Maintenance and Reliability 16(2):
246–252.
114. Beichelt, F. (1999). A general approach to total repair cost limit replacement policies.
ORiON 15(1/2): 67–75.
115. Chang, C.-C., Sheu, S.-H., and Chen, Y.-L. (2013). Optimal replacement model with
age-dependent failure type based on a cumulative repair-cost limit policy. Applied
Mathematical Modelling 37: 308–317. doi:10.1016/j.apm.2012.02.031.
116. Chang, C.-C., Sheu, S.-H., and Chen, Y.-L. (2010). Optimal number of minimal repairs
before replacement based on a cumulative repair-cost limit policy. Computers and
Industrial Engineering 59: 603–610. doi:10.1016/j.cie.2010.07.005.
117. Kapur, P. K. and Garg, R. B. (1989). Optimal number of minimal repairs before replace-
ment with repair cost limit. Reliability Engineering and System Safety 26: 35–46.
118. Chien, Y.-H. and Sheu, S.-H. (2006). Extended optimal age-replacement policy with
minimal repair of a system subject to shocks. European Journal of Operational
Research 174: 169–181. doi:10.1016/j.ejor.2005.01.032.
119. Sheu, S.-H. (1999). Extended optimal replacement model for deteriorating systems.
European Journal of Operational Research 112: 503–516.
120. Chang, C.-C. (2014). Optimum preventive maintenance policies for systems subject to
random working times, replacement, and minimal repair. Computers and Industrial
Engineering 67: 185–194. doi:10.1016/j.cie.2013.11.011.
121. Martorell, S., Sanchez, A., and Serradell, V. (1999). Age-dependent reliability model
considering effects of maintenance and working conditions. Reliability Engineering
and System Safety 64: 19–31.
122. Jiang, R. and Ji, P. (2002). Age replacement policy: A  multi-attribute value
model. Reliability Engineering and System Safety 76: 311–318. doi:10.1016/
S0951-8320(02)00021-2.
123. Sheu, S.-H. and Chien, Y.-H. (2004). Optimal age-replacement policy of a system sub-
ject to shocks with random lead-time. European Journal of Operational Research 159:
132–144. doi:10.1016/S0377-2217(03)00409-0.
124. Legat, V., Zaludowa, A. H., Cervenka, V., and Jurca, V. (1996). Contribution to opti-
mization of preventive replacement. Reliability Engineering and System Safety 51:
259–266.
125. Nakagawa, T. and Kowada, M. (1983). Analysis of a system with minimal repair and its
application to replacement policy. European Journal of Operational Research 12(2):
176–182.
126. Park, D. H., Jung, G. M., and Yum, J. K. (2000). Cost minimization for periodic main-
tenance policy of a system subject to slow degradation. Reliability Engineering and
System Safety 68(2): 105–112. doi:10.1016/S0951-8320(00)00012-0.
127. Sheu, S.-H., Chen, Y.-L., Chang, C. H.-C. H., and Zhang, Z. G. (2016). A note on a
two variable block replacement policy for a system subject to non-homogeneous pure
birth shocks. Applied Mathematical Modelling 40(5–6): 3703–3712.
doi:10.1016/j.apm.2015.10.001.
128. Bukowski, L. (1980). Optimization of technical systems maintenance policy (case
study of metallurgical production line) (in Polish). In: Proceedings of Winter School on
Reliability. Katowice, Poland: Centre for Technical Progress: pp. 47–62.
129. Zhao, Y. X. (2003). On preventive maintenance policy of a critical reliability level for
system subject to degradation. Reliability Engineering and System Safety 79: 301–308.
doi:10.1016/S0951-8320(02)00201-6.
130. Jiang, X., Cheng, K., and Makis, V. (1998). On the optimality of repair-cost-limit poli-
cies. Journal of Applied Probability 35: 936–949.
131. Segawa, Y. and Ohnishi, M. (2000). The average optimality of a repair-limit replace-
ment policy. Mathematical and Computer Modelling 31: 327–334.
132. Murthy, D. N. P. and Nguyen, D. G. (1988). An optimal repair cost limit policy for ser-
vicing warranty. Mathematical and Computer Modelling 11: 595–599.
133. Frees, E. W. (1986). Optimizing costs on age replacement policies. Stochastic Processes
and their Applications 21: 195–212.
134. Maillart, L. M. and Fang, X. (2006). Optimal maintenance policies for serial, multi-
machine systems with non-instantaneous repairs. Naval Research Logistics 53(8):
804–813.
135. Sheu, S.-H., Yeh, R. H., Lin, Y.-B., and Juang, M.-G. (1999). A Bayesian perspective
on age replacement with minimal repair. Reliability Engineering and System Safety 65:
55–64.
136. Sheu, S.-H., Sung, C. H.-K., Hsu, T.-S., and Chen, Y.-C. H. (2013a). Age replacement
policy for a two-unit system subject to non-homogeneous pure birth shocks. Applied
Mathematical Modelling 37: 7027–7036. doi:10.1016/j.apm.2013.02.022.
137. Sheu, S.-H., Zhang, Z. G., Chien, Y.-H., and Huang, T.-H. (2013). Age replacement pol-
icy with lead-time for a system subject to non-homogeneous pure birth shocks. Applied
Mathematical Modelling 37: 7717–7725. doi:10.1016/j.apm.2013.03.017.
138. Dekker, R. and Dijkstra, M. C. (1992). Opportunity-based age replacement:
Exponentially distributed times between opportunities. Naval Research Logistics 39:
175–190.
139. Iskandar, B. P. and Sandoh, H. (2000). An extended opportunity-based age replacement
policy. RAIRO Operations Research 34: 145–154.
140. Jhang, J. P. and Sheu, S. H. (1999). Opportunity-based age replacement policy with
minimal repair. Reliability Engineering and System Safety 64: 339–344.
141. Satow, T. and Osaki, S. (2003). Opportunity-based age replacement with different
intensity rates. Mathematical and Computer Modelling 38: 1419–1426. doi:10.1016/
S0895-7177(03)90145-2.
142. Leung, F. K. N., Zhang, Y. L., and Lai, K. K. (2011). Analysis for a two-dissimilar-
component cold standby repairable system with repair priority. Reliability Engineering
and System Safety 96: 1542–1551. doi:10.1016/j.ress.2011.06.004.
143. Armstrong, M. J. (2002). Age repair policies for the machine repair problem. European
Journal of Operational Research 138: 127–141. doi:10.1016/S0377-2217(01)00135-7.
144. Van Dijkhuizen, G. C. and Van Harten, A. (1998). Two-stage generalized age mainte-
nance of a queue-like production system. European Journal of Operational Research
108: 363–378.
145. Scarf, P. A. and Deara, M. (2003). Block replacement policies for a two-component
system with failure dependence. Naval Research Logistics 50: 70–87. doi:10.1002/
nav.10051.
146. Yusuf, I. and Ali, U. A. (2012). Structural dependence replacement model for parallel
system of two units. Journal of Basic and Applied Science 20(4): 324–326.
147. Lai, M.-T. and Yuan, J. (1991). Periodic replacement model for a parallel system subject
to independent and common cause shock failures. Reliability Engineering and System
Safety 31(3): 355–367.
148. Yasui, K., Nakagawa, T., and Osaki, S. (1988). A summary of optimum replacement
policies for a parallel redundant system. Microelectronics Reliability 28(4): 635–641.
149. Jodejko, A. (2008). Maintenance problems of technical systems composed of hetero-
geneous elements. In: Proceedings of Summer Safety and Reliability Seminars, June
22–28, 2008, Gdańsk-Sopot, Poland: pp. 187–194.
150. Sheu, S.-H., Lin, Y.-B., and Liao, G.-L. (2006). Optimum policies for a system with
general imperfect maintenance. Reliability Engineering and System Safety 91(3): 362–
369. doi:10.1016/j.ress.2005.01.015.
151. Sheu, S.-H. (1990). Periodic replacement when minimal repair costs depend on the age
and the number of minimal repairs for a multi-unit system. Microelectronics Reliability
30(4): 713–718.
152. Zequeira, R. I. and Berenguer, C. (2005). A block replacement policy for a periodically
inspected two-unit parallel standby safety system. In: Kołowrocki, K. (ed.) Advances in
Safety and Reliability: Proceedings of the European Safety and Reliability Conference
(ESREL 2005), Gdynia-Sopot-Gdańsk, Poland, June 27–30, 2005, Leiden, the
Netherlands: A. A. Balkema: pp. 2091–2098.
153. Park, J. H., Lee, S. C., Hong, J. W., and Lie, C. H. (2009). An optimal block preventive
maintenance policy for a multi-unit system considering imperfect maintenance.
Asia-Pacific Journal of Operational Research 26(6): 831–847.
doi:10.1142/S021759590900250X.
154. Grigoriev, A., Van De Klundert, J., and Spieksma, F. C. R. (2006). Modeling and solv-
ing the periodic maintenance problem. European Journal of Operational Research
172: 783–797. doi:10.1016/j.ejor.2004.11.013.
155. Ke, H. and Yao, K. (2016). Block replacement policy with uncertain lifetimes. Reliability
Engineering and System Safety 148: 119–124. doi:10.1016/j.ress.2015.12.008.
156. Wells, C. H. E. (2014). Reliability analysis of a single warm-standby system subject
to repairable and non-repairable failures. European Journal of Operational Research
235: 180–186. doi:10.1016/j.ejor.2013.12.027.
157. Scarf, P. A. and Cavalcante, C. A. V. (2010). Hybrid block replacement and inspection
policies for a multi-component system with heterogeneous component lives. European
Journal of Operational Research 206: 384–394. doi:10.1016/j.ejor.2010.02.024.
158. Anisimov, V. V. (2005). Asymptotic analysis of stochastic block replacement policies
for multi-component systems in a Markov environment. Operations Research Letters
33: 26–34. doi:10.1016/j.orl.2004.03.009.
159. Caldeira, D. J., Taborda, C. J., and Trigo, T. P. (2012). An optimal preventive main-
tenance policy of parallel-series systems. Journal of Polish Safety and Reliability
Association Summer Safety and Reliability Seminars 3(1): 29–34.
160. Duarte, A. C., Craveiro Taborda, J. C., Craveiro, A., and Trigo, T. P. (2005). Optimization
of the preventive maintenance plan of a series components system. In: Kołowrocki,
K. (ed.) Advances in Safety and Reliability: Proceedings of the European Safety and
Reliability Conference (ESREL 2005), Gdynia-Sopot-Gdańsk, Poland, June 27–30,
2005, Leiden, the Netherlands: A.A. Balkema.
161. Chelbi, A., Ait-Kadi, D., and Aloui, H. (2007). Availability optimization for multi-
component systems subjected to periodic replacement. In: Aven, T. and Vinnem, J. M.
(eds.) Risk, Reliability and Societal Safety: Proceedings of European Safety and
Reliability Conference ESREL 2007, Stavanger, Norway, June 25–27, 2007, Leiden,
the Netherlands: Taylor & Francis Group.
162. Okulewicz, J. and Salamonowicz, T. (2008). Preventive maintenance with imperfect
repairs of a system with redundant objects. In: Proceedings of Summer Safety
and Reliability Seminars SSARS 2008, June 22–28, 2008, Gdańsk-Sopot, Poland:
pp. 279–286.
163. Do Van, P., Barros, A., Berenguer, C. H., and Bouvard, K. (2013). Dynamic group-
ing maintenance with time limited opportunities. Reliability Engineering and System
Safety 120: 51–59. doi:10.1016/j.ress.2013.03.016.
164. Okoh, P. (2015). Maintenance grouping optimization for the management of risk
in offshore riser system. Process Safety and Environmental Protection 98: 33–39.
doi:10.1016/j.psep.2015.06.007.
165. Zhang, T., Cheng, Z., Liu, Y.-J., and Guo, B. (2012). Maintenance scheduling for multi-
unit system: A stochastic Petri-net and genetic algorithm based approach. Eksploatacja
i Niezawodność: Maintenance and Reliability 14(3): 256–264.
166. Xiao, L., Song, S., Chen, X., and Coit, D. W. (2016). Joint optimization of production
scheduling and machine group preventive maintenance. Reliability Engineering and
System Safety 146: 68–78. doi:10.1016/j.ress.2015.10.013.
167. Sandve, K. and Aven, T. (1999). Cost optimal replacement of monotone, repairable
systems. European Journal of Operational Research 116: 235–248.
168. Zequeira, R. I. and Berenguer, C. (2004). Maintenance cost analysis of a two-
component parallel system with failure interaction. In: Proceedings of Reliability
and Maintainability, 2004 Annual Symposium: RAMS, 26-29 Jan. 2004, IEEE,
pp. 220–225. doi:10.1109/RAMS.2004.1285451.
169. Sheu, S.-H. and Jhang, J.-P. (1996). A generalized group maintenance policy. European
Journal of Operational Research 96: 232–247.
170. Bai, Y., Jia, X., and Cheng, Z. (2011). Group optimization models for multi-component
system compound maintenance tasks. Eksploatacja i Niezawodnosc: Maintenance and
Reliability 1: 42–47.
171. Haurie, A. and L’Ecuyer, P. (1982). A stochastic control approach to group preventive
replacement in a multicomponent system. IEEE Transactions on Automatic Control
AC-27(2): 387–393.
172. Lai, M.-T. and Chen, Y.-C. H. (2006). Optimal periodic replacement policy for a
two-unit system with failure rate interaction. International Journal of Advanced
Manufacturing Technology 29: 367–371.
173. Shafiee, M. and Finkelstein, M. (2015). An optimal age-based group maintenance pol-
icy for multi-unit degrading systems. Reliability Engineering and System Safety 134:
230–238. doi:10.1016/j.ress.2014.09.016.
174. Popova, E. and Wilson, J. G. (1999). Group replacement policies for parallel systems whose
components have phase distributed failure times. Annals of Operations Research 91: 163–189.
175. Ritchken, P. and Wilson, J. G. (1990). (m, T) group maintenance policies. Management
Science 36(5): 632–639.
176. Popova, E. (2004). Basic optimality results for Bayesian group replacement policies.
Operations Research Letters 32: 283–287.
177. Sheu, S.-H., Yeh, R. H., Lin, Y.-B., and Juang, M.-G. (2001). A Bayesian approach to an
adaptive preventive maintenance model. Reliability Engineering and System Safety 71:
33–44. doi:10.1016/S0951-8320(00)00072-7.
178. Dekker, R. and Roelvink, I. F. K. (1995). Marginal cost criteria for preventive replacement
of a group of components. European Journal of Operational Research 84: 467–480.
179. Wildeman, R. E., Dekker, R., and Smit, A. C. J. M. (1997). A dynamic policy for group-
ing maintenance activities. European Journal of Operational Research 99: 530–551.
180. Do, P., Vu, H. C., Barros, A., and Berenguer, C. H. (2015). Maintenance grouping for
multi-component systems with availability constraints and limited maintenance teams.
Reliability Engineering and System Safety 142: 56–67. doi:10.1016/j.ress.2015.04.022.
181. Vu, H. C., Do, P., Barros, A., and Berenguer, C. H. (2014). Maintenance grouping strat-
egy for multi-component systems with dynamic contexts. Reliability Engineering and
System Safety 132: 233–249. doi:10.1016/j.ress.2014.08.002.
182. Zhang, X. and Zeng, J. (2015). A general modelling method for opportunistic mainte-
nance modelling of multi-unit systems. Reliability Engineering and System Safety 140:
176–190. doi:10.1016/j.ress.2015.03.030.
183. Zequeira, R. I., Valdes, J. E., and Berenguer, C. (2008). Optimal buffer inventory
and opportunistic preventive maintenance under random production capacity
availability. International Journal of Production Economics 111: 686–696.
doi:10.1016/j.ijpe.2007.02.037.
184. Laggoune, R., Chateauneuf, A., and Aissani, D. (2009). Opportunistic policy for opti-
mal preventive maintenance of a multi-component system in continuous operating
units. Computers and Chemical Engineering 33: 1499–1510.
185. Hou, W. and Jiang, Z. (2013). An opportunistic maintenance policy of multi-unit series
production system with consideration of imperfect maintenance. Applied Mathematics
and Information Sciences 7(1L): 283–290.
186. Shafiee, M., Finkelstein, M., and Berenguer, C. H. (2015). An opportunistic condition-
based maintenance policy for offshore wind turbine blades subjected to degradation
and environmental shocks. Reliability Engineering and System Safety 142: 463–471.
doi:10.1016/j.ress.2015.05.001.
187. Xia, T., Jin, X., Xi, L., and Ni, J. (2015). Production-driven opportunistic mainte-
nance for batch production based on MAM-APB scheduling. European Journal of
Operational Research 240: 781–790. doi:10.1016/j.ejor.2014.08.004.
188. Cavalcante, C. A. V. and Lopes, R. S. (2015). Multi-criteria model to support the defini-
tion of opportunistic maintenance policy: A study in a cogeneration system. Energy 80:
32–80.
189. Bedford, T., Dewan, I., Meilijson, I., and Zitrou, A. (2011). The signal model: A model
for competing risks of opportunistic maintenance. European Journal of Operational
Research 214: 665–673. doi:10.1016/j.ejor.2011.05.016.
190. Hu, J. and Zhang, L. (2014). Risk based opportunistic maintenance model for complex
mechanical systems. Expert Systems with Applications 41(6): 3105–3115. doi:10.1016/j.
eswa.2013.10.041.
191. Zhou, X., Xi, L., and Lee, J. (2006). A  dynamic opportunistic maintenance policy
for continuously monitored systems. Journal of Quality in Maintenance Engineering
12(3): 294–305. doi:10.1108/13552510610685129.
192. Shi, H. and Zeng, J. (2016). Real-time prediction of remaining useful life and preventive
opportunistic maintenance strategy for multi-component systems considering stochastic
dependence. Computers and Industrial Engineering 93: 192–204.
doi:10.1016/j.cie.2015.12.016.
193. Zhou, X., Lu, Z.-Q., Xi, L.-F., and Lee, J. (2010). Opportunistic preventive maintenance
optimization for multi-unit series systems with combing multi-preventive maintenance
techniques. Journal of Shanghai Jiaotong University 15(5): 513–518.
194. Gustavsson, E., Patriksson, M., Stromberg, A.-B., Wojciechowski, A., and Onnheim,
M. (2014). Preventive maintenance scheduling of multi-component systems with
interval costs. Computers and Industrial Engineering 76: 390–400.
doi:10.1016/j.cie.2014.02.009.
195. Haque, S. A., Zohrul Kabir, A. B. M., and Sarker, R. A. (2003). Optimization model for
opportunistic replacement policy using genetic algorithm with fuzzy logic controller.
Proceedings of the Congress on Evolutionary Computation 4: 2837–2843.
196. Samhouri, M. S., Al-Ghandoor, A., Fouad, R. H., and Alhaj Ali, S. M. (2009). An intel-
ligent opportunistic maintenance (OM) system: A genetic algorithm approach. Jordan
Journal of Mechanical and Industrial Engineering 3(4): 246–251.
197. Kececioglu, D. and Sun, F.-B. (1995). A general discrete-time dynamic programming
model for the opportunistic replacement policy and its application to ball-bearing sys-
tems. Reliability Engineering and System Safety 47: 175–185.
198. Iung, B., Levrat, E., and Thomas, E. (2007). Odds algorithm-based opportunistic main-
tenance task execution for preserving product conditions. Annals of the CIRP 56/1:
13–16.
199. Derigent, W., Thomas, E., Levrat, E., and Iung, B. (2009). Opportunistic maintenance
based on fuzzy modelling of component proximity. CIRP Annals  – Manufacturing
Technology 58: 29–32.
200. Assid, M., Gharbi, A., and Hajji, A. (2015). Production planning and opportunistic pre-
ventive maintenance for unreliable one-machine two-products manufacturing systems.
IFAC-PapersOnLine 48–43: 478–483. doi:10.1016/j.ifacol.2015.06.127.
201. Hu, J., Zhang, L., and Liang, W. (2012). Opportunistic predictive maintenance for
complex multi-component systems based on DBN-HAZOP model. Process Safety and
Environmental Protection 90: 376–386.
202. Bedford, T. and Alkabi, B. M. (2009). Modelling competing risks and opportunis-
tic maintenance with expert judgement. In: Martorell, S., Guedes Soares, C. and
Barnett, J. (eds.) Safety, Reliability and Risk Analysis: Theory, Methods and Applications:
Proceedings of European Safety and Reliability Conference ESREL 2008, Valencia,
Spain, September 22–25, 2008, Leiden, the Netherlands: Taylor & Francis Group:
pp. 515–521.
203. Radner, R. and Jorgenson, D. W. (1963). Opportunistic replacement of a single part in
the presence of several monitored parts. Management Science 10(1): 70–84.
204. Epstein, S. and Wilamowsky, Y. (1985). Opportunistic replacement in a deterministic
environment. Computers and Operations Research 12(3): 311–322.
205. Van Der Duyn Schouten, F. A. and Vanneste, S. G. (1990). Analysis and computation
of (n, N)-strategies for maintenance of a two-component system. European Journal of
Operational Research 48: 260–274.
206. Ding, S.-H. and Kamaruddin, S. (2012). Selection of optimal maintenance policy
by using fuzzy multi criteria decision making method. In: Proceedings of the 2012
International Conference on Industrial Engineering and Operations Management,
July 3–6, 2012, Istanbul, Turkey: pp. 435–443.
207. Sarker, B. R. and Ibn Faiz, T. (2016). Minimizing maintenance cost for offshore wind
turbines following multi-level opportunistic preventive strategy. Renewable Energy 85:
104–113. doi:10.1016/j.renene.2015.06.030.
208. Laggoune, R., Chateauneuf, A., and Aissani, D. (2010). Impact of few failure data
on the opportunistic replacement policy for multi-component systems. Reliability
Engineering and System Safety 95: 108–119. doi:10.1016/j.ress.2009.08.007.
209. Gunn, E. A. and Diallo, C. (2015). Optimal opportunistic indirect grouping of preven-
tive replacements in multicomponent systems. Computers and Industrial Engineering
90: 281–291. doi:10.1016/j.cie.2015.09.013.
210. Zhou, X., Huang, K., Xi, L., and Lee, J. (2015). Preventive maintenance modeling
for multi-component systems with considering stochastic failures and disassembly
sequence. Reliability Engineering and System Safety 142: 231–237.
doi:10.1016/j.ress.2015.05.005.
211. Hopp, W. J. and Kuo, Y.-L. (1998). Heuristics for multicomponent joint replacement:
Applications to aircraft engine maintenance. Naval Research Logistics 45: 435–458.
212. Fard, N. and Zheng, X. (1991). An approximate method for non-repairable systems
based on opportunistic replacement policy. Reliability Engineering and System Safety
33: 277–288.
213. Zheng, X. and Fard, N. (1991). A maintenance policy for repairable systems based on
opportunistic failure-rate tolerance. IEEE Transactions on Reliability 40(2): 237–244.
214. Pham, H. and Wang, H. (1999). Optimal (τ,T) opportunistic maintenance of a k-out-
of-n:G system with imperfect PM and partial failure. Naval Research Logistics 47:
223–239.
215. Cui, L. and Li, H. (2006). Opportunistic maintenance for multi-component shock
models. Mathematical Methods of Operations Research 63(3): 493–511. doi:10.1007/
s00186-005-0058-9.
216. Tambe, P. P. and Kulkarni, M. S. (2013). An opportunistic maintenance decision of a
multi-component system considering the effect of failures on quality. In: Proceedings
of the World Congress on Engineering 2013, Vol. 1, July 3–5, 2013, London, UK: WCE
2013: pp. 1–6.
217. Tambe, P. P., Mohite, S., and Kulkarni, M. S. (2013). Optimisation of opportunistic
maintenance of a multi-component system considering the effect of failures on quality and
production schedule: A case study. International Journal of Advanced Manufacturing
Technology 69(5): 1743–1756.
218. Huynh, T. K., Barros, A., and Berenguer, C.H. (2013). A  reliability-based opportu-
nistic predictive maintenance model for k-out-of-n deteriorating systems. Chemical
Engineering Transactions 33: 493–498.
219. Cheng, Z., Yang, Z., Tan, L., and Guo, B. (2011). Optimal inspection and maintenance
policy for the multi-unit series system. In: Proceedings of 9th International Conference
on Reliability, Maintainability and Safety (ICRMS) 2011, June 12–15, 2011, Guiyang,
China: pp. 811–814.
220. Cheng, Z., Yang, Z., and Guo, B. (2013). Optimal opportunistic maintenance model of
multi-unit systems. Journal of Systems Engineering and Electronics 24(5): 811–817.
doi:10.1109/JSEE.2013.00094.
221. Taghipour, S. and Banjevic, D. (2012). Optimal inspection of a complex system sub-
ject to periodic and opportunistic inspections and preventive replacements. European
Journal of Operational Research 220: 649–660. doi:10.1016/j.ejor.2012.02.002.
222. Ormon, S. W. and Cassady, C. R. (2004). Cannibalization policies for a set of paral-
lel machines. In: Reliability and Maintainability, 2004 Annual Symposium: RAMS,
January 26–29, 2004, Colorado Springs, CO: pp. 540–545.
223. Nowakowski, T. and Plewa, M. (2009). Cannibalization: Technical system maintenance
method (in Polish). In: Proceedings of XXXVII Winter School on Reliability, Warsaw,
Poland: Szczyrk, Publication House of Warsaw University of Technology: pp. 230–238.
224. Sherbrooke, C. C. (2004). Optimal Inventory Modeling of Systems: Multi-echelon
Techniques. Boston, MA: Kluwer Academic Publishers.
225. Lv, X.-Z., Fan, B.-X., Gu, Y., and Zhao, X.-H. (2013). Selective maintenance model
considering cannibalization and its solving algorithm. In: Proceedings of 2013
International Conference on Quality, Reliability, Risk, Maintenance, and Safety
Engineering (QR2MSE), IEEE: pp. 717–723.
226. Simon, R. M. (1970). Cannibalization policies for multicomponent systems. SIAM
Journal on Applied Mathematics 19(4): 700–711.
227. Baxter, L. A. (1988). On the theory of cannibalization. Journal of Mathematical
Analysis and Applications 136: 290–297. doi:10.1016/0022-247X(88)90131-X.
228. Khalifa, D., Hottenstein, M., and Aggarwal, S. (1977). Technical note: Cannibalization
policies for multistate systems. Operations Research 25(6): 1032–1039.
229. Byrkett, D. L. (1985). Units of equipment available using cannibalization for repair-part
support. IEEE Transactions on Reliability R-34(1): 25–28.
230. Jodejko-Pietruczuk, A. and Plewa, M. (2012). The model of reverse logistics, based on
reliability theory with elements’ rejuvenation. Logistics and Transport 2(15): 27–35.
231. Salman, S., Cassady, C. R., Pohl, E. A., and Ormon, S. W. (2007). Evaluating the
impact of cannibalization on fleet performance. Quality and Reliability Engineering
International 23: 445–457. doi:10.1002/qre.826.
232. Sherbrooke, C. C. (1971). An evaluator for the number of operationally ready aircraft in
a multilevel supply system. Operations Research 19(3): 618–635.
233. Shah, J. and Avittathur, B. (2007). The  retailer multi-item inventory problem with
demand cannibalization and substitution. International Journal of Production
Economics 106: 104–114. doi:10.1016/j.ijpe.2006.04.004.
234. Gaver, D. P., Isaacson, K. E., and Abell, J. B. (1993). Estimating aircraft recoverable
spares requirements with cannibalization of designated items. Santa Monica, CA:
RAND Corporation. https://www.rand.org/pubs/reports/R4213.html.
235. Hoover, J., Jondrow, J. M., Trost, R. S., and Ye, M. (2002). A  model to study:
Cannibalization, FMC, and customer waiting time. Alexandria, VA: CNA.
236. Albright, T. L., Geber, C. A., and Juras, P. (2014). How naval aviation uses the Balanced
Scorecard. Strategic Finance 10: 21–28.
237. Meenu, G. (2011). Identification of factors affecting product cannibalization in Indian
automobile sector. IJCEM International Journal of Computational Engineering and
Management 12: 2230–7893.
238. Curtin, N. P. (2001). Military Aircraft: Cannibalizations Adversely Affect Personnel
and Maintenance. Washington, DC: US General Accounting Office.
239. Cheng, Y.-H. and Tsao, H.-L. (2010). Rolling stock maintenance strategy selection,
spares parts’ estimation, and replacements’ interval calculation. International Journal
of Production Economics 128: 404–412. doi:10.1016/j.ijpe.2010.07.038.
240. Garg, J. (2013). Maintenance: Spare Parts Optimization. M2 Research Intern Theses,
Ecole Centrale de Paris, Capgemini Consulting.
241. Ondemir, O. and Gupta, S. M. (2014). A  multi-criteria decision making model for
advanced repair-to-order and disassembly-to-order system. European Journal of
Operational Research 233: 408–419. doi:10.1016/j.ejor.2013.09.003.
242. Silver, E. A. and Fiechter, C.-N. (1995). Preventive maintenance with limited historical
data. European Journal of Operational Research 82: 125–144.
243. Nguyen, K.-A., Do, P., and Grall, A. (2015). Multi-level predictive maintenance for
multi-component systems. Reliability Engineering and System Safety 144: 83–94.
doi:10.1016/j.ress.2015.07.017.
244. Predictive maintenance 4.0. Predict the unpredictable. PWC, Mainnovation,
Pricewaterhouse Coopers B.V. 2017.
245. Predictive maintenance and the smart factory. Deloitte Development LLC. 2017.
2 Inspection Maintenance Modeling for Technical Systems: An Overview
Sylwia Werbińska-Wojciechowska

CONTENTS
2.1 Introduction
2.2 Inspection Maintenance Modeling for Single-Unit Systems
2.2.1 Inspection Maintenance for Two-State Systems
2.2.2 Inspection Maintenance for Multi-state Systems
2.3 Inspection Maintenance Modeling for Multi-unit Systems
2.3.1 Inspection Maintenance for Standby Systems
2.3.2 Inspection Maintenance for Operating Systems
2.4 Hybrid Inspection Models
2.5 Other Inspection Maintenance Models
2.6 Conclusions and Directions for Further Research
References

2.1 INTRODUCTION
All equipment breaks down from time to time. Each breakdown requires materials and
tradespeople for repair and causes negative consequences, such as lost production or
transportation delays. To reduce the number of breakdowns, planned maintenance
actions are implemented. One of the most familiar planned maintenance actions is
inspection.
Inspection and inspection policy development currently play an important role in
various technical systems and therefore attract considerable attention in the literature.
In many situations there are no apparent symptoms indicating a forthcoming failure.
For such systems with non-self-announcing failures (also called unrevealed faults or
latent faults), typical preventive maintenance policies cannot be used [1], and
inspection actions are introduced instead. Examples of these systems include
protective devices, emergency devices, and standby units (see [1,2]).
The  main purpose of an inspection is to determine the state of equipment
based on the chosen indicators, such as bearing wear, gauge readings, and quality
of a product [3]. Following this, the main definition of inspection can be derived.
According to the EN 13306:2018 standard [4], inspection is defined as “examination
for conformity by measuring, observing, or testing the relevant characteristics of an
item.” The authors of [5] extend this definition, stating that inspection is
“measuring, examining, testing, and gauging one or more characteristics of a product
or service and comparing the results with specified requirements to determine
whether conformity is achieved for each characteristic.”
The main benefit of inspection is the detection and correction of minor defects before
a major breakdown occurs. Inspection maintenance optimization is therefore closely
connected with the system's deterioration processes, which are generally stochastic,
and inspection models usually assume that the state of the system is unknown unless
an inspection is performed. Knowledge of the true status of an inspected system makes
it possible to take appropriate maintenance actions. However, frequent inspections
incur substantial cost, whereas infrequent inspections result in a higher cost of system
downtime because failures remain undetected for longer between inspections.
To determine an inspection policy, the correct balance must therefore be sought
between the number of inspections and the resulting output according to the defined
optimization criteria (e.g., maximization of profit, minimization of downtime, or
maximization of availability).
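The trade-off between inspection cost and downtime cost can be illustrated with a minimal sketch. It assumes an exponentially distributed lifetime, perfect inspections of negligible duration at times T, 2T, ..., a fixed cost per inspection, and a cost per unit time of undetected failure. The function names and all numeric inputs are hypothetical and are not taken from the cited works.

```python
import math

def expected_cost(T, lam, c_insp, c_down):
    """Expected cost until a failure is detected, for inspections at T, 2T, ...

    Assumes an exponential lifetime with rate lam, a cost c_insp per
    inspection, and a cost c_down per unit time of undetected failure.
    """
    p = 1.0 - math.exp(-lam * T)     # P(failure occurs within one interval)
    n_inspections = 1.0 / p          # mean number of inspections until detection
    undetected = T / p - 1.0 / lam   # mean time between failure and detection
    return c_insp * n_inspections + c_down * undetected

def best_interval(lam, c_insp, c_down, grid):
    """Return the grid point with the lowest expected cost."""
    return min(grid, key=lambda T: expected_cost(T, lam, c_insp, c_down))

# Hypothetical inputs: mean life 1000 h, 50 per inspection, 10 per hour undetected.
grid = [float(t) for t in range(10, 1001, 10)]
T_best = best_interval(lam=1e-3, c_insp=50.0, c_down=10.0, grid=grid)
```

Very short intervals are dominated by inspection cost and very long intervals by undetected downtime, so the minimum lies in between. A grid search is used here only for transparency; any one-dimensional optimizer would serve.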
Moreover, inspection schemes may be periodic or non-periodic (sequential) [6].
In  this chapter, the focus is on periodic inspection maintenance modeling issues.
More information about non-periodic inspection maintenance modeling may be
found in [1,7].
Early inspection maintenance models were developed in 1959 by R. E. Barlow
and L. C. Hunter in their work Mathematical models for system reliability (according
to [8]). The standard decision problem is to answer the following question: an
undetected failure causes an economic loss that increases with time, but inspections
are costly too, so what is the most cost-efficient way to schedule inspections in time?
Many extensions and modifications of the standard inspection model have been
developed and investigated, and they have been surveyed repeatedly over the last
five decades.
One of the first research works that surveys inspection models is [9], where the
authors focus on the inspection and replacement problems of single and multi-unit
systems. The summary of optimal scheduling of replacement and inspection of sto-
chastically failing equipment is developed in [10]. Later, in [11] the authors review
the research studies that appeared between 1965 and 1976. In this work, the authors
present the discrete time maintenance models in which a unit (or units) is monitored
and a decision is made to repair, replace, and/or restock the unit(s). In [3], the author
gives a state-of-the-art review of the literature related to optimal inspection model-
ing of failing systems. The surveyed research papers were published in the 1960s
and 1970s. In 1989, the authors in [12] present a survey on the research published
after [11]. In this work, the authors focus on single-unit systems (one-unit and com-
plex systems), providing a section on inspection models. The  authors indicate the
main differences between developed models are time horizon, available information,
the nature of cost functions, models objective, and system’s constraints. The focus on
multi-unit systems inspection problems is given in [13]. In [14], the authors present
the literature review on inspection maintenance models. The authors focus on the
inspection models with different types of inspection information (perfect or not) and
different costs of inspections (costly or costless inspection information). The same
year, the author in [15] reviews recent developments in the methodology for solving
inspection problems. The author focuses on the most important issues that need fur-
ther development (e.g., fallible tests performance).
In 2002, the authors of [16] review classical maintenance models including
inspection strategies. They focus on the models developed in the 1960s and 1970s that
are based on the general inspection policy discussed by R. E. Barlow and F. Proschan
in Mathematical Theory of Reliability. The  author also investigates the standard
inspection policies in [17].
Later, in 2012 the authors in [8] review the main inspection models for systems.
They present the two main maintenance models—an inspection without replacement
and an inspection with replacement. The first group of inspection models includes
solutions for three situations: lifetime distribution is known, lifetime distribution is
partially known, and lifetime distribution is unknown.
In the second group of maintenance models, an inspection-replacement process is
assumed. The next year, the authors of [18] present three classes of inspection
problems: (1) inspection frequencies for equipment that is in continuous operation
and subject to breakdown, (2) inspection intervals for equipment used only in
emergency conditions, and (3) condition monitoring of equipment. A recent literature
review on inspection maintenance is also provided in [19], where the author focuses
on inspection maintenance for single-unit and multi-unit systems.
Moreover, some recent research works compare various maintenance policies.
The main comparisons between optimum and nearly optimum inspection policies are
given in [20,21], where the authors refer to the models developed by R. E. Barlow
and F. Proschan as standard optimal policies. In [22], three sub-optimal inspection
policies are proposed and compared: a periodic policy, a mean residual life policy,
and a constant hazard policy. A review and comparison of known classical
optimum-checking policies is given in [23]. Comparisons of inspection and repair
policies are analyzed in [24–26].
In summary, based on the published literature reviews, the existing inspection
models can be classified in many ways. One classification is given in [15], where the
author defines five main groups of optimal inspection models: imperfect inspection
models, inspection with replacement policies, inspection policies with delayed
symptoms of failure, inspection models for standby systems, and Bayesian models.
More general classifications divide existing maintenance models into inspection
models for two-state and multi-state systems [27], or inspection models for single-
and multi-unit systems [28,29]. According to [1], inspection models are classified by
the type of maintained system: protective devices (safety systems) or standby units,
and operating devices.
In this chapter, the proposed classification divides the known models into four main
groups of inspection strategies: single-unit systems, multi-unit systems, hybrid
inspection models, and models dedicated to solving other maintenance problems
(e.g., case studies). The main scheme for classification of inspection models for
technical systems is given in Figure 2.1.
[Figure 2.1: block diagram dividing inspection maintenance models for technical
systems into four groups: inspection models for single-unit systems, inspection
models for multi-unit systems, hybrid inspection models, and other inspection models.]

FIGURE 2.1  Inspection maintenance models for technical systems – the main
classification. (Own contribution based on Tang, T., Failure finding interval
optimization for periodically inspected repairable systems, PhD Thesis, University of
Toronto, 2012; Beichelt, F., Nav. Res. Logist. Q., 28, 375–381, 1981; Cazorla, D.M. and
R. Perez-Ocon, Eur. J. Oper. Res., 190, 494–508, 2008; Boland, P.J. and E. El-Neweihi,
Comput. Oper. Res., 22, 383–390, 1995.)

2.2 INSPECTION MAINTENANCE MODELING FOR SINGLE-UNIT SYSTEMS
In this section, the author investigates a one-unit stochastically failing or deteriorat-
ing system in which only actual inspection can detect a system’s failure. Following
Figure 2.1, inspection models for two-state, single-unit systems are investigated first.

2.2.1  Inspection Maintenance for Two-State Systems


The first inspection model, formulated by R. E. Barlow and F. Proschan [7], is called a
pure inspection model for a system and is characterized by the following assumptions:

• The system has two states (functioning and failed)
• The system’s condition is known only through inspections
• Inspections are perfect in the sense that a failure will be identified at inspection
• Inspections do not degrade or rejuvenate the system
• The system cannot fail or age during an inspection
• Inspection actions take negligible time

For the given assumptions, the expected total cost is obtained according to the formula:

\[
C(T_{in}) = \sum_{n=0}^{\infty} \int_{t_{in}^{n}}^{t_{in}^{n+1}} \left[ c_{in1}(n+1) + c_{in2}\left(t_{in}^{n+1} - x\right) \right] dF(x) \tag{2.1}
\]

where:
$C(T_{in})$ Long-run expected cost per unit time
$c_{in1}$ Cost of first inspection action performance
$c_{in2}$ Cost of second (and subsequent) inspection action performance
$F(x)$ Probability distribution function of system/unit lifetime
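The sum in Equation 2.1 can be evaluated numerically for any inspection schedule and lifetime distribution. The sketch below is illustrative only. It assumes that the limits of the n-th integral are consecutive inspection times, and it reads c_in1 as a cost per inspection and c_in2 as a cost per unit time of undetected failure, as in the classical Barlow and Proschan formulation. This reading is an assumption of the sketch, as are the function name and the numeric inputs.

```python
import math

def expected_cost(times, cdf, c1, c2, steps=200):
    """Approximate the sum in Eq. (2.1) over the inspection times given.

    times : increasing inspection times t_1 < t_2 < ... (t_0 = 0 is implied)
    cdf   : lifetime distribution function F(x)
    c1    : cost charged per inspection (assumed interpretation)
    c2    : cost per unit time of undetected failure (assumed interpretation)
    """
    total, prev = 0.0, 0.0
    for n, t in enumerate(times):
        # Inspection part: n + 1 inspections are paid if failure falls in (prev, t].
        total += c1 * (n + 1) * (cdf(t) - cdf(prev))
        # Delay part: the integral of (t - x) dF(x) over (prev, t] equals, by parts,
        # the integral of F(x) - F(prev) dx, evaluated here by the trapezoid rule.
        h = (t - prev) / steps
        area = 0.5 * (cdf(t) - cdf(prev))  # the integrand is 0 at the left endpoint
        for k in range(1, steps):
            area += cdf(prev + k * h) - cdf(prev)
        total += c2 * area * h
        prev = t
    return total

# Check against the closed form for periodic inspection of an exponential lifetime.
lam, T, c1, c2 = 0.01, 10.0, 5.0, 2.0
times = [T * (k + 1) for k in range(400)]  # truncated schedule; the tail is negligible
numeric = expected_cost(times, lambda x: 1.0 - math.exp(-lam * x), c1, c2)
p = 1.0 - math.exp(-lam * T)
closed = c1 / p + c2 * (T / p - 1.0 / lam)
```

Because the schedule is passed as an explicit list, the same function also covers non-periodic (sequential) schedules, provided the list extends far enough into the tail of the lifetime distribution.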
The main extensions of this pure inspection model concern perfect/imperfect
inspection performance, known/unknown system lifetime distributions, the use of cost
or reliability optimization criteria, and the implementation of shock modeling.
One of the first extensions of the pure inspection model concerns the finite-horizon
case. In [30], the author analyzes a model that is based on the selection of the best
maintenance strategy for the object’s reliability state. In [31], the author analyzes the
problem of determining an optimum checking schedule over the finite horizon with
cost considerations.
In [32,33], a heuristic approach for determining the optimal inspection interval is
investigated. The authors in [33] assume that the optimal interval between inspec-
tions depends on a likelihood of malfunction, a cost of inspection, and a cost of
treatment. The developed model is examined later to analyze the relation of subjects’
judgments to the model description. Later, in [32], the author focuses on the develop-
ment of a mathematical model for determining a periodic inspection schedule in a
preventive maintenance program for a single machine.
The second, frequently investigated extension of the basic inspection model covers
the situation in which no information, or only partial information, on the lifetime
distribution of a system is available. One of the first works investigating this issue
is [34], in which the author assumes that the system lifetime distribution is unknown.
To find the optimal inspection policy parameters, the author uses minimax inspection
strategies with respect to cost criteria. This model was later extended in [35] and [36].
Another interesting problem concerns the analysis of imperfect inspection
performance. For example, in [37] the authors develop an imperfect inspection policy
for systems subject to multiple correlated degradation processes. In [38], the author
presents the problem of finding the optimum inspection procedure for a system whose
time to failure is exponentially distributed. The problem is treated as a
continuous-time Markovian decision process with two states (before and after
failure) and provides a basis for the extended model given in [35].
A work worth noting is [39], where the authors introduce an optimal inspection
policy based on a failure detection zone. The idea is similar to the delay-time
approach (see [19]) or the Fault Trees with Time Dependencies modeling approach
(see [40]). In this model, if an inspection is conducted within a pre-specified time
zone, the failure is noticed before it occurs; otherwise, the failure remains
undetected. An analytical algorithm for finding the optimal inspection interval is
given, considering cost and availability criteria.
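The detection-zone idea can be pictured with a small sketch. It is not the algorithm of [39]. It assumes, purely for illustration, that the zone is a window of fixed width immediately preceding the failure time and that inspections are periodic. The function name and parameters are hypothetical.

```python
import math

def detected_before_failure(failure_time, interval, zone):
    """Return True if a periodic inspection (at interval, 2*interval, ...)
    falls inside the detection zone [failure_time - zone, failure_time)."""
    zone_start = max(failure_time - zone, 0.0)
    # Index of the first inspection at or after the start of the zone.
    k = max(1, math.ceil(zone_start / interval))
    return k * interval < failure_time
```

For example, with inspections every 50 time units and a zone of width 10, a failure due at time 105 is caught by the inspection at time 100, whereas a failure due at time 95 is missed. Under these assumptions, a zone at least as wide as the inspection interval always contains an inspection once the first inspection has taken place.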
Another interesting problem is presented in [41], where the authors propose a
model in which the ith test increases a remaining failure rate without changing the
form of the conditional lifetime distribution. The solution algorithms for finding
the best testing times are developed for two cases of uniform and exponential fail-
ure time distributions.
The problem of determining an optimal inspection policy when inspections may
be harmful to the maintained unit is also addressed in [42]. The author develops a
hazardous-inspection model in which every performed test may impair the tested
unit. The proposed model is based on a Markov decision process, and the emphasis
is put on maximization of the expected
lifetime of the inspected unit. A non-Markovian case is analyzed in [43]. The author
in this work develops two inspection policies: one-test and two-test. The two-stage
inspection procedure is dedicated to expensive devices and is based on perform-
ing a fallible test first and an error-free test whenever the first test reports a failure.
The models are based on the assumptions of arbitrary failure distributions, general
optimality conditions, and algorithms for reduction of the infinite horizon optimiza-
tion to two dimensions. This inspection problem is continued later in [44].
The problem of imperfect inspections with multiple post-repair inspections and
accidents during inspection is analyzed in [45]. The authors
propose an inspection policy for single- and two-unit systems, where
a repairman is called immediately to repair a failed unit. The analytical solutions
are provided for various measures of reliability such as mean time to system failure,
steady-state availability, busy period of repairman for repair, and inspection per unit
time by using semi-Markov processes and regenerative point techniques.
Another interesting model is given in [46]. The author considers the problem of
the optimal choice of periodic inspection intervals for renewable equipment
without preventive replacement. The model is based on two
optimization criteria: minimization of maintenance costs and maximization of sys-
tem availability. The author develops an approximate method for inspection interval
calculations and proves that the obtained solutions are very close to the exact ones.
Extended inspection models with imperfect testing are also investigated
in [47–51]. The inspection modeling with availability constraints given in [51] is
continued in [52], where the authors analyze the instantaneous availability of a
system maintained under periodic inspection using random walk models. Two cases
are analyzed: deterministic and stochastic.
A summary and extensions of the models presented in [52] are also given
in [53]. In this work, the authors focus on periodic inspection, developing five basic
models with availability requirements. All the inspection models are based on dif-
ferent approaches to the determination of inspection times. In a later work [54], the
authors also extend the inspection models given in [51]. The main extension is based
on the assumption that periodic inspections take place at fixed time points after repair
or replacement in case of failure. The  implementation of minimal repairs before
replacement or perfect repair is analyzed in [55]. The authors in this work propose a
minimal repair model with periodic inspection and constant repair time. The instan-
taneous availability of the proposed model is derived by a set of recursive formulas,
providing the introduction to optimization of system reliability characteristics.
Recently, in  [56] the authors focus on the availability of a system under peri-
odic inspection with perfect repair/replacement and non-negligible downtime due
to repair/replacement for a detected failure and due to inspection. The  model is
an extension of the works given in [51,54,57]. The authors in this work analyze a
calendar-based inspection policy and an age-based inspection policy.
The last group of inspection policies for two-state, single-unit systems concerns
the implementation of shock models. One of the first works focused on random shock
modeling for systems with non-self-announcing failures is given in [58]. The authors
consider a periodic inspection model for a system subject to randomly occurring
shocks that follow a Poisson process and cumulatively damage the system. This model
is investigated and extended later in [59,60]. The new inspection policy considers
random shock magnitudes and times between shock arrivals and focuses on
optimization of the availability criterion.
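A cumulative shock model under periodic inspection can be explored by simulation. The sketch below is a simplified illustration, not the analytical model of the cited works. It assumes Poisson shock arrivals, exponentially distributed shock magnitudes, silent failure once cumulative damage exceeds a fixed threshold, and instantaneous renewal at the first inspection after failure. All names and parameter values are hypothetical.

```python
import math
import random

def simulate_availability(rate, threshold, interval, mean_damage, cycles, seed=0):
    """Estimate long-run availability under periodic inspection (renewal cycles).

    Shocks arrive as a Poisson process with the given rate, and each adds an
    exponentially distributed damage (assumption). The unit fails silently once
    cumulative damage exceeds the threshold and is renewed instantly at the
    next inspection (assumption: perfect inspection, negligible repair time).
    """
    rng = random.Random(seed)
    up_time = 0.0
    total_time = 0.0
    for _ in range(cycles):
        t, damage = 0.0, 0.0
        while damage <= threshold:
            t += rng.expovariate(rate)                    # next shock arrival
            damage += rng.expovariate(1.0 / mean_damage)  # damage of that shock
        detect = math.ceil(t / interval) * interval       # first inspection at or after failure
        up_time += t
        total_time += detect
    return up_time / total_time

# Hypothetical comparison of a frequent and a rare inspection schedule.
a_frequent = simulate_availability(0.1, 5.0, 5.0, 1.0, 2000)
a_rare = simulate_availability(0.1, 5.0, 50.0, 1.0, 2000)
```

With the same seed both runs see identical failure times, so the comparison isolates the effect of the inspection interval on the time a failure stays undetected.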
Another extension of the model presented in [58] is given in [61]. The authors in
this work incorporate a more general deterioration process that includes both shock
degradation and graceful degradation (continuous accumulation of damage). With
the use of regenerative arguments and considering a constant rate of graceful deg-
radation occurrence, an expression for the limiting average availability is derived.
Maintenance models for systems with two failure modes (type I failures, related to
a non-maintainable failure mode, and type II failures, related to a periodically
maintainable failure mode) are developed in [62–65].
In  2006, a model with three types of inspections is introduced in  [66]. In  this
article, the authors assume that a system can fail because of three competing failure
types: I, II, and III. Partial inspections detect type I failures without error. Failures
of type II can be detected by imperfect inspections. Type III failures are detectable
only by perfect inspections. If the system is found to have failed in an inspection, a
perfect repair is made.
A summary of the main known models published in the recent literature is
presented in Table 2.1. The review is summarized according to the following criteria:

• Problem category (the main characteristic that distinguishes the model)
• Planning horizon (infinite or finite)
• Assumed quality of the inspections performed in the maintained system
• Type of failure modes introduced (for shock modeling)
• Optimality criterion used (cost or reliability constraints)
• Modeling method used to optimize the inspection policy
• Reference to the model, with its year of publication

2.2.2  Inspection Maintenance for Multi-state Systems


In  some systems, such as critical infrastructure where the safety issues are very
important, reliability analysis carried out in relation to two-state technical objects
usually is insufficient (see [19] for a review). The solution to this problem is to con-
sider a technical object in terms of a minimum of three reliability states, where a
third state is the state of partial failure.
The known inspection models for multi-state deteriorating single-unit systems
may be classified into two main groups: models for systems with perfect/imperfect
inspection and models for systems subjected to shocks. The main directions of
research in these model groups are outlined below.
One of the first developed inspection models for multi-state units is given in [79].
In  this work, the author presents a Markovian model, which is focused on proper
scheduling of inspections and preventive repairs considering minimization of the
total expected cost per time unit. The main assumptions include performance of peri-
odic inspections, implementation of perfect repair and inspection actions, and ran-
dom holding times of systems.
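How a multi-state description feeds an inspection decision can be sketched with a simplified discrete-time Markov chain. This is a swapped-in simplification and not the model of [79], which uses random holding times. The three states, the transition probabilities, and the function names are hypothetical.

```python
def step(dist, P):
    """One transition: row vector dist times transition matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def state_at_inspection(P, steps):
    """State distribution after `steps` transitions, starting in state 0 (as new)."""
    dist = [1.0] + [0.0] * (len(P) - 1)
    for _ in range(steps):
        dist = step(dist, P)
    return dist

# Hypothetical per-period transition probabilities:
# state 0 = operating, 1 = partial failure, 2 = failed (absorbing until repair).
P = [
    [0.95, 0.04, 0.01],
    [0.00, 0.90, 0.10],
    [0.00, 0.00, 1.00],
]
dist = state_at_inspection(P, steps=10)
```

The probability of finding the unit in the partial-failure state at an inspection is what makes a preventive repair possible. In a two-state description that information is lost.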
TABLE 2.1
Summary of Inspection Policies for Two-State, Single-Unit Systems

| Problem Category | Planning Horizon | Quality of Performed Inspections | Type of Failure Modes | Optimization Criterion | Modeling Method/Checking Procedures | References | Publication Years |
|---|---|---|---|---|---|---|---|
| Original algorithm | Infinite | Perfect | n/a | Expected cost per unit of time | Analytical | [67] | 1980 |
| Original algorithm | Infinite | Perfect | n/a | Expected cost per unit of time | Analytical/optimal | [68] | 1984 |
| Original algorithm | Infinite | Perfect | n/a | Expected profit per unit of time | Analytical/heuristic approach | [32] | 1996 |
| Original algorithm | Infinite | Perfect | n/a | Expected cost function | Heuristic approach | [33] | 1992 |
| Original algorithm | Infinite | Perfect | n/a | Expected total cost | Analytical | [69] | 2005 |
| Original algorithm | Finite | Perfect | n/a | Expected costs of loss | Discrete dynamic programming | [30] | 1980 |
| One-parameter optimization model | Infinite | Perfect | n/a | Average total cost per time unit | | [60] | 1998 |
| Model with unknown or partially unknown system lifetime probabilityᵃ | Infinite | Perfect | n/a | Expected loss cost per time unit | Analytical | [34] | 1981 |
| Model with known or unknown slpᵃ | Infinite | Perfect | n/a | Total expected cost | Analytical | [36] | 2006 |
| Model with unknown slpᵃ | Infinite | Perfect/imperfect | n/a | Total expected cost | Analytical | [35] | 2001 |
| Model with known slpᵃ | Infinite | Imperfect | n/a | Cost per unit of time | Analytical | [70] | 2002 |
| Model with known slpᵃ | Infinite | Imperfect | n/a | Long-run expected cost per unit time/availability function | Renewal reward process/nonlinear programming | [71] | 1995 |
| Model with known slpᵃ | Infinite | Imperfect | n/a | Long-run expected cost per unit time | Renewal reward process | [72] | 2003 |
| Model with known slpᵃ | Infinite | Imperfect | n/a | Long-run expected cost per unit time | Renewal theory, Wiener process | [37] | 2016 |
| Model with known slpᵃ | Infinite | Imperfect | n/a | Total cost over a lifetime | Continuous-time Markovian decision process | [38] | 1982 |
| Model with known slpᵃ | Infinite | Imperfect | n/a | Expected cost per time unit | Markovian model | [73] | 1998 |
| Model with known slpᵃ | Infinite | Fallible/error-free tests | n/a | Long-run cost per unit time | Dynamic programming | [43] | 1993 |
| Model with known slpᵃ | Infinite | Fallible tests | n/a | Long-run cost per unit time | Analytical | [44] | 1993 |
| Model with known slpᵃ | Infinite | Fallible tests | n/a | Mean loss per unit time | Analytical | [41] | 1979 |
| Model with known slpᵃ | Infinite | Fallible tests | n/a | Expected lifetime of the unit | Markov decision process | [42] | 1979 |
| Model with known slpᵃ | Infinite/finite | Failure detection zone | n/a | Long-run cost per unit time | Analytical | [39] | 2015 |
| Model with known slpᵃ | Finite | Imperfect | n/a | Expected sum of discounted cost | Markov decision process + quasi-Bayes approach + dynamic programming | [74] | 2008 |

TABLE 2.1 (Continued)


Summary of Inspection Policies for Two-State, Single-Unit Systems
Quality of Modeling Method/
Planning Performed Checking Type of Publication
Problem Category Horizon Inspections Failure Modes Optimization Criterion Procedures References Years
Optimization model Infinite Perfect n/a Limiting average availability and Analytical [72] 2000
long-run inspection rate
Optimization model Infinite Perfect n/a Limiting average availability Analytical [51] 2000
Optimization model Infinite Perfect n/a Long-run average cost per unit Analytical [75] 2012
time
Optimization model Infinite Perfect n/a Average availability and the Analytical [76] 2014
long-run average cost rate
Optimization model Infinite Imperfect n/a Expected operational readiness Analytical [77] 1963
of a system
Optimization model Infinite Imperfect n/a System stationary availability Analytical [78] 2008
Optimization model Infinite Imperfect n/a Measures of system reliability Semi-Markov process [45] 2005
+ regenerative point
technique
Optimization model Infinite Imperfect n/a Stationary availability coefficient Analytical [46] 2009
and total expected cost per one
renewal period
Optimization model Infinite Imperfect n/a Limiting average availability and Analytical [49] 2012
the long-run average cost per
unit time
(Continued)
Reliability Engineering
TABLE 2.1 (Continued)
Summary of Inspection Policies for Two-State, Single-Unit Systems
Quality of Modeling Method/
Planning Performed Checking Type of Publication
Problem Category Horizon Inspections Failure Modes Optimization Criterion Procedures References Years
Optimization model Finite/ Perfect n/a Limiting average availability, Analytical [53] 2004
infinite long-run inspection rate,
instantaneous availability,
instantaneous inspection rate
Optimization model Finite/ Perfect n/a Limiting average availability, Analytical [54] 2005
infinite instantaneous availability
Optimization model Finite/ Perfect n/a Limiting average availability, Analytical [56] 2013
infinite instantaneous availability
Optimization model Finite Perfect n/a Instantaneous availability Analytical (random [52] 2001
walk model)
Optimization model Finite Perfect n/a Instantaneous availability Analytical [55] 2013
Optimization model Finite Imperfect n/a Long-run average cost per unit Analytical [48] 2013
time or cost-rate over the time
Inspection Maintenance Modeling for Technical Systems

to retirement
Shock model Infinite Perfect Random shocks Time-stationary availability Analytical (renewal [58–60] 1994, 1998,
arriving according to process) 2000
a Poisson process
Shock model Infinite Perfect Random shocks (a Limiting average availability Analytical (renewal [61] 2002
Poisson process) and process)
graceful degradation
(Continued)
51
52

TABLE 2.1 (Continued)


Summary of Inspection Policies for Two-State, Single-Unit Systems
Quality of Modeling Method/
Planning Performed Checking Type of Publication
Problem Category Horizon Inspections Failure Modes Optimization Criterion Procedures References Years
Shock model Infinite Perfect Two dependable failure Expected maintenance cost per Analytical (renewal [64] 2006
modes: maintainable unit time process)
and non-maintainable
Shock model Infinite Partial, perfect, Three competing Cost rate function Analytical (renewal [66] 2006
and imperfect failure modes: I, II, III process)
Shock model Infinite Perfect Two failure modes: Expected cost per unit of time Analytical (renewal [62,63] 2006
minor failure and process)
catastrophic failures
Shock model Infinite Perfect Two failure modes: Expected net cost rate Analytical (renewal [65] 2015
minor failure and process)
catastrophic failures

a slp – information about system lifetime probability


Reliability Engineering
Inspection Maintenance Modeling for Technical Systems 53

Another application of Markovian modeling to the maintenance of multi-state,
single-unit systems is given in [80]. The authors use non-homogeneous Markovian
techniques to model systems with tolerable down times.
Partially observable processes are also examined in [81]. The author presents a
model of a system that deteriorates according to a discrete-time Markov process and
whose operation and repair costs increase with the deterioration state number. He
proposes a monotonic four-region policy with cost considerations, where the decision
process has a countable state space and a finite action space. This problem is
continued in [82], where the authors propose a semi-Markov decision algorithm
operating on the class of control-limit rules. It is extended later in [83], where the
authors allow for delayed replacement and investigate the discounted cost structure.
Semi-Markov processes are applied in [84]. The author develops a maintenance model
for systems with five states that constitute all possible cycles, each beginning with
an inspection. The solution is based on the assessment of reliability characteristics
(asymptotic availability, reliability function).
Moreover, the maintenance inspection issues of production multi-state systems
and processes are analyzed in [85–88].
The second problem investigated concerns shock modeling. One of the first works
to consider inspection policies for multi-state, single-unit systems with shock
modeling is [89]. That model is extended in [90], where the author determines an
optimal inspection policy for a system whose deterioration process is assumed to be
an increasing pure jump Markov process. Later, in [91], the authors develop an
optimal inspection-replacement policy for an item subject to cumulative damage.
In this model, unit failure depends on the damage accumulated through gradual
deterioration. The authors calculate the optimal damage limit according to the
long-run expected cost rate criterion using renewal reward theory.
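To make the renewal reward criterion concrete, the following Python sketch estimates a long-run cost rate as total cost divided by total time over many simulated renewal cycles, and then searches a grid of damage limits. It is not the model of [91]: the exponential damage increments, the inspection interval `tau`, the failure level, and all cost values are assumptions chosen purely for illustration.

```python
import random


def simulate_cycle(rng, tau, damage_limit, failure_level,
                   c_insp, c_prev, c_fail, mean_increment):
    """Simulate one renewal cycle and return (cycle_cost, cycle_length)."""
    damage, elapsed, cost = 0.0, 0.0, 0.0
    while True:
        # Assumed: damage added between inspections is exponentially distributed.
        damage += rng.expovariate(1.0 / mean_increment)
        elapsed += tau
        cost += c_insp
        if damage >= failure_level:
            # Failure is found at the inspection: corrective replacement.
            return cost + c_fail, elapsed
        if damage >= damage_limit:
            # Damage limit reached: preventive replacement.
            return cost + c_prev, elapsed


def cost_rate(damage_limit, n_cycles=5000, seed=1, **params):
    """Renewal reward estimate: total cost over total time across many cycles."""
    rng = random.Random(seed)
    total_cost = total_time = 0.0
    for _ in range(n_cycles):
        cost, length = simulate_cycle(rng, damage_limit=damage_limit, **params)
        total_cost += cost
        total_time += length
    return total_cost / total_time


# Hypothetical parameter values, for illustration only.
PARAMS = dict(tau=1.0, failure_level=10.0, c_insp=1.0,
              c_prev=20.0, c_fail=100.0, mean_increment=1.0)

candidate_limits = [float(k) for k in range(0, 11)]
best_limit = min(candidate_limits, key=lambda limit: cost_rate(limit, **PARAMS))
print("estimated best damage limit:", best_limit)
```

A low damage limit replaces the unit too often, while a limit close to the failure level lets most cycles end in a costly failure; the grid search simply picks the limit with the lowest estimated cost rate under these assumed inputs.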
The problem of imperfect inspections and imperfect repairs is investigated in [92].
The model considers a system subject to external and internal failures whose
deterioration level is known by means of inspections. Moreover, the authors assume
two types of repair, minimal and perfect, chosen depending on the deterioration
level and each following a different phase-type distribution. The solutions are
based on a generalized Markov process, with a phase-type renewal process as a
special case.
Another extension of  [89] is given in  [93], where the authors propose a state-
dependent maintenance policy for a multi-state continuous-time Markovian dete-
riorating system subject to aging and fatal shocks. The  model incorporates the
assumptions of state-dependent cost structure, imperfect repair, and perfect inspec-
tions, and is based on implementation of periodic inspections.
The availability of periodically inspected systems subjected to shocks is analyzed
in [94]. In this model, the authors analyze a system whose deterioration process is
modulated by a continuous-time Markov chain and additional damage is induced by
a Poisson shock process.
A summary of the main models published in the recent literature is presented in
Table 2.2. The classification criteria are the same as in Section 2.2.1.
TABLE 2.2
Summary of Inspection Policies for Multi-state, Single-Unit Systems

Problem Category | Planning Horizon | Quality of Performed Inspections | Failure Modes | Optimization Criterion | Modeling Method/Checking Procedures | Type of References | Publication Years
Optimization model | Infinite | Perfect | n/a | Discounted and average cost | Discrete-time Markov process | [81] | 1976
Optimization model | Infinite | Perfect | n/a | Discounted and average cost | Markov decision process | [88] | 1978
Optimization model | Infinite | Perfect | n/a | Total expected cost per time unit | Markovian model | [79] | 1976
Optimization model | Infinite | Perfect | n/a | Long-run expected average cost per unit time | Markov renewal theory | [85] | 1997
Optimization model | Infinite | Perfect | n/a | Expected long-run discounted cost | Semi-Markov decision process | [83] | 1992
Optimization model | Infinite | Imperfect | n/a | Long-run expected cost per unit time | Analytical | [76] | 2014
Optimization model | Infinite | Imperfect | n/a | Expected total discounted cost | Discrete-time Markov chain | [86] | 1986
Optimization model | Infinite | Imperfect | n/a | Reliability function | Semi-Markov processes | [77] | 1962
Inspection with CBM modeling | Infinite | Imperfect | n/a | Operational reliability | Analytical | [50] | 2013
Optimization model | Finite | Perfect | n/a | Average cost | Semi-Markov decision model | [82] | 1984
Shock model | Infinite | Perfect | Cumulative damage attributed to shocks occurrence (Poisson process) | Long-run average cost per unit time | Analytical (renewal reward theorem) | [89] | 1980
Shock model | Infinite | Perfect | Deterioration level assumed as increasing pure jump Markov process | Long-run average cost per unit time | Markov process/control-limit policy | [90] | 1987
Shock model | Infinite | Perfect | Cumulative damage caused by gradual damage | Expected long-run cost rate | Analytical (renewal reward theorem) | [91] | 1997
Shock model | Infinite | Perfect | Poisson shock process | Limiting average availability | Continuous-time Markov chain | [94] | 2006
Shock model | Infinite | Perfect | Fatal shocks occurrence | Expected long-run cost rate | Continuous-time Markov process | [93] | 2001
Shock model | Infinite | Perfect/imperfect | Internal and external failures occurrence | Total costs per unit time | Generalized Markov process | [92] | 2008

2.3 INSPECTION MAINTENANCE MODELING FOR MULTI-UNIT SYSTEMS
The general classification of the main inspection policies investigated for
multi-component systems is based on the type of hidden failure. According to [39],
there are two types of hidden failures:

• Type I: protective devices or standby units. The function of these devices is
to protect the main system in case of failures.
• Type II: operating devices. They are operating systems, and their failure
will cause direct loss.

Models for protective devices and standby units are discussed first.

2.3.1  Inspection Maintenance for Standby Systems


Standby units are characteristic of many engineering systems. Spare components,
or systems, that are not in continuous operation are examples of this sort of
unit [129]. The main function of the spare unit is to replace the component in
use when the latter fails so that the system is restored to operating condition as
soon as possible. However, standby units also deteriorate and fail, and their
failures remain undiscovered until the next attempt to use them unless some test or
inspection is carried out (unrevealed failures).
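The effect of unrevealed failures can be illustrated with a textbook approximation rather than any specific model cited here. The sketch below assumes an exponentially distributed time to failure with constant rate λ, failures that are revealed only by a test, and tests and renewals that are perfect and instantaneous. Under these assumptions the mean unavailability over a test interval T is 1 − (1 − e^(−λT))/(λT), which is close to λT/2 when λT is small. The function name and the numerical values are hypothetical.

```python
import math


def mean_unavailability(failure_rate, test_interval):
    """Average probability that the standby unit is failed between two tests."""
    x = failure_rate * test_interval
    if x == 0.0:
        return 0.0
    # 1 - (1 - exp(-x)) / x, written with expm1 for accuracy at small x.
    return 1.0 + math.expm1(-x) / x


# Hypothetical values: failure rate 1e-3 per hour, candidate test intervals in hours.
for interval in (24.0, 168.0, 720.0):
    q = mean_unavailability(1e-3, interval)
    print(f"T = {interval:6.0f} h  mean unavailability = {q:.4f}")
```

Shorter test intervals reduce the time a failure can stay hidden, which is why the inspection interval is a natural decision variable in the standby models reviewed in this section.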
Many models dedicated to the inspection of standby systems were developed in the
1970s and 1980s. For example, a two-unit repairable system is analyzed in [95],
where the first unit is operative and the other is in cold standby. The author
considers two failure situations: (1) a failure of the active unit is detected
instantaneously, but a failure of the standby unit is revealed only at inspection
epochs, and (2) failures of both the active and the standby units are revealed
only at the time of an inspection. An extension of this model is presented in [96],
where the authors discuss a two-unit cold standby redundant system with repair,
inspection, and preventive maintenance. The model assumes arbitrary distributions
of failure, inspection, repair, and preventive repair times.
The reliability analysis of a two-unit cold standby system served by a single
repair facility is given in [97]. The authors assume that the single repair facility
handles inspection, replacement, preparation, and repair. Moreover, failure,
delivery, replacement, and inspection times have exponential distributions, whereas
all other time distributions are general.
A similar problem is analyzed in [98], where the authors investigate a two-unit
warm standby system with minor (internal) and major (external) repair. Another
extension of these works applies to the analysis of two non-identical units. Using the
regenerative point technique, various pointwise and steady-state reliability charac-
teristics of system effectiveness are obtained.
Later, a warm standby n-system with operational and repair times following
phase-type distributions is considered in [99]. The analyzed system is governed by
a level-dependent quasi-birth-and-death process and the general Markov model is
provided. The main reliability characteristics that are calculated include availability
and rate of occurrence of failures.
Another extension of the inspection model developed in [97] is given in [100],
where the authors consider a reliability model for a two-unit cold standby system
with a single server. Various reliability measures of system effectiveness are
obtained by using a semi-Markov process and a regenerative point technique. This
model is extended in [101], where the authors investigate two non-identical units:
the first unit goes for repair, inspection, and post repair (when needed), whereas
the second unit is as good as new after repair. The priority in operation is given
to the first unit (lower running costs), while the priority in repair is given to
the second unit (less time consuming). Reliability characteristics are again
calculated with the regenerative point technique and Monte Carlo simulation.
Moreover, an extension of [100] is given in [102]. The authors study two dissimilar
(automatic and manual) cold standby systems. An inspection policy is introduced for
the automatic machine to detect this kind of failure. The model solution is based on
estimating various measures of reliability and the profit incurred by the system
using a semi-Markov process and a regenerative point technique.
The problem of time-dependent unavailability of periodically tested aging com-
ponents under various testing and repair policies is analyzed in [103,104].
Maintenance of multi-component systems, which may be either in operating condition
or in standby mode, is investigated in [70]. The authors define an inspection policy
along with a preventive maintenance (PM) procedure and imperfect testing for a series
system. The cost optimization is performed using renewal theory.
A shock model is considered in [105], where the authors study a parallel redundant
system consisting of n components. Assuming that the arrival rate of shocks and the
failure probabilities of components may depend on an external Markovian environment,
the authors propose several state-dependent maintenance policies based on system
availability and cost functions.
Component failure interaction is considered in [106]. The authors investigate a
two-component cold standby system under periodic inspections. They assume that a
failure of one component can, with a constant probability, modify the failure
probability of the component still operating, and they obtain the system reliability
function for the case of staggered inspections. The failure interaction scheme is
similar to the shock model used in studies of common cause failures (known as the
β-factor model).
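For readers unfamiliar with the β-factor idea, the sketch below shows its standard form for two identical units in active parallel: a fraction β of each unit's failure rate is attributed to common cause events that fail both units at once. This is a generic illustration and not the staggered-inspection standby model of [106]; the function name and the assumption of exponential lifetimes are introduced here for illustration.

```python
import math


def parallel_reliability(t, failure_rate, beta):
    """Reliability of two identical parallel units under the beta-factor model."""
    independent_rate = (1.0 - beta) * failure_rate   # failures affecting one unit
    common_rate = beta * failure_rate                # failures affecting both units
    unit_survival = math.exp(-independent_rate * t)
    no_common_cause = math.exp(-common_rate * t)
    # System survives if no common cause event occurred and at least one unit survives.
    return no_common_cause * (1.0 - (1.0 - unit_survival) ** 2)


# Hypothetical values: the benefit of redundancy shrinks as beta grows.
for beta in (0.0, 0.1, 1.0):
    print(beta, round(parallel_reliability(1000.0, 1e-3, beta), 4))
```

With β = 0 the units fail independently, and with β = 1 the pair behaves like a single unit, so the redundancy brings no benefit.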
Research on testing policies for two-unit parallel standby systems with non-identical
components is continued in [107]. The authors propose an optimal testing policy for
the system under the criteria of availability and maintenance costs. The analytical
solution takes common cause failure into account.
Moreover, a comparison of various inspection models for redundant systems is
given in [108], where the authors compare four models of two- and three-component
systems using discrete Markov chains. The first model applies to active redundancy
without component repair, the second to active redundancy with component repair,
and the third and fourth to standby redundancy without and with component repair,
respectively.

2.3.2  Inspection Maintenance for Operating Systems


Inspection models for multi-unit operating systems fall into two main groups of
research works: test procedure searching models and optimal inspection models. The
first group focuses on finding the best maintenance scheduling order, answering the
question: In what order should the components be tested to satisfy the time
requirements? The second group focuses on searching for the optimal maintenance
policy under cost and/or reliability criteria.
One of the first research works on optimum test procedure models is given in [109].
The author searches for test procedures that maximize the probability of locating a
failed component within a given time. The solution is provided using renewal theory
and dynamic programming. Later, the authors in [110] study the problem of scheduling
activities of several types under time constraints. The developed model finds an
optimal schedule that specifies the periods in which to execute each activity type
so as to minimize the long-run average cost per period. The discrete-time maintenance
problem of n machines is solved for finite and infinite time horizons.
The  implementation of an imperfect inspection case into a maintenance man-
agement model is presented in [111]. The authors in this work analyze a two-stage
inspection process that considers detection and sizing activities. The purpose of this
study is to develop a method that simulates deterioration, inspection, repair, and
failure of structures over time using Markov matrices.
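The general idea of propagating deterioration, imperfect inspection, and repair with Markov matrices can be sketched as follows. This is not the two-stage detection and sizing model of [111]: the three states, the transition probabilities, the detection probability `p_detect`, and the rule that a detected degraded state is repaired to as-good-as-new are all hypothetical.

```python
def step(dist, matrix):
    """Row vector times matrix: one Markov transition of the state distribution."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def failure_probability(years, inspect_every, p_detect):
    """Probability of being in the failed state after the given number of years."""
    # Assumed yearly deterioration matrix over states: good, degraded, failed.
    deterioration = [[0.90, 0.09, 0.01],
                     [0.00, 0.85, 0.15],
                     [0.00, 0.00, 1.00]]
    # Inspection and repair: a degraded state is detected with probability
    # p_detect and then restored to good; failure is treated as absorbing.
    inspection = [[1.0, 0.0, 0.0],
                  [p_detect, 1.0 - p_detect, 0.0],
                  [0.0, 0.0, 1.0]]
    dist = [1.0, 0.0, 0.0]
    for year in range(1, years + 1):
        dist = step(dist, deterioration)
        if year % inspect_every == 0:
            dist = step(dist, inspection)
    return dist[2]


for p in (0.0, 0.5, 0.9):
    print(p, round(failure_probability(20, 2, p), 4))
```

Comparing the results for different detection probabilities gives a rough sense of how much inspection quality matters under the assumed matrices.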
Another inspection model that includes imperfect inspection is given in [112]. The
authors present a model for determining optimal inspection plans for critical
multi-characteristic components. The inspection is performed in stages by inspectors
who may make mistakes, that is, errors of false acceptance and false rejection. This
problem is continued in [113], and an extension of the model is given in [114]. The
model is focused on finding the optimal number of inspections necessary to minimize
the total cost per accepted component.
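The role of inspector errors can be illustrated with a direct application of Bayes' theorem. The sketch below is a simplification and not the inspection plan of [112–114]: it assumes that a component is accepted only if it passes n independent inspections, each with the same false rejection and false acceptance probabilities, and all names and numbers are hypothetical.

```python
def accepted_defective_probability(p_defective, false_rejection, false_acceptance, n):
    """Posterior probability that a component is defective given it passed n inspections."""
    pass_if_good = (1.0 - false_rejection) ** n
    pass_if_defective = false_acceptance ** n
    p_accept = (1.0 - p_defective) * pass_if_good + p_defective * pass_if_defective
    return p_defective * pass_if_defective / p_accept


# Hypothetical values: 10% defectives, 5% false rejection, 20% false acceptance.
for n in range(0, 4):
    print(n, accepted_defective_probability(0.10, 0.05, 0.20, n))
```

Each additional inspection lowers the share of defective components among those accepted but also rejects more good ones, which is the trade-off behind choosing the number of inspections on cost grounds.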
Imperfect inspections are also analyzed in [115,116]. In [116], the authors
investigate an imperfect inspection model focused on testing and on the estimation
of model parameters. The probability of failure detection is assumed constant, and
the solution is based on a Markov chain and simulation modeling. In [115], the
authors develop a maintenance policy for pipelines subjected to corrosion, including
predictive degradation modeling, time-dependent reliability assessment, inspection
uncertainty, and expected cost optimization. The solution is obtained with the use
of Bayesian modeling. The influence of type I and type II inspection errors on
maintenance costs is investigated in [117].
The second group of models applies to the optimization of inspection policy
parameters. In this area, one of the preliminary models is given in [118]. The author
develops an optimal inspection and replacement model for a coherent system with
components having exponential lifetime distributions. The solution is based on a
semi-Markov decision process framework.
One of the extensions of this model is presented in [119], where the author develops
an optimal inspection strategy under two optimality criteria: the long-run average net
income and the total expected discounted net income. The author considers a multi-unit
machine with a series reliability structure in which only one unit can be tested
during the inspection process. This problem is investigated later in [120], where the
author gives an example to demonstrate that the previously presented characterization
of the optimal inspection policy for series systems is not correct in the discounted
case.
Another extension of the optimal inspection model given in [118] applies to the
investigation of reliability characteristics. For example, in [121] the author presents
an analytical method that gives upper and lower bounds for the reliability in a case of
systems subject to inspections at Poisson random times. This model later is extended
in  [122] by providing the exact expression of the reliability function, its Laplace
transform, and the Mean Time To Failure (MTTF) of the system.
Later, perfect and minimal repair policies in a reliability model are considered
in [123]. The author in this work considers two-unit systems with stochastic depen-
dence and two types of failures (soft and hard failures), providing analytical reliabil-
ity and cost models. The practical application is based on the optimization of steam
turbine system maintenance.
The issues of structural reliability are considered in [124], where the authors ana-
lyze the optimal time interval for inspection and maintenance of offshore structures.
The structural reliability is expressed here by means of closed-form mathematical
formulas that are incorporated into the cost-benefit analysis.
Inspection maintenance policies for multi-state systems can also be found in the
literature. For example, in [125] the authors focus on a periodic inspection
maintenance model for a system with several multi-state components over a finite
time horizon. The degradation process of the components is modeled by a
non-homogeneous continuous-time Markov chain, and particle swarm optimization is
used to optimize the maintenance threshold and inspection intervals under cost
constraints. Later, in [126], an optimization model of an inspection-based PM policy
is developed for three-state mechanical components subject to competing failure
modes, which integrates continuous degradation and discrete shock effects. Periodic
inspection of series systems with revealed and unrevealed failures is considered
in [127]. This model extends the one given in [118] by introducing the probability
that a failure is revealed. The simple maintenance model for n independent components
in series is based on renewal theory.
Series-parallel systems are considered in  [128]. The  authors propose a general
preventive maintenance model used to optimize the maintenance cost. The model is
developed using a simulation approach and a parallel simulation algorithm for avail-
ability analysis. A special ratio-criterion is based on a Birnbaum importance factor.
The optimization is performed using a genetic algorithm technique.
A summary of the main models published in the recent literature is presented in
Table 2.3. The classification criteria are the same as in the previous sections.
TABLE 2.3
Summary of Inspection Policies for Multi-unit Systems

System Type | Standby Unit Type | Planning Horizon | Quality of Performed Inspections | Optimization Criterion | Modeling Method/Checking Procedures | Type of References | Publication Years
Standby system | Cold standby | Infinite | Perfect | Main unreliability characteristics | Analytical (regenerative point technique) | [104] | 1997
Standby system | Cold standby | Infinite | Perfect | Reliability function, MTTF | Analytical (renewal theory) | [95] | 1970
Standby system | Cold standby | Infinite | Perfect | Expected loss due to system unavailability per time unit, the average system unavailability per cycle | Analytical (renewal theory) | [107] | 2012
Standby system | Cold standby | Infinite | Perfect | Main reliability characteristics, the expected total profit per unit of time | Semi-Markov process and regenerative point technique | [102] | 2016
Standby system | Cold standby | Infinite | Perfect | Main reliability characteristics, the profit function | Semi-Markov process and regenerative point technique | [100] | 2011
Standby system | Cold standby | Infinite | Perfect | Main reliability characteristics, the expected total profit per unit of time | Regenerative point technique, MC simulation, Bayesian setup | [101] | 2012
Standby system | Warm standby | Infinite | Perfect | Main reliability characteristics, the total cost of a system per unit of time | Generalized Markov process | [99] | 2008
Standby system | Warm standby | Infinite | Perfect/imperfect | Total cost per unit of time | Analytical (renewal theory) | [129] | 2002
Standby system | Cold/warm standby | Infinite | Perfect | Limiting average availability, the expected cost rate | Analytical (renewal theory), Markov jump process | [105] | 2009
Standby system | Cold standby | Finite/infinite | Perfect | Main reliability characteristics | Analytical (regenerative point technique) | [97] | 1995
Standby system | Warm standby | Finite/infinite | Perfect | Main reliability characteristics, the expected total profit in (0,t] and per unit of time | Analytical (regenerative point technique) | [98] | 1995
Standby system | Cold standby | Finite/infinite | Perfect | Main unreliability characteristics | Analytical (regenerative point technique) | [103] | 1999
Standby system | Cold standby | Finite | Perfect | Distribution function of time to the first system down and the mean time to the first system down | Analytical (renewal theory) | [96] | 1970
Standby system | Warm standby | Finite | Perfect | Average unavailability in inspection interval | Analytical | [106] | 2005
Operating system | n/a | Infinite | Perfect | Long-run expected cost per unit time | Semi-Markov decision framework | [118] | 1987
Operating system | n/a | Infinite | Perfect | Long-run average net income and total expected discounted net income | Renewal theory | [119] | 1989
Operating system | n/a | Infinite | Perfect | Expected cost of operation per unit of time | Renewal theory | [126] | 2016
Operating system | n/a | Infinite | Perfect | Average total cost of maintenance for unit of time | Renewal theory | [127] | 2009
Operating system | n/a | Infinite | Perfect | Total expected discounted net income | Analytical | [120] | 1991
Operating system | n/a | Finite/infinite | Perfect | Long-run average cost per period | Analytical | [110] | 1998
Operating system | n/a | Finite | Perfect | Probability that the failed component is checked out before given time period | Renewal theory and dynamic programming | [109] | 1964
Operating system | n/a | Finite | Imperfect | Total cost of inspection | Analytical | [114] | 2008
Operating system | n/a | Finite | Imperfect | Expected annual total cost | Markov model and event-based decision theory | [111] | 2010
Operating system | n/a | Finite | Imperfect | Expected total cost per accepted component | Analytical and Bayes theorem | [112,113] | 1995, 2002
Operating system | n/a | Finite | Perfect | Expected total cost | Analytical | [24] | 1995
Operating system | n/a | Finite | Perfect | Failure distribution parameters | Nonlinear programming | [130] | 2014
Operating system | n/a | Finite | Perfect | Total inspection cost | Particle swarm optimization algorithm | [131] | 2012
Operating system | n/a | Finite | Perfect | Availability function | Analytical | [132] | 2011
Operating system | n/a | Finite | Perfect | Sum of inspection, repair and risk cost | Simulation modeling | [133] | 1999
Operating system | n/a | Finite | Perfect | Reliability characteristics | Renewal theory | [121] | 1999
Operating system | n/a | Finite | Perfect | Reliability function, MTTF | Analytical | [122] | 2002
Operating system | n/a | Finite | Perfect | Expected cost incurred in the inspection for each cycle | Analytical | [123] | 2016
Operating system | n/a | Finite | Perfect | System availability function, inspection cost | GA and MC simulation | [128] | 2003
Operating system | n/a | Finite | Perfect | Maintenance cost rate in a renewal cycle | Non-homogeneous continuous Markov chain | [125] | 2015
Operating system | n/a | Finite | Perfect/imperfect | Total expected social cost | Markov decision process and quasi-Bayes approach | [74] | 2008
Operating system | n/a | Finite | Imperfect | Expected total cost function | Analytical (cost-benefit analysis) | [124] | 2014
Operating system | n/a | Finite | Imperfect | Expected cost incurred in a cycle | Analytical (and Bayes theory) | [115] | 2013
Operating system | n/a | Finite | Imperfect | Probability functions | Analytical and three-state Markov chain | [116] | 1993

2.4  HYBRID INSPECTION MODELS


In the investigation of hybrid inspection models, two main groups of models can be
defined:

• Risk-based inspection models (RBI)
• Inspection models with preventive maintenance policy implementation

The first group of models focuses on “designing and optimization of an inspection
scheme based on the performance of a risk assessment progress using historical
database, analytical methods, experience and engineering judgment” [134]. In this
approach, risk assessment is used as a valuable tool to assign priorities among inspec-
tion and maintenance activities by analyzing the likelihood of failure and its conse-
quences [135,136]. This approach is predominantly used in the oil and gas industries
(see [134,136–139]), but some implementations also may be found for marine sys-
tems (see [135]), nuclear power plants (see [140–142]), or railway systems (see [143]).
A basic overview on RBI is given in [6].
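The prioritization idea behind RBI can be illustrated with a minimal sketch. It
assumes the simplest possible risk measure (risk = likelihood of failure times
consequence of failure). The item names, probabilities, and costs are hypothetical
values chosen only for illustration; the RBI schemes cited above rely on far more
detailed assessments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    failure_probability: float  # assumed likelihood of failure in the planning period
    consequence_cost: float     # assumed monetary consequence of a failure

    @property
    def risk(self) -> float:
        # Simplest risk measure: likelihood multiplied by consequence.
        return self.failure_probability * self.consequence_cost


def rank_by_risk(items):
    """Return the items sorted from highest to lowest risk."""
    return sorted(items, key=lambda it: it.risk, reverse=True)


# Hypothetical inventory of inspected items.
items = [
    Item("pipeline segment A", 0.02, 500_000.0),
    Item("pressure vessel B", 0.001, 20_000_000.0),
    Item("pump C", 0.10, 30_000.0),
]

for it in rank_by_risk(items):
    print(f"{it.name}: risk = {it.risk:,.0f}")
```

In this toy ranking the item with the rarest failure comes first because its
consequence dominates, which is the kind of trade-off a risk-based scheme is meant
to expose.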
The second group comprises maintenance models that address several different
problems. For example, some models implement maintenance-free operating periods in
the development of an inspection policy (see [73]). A maintenance model that mixes a
standard age replacement policy (ARP) with a maintenance procedure for unrevealed
failures is given in [70]. A maintenance policy for a unit that is inspected and
preventively maintained at periodic intervals is given in [144]; the author of that
work develops two maintenance models as extensions of the well-known ARP and of an
inspection model with constant checking time.
An inspection-repair-replacement (IRR) policy is introduced in [71,72]. In these
works, the authors assume that a system is inspected at pre-assigned times to
distinguish between the up and down states. If the system is found in the down state
during an inspection, a repair action is taken (perfect repair according to [71] or
minimal repair according to [72]). Moreover, periodic preventive replacement is
performed. The aim is to determine an optimal IRR policy that keeps the availability
of the system high enough at any time while minimizing cost. The models are based on
the renewal reward process.
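As a hedged illustration of how a renewal reward argument yields a cost criterion,
the sketch below evaluates the long-run cost rate of a policy that is much simpler
than those in [71,72]. The assumptions are illustrative and are not taken from the
cited works: an exponential lifetime with rate lam, hidden failures, instantaneous
perfect inspections every tau time units, and instantaneous replacement when a
failure is detected. By the renewal reward theorem, the long-run cost per unit time
equals the expected cost of a cycle divided by the expected cycle length. All
parameter names and values are hypothetical.

```python
import math


def cost_rate(tau, lam, c_insp, c_down, c_repl=0.0):
    """Long-run cost per unit time for inspection interval tau (renewal reward)."""
    p_fail = -math.expm1(-lam * tau)       # P(failure within one inspection interval)
    exp_inspections = 1.0 / p_fail         # expected number of inspections per cycle
    exp_cycle = tau * exp_inspections      # expected cycle length
    exp_downtime = exp_cycle - 1.0 / lam   # expected time spent failed but undetected
    exp_cost = c_insp * exp_inspections + c_down * exp_downtime + c_repl
    return exp_cost / exp_cycle


def best_interval(lam, c_insp, c_down, c_repl=0.0, grid=None):
    """Grid search for the inspection interval with the lowest cost rate."""
    grid = grid or [0.5 * k for k in range(1, 101)]
    return min(grid, key=lambda tau: cost_rate(tau, lam, c_insp, c_down, c_repl))


if __name__ == "__main__":
    tau_star = best_interval(lam=0.01, c_insp=1.0, c_down=10.0)
    print(tau_star, cost_rate(tau_star, 0.01, 1.0, 10.0))
```

For small lam * tau the cost rate is approximately c_insp / tau + c_down * lam * tau / 2,
so the grid optimum should lie near sqrt(2 * c_insp / (c_down * lam)). This gives a
quick sanity check on the search.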
Simple and hybrid inspection policies aimed at guaranteeing a high level of
availability are investigated in [175]. First, simple periodic inspection is
analyzed. To overcome its weaknesses and to use information about the remaining life
of the system, quantile-based inspections are introduced; this policy is valid for
systems with an increasing failure rate. Later, a hybrid inspection policy is
developed that selects the maintenance action (periodic or quantile-based
inspections) according to the type of lifetime distribution: increasing or
decreasing failure rate. Analytical solutions and numerical examples are provided
for the limiting average availability and the long-run inspection rate.
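A minimal sketch of a quantile-based schedule follows. It uses one common reading
of the idea, which is assumed here rather than taken from the cited work: inspection
times are chosen so that the conditional probability of failure between consecutive
inspections, given survival to the previous inspection, equals a fixed value alpha.
For a Weibull lifetime with scale eta and shape beta (hypothetical parameters) this
gives x_k = eta * (-k * ln(1 - alpha))^(1/beta).

```python
import math


def weibull_survival(t, eta, beta):
    """Weibull survival function R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def quantile_inspection_times(eta, beta, alpha, n):
    """First n inspection times with constant conditional failure probability alpha."""
    step = -math.log1p(-alpha)  # cumulative hazard used up in each interval
    return [eta * (k * step) ** (1.0 / beta) for k in range(1, n + 1)]


# Hypothetical unit with an increasing failure rate (beta > 1).
times = quantile_inspection_times(eta=100.0, beta=2.0, alpha=0.1, n=5)
intervals = [b - a for a, b in zip([0.0] + times, times)]
print([round(t, 2) for t in times])
print([round(d, 2) for d in intervals])
```

With beta > 1 the intervals shrink as the unit ages, so inspections become more
frequent when failure is more likely; with beta = 1 the schedule reduces to a
periodic one.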
A randomly failing single-unit system whose failures may be self-announcing or not
self-announcing is considered in [78]. The system is submitted to inspection when
its age reaches Tyin units of time. The model includes imperfect inspection and
preventive replacement, and it builds on the basic ARP strategy for the case of
self-announcing failures. The objective is to determine the inspection and
preventive maintenance interval that maximizes the stationary availability of the
system.
Hybrid inspection models have also been developed for the maintenance of multi-unit
systems. A block inspection and replacement policy is presented in [106], where the
authors introduce periodic inspection for a two-unit parallel system. This model
considers the detection capacity of inspections (perfect/imperfect), minimal
repairs, and failure interactions that capture the dependence between subsystems.
An interesting model is developed in [146], where the authors continue the
investigation of issues analyzed in [106] and [147]. They consider a multi-unit
system composed of identical units subject to periodic imperfect PM and to periodic
inspection carried out every T_in time units. During an inspection, units are
checked to ascertain whether they are working. Failed units are replaced by new ones
at inspection time. Assuming negligible PM times, the authors estimate the average
cost per unit time.
Another interesting problem is presented in [148], where the authors consider
periodic and opportunistic inspections of a system with hard-type and soft-type
components. Failures of soft-type components can be detected only at inspections;
thus, the system can operate with a soft failure, but its performance may be
reduced. Hard-type component failures are self-announcing and create an opportunity
for an additional (opportunistic) inspection of all soft-type components. Moreover,
the system is also inspected periodically. Based on these assumptions, two
optimization models are discussed using simulation modeling and cost criteria. This
problem is continued in [149].
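The effect of opportunistic inspections can be sketched with a small Monte Carlo
experiment. The model below is a deliberately simplified assumption and is not the
model of [148]: one soft-type component with an exponential lifetime, periodic
inspections every tau time units, and hard failures that arrive as a Poisson process
with rate lam_hard, each of which triggers an immediate inspection. The quantity
estimated is the mean time a soft failure stays undetected. All names and values are
hypothetical.

```python
import random


def mean_detection_delay(tau, lam_soft, lam_hard, n_runs=20000, seed=1):
    """Estimate the mean time between a hidden soft failure and its detection."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        t_fail = rng.expovariate(lam_soft)   # time of the hidden soft failure
        delay = tau - (t_fail % tau)         # wait until the next periodic inspection
        if lam_hard > 0.0:
            # Poisson arrivals are memoryless, so the time from the soft failure
            # to the next hard failure is again exponential with rate lam_hard.
            delay = min(delay, rng.expovariate(lam_hard))
        total += delay
    return total / n_runs


periodic_only = mean_detection_delay(tau=10.0, lam_soft=0.05, lam_hard=0.0)
with_opportunistic = mean_detection_delay(tau=10.0, lam_soft=0.05, lam_hard=0.2)
print(periodic_only, with_opportunistic)
```

The comparison only shows the direction of the effect. A cost-based optimization, as
in the cited work, would additionally weigh inspection costs against the cost of
operating with an undetected soft failure.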
The problem of opportunistic inspection is also considered in [150]. The authors
investigate an nk-out-of-n system with hidden failures under periodic inspection.
The model assumes that every system failure presents an additional opportunity for
inspection. The objective is to find the optimal periodic inspection policy and the
optimal maintenance action at each inspection for the entire system. Three types of
maintenance are considered: minimal repair, preventive replacement, and corrective
replacement. The inspection maintenance model is based on a genetic algorithm and on
cost criteria. An extension of this model is presented in [151], where the authors
focus on an nk-out-of-n system with components whose failures follow a
non-homogeneous Poisson process (NHPP). This model does not optimize the maintenance
action, which depends on the components' state (age dependent). However, the model
considers an inventory policy that supports the inspection policy by ensuring that
the required spares are available when necessary (at inspection times). The modeling
approach is based on a simulation model.
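For orientation, the reliability structure behind such systems can be written down
directly. The sketch assumes a k-out-of-n system that works while at least k of its
n components work, with independent, identical components, exponential lifetimes,
all components new at time zero, and no repair before the inspection. These
assumptions and all names are illustrative and are much simpler than the models in
[150,151]. Because failures are hidden, one minus the system reliability at time tau
is the probability that an inspection at tau finds the system already failed.

```python
import math


def k_out_of_n_reliability(k, n, p):
    """P(at least k of n independent components work), each working with probability p."""
    return sum(math.comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def prob_failed_at_inspection(k, n, lam, tau):
    """Probability that the first inspection at time tau finds the system failed."""
    p = math.exp(-lam * tau)  # survival probability of one component up to tau
    return 1.0 - k_out_of_n_reliability(k, n, p)


# Hypothetical 2-out-of-3 system inspected after 20 time units.
print(prob_failed_at_inspection(k=2, n=3, lam=0.01, tau=20.0))
```

Setting k = n recovers a series system and k = 1 a parallel system, which makes the
function easy to check against the familiar closed forms.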
2.5  OTHER INSPECTION MAINTENANCE MODELS


When reviewing the literature on inspection maintenance, other issues (not mentioned
in the previous subsections) are also noticeable. The most commonly investigated
issues include:

• Production planning and quality control (see [152–155])
• Cumulative damage modeling (see [156,157])
• Joint optimization of inventory policy with inspection maintenance modeling
(see [158,159])
• Safety and reliability in maintenance (see [6,160–165])

Case studies on the optimization of inspection schedules can be found for different
systems. For example, the literature reports optimization of inspection policy for
railway carriers (see [166]), nuclear power plants (see [161,167,168]), tunnel
lighting systems (see [169]), a scale that weighs products in the final stage of
the manufacturing process (see [170,171]), sewing machines (see [172]), and wooden
pole structures (see [173]). Other inspection problems that are investigated concern
optimization of the periodic inspection of aircraft (see [130]), maintenance of
transport systems with a subjective estimation approach (see [174]), investigations
of system reliability structure (see [175]), inspection frequency of safety-related
control systems of machinery (see [132,176]), optimization of inspection and
maintenance decisions for infrastructure facilities (see [74]), inspection issues of
hydraulic components (see [133]), safety-related control systems (see [132]), and
multi-stage inspection problems (see [131]). Simulation modeling is investigated
in [177].
The widely investigated inspection and maintenance of production processes and
systems is also worth noting. Research in this area focuses mostly on computer-aided
inspection planning systems (see [178] for the state of the art) or on maintenance
and inspection models for production inventory systems (see [179–183]). In this
research area, authors are also interested in the development of inspection policies
for systems in storage to provide high reliability (see [184–189]).

2.6  CONCLUSIONS AND DIRECTIONS FOR FURTHER RESEARCH


In this chapter, the author provides a literature review of the most commonly used
optimal inspection maintenance models. The literature was selected using Google
Scholar as a search engine together with ScienceDirect, JStor, SpringerLink, and
SAGEJournals. The author primarily searched for relevant literature based on
keywords, abstracts, and titles. Moreover, the retrieved articles were searched for
relevant references. The following main terms and/or combinations of them were used
in the search: inspection maintenance, inspection model, and inspection maintenance
optimization.
The selection methodology was based on searching for the defined keywords and then
choosing the models that satisfy the main reviewing criteria. For example, a Google
search for the keyword “inspection maintenance” returned about 260 million hits, and
the ScienceDirect database returned about 98,500 hits. After comparing the search
results against the main required criteria, such as periodic inspection, maintenance
optimization, and technical system, 122 inspection models published from 1962 to
2016 (see Figure 2.2) became the focus of this chapter.

FIGURE 2.2  Models distribution in relation to the period of their publication.
(Chart segments: 2%, 6%, 10%, 59%, 23%; legend: 1962–1969, 1970–1979, 1980–1989,
1990–1999, 2000–2016.)
Due to the plethora of available publications on inspection maintenance, it was not
possible to present all the known models from this research area. The most
investigated topics that are not included in this chapter are:

• Sequential inspection maintenance modeling (see [17,23,57])
• Condition-based maintenance with inspection modeling issues (see [190])
• Delay-time modeling (see [19])

This literature overview lets the author draw the following main conclusions:

• The mathematical methods most commonly applied to inspection maintenance
scheduling problems include applied probability theory, renewal theory, Markov
decision theory, and genetic algorithms (GA). However, many inspection maintenance
problems are too complex (e.g., shocks modeling and information uncertainty) to be
solved analytically. Thus, in practice, simulation and Bayesian approaches can be
used widely.
• Most research on periodic inspections for hidden failures assumes that inspection
times are negligible. However, in some cases the inspection time cannot be ignored
because of its influence on system reliability characteristics. In such cases, the
optimal inspection policy cannot be obtained under this assumption.
• Many inspection maintenance models are based on simplifying assumptions such as an
infinite planning horizon, steady-state conditions, a perfect repair policy, and
available spare parts. These assumptions often are not valid for real-life systems.
• Due to the complexity of the models developed for inspection maintenance, it is
often difficult to compute optimal checking procedures. In such situations, nearly
optimal methods or algorithms should be used; such algorithms usually are developed
for the single-unit case.
• The widely known inspection maintenance models focus on inspection actions that
only give information about the state of the tested system (up state or down state).
No models have been developed that give additional information about signals of
forthcoming failures (the occurrence of some defects); thus, this type of
maintenance model is insufficient for systems in which such symptoms may be
diagnosed.
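
The conclusion above about non-negligible inspection times can be illustrated with
a short sketch. The assumptions are illustrative only: an exponential lifetime with
rate lam, hidden failures, an inspection after every tau units of operation, each
inspection taking d time units during which the system is out of service and does
not age, and instantaneous renewal on detection. Under these assumptions the
limiting availability is A(tau) = (1 - exp(-lam * tau)) / (lam * (tau + d)).

```python
import math


def availability(tau, lam, d):
    """Limiting availability with inspection interval tau and inspection duration d."""
    return -math.expm1(-lam * tau) / (lam * (tau + d))


def most_available_interval(lam, d, grid=None):
    """Grid search for the interval that maximizes the limiting availability."""
    grid = grid or [0.5 * k for k in range(1, 101)]
    return max(grid, key=lambda tau: availability(tau, lam, d))


print(most_available_interval(lam=0.01, d=0.0))  # negligible inspection time
print(most_available_interval(lam=0.01, d=0.5))  # each inspection takes 0.5 time units
```

With d = 0 the search simply returns the smallest interval on the grid, because
availability keeps improving as inspections become more frequent. A positive d
produces an interior optimum near sqrt(2 * d / lam), so ignoring the inspection time
can lead to a very different policy.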

REFERENCES
1. Tang T (2012) Failure finding interval optimization for periodically inspected repair-
able systems. PhD Thesis, University of Toronto.
2. Keller JB (1982) Optimum inspection policies. Management Science 28(4): 447–450.
3. Sheriff YS (1982) Reliability analysis: Optimal inspection & maintenance schedules of
failing equipment. Microelectronics Reliability 22(1): 59–115.
4. PN-EN 13306:2018 Maintenance—Maintenance terminology, The Polish Committee
for Standardization, Warsaw.
5. Gulati R, Kahn J, Baldwin R (2010) The professional’s guide to maintenance and reli-
ability terminology. Reliabilityweb.com.
6. Peters R (2014) Reliable Maintenance Planning, Estimating, and Scheduling. Gulf
Professional Publishing.
7. Barlow RE, Hunter LC, Proschan F (1963) Optimum checking procedures. Journal
of the Society for Industrial and Applied Mathematics 11(4): 1078–1095.
https://www.jstor.org/stable/2946496.
8. Beichelt F, Tittmann P (eds.) (2012) Reliability and Maintenance. Networks and
Systems. CRC Press.
9. Radner R, Jorgenson DW (1962) Optimal replacement and inspection of stochasti-
cally failing equipment. In: Arrow KJ, Karlin S, Scarf H (eds.) Studies in Applied
Probability and Management Science, Stanford University Press: 184–206.
10. Jorgenson DW, McCall JJ (1963) Optimal scheduling of replacement and inspection.
Operations Research 11(5): 732–746.
11. Pierskalla WP, Voelker JA (1976) A survey of maintenance models: The control and
surveillance of deteriorating systems. Naval Research Logistics Quarterly 23: 353–388.
12. Valdez-Flores C, Feldman R (1989) A survey of preventive maintenance models for sto-
chastically deteriorating single-unit systems. Naval Research Logistics 36: 419–446.
13. Cho ID, Parlar M (1991) A  survey of maintenance models for multi-unit systems.
European Journal of Operational Research 51(1): 1–23.
14. Thomas LC, Gaver DP, Jacobs PA  (1991) Inspection models and their application.
IMA Journal of Mathematics Applied in Business and Industry 3: 283–303.
15. Parmigiani G (1991) Scheduling inspections in reliability. Institute of Statistics and
Decision Sciences Discussion Paper no. 92–A11:1–21, Duke University.
https://stat.duke.edu/research/papers/1992-11 (accessed 17 October 2018).
16. Osaki S (ed.) (2002) Stochastic Models in Reliability and Maintenance,
Springer-Verlag, Berlin, Germany.
17. Nakagawa T (2005) Maintenance Theory of Reliability. Springer.
18. Jardine AKS, Tsang AHC (2013) Maintenance, replacement and reliability. Theory and
Applications. CRC Press.
19. Werbińska-Wojciechowska S (2019) Technical System Maintenance. Delay-Time-Based
Modeling. Springer.
20. Kaio N, Osaki S (1989) Comparison of inspection policies. Journal of the Operational
Research Society 40(5): 499–503. Palgrave Macmillan Journals.
21. Kaio N, Osaki S (1988) Inspection policies: Comparisons and modifications. Revue
française d’automatique, d’informatique et de recherche opérationnelle. Recherche
opérationnelle 22(4): 387–400.
22. Munford AG (1981) Comparison among certain inspection policies. Management
Science 27(3): 260–267.
23. Jiang R, Jardine AKS (2005) Two optimization models of the optimum inspection
problem. The Journal of the Operational Research Society 56(10): 1176–1183.
doi:10.1057/palgrave.jors.2601885.
24. Boland PJ, El-Neweihi E (1995) Expected cost comparisons for inspection
and repair policies. Computers and Operations Research 22(4): 383–390.
doi:10.1016/0305-0548(94)00047-C.
25. Hu T, Wei Y (2001) Multivariate stochastic comparisons of inspection and repair poli-
cies. Statistics and Probability Letters 51: 315–324.
26. McCall JJ (1963) Operating characteristics of opportunistic replacement and inspection
policies. Management Science 10(1): 85–97.
27. Choi KM (1997) Semi-Markov and delay time models of maintenance. PhD thesis,
University of Salford, UK.
28. Chelbi A, Ait-Kadi D (2009) Inspection strategies for randomly failing systems. In:
Ben-Daya M, Duffuaa SO, Raouf A, Knezevic J, Ait-Kadi D (eds.) Handbook of
Maintenance Management and Engineering. Springer, London, UK.
29. Lee C (1999) Applications of delay time theory to maintenance practice of complex
plant. PhD thesis, University of Salford, UK.
30. Bobrowski D (1980) Optimisation of technical object maintenance with inspections (in
Polish). In: Proceedings of Winter School on Reliability, Center for Technical Progress,
Katowice, Poland: 31–46.
31. Viscolani B (1991) A  note on checking schedules with finite horizon. Operations
Research 25(2): 203–208. doi:10.1051/ro/1991250202031.
32. Hariga MA (1996) A maintenance inspection model for a single machine with general
failure distribution. Microelectronics Reliability 36(3): 353–358.
33. Klatzky RL, Messick DM, Loftus J (1992) Heuristics for determining the optimal inter-
val between checkups. Psychological Science 3(5): 279–284.
34. Beichelt F (1981) Minimax inspection strategies for single unit systems. Naval Research
Logistics Quarterly 28(3): 375–381.
35. Leung FKN (2001) Inspection schedules when the lifetime distribution of a single-
unit system is completely unknown. European Journal of Operational Research 132:
106–115. doi:10.1016/S0377-2217(00)00115-6.
36. Okumura S (2006) Determination of inspection schedules of equipment by variational
method. Mathematical Problems in Engineering, Hindawi Publishing Corporation,
Article ID 95843: 1–16.
37. Liu B, Zhao X, Yeh R-H, Kuo W (2016) Imperfect inspection policy for systems with
multiple correlated degradation processes. IFAC-PapersOnLine 49–12: 1377–1382.
38. Sengupta B (1982) An exponential riddle. Journal of Applied Probability 19(3):
737–740.
39. Guo H, Szidarovszky F, Gerokostopoulos A, Niu P (2015) On determining optimal
inspection interval for minimizing maintenance cost. In: Proceedings of 2015 Annual
Reliability and Maintainability Symposium (RAMS), IEEE: 1–7.
40. Magott J, Nowakowski T, Skrobanek P, Werbinska-Wojciechowska S (2010) Logistic
system modeling using fault trees with time dependencies—Example of tram network.
In: Bris R, Guedes Soares C, Martorell S (eds.) Reliability, Risk and Safety: Theory and
Applications. Vol. 3, Taylor & Francis, London, UK: 2293–2300.
41. Wattanapanom N, Shaw L (1979) Optimal inspection schedules for failure detection in
a model where tests hasten failures. Operations Research 27(2): 303–317.
42. Butler DA (1979) A hazardous-inspection model. Management Science 25(1): 79–89.
43. Parmigiani G (1993) Optimal inspection and replacement policies with age-dependent fail-
ures and fallible tests. The Journal of the Operational Research Society 44(11): 1105–1114.
44. Parmigiani G (1993) Optimal scheduling of fallible inspections. DP no. 92–38: 1–30,
https://stat.duke.edu/research/papers/1992-38 (accessed 17 October 2018).
45. Rizwan SM, Chauhan H, Taneja G (2005) Stochastic analysis of systems with accident
and inspection. Emirates Journal for Engineering Research 10(2): 81–88.
46. Hryniewicz O (2009) Optimal inspection intervals for maintainable equipment. In:
Martorell S, Guedes-Soares C, Barnett J (eds.) Safety, Reliability and Risk Analysis:
Theory, Methods and Applications, Taylor & Francis Group, London.
47. Berrade MD (2012) A  two-phase inspection policy with imperfect testing. Applied
Mathematical Modelling 36: 108–114. doi:10.1016/j.apm.2011.05.035.
48. Berrade MD, Cavalcante CAV, Scarf PA (2013) Modelling imperfect inspection over
a finite horizon. Reliability Engineering and System Safety 111: 18–29.
doi:10.1016/j.ress.2012.10.003.
49. Berrade MD, Cavalcante CAV, Scarf PA (2012) Maintenance scheduling of a protec-
tion system subject to imperfect inspection and replacement. European Journal of
Operational Research 218: 716–725. doi:10.1016/j.ejor.2011.12.003.
50. Berrade MD, Scarf PA, Cavalcante CAV, Dwight RA (2013) Imperfect inspection and
replacement of a system with a defective state: A cost and reliability analysis. Reliability
Engineering and System Safety 120: 80–87. doi:10.1016/j.ress.2013.02.024.
51. Sarkar J, Sarkar S (2000) Availability of a periodically inspected system under perfect
repair. Journal of Statistical Planning and Inference 91: 77–90.
52. Cui L, Xie M (2001) Availability analysis of periodically inspected systems with
random walk model. Journal of Applied Probability 38: 860–871.
doi:10.1017/S0021900200019082.
53. Cui L, Xie M, Loh H-T (2004) Inspection schemes for general systems. IIE Transactions
36: 817–825. doi:10.1080/07408170490473006.
54. Cui L, Xie M (2005) Availability of a periodically inspected system with random
repair or replacement times. Journal of Statistical Planning and Inference 131: 89–100.
doi:10.1016/j.jspi.2003.12.008.
55. Yang J, Gang T, Zhao Y (2013) Availability of a periodically inspected system maintained
through several minimal repairs before a replacement of a perfect repair. Hindawi
Publishing Corporation, Abstract and Applied Analysis, Article ID 741275: 1–6.
56. Tang T, Lin D, Banjevic D, Jardine AKS (2013) Availability of a system subject to
hidden failure inspected at constant intervals with non-negligible downtime due to
inspection and downtime due to repair/replacement. Journal of Statistical Planning
and Inference 143: 176–185. doi:10.1016/j.jspi.2012.05.011.
57. Luss H, Kander Z (1974) Inspection policies when duration of checkings is non-negligible.
Operational Research Quarterly 25(2): 299–309.
58. Wortman MA, Klutke G-A, Ayhan A (1994) A maintenance strategy for systems sub-
jected to deterioration governed by random shocks. IEEE Transactions on Reliability
43(3): 439–445.
59. Chelbi A, Ait-Kadi D (2000) Generalized inspection strategy for randomly failing sys-
tems subjected to random shocks. International Journal of Production Economics 64:
379–384. doi:10.1016/S0925-5273(99)00073-0.
60. Chelbi A, Ait-Kadi D (1998) Inspection and predictive maintenance strategies.
International Journal of Computer Integrated Manufacturing 11(3): 226–231.
doi:10.1080/095119298130750.
61. Klutke G-A, Yang Y (2002) The availability of inspected systems subject to shocks and
graceful degradation. IEEE Transactions on Reliability 51(3): 371–374.
62. Badia FG, Berrade MD (2006) Optimal inspection of a system with two types of fail-
ures under age dependent minimal repair. Monografias del Seminario Matematico
Garcia de Galdeano 33: 207–214.
63. Badia FG, Berrade MD (2006) Optimum maintenance of a system under two types of
failure. International Journal of Materials and Structural Reliability 4(1): 27–37.
64. Zequeira RI, Berenguer C (2006) An inspection and imperfect maintenance model
for a system with two competing failure modes. In: Proceedings of the 6th IFAC
Symposium: Supervision and Safety of Technical Processes: 932–937.
65. Sheu S-H, Tsai H-N, Wang F-K, Zhang ZG (2015) An extended optimal replacement
model for a deteriorating system with inspections. Reliability Engineering and System
Safety 139: 33–49. doi:10.1016/j.ress.2015.01.014.
66. Zequeira RI, Berenguer C (2006) Optimal scheduling of non-perfect inspections.
IMA Journal of Management Mathematics 17: 187–207. doi:10.1093/imaman/dpi037.
67. Nakagawa T, Yasui K (1980) Approximate calculation of optimal inspection times.
Journal of Operational Research Society 31: 851–853.
68. Aven T (1984) Optimal inspection when the system is repaired upon detection of fail-
ure. Microelectronics Reliability 24(5): 961–963.
69. Yeh RH, Chen HD, Wang C-H (2005) An inspection model with discount factor for
products having Weibull lifetime. International Journal of Operations Research 2(1):
77–81.
70. Badia FG, Berrade MD, Campos CA (2002) Optimal inspection and preventive main-
tenance of units with revealed and unrevealed failures. Reliability Engineering and
System Safety 78: 157–163.
71. Yeh L (1995) An optimal inspection-repair-replacement policy for standby systems.
Journal of Applied Probability 32(1): 212–223.
72. Yang Y, Klutke G-A (2000) Improved inspection schemes for deteriorating equipment.
Probability in the Engineering and Informational Sciences 14(4): 445–460.
73. Dagg RA, Newby M (1998) Optimal overhaul intervals with imperfect inspection and
repair. IMA Journal of Mathematics Applied in Business and Industry 9: 381–391.
74. Durango-Cohen PL, Madanat SM (2008) Optimization of inspection and maintenance
decisions for infrastructure facilities under performance model uncertainty: A Quasi-
Bayes approach. Transportation Research Part A: Policy and Practice 42(8): 1074–
1085. doi:10.1016/j.tra.2008.03.004.
75. Cheng GQ, Li L (2012) A geometric process repair model with inspections and its
optimisation. International Journal of Systems Science 43(9): 1650–1655.
doi:10.1080/00207721.2010.549586.
76. Wang W, Zhao F, Peng R (2014) A  preventive maintenance model with a two-level
inspection policy based on a three-stage failure process. Reliability Engineering and
System Safety 121: 207–220. doi:10.1016/j.ress.2013.08.007.
77. Weiss GH (1963) Optimal periodic inspection programs for randomly failing equip-
ment. Journal of Research of the National Bureau of Standards—B. Mathematics and
Mathematical Physics 67B(4): 223–228.
78. Chelbi A, Ait-Kadi D, Aloui H (2008) Optimal inspection and preventive maintenance
policy for systems with self-announcing and non-self-announcing failures. Journal of
Quality in Maintenance Engineering 14(1): 34–45, doi:10.1108/13552510810861923.
79. Luss H (1976) Maintenance policies when deterioration can be observed by
inspections. Operations Research 24(2): 359–366.
80. Becker G, Camarinopoulos L, Ziouas G (1994) A  Markov type model for systems
with tolerable down times. The Journal of the Operational Research Society 45(10):
1168–1178. doi:10.2307/2584479.
81. Rosenfield D (1976) Markovian deterioration with uncertain information. Operations
Research 24(1): 141–155.
82. Tijms HC, Van Der Duyn Schouten FA (1984) A Markov decision algorithm for optimal
inspections and revisions in a maintenance system with partial information. European
Journal of Operational Research 21: 245–253. Elsevier.
83. Kawai H, Koyanagi J (1992) An optimal maintenance policy of a discrete time Markovian
deterioration system. Computers & Mathematics with Applications 24(1/2): 103–108.
84. Weiss GH (1962) A  problem in equipment maintenance. Management Science 8(3):
266–277.
85. Fung J, Makis V (1997) An inspection model with generally distributed restoration and
repair times. Microelectronics Reliability 37(3): 381–389.
86. Ohnishi M, Kawai H, Mine H (1986) An optimal inspection and replacement policy for
a deteriorating system. Journal of Applied Probability 23(4): 973–988.
87. Wang GJ, Zhang YL (2014) Geometric process model for a system with inspections
and preventive repair. Computers and Industrial Engineering 75: 13–19.
doi:10.1016/j.cie.2014.06.007.
88. White III ChC (1978) Optimal inspection and repair of a production process subject to
deterioration. The Journal of the Operational Research Society 29(3): 235–243.
89. Zuckerman D (1980) Inspection and replacement policies. Journal of Applied
Probability 17(1): 168–177.
90. Abdel-Hameed M (1987) Inspection and maintenance policies of devices subject to
deterioration. Advances in Applied Probability 19(4): 917–931.
91. Kong MB, Park KS (1997) Optimal replacement of an item subject to cumulative dam-
age under periodic inspections. Microelectronics Reliability 37(3): 467–472.
92. Delia M-C, Rafael P-O (2008) A  maintenance model with failures and inspection
following Markovian arrival processes and two repair modes. European Journal of
Operational Research 186: 694–707. doi:10.1016/j.ejor.2007.02.009.
93. Chiang JH, Yuan J (2001) Optimal maintenance policy for a Markovian system
under periodic inspection. Reliability Engineering and System Safety 71: 165–172.
doi:10.1016/S0951-8320(00)00093-4.
94. Kharoufeh JP, Finkelstein DE, Mixon DG (2006) Availability of periodically inspected
systems with Markovian wear and shocks. Journal of Applied Probability 43(2):
303–317. doi:10.1239/jap/1152413724.
95. Mazumdar M (1970) Reliability of two-unit redundant repairable systems when failures
are revealed by inspections. SIAM Journal on Applied Mathematics 19(4): 637–647.
96. Osaki S, Asakura T (1970) A two-unit standby redundant system with repair and pre-
ventive maintenance. Journal of Applied Probability 7(3): 641–648.
97. Mahmoud MAW, Mohie El-Din MM, El-Said Moshref M (1995) Reliability analysis of
a two-unit cold standby system with inspection, replacement, proviso of rest, two types
of repair and preparation time. Microelectronics Reliability 35(7): 1063–1072.
98. Pandey D, Tyagi SK, Jacob M (1995) Profit evaluation of a two-unit system with inter-
nal and external repairs, inspection and post repair. Microelectronics Reliability 35(2):
259–264.
99. Cazorla DM, Perez-Ocon R (2008) An LDQBD process under degradation, inspection,
and two types of repair. European Journal of Operational Research 190: 494–508.
doi:10.1016/j.ejor.2007.04.056.
100. Kumar J (2011) Cost-benefit analysis of a redundant system with inspection and priority
subject to degradation. IJCSI International Journal of Computer Science Issues 8(6/2):
314–321.
101. Kishan R, Jain D (2012) A two non-identical unit standby system model with repair,
inspection and post-repair under classical and Bayesian viewpoints. Journal of
Reliability and Statistical Studies 5(2): 85–103.
102. Bhatti J, Chitkara AK, Kakkar MK (2016) Stochastic analysis of dis-similar standby
system with discrete failure, inspection and replacement policy. Demonstratio
Mathematica 49(2): 224–235.
103. Vaurio JK (1999) Availability and cost functions for periodically inspected preventively
maintained units. Reliability Engineering and System Safety 63: 133–140.
doi:10.1016/S0951-8320(98)00030-1.
104. Vaurio JK (1997) On time-dependent availability and maintenance optimization of
standby units under various maintenance policies. Reliability Engineering and System
Safety 56: 79–89. doi:10.1016/S0951-8320(96)00132-9.
105. Kenzin M, Frostig E (2009) M out of n inspected systems subject to shocks in random
environment. Reliability Engineering and System Safety 94: 1322–1330.
doi:10.1016/j.ress.2009.02.005.
106. Zequeira RI, Berenguer C (2005) On the inspection policy of a two-component parallel
system with failure interaction. Reliability Engineering and System Safety 88: 99–107.
doi:10.1016/j.ress.2004.07.009.
107. Lee BL, Wang M (2012) Approximately optimal testing policy for two-unit parallel
standby systems. International Journal of Applied Science and Engineering 10(3):
263–272.
108. Mendes AA, Coit DW, Duarte Ribeiro JL (2014) Establishment of the optimal time
interval between periodic inspections for redundant systems. Reliability Engineering
and System Safety 131: 148–165. doi:10.1016/j.ress.2014.06.021.
109. Greenberg H (1964) Optimum test procedure under stress. Operations Research 12(5):
689–692.
110. Anily S, Glass CA, Hassin R (1998) The scheduling of maintenance service. Discrete
Applied Mathematics 82(1–3): 27–42. doi:10.1016/S0166-218X(97)00119-4.
111. Sheils E, O’Connor A, Breysse D, Schoefs F, Yotte S (2010) Development of a
two-stage inspection process for the assessment of deteriorating infrastructure. Reliability
Engineering and System Safety 95: 182–194. doi:10.1016/j.ress.2009.09.008.
112. Duffuaa S, Al-Najjar HJ (1995) An optimal complete inspection plan for critical
multicharacteristic components. Journal of the Operational Research Society 46(8):
930–942.
113. Duffuaa S, Khan M (2002) An optimal repeat inspection plan with several
classifications. Journal of the Operational Research Society 53(9): 1016–1026.
doi:10.1057/palgrave.jors.2601392.
114. Duffuaa S, Khan M (2008) A general repeat inspection plan for dependent multicharac-
teristic critical components. European Journal of Operational Research 191: 374–385.
doi:10.1016/j.ejor.2007.02.033.
115. Sahraoui Y, Khelif R, Chateauneuf A (2013) Maintenance planning under imperfect
inspections of corroded pipelines. International Journal of Pressure Vessels and
Piping 104: 76–82. doi:10.1016/j.ijpvp.2013.01.009.
116. Srivastava MS, Wu Y (1993) Estimation and testing in an imperfect-inspection model.
IEEE Transactions on Reliability 42(2): 280–286. doi:10.1109/24.229501.
117. Godziszewski J (2001) The impact of errors of the first and second types made dur-
ing inspections on the costs of maintenance of a homogeneous equipment park (in
Polish). In: Proceedings of XIX Winter School on Reliability—Computer Aided
Dependability Analysis, Publishing House of Institute for Sustainable Technologies,
Radom: 89–100.
118. Aven T (1987) Optimal inspection and replacement of a coherent system.
Microelectronics Reliability 27(3): 447–450. doi:10.1016/0026-2714(87)90460-4.
119. Zuckerman D (1989) Optimal inspection policy for a multi-unit machine. Journal of
Applied Probability 26: 543–551.
120. Qiu Y (1991) A note on optimal inspection policy for stochastically deteriorating series
systems. Journal of Applied Probability 28: 934–939.
121. Dieulle L (1999) Reliability of a system with Poisson inspection times. Journal of
Applied Probability 36(4): 1140–1154.
122. Dieulle L (2002) Reliability of several component sets with inspections at random
times. European Journal of Operational Research 139: 96–114.
123. Rezaei E (2017) A  new model for the optimization of periodic inspection intervals
with failure interaction: A case study for a turbine rotor. Case Studies in Engineering
Failure Analysis 9: 148–156. doi:10.1016/j.csefa.2015.10.001.
124. Tolentino D, Ruiz SE (2014) Influence of structural deterioration over time on the opti-
mal time interval for inspection and maintenance of structures. Engineering Structures
61: 22–30. doi:10.1016/j.engstruct.2014.01.012.
125. Lu Z, Chen M, Zhou D (2015) Periodic inspection maintenance policy with a general
repair for multi-state systems. In: Proceedings of Chinese Automation Congress (CAC):
2116–2121.
126. Zhang J, Huang X, Fang Y, Zhou J, Zhang H, Li J (2016) Optimal inspection-based pre-
ventive maintenance policy for three-state mechanical components under competing
failure modes. Reliability Engineering and System Safety 152: 95–103. doi:10.1016/j.
ress.2016.02.007.
127. Carvalho M, Nunes E, Telhada J (2009) Optimal periodic inspection of series sys-
tems with revealed and unrevealed failures. In: Safety, Reliability and Risk Analysis:
Theory, Methods and Applications—Proceedings of the Joint Esrel and SRA-Europe
Conference, CRC Press: 587–592.
128. Bris R, Chatelet E, Yalaoui F (2003) New method to minimize the preventive main-
tenance cost of series-parallel systems. Reliability Engineering and System Safety 82:
247–255. doi:10.1016/S0951-8320(03)00166-2.
129. Badia FG, Berrade MD, Campos CA  (2002) Maintenance policy for multivariate
standby/operating units. Applied Stochastic Models in Business and Industry 18:
147–155.
130. Huang J, Song Y, Ren Y, Gao Q (2014) An optimization method of aircraft periodic inspec-
tion and maintenance based on the zero-failure data analysis. In: Proceedings of 2014
IEEE Chinese Guidance, Navigation and Control Conference, Yantai, China: 319–323.
131. Azadeh A, Sangari MS, Amiri AS (2012) A particle swarm algorithm for inspection
optimization in serial multi-stage process. Applied Mathematical Modelling 36: 1455–
1464. doi:10.1016/j.apm.2011.09.037.
132. Dzwigarek M, Hryniewicz O (2011) Frequency of periodical inspections of safety-
related control systems of machinery—Practical recommendations for determining
methods. In: Proceedings of Summer Safety and Reliability Seminars, SSARS 2011,
Gdańsk-Sopot, Poland: 17–26.
133. Alfares H (1999) A simulation model for determining inspection frequency. Computers
and Industrial Engineering 36: 685–696. doi:10.1016/S0360-8352(99)00159-X.
134. Bai Y, Bai Q (2014) Subsea Pipeline Integrity and Risk Management. Elsevier.
doi:10.1016/C2011-0-00113-8.
135. Bai Y, Jin W-L (2015) Marine Structural Design. Elsevier.
136. Zhaoyang T, Jianfeng L, Zongzhi W, Jianhu Z, Weifeng H (2011) An evaluation of main-
tenance strategy using risk-based inspection. Safety Science 49: 852–860. doi:10.1016/j.
ssci.2011.01.015.
137. Hagemeijer PM, Kerkveld G (1998) A methodology for risk-based inspection of pres-
surized systems. Proceedings of the Institution of Mechanical Engineers, Part E:
Journal of Process Mechanical Engineering 212(1): 37–47. SAGE Journals.
138. Hagemeijer PM, Kerkveld G (1998) Application of risk-based inspection for pressurized
HC production systems in a Brunei petroleum company. Proceedings of the Institution of
Mechanical Engineers, Part E: Journal of Process Mechanical Engineering 212(1):
49–54.
139. Wang J, Matellini B, Wall A, Phipps J (2012) Risk-based verification of large off-
shore systems. Proceedings of the Institution of Mechanical Engineers, Part
M: Journal of Engineering for the Maritime Environment 226(3): 273–298.
doi:10.1177/1475090211430302.
140. Jovanovic A  (2003) Risk-based inspection and maintenance in power and process
plants in Europe. Nuclear Engineering and Design 226: 165–182.
141. Kallen MJ, Van Noortwijk JM (2005) Optimal maintenance decisions under imper-
fect inspection. Reliability Engineering and System Safety 90: 177–185. doi:10.1016/j.
ress.2004.10.004.
142. You J-S, Kuo H-T, Wu W-F (2006) Case studies of risk-informed inservice inspection
of nuclear piping systems. Nuclear Engineering and Design 236: 35–46.
143. Podofillini L, Zio E, Vatn J (2006) Risk-informed optimisation of railway tracks
inspection and maintenance procedures. Reliability Engineering and System Safety 91:
20–35. doi:10.1016/j.ress.2004.11.009.
144. Nakagawa T (1980) Replacement models with inspection and preventive maintenance.
Microelectronics and Reliability 20: 427–433.
145. Yeh L (2003) An inspection-repair-replacement model for a deteriorating system with
unobservable state. Journal of Applied Probability 40: 1031–1042.
146. Park JH, Lee SC, Hong JW, Lie CH (2009) An optimal block preventive maintenance
policy for a multi-unit system considering imperfect maintenance. Asia-Pacific Journal
of Operational Research 26(6): 831–847.
147. Sheu S-H, Lin Y-B, Liao G-L (2006) Optimum policies for a system with general imper-
fect maintenance. Reliability Engineering and System Safety 91(3): 362–369.
148. Taghipour S, Banjevic D (2012) Optimum inspection interval for a system under periodic
and opportunistic inspections. IIE Transactions 44: 932–948.
doi:10.1080/0740817X.2011.618176.
149. Taghipour S, Banjevic D (2012) Optimal inspection of a complex system subject to
periodic and opportunistic inspections and preventive replacements. European Journal
of Operational Research 220: 649–660. doi:10.1016/j.ejor.2012.02.002.
150. Babishin V, Taghipour S (2016) Joint optimal maintenance and inspection for a k-out-
of-n system. International Journal of Advanced Manufacturing Technology 87(5):
1739–1749. doi:10.1109/RAMS.2016.7448039.
151. Bjarnason ETS, Taghipour S (2014) Optimizing simultaneously inspection interval and
inventory levels (s, S) for a k-out-of-n system. In: 2014 Reliability and Maintainability
Symposium, Colorado Springs, CO: 1–6. doi:10.1109/RAMS.2014.6798463.
152. Chen C-T, Chen Y-W, Yuan J (2003) On dynamic preventive maintenance policy
for a system under inspection. Reliability Engineering and System Safety 80: 41–47.
doi:10.1016/S0951-8320(02)00238-7.
153. Chen Y-C (2013) An optimal production and inspection strategy with preventive main-
tenance error and rework. Journal of Manufacturing Systems 32: 99–106. doi:10.1016/j.
jmsy.2012.07.010.
154. Duffuaa S, El-Ga’aly A (2013) A multi-objective mathematical optimization model for
process targeting using 100% inspection policy. Applied Mathematical Modelling 37:
1545–1552. doi:10.1016/j.apm.2012.04.008.
155. Wang H, Wang W, Peng R (2017) A two-phase inspection model for a single compo-
nent system with three-stage degradation. Reliability Engineering and System Safety
158: 31–40.
156. Feng Q, Peng H, Coit DW (2010) A  degradation-based model for joint optimiza-
tion of burn-in, quality inspection, and maintenance: A light display device applica-
tion. International Journal of Advanced Manufacturing Technology 50: 801–808.
doi:10.1007/s00170-010-2532-7.
157. Tsai H-N, Sheu S-H, Zhang ZG (2016) A  trivariate optimal replacement policy for
a deteriorating system based on cumulative damage and inspections. Reliability
Engineering and System Safety 160: 122–135. doi:10.1016/j.ress.2016.10.031.
158. Bjarnason ETS, Taghipour S, Banjevic D (2014) Joint optimal inspection and inven-
tory for a k-out-of-n system. Reliability Engineering and System Safety 131: 203–215.
doi:10.1016/j.ress.2014.06.018.
159. Panagiotidou S (2014) Joint optimization of spare parts ordering and maintenance
policies for multiple identical items subject to silent failures. European Journal of
Operational Research 235: 300–314. doi:10.1016/j.ejor.2013.10.065.
160. Bukowski JV (2001) Modeling and analyzing the effects of periodic inspection on
the performance of safety-critical systems. IEEE Transactions on Reliability 50(3):
321–329. doi:10.1109/24.974130.
161. Ellingwood BR, Mori Y (1997) Reliability-based service life assessment of con-
crete structures in nuclear power plants: Optimum inspection and repair. Nuclear
Engineering and Design 175: 247–258.
162. Estes AC, Frangopol DM (2000) An optimized lifetime reliability-based inspection
program for deteriorating structures. In: Proceedings of the 8th ASCE Joint Specialty
Conference on Probabilistic Mechanics and Structural Reliability, Notre Dame, IN.
163. Faber MH, Sorensen JD (2002) Indicators for inspection and maintenance planning of
concrete structures. Structural Safety 24: 377–396. doi:10.1016/S0167-4730(02)00033-4.
164. Onoufriou T, Frangopol DM (2002) Reliability-based inspection optimization of com-
plex structures: A brief retrospective. Computers and Structures 80: 1133–1144.
165. Woodcock K (2014) Model of safety inspection. Safety Science 62: 145–156.
166. Ten Wolde M, Ghobbar AA (2013) Optimizing inspection intervals—Reliability and
availability in terms of a cost model: A  case study on railway carriers. Reliability
Engineering and System Safety 114: 137–147. doi:10.1016/j.ress.2012.12.013.
167. Ali SA, Bagchi G (1998) Risk-informed in service inspection. Nuclear Engineering
and Design 181: 221–224.
168. Garnero M-A, Beaudouin F, Delbos J-P (1998) Optimization of bearing-inspection
intervals. IEEE Proceedings of Annual Reliability and Maintainability Symposium:
332–338.
169. Aoki K, Yamamoto K, Kobayashi K (2007) Optimal inspection and replacement policy
using stochastic method for deterioration prediction. In: Proceedings of 11th World
Conference on Transport Research, Berkeley, CA: 1–13.
170. Sandoh H, Igaki N (2003) Optimal inspection policies for a scale. Computers and
Mathematics with Applications 46: 1119–1127.
171. Sandoh H, Igaki N (2001) Inspection policies for a scale. Journal of Quality in
Maintenance Engineering 7(3): 220–231.
172. Guduru RKR, Shaik SH, Yaramala S (2018) A dynamic optimization model for multi-
objective maintenance of sewing machine. International Journal of Pure and Applied
Mathematics 118(20): 33–43.
173. Gravito FM, Dos Santos Filho N (2003) Inspection and maintenance of wooden poles
structures. Global ESMO 2003, Orlando, Florida: 151–155.
174. Jazwinski J, Zurek J (2000) Principles of determining the maintenance set of the
condition of the transport system with the use of expert opinions (in Polish). In: Proceedings of
XXVIII Winter School on Reliability—Decision Problems in Dependability Engineering,
Publishing House of Institute for Sustainable Technologies, Radom, Poland: 118–125.
175. Salamonowicz T (2007) Maintenance strategy for systems in k-out-of-n reliability
structure (in Polish). In: Proceedings of XXXV Winter School on Reliability—Problems
of Systems Dependability, Publishing House of Institute for Sustainable Technologies,
Radom: 414–420.
176. Dzwigarek M, Hryniewicz O (2012) Periodical inspection frequency of protection sys-
tems of machinery—Case studies (in Polish). Journal of KONBiN 3(23): 109–120.
177. Landowski B, Woropay M (2003) Simulation of exploitation processes of technical
objects preventively maintained (in Polish). In: Proceedings of XXXI Winter School on
Reliability—Forecasting Methods in Dependability Engineering, Publishing House of
Institute for Sustainable Technologies, Radom: 297–308.
178. Zhao F, Xu X, Xie SQ (2009) Computer-aided inspection planning—The state of the
art. Computers in Industry 60: 453–466. doi:10.1016/j.compind.2009.02.002.
179. Ballou DP, Pazer HL (1982) The impact of inspector fallibility on the inspection pol-
icy in serial production systems. Management Science 28(4): 387–399. doi:10.1287/
mnsc.28.4.387.
180. Darwish MA, Ben-Daya M (2007) Effect of inspection errors and preventive mainte-
nance on a two-stage production inventory system. International Journal of Production
Economics 107: 301–313. doi:10.1016/j.ijpe.2006.09.008.
181. Lee HL, Rosenblatt MJ (1987) Simultaneous determination of production cycle and
inspection schedules in a production system. Management Science 33(9): 1125–1136.
182. Meyer RR, Rothkopf MH, Smith SA (1979) Reliability and inventory in a production-
storage system. Management Science 25(8): 799–807.
183. Tirkel I (2016) Efficiency of Inspection based on out of control detection in wafer
fabrication. Computers and Industrial Engineering 99: 458–464. doi:10.1016/j.
cie.2016.05.022.
184. Ito K, Nakagawa T (2000) Optimal inspection policies for a storage system with degra-
dation at periodic tests. Mathematical and Computer Modelling 31: 191–195.
185. Ito K, Nakagawa T (1995) An optimal inspection policy for a storage system with high
reliability. Microelectronics Reliability 36(6): 875–882.
186. Ito K, Nakagawa T (1995) An optimal inspection policy for a storage system with three
types of hazard rate functions. Journal of the Operations Research Society of Japan
38(4): 423–431.
187. Ito K, Nakagawa T, Nishi K (1995) Extended optimal inspection policies for a system
in storage. Mathematical and Computer Modelling 22(10–12): 83–87.
188. Martinez EC (1984) Storage reliability with periodic test. IEEE Proceedings of Annual
Reliability and Maintainability Symposium: 181–185.
189. Su Ch, Zhang Y-J, Cao B-X (2012) Forecast model for real time reliability of stor-
age system based on periodic inspection and maintenance data. Eksploatacja i
Niezawodnosc – Maintenance and Reliability 14(4): 342–348.
190. Neves ML, Santiago LP, Maia CA  (2011) A  condition-based maintenance policy
and input parameters estimation for deteriorating systems under periodic inspection.
Computers and Industrial Engineering 61: 503–511. doi:10.1016/j.cie.2011.04.005.
3 Application of Stochastic Processes in Degradation Modeling: An Overview
Shah Limon, Ameneh Forouzandeh Shahraki, and Om Prakash Yadav

CONTENTS
3.1 Introduction
3.2 Continuous State Stochastic Processes
3.2.1 Wiener Process
3.2.2 Gamma Process
3.2.3 Inverse Gaussian Process
3.2.4 Case Example: Degradation Analysis with a Continuous State Stochastic Process
3.2.5 Selection of Appropriate Continuous State Stochastic Process
3.3 Discrete State Stochastic Processes
3.3.1 Markovian Structure
3.3.2 Semi-Markov Process
3.4 Summary and Conclusions
References

3.1 INTRODUCTION
Most engineering systems age during their life cycle, and operating conditions and
external stresses accelerate that aging. The aging process reflects the propagation of
a failure mechanism, which causes product performance to decline and ultimately
leads to product failure. To reduce downtime and ensure safe operation, it is desirable
to estimate the product’s lifetime and reliability measures accurately so that appropriate
maintenance policies can be executed. Knowledge of the product’s deterioration
characteristics and their root causes is therefore a valuable source of information
for assessing product performance and reliability through degradation modeling
(Limon et al. 2017a; Shahraki et al. 2017). In degradation modeling, the time-to-failure
is identified as the time at which the degradation reaches a predefined threshold
value. Further, the degradation approach provides more accurate reliability estimates
than traditional failure-time approaches.
In traditional deterministic models, system behavior is defined by a set of equations
that describe with certainty how system performance evolves over time. In reality,
however, system performance is subject to variation and uncertainty, so the system
behaves probabilistically. This has increased the importance of stochastic processes
for modeling the probabilistic degradation behavior of engineering systems.
A stochastic process is a collection of random variables, indexed by time, that
represents the random changes of a system over time. Stochastic processes can be
divided into two broad categories: discrete state and continuous state.
Continuous state stochastic processes, mostly members of the Lévy family such as
the Wiener process, Gamma process, and Inverse Gaussian process, have been used
successfully to model the degradation of systems (Ye et al. 2013; Limon et al. 2017b;
Limon et al. 2018). These processes have independent increments, and hence the
Markov property, which suits many engineering degradation phenomena. Further,
the time-to-failure has an explicit expression through the first-passage-time concept,
which gives continuous state stochastic processes a clear advantage in degradation
modeling for reliability assessment.
On the other hand, discrete state stochastic processes are used to model degradation
when the overall status of the degradation process can be divided into a finite number
of discrete levels ranging from perfect functioning to complete failure. Each state
corresponds to a certain level of performance of the system under operation. Discrete
state stochastic processes are attractive in degradation modeling because dealing with
only a limited number of states is simple and practical (Moghaddass and Zuo 2014;
Shahraki and Yadav 2018). The system state may change at discrete or continuous
times, which leads to different models. Moreover, in applications where the system’s
history and age influence its future state, aging Markovian and semi-Markov processes
are used as extensions of Markov processes.
The remainder of this chapter is organized as follows. Section 3.2 presents the
different types of continuous state stochastic processes, degradation modeling with
those processes, and the selection of an appropriate stochastic process. Section 3.3
describes the discrete state stochastic processes with case examples. Finally,
Section 3.4 summarizes the application of stochastic processes in degradation
modeling to evaluate system reliability.

3.2  CONTINUOUS STATE STOCHASTIC PROCESSES


A continuous state stochastic process represents system changes as a continuous
function of time and has well-behaved sample paths that facilitate further analysis.
The commonly used continuous state stochastic processes are members of the Lévy
family, such as the Wiener process, Gamma process, and Inverse Gaussian process.
The fundamental idea of using Lévy processes in degradation modeling rests on the
assumption that every degradation process is the cumulative result of small,
independent degradation increments. Besides capturing the temporal variation of
degradation, these processes have well-established mathematical properties that are
useful for explaining degradation behavior. Further, the members of the Lévy family
also have a strong Markov property, expressed mathematically as:

$$\Pr\left(X_{t_i} \mid X_{t_{i-1}}, X_{t_{i-2}}, X_{t_{i-3}}, \ldots, X_{t_1}\right) = \Pr\left(X_{t_i} \mid X_{t_{i-1}}\right)$$

This implies that the next degradation increment depends only on the current state
of the degradation and is independent of the past degradation increments. This
property is also intuitive and practical for many deterioration processes. The
following sections provide the details of each stochastic process for degradation
modeling.

3.2.1 Wiener Process
The basic Wiener process can be expressed as:

$$Y(t) = \mu\Lambda(t) + \sigma B(\Lambda(t)) \tag{3.1}$$

Here B(.) is the standard Brownian motion, µ and σ represent the drift and volatility
parameters, respectively, Λ(.) is the timescale function, and Y(t) is the characteristic
indicator that represents the system behavior. If a random variable Y(t) follows
the Wiener process, it has the following mathematical properties:

1. $y(0) = 0$
2. $y(t)$ follows a normal distribution $N(\mu\Lambda(t), \sigma^{2}\Lambda(t))$
3. $y(t)$ has an independent increment for every time interval $\Delta t$ ($\Delta t = t_i - t_{i-1}$)
4. The independent increment $\Delta y(t) = y_i - y_{i-1}$ follows the normal distribution
   $N(\mu\Delta\Lambda(t), \sigma^{2}\Delta\Lambda(t))$ with probability density function (PDF):

$$f_{\Delta y}(t) = \frac{1}{\sigma\sqrt{2\pi\,\Delta\Lambda(t)}} \exp\left(-\frac{\left(\Delta y - \mu\,\Delta\Lambda(t)\right)^{2}}{2\sigma^{2}\,\Delta\Lambda(t)}\right) \tag{3.2}$$
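Properties 1 to 4 suggest a direct way to generate a sample degradation path: start at zero and add independent normal increments. The following is a minimal sketch, assuming a power-law timescale Λ(t) = t^c; the function name `simulate_wiener_path` and all parameter values are illustrative and are not taken from the chapter.

```python
import math
import random


def simulate_wiener_path(mu, sigma, c, times, rng):
    """Sample Y(t) = mu*Lambda(t) + sigma*B(Lambda(t)) with Lambda(t) = t**c.

    `times` must be increasing and start at 0 so that y(0) = 0 (property 1).
    """
    path = [0.0]
    for t_prev, t_curr in zip(times, times[1:]):
        d_lambda = t_curr ** c - t_prev ** c  # increment of the timescale
        # Property 4: increment ~ N(mu * dLambda, sigma^2 * dLambda)
        dy = rng.gauss(mu * d_lambda, sigma * math.sqrt(d_lambda))
        path.append(path[-1] + dy)
    return path


# Illustrative (hypothetical) parameter values
rng = random.Random(42)
times = [0, 50, 100, 150, 200, 250]
path = simulate_wiener_path(mu=0.02, sigma=0.01, c=0.5, times=times, rng=rng)
print([round(y, 4) for y in path])
```

Because the increments are normal, a simulated path may occasionally decrease; this is a known feature of the Wiener model rather than an artifact of the sketch.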

The Wiener process is also known as Brownian motion, the random movement of
particles suspended in a fluid that results from their collisions. This random
movement of small particles is closely analogous to the random increments of a
deterioration path. The Wiener process also has other properties that suit it to
degradation modeling. For example, a degradation process can be viewed as the
cumulative effect of many small environmental influences. By the central limit
theorem, the increments produced by these small effects can be approximated by a
normal distribution. Environmental effects such as temperature, shocks, and humidity
are most often independent, so the resulting degradation increments over disjoint
time intervals are also independent. For these reasons, the Wiener process is a
versatile model for many degradation phenomena. In a Wiener process, the drift
parameter µ represents the degradation rate, and the timescale function Λ(.) captures
the nonlinearity of the degradation process.
Manufacturers often use the accelerated degradation test (ADT) to assess reliability
metrics quickly during the product design stage. In an ADT, product samples are
subjected to stress levels higher than normal operating conditions to expedite the
degradation process. The effect of stress on product degradation and lifetime can be
explained by several existing physics-based or empirical reaction rate models. For
example, the effect of temperature, or any thermal effect, on product deterioration
can be captured by the Arrhenius model. Several well-established reaction rate models
follow, where d(s) represents the rate of degradation at stress level s, and a1 and a2
are constant coefficients that depend on the material or product type (Nelson 2004):
$$
d(s) =
\begin{cases}
a_1 e^{-a_2/T}, & \text{Arrhenius model } (s = T) \\
a_1 V^{a_2}, & \text{Power law model } (s = V) \\
a_1 e^{a_2 W}, & \text{Exponential model } (s = W)
\end{cases}
\tag{3.3}
$$
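The three rate models in Equation 3.3 are simple enough to evaluate directly. The sketch below is one possible implementation; the function names and the `acceleration_factor` helper (the ratio of the rate at a test stress to the rate at the use stress) are illustrative additions, and the coefficient values are hypothetical.

```python
import math


def arrhenius_rate(a1, a2, temperature):
    """d(s) = a1 * exp(-a2 / T), with stress s = T (absolute temperature)."""
    return a1 * math.exp(-a2 / temperature)


def power_law_rate(a1, a2, voltage):
    """d(s) = a1 * V**a2, with stress s = V."""
    return a1 * voltage ** a2


def exponential_rate(a1, a2, stress):
    """d(s) = a1 * exp(a2 * W), with stress s = W."""
    return a1 * math.exp(a2 * stress)


def acceleration_factor(rate_fn, a1, a2, s_test, s_use):
    """Ratio of the degradation rate at the test stress to that at the use stress."""
    return rate_fn(a1, a2, s_test) / rate_fn(a1, a2, s_use)


# Hypothetical coefficients: compare a 350 K test condition with 300 K use
print(acceleration_factor(arrhenius_rate, 1.0, 1000.0, 350.0, 300.0))
```

Note that a1 cancels in the acceleration factor, so only a2 governs how strongly a stress increase speeds up degradation in each model.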

Since the magnitudes of stress measurement units may differ significantly in a
multi-stress scenario, it is important to use standardized (transformed) stresses to
remove the influence of the measurement units. The transformed stress level is given
as (Park and Yum 1997):

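$$
S_k =
\begin{cases}
\dfrac{1/S_0' - 1/S_k'}{1/S_0' - 1/S_M'}, & \text{for Arrhenius model} \\[2ex]
\dfrac{\log(S_k') - \log(S_0')}{\log(S_M') - \log(S_0')}, & \text{for Power law model} \\[2ex]
\dfrac{S_k' - S_0'}{S_M' - S_0'}, & \text{for Exponential law model}
\end{cases}
\tag{3.4}
$$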
where $S_0'$, $S_k'$, and $S_M'$ represent the operational, applied accelerated, and
maximum stress levels in their original units, and $S_k$ represents the corresponding
transformed stress. A multiple-stress degradation test with possible interaction
effects between stresses is considered. The nonlinear behavior of the degradation is
described by the power law function $\Lambda(t) = t^{c}$, where c is a constant.
Assuming that both Wiener parameters are stress dependent, the log-likelihood
function can be written as:

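$$
L(\theta) = \sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{k=1}^{p}
\left[
-\frac{1}{2}\log\left(2\pi\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right)\right)
-\frac{1}{2}\log\left(\sigma^{2}\right)
-\frac{\left(\Delta y_{ijk} - \mu(s)\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right)\right)^{2}}{2\sigma^{2}\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right)}
\right]
\tag{3.5}
$$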
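The summand of Equation 3.5 is the log of the normal increment density in Equation 3.2, so it can be evaluated unit by unit. The sketch below, under the simplifying assumption of a single unit at one stress level with common µ and σ, computes that contribution; the names `wiener_log_likelihood` and `wiener_mu_hat` are hypothetical. For fixed c, setting the derivative with respect to µ to zero gives the closed form µ̂ = ΣΔy / ΣΔΛ, which the second function implements.

```python
import math


def wiener_log_likelihood(mu, sigma, c, times, increments):
    """Log-likelihood of one unit's increments under Lambda(t) = t**c."""
    total = 0.0
    for (t_prev, t_curr), dy in zip(zip(times, times[1:]), increments):
        d_lambda = t_curr ** c - t_prev ** c
        total += (
            -0.5 * math.log(2.0 * math.pi * d_lambda)
            - 0.5 * math.log(sigma ** 2)
            - (dy - mu * d_lambda) ** 2 / (2.0 * sigma ** 2 * d_lambda)
        )
    return total


def wiener_mu_hat(c, times, increments):
    """Closed-form maximizer of the log-likelihood in mu for a fixed c."""
    return sum(increments) / (times[-1] ** c - times[0] ** c)


# Hypothetical increments observed at t = 1 and t = 2
times = [0, 1, 2]
increments = [0.2, 0.4]
mu_hat = wiener_mu_hat(1.0, times, increments)
print(mu_hat, wiener_log_likelihood(mu_hat, 0.1, 1.0, times, increments))
```

In a full analysis µ and σ would be replaced by stress-dependent functions and the sum would run over all stress levels and units, with c and the remaining parameters found numerically.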
The maximum likelihood estimation (MLE) method can be applied to estimate the
model parameters of Equation 3.5. Under the Wiener process, the time to failure is
defined as the first passage time at which the degradation path reaches the threshold
degradation D, and it follows the inverse Gaussian (IG) distribution with the PDF:

$$f_{IG}(y; a, b) = \left(\frac{b}{2\pi y^{3}}\right)^{1/2} \exp\left(-\frac{b\,(y - a)^{2}}{2a^{2}y}\right) \tag{3.6}$$

Here, a and b are the IG distribution parameters. The mean time to failure can then
be written as:

$$\xi_w = \left(\frac{D - y_0}{\mu(s)}\right)^{1/c} \tag{3.7}$$

The reliability function can be approximated with:

$$R(t) \approx \Phi\left(\frac{D - y_0 - \mu(s)\,t^{c}}{\sqrt{\sigma^{2}(s)\,t^{c}}}\right) \tag{3.8}$$
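Equations 3.7 and 3.8 can be checked numerically against each other: at t = ξ_w the numerator of Equation 3.8 vanishes, so the approximate reliability equals 0.5. The following sketch uses only the Python standard library; the function names and the parameter values are illustrative assumptions, with µ and σ taken as already evaluated at the stress level of interest.

```python
from statistics import NormalDist


def wiener_mttf(D, y0, mu, c):
    """Mean time to failure, Equation 3.7."""
    return ((D - y0) / mu) ** (1.0 / c)


def wiener_reliability(t, D, y0, mu, sigma, c):
    """Approximate reliability at time t > 0, Equation 3.8."""
    lam = t ** c  # power-law timescale Lambda(t) = t**c
    z = (D - y0 - mu * lam) / (sigma ** 2 * lam) ** 0.5
    return NormalDist().cdf(z)


# Hypothetical values: threshold 0.5, drift 0.02, volatility 0.01, c = 0.5
mttf = wiener_mttf(D=0.5, y0=0.0, mu=0.02, c=0.5)
print(mttf, wiener_reliability(mttf, 0.5, 0.0, 0.02, 0.01, 0.5))
```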

3.2.2 Gamma Process
The gamma process represents degradation in the form of cumulative damage, where
deterioration occurs gradually over time. If a random variable Y(t) represents the
deterioration, then the gamma process, a continuous-time stochastic process, has the
following mathematical properties (O’Connor 2012):

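1. $y(0) = 0$
2. $y(t)$ follows a gamma distribution $Ga(\alpha t, \beta)$
3. $y(t)$ has an independent increment in a time interval $\Delta t$ ($\Delta t = t_i - t_{i-1}$)
4. The independent increment $\Delta y(t) = y_i - y_{i-1}$ also follows the gamma
   distribution $Ga(\alpha\Delta t, \beta)$ with PDF:

$$f_{\Delta y}(t) = \frac{\beta^{\alpha\left(t_i^{c} - t_{i-1}^{c}\right)}}{\Gamma\left(\alpha\left(t_i^{c} - t_{i-1}^{c}\right)\right)}\, \Delta y^{\alpha\left(t_i^{c} - t_{i-1}^{c}\right) - 1}\, e^{-\beta\Delta y} \tag{3.9}$$

where α > 0 and β > 0 represent the gamma shape and scale parameters, respectively
(β enters Equation 3.9 as a rate, that is, as the inverse of the conventional scale),
c is a nonlinearity parameter, and Γ(.) is the gamma function with
$\Gamma(a) = \int_{0}^{\infty} x^{a-1} e^{-x}\, dx$.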
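A gamma degradation path can be generated by accumulating independent gamma increments, as in properties 1 to 4. The sketch below assumes the power-law time transformation used in Equation 3.9. Because β multiplies Δy in the exponent of Equation 3.9, it is passed to Python's `random.gammavariate`, which expects a shape and a scale, as 1/β. The function name and the parameter values are illustrative only.

```python
import random


def simulate_gamma_path(alpha, beta, c, times, rng):
    """Sample a gamma process path at the given times, starting from y(0) = 0."""
    path = [0.0]
    for t_prev, t_curr in zip(times, times[1:]):
        shape = alpha * (t_curr ** c - t_prev ** c)
        # random.gammavariate(shape, scale): beta in Equation 3.9 acts as a
        # rate, so the scale argument is 1 / beta.
        dy = rng.gammavariate(shape, 1.0 / beta)
        path.append(path[-1] + dy)
    return path


# Hypothetical parameter values
rng = random.Random(7)
path = simulate_gamma_path(alpha=2.0, beta=10.0, c=1.0, times=[0, 1, 2, 3], rng=rng)
print([round(y, 4) for y in path])
```

Unlike the Wiener sketch, every increment here is positive, so the simulated path is monotonically increasing, which matches the cumulative-damage interpretation.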

Now, considering an accelerated test in which both gamma parameters depend on the
stresses, with interaction effects, the likelihood function can be written as:

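$$
L(\theta) = \prod_{i=1}^{n}\prod_{j=1}^{m}\prod_{k=1}^{p}
\frac{\beta(s)^{\alpha(s)\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right)}}{\Gamma\left(\alpha(s)\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right)\right)}\,
\Delta y_{ijk}^{\alpha(s)\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right) - 1}\,
e^{-\Delta y_{ijk}\,\beta(s)}
\tag{3.10}
$$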

The MLE method, implemented with numerical optimization software, can be used to
maximize this function. Assuming that failure occurs when the degradation path
reaches the threshold D, the time to failure ξ is defined as the time at which the
degradation path crosses D, and the reliability function at time t is:

$$R(t) = P(t < t_D) = 1 - \frac{\Gamma\left(\alpha t^{c}, D_\beta\right)}{\Gamma\left(\alpha t^{c}\right)} \tag{3.11}$$

where $D_\beta = (D - y_0)\beta$, $y_0$ is the initial degradation value, and
Γ(·,·) in the numerator is the upper incomplete gamma function. The cumulative
distribution function (CDF) of $t_D$ is given as:

$$F(t) = \frac{\Gamma\left(\alpha t^{c}, D_\beta\right)}{\Gamma\left(\alpha t^{c}\right)} \tag{3.12}$$

Because of the gamma function, the evaluation of the CDF becomes mathematically
intractable. To deal with this issue, Park and Padgett (2005) proposed an
approximation of the time-to-failure ξ with a Birnbaum-Saunders (BS) distribution
having the following CDF:

$$F_{BS}(t) \approx \Phi\left[\frac{1}{a}\left(\sqrt{\frac{t^{c}}{b}} - \sqrt{\frac{b}{t^{c}}}\right)\right] \tag{3.13}$$

where $a = 1/\sqrt{\omega_\beta}$ and $b = \omega_\beta/\alpha$. Considering the BS
approximation, the expected failure time can be estimated as:

$$\xi_G = \left(\frac{\omega_\beta}{\alpha} + \frac{1}{2\alpha}\right)^{1/c} \tag{3.14}$$
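The BS approximation in Equations 3.13 and 3.14 needs only the standard normal CDF, so it is easy to evaluate without special functions. The sketch below assumes that ω_β denotes the scaled threshold (D − y0)β, which the text does not state explicitly; the function names and parameter values are likewise illustrative.

```python
import math
from statistics import NormalDist


def bs_parameters(D, y0, alpha, beta):
    """Return (a, b) of Equation 3.13.

    Assumption: omega_beta is the scaled threshold (D - y0) * beta.
    """
    omega_beta = (D - y0) * beta
    return 1.0 / math.sqrt(omega_beta), omega_beta / alpha


def bs_cdf(t, a, b, c):
    """Birnbaum-Saunders approximation of the failure-time CDF, Equation 3.13."""
    x = t ** c
    return NormalDist().cdf((math.sqrt(x / b) - math.sqrt(b / x)) / a)


def bs_mean_failure_time(D, y0, alpha, beta, c):
    """Expected failure time, Equation 3.14, under the same assumption."""
    omega_beta = (D - y0) * beta
    return (omega_beta / alpha + 1.0 / (2.0 * alpha)) ** (1.0 / c)


# Hypothetical values: threshold 0.5, alpha = 0.1, beta = 20, c = 0.5
a, b = bs_parameters(0.5, 0.0, 0.1, 20.0)
print(a, b, bs_cdf(10000.0, a, b, 0.5), bs_mean_failure_time(0.5, 0.0, 0.1, 20.0, 0.5))
```

The term inside the parentheses of Equation 3.14 equals b(1 + a²/2), the mean of a BS distribution with parameters a and b, which is a useful consistency check on the two equations.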

3.2.3  Inverse Gaussian Process


Suppose a system’s behavior is represented by the IG process. If Y(t) indicates the
system’s performance characteristic at time t, then the IG process has the following
properties (Wang and Xu 2010):

1. $y(0) = 0$ with probability one
2. $y(t)$ has an independent increment in each time interval $\Delta t$ ($\Delta t = t_i - t_{i-1}$)
3. The independent increment $\Delta y(t) = y_i - y_{i-1}$ follows the IG distribution
   $IG\left(\mu\Delta\Lambda(t), \lambda\Delta\Lambda(t)^{2}\right)$ with PDF:

$$f\left(\Delta y \mid \mu\Delta\Lambda(t), \lambda\Delta\Lambda(t)^{2}\right) = \left(\frac{\lambda\,\Delta\Lambda(t)^{2}}{2\pi\,\Delta y^{3}}\right)^{1/2} \exp\left(-\frac{\lambda\left(\Delta y - \mu\,\Delta\Lambda(t)\right)^{2}}{2\mu^{2}\,\Delta y}\right) \tag{3.15}$$

Here µ and λ denote the mean and scale parameters, and Λ(t) represents the shape
function. The mean of Y(t) is $\mu\Lambda(t)$ and its variance is
$\mu^{3}\Lambda(t)/\lambda$. The shape function is nonlinear, and a power law,
$\Lambda(t) = t^{c}$, is chosen in this work to represent the nonstationary process.
By the properties of the IG process and Equation 3.15, the likelihood function of
the degradation increments can be given as:

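$$
L(\theta) = \prod_{i=1}^{n}\prod_{j=1}^{m}\prod_{k=1}^{p}
\sqrt{\frac{\lambda_{ijk}\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right)^{2}}{2\pi\,\Delta y_{ijk}^{3}}}\,
\exp\left(-\frac{\lambda_{ijk}\left(\Delta y_{ijk} - \mu_{ijk}\left(t_{ijk}^{c} - t_{(i-1)jk}^{c}\right)\right)^{2}}{2\mu_{ijk}^{2}\,\Delta y_{ijk}}\right)
\tag{3.16}
$$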
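For numerical work it is usually more convenient to sum the logarithms of the factors in Equation 3.16 than to multiply the densities. The sketch below does this for a single unit, under the simplifying assumption that µ and λ are common to all increments rather than indexed by stress level and unit; the function names are hypothetical.

```python
import math


def ig_increment_log_pdf(dy, d_lambda, mu, lam):
    """Log of the IG increment density (Equation 3.15) for an increment dy > 0."""
    return (
        0.5 * math.log(lam * d_lambda ** 2 / (2.0 * math.pi * dy ** 3))
        - lam * (dy - mu * d_lambda) ** 2 / (2.0 * mu ** 2 * dy)
    )


def ig_log_likelihood(mu, lam, c, times, increments):
    """Sum of log densities over one unit's increments, with Lambda(t) = t**c."""
    total = 0.0
    for (t_prev, t_curr), dy in zip(zip(times, times[1:]), increments):
        total += ig_increment_log_pdf(dy, t_curr ** c - t_prev ** c, mu, lam)
    return total


# Hypothetical increments observed at t = 1 and t = 2
print(ig_log_likelihood(mu=0.3, lam=2.0, c=1.0, times=[0, 1, 2], increments=[0.25, 0.35]))
```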

Suppose Y(t) is a monotonic degradation process and the lifetime $\xi_D$ is defined
as the first passage time at which the degradation reaches the threshold value D. If
the initial degradation is indicated by $y_0$, then $y(t) - y_0$ follows the IG
distribution. Therefore, the CDF of $\xi_D$ can be written as:

$$
F\left(\xi_D \mid D, \mu t^{c}, \lambda (t^{c})^{2}\right) =
\Phi\left[\sqrt{\frac{\lambda}{D - y_0}}\left(t^{c} - \frac{D - y_0}{\mu}\right)\right]
- e^{2\lambda t^{c}/\mu}\,
\Phi\left[-\sqrt{\frac{\lambda}{D - y_0}}\left(t^{c} + \frac{D - y_0}{\mu}\right)\right]
\tag{3.17}
$$

where Φ(.) is the CDF of the standard normal distribution. However, when
$\mu\Lambda(t)$ and t are large, Y(t) can be approximated by the normal distribution
with mean $\mu\Lambda(t)$ and variance $\mu^{3}\Lambda(t)/\lambda$. Therefore, the CDF
of $\xi_D$ can also be approximated by the following equation (Ye and Chen 2014),
written here so that, like Equation 3.17, it increases with t:

$$F\left(\xi_{IG} \mid D, \mu t^{c}, \lambda (t^{c})^{2}\right) = \Phi\left(\frac{\mu(s)\,t^{c} - D}{\sqrt{\mu(s)^{3}\,t^{c}/\lambda}}\right) \tag{3.18}$$


The approximate mean lifetime is:

$$\xi_{IG} = \left(\frac{D}{\mu(s)}\right)^{1/c} \tag{3.19}$$
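The exact failure-time CDF of Equation 3.17 and its normal approximation can be compared numerically. The sketch below is written under two stated assumptions: the normal approximation is taken as the probability that a normal variable with mean µt^c and variance µ³t^c/λ is at least D, so that it increases with t as a CDF must; and µ and λ are already evaluated at the stress level of interest. Function names and parameter values are illustrative. The factor exp(2λt^c/µ) can overflow for large arguments, so a production implementation would work on the log scale.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def ig_failure_cdf(t, D, y0, mu, lam, c):
    """Exact first-passage CDF of the IG process, Equation 3.17."""
    d = D - y0
    x = t ** c
    root = math.sqrt(lam / d)
    return PHI(root * (x - d / mu)) - math.exp(2.0 * lam * x / mu) * PHI(-root * (x + d / mu))


def ig_failure_cdf_normal(t, D, mu, lam, c):
    """Normal approximation: P(Y(t) >= D) with Y(t) ~ N(mu*t^c, mu^3*t^c/lam)."""
    x = t ** c
    return PHI((mu * x - D) / math.sqrt(mu ** 3 * x / lam))


def ig_mean_life(D, mu, c):
    """Approximate mean lifetime, Equation 3.19."""
    return (D / mu) ** (1.0 / c)


# Hypothetical values: D = 5, mu = 1, lambda = 2, c = 1
for t in (4.0, 5.0, 6.0):
    print(t, ig_failure_cdf(t, 5.0, 0.0, 1.0, 2.0, 1.0), ig_failure_cdf_normal(t, 5.0, 1.0, 2.0, 1.0))
print(ig_mean_life(5.0, 1.0, 1.0))
```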
3.2.4 Case Example: Degradation Analysis with a Continuous State Stochastic Process
To demonstrate the proposed method, light emitting diodes (LEDs) are taken as a
case study. LEDs have recently become very popular because of their very low energy
consumption, low cost, and long life (Narendran and Gu 2005). As a solid-state
lighting source, LEDs are increasingly used in many sectors, such as communications,
medical services, backlighting, signposts, and general lighting. Unlike traditional
lamps, LEDs rarely fail catastrophically; instead, their light output degrades
gradually over their useful life, so they experience soft failures. It is therefore
reasonable to take the light intensity of LEDs as the degradation performance
characteristic in this study.
The  experiment data on degradation of LEDs are taken from the literature
(Chaluvadi 2008). Table 3.1 provides the details of experimental set up of the LED

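TABLE 3.1
Accelerated Degradation Test Dataset of LEDs

Degradation measurement (lux) at each inspection time (hrs)

| Stress Level | Sample | 0 | 50    | 100   | 150   | 200   | 250   |
|--------------|--------|---|-------|-------|-------|-------|-------|
| 40 mA        | 1      | 1 | 0.866 | 0.787 | 0.76  | 0.716 | 0.68  |
| 40 mA        | 2      | 1 | 0.821 | 0.714 | 0.654 | 0.617 | 0.58  |
| 40 mA        | 3      | 1 | 0.827 | 0.703 | 0.64  | 0.613 | 0.593 |
| 40 mA        | 4      | 1 | 0.798 | 0.683 | 0.623 | 0.6   | 0.59  |
| 40 mA        | 5      | 1 | 0.751 | 0.667 | 0.628 | 0.59  | 0.54  |
| 40 mA        | 6      | 1 | 0.837 | 0.74  | 0.674 | 0.63  | 0.613 |
| 40 mA        | 7      | 1 | 0.73  | 0.65  | 0.607 | 0.583 | 0.58  |
| 40 mA        | 8      | 1 | 0.862 | 0.676 | 0.627 | 0.6   | 0.597 |
| 40 mA        | 9      | 1 | 0.812 | 0.65  | 0.606 | 0.593 | 0.573 |
| 40 mA        | 10     | 1 | 0.668 | 0.633 | 0.593 | 0.573 | 0.565 |
| 40 mA        | 11     | 1 | 0.661 | 0.642 | 0.594 | 0.58  | 0.553 |
| 40 mA        | 12     | 1 | 0.765 | 0.617 | 0.613 | 0.597 | 0.56  |
| 35 mA        | 1      | 1 | 0.951 | 0.86  | 0.776 | 0.7   | 0.667 |
| 35 mA        | 2      | 1 | 0.933 | 0.871 | 0.797 | 0.743 | 0.73  |
| 35 mA        | 3      | 1 | 0.983 | 0.924 | 0.89  | 0.843 | 0.83  |
| 35 mA        | 4      | 1 | 0.966 | 0.882 | 0.851 | 0.814 | 0.786 |
| 35 mA        | 5      | 1 | 0.958 | 0.89  | 0.84  | 0.81  | 0.8   |
| 35 mA        | 6      | 1 | 0.94  | 0.824 | 0.774 | 0.717 | 0.706 |
| 35 mA        | 7      | 1 | 0.882 | 0.787 | 0.75  | 0.7   | 0.693 |
| 35 mA        | 8      | 1 | 0.867 | 0.78  | 0.733 | 0.687 | 0.673 |
| 35 mA        | 9      | 1 | 0.89  | 0.8   | 0.763 | 0.723 | 0.713 |
| 35 mA        | 10     | 1 | 0.962 | 0.865 | 0.814 | 0.745 | 0.742 |
| 35 mA        | 11     | 1 | 0.975 | 0.845 | 0.81  | 0.75  | 0.741 |
| 35 mA        | 12     | 1 | 0.924 | 0.854 | 0.8   | 0.733 | 0.715 |

Source: Chaluvadi, V.N.H., Accelerated life testing of electronic revenue meters, PhD dissertation,
Clemson University, Clemson, SC, 2008.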

FIGURE 3.1  LED degradation data at a different stress level.

and degradation data from the test. Two constant accelerated stress levels (40 mA
and 35 mA) were used to accelerate the lumen degradation of the LEDs. Twelve samples
were assigned to each stress level, and the light intensity of each LED was
measured at room temperature every 50 hours up to 250 hours. The operating stress
is defined as 30 mA, and 50 percent degradation of the initial light intensity is
taken as the failure threshold value.
Figure 3.1 shows the nonlinear nature of the LED degradation paths, which
justifies the assumption of a nonstationary continuous state stochastic process.
The nonlinear likelihood function with multiple model parameters makes parameter
estimation challenging, so the MLE method was implemented in the statistical
software R. The built-in “mle” function, which uses the Nelder-Mead algorithm
(optim) to optimize the likelihood function, is used to estimate the model
parameters. After the model parameters for each stochastic process have been
estimated, the lifetime and reliability under any given set of operating conditions
can be estimated. The parameter and lifetime estimates for the different stochastic
process models are provided in Table 3.2.
The results show that the Wiener process gives noticeably larger lifetime estimates
than the Gamma and IG processes. Figure 3.2 illustrates the reliability estimates
for the different stochastic process models. As with the lifetimes, the reliability
plots show a higher estimate for the Wiener process.

TABLE 3.2
Parameter and Lifetime Estimates with Different Degradation Models

| Model  | γ0      | γ1     | δ0      | δ1       | c      | Lifetime |
|--------|---------|--------|---------|----------|--------|----------|
| Wiener | −4.3516 | 0.9483 | −3.8413 | 0.1570   | 0.4569 | 3002.26  |
| Gamma  | −0.7636 | 0.0954 | 4.2685  | −0.08528 | 0.5802 | 1812.28  |
| IG     | −5.1956 | 0.9481 | −6.1025 | 0.1185   | 0.6097 | 1611.15  |

FIGURE 3.2  Reliability estimates using various continuous stochastic processes.

3.2.5 Selection of Appropriate Continuous State Stochastic Process


The appropriate selection of the stochastic process is very important because effec-
tive degradation modeling depends on the appropriate choice of the process. The reli-
ability estimation and its accuracy also are dependent on the appropriate stochastic
process selection. From the LED case study example, it is observed that the lifetime
and reliability estimates differ among three continuous state stochastic processes.
There  are several criteria to choose an appropriate stochastic process for specific
degradation cases which are discussed next.
Graphical analysis is a common method to check data patterns and behavior.
Figure 3.3 uses histogram and CDF graphs to compare the fit of the three
stochastic processes. Both graphs suggest that the Gamma process provides the best
fit for the LED degradation data, whereas the Wiener process provides the poorest
fit. Quantile-quantile (Q-Q) plots and probability plots are also useful graphical
techniques to check the model fitness. These plots lead to the same conclusion
for the LED data (see Figure 3.4).
Besides graphical methods, more rigorous statistical methods such as goodness-of-fit
tests are used to check the model fitness. Several statistics and criteria are
available to compare the model fitness, such as the KS (Kolmogorov-Smirnov)
statistic, CVM (Cramer-von Mises) statistic, AD (Anderson-Darling) statistic, AIC
(Akaike’s Information Criterion), and BIC (Bayesian Information Criterion); for each
of them, a smaller value indicates a better fit. Table 3.3 provides the
goodness-of-fit statistic values to compare the fitness of the stochastic processes
for the LED data. The Gamma process has the lowest value for every statistic and the
Wiener process has the highest. This observation implies that the Gamma process is
the most suitable and

FIGURE 3.3  Graphical model fitness of LED degradation data.

FIGURE 3.4  Q-Q and probability plots of degradation data.

TABLE 3.3
Goodness-of-fit Statistics for Stochastic Processes

| Goodness-of-fit Statistic | Wiener    | Gamma    | Inverse Gaussian |
|---------------------------|-----------|----------|------------------|
| KS statistic              | 0.1802    | 0.0708   | 0.1590           |
| CVM statistic             | 1.27821   | 0.1159   | 0.5977           |
| AD statistic              | 7.1927    | 0.6224   | 2.9771           |
| AIC                       | −315.2034 | −407.427 | −389.4947        |
| BIC                       | −309.6285 | −401.852 | −383.9197        |

Wiener is the least suitable model for the LED degradation data. This result explains
the large discrepancy between the lifetime and reliability estimates of the Wiener
process and those of the other two degradation models. The physical degradation
phenomenon is also consistent with these fitness criteria. Because LEDs degrade
monotonically over time, the data best match the assumptions of the monotonic and
nonnegative Gamma process, followed by the IG process. Given the clear monotonic
behavior of the LED data, the degradation does not follow the Wiener process, and all
the model fitness statistics and criteria indicate that the Wiener process fits the
LED data poorly. Further, the poorly fitted Wiener process also yields a much lower
estimate of the nonlinear constant c (see Table 3.2), which represents a slower
degradation rate than the actual one. This misrepresentation of the degradation
increments and the underestimated degradation rate cause the Wiener degradation model
to overestimate the lifetime and reliability. This case example clearly shows the
importance of choosing the right stochastic process for assessing the system’s
degradation behavior.
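The information criteria in Table 3.3 can be related to the maximized log-likelihood through AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L. The sketch below recovers ln L from the tabulated AIC values and recomputes BIC. The values k = 2 (parameters per fitted distribution) and n = 120 (24 samples × 5 increments) are assumptions inferred from the gap between the tabulated AIC and BIC values; they are not stated in the text.

```python
import math

def aic(loglik, k):
    """Akaike's Information Criterion for k estimated parameters."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian Information Criterion for k parameters and n observations."""
    return k * math.log(n) - 2 * loglik

# AIC values as printed in Table 3.3.
table_aic = {"Wiener": -315.2034, "Gamma": -407.427, "Inverse Gaussian": -389.4947}
k, n = 2, 120  # assumptions, see the paragraph above

recomputed_bic = {}
for model, a in table_aic.items():
    loglik = (2 * k - a) / 2          # invert AIC = 2k - 2 ln L
    recomputed_bic[model] = bic(loglik, k, n)
    print(model, round(loglik, 4), round(recomputed_bic[model], 4))

# The preferred model is the one with the smallest criterion value.
best = min(table_aic, key=table_aic.get)
print("preferred model:", best)
```

Under these assumptions the recomputed BIC values agree with the BIC row of Table 3.3 to about three decimal places, and both criteria rank the Gamma process first.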

3.3  DISCRETE STATE STOCHASTIC PROCESSES


This section presents and discusses different stochastic processes used to model the
discrete state degradation process. Unlike the Wiener process, Gamma process, and
IG process models, a finite state stochastic process evolves through a finite number
of states. In a continuous state degradation process, the degradation is modeled
as a continuous variable, and the item is considered failed when the degradation
exceeds a predefined threshold. However, most engineering systems consist
of components that have a range of performance levels from perfect functioning to
complete failure. In the discrete-state space, the overall status of the degradation
process is divided into several discrete levels with different performances ranging
from perfect functioning to complete failure. It is important to highlight that,
as the number of states approaches infinity, the discrete-state space and the
continuous-state space become equivalent to each other.
In general, it is assumed that the degradation process {X(t), t ≥ 0} evolves on a
finite state space S = {0, 1, …, M − 1, M}, with 0 corresponding to the perfectly
healthy state, M representing the failed state of the monitored system, and the
others being intermediate states. At time t = 0, the process is in the perfect state,
and as time passes it moves to degraded states. A state transition diagram used for
modeling the degradation process is shown in Figure 3.5. Each node represents a state
of the degradation process and each branch between two nodes represents the
transition between the states corresponding to the nodes. A system can degrade
according to three types of transitions: transition to the neighbor state (Type 1),
transition to any intermediate state (Type 2), and transition to the failure state
(Type 3). Type 1 transitions from one state to the next degraded state are typical
of degradation mechanisms driven by cumulative damage and are called minor
degradation. Type 2 and Type 3 transitions are called major degradation.
In the context of modeling degradation process, this section focuses on cases in
which there is no intervention in the degradation process; i.e., once the process tran-
sits to a degradation state, the previous state is not visited again.

FIGURE 3.5  A multi-state degradation process with minor and major degradation.

The discrete state stochastic process used to model the degradation process can
be divided into different categories depending on the continuous or discrete nature
of the time variable, and Markovian and non-Markovian property (Moghaddass and
Zuo 2014).
From a time viewpoint, the multistate degradation process can evolve according
to a discrete-time stochastic process or a continuous-time stochastic process. In the
discrete-time type, transitions between states occur only at specific times, whereas
in the continuous-time type transitions can occur at any time. With respect to the
dependency of degradation transitions on the history of the degradation process, the
multistate degradation process can be divided into Markovian and non-Markovian
degradation processes. When the degradation transition between two states depends
only on the current state, that is, the degradation process is independent of the
history of the process, the degradation model follows the Markovian structure. On the
other hand, in a multistate degradation process with a non-Markovian structure, the
transition between two states may depend on other factors such as the previous
states, the age of the system, and how long the system has been in its current state.
The following sections provide a detailed discussion of the Markovian structure and
the semi-Markov process with suitable examples.

3.3.1  Markovian Structure


A stochastic process {X(t) | t ≥ 0} is called a Markov process if, for any
$t_0 < t_1 < t_2 < \cdots < t_{n-1} < t_n < t$, the conditional distribution of X(t)
given the values of $X(t_0), X(t_1), \ldots, X(t_n)$ depends only on $X(t_n)$:

$$
\Pr\{X(t) \le x \mid X(t_n) = x_n, X(t_{n-1}) = x_{n-1}, \ldots, X(t_1) = x_1, X(t_0) = x_0\}
= \Pr\{X(t) \le x \mid X(t_n) = x_n\} \tag{3.20}
$$

This applies to a Markov process with discrete-state space or continuous-state space.
A Markov process with discrete-state space is known as a Markov chain. If the time
space is discrete, it is a discrete-time Markov chain; otherwise, it is a
continuous-time Markov chain.

A discrete-time Markov chain is a sequence of random variables $X_0, X_1, \ldots, X_n, \ldots$
that satisfies the following equation for every n (n = 1, 2, …):

$$
\Pr(X_n = x_n \mid X_0 = x_0, X_1 = x_1, \ldots, X_{n-1} = x_{n-1}) = \Pr(X_n = x_n \mid X_{n-1} = x_{n-1}) \tag{3.21}
$$

If the state of the Markov chain at time step n is $x_n$, we denote it as $X_n = x_n$.
Equation 3.21 implies that the future behavior of the chain depends only on its
current state and is independent of its past behavior. Therefore, the probability that
the Markov chain goes from state i into state j in one step, which is called the
one-step transition probability, is $p_{ij} = \Pr(X_n = j \mid X_{n-1} = i)$. For a
time-homogeneous Markov chain, the transition probability between two states does not
depend on n, i.e., $p_{ij} = \Pr(X_n = j \mid X_{n-1} = i) = \Pr(X_1 = j \mid X_0 = i)$ = constant.
The one-step transition probabilities can be condensed into a transition probability
matrix for a discrete-time Markov chain with M + 1 states as follows:

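$$
P = \begin{bmatrix}
p_{00} & p_{01} & \cdots & p_{0M} \\
p_{10} & p_{11} & \cdots & p_{1M} \\
\vdots & \vdots & \ddots & \vdots \\
p_{M0} & p_{M1} & \cdots & p_{MM}
\end{bmatrix} \tag{3.22}
$$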

The sum of each row in P is one and all elements are non-negative. As the
discrete-time Markov chain is used to model the degradation process of an
item, the transition probability matrix P is in upper-triangular form ($p_{ij} = 0$ for
i > j) to reflect the system deterioration without considering maintenance or repair.
Moreover, for the failure state M, which is also known as an absorbing state,
$p_{MM} = 1$ and $p_{Mj} = 0$ for j = 0, 1, …, M − 1.
Given the transition probability matrix P and the initial conditions of the Markov
chain, $p(0) = [p_0(0), p_1(0), \ldots, p_M(0)]$, we can compute the state probabilities
at step n, $p(n) = [p_0(n), p_1(n), \ldots, p_M(n)]$, where $p_j(n) = \Pr\{X_n = j\}$,
j = 0, 1, …, M, is the probability that the chain is in state j after n transitions.
For many applications such as reliability estimation and prognostics, state
probabilities are of utmost interest.
Based on the Chapman-Kolmogorov equation, the probability of a process moving
from state i to state j after n steps (transitions) is given by the (i, j) entry of
the nth power of the matrix P (Ross 1995). Thus, assuming that p(0) is the initial
state vector, the row vector of the state probabilities after the nth step is given as:

$$
p(n) = p(0)\,P^{n} \tag{3.23}
$$

For most systems, the system is in perfect condition at the beginning of
its mission, so the initial state vector is p(0) = [1, 0, 0, …, 0].
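Equation 3.23 can be illustrated with a short Python sketch that propagates the state probability vector step by step. The transition matrix below is hypothetical: it is upper-triangular with an absorbing failure state, as required for a degradation process without repair, but its entries are not taken from any dataset.

```python
def step(p, P):
    """One transition: returns the row vector p P."""
    return [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(P[0]))]

def state_probabilities(p0, P, n):
    """State probabilities after n steps, p(n) = p(0) P^n."""
    p = list(p0)
    for _ in range(n):
        p = step(p, P)
    return p

# Hypothetical matrix for states 0 (new), 1, 2 (degraded), and 3 (failed, absorbing).
P = [
    [0.90, 0.07, 0.02, 0.01],
    [0.00, 0.85, 0.10, 0.05],
    [0.00, 0.00, 0.80, 0.20],
    [0.00, 0.00, 0.00, 1.00],
]
p0 = [1.0, 0.0, 0.0, 0.0]      # system starts in the perfect state

p10 = state_probabilities(p0, P, 10)
reliability = 1.0 - p10[-1]    # probability of not having reached the failed state
print(p10, reliability)
```

Multiplying the vector by P repeatedly avoids forming the matrix power explicitly, which is sufficient when only the state probabilities from a single initial vector are needed.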
When the transition from the current state i to a more degraded state j can take
place at any instant of time, the continuous-time Markov chain is used to model the
degradation process. In analogy with discrete-time Markov chains, a stochastic process
{X(t) | t ≥ 0} is a continuous-time Markov chain if the following equation holds for
every $t_0 < t_1 < \cdots < t_{n-1} < t_n$ (n is a positive integer):

$$
\Pr(X(t_n) = x_n \mid X(t_0) = x_0, \ldots, X(t_{n-1}) = x_{n-1}) = \Pr(X(t_n) = x_n \mid X(t_{n-1}) = x_{n-1}) \tag{3.24}
$$

Equation 3.24 is analogous to Equation 3.21. Thus, most of the properties of the
continuous-time Markov process are similar to those of the discrete-time Markov
process. The probability of the continuous-time Markov chain going from state i into
state j during Δt, which is called the transition probability, is
$\pi_{ij}(t, \Delta t) = \Pr(X(t + \Delta t) = j \mid X(t) = i)$. The transition
probabilities satisfy $\pi_{ij}(t, \Delta t) \ge 0$ and $\sum_{j=0}^{M} \pi_{ij}(t, \Delta t) = 1$.
For a time-homogeneous continuous-time Markov chain, the transition probability
between two states does not depend on t but only on the length of the
time interval Δt. Moreover, the transition rate $\lambda_{ij}(t)$ from state i to
state j (i ≠ j) at time t is defined as
$\lambda_{ij}(t) = \lim_{\Delta t \to 0} \pi_{ij}(t, \Delta t)/\Delta t$, which does not
depend on t and is constant for a homogeneous Markov process.
Like the discrete-time case, it is important to get the state probabilities for
calculating the availability and reliability measures for the system. The state
probabilities of X(t) are:

$$
p_j(t) = \Pr\{X(t) = j\}, \quad j = 0, 1, \ldots, M \ \text{for } t \ge 0, \quad \text{and} \quad \sum_{j=0}^{M} p_j(t) = 1 \tag{3.25}
$$

Knowing the initial condition and based on the theorem of total probability and the
Chapman-Kolmogorov equation, the state probabilities are obtained from the following
system of differential equations (Trivedi 2002; Ross 1995):

$$
p'_j(t) = \frac{dp_j(t)}{dt} = \sum_{\substack{i=0 \\ i \ne j}}^{M} p_i(t)\,\lambda_{ij} - p_j(t) \sum_{\substack{i=0 \\ i \ne j}}^{M} \lambda_{ji}, \quad j = 0, 1, \ldots, M \tag{3.26}
$$

Equation 3.26 can be written in matrix notation as:

$$
\frac{dp(t)}{dt} = p(t)\,\lambda, \quad p(t) = [p_0(t), p_1(t), \ldots, p_M(t)], \quad
\lambda = \begin{bmatrix}
\lambda_{00} & \lambda_{01} & \cdots & \lambda_{0M} \\
\lambda_{10} & \lambda_{11} & \cdots & \lambda_{1M} \\
\vdots & \vdots & \ddots & \vdots \\
\lambda_{M0} & \lambda_{M1} & \cdots & \lambda_{MM}
\end{bmatrix} \tag{3.27}
$$

In the transition rate matrix, $\lambda_{jj} = -\sum_{i \ne j} \lambda_{ji}$ and
$\sum_{j=0}^{M} \lambda_{ij} = 0$ for 0 ≤ i ≤ M. As the continuous-time Markov chain is
used to model the degradation process, the transition rate matrix λ is in
upper-triangular form ($\lambda_{ij} = 0$ for i > j) to reflect the
degradation process without considering maintenance or repair. Since state M
is an absorbing state, all the transition rates from this state are equal to zero,
$\lambda_{Mj} = 0$ for j = 0, 1, …, M − 1.
Several numerical and analytical methods are available to solve the system of
Equation 3.27, such as the enumerative method (Liu and Kapur 2007), the recursive
approach (Sheu and Zhang 2013), and the Laplace-Stieltjes transform (Lisnianski and
Levitin 2003).

Example 3.3.1.1 

Consider a system that can have four possible states, S = {0, 1, 2, 3}, where state
0 indicates that the system is in as-good-as-new condition, states 1 and 2 are
intermediate degraded conditions, and state 3 is the failure state. The system has only
minor failures; i.e., there is no jump between states without passing through all
intermediate states. The transition rate matrix is given as:

$$
\lambda = \begin{bmatrix}
\lambda_{00} & \lambda_{01} & \lambda_{02} & \lambda_{03} \\
\lambda_{10} & \lambda_{11} & \lambda_{12} & \lambda_{13} \\
\lambda_{20} & \lambda_{21} & \lambda_{22} & \lambda_{23} \\
\lambda_{30} & \lambda_{31} & \lambda_{32} & \lambda_{33}
\end{bmatrix}
= \begin{bmatrix}
-3 & 3 & 0 & 0 \\
0 & -2 & 2 & 0 \\
0 & 0 & -1 & 1 \\
0 & 0 & 0 & 0
\end{bmatrix}
$$

The entry $\lambda_{33} = 0$ shows that state 3 is an absorbing state. If the system is
in the best state at the beginning, $p(0) = [p_0(0), p_1(0), p_2(0), p_3(0)] = [1, 0, 0, 0]$,
the goal is to compute the system reliability at time t > 0.

Solution 3.3.1.1:  For  the multi-state systems, the reliability measure can be
based on the ability of the system to meet the customer demand W (required
performance level). Therefore, the state space can be divided into two subsets
of acceptable states in which their performance level is higher than or equal to
the demand level and unacceptable states. The reliability of the system at time
t is the summation of probabilities of all acceptable states. All the unacceptable
states can be regarded as failed states, and the failure probability is a sum of
probabilities of all the unacceptable states.
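First, find the state probabilities at time t for each state by solving the following
differential equations, in which each state gains probability from the preceding
state and loses probability to the next one:

$$
\begin{cases}
\dfrac{dp_0(t)}{dt} = -\lambda_{01}\,p_0(t) \\[2mm]
\dfrac{dp_1(t)}{dt} = \lambda_{01}\,p_0(t) - \lambda_{12}\,p_1(t) \\[2mm]
\dfrac{dp_2(t)}{dt} = \lambda_{12}\,p_1(t) - \lambda_{23}\,p_2(t) \\[2mm]
\dfrac{dp_3(t)}{dt} = \lambda_{23}\,p_2(t)
\end{cases}
$$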
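Using the Laplace-Stieltjes transforms and inverse Laplace-Stieltjes transforms
(Lisnianski et al. 2010), the state probabilities at time t are found as:

$$
\begin{cases}
p_0(t) = e^{-\lambda_{01} t} \\[2mm]
p_1(t) = \dfrac{\lambda_{01}}{\lambda_{01} - \lambda_{12}}\left(e^{-\lambda_{12} t} - e^{-\lambda_{01} t}\right) \\[2mm]
p_2(t) = -\dfrac{\lambda_{12}\lambda_{01}\left[(\lambda_{01} - \lambda_{12})e^{-\lambda_{23} t} + (\lambda_{23} - \lambda_{01})e^{-\lambda_{12} t} + (\lambda_{12} - \lambda_{23})e^{-\lambda_{01} t}\right]}{(\lambda_{12} - \lambda_{23})(\lambda_{01} - \lambda_{12})(\lambda_{23} - \lambda_{01})} \\[2mm]
p_3(t) = 1 - p_2(t) - p_1(t) - p_0(t)
\end{cases}
$$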

FIGURE 3.6  System state probabilities: Example 3.3.1.1.

The plot of the state probabilities is shown in Figure 3.6. As shown, the probability
of being in state 0 decreases with time and the probability of being in state 3
increases with time.
Then the reliability of the system at time t is calculated based on the demand
level by summing the probabilities of all acceptable states:

$$
\begin{cases}
\text{If acceptable states are } 0, 1, 2: & R_1(t) = p_0(t) + p_1(t) + p_2(t) \\
\text{If acceptable states are } 0, 1: & R_2(t) = p_0(t) + p_1(t) \\
\text{If the acceptable state is } 0: & R_3(t) = p_0(t)
\end{cases}
$$

The plots of the system reliability for all three cases are shown in Figure 3.7.

FIGURE 3.7  System reliability for various cases.
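The closed-form state probabilities of Example 3.3.1.1 can be checked numerically. The sketch below evaluates them for λ01 = 3, λ12 = 2, and λ23 = 1, computes the three reliability measures, and uses a central finite difference to confirm that p1(t) satisfies the balance implied by Equation 3.26, dp1/dt = λ01 p0(t) − λ12 p1(t). The function names and the step size h are choices made for this illustration.

```python
import math

L01, L12, L23 = 3.0, 2.0, 1.0   # transition rates of Example 3.3.1.1

def state_probs(t):
    """Closed-form state probabilities (p0, p1, p2, p3) at time t."""
    p0 = math.exp(-L01 * t)
    p1 = L01 / (L01 - L12) * (math.exp(-L12 * t) - math.exp(-L01 * t))
    num = ((L01 - L12) * math.exp(-L23 * t)
           + (L23 - L01) * math.exp(-L12 * t)
           + (L12 - L23) * math.exp(-L01 * t))
    den = (L12 - L23) * (L01 - L12) * (L23 - L01)
    p2 = -L12 * L01 * num / den
    p3 = 1.0 - p0 - p1 - p2
    return p0, p1, p2, p3

def reliabilities(t):
    """R1, R2, R3 for acceptable state sets {0,1,2}, {0,1}, and {0}."""
    p0, p1, p2, _ = state_probs(t)
    return p0 + p1 + p2, p0 + p1, p0

def dp1_dt(t, h=1e-6):
    """Central finite-difference approximation of dp1/dt."""
    return (state_probs(t + h)[1] - state_probs(t - h)[1]) / (2 * h)

t = 0.7
p0, p1, p2, p3 = state_probs(t)
print("state probabilities:", p0, p1, p2, p3)
print("reliabilities:", reliabilities(t))
print("balance residual:", dp1_dt(t) - (L01 * p0 - L12 * p1))
```

For these particular rates the failure probability simplifies to p3(t) = (1 − e^(−t))³, which provides an independent check of the formulas.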



Let $\tau_i$ denote the time that the degradation process spends in state i. According
to the Markov property in Equation 3.24, $\tau_i$ does not depend on the past states of
the process, so the following equation holds:

$$
P(\tau_i > t + \Delta t \mid \tau_i > t) = h(\Delta t) \tag{3.28}
$$

The function h(Δt) in Equation 3.28 depends only on Δt, and not on the elapsed time t.
The only continuous probability distribution that satisfies Equation 3.28 is the
exponential distribution. In the discrete-time case, the requirement in Equation 3.28
leads to the geometric distribution.
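As a brief check of this statement, the derivation below verifies that exponential and geometric sojourn times satisfy Equation 3.28. The symbol $\lambda_i$ for the total exit rate from state i and the use of the self-transition probability $p_{ii}$ for the discrete-time case are notational assumptions introduced here.

```latex
% Continuous time: exponential sojourn time with total exit rate \lambda_i
P(\tau_i > t + \Delta t \mid \tau_i > t)
  = \frac{P(\tau_i > t + \Delta t)}{P(\tau_i > t)}
  = \frac{e^{-\lambda_i (t + \Delta t)}}{e^{-\lambda_i t}}
  = e^{-\lambda_i \Delta t} = h(\Delta t)

% Discrete time: geometric sojourn time, staying in state i with probability p_{ii} at each step
P(\tau_i > n + m \mid \tau_i > n)
  = \frac{p_{ii}^{\,n+m}}{p_{ii}^{\,n}}
  = p_{ii}^{\,m} = h(m)
```

In both cases the conditional survival probability depends only on the additional time, which is the memoryless property.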
In a Markovian degradation structure, the transition between two states at time t
depends only on the two states involved and is independent of the history of the
process before time t (memoryless property). The fixed transition probabilities/rates
and the geometric/exponential sojourn time distribution limit the use of a Markov
chain to model the degradation process of real systems. For the degradation process of
some systems, the probability of making the transition from one state to a more
degraded state may increase with age, and the probability of staying in the current
state will decrease. That is, $p_{ii}(t + \Delta t) \le p_{ii}(t)$ and
$\sum_{j=i+1}^{n} p_{ij}(t + \Delta t) \ge \sum_{j=i+1}^{n} p_{ij}(t)$.
Therefore, the transition probabilities and transition rates are not constant over
time, and an extension of the Markovian model, which is called the aging Markovian
deterioration model, is used to include this aging effect.
For the discrete-time aging Markovian model, P(t) is the one-step transition
probability matrix at time t and $p_{ij}(t)$ represents the transition probability from
state i to state j at time t. As shown in Chen and Wu (2007), each row of P(t)
represents a state probability distribution, given the current state i, that forms a
bell-shaped distribution. Let $N_i$ satisfy
$p_{i,N_i}(t) = \max_j \{p_{ij}(t),\ j = 0, 1, \ldots, M\}$, so that $N_i$ is the state with
the peak transition probability in the bell-shaped distribution. Then:

$$
P_i^{L}(t) \equiv \sum_{j=1}^{N_i} p_{ij}(t); \quad P_i^{R}(t) \equiv \sum_{j=N_i+1}^{M} p_{ij}(t) \tag{3.29}
$$

$P_i^{L}(t)$ and $P_i^{R}(t)$ are the left-hand side and right-hand side cumulative
probabilities, respectively. Since $\sum_{j=1}^{M} p_{ij}(t) = 1$, it follows that
$P_i^{R}(t) = 1 - P_i^{L}(t)$. For $j \le N_i$, $p_{ij}(t + 1) \le p_{ij}(t)$, and for
$j > N_i$, $p_{ij}(t + 1) \ge p_{ij}(t)$. As the system becomes older, $P_i^{L}$ decreases
while $P_i^{R}$ increases, therefore:

$$
P_i^{L}(t) \ge P_i^{L}(t + 1); \quad P_i^{R}(t) \le P_i^{R}(t + 1) \tag{3.30}
$$

Then P(t + 1) can be modified as:

$$
p_{ij}(t + 1) \equiv p_{ij}(t)\,\frac{P_i^{L}(t + 1)}{P_i^{L}(t)} \ \ \forall j \le N_i; \quad
p_{ij}(t + 1) \equiv p_{ij}(t)\,\frac{P_i^{R}(t + 1)}{P_i^{R}(t)} \ \ \forall j > N_i \tag{3.31}
$$

The aging factor δ (0 ≤ δ < 1) is defined by Chen and Wu (2007) as
$\delta = P_i^{R}(t + 1)/P_i^{R}(t) - 1$ and can be estimated from historical data.
Because $P_i^{L}(t + 1) = 1 - P_i^{R}(t + 1) = P_i^{L}(t) - \delta P_i^{R}(t)$,
Equation 3.31 can be represented as:

$$
p_{ij}(t + 1) \equiv p_{ij}(t)\left(1 - \delta\,\frac{P_i^{R}(t)}{P_i^{L}(t)}\right) \ \ \forall j \le N_i; \quad
p_{ij}(t + 1) \equiv p_{ij}(t)\,(1 + \delta) \ \ \forall j > N_i \tag{3.32}
$$

Starting with the initial transition probability matrix P(0), the time-varying
matrices P(t) can be calculated recursively according to Equation 3.32.
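The recursive aging update can be sketched in Python as follows. For each row, probability mass to the right of the peak is inflated by the factor (1 + δ) and the remaining entries are rescaled so that the row still sums to one, using $P_i^L(t+1) = 1 - (1+\delta)P_i^R(t)$. The function names, the example row, and the value of δ are hypothetical, and the guard against an overly large δ is an addition made for this illustration.

```python
def age_row(row, delta):
    """One aging step for a single row of P(t), in the spirit of Equations 3.31 and 3.32."""
    peak = max(range(len(row)), key=lambda j: row[j])   # N_i: column of the largest entry
    p_left = sum(row[: peak + 1])                       # cumulative probability up to N_i
    p_right = sum(row[peak + 1:])                       # cumulative probability beyond N_i
    if p_right == 0.0:
        return list(row)                                # nothing to shift (e.g., absorbing row)
    if (1.0 + delta) * p_right > 1.0:
        raise ValueError("delta too large: right-hand probability would exceed one")
    left_scale = 1.0 - delta * p_right / p_left         # equals P_L(t+1) / P_L(t)
    return [p * left_scale if j <= peak else p * (1.0 + delta)
            for j, p in enumerate(row)]

def age_matrix(P, delta):
    """Apply one aging step to every row of the transition matrix."""
    return [age_row(row, delta) for row in P]

# Hypothetical row for a state whose most likely next state is column 1.
row = [0.0, 0.6, 0.3, 0.1]
aged = age_row(row, delta=0.1)
print(aged, sum(aged))
```

Applying `age_matrix` repeatedly to P(0) yields the sequence P(1), P(2), and so on; note that the peak column of a row may shift after many steps as mass moves toward the degraded states.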
For  the continuous-time aging Markovian model, which is called the non-
homogeneous continuous-time Markov process, the amount of time that the sys-
tem spends in each state before proceeding to the degraded state does not follow
the exponential distribution. Usually, the transition times are assumed to obey
Weibull distribution because of its flexibility, which allows considering hazard
functions both increasing and decreasing over time, at different speeds.
To get the state probabilities at each time t, we have to solve the Chapman-Kolmogorov
equations:

$$
\frac{dp_j(t)}{dt} = \sum_{\substack{i=0 \\ i \ne j}}^{M} p_i(t)\,\lambda_{ij}(t) - p_j(t) \sum_{\substack{i=0 \\ i \ne j}}^{M} \lambda_{ji}(t), \quad j = 0, 1, \ldots, M \tag{3.33}
$$

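Equation 3.33 can be written in matrix form as:

$$
\frac{dp(t)}{dt} = p(t)\,\lambda(t), \quad p(t) = [p_0(t), \ldots, p_M(t)], \quad
\lambda(t) = \begin{bmatrix}
\lambda_{00}(t) & \lambda_{01}(t) & \cdots & \lambda_{0M}(t) \\
\lambda_{10}(t) & \lambda_{11}(t) & \cdots & \lambda_{1M}(t) \\
\vdots & \vdots & \ddots & \vdots \\
\lambda_{M0}(t) & \lambda_{M1}(t) & \cdots & \lambda_{MM}(t)
\end{bmatrix} \tag{3.34}
$$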

The transition rate matrix λ(t) has the same properties as the transition rate matrix
in Equation 3.27. To find the state probabilities at time t, many methods have
been used to solve Equation 3.34, such as the state–state integration method (Liu and
Kapur 2007) and the recursive approach (Sheu and Zhang 2013). Equation 3.34 can
be solved recursively from state 0 to state M as follows:

$$
p_0(t) = \exp\left(\int_0^t \lambda_{00}(s)\,ds\right) \tag{3.35}
$$

$$
p_j(t) = \sum_{i=0}^{j-1} \int_0^t p_i(\tau_{i+1})\,\lambda_{ij}(\tau_{i+1})\,
\exp\left(\int_{\tau_{i+1}}^{t} \lambda_{jj}(s)\,ds\right) d\tau_{i+1}, \quad j = 1, \ldots, M - 1 \tag{3.36}
$$

$$
p_M(t) = 1 - \sum_{j=0}^{M-1} p_j(t) \tag{3.37}
$$

The initial conditions are assumed to be $p(0) = [p_0(0) = 1, p_1(0) = 0, \ldots, p_M(0) = 0]$.

Example 3.3.1.2 

(Sheu and Zhang 2013; Shu et al. 2015) Assume that a system degrades through
five possible states, S = {0, 1, 2, 3, 4}, where state 0 is the best state and state
4 is the worst state. The time $T_{ij}$ spent in state i before moving to the next state
j follows the Weibull distribution $T_{ij} \sim \text{Weibull}(1/(i - 0.5j), 3)$ with scale
parameter $\alpha_{ij} = 1/(i - 0.5j)$ and shape parameter β = 3. The nonhomogeneous
continuous-time Markov process is used to model the degradation process. The transition
rate from state i to state j at time t is
$\lambda_{ij}(t) = 3t^{2}/(i - 0.5j)^{3}\ \ \forall\, i, j \in S,\ i > j$. Based on the
demand level, states 3 and 4 are unacceptable states. The goal is to compute
the system reliability at time t (0 < t < 4).

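Solution 3.3.1.2: The transient degradation rate matrix is:

$$
\lambda(t) = \begin{bmatrix}
\lambda_{00}(t) & \lambda_{01}(t) & \lambda_{02}(t) & \lambda_{03}(t) & \lambda_{04}(t) \\
0 & \lambda_{11}(t) & \lambda_{12}(t) & \lambda_{13}(t) & \lambda_{14}(t) \\
0 & 0 & \lambda_{22}(t) & \lambda_{23}(t) & \lambda_{24}(t) \\
0 & 0 & 0 & \lambda_{33}(t) & \lambda_{34}(t) \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
= \begin{bmatrix}
-0.419945t^{2} & 0.1920t^{2} & 0.1111t^{2} & 0.06997t^{2} & 0.046875t^{2} \\
0 & -0.6781t^{2} & 0.3750t^{2} & 0.1920t^{2} & 0.1111t^{2} \\
0 & 0 & -1.2639t^{2} & 0.8889t^{2} & 0.3750t^{2} \\
0 & 0 & 0 & -3t^{2} & 3t^{2} \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$

The initial conditions are $p_0(0) = 1$ and $p_j(0) = 0$ for j = 1, 2, …, M.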
The state probabilities can be obtained using Equations 3.35 through 3.37 as:

$$
\begin{aligned}
p_0(t) &= e^{-0.14t^{3}} \\
p_1(t) &= -0.7439\left(e^{-0.226t^{3}} - e^{-0.14t^{3}}\right) \\
p_2(t) &= 0.0139e^{-0.4213t^{3}} + 0.4623e^{-0.14t^{3}} - 0.4761e^{-0.226t^{3}} \\
p_3(t) &= -0.005109e^{-t^{3}} + 0.24178e^{-0.14t^{3}} - 0.24381e^{-0.226t^{3}} + 0.007117e^{-0.4213t^{3}} \\
p_4(t) &= 1 - p_0(t) - p_1(t) - p_2(t) - p_3(t) \\
       &= 1 - 2.44798e^{-0.14t^{3}} + 1.46381e^{-0.226t^{3}} - 0.021017e^{-0.4213t^{3}} + 0.005109e^{-t^{3}}
\end{aligned}
$$

The system state probabilities are shown in Figure 3.8.


As the states 3 and 4 are unacceptable states, the reliability of the system at
time t is Rs ( t ) = p0 (t ) + p1(t ) + p2 (t ). Figure 3.9 shows the system reliability as a func-
tion of time.
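The state probabilities of Example 3.3.1.2 can also be obtained by integrating dp(t)/dt = p(t)λ(t) numerically. The sketch below uses a classical fourth-order Runge-Kutta scheme, which is a substitute for the recursive solution of Equations 3.35 through 3.37, not the method used in the cited papers. One assumption deserves emphasis: the printed matrix entries are reproduced when the expression (i − 0.5j) is evaluated with reversed state labels (4 − i, 4 − j), i.e., with the numbering in which state 4 is the best state; the code adopts that reading.

```python
import math

M = 4  # states 0 (best) to 4 (worst)

def rate_matrix(t):
    """Transition rate matrix lambda(t); off-diagonal rates are 3 t^2 / alpha^3 (Weibull, shape 3)."""
    lam = [[0.0] * (M + 1) for _ in range(M + 1)]
    for i in range(M):
        for j in range(i + 1, M + 1):
            # Assumption: (i - 0.5 j) from the text evaluated with reversed labels (M - i, M - j).
            alpha = (M - i) - 0.5 * (M - j)
            lam[i][j] = 3.0 * t ** 2 / alpha ** 3
        lam[i][i] = -sum(lam[i][i + 1:])   # each row sums to zero
    return lam

def derivative(t, p):
    """Right-hand side of dp/dt = p(t) lambda(t)."""
    lam = rate_matrix(t)
    return [sum(p[i] * lam[i][j] for i in range(M + 1)) for j in range(M + 1)]

def solve(t_end, steps=2000):
    """Integrate from p(0) = [1, 0, ..., 0] to t_end with fourth-order Runge-Kutta steps."""
    h = t_end / steps
    t, p = 0.0, [1.0] + [0.0] * M
    for _ in range(steps):
        k1 = derivative(t, p)
        k2 = derivative(t + h / 2, [x + h / 2 * k for x, k in zip(p, k1)])
        k3 = derivative(t + h / 2, [x + h / 2 * k for x, k in zip(p, k2)])
        k4 = derivative(t + h, [x + h * k for x, k in zip(p, k3)])
        p = [x + h / 6 * (a + 2 * b + 2 * c + d)
             for x, a, b, c, d in zip(p, k1, k2, k3, k4)]
        t += h
    return p

p = solve(1.5)
reliability = p[0] + p[1] + p[2]   # states 3 and 4 are unacceptable
print(p, reliability)
```

Under this reading, the numerical reliability at t = 1.5 agrees with the closed-form expressions above to within the rounding of the printed coefficients.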
The aging Markovian models used to overcome the limitations of Markov chain
structures can be framed as a semi-Markov process. Semi-Markovian structures
consider the history of the degradation process and consider arbitrary sojourn time
distributions at each state. Semi-Markovian models as an extension of Markovian-
based models will be explained in the next section.

FIGURE 3.8  System state probabilities: Example 3.3.1.2.

FIGURE 3.9  System reliability as a function of time.

3.3.2 Semi-Markov Process
The  semi-Markov process can be applied to model the degradation process of
some systems whose degradation process cannot be captured by a Markov process.
For example, Ng and Moses (1998) used the semi-Markov process to model bridge
degradation behavior. They described the semi-Markov process in terms of a transi-
tion matrix and a holding time or sojourn time matrix. A transition matrix has a set

of transition probabilities between states that describe the embedded Markov chain.
The holding time matrix has a set of probabilities obtained from the probability den-
sity function of the holding times between states.
For Markov models, the transition probability of going from one state to another
does not depend on how the item arrived at the current state or how long it has been
there. However, semi-Markov models relax this condition to allow the time spent in
a state to follow an arbitrary probability distribution. Therefore, the process stays in
a particular state for a random duration that depends on the current state and on the
next state to be visited (Ross 1995).
To describe the semi-Markov process X ≡ {X(t) : t ≥ 0}, consider the degradation
process of a system with finite state space S = {0, 1, 2, …, M} (M + 1 is the total
number of possible states). The process visits some state i ∈ S and spends a random
amount of time there that depends on the next state it will visit, j ∈ S, i ≠ j. Let
$T_n$ denote the time of the nth transition of the process, and let $X(T_n)$ be the state
of the process after the nth transition. The process transitions from state i to state
j ≠ i with probability $p_{ij} = P(X(T_{n+1}) = j \mid X(T_n) = i)$. Given that the next
state is j, the sojourn time from state i to state j has CDF $F_{ij}$. For a semi-Markov
process, the sojourn times can follow any distribution, and $p_{ij}$ is also the
transition probability of the embedded Markov chain.
The one-step transition probability of the semi-Markov process transiting to
state j within a time interval less than or equal to t, given that it starts from
state i, is expressed as (Cinlar 1975):

$$
Q_{ij}(t) = \Pr\left(X(T_{n+1}) = j,\ T_{n+1} - T_n \le t \mid X(T_n) = i\right), \quad t \ge 0 \tag{3.38}
$$

The random time between two successive transitions, $T_{n+1} - T_n$, called the sojourn
time, has the CDF:

$$
F_{ij}(t) = \Pr\left(T_{n+1} - T_n \le t \mid X(T_{n+1}) = j,\ X(T_n) = i\right) \tag{3.39}
$$

If the sojourn time in a state depends only on the currently visited state, then the
unconditional sojourn time distribution in state i is
$F_{ij}(t) = F_i(t) = \sum_{j \in S} Q_{ij}(t)$. The matrix of transition probabilities
of the semi-Markov process, $Q(t) = [Q_{ij}(t)],\ i, j \in S$, which is called the
semi-Markov kernel, is the essential quantity of a semi-Markov process and satisfies
the relation:

$$
Q_{ij}(t) = p_{ij}\,F_{ij}(t) \tag{3.40}
$$

Equation 3.40 indicates that the transition of the semi-Markov model has two steps.
Figure 3.10 shows a sample degradation path of a system. The system is in the state i
at the initial time instance and transits to the next worse state j with transition prob-
ability pij . As the process is a monotone non-increasing function without considering
the maintenance, j = i +1 with probability one. Before moving into the next state j,
the process will wait for a random time with CDF Fij (t ). This process continues until

FIGURE 3.10  A sample degradation process.

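the process enters the state M that is an absorbing state. For this example the
transition probability matrix is given as:

$$
P = \begin{bmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix} \tag{3.41}
$$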

When the semi-Markov process is used to model the degradation process, the initial
state of the process, the transition probability matrix P, and matrix F(t ) must be
known. Another way of defining the semi-Markov process is knowing the kernel
matrix and the initial state probabilities.
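A semi-Markov degradation path of the sequential type described above can be simulated directly from the two ingredients, the embedded chain and the sojourn time distributions. The sketch below assumes Weibull sojourn times and moves from state i to state i + 1 with probability one; the scale and shape values, the function names, and the Monte Carlo settings are hypothetical choices for illustration.

```python
import random

def simulate_path(scales, shape, rng):
    """Simulate one sequential degradation path 0 -> 1 -> ... -> M.

    scales[i] is the (hypothetical) Weibull scale of the sojourn time in state i;
    the embedded chain moves from i to i + 1 with probability one.
    Returns a list of (entry time, state) pairs.
    """
    t, path = 0.0, [(0.0, 0)]
    for state, scale in enumerate(scales):
        t += rng.weibullvariate(scale, shape)   # sojourn time in the current state
        path.append((t, state + 1))             # entry time of the next state
    return path

def estimate_state_probs(scales, shape, t, runs=20000, seed=1):
    """Monte Carlo estimate of the state probabilities at time t, starting from state 0."""
    rng = random.Random(seed)
    counts = [0] * (len(scales) + 1)
    for _ in range(runs):
        path = simulate_path(scales, shape, rng)
        state = max(s for entry, s in path if entry <= t)   # state occupied at time t
        counts[state] += 1
    return [c / runs for c in counts]

probs = estimate_state_probs(scales=[4.0, 3.0, 2.0], shape=2.0, t=5.0)
print(probs)
```

With exponential sojourn times (shape equal to one) the same simulation reduces to a continuous-time Markov chain, which makes it a convenient way to compare the two model classes.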
Like previous models, it is important to find the state probabilities of the
semi-Markov process. The probability that a semi-Markov process will be in state j at
time t ≥ 0 given that it entered state i at time zero,
$\pi_{ij}(t) \equiv \Pr\{X(t) = j \mid X(0) = i\}$, is found as follows (Howard 1960;
Kulkarni 1995):

$$
\pi_{ij}(t) = \delta_{ij}\left[1 - F_i(t)\right] + \sum_{k \in S} \int_0^t q_{ik}(\vartheta)\,\pi_{kj}(t - \vartheta)\,d\vartheta \tag{3.42}
$$

$$
q_{ik}(\vartheta) = \frac{dQ_{ik}(\vartheta)}{d\vartheta} \tag{3.43}
$$

$$
\delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases} \tag{3.44}
$$

In general, it is difficult to obtain the transition functions in closed form, even when the kernel matrix is known. Equation 3.42 can be solved using numerical methods, such as the quadrature method (Blasi et al. 2004; Corradi et al. 2004) or Laplace and inverse Laplace transforms (Dui et al. 2015), or using simulation methods (Sánchez-Silva and Klutke 2016).
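
As an illustration of the quadrature idea, the sketch below discretizes Equation 3.42 on a uniform time grid and replaces the convolution integral with a sum over kernel increments. This is a simple right-endpoint scheme chosen for brevity, not the specific algorithm of the cited papers; the function name `solve_markov_renewal`, the grid size, and the exponential kernel used as a check case are all assumptions made for the example.

```python
import math


def solve_markov_renewal(kernel, n_states, horizon, steps):
    """Approximate pi_ij(t) of Equation 3.42 on a uniform grid.

    kernel(i, k, t) must return Q_ik(t). The convolution integral is
    replaced by a sum over kernel increments, so the value at t_n depends
    only on values already computed at earlier grid points.
    """
    h = horizon / steps
    times = [n * h for n in range(steps + 1)]
    Q = [[[kernel(i, k, t) for k in range(n_states)]
          for i in range(n_states)] for t in times]
    pi = []
    for n in range(steps + 1):
        current = [[0.0] * n_states for _ in range(n_states)]
        for i in range(n_states):
            # delta_ij * [1 - F_i(t_n)], with F_i(t) = sum_k Q_ik(t)
            current[i][i] = 1.0 - sum(Q[n][i])
            for m in range(1, n + 1):
                for k in range(n_states):
                    dq = Q[m][i][k] - Q[m - 1][i][k]  # approximates q_ik(theta) d(theta)
                    if dq == 0.0:
                        continue
                    for j in range(n_states):
                        current[i][j] += dq * pi[n - m][k][j]
        pi.append(current)
    return times, pi


# Hypothetical check case: 0 -> 1 -> 2 with exponential sojourn times,
# for which pi_02(t) is known in closed form (hypoexponential CDF).
RATES = {(0, 1): 0.5, (1, 2): 0.3}


def exponential_kernel(i, k, t):
    rate = RATES.get((i, k))
    return 1.0 - math.exp(-rate * t) if rate else 0.0


times, pi = solve_markov_renewal(exponential_kernel, 3, horizon=10.0, steps=400)
a, b = RATES[(0, 1)], RATES[(1, 2)]
exact = 1.0 - (b * math.exp(-a * 10.0) - a * math.exp(-b * 10.0)) / (b - a)
print(f"pi_02(10): numerical = {pi[-1][0][2]:.4f}, closed form = {exact:.4f}")
```

The scheme is first-order accurate in the step size, so a finer grid or a higher-order quadrature rule would be needed where precision matters.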
Moreover, the stationary distribution $\pi = (\pi_j;\ j \in S)$ of the semi-Markov process is defined, when it exists, as:

$$\pi_j := \lim_{t \to \infty} \pi_{ij}(t) = \frac{\upsilon_j w_j}{\sum_{i=0}^{M} \upsilon_i w_i} \qquad (3.45)$$

where $\upsilon_j$ for $j \in S$ denotes the stationary probability of the embedded Markov chain, satisfying $\upsilon_j = \sum_{i=0}^{M} \upsilon_i p_{ij}$ and $\sum_{i=0}^{M} \upsilon_i = 1$, and $w_j$ for $j \in S$ is the expected sojourn time in state j.
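
Equation 3.45 can be evaluated in two steps: find the stationary vector of the embedded chain, then weight it by the expected sojourn times. The sketch below assumes a hypothetical repairable three-state system (a stationary distribution requires that no state is absorbing); the matrix `P`, the sojourn times `w`, and the function names are illustrative assumptions.

```python
def embedded_stationary(P, iterations=2000):
    """Stationary vector of the embedded chain: v = v P, sum(v) = 1.

    Uses the averaged update v <- (v + v P) / 2, which has the same fixed
    point as v = v P and also converges when the chain is periodic.
    """
    n = len(P)
    v = [1.0 / n] * n
    for _ in range(iterations):
        vP = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        v = [0.5 * (a + b) for a, b in zip(v, vP)]
    return v


def semi_markov_stationary(P, w):
    """Equation 3.45: pi_j = v_j w_j / sum_i v_i w_i."""
    v = embedded_stationary(P)
    weighted = [vj * wj for vj, wj in zip(v, w)]
    total = sum(weighted)
    return [x / total for x in weighted]


# Hypothetical system: state 0 degrades to 1 or fails directly to 2,
# state 1 fails to 2, and state 2 is repaired back to 0.
P = [[0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
w = [100.0, 40.0, 5.0]   # assumed expected sojourn times

pi = semi_markov_stationary(P, w)
print([round(x, 4) for x in pi])
```

For these assumed values the system spends roughly three quarters of the time in state 0, because the long sojourn time there outweighs the equal visit frequencies of states 0 and 2.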
For some systems, degradation transitions between two states may depend on the states involved in the transition, the time spent in the current state (t), the time at which the system reached the current state (s), and/or the total age of the system (t + s). As another extension, a non-homogeneous semi-Markov process is used for modeling the degradation of such systems, in which the degradation transitions can follow an arbitrary distribution.
The associated non-homogeneous semi-Markov kernel is defined by:

$$Q_{ij}(s, t) = \Pr\left(X(T_{n+1}) = j,\ T_{n+1} \le t \mid X(T_n) = i,\ T_n = s\right), \quad t \ge 0 \qquad (3.46)$$

In the non-homogeneous semi-Markov process, the state probabilities are defined and obtained using the following equation:

$$\pi_{ij}(t) = \Pr\{X(t) = j \mid X(0) = i\} = \delta_{ij}\left[1 - F_i(t, s)\right] + \sum_{k \in S} \int_{s}^{t} q_{ik}(s, \vartheta)\, \pi_{kj}(t - \vartheta)\, d\vartheta \qquad (3.47)$$

The obtained state probabilities can be used to find different availability and reliability indexes.

Example 3.3.2

Consider a system (or a component) whose possible states during its evolution in
time are S = {0,1, 2}. Denote by U = {0,1} the subset of working states of the system
and by D = {2} the failure state. In this system, both minor and major failures are
possible. The state transition diagram is shown in Figure 3.11.
The holding times are normally distributed, i.e., $F_{ij} \sim N(\mu_{ij}, \sigma_{ij})$. Therefore, the CDF of the holding time from state i to state j is:

$$F_{ij}(t) = \frac{1}{\sqrt{2\pi\sigma_{ij}^2}} \int_0^t \exp\left(-\frac{(u - \mu_{ij})^2}{2\sigma_{ij}^2}\right) du, \quad \forall i, j \in S$$

FIGURE 3.11  State transition diagram for semi-Markov model.

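The goal is to find the system reliability at time t, given that the initial state of the system is the best state.

Solution 3.3.2: The system is in state 0 at the beginning and fails when it enters state 2. The quantity to compute is therefore $\pi_{02}(t)$, the probability that the system is in state 2 at time t given that it started in state 0. Because state 2 is absorbing, the reliability of the system at time t is $R(t) = 1 - \pi_{02}(t)$.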
First, we find the kernel matrix of the semi-Markov process, $Q(t) = [Q_{ij}(t)]$, $i, j \in S$:

$$Q(t) = \begin{pmatrix} 0 & Q_{01}(t) & Q_{02}(t) \\ 0 & 0 & Q_{12}(t) \\ 0 & 0 & 0 \end{pmatrix}$$

$Q_{01}(t)$ is the probability that the process moves from state 0 to state 1 within a time interval less than or equal to t. It is the probability that the time of transition from state 0 to 1 ($T_{01}$) is less than or equal to t and that the competing transition from state 0 to 2 has not occurred first ($T_{02} > T_{01}$):

$$Q_{01}(t) = \Pr(T_{01} \le t \text{ and } T_{02} > T_{01}) = \int_0^t \left[1 - F_{02}(u)\right] dF_{01}(u)$$

The other entries of the kernel matrix are obtained as:

$$Q_{02}(t) = \Pr(T_{02} \le t \text{ and } T_{01} > T_{02}) = \int_0^t \left[1 - F_{01}(u)\right] dF_{02}(u)$$

$$Q_{12}(t) = \Pr(T_{12} \le t) = F_{12}(t)$$

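According to Equation 3.42, the following system of equations has to be solved to obtain $\pi_{02}(t)$, from which the system reliability follows as $1 - \pi_{02}(t)$:

$$\begin{cases} \pi_{02}(t) = \displaystyle\int_0^t q_{01}(\vartheta)\, \pi_{12}(t - \vartheta)\, d\vartheta + \int_0^t q_{02}(\vartheta)\, \pi_{22}(t - \vartheta)\, d\vartheta \\ \pi_{12}(t) = \displaystyle\int_0^t q_{12}(\vartheta)\, \pi_{22}(t - \vartheta)\, d\vartheta \\ \pi_{22}(t) = 1 \end{cases}$$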

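For a numerical illustration of Example 3.3.2, the sketch below evaluates $\pi_{02}(t)$ by applying Equation 3.42 with both kernel entries $Q_{01}$ and $Q_{02}$, using the trapezoidal rule, and reports the complement $1 - \pi_{02}(t)$ as the probability that the failed state has not yet been reached. The means and standard deviations in `PARAMS` are hypothetical values chosen only for the example, and the helper names are assumptions; the holding-time CDF integrates the normal density from 0 to t, as in the example.

```python
import math

# Hypothetical (mu_ij, sigma_ij) for the three possible transitions.
PARAMS = {(0, 1): (10.0, 2.0), (0, 2): (15.0, 3.0), (1, 2): (8.0, 2.0)}


def normal_pdf(u, mu, sigma):
    return math.exp(-((u - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def holding_cdf(t, mu, sigma):
    """F_ij(t): integral of the normal density from 0 to t."""
    if t <= 0.0:
        return 0.0
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return phi((t - mu) / sigma) - phi(-mu / sigma)


def q01(u):
    """Kernel density q_01(u) = [1 - F_02(u)] f_01(u)."""
    return (1.0 - holding_cdf(u, *PARAMS[(0, 2)])) * normal_pdf(u, *PARAMS[(0, 1)])


def q02(u):
    """Kernel density q_02(u) = [1 - F_01(u)] f_02(u)."""
    return (1.0 - holding_cdf(u, *PARAMS[(0, 1)])) * normal_pdf(u, *PARAMS[(0, 2)])


def trapezoid(f, a, b, n=2000):
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def pi_02(t):
    """Probability of being in the failed state 2 at time t, starting from state 0."""
    # Path 0 -> 1 -> 2: pi_12(t) = F_12(t) because pi_22 = 1.
    via_state_1 = trapezoid(lambda v: q01(v) * holding_cdf(t - v, *PARAMS[(1, 2)]), 0.0, t)
    # Direct path 0 -> 2: integral of q_02, i.e., Q_02(t).
    direct = trapezoid(q02, 0.0, t)
    return via_state_1 + direct


def reliability(t):
    return 1.0 - pi_02(t)


for t in (5.0, 10.0, 20.0, 30.0):
    print(f"t = {t:5.1f}  pi_02 = {pi_02(t):.4f}  R = {reliability(t):.4f}")
```

With these assumed parameters the reliability stays close to one up to about t = 5 and drops toward zero by t = 30, since the system reaches state 2 either directly or after passing through state 1.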

All the models presented are based on the assumption that the degradation process is directly observable. However, in many cases, the degradation level is not directly observable due to the complexity of the degradation process or the nature of the product type. Therefore, to deal with indirectly observed states, models such as hidden Markov models (HMM) and hidden semi-Markov models (HSMM) have been developed. The HMM deals with two different stochastic processes: the unobservable degradation process and the measurable characteristics (which depend on the actual degradation process). In HMMs, finding a stochastic relationship between the unobservable degradation process and the output signals of the observation process is a critical prerequisite for condition monitoring and reliability analysis. The details of HMM are beyond the scope of this chapter; interested readers can refer to Shahraki et al. (2017) and Si et al. (2011) for more details.

3.4  SUMMARY AND CONCLUSIONS


This chapter presented the application of stochastic processes in degradation modeling to assess product/system performance. The stochastic processes are categorized into continuous-state and discrete-state processes. Among the continuous-state stochastic processes, the Wiener, Gamma, and inverse Gaussian (IG) processes are discussed and applied to the degradation modeling of engineering systems using accelerated degradation data. The lifetime and reliability estimation approaches are also derived based on stochastic degradation models. For accurately assessing product performance, appropriate selection of the stochastic process is crucial. Graphical and statistical methods are presented to assist in selecting the best-fitted degradation model for a case-specific situation.
In addition, discrete-state stochastic processes have been discussed and applied to model the degradation of systems whose degraded states take values from a discrete space. The discrete- and continuous-time Markov chain models are used to model the degradation process when the state transitions happen at discrete or continuous times, respectively. In Markov chain models, the next state of the system depends only on the current state and not on the history of the system (memoryless property), which limits their application for some systems. As extensions of the Markovian model, aging Markovian deterioration and semi-Markov models are applied to capture the influence of the age and the history on the future states. Once the degradation process has been modeled with a suitable model, the system reliability is calculated for systems that degrade with time.

REFERENCES
Blasi, A., Janssen, J. and Manca, R., 2004. Numerical treatment of homogeneous and non-
homogeneous semi-Markov reliability models. Communications in Statistics, Theory
and Methods 33(3): 697–714.
Chaluvadi, V. N. H., 2008. Accelerated life testing of electronic revenue meters. PhD disser-
tation, Clemson, SC: Clemson University.
Chen, A. and Wu, G.S., 2007. Real-time health prognosis and dynamic preventive main-
tenance policy for equipment under aging Markovian deterioration. International
Journal of Production Research 45(15): 3351–3379.
Cinlar, E., 1975. Introduction to Stochastic Processes. Englewood Cliffs, NJ: Prentice-Hall.
Corradi, G., Janssen, J. and Manca, R., 2004. Numerical treatment of homogeneous semi-
Markov processes in transient case—a straightforward approach. Methodology and
Computing in Applied Probability 6(2): 233–246.
Dui, H., Si, S., Zuo, M. J. and Sun, S., 2015. Semi-Markov process-based integrated impor-
tance measure for multi-state systems. IEEE Transactions on Reliability 64(2): 754–765.
Howard, R., 1960. Dynamic Programming and Markov Processes, Cambridge, MA: MIT Press.
Kulkarni, V. G., 1995. Modeling and Analysis of Stochastic Systems, London, UK: Chapman and Hall.
Limon, S., Yadav, O. P. and Liao, H., 2017a. A  literature review on planning and analysis
of accelerated testing for reliability assessment. Quality and Reliability Engineering
International 33(8): 2361–2383.
Limon, S., Yadav, O. P. and Nepal, B., 2017b. Estimation of product lifetime considering
gamma degradation process with multi-stress accelerated test data. IISE Annual
Conference Proceedings, pp. 1387–1392.
Limon, S., Yadav, O. P. and Nepal, B., 2018. Remaining useful life prediction using ADT data
with Inverse Gaussian process model. IISE Annual Conference Proceedings, pp. 1–6.
Lisnianski, A., Frenkel, I. and Ding, Y., 2010. Multi-state System Reliability Analysis and
Optimization for Engineers and Industrial Managers, Berlin, Germany: Springer
Science & Business Media.
Lisnianski, A. and Levitin, G., 2003. Multi-state System Reliability: Assessment, Optimization, and Applications, Singapore: World Scientific.
Liu, Y. W. and Kapur, K. K. C., 2007. Customer’s cumulative experience measures for reli-
ability of non-repairable aging multi-state systems. Quality Technology & Quantitative
Management 4(2): 225–234.
Moghaddass, R. and Zuo, M. J., 2014. An integrated framework for online diagnostic and
prognostic health monitoring using a multistate deterioration process. Reliability
Engineering & System Safety 124: 92–104.
Narendran, N. and Gu, Y., 2005. Life of LED-based white light sources. Journal of Display Technology 1: 167–171.
Nelson, W., 2004. Accelerated Testing: Statistical Models, Test Plans and Data Analysis (2nd
ed.), New York: John Wiley & Sons.
Ng, S. K. and Moses, F., 1998. Bridge deterioration modeling using semi-Markov theory.
A. A. Balkema Uitgevers B. V, Structural Safety and Reliability 1: 113–120.
O’Connor, P. D. D. T. and Kleyner, A., 2012. Practical Reliability Engineering (5th ed.),
Chichester, UK: Wiley.
Park, C. and Padgett, W. J., 2005. Accelerated degradation models for failure based on geo-
metric Brownian motion and gamma processes. Lifetime Data Analysis 11: 511–527.
Park, J. I. and Yum, B. J., 1997. Optimal design of accelerated degradation tests for estimating
mean lifetime at the use condition. Engineering Optimization 28: 199–230.
Ross, S., 1995. Stochastic Processes, New York: Wiley.
Sánchez-Silva, M. and Klutke, G. A., 2016. Reliability and Life-cycle Analysis of
Deteriorating Systems (Vol. 182). Cham, Switzerland: Springer International
Publishing.
Shahraki, A. F. and Yadav, O. P., 2018. Selective maintenance optimization for multi-
state systems operating in dynamic environments. In  2018  Annual Reliability and
Maintainability Symposium (RAMS). IEEE: pp. 1–6.
Shahraki, A. F., Yadav, O. P. and Liao, H., 2017. A  review on degradation modelling and its
engineering applications. International Journal of Performability Engineering 13(3): 299.
Sheu, S. H. and Zhang, Z. G., 2013. An optimal age replacement policy for multi-state
systems. IEEE Transactions on Reliability 62(3): 722–735.
Sheu, S. H., Chang, C. C., Chen, Y. L. and Zhang, Z. G., 2015. Optimal preventive maintenance and repair policies for multi-state systems. Reliability Engineering & System Safety 140: 78–87.
Si, X. S., Wang, W., Hu, C. H. and Zhou, D. H., 2011. Remaining useful life estimation:
A review on the statistical data driven approaches. European Journal of Operational
Research 213(1): 1–14.
Trivedi, K., 2002. Probability and Statistics with Reliability, Queuing and Computer Science Applications, New York: Wiley.
Wang, X. and Xu, D., 2010. An inverse Gaussian process model for degradation data.
Technometrics 52: 188–197.
Ye, Z. S. and Chen, N., 2014. The  inverse Gaussian process as a degradation model.
Technometrics 56: 302–311.
Ye, Z. S., Wang, Y., Tsui, K. L. and Pecht, M., 2013. Degradation data analysis using Wiener
processes with measurement errors. IEEE Transactions on Reliability 62: 772–780.
4 Building a Semi-automatic Design for Reliability Survey with Semantic Pattern Recognition

Christian Spreafico and Davide Russo

CONTENTS
4.1 Introduction
4.2 Research Methodology and Pool Definition
    4.2.1 Definition of the Electronic Pool
    4.2.2 Definition of the Features of Analysis
        4.2.2.1 Goals
        4.2.2.2 Strategies (FMEA Interventions)
        4.2.2.3 Integrations
4.3 Semi-automatic Analysis
4.4 Results and Discussion
4.5 Conclusions
References

4.1 INTRODUCTION
Almost 70 years after its introduction, Failure Modes and Effects Analysis (FMEA) has been applied in a large number of cases from different sectors, such as automotive, electronics, construction, and services, and has become a standard procedure in many companies for quality control and for the design of new products. FMEA also has a great following in the scientific community, as testified by the vast multitude of related documents in the scientific and patent literature; to date, more than 3,600 papers in the Scopus DB and 146 patents in the Espacenet DB come up by just searching for FMEA without synonyms, with a trend of constant growth over the years.
The majority of those contributions deal with FMEA modifications involving the procedure and the integration with new methods and tools to enlarge the field of application and to improve the efficiency of the analysis, such as by reducing the required time and by finding more results.
The surveys proposed in the literature, which have been performed according to different criteria of data gathering and classification, can play a fundamental role in orienting the reader among the many contributions.


In [1] the authors analyzed scientific papers about the description and review of
basic principles, the types, the improvements, the computer automation codes, the
combination with other techniques, and specific applications of FMEA.
The literature survey in [2] analyzes the FMEA applications for enhancing service
reliability by determining how FMEA is focused on profit and supply chain-oriented
service business practices. The significant contribution consists in comparing what
previously was mentioned about FMEA research opportunities and in observing how
FMEA is related to enhancement in Risk Priority Number (RPN), reprioritization,
versatility of its application in service supply chain framework and non-profit service
sector, as well as in combination with other quality control tools, which are proposed
for further investigations.
In [3], the authors studied 62 methodologies about risk analysis by separating them into three different phases (identification, evaluation, and hierarchization) and by studying their inputs (plan or diagram, process and reaction, products, probability and frequency, policy, environment, text, and historical knowledge), the implemented techniques to analyze risk (qualitative, quantitative, deterministic, and probabilistic), and their output (management, list, probabilistic, and hierarchization).
In  [4], the authors analyzed the innovative proposed approaches to overcome
the limitations of the conventional RPN method within 75 FMEA papers published
between 1992 and 2012 by identifying which shortcomings attract the most attention,
which approaches are the most popular, and the inadequacy of approaches.
Other authors focused on analyzing specific applications of the FMEA approach. In [5], the authors studied how 78 companies in the motor industry in the United Kingdom apply FMEA, identifying some common difficulties such as time constraints, poor organizational understanding of the importance of FMEA, inadequate training, and lack of management commitment.
However, despite the results achieved by these surveys, no overview considers all the proposals presented, including patents, and analyzes them at a higher level than “simple” document counting within the cataloging classes and tools used.
To fulfill this aim, a previous survey  [6] considerably increased the number
of analyzed documents, by including also patents. In  addition, the analysis of the
content was improved by carrying out the analysis on two related levels: followed
strategies of intervention (e.g., reduce time of application) and integrated tools
(e.g., fuzzy logic). Although the results achieved are remarkable, the main limita-
tions of this analysis are the onerous amount of time required along with the number
of correlations between different aspects (e.g., problems and solutions, methods and
tools, etc.).
This chapter proposes a semi-automatic semantic analysis of documents related to FMEA modifications, followed by a manual review that summarizes each of them through a simple sentence made up of a causal chain including the declaration of the goals, the followed strategies (FMEA modifications), and the integrations with methods/tools.
This chapter is organized as follows. Section 4.2 presents the research methodology and the definition of the document pool, Section 4.3 describes the semi-automatic analysis, Section 4.4 presents and discusses the results, and Section 4.5 draws conclusions.

4.2  RESEARCH METHODOLOGY AND POOL DEFINITION


The first step of this work is the definition of the pool of documents to be analyzed, starting from the same pool of documents proposing FMEA modifications used in [6]. This pool counts 286 documents: 177 scientific papers (165 from academia and 12 from industry) and 109 patents (23 from academia and 86 from industry). Figure 4.1 shows the time distribution for patents and for scientific publications. The number of patents is increasing, except for the last period, which does not include all potential patents since they are not disclosed for the first 18 months.

4.2.1 Definition of the Electronic Pool


In order to automatically process the collected documents with the available tools for semantic analysis, an XML file was manually created for each document. Each file was named with a unique ID and compiled according to a rigid structure in which each part of the original document was inserted within a specific text field (e.g., Title, Abstract, Introduction, State_of_the_Art, Proposal).
The objective of this classification is to separate the original proposals of each document, stored in the field Proposal, from the previous ones, reported in the field State_of_the_Art, so as not to distort the survey with redundant results, and to make it possible to process the different parts separately for specific purposes (e.g., keywords investigation). In addition, the ID allows the content to be traced back to the specific document.
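
A record of this kind could look like the following sketch. The field names are those listed above; the root element name `Document`, the `id` attribute, the sample ID, and the sample texts are assumptions made for the example, since the chapter does not give the exact schema.

```python
import xml.etree.ElementTree as ET

# Text fields named in the chapter; everything else below is illustrative.
FIELDS = ("Title", "Abstract", "Introduction", "State_of_the_Art", "Proposal")


def build_record(doc_id, sections):
    """Create one XML record with a unique ID and one element per text field."""
    root = ET.Element("Document", attrib={"id": doc_id})
    for field in FIELDS:
        ET.SubElement(root, field).text = sections.get(field, "")
    return root


def read_field(root, field):
    """Return the text of a single field so it can be processed separately."""
    node = root.find(field)
    return (node.text or "") if node is not None else ""


record = build_record("DOC_007", {
    "Title": "A hypothetical FMEA modification",
    "State_of_the_Art": "Traditional FMEA relies on the RPN.",
    "Proposal": "A fuzzy FMEA is proposed to rank the failure modes.",
})

xml_text = ET.tostring(record, encoding="unicode")
parsed = ET.fromstring(xml_text)
print(parsed.get("id"), "->", read_field(parsed, "Proposal"))
```

Keeping Proposal and State_of_the_Art in separate elements is what allows a later query to be restricted to the original contribution of each document.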

4.2.2 Definition of the Features of Analysis


An additional preliminary activity deals with the definition of the features to be
analyzed. Since one purpose of the proposed method is to perform a deeper analy-
sis by relating different aspects, the features deliberately consider heterogeneous
aspects (goal, strategies of interventions, and integrations) and they work at different
levels of detail (e.g., goals and sub-goals, methods and tools).
Some features have been hypothesized a priori by considering previous
FMEA surveys, while others iteratively emerged during the analysis.
In the following discussion, the features are presented in detail.

[Figure 4.1: panel (a) chart with vertical axis from 0 to 80 and series Papers and Patents; panel (b) breakdown of Papers and Patents by Academia and Industry.]

FIGURE 4.1  (a) Time distribution (priority date) of the collected documents and (b) composition of the final set of documents (papers vs. patents and academia vs. industry).

4.2.2.1 Goals
These features deal with the targets that the authors proposing the analyzed FMEA modifications want to achieve through them. All of them focus on improving the main aspects related to the applicability of the method (e.g., reducing the required input, improving the expected output, ameliorating the approach of the involved actors):

• Reduce FMEA time/costs of application by applying the modified FMEA version to reduce the number of participants (e.g., experts) and the time required to gather the useful information and perform the analysis ([9], [30], [36], [37], [47], [52], [56], [64], [72], [78], [80], [90], [99], [132]).
• Reduce production time/costs of the considered product by using FMEA modifications to find and prevent possible faults during production that can cause delays or extra costs, without modifying the product design ([35], [43], [57], [63], [88], [89], [93], [109], [110], [119], [121], [126]).
• Improve design of the product by applying a modified FMEA during the design process in order to specifically change the design of the product and make it more robust (i.e., robust design), better able to meet the requirements or to not dissatisfy them (i.e., product re-design), easier to manufacture (i.e., design for manufacturing) through a radical change of the product’s shape and components, or easier to repair (i.e., design for maintenance) ([15], [19], [23], [24], [25], [27], [39], [40], [49], [58], [61], [62], [65], [69], [70], [76], [79], [87], [92], [94], [96], [100], [103], [104], [107], [114]).
• Analyze complex systems, if the modified version of FMEA has been specifically improved to manage products with a high number of components and functionalities ([26], [31], [32], [82], [98], [118], [117], [124], [128]).
• Ameliorate human approach, if the modified version of FMEA is able to improve the user interface, reduce its tediousness, and better involve the user in a more pro-active approach ([10], [13], [16], [22], [28], [29], [33], [34], [41], [42], [46], [48], [50], [51], [53], [54], [55], [59], [60], [66], [68], [71], [73], [74], [77], [81], [83], [84], [85], [86], [105], [106], [108], [111], [112], [113], [115], [116], [122], [123], [125], [130], [131]).

4.2.2.2  Strategies (FMEA Interventions)


These features investigate the strategies of intervention on the FMEA structure, i.e., the parts/steps of the traditional procedure that are modified by the considered documents:

• Improve/automate Bill of Material (BoM) determination to provide criteria to (1) identify the parts (e.g., sub-assemblies and single components) and their useful features and attributes and (2) facilitate the management of the parts and their relations.
• Improve/automate function determination by suggesting modalities to identify and describe product requirements, functions, and sub-functions, and to associate them with the related parts.
• Improve/automate failure determination to increase the number of considered failure modes, effects, and causes, identify their relations, and improve their representation by introducing supporting models.
• Improve/automate Risk Analysis by overcoming the main limitations of traditional indexes, by providing explanations about their uses or new complementary or alternative methods ([14], [18], [20], [21], [44], [45], [55], [75], [91], [95], [120]).
• Improve/automate problem solving by improving the decision making and solving phase.

4.2.2.3 Integrations
The following kinds of integrations have been collected:

• Templates (e.g., tables and matrices) to organize and manage the bill of
material, the list of functions and faults, and the related risk.
• Database (DB) containing information about product parts, functions,
historical failures, risk, and the related economic quantifications. They are
used to automatically or manually gather the content for the analysis.
• Tools for fault analysis (Fault A.) including Fault Tree Analysis (FTA),
Fishbone diagram and Root Cause Analysis (RCA) ([17], [38]).
• Interactive graphical interfaces or software that directly involve user inter-
actions through graphical elements and representations (e.g., plant schemes
and infographics) for data entry and visualization.
• Artificial Intelligence (AI) based tools involving Semantic Recognition and
Bayesian Networks ([12], [67], [102], [125], [127], [129], [133]).

Other considered integrations are function analysis (FA), fuzzy logic, Monte Carlo
method, quality function deployment (QFD), hazard and operability study (HAZOP),
ontologies, theory of inventive problem solving (TRIZ), guidelines, automatic mea-
surements (AM) methods, brainstorming techniques, and cognitive maps (C Map).

4.3  SEMI-AUTOMATIC ANALYSIS


At this point, the defined features have been semi-automatically investigated within the collected pool using software for semantic analysis. The first step of the procedure deals with the manual translation of each considered feature into one or more search queries consisting of single keywords (e.g., noun, verb, adjective).
For each keyword, the software provides its main linguistic relations with the other terms found within the specific sentences of the documents through semantic analysis.
The kinds of relations differ depending on the linguistic nature of the keyword used. If a substantive (e.g., FMEA) is used, then the following can be identified: the modifiers, i.e., adjectives or substantives acting as adjectives (e.g., traditional FMEA, fuzzy FMEA, cost-based FMEA), nouns and verbs modified by the keyword (e.g., FMEA table, FMEA sheet), verbs with the keyword used as object (e.g., executing FMEA, evaluate FMEA), verbs with the keyword used as subject (e.g., FMEA is …, FMEA generates …), substantives linked to the keyword through AND/OR relations (e.g., FMEA and QFD, FMEA and risk), and prepositional phrases (e.g., … of FMEA, … through FMEA). When a verb is used as keyword, the following can be identified: its modifiers (e.g., effectively improve), the objects (e.g., improve quality, improve design), the subjects (e.g., QFD improves), and other particles used before or after the verb (e.g., improve and evaluate).

TABLE 4.1
Keywords Used to Explain the Features Through the Queries
| Generic Terms: Names | Generic Terms: Verbs | FMEA Terms | Methods/Tools |
|---|---|---|---|
| FMEA, Human, Approach, Design, Production, Maintenance, Time, Costs, Problem | Improve, Anticipate, Ameliorate, Automatize, Analyze, Reduce, Eliminate, Solve | Failures, Modes, Effects, Cause, Risk, Solving, Decision making | Fuzzy, TRIZ, Database, Artificial Intelligence, QFD, Function Analysis, etc. |

In this way, by using the restricted number of keywords, reported in Table 4.1, all
the features can be easily investigated.
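
To make the idea of keyword relations concrete, the sketch below collects the words that immediately precede, follow, or are coordinated with a keyword. It is a deliberately crude stand-in based on word adjacency and regular expressions, not the semantic analysis software used by the authors, which relies on proper linguistic parsing; the function name, the relation labels, and the sample sentences are assumptions.

```python
import re
from collections import Counter


def keyword_relations(sentences, keyword):
    """Count surface relations of a keyword using word adjacency only."""
    kw = re.escape(keyword)
    patterns = {
        # word right before the keyword, e.g., "fuzzy FMEA"
        "left": re.compile(rf"\b([\w-]+)\s+{kw}\b", re.IGNORECASE),
        # word right after the keyword, e.g., "FMEA table", "FMEA is"
        "right": re.compile(rf"\b{kw}\s+(?!and\b|or\b)([\w-]+)", re.IGNORECASE),
        # AND/OR relations, e.g., "FMEA and QFD"
        "coordinated": re.compile(rf"\b{kw}\s+(?:and|or)\s+([\w-]+)", re.IGNORECASE),
    }
    relations = {name: Counter() for name in patterns}
    for sentence in sentences:
        for name, pattern in patterns.items():
            for match in pattern.findall(sentence):
                relations[name][match.lower()] += 1
    return relations


# Hypothetical sentences standing in for the Proposal fields of the pool.
sentences = [
    "A fuzzy FMEA is proposed to rank the failure modes.",
    "The traditional FMEA table is replaced by a database.",
    "FMEA and QFD are integrated in the design phase.",
]

rel = keyword_relations(sentences, "FMEA")
for name, counter in rel.items():
    print(name, dict(counter))
```

A real implementation would distinguish modifiers, subjects, and objects through part-of-speech tagging and dependency parsing; adjacency alone cannot tell "FMEA table" (a modified noun) from "FMEA is" (a verb with the keyword as subject).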
Thus, the translation of a generic feature (e.g., ameliorate human approach) depends on the manual formulation of a keyword (e.g., ameliorate), the automatic processing, and the manual search for the most suitable relations to express the feature itself (e.g., ameliorate + human approach).
However, since the features can be expressed in a variety of ways, increasing the number of alternative keywords also increases the number of pertinent documents identified (recall). This is achieved by expanding the synonyms (e.g., improve in addition to ameliorate) and by searching for the alternative forms that can be used to express the feature (e.g., Reduce Tediousness and Reduce Subjectivity for Improve Human Approach).
The search for specific terms, such as the names of the integrated tools (e.g., fuzzy, TRIZ, QFD), can instead be carried out according to different strategies: (1) including them within the keywords, (2) using verbs (e.g., introduce, integrate) and searching for the tools among the objects (e.g., Introduce fuzzy logic), (3) using the modifiers of FMEA (e.g., fuzzy FMEA), and (4) searching the relations between FMEA and linguistic particles (e.g., FMEA and TRIZ).
Then, for each interesting relation identified, the software provides the list of the related sentences for each document, which are manually checked in order to evaluate their adherence to the investigated feature.
At this point, each selected sentence is summarized through a triad consisting of subject + verb + object.
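
One possible way to store the triads and arrange them by document and feature is sketched below, using the wording of the example in Table 4.2 for document [7]. The class name `Triad`, its fields, and the helper `comparison_table` are hypothetical names chosen for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triad:
    subject: str
    verb: str
    obj: str
    feature: str  # the investigated feature the triad gives evidence for

    def as_sentence(self):
        return f"{self.subject} {self.verb} {self.obj}"


def comparison_table(triads_by_document):
    """One row per document, one column per feature, subject in each cell."""
    return {
        doc: {t.feature: t.subject for t in triads}
        for doc, triads in triads_by_document.items()
    }


triads = {
    "[7]": [
        Triad("The improved failure modes determination", "ameliorates",
              "human approach", "Ameliorate Human Approach"),
        Triad("Perturbed Function Analysis", "improves",
              "failure modes determination", "Improve Failure Modes Determination"),
        Triad("The authors", "propose",
              "the Perturbed Function Analysis", "Introduce TRIZ"),
    ]
}

table = comparison_table(triads)
for feature, subject in table["[7]"].items():
    print(f"{feature}: {subject}")
```

Storing the subject under the feature it supports is what later allows the subjects to be used as links between features.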
Table 4.2 shows the steps followed to define the triads for the paper proposed in [7].
All the identified triads are then collected within a table (as shown in Table 4.3), in which the data for each document (row) are organized according to the features (columns). Each cell reports the subject of a triad (e.g., The improved failure modes) related to a given feature, which has been redefined by using the verb and the object of the triad (e.g., ameliorates human approach).

TABLE 4.2
Example of the Strategy Used to Build the Triads
| Investigated Features | Used Keyword | Syntactic Parser | Related Sentence (Considered Document) | Triad (Subject + Verb + Object) |
|---|---|---|---|---|
| Ameliorate Human Approach | Improve | Improve + Human Approach | The objective of this paper is to propose a new approach for simplifying FMEA by determining the failures in a more practical way by better involving the problem solver in a more pro-active and creative approach | The improved Failure Modes Determination ameliorates human approach |
| Improve Failure Modes determination | Improve | Improve + Failure Modes | Perturbed Functional Analysis is proposed in order to improve the capability of determine Failure Modes | Perturbed Function Analysis improves Failure Modes determination |
| Introduce TRIZ | TRIZ | TRIZ + Perturbed Function Analysis (Modifier) | Specifically, an inedited version of TRIZ function analysis, called “Perturbed Function Analysis” is proposed | The authors propose the Perturbed Function Analysis |

Source: Spreafico, C. and Russo, D., Can TRIZ functional analysis improve FMEA? Advances in Systematic Creativity Creating and Managing Innovations, Palgrave Macmillan, Cham, Switzerland, pp. 87–100, 2019.

TABLE 4.3
An Extract from the Table of Comparison of the Documents and the Triads
| Document | Goal: Ameliorate Human Approach | Goal: … | Strategy: Improve Failures Determination | Strategy: … | Methods/Tools: Introduce Perturbed Function Analysis | Methods/Tools: … |
|---|---|---|---|---|---|---|
| [7] | The improved failure modes | … | Perturbed Function Analysis | … | The authors | … |
| … | … | … | … | … | … | … |

[Figure 4.2 diagram, left to right: Spreafico and Russo (2019); Ameliorate Human Approach (Node N+1, Part 1: Goal); Improve Failures Determination (Node N, Part 2: Strategy); Perturbed Function Analysis (Node N-1, Part 3: Methods/Tools). Adjacent nodes are linked by WHY? and HOW? relations.]

FIGURE 4.2  Example of a causal chain constituted by goal, strategy, and method/tool.

The identified subjects are then used as links to build the causal chains, starting from the last ones, which are related to the integrations with methods and tools. For example, the causal chain resulting from the previous example (Table 4.3) is: the authors introduce the Perturbed Function Analysis (METHOD/TOOL) IN ORDER TO Improve the failure identification (STRATEGY) IN ORDER TO Ameliorate Human Approach (GOAL).
Reading the causal chain in this manner, its underlying logic is the following: each node explains the existence of the previous one (WHY?) and represents a way to obtain the next one (HOW?).
Figure 4.2 shows the simplest causal chain that can be built, consisting of only three nodes arranged in sequence: one goal (i.e., Ameliorate Human Approach), one strategy (i.e., Improve Failure Determination), and one integration with methods or tools (i.e., The Perturbed Function Analysis).
However, the structure of the causal chain can be more complex because the num-
ber of nodes can increase and their reciprocal disposition can change from series to
parallel and by a mix of both.
In the first case (nodes in series), each intermediate node is preceded (on the left)
by another node expressing its motivation (WHY?—relation) and it is followed by
another representing a way to realize it (HOW?—relation). More goals can be con-
nected in the same way, through their hierarchization: e.g., the goal “reduce the
number of experts” can be preceded by the more generic goal “reduce FMEA costs.”
The  same reasoning is valid for the strategies and the integrations with methods/
tools. In particular, we stratified these integrations into four hierarchical levels: (1)
theories and logics (e.g., fuzzy logic), (2) methods (e.g., TRIZ), (3) tools, which can be
included in the methods (e.g., FA is part of TRIZ), and (4) knowledge sources (e.g.,
a costs DB).
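The four-level stratification can be sketched as an ordered enumeration. This is an illustrative sketch only: the names `Level` and `order_in_series` are hypothetical, and sorting integrations from the most general level to the most specific is an assumption about how nodes in series could be arranged, not a rule stated in the chapter.

```python
from enum import IntEnum


class Level(IntEnum):
    THEORY_OR_LOGIC = 1    # e.g., fuzzy logic
    METHOD = 2             # e.g., TRIZ
    TOOL = 3               # e.g., FA, which is part of TRIZ
    KNOWLEDGE_SOURCE = 4   # e.g., a costs DB


def order_in_series(integrations):
    """Sort (label, level) pairs from the most general level to the most specific."""
    return sorted(integrations, key=lambda pair: pair[1])


# The examples named in the text, deliberately given out of order.
example = [
    ("costs DB", Level.KNOWLEDGE_SOURCE),
    ("FA", Level.TOOL),
    ("fuzzy logic", Level.THEORY_OR_LOGIC),
    ("TRIZ", Level.METHOD),
]
ordered = [label for label, _ in order_in_series(example)]
print(ordered)
```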

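[Figure 4.3 diagram, patent CN202887188: the goals Reduce FMEA time/costs and Analyze complex systems are linked to the strategies Automate Failures determination and Automate Risk Analysis, each realized through Fuzzy logic supported by a Failures DB and a Risks DB, respectively.]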

FIGURE 4.3  Example of a complex causal chain obtained from the patent. (From Ming, X.
et al., System capable of achieving failure mode and effects analysis (FMEA) data
multi-dimension processing, CN202887188, filed June 4, 2012, and issued April 17, 2013.
Representation is courtesy of the authors.)

In the second case (nodes in parallel), two or more nodes can concurrently pro-
vide a motivation for a previous node or be two possibilities to realize the subsequent
node.
As an example of a more complex causal chain, consider the Chinese patent [8].
Table 4.4 shows an extract from the table of comparison for this document: the
resulting relations between the included subjects and the features are more complex
and interlaced than in the example shown in Table 4.3.
Figure 4.3 represents the causal chain obtained for this document. In this case, the
two nodes Reduce FMEA Time/Costs and Analyze Complex Systems represent the two
main independent goals pursued by this contribution. The two nodes Automate
Failure Determination and Automate Risk Analysis are the two strategies followed
both to Reduce FMEA Time/Costs and to Analyze Complex Systems. Finally, the
node fuzzy logic represents a high-level integration that realizes the two strategies,
while a failure DB and a risk DB provide the knowledge for fuzzy logic-based
reasoning in two different ways: the first is used to Automate Failure Determination
(through fuzzy logic) and the second to Automate Risk Analysis (through fuzzy logic).
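A chain with parallel nodes is no longer a list but a directed graph. The sketch below encodes the relations described for patent [8] as HOW? edges and derives the WHY? edges by inversion. It is a hedged illustration: the names `HOW`, `WHY`, `invert`, and `chains_from` are hypothetical, and fuzzy logic is split into two labeled occurrences (an assumption of this sketch) so that each strategy stays tied to its own knowledge source.

```python
from typing import Dict, List

# HOW? edges: each key is realized by the nodes listed as its value.
HOW: Dict[str, List[str]] = {
    "Reduce FMEA time/costs": ["Automate Failure Determination", "Automate Risk Analysis"],
    "Analyze complex systems": ["Automate Failure Determination", "Automate Risk Analysis"],
    "Automate Failure Determination": ["Fuzzy logic (failures)"],
    "Automate Risk Analysis": ["Fuzzy logic (risks)"],
    "Fuzzy logic (failures)": ["Failure DB"],
    "Fuzzy logic (risks)": ["Risk DB"],
}


def invert(edges: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build WHY? edges: each node maps to the nodes that motivate it."""
    inverted: Dict[str, List[str]] = {}
    for parent, children in edges.items():
        for child in children:
            inverted.setdefault(child, []).append(parent)
    return inverted


def chains_from(node: str, edges: Dict[str, List[str]]) -> List[List[str]]:
    """Enumerate every series chain that starts at `node` and follows HOW? edges."""
    children = edges.get(node, [])
    if not children:
        return [[node]]
    return [[node] + rest for child in children for rest in chains_from(child, edges)]


WHY = invert(HOW)
GOALS = ["Reduce FMEA time/costs", "Analyze complex systems"]
all_chains = [chain for goal in GOALS for chain in chains_from(goal, HOW)]
for chain in all_chains:
    print(" <- ".join(chain))
```

Each of the two goals unfolds into two series chains, one per strategy, so the single complex chain of this document can also be read as four simple ones.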

4.4  RESULTS AND DISCUSSION


The proposed methodology was tested in two distinct phases. During
phase 1 (automatic semantic analysis), all the documents in the selected pool were
processed, because the semantic parsing algorithm of the tool used is strongly
influenced by the number of analyzed sentences in terms of the linguistic synonyms
and relations found. During phase 2 (manual review and causal chain building), a
restricted set of documents was considered instead, so that the methodology could be
tested within a limited time despite the burden of the required manual operations.
To obtain a significant sample, the documents were selected based on type
(papers or patents), date of publication, kind of source (for papers: journal or
proceedings), and nationality (for patents). The resulting sample comprises 127
documents: 80 papers and 47 patents.
After the sample was processed, the features were investigated, and the documents
were classified, one causal chain was built for each document. A chain usually
consists of more than four nodes, including at least one for each part (goal, strategy,
and integration). The total number of causal chains equals the number of analyzed
documents (127), since their correspondence is biunivocal: each document has
exactly one causal chain and vice versa.

TABLE 4.4
An Extract from the Table of Comparison of the Documents and the Triads, Line of the Document

Features        Feature                          Document [8]
Goal            Reduce FMEA Time/Costs           Automate Failure Determination; Automate Risk Analysis
Goal            Analyze Complex Systems          Automate Failure Determination; Automate Risk Analysis
Strategy        Automate Failure Determination   Fuzzy logic
Strategy        Automate Risk Analysis           Fuzzy logic
Methods/Tools   Introduce Fuzzy Logic            Failure DB; Risk DB
Methods/Tools   Introduce Failure DB             The authors
Methods/Tools   Introduce Risk DB                The authors

Source: Ming, X. et al., System capable of achieving failure mode and effects analysis (FMEA) data multi-dimension processing, CN202887188, filed June 4, 2012, and
issued April 17, 2013.
In general, the most pursued goals are Improve Design and Improve Human
Approach, which together appear in 61 percent of the triads, while the
most considered strategies relate to failure determination (automate and
improve), followed by Automate Risk Analysis.
Among the integrations with methods and tools, fuzzy logic and databases are
the most widespread, with 37 and 28 occurrences within the causal chains,
respectively, followed by the interface with 23 occurrences.
More detailed considerations are possible by analyzing the relations between goals
and strategies. The two most widespread groups of strategies are applied differently:
those for failure determination are implemented to realize all the goals, while those
for Improving Risk Analysis are mainly used to Improve Human Approach and are
practically ignored for other purposes (i.e., Improve Design and
Analyze Complex Systems).
Other considerations can be made by comparing the couplings between multiple
goals, strategies, and tools.
Among the combinations of goals, the most frequent are Improve Design – Improve
Human Approach (8 occurrences), Improve Design – Analyze Complex Systems
(7 occurrences), and Improve Human Approach – Reduce Production Time/Costs
(7 occurrences).
Among the combinations of strategies, the most frequent are Automate Failure
Determination – Automate Risk Analysis (12 occurrences) and Automate Failure
Determination – Improve Risk Analysis (7 occurrences).
Finally, the analysis of the multiple integrations revealed that the most common
coupling is between fuzzy logic and DBs, with 6 occurrences.
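Coupling counts of this kind amount to counting unordered pairs of features that appear in the same causal chain. The sketch below shows one way to compute them; the function name `count_couplings` and the four-document sample are hypothetical and do not reproduce the chapter's data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-sample: document id -> goals found in its causal chain.
goals_by_doc = {
    "doc_a": {"Improve Design", "Improve Human Approach"},
    "doc_b": {"Improve Design", "Analyze Complex Systems"},
    "doc_c": {"Improve Design", "Improve Human Approach", "Analyze Complex Systems"},
    "doc_d": {"Improve Human Approach"},
}


def count_couplings(features_by_doc):
    """Count how many documents contain each unordered pair of features."""
    pairs = Counter()
    for features in features_by_doc.values():
        # Sorting makes (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(features), 2):
            pairs[(a, b)] += 1
    return pairs


couplings = count_couplings(goals_by_doc)
for pair, occurrences in couplings.most_common():
    print(pair, occurrences)
```

The same function applies unchanged to strategies or integrations by passing a different mapping.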
A deeper analysis can be made by considering the causal chains. Among the
different possibilities, the most significant is the comparison of the common
triads, that is, the combinations of three nodes: goal, strategy, and integration.
This gives a synthetic but sufficiently significant indication of how the
authors are working to improve FMEA.
Figure 4.4 shows the tree map of the common triads: the five main areas
are the goals, their internal (colored) subdivisions represent the strategies, and
these are in turn divided by integration, where the document indices are reported
(please refer to the legend).
For example, the graph shows that the three documents [11,97,101] propose
modified versions of FMEA based on the same common triad: they aim to improve
the design phase (Improve Design) by improving the determination of the failures
through the introduction of databases (DB). Other goals, strategies, or integrations
differentiate the three contributions.
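The nesting behind such a tree map (goal, then strategy, then integration, then document indices) can be sketched as a grouped dictionary. This is a minimal sketch with hypothetical names (`build_tree`, `common_triads`); the shared triad of documents 11, 97, and 101 comes from the example above, while the entry keyed "X" is invented purely for illustration.

```python
from collections import defaultdict

# Document index -> list of (goal, strategy, integration) triads.
triads_by_doc = {
    11: [("Improve Design", "Improve Failures Determination", "DB")],
    97: [("Improve Design", "Improve Failures Determination", "DB")],
    101: [("Improve Design", "Improve Failures Determination", "DB")],
    "X": [("Improve Human Approach", "Improve Risk Analysis", "Fuzzy logic")],  # hypothetical
}


def build_tree(triads):
    """Group document indices by goal, then strategy, then integration."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for doc, doc_triads in triads.items():
        for goal, strategy, integration in doc_triads:
            tree[goal][strategy][integration].append(doc)
    return tree


def common_triads(tree, min_docs=2):
    """Return the triads shared by at least `min_docs` documents."""
    return {
        (goal, strategy, integration): docs
        for goal, strategies in tree.items()
        for strategy, integrations in strategies.items()
        for integration, docs in integrations.items()
        if len(docs) >= min_docs
    }


tree = build_tree(triads_by_doc)
shared = common_triads(tree)
print(shared)
```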
Analysis of the common triads shows that the most widespread ones involve the
goal Improve Human Approach: Improve Human Approach – Improve Risk
Analysis – Fuzzy (8 documents), Improve Human Approach – Improve Function
Determination – Interface (5 documents), and Improve Human Approach – Improve
BoM Determination – Interface (5 documents).

FIGURE 4.4  Main solutions proposed in papers and patents to improve FMEA, represented
through triads (goal, strategy, and method/tool).
By considering the triads, some further observations can be made about
the integrations. In general, their distribution is quite heterogeneous in relation
to strategies and goals. Fuzzy logic has almost always been introduced to
Improve and Automate Risk Analysis in pursuit of all the goals, while it has been
used to improve or automate Failure Determination only in order to Improve
Design and not for other purposes.
Another case is represented by the interfaces, which have been introduced to
support almost all the strategies and goals.
Other integrations are instead related almost exclusively to the same strategy for
achieving each goal. This is the case for the databases, used primarily to Automate
Risk Analysis and secondarily to Automate Failures Determination, and for the
guidelines, which are generally used to Automate or to Improve Risk Analysis.

4.5 CONCLUSIONS
In this chapter, a method for performing semi-automatic semantic analysis of
FMEA documents has been presented and applied to a pool of 127 documents,
consisting of papers and patents, selected from international journals, conference
proceedings, and international patents.
As a result, each document has been summarized through a specific causal chain
including its considered goals (i.e., Improve Design, Improve Human Approach,
Reduce FMEA  Time/Costs, Reduce Production Time/Costs, Analyze Complex
Systems), its strategies of intervention (Improve/Automate BoM, Function, Failures
Determination, Risk Analysis and Problem solving) and the integrated methods,
tools, and knowledge sources.
The main output of this work is summarized in an infographic, in the style of a
Treemap diagram, that compares all the considered documents on the basis of the
common elements in their causal chains. It highlights the most popular directions of
intervention at different levels of detail (i.e., strategies, methods, and tools) in
relation to the objective pursued.
The consistent reduction of the required time, the number of sources analyzed, and
the depth of the analysis, represented by the ability to determine the relationships
between the different parameters of the analysis within the causal chain, are
elements of novelty compared to previous surveys and could positively impact
scientific research in the sector.
The main limitations of the approach are the complexity of the manual
operations required to define the electronic pool and to create part of the relations
within the causal chains; these will be partly addressed by automating the method
in future developments.

REFERENCES
1. Bouti, A., and Kadi, D. A. 1994. A  state-of-the-art review of FMEA/FMECA.
International Journal of Reliability Quality and Safety Engineering 1(04): 515–543.
2. Sutrisno, A., and Lee, T. J. 2011. Service reliability assessment using failure mode and
effect analysis (FMEA): Survey and opportunity roadmap. International Journal of
Engineering Science and Technology 3(7): 25–38.
3. Tixier, J., Dusserre, G., Salvi, O., and Gaston, D. 2002. Review of 62 risk analysis meth-
odologies of industrial plants. Journal of Loss Prevention in the Process Industries
15(4): 291–303.
4. Liu, H. C., Liu, L., and Liu, N. 2013. Risk evaluation approaches in failure mode and
effects analysis: A literature review. Expert Systems with Applications 40(2): 828–838.
5. Dale, B. G., and Shaw, P. 1990. Failure mode and effects analysis in the UK motor
industry: A state‐of‐the‐art study. Quality and Reliability Engineering International
6(3): 179–188.
6. Spreafico, C., Russo, D., and Rizzi, C. 2017. A state-of-the-art review of FMEA/FMECA
including patents. Computer Science Review 25: 19–28.
7. Spreafico, C., and Russo, D. 2019. Case: Can TRIZ functional analysis improve FMEA?
In Advances in Systematic Creativity, pp. 87–100. Cham: Palgrave Macmillan.
8. Ming, X., Zhu, B., Liang, Q., Wu, Z., Song, W., Xia R., and Kong, F. 2013. System
capable of achieving failure mode and effects analysis (FMEA) data multi-dimension
processing. CN202887188, filed June 4, 2012, and issued April 17, 2013.
9. Ahmadi, M., Behzadian, K., Ardeshir, A., and Kapelan, Z. 2017. Comprehensive risk
management using fuzzy FMEA  and MCDA  techniques in highway construction
projects. Journal of Civil Engineering and Management 23(2): 300–310.
10. Almannai, B., Greenough, R., and Kay, J. 2008. A decision support tool based on QFD
and FMEA for the selection of manufacturing automation technologies. Robotics and
Computer-Integrated Manufacturing 24(4): 501–507.
11. Arcidiacono, G., and Campatelli, G. 2004. Reliability improvement of a diesel engine
using the FMETA approach. Quality and Reliability Engineering International 20(2):
143–154.
12. Augustine, M., Yadav, O. P., Jain, R., and Rathore, A. 2009. Modeling physical systems
for failure analysis with rate cognitive maps. Industrial Engineering and Engineering
Management. IEEM 2009 IEEE International Conference 1758–1762.
13. Lai, J., Zhang, H., and Huang, B. 2011. The object-FMA based test case generation
approach for GUI software exception testing. Proceedings of 2011 9th International
Conference on Reliability, Maintainability and Safety, IEEE 717–723.
14. Banghart, M., and Fuller, K. 2014. Utilizing confidence bounds in Failure Mode Effects
Analysis (FMEA) hazard risk assessment. Aerospace Conference, 2014 IEEE 1–6.
15. Bertelli, C. R., and Loureiro, G. 2015. Quality problems in complex systems even con-
sidering the application of quality initiatives during product development. ISPE CE
40–51.
16. Bevilacqua, M., Braglia, M., and Gabbrielli, R. 2000. Monte Carlo simulation
approach for a modified FMECA in a power plant. Quality and Reliability Engineering
International 16(4): 313–324.
17. Bluvband, Z., Polak, R., and Grabov, P. 2005. Bouncing failure analysis (BFA):
The  unified FTA-FMEA  methodology. Reliability and Maintainability Symposium
Proceedings Annual 463–467.
18. Bowles, J. B., and Peláez, C. E. 1995. Fuzzy logic prioritization of failures in a system
failure mode, effects and criticality analysis. Reliability Engineering & System Safety
50(2): 203–213.

19. Braglia, M., Fantoni, G., and Frosolini, M. 2007. The house of reliability. International
Journal of Quality & Reliability Management 24(4): 420–440.
20. Braglia, M., Frosolini, M., and Montanari, R. 2003. Fuzzy TOPSIS approach for
failure mode, effects and criticality analysis. Quality and Reliability Engineering
International 19(5): 425–443.
21. Doskocil, D. C., and Offt, A. M. 1993. Method for fault diagnosis by assessment
of confidence measure. CA2077772, filed September 9, 1992, and issued April 25,
1993.
22. Draber S. 2000. Method for determining the reliability of technical systems.
CA2300546, filed March 7, 2000, and issued September 8, 2000.
23. Chang, K. H., and Wen, T. C. 2010. A novel efficient approach for DFMEA combining
2–tuple and the OWA operator. Expert Systems with Applications 37(3): 2362–2370.
24. Chen, L. H., and Ko, W. C. 2009. Fuzzy linear programming models for new product
design using QFD with FMEA. Applied Mathematical Modelling 33(2): 633–647.
25. Chin, K. S., Chan, A., and Yang, J. B. 2008. Development of a fuzzy FMEA based prod-
uct design system. The International Journal of Advanced Manufacturing Technology
36(7–8): 633–649.
26. Zhang, L., Liang, W., and Hu, J. 2011. Modeling method of early warning model of
mixed failures and early warning model of mixed failures. CN102262690, filed June 7,
2011, and issued November 30, 2011.
27. Pan, L., Chin, X., Liu, X., Wang, W., Chen, C., Luo, J., Peng, X. et al., 2012. Intelligent
integrated fault diagnosis method and device in industrial production process.
CN102637019, filed February 10, 2011, and issued August 15, 2012.
28. Ming, X., Zhu, B., Liang, Q., Wu, Z., Song, W., Xia, R., and Kong, F. 2012. System
for implementing multidimensional processing on failure mode and effect analysis
(FMEA) data, and processing method of system. CN102810112, filed June 4, 2012, and
issued December 5, 2012.
29. Li, G., Zhang, J., and Cui, C. 2012. FMEA (Failure Mode and Effects Analysis) pro-
cess auxiliary and information management method based on template model and text
matching. CN102831152, filed June 28, 2012, and issued December 19, 2012.
30. Li, R., Xu, P., and Xu, Y. 2012. Accidence safety analysis method for nuclear fuel repro-
cessing plant. CN102841600, filed August 24, 2012, and issued December 26, 2012.
31. Jia, Y., Shen, G., Jia, Z., Zhang, Y., Wang, Z., and Chen, B. 2013. Reliability com-
prehensive design method of three kinds of functional parts. CN103020378, filed
December 26, 2012, and issued April 3, 2013.
32. Chen, Y., Zhang, X., Gao, L., and Kang, R. 2014. Newly-developed aviation electronic
product hardware comprehensive FMECA method. CN103760886, filed December 2,
2013, and issued April 30, 2014.
33. Liu, Y., Deng, Z., Liu, S., Chen, X., Pang, B., Zhou, N., and Chen, Y. 2014. Method
for evaluating risk of simulation system based on fuzzy FMEA. CN103902845, filed
April 25, 2014, and issued July 2, 2014.
34. He, C., Zhao, H., Liu, X., Zong, Z., Li, L., Jiang, J., and Zhu, J. 2014. Data mining-based
hardware circuit FMEA (Failure Mode and Effects Analysis) method. CN104198912,
filed July 24, 2014, and issued December 10, 2014.
35. Xu, H., Wang, Z., Ren, Y., Yang D., and Liu, L. 2015. Failure knowledge storage and
push method for FMEA  (failure mode and effects analysis) process. CN104361026,
filed October 22, 2014, and issued February 18, 2015.
36. Tang, Y., Sun, Q., and Lü, Z. 2015. Failure diagnosis modeling method based on design-
ing data analysis. CN104504248, filed December 5, 2014, and issued April 8, 2015.
37. David, P., Idasiak, V., and Kratz, F. 2010. Reliability study of complex physical systems
using SysML. Reliability Engineering & System Safety 95(4): 431–450.

38. Demichela, M., Piccinini, N., Ciarambino, I., and Contini, S. 2004. How to avoid the
generation of logic loops in the construction of fault trees. Reliability Engineering &
System Safety 84(2): 197–207.
39. Deshpande, V. S., and Modak, J. P. 2002. Application of RCM to a medium scale
industry. Reliability Engineering & System Safety 77(1): 31–43.
40. Doble, M. 2005. Six Sigma and chemical process safety. International Journal of Six
Sigma and Competitive Advantage 1(2): 229–244.
41. Van Bossuyt, D., Hoyle, C., Tumer, I. Y., and Dong, A. 2012. Risk attitudes in risk-
based design: Considering risk attitude using utility theory in risk-based design. AI
EDAM 26(4): 393–406.
42. Ebrahimipour, V., Rezaie, K., and Shokravi, S. 2010. An ontology approach to support
FMEA studies. Expert Systems with Applications 37(1): 671–677.
43. Draber, C. D. 2000. Method for determining the reliability of technical systems.
EP1035454, filed March 8, 1999, and issued September 8, 2000.
44. Eubanks, C. F., Kmenta, S., and Ishii, K. 1996. System behavior modeling as a basis
for advanced failure modes and effects analysis. ASME Computers in Engineering
Conference, Irvine, CA, pp. 1–8.
45. Eubanks, C. F., Kmenta, S., and Ishii, K. 1997. Advanced failure modes and effects
analysis using behavior modeling. ASME Design Engineering Technical Conferences,
Sacramento, CA, pp. 14–17.
46. Gandhi, O. P., and Agrawal, V. P. 1992. FMEA—A  diagraph and matrix approach.
Reliability Engineering & System Safety 35(2): 147–158.
47. Hartini, S., Nugroho, W. P., and Subekti, K. R. 2010. Design of Equipment Rack with
TRIZ Method to Reduce Searching Time in Change Over Activity (Case Study: PT.
Jans2en Indonesia). Proceedings of the Apchi Ergo Future.
48. Hassan, A., Siadat, A., Dantan, J. Y., and Martin, P. 2010. Conceptual process plan-
ning–an improvement approach using QFD, FMEA, and ABC methods. Robotics and
Computer-Integrated Manufacturing 26(4): 392–401.
49. Hu, C. M., Lin, C. A., Chang, C. H., Cheng, Y. J., and Tseng, P. Y. 2014. Integration with
QFDs, TRIZ and FMEA for control valve design. Advanced Materials Research Trans
Tech Publications 1021: 167–180.
50. Jenab, K., Khoury, S., and Rodriguez, S. 2015. Effective FMEA  analysis or not.
Strategic Management Quarterly 3(2): 25–36.
51. Jong, C. H., Tay, K. M., and Lim, C. P. 2013. Application of the fuzzy failure mode and
effect analysis methodology to edible bird nest processing. Computers and Electronics
in Agriculture 96: 90–108.
52. Koizumi, A., Shimokawa K., and Isaki, Y. 2003. Fmea system. JP2003036278, filed
July 25, 2001, and issued February 7, 2003.
53. Wada, T., Miyamoto, Y., Murakami, S., Sugaya, A., Ozaki Y., Sawai, T., Matsumoto, S.
et al., 2003. Diagnosis rule structuring method based on failure mode analysis, diagnosis
rule creating program, and failure diagnosis device. JP2003228485, filed February 6,
2002, and issued August 15, 2003.
54. Yatake, H., Konishi, H., and Onishi T. 2009. Fmea sheet creation support system and
creation support program. JP2011008355, filed June 23, 2009.
55. Suzuki, K., Hayata, A., and Yoshioka, M. 2009. Reliability analysis device and method.
JP2011113217, issued November 25, 2009.
56. Kawai, M., Hirai, K., and Aryoshi, T. 1990. Fmea simulation method for analyzing
circuit. JPH0216471, filed July 4, 1988, and issued January 19, 1990.
57. Sonoda, Y., and Kageyama., T. 1992. Plant diagnostic device. JPH086635, filed May 9,
1990, and issued January 23, 1992.

58. Kim, J. H., Kim, I. S., Lee, H. W., and Park, B. O. 2012. A Study on the Role of TRIZ
in DFSS. SAE International Journal of Passenger Cars-Mechanical Systems 5(2012–
01–0068): 22–29.
59. Kimura, F., Hata, T., and Kobayashi, N. 2002. Reliability-centered maintenance plan-
ning based on computer-aided FMEA. Proceeding of the 35th CIRP-International
Seminar on Manufacturing Systems 506–511.
60. Kmenta, S., and Ishii, K. 2000. Scenario-based FMEA: A life cycle cost perspective.
Proceedings of ASME Design Engineering Technical Conference, Baltimore, MD.
61. Kmenta, S., and Ishii, K. 2004. Scenario-based failure modes and effects analysis using
expected cost. Journal of Mechanical Design 126(6): 1027–1035.
62. Kmenta, S., and Ishii, K. 1998. Advanced FMEA using meta behavior modeling for
concurrent design of products and controls. Proceedings of the 1998 ASME Design
Engineering Technical Conferences.
63. Kmenta, S., Cheldelin, B., and Ishii, K. 2003. Assembly FMEA: A simplified method
for identifying assembly errors. ASME 2003 International Mechanical Engineering
Congress and Exposition 315–323.
64. Lee, M. S., and Lee, S., H. 2013. Real-time collaborated enterprise asset management
system based on condition-based maintenance and method thereof. KR20130065800,
filed November 30, 2011, and issued June 24, 2013.
65. Choi, S. H., Kim, G. H., Cho, C. H., and Kim, Y., G. 2013. Reliability centered main-
tenance method for power generation facilities. KR20130118644, filed April 20, 2012,
and issued December 12, 2013.
66. Lim, S. S., and Lee, J., Y. 2014. Intelligent failure asset management system for railway
car. KR20140036375, filed September 12, 2012, and issued March 3, 2014.
67. Ku, C., Chen, Y. S., and Chung, Y. K. 2008. An intelligent FMEA system implemented
with a hierarchy of back-propagation neural networks. Cybernetics and Intelligent
Systems IEEE Conference 203–208.
68. Kutlu, A. C., and Ekmekçioğlu, M. 2012. Fuzzy failure modes and effects analysis by
using fuzzy TOPSIS-based fuzzy AHP. Expert Systems with Applications 39(1): 61–67.
69. Laaroussi, A., Fiès, B., Vankeisbelckt, R., and Hans, J. 2007. Ontology-aided
FMEA  for construction products. Bringing ITC knowledge to work. Proceedings of
W78 Conference 26(29): 6.
70. Lee, B. H. 2001. Using FMEA models and ontologies to build diagnostic models. AI
EDAM 15(4): 281–293.
71. Lindahl, M. 1999. E-FMEA—a new promising tool for efficient design for environment.
Proceedings of Environmentally Conscious Design and Inverse Manufacturing 734–739.
72. Liu, H. T. 2009. The extension of fuzzy QFD: From product planning to part deploy-
ment. Expert Systems with Applications 36(8): 11131–11144.
73. Liu, J., Martínez, L., Wang, H., Rodríguez, R. M., and Novozhilov, V. 2010. Computing
with words in risk assessment. International Journal of Computational Intelligence
Systems 3(4): 396–419.
74. Liu, H. C., Liu, L., Liu, N., and Mao, L. X. 2013. Risk evaluation in failure mode
and effects analysis with extended VIKOR method under fuzzy environment. Expert
Systems with Applications 40(2): 828–838.
75. Grantham, K. 2007. Detailed risk analysis for failure prevention in conceptual design:
RED (Risk in early design) based probabilistic risk assessments.
76. Mader, R., Armengaud, E., Grießnig, G., Kreiner, C., Steger, C., and Weiß, R. 2013.
OASIS: An automotive analysis and safety engineering instrument. Reliability
Engineering & System Safety 120: 150–162.
77. Mandal, S., and Maiti, J. 2014. Risk analysis using FMEA: Fuzzy similarity value and
possibility theory-based approach. Expert Systems with Applications 41(7): 3527–3537.

78. Montgomery, T. A., and Marko, K. A. 1997. Quantitative FMEA automation.
Proceedings of Reliability and Maintainability Symposium 226–228.
79. Moratelli, L., Tannuri, E. A., and Morishita, H. M. 2008. Utilization of FMEA during
the preliminary design of a dynamic positioning system for a Shuttle Tanker. ASME
27th International Conference on Offshore Mechanics and Arctic Engineering, Estoril,
Portugal, pp. 787–796.
80. Ormsby, A. R. T., Hunt, J. E., and Lee, M. H. 1991. Towards an automated FMEA assis-
tant. Applications of Artificial Intelligence in Engineering VI, Springer, the Netherlands
739–752.
81. Ozarin, N. 2008. What’s wrong with bent pin analysis, and what to do about it.
Reliability and Maintainability Symposium, Washington, DC: IEEE Computer Society,
pp. 386–392.
82. Pelaez, C. E., and Bowles, J. B. 1995. Applying fuzzy cognitive-maps knowledge-
representation to failure modes effects analysis. Proceedings of Reliability and
Maintainability Symposium: 450–456.
83. Pang, L. M., Tay, K. M., and Lim, C. P. 2016. Monotone fuzzy rule relabeling for the
zero-order TSK fuzzy inference system. IEEE Transactions on Fuzzy Systems 24(6):
1455–1463.
84. Kim, J. H., Jeong, H. Y., and Park, J. S. 2009. Development of the FMECA process
and analysis methodology for railroad systems. International Journal of Automotive
Technology 10(6): 753.
85. Petrović, D. V., Tanasijević, M., Milić, V., Lilić, N., Stojadinović, S., and Svrkota, I.
2014. Risk assessment model of mining equipment failure based on fuzzy logic. Expert
Systems with Applications 41(18): 8157–8164.
86. Price, C. J. 1996. Effortless incremental design FMEA. Reliability and Maintainability
Symposium Proceedings. IEEE International Symposium on Product Quality and
Integrity 43–47.
87. Regazzoni, D., and Russo, D. 2011. TRIZ tools to enhance risk management. Procedia
Engineering 9: 40–51.
88. Rhee, S. J., and Ishii, K. 2003. Using cost based FMEA to enhance reliability and ser-
viceability. Advanced Engineering Informatics 17(3–4): 179–188.
89. Rhee, S. J., and Ishii, K. 2002. Life cost-based FMEA incorporating data uncertainty.
ASME International Design Engineering Technical Conferences and Computers and
Information in Engineering Conference 309–318.
90. Russomanno, D. J., Bonnell, R. D., and Bowles, J. B. 1994. Viewing computer-aided
failure modes and effects analysis from an artificial intelligence perspective. Integrated
Computer-Aided Engineering 1(3): 209–228.
91. Shahin, A. 2004. Integration of FMEA and the Kano model: An exploratory examina-
tion. International Journal of Quality & Reliability Management 21(7): 731–746.
92. Sharma, R. K., and Sharma, P. 2010. System failure behavior and maintenance decision
making using, RCA, FMEA and FM. Journal of Quality in Maintenance Engineering
16(1): 64–88.
93. Sharma, R. K., Kumar, D., and Kumar, P. 2005. Systematic failure mode effect analy-
sis (FMEA) using fuzzy linguistic modelling. International Journal of Quality  &
Reliability Management 22(9): 986–1004.
94. Sharma, R. K., Kumar, D., and Kumar, P. 2007. Modeling and analysing system failure
behaviour using RCA, FMEA and NHPPP models. International Journal of Quality &
Reliability Management, 24(5): 525–546.
95. Sharma, R. K., Kumar, D., and Kumar, P. 2008. Fuzzy modeling of system behav-
ior for risk and reliability analysis. International Journal of Systems Science 39(6):
563–581.

96. Su, C. T., and Chou, C. J. 2008. A systematic methodology for the creation of Six Sigma
projects: A  case study of semiconductor foundry. Expert Systems with Applications
34(4): 2693–2703.
97. Suganthi, S., and Kumar, D. 2010. FMEA without fear AND tear. Management of
Innovation and Technology (ICMIT) IEEE International Conference 1118–1123.
98. Ming Tan, C. 2003. Customer-focused build-in reliability: A case study. International
Journal of Quality & Reliability Management 20(3): 378–397.
99. Meng Tay, K., and Peng Lim, C. 2006. Fuzzy FMEA with a guided rules reduction
system for prioritization of failures. International Journal of Quality  & Reliability
Management 23(8): 1047–1066.
100. Teng, S. H., and Ho, S. Y. 1996. Failure mode and effects analysis: An integrated
approach for product design and process control. International Journal of Quality &
Reliability Management 13(5): 8–26.
101. Teoh, P. C., and Case, K. 2004. Failure modes and effects analysis through knowledge
modelling. Journal of Materials Processing Technology 153: 253–260.
102. Throop, D. R., Malin, J. T., and Fleming, L. D. 2001. Automated incremental design
FMEA. IEEE Aerospace Conference. Proceedings 7: 7–3458.
103. Johnson, T., Azzaro, S., and Cleary, D., 2004. Method, system and computer prod-
uct for integrating case-based reasoning data and failure modes, effects and corrective
action data. US2004103121, filed November 25, 2002, and issued May 27, 2004.
104. Johnson, T. L., Cuddihy, P. E., and Azzaro, S. H. 2004. Method, system and computer
product for performing failure mode and effects analysis throughout the product life
cycle. US2004225475, filed November 25, 2002, and issued November 11, 2004.
105. Chandler, F. T., Valentino, W. D., Philippart, M. F., Relvini, K. M., Bessette, C.  I.
and  Shedd, N. P. 2004. Human factors process failure modes and effects analysis
(hf pfmea) software tool. US2004256718, filed April 15, 2004, and issued December 23,
2004.
106. Liddy, R., Maeroff, B., Craig, D., Brockers, T., Oettershagen, U., and Davis, T.
2005. Method to facilitate failure modes and effects analysis. US2005138477, filed
November 25, 2003, and issued June 23, 2005.
107. Lonh, K. J., Tyler, D. A., Simpson, T. A., and Jones, N. A. 2006. Method for predict-
ing performance of a future product. US2006271346, filed May 31, 2005, and issued
November 30, 2006.
108. Mosleh, A., Wang, C., and Groen, F. J. 2007. System and methods for assessing risk
using hybrid causal logic. US2007011113, filed March  17, 2006, and issued July  11,
2007.
109. Coburn, J. A., and Weddle, G. B. 2009. Facility risk assessment systems and methods.
US20090138306, filed September 25, 2008, and issued May 28, 2009.
110. Singh, S., Holland, S. W. and Bandyopadhyay, P. 2012. Graph matching system for
comparing and merging fault models. US2012151290, filed December  9, 2010,  and
issued June 14, 2012.
111. Harsh, J. K., Walsh, D. E., and Miller, E., M. 2012. Risk reports for product qual-
ity planning and management. US2012254044, filed March  30, 2012,  and issued
October 4, 2012.
112. Abhulimen, K. E. 2012. Design of computer-based risk and safety management sys-
tem of complex production and multifunctional process facilities-application to fpso’s,
US2012317058, filed June 13, 2011, and issued December 13, 2012.
113. Oh, K., P. 2013. Spreadsheet-based templates for supporting the systems engineering
process. US2013013993, filed August 24, 2011, and issued January 10, 2013.
114. Chang, Y. 2014. Product quality improvement feedback method. US20140081442, filed
September 18, 2012, and issued March 20, 2014.
126 Reliability Engineering

115. Barnard, R. F., Dohanich, S. L., and Heinlein, P., D. 1996. System for failure mode and
effects analysis. US5586252, filed May 24, 1994, and issued December 17, 1996.
116. Williams, E., and Rudoff, A. 2006. System and method for performing automated sys-
tem management. US7120559, filed June 29, 2004, and issued October 10, 2006.
117. Williams, E., and Rudoff, A. 2008. System and method for automated problem diagno-
sis. US7379846, filed June 29, 2004, and issued May 27, 2008.
118. Williams, E., and Rudoff A., 2009. System and method for providing a data structure
representative of a fault tree. US7516025, filed June 29, 2004, and issued April 7, 2009.
119. Dreimann, M., Ehlers, P., Goerisch, A., Maeckel, O., Sporer, R., and Sturm, A. 2007.
Method for analyzing risks in a technical project. US8744893, filed April 11, 2006, and
issued November 1, 2007.
120. Vahdani, B., Salimi, M., and Charkhchian, M. 2015. A new FMEA method by inte-
grating fuzzy belief structure and TOPSIS to improve risk evaluation process.
The International Journal of Advanced Manufacturing Technology 77(1–4): 357–368.
121. Wang, C. S., and Chang, T. R. 2010. Systematic strategies in design process for inno-
vative product development. Industrial Engineering and Engineering Management
Proceedings: 898–902.
122. Wang, M. H. 2011. A cost-based FMEA decision tool for product quality design and
management. IEEE Intelligence and Security Informatics Proceedings 297–302.
123. Wirth, R., Berthold, B., Krämer, A., and Peter, G. 1996. Knowledge-based support of
system analysis for the analysis of failure modes and effects. Engineering Applications
of Artificial Intelligence 9(3): 219–229.
124. Selvage, C. 2007. Look-across system. WO2007016360, filed July 28, 2006, and issued
February 28, 2007.
125. Bovey, R. L., and Senalp, E., T. 2010. Assisting with updating a model for diagnosing
failures in a system, WO2010038063, filed September 30, 2009, and issued April 8,
2010.
126. Snooke, N. A. 2010. Assisting failure mode and effects analysis of a system,
WO2010142977, filed June 4, 2010, and issued December 16, 2010.
127. Snooke, N. A. 2012. Automated method for generating symptoms data for diagnostic
systems, WO2012146908, filed April 12, 2012, and issued November 1, 2012.
128. Xiao, N., Huang, H. Z., Li, Y., He, L., and Jin, T. 2011. Multiple failure modes analysis
and weighted risk priority number evaluation in FMEA. Engineering Failure Analysis
18(4): 1162–1170.
129. Yang, C., Letourneau, S., Zaluski, M., and Scarlett, E. 2010. APU FMEA validation
and its application to fault identification. ASME International Design Engineering
Technical Conferences and Computers and Information in Engineering Conference
959–967.
130. Zafiropoulos, E. P., and Dialynas, E. N. 2005. Reliability prediction and failure mode
effects and criticality analysis (FMECA) of electronic devices using fuzzy logic.
International Journal of Quality & Reliability Management 22(2): 183–200.
131. Yang, Z., Bonsall, S., and Wang, J. 2008. Fuzzy rule-based Bayesian reasoning approach
for prioritization of failures in FMEA. IEEE Transactions on Reliability 57(3), 517–528.
132. Zhao, X., and Zhu, Y. 2010. Research of FMEA knowledge sharing method based on
ontology and the application in manufacturing process. Database Technology and
Applications (DBTA), 2nd International Workshop 1–4.
133. Zhou, J., and Stalhaane, T. 2004. Using FMEA for early robustness analysis of Web-
based systems. In Computer Software and Applications Conference Proceedings (2):
28–29.
5 Markov Chains and Stochastic Petri Nets for Availability and Reliability Modeling
Paulo Romero Martins Maciel, Jamilson Ramalho Dantas, and Rubens de Souza Matos Júnior

CONTENTS
5.1 Introduction
5.2 A Glance at History
5.3 Background
    5.3.1 Markov Chains
    5.3.2 Stochastic Petri Nets
5.4 Availability and Reliability Models for Computer Systems
    5.4.1 Common Structures for Computational Systems Modeling
        5.4.1.1 Cold, Warm, and Hot Standby Redundancy
        5.4.1.2 Active-Active and k-out-of-n Redundancy Mechanisms
    5.4.2 Examples of Models for Computational Systems
        5.4.2.1 Markov Chains
        5.4.2.2 SPN Models
5.5 Final Comments
Acknowledgment
References

5.1 INTRODUCTION
Due to the ubiquitous provision of services on the internet, dependability has become
an attribute of prime concern in hardware/software development, deployment, and
operation. Providing fault-tolerant services is inherently related to the adoption of
redundancy. Redundancy can be exploited either in time or in space. Replication of
services is usually provided through hosts distributed across the world so that whenever
the service, the underlying host, or the network fails, another service is ready to take
over. Dependability of a system can be understood as the ability to deliver a specified
functionality that can be justifiably trusted. Functionality might be a set of roles or
services (functions) observed by an outside agent (a human being, another system,
etc.) that interacts with the system at its interfaces; the specified functionality of a
system is what the system is intended to do.
Two fundamental dependability attributes are reliability and availability. The
task of estimating reliability and availability metrics may be undertaken by adopting
combinatorial models such as reliability block diagrams and fault trees. These mod-
els, however, lack the modeling capacity to represent dynamic redundancies. State-
based models such as Markov chains and stochastic Petri nets have higher modeling
power, but the computation cost for performing the evaluation is usually an issue
to be considered. This chapter studies the reliability and availability modeling of a
system through Markov chains and stochastic Petri nets.
The remainder of this chapter is divided into four sections. Section 5.2 gives a
glance at some key authors and papers of the area. Section 5.3 introduces background
concepts on Markov chains and stochastic Petri nets. Section 5.4 presents some
availability and reliability models for computer systems. Section 5.5 closes the
chapter.

5.2  A GLANCE AT HISTORY


This section provides a summary of early work related to dependability and briefly
describes some seminal efforts as well as their relations to currently prevalent
methods. This account is undoubtedly incomplete; nonetheless, it is intended to cover
key events, people, and noteworthy research related to what is now called
dependability modeling [28].
Dependability is related to disciplines such as fault tolerance and reliability.
The  concept of dependable computing first appeared in the 1820s when Charles
Babbage carried out the initiative to conceive and build a mechanical calculating
engine to eliminate the risk of human error [1,2]. In his book, On the Economy of
Machinery and Manufactures, he remarks “The first objective of every person who
attempts to make an article of consumption is, or ought to be, to produce it in perfect
form” [3]. In the nineteenth century, reliability theory advanced from probability and
statistics as a way to support estimating maritime and life insurance rates. In the early
twentieth century, methods were proposed to estimate the survivorship of railroad
equipment [4,5].
The first IEEE (formerly AIEE and IRE) public document to mention reliability
is “Answers to Questions Relative to High Tension Transmission,” which records the
meeting of the Board of Directors of the American Institute of Electrical Engineers
held on September 26, 1902 [6]. In 1905, H. G. Stott and H. R. Stuart discussed
“Time-Limit Relays and Duplication of Electrical Apparatus to Secure Reliability
of Services” at New York [4] and Pittsburg [5]. In these works, the concept of
reliability was chiefly qualitative. In 1907, A. A. Markov began the study of a notable
sort of chance process. In this process, the outcome of a given experiment can
affect the outcome of the next experiment. This sort of process is now called a
Markov chain [7]. Markov’s classic textbook, Calculus of Probabilities, was
published four times in Russian and was translated into German [9]. In 1926, 20 years
after Markov’s initial discoveries, a paper by Russian mathematician S. N. Bernstein
used the term “Markov chain” [8]. In the 1910s, A. K. Erlang studied telephone traf-
fic planning for reliable service provisioning [10].
The first generation of electronic computers was highly undependable; hence,
many techniques were investigated for improving their reliability, covering both
design strategies and evaluation methods. Many methods were then proposed for
improving system dependability, such as error control codes, replication of
components, comparison monitoring, and diagnostic routines. The leading researchers
during that period were Shannon [13], Von Neumann [14], and Moore [15], who
proposed and developed theories for building reliable systems from redundant,
less reliable components. These theories were the forerunners of the statistical and
probabilistic techniques that form the groundwork of modern dependability
theory [17].
In the 1950s, reliability became a subject of great interest because of the
Cold War efforts, failures of American and Soviet rockets, and failures of the first
commercial jet, the British de Havilland Comet [18,19]. Epstein and Sobel’s 1953
paper on the exponential distribution was a landmark contribution [20]. In 1954, the
first Symposium on Reliability and Quality Control (now the IEEE Transactions
on Reliability) was held in the United States, and in 1958 the First All-Union
Conference on Reliability was held in Moscow [7,21]. In 1957, S. J. Einhorn and
F. B. Thiess applied Markov chains for modeling system intermittence [22], and in
1960 P. M. Anselone employed Markov chains for evaluating the availability of radar
systems [23]. In 1961, Birnbaum, Esary, and Saunders published a pioneering paper
introducing coherent structures [24].
Reliability models may be classified as combinatorial (non-state-space) models
and state-space models. Reliability Block Diagrams (RBD) and Fault Trees
(FT) are combinatorial models and the most widely adopted models in reliability
evaluation. RBD is probably the oldest combinatorial technique for reliability
analysis. Fault Tree Analysis (FTA) was initially developed in 1962 at Bell
Laboratories by H. A. Watson to analyze the Minuteman I Intercontinental Ballistic
Missile Launch Control System. Afterward, in 1962, Boeing and AVCO expanded
the use of FTA to the entire Minuteman II [25]. In 1965, W. H. Pierce unified the
Shannon, Von Neumann, and Moore theories of masking and redundancy as the
concept of failure tolerance [26]. In 1967, A. Avizienis combined masking methods
with error detection, fault diagnosis, and recovery into the concept of fault-tolerant
systems [27].
The formation of the IEEE Computer Society Technical Committee on Fault-Tolerant
Computing (now Dependable Computing and Fault Tolerance TC) in 1970 and
of IFIP Working Group 10.4 on Dependable Computing and Fault Tolerance in 1980
was an essential means for defining a consistent set of concepts and terminology. In the
early 1980s, Laprie coined the term dependability to cover concepts such as reliability,
availability, safety, confidentiality, maintainability, security, and integrity [1,29].
In the late 1970s, some works proposed mapping Petri nets to Markov
chains [30,32,47]. These models have been extensively adopted as high-level models
for the automatic generation of Markov chains and for discrete event simulation.
Natkin was the first to apply what is now generally called stochastic Petri nets (SPNs)
to the dependability evaluation of systems [33].
5.3 BACKGROUND
This section provides a very brief introduction to Continuous Time Markov Chains
(CTMCs) and SPNs, which are the formalisms adopted to model availability and
reliability in this chapter.

5.3.1  Markov Chains


Markov chains have been applied in many areas of science and engineering. They have
been widely adopted for performance and dependability evaluation in manufacturing,
logistics, communication, computer systems, and so forth [34]. The name Markov chain
comes from the Russian mathematician Andrei Andreevich Markov. Markov was born on
June 14, 1856, in Ryazan, Russia, and died on July 20, 1922, in Saint Petersburg [35].
The references list many books on Markov chains [36–40]. These books cover
Markov chain theory and applications in different depths and styles.
A stochastic process is defined as a family of random variables $\{X_i(t) : t \in T\}$
indexed by some parameter t. Each random variable $X_i(t)$ is defined on some
probability space. The parameter t usually represents time, so $X_i(t)$ denotes the value
assumed by the random variable at time t. T is called the parameter space and is a subset
of $\mathbb{R}$ (the set of real numbers).
If T is discrete, that is, $T = \{0, 1, 2, \ldots\}$, the process is classified as a discrete-time
parameter stochastic process. On the other hand, if T is continuous, that is,
$T = \{t : 0 \le t < \infty\}$, the process is a continuous-time parameter stochastic process.
In a CTMC, a change of state may occur at any point in time. A CTMC is a continuous-time,
discrete state-space stochastic process, that is, the state values are discrete, but the
parameter t ranges continuously over $[0, \infty)$.
A CTMC can be represented by a state-transition diagram in which the vertices
represent states and the arcs between vertices i and j are labeled with the respective
transition rates, that is, $\lambda_{ij}$, $i \neq j$. Consider a chain composed of three states,
$s_0$, $s_1$, and $s_2$, and their transition rates, α, β, γ, and λ. The model transitions from
$s_0$ to $s_1$ with rate α; from state $s_1$, the model transitions to state $s_0$ with rate β and to
state $s_2$ with rate γ. When in state $s_2$, the model transitions to state $s_1$ with rate λ.
The rate matrix Q is:

$$Q = \begin{pmatrix} -\alpha & \alpha & 0 \\ \beta & -(\beta + \gamma) & \gamma \\ 0 & \lambda & -\lambda \end{pmatrix}$$

For time-homogeneous CTMCs:

$$\frac{d\Pi(t)}{dt} = \Pi(t) \cdot Q, \tag{5.1}$$

which has the following solution [12,16]:

$$\Pi(t) = \Pi(0)\, e^{Qt} = \Pi(0) \left( I + \sum_{k=1}^{\infty} \frac{(Qt)^k}{k!} \right). \tag{5.2}$$


In many cases, however, the instantaneous behavior, $\Pi(t)$, of the Markov chain is
more than is needed; it often suffices to compute the steady-state probabilities,
that is, $\Pi = \lim_{t \to \infty} \Pi(t)$. Consider the system of differential equations
presented in Equation 5.1. If the steady-state distribution exists, then:

$$\frac{d\Pi(t)}{dt} = 0$$

Consequently, to calculate the steady-state probabilities, it is only necessary to
solve the system:

$$\Pi \cdot Q = 0, \quad \sum_{\forall i} \pi_i = 1. \tag{5.3}$$
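
The system in Equation 5.3 can be solved with any linear solver. The sketch below is
one possible way to do it in plain Python: it replaces one of the (linearly dependent)
balance equations with the normalization condition and applies Gauss-Jordan
elimination. The function name `steady_state` and the numeric rates are assumptions
made for illustration, and the chain is assumed to be irreducible.

```python
def steady_state(q):
    """Solve Pi * Q = 0 with sum(Pi) = 1 for an irreducible CTMC."""
    n = len(q)
    # Balance equations: column j of Q gives sum_i pi_i * q[i][j] = 0,
    # so the coefficient matrix is the transpose of Q.
    a = [[q[i][j] for i in range(n)] for j in range(n)]
    b = [0.0] * n
    # The balance equations are linearly dependent, so the last one is
    # replaced with the normalization condition sum_i pi_i = 1.
    a[-1] = [1.0] * n
    b[-1] = 1.0
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    m = [row + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Three-state chain from Section 5.3.1, with assumed rates (illustrative only).
alpha, beta, gamma, lam = 1.0, 2.0, 1.0, 3.0
Q = [
    [-alpha, alpha, 0.0],
    [beta, -(beta + gamma), gamma],
    [0.0, lam, -lam],
]
pi = steady_state(Q)
print(pi)  # approximately [0.6, 0.3, 0.1]
```

For this chain the result can be checked by hand from the balance equations
$\pi_0 \alpha = \pi_1 \beta$ and $\pi_1 \gamma = \pi_2 \lambda$.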

5.3.2 Stochastic Petri Nets


The first SPN extensions were proposed independently by Symons, Natkin, and
Molloy [30,31,32]. Afterward, many other stochastic extensions were introduced. Marsan
et al. extended the basic SPNs by considering stochastic timed transitions and immediate
transitions [41]. This model was named Generalized Stochastic Petri Nets (GSPN) [43].
Later on, Marsan and Chiola proposed an extension that also supported deterministic
timed transitions [42], which was named Deterministic Stochastic Petri Nets
(DSPN) [46]. Many other extensions followed, among them extended Deterministic
Stochastic Petri Nets (eDSPN) [44,45] and Stochastic Reward Nets (SRN) [48].
The SPN considered here is a very general stochastic extension of Place-Transition
nets. Its modeling capacity is well beyond that of the nets presented by Symons,
Natkin, and Molloy. The original SPN considered only exponential distributions.
GSPNs adopted, besides exponential distributions, immediate transitions. These models
share the memoryless property also present in untimed Petri nets, since the next
reachable marking depends only on the current Petri net marking.
Stochastic Petri Nets: Let $SPN = (P, T, I, O, H, M_0, Atts)$ be an SPN, where P, T,
I, O, and $M_0$ are defined as for Place-Transition nets, that is, P is the set of places, T
is the set of transitions, I is the input matrix, O is the output matrix, and $M_0$ is the
initial marking. The set of transitions, T, is, however, divided into immediate
transitions ($T_{im}$), timed exponentially distributed transitions ($T_{exp}$), deterministic
timed transitions ($T_{det}$), and timed generically distributed transitions ($T_g$):

$$T = T_{im} \cup T_{exp} \cup T_{det} \cup T_g.$$


Immediate transitions are graphically represented by thin black rectangles, timed
exponentially distributed transitions by white rectangles, deterministic timed
transitions by thick black rectangles, and timed generically distributed transitions
by gray rectangles. The matrices I and O represent the input and output arcs of
transitions. These matrices may be marking dependent, that is, the arc weights may
depend on the current marking:

$$I = (i_{p,t})_{|P| \times |T|}, \quad i_{p,t}: MD \times RS_{SPN} \to \mathbb{N},$$

and

$$O = (o_{p,t})_{|P| \times |T|}, \quad o_{p,t}: MD \times RS_{SPN} \to \mathbb{N},$$

where MD = {true, false} is a set that specifies whether the arc between p and t is
marking dependent or not, and $RS_{SPN}$ is the reachability set of the net SPN. If the arc
is marking dependent, the arc weight depends on the current marking $M \in RS_{SPN}$.
Otherwise, it is constant.

$$H = (h_{p,t})_{|P| \times |T|}, \quad h_{p,t}: MD \times RS_{SPN} \to \mathbb{N}$$

is the matrix of inhibitor arcs. These arcs may also be marking dependent in the same
way: if the arc between p and t is marking dependent, its weight depends on the
current marking $M \in RS_{SPN}$; otherwise, it is constant.

Atts = (Π, Dist, MDF, W, G, Policy, Concurrency) is the set of attributes
assigned to transitions, where:

• $\Pi: T \to \mathbb{N}$ is a function that assigns a firing priority to transitions.
The larger the number, the higher the firing priority. Immediate transitions
have higher priorities than timed transitions, and timed deterministic
transitions have higher priorities than random timed transitions, that is,
$\Pi(t_i) > \Pi(t_j) > \Pi(t_k)$, $t_i \in T_{im}$, $t_j \in T_{det}$, and $t_k \in T_{exp} \cup T_g$.
• $Dist: T_{exp} \cup T_g \to F$ is a function that assigns a non-negative probability
distribution function to each random delay transition. F is the set of such functions.
• $MDF: T \to MD$ is a function that defines whether the probability distribution
functions assigned to the delays of transitions are marking dependent or not.
MD = {true, false}.
• $W: T_{exp} \cup T_{det} \cup T_{im} \to \mathbb{R}^{+}$ is a function that assigns a non-negative real
number to exponential, deterministic, and immediate transitions. For exponential
transitions, these values correspond to the parameters of the exponential
distributions (rates). For deterministic transitions, they are the deterministic
delays assigned to the transitions. For immediate transitions, they denote the
weights assigned to the transitions.
• $G: T \to \mathbb{N}^{|P|}$ is a partial function that assigns a guard expression to
transitions. The guards are evaluated by $GE: (T \to \mathbb{N}^{|P|}) \to \{true, false\}$, which
results in true or false. The guard expressions are Boolean formulas composed of
predicates over the marking of places. A transition may be enabled only if its guard
evaluates to true. It is worth noting that not every transition needs to have a guard.
• $Policy: T \to \{prd, prs\}$, where prd denotes pre-emptive repeat different
(restart) and prs denotes pre-emptive resume (continue). The timers of transitions
with prd are discarded, and new values are generated in the new marking.
The timers of transitions with prs hold their present values.
• $Concurrency: T \setminus T_{im} \to \{sss, iss\}$ is a function that assigns a timing
semantics to each timed transition, where sss denotes single server semantics
and iss denotes infinite server semantics.

SPNs are usually evaluated through numerical methods. However, if the state space
is too big or infinite, or if non-phase-type distributions must be represented,
simulation may be the only evaluation option. With simulation, there are no
fundamental restrictions on the models that can be evaluated. Nevertheless, simulation
does have practical constraints, since the amount of computer time and memory
needed to run a simulation can be prohibitively large. Therefore, the general advice is
to pursue an analytical model wherever possible, even if simplifications and/or
decomposition are required.
For a detailed introduction to SPNs, refer to [43,45].

5.4 AVAILABILITY AND RELIABILITY MODELS FOR COMPUTER SYSTEMS
Dependability aspects deserve great attention for assuring the quality of service
provided by a computer system. Dependability studies aim to determine reliability,
availability, security, and safety metrics for the infrastructure under analysis [50].
RBDs [51], FTs [53], Petri nets, and Markov chains are widely used to capture
the system behavior and to describe and predict dependability metrics.
The  most basic dependability aspects of a system are the failure and repair
events, which may bring the system to different configurations and operational
modes. The steady-state availability is a common measure extracted from depend-
ability models. Reliability, downtime, uptime, and mean time to system failure are
other metrics usually obtained as output from a dependability analysis in computer
systems.
The  combined analysis of performance and dependability aspects, so-called
performability analysis, is another frequent necessity when dealing with computer
systems, since many of them may continue working after partial failures. Such grace-
fully degrading systems [54] require specific methods to achieve an accurate evalu-
ation of their metrics. Markov reward models constitute an essential framework for
performability analysis. In this context, the hierarchical modeling approach is also
a useful alternative in which distinct models may be used to represent the depend-
ability relationships of the system in the upper level and performance aspects in the
lower level, or vice versa [49,55,58].
For all kinds of Markov chain or SPN analyses, an important assumption must
be kept in mind: the state holding times (in Markov chains) or the transition firing
delays (in SPNs) are exponentially distributed. The behavior of events in many
computer systems may fit other probability distributions better, but in some of these
situations, the exponential distribution is a fair approximation, enabling the use of
Markovian models. When the exponential distribution is not a reasonable
approximation, SPN extensions that support non-exponential distributions may be
used. Such a deviation from Markovian assumptions requires the adoption of
simulation for a model solution [57,59–61]. It is also possible to adapt transitions to
represent other distributions by employing phase approximation or moment matching,
as shown in [36,52]. The use of such techniques allows the modeling of events
described by distributions such as Weibull, hypoexponential, hyperexponential,
Erlang, and Cox [13,16].

5.4.1 Common Structures for Computational Systems Modeling


Consider a single component repairable system. This system may be either
operational or in failure. If the time to failure (TTF) and the time to repair (TTR) are
exponentially distributed with rates λ and µ, respectively, the CTMC shown in
Figure 5.1a is its availability model. The state U (Up) represents the operational
state, and the state D (Down) denotes the faulty system. If the system is operational,
it may fail. The system failure is represented by the transition from state U to state
D. The faulty system may be restored to its operational state by a repair. The repair is
represented by the transition from state D to state U. The rate matrix, Q, is presented
in Figure 5.1b.
The instantaneous availability and unavailability are the instantaneous probabilities
of being in states U and D, respectively. Assuming the system starts in state U:

$$A(t) = \pi_U(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu) t} \tag{5.4}$$

and

$$UA(t) = \pi_D(t) = \frac{\lambda}{\lambda + \mu} - \frac{\lambda}{\lambda + \mu}\, e^{-(\lambda + \mu) t}, \tag{5.5}$$

such that $\pi_U(t) + \pi_D(t) = 1$.

If $t \to \infty$, then the steady-state availability and unavailability are obtained,
respectively:

$$A = \pi_U = \frac{\mu}{\lambda + \mu} \tag{5.6}$$

and

$$UA = \pi_D = \frac{\lambda}{\lambda + \mu}, \tag{5.7}$$

such that $\pi_U + \pi_D = 1$. The steady-state measures can also be obtained by solving:

$$\Pi \cdot Q = 0, \quad \pi_U + \pi_D = 1,$$

where $\Pi = (\pi_U, \pi_D)$. The downtime in a period T is $DT = \pi_D \times T$. For a period of
1 year (365 days), T is 8760 h, or 525,600 min.

FIGURE 5.1 Single component system: (a) Availability model and (b) Matrix rate.

Now assume a CTMC that represents only the system failure. This model has two
states, U and D, and only one transition. This transition represents the system failure;
that is, when the system is operational (U), it may fail, and this event is represented by
the transition from state U to state D, with failure rate λ. Solving:

$$\frac{d\Pi(t)}{dt} = \Pi(t) \cdot Q,$$

where $\Pi(t) = (\pi_U(t), \pi_D(t))$ and $\pi_U(t) + \pi_D(t) = 1$, we obtain
$\pi_U(t) = e^{-\lambda t}$ and $\pi_D(t) = 1 - e^{-\lambda t}$. The system reliability is:

$$R(t) = \pi_U(t) = e^{-\lambda t} \tag{5.8}$$

and the unreliability is:

$$UR(t) = \pi_D(t) = 1 - e^{-\lambda t}. \tag{5.9}$$

It is worth mentioning that $UR(t) = F(t)$, where $F(t)$ is the cumulative distribution
function of the time to failure. Consequently, as $MTTF = \int_0^{\infty} R(t)\, dt$, we have
$MTTF = \int_0^{\infty} e^{-\lambda t}\, dt = \frac{1}{\lambda}$.
The mean time to failure (MTTF) can also be computed from the rate matrix
Q [56,65].
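
The closed-form results for the single-component system (Equations 5.4, 5.6, and 5.8,
the downtime formula, and the MTTF integral) can be checked numerically. The sketch
below assumes an MTTF of 1000 h and a mean time to repair of 10 h; these values, the
function names, and the integration settings are illustrative assumptions, not data
from this chapter.

```python
import math

HOURS_PER_YEAR = 8760.0


def availability(t, lam, mu):
    """Instantaneous availability A(t), Equation 5.4 (system starts in U)."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)


def steady_state_availability(lam, mu):
    """Steady-state availability, Equation 5.6."""
    return mu / (lam + mu)


def reliability(t, lam):
    """Reliability R(t), Equation 5.8."""
    return math.exp(-lam * t)


def mttf_numeric(lam, steps=200_000):
    """Trapezoidal estimate of the integral of R(t) over [0, 50 / lam]."""
    upper = 50.0 / lam  # the tail beyond this point is negligible
    h = upper / steps
    inner = sum(reliability(k * h, lam) for k in range(1, steps))
    ends = 0.5 * (reliability(0.0, lam) + reliability(upper, lam))
    return h * (ends + inner)


lam = 1.0 / 1000.0  # assumed failure rate (MTTF of 1000 h)
mu = 1.0 / 10.0     # assumed repair rate (mean time to repair of 10 h)

a_ss = steady_state_availability(lam, mu)
downtime_h = (1.0 - a_ss) * HOURS_PER_YEAR  # DT = pi_D * T
print(f"A = {a_ss:.6f}, annual downtime = {downtime_h:.2f} h")
print(f"A(24 h) = {availability(24.0, lam, mu):.6f}")
print(f"MTTF: numeric {mttf_numeric(lam):.3f} h, closed form {1.0 / lam:.3f} h")
```

With these assumed rates, the unavailability is 1/101, so the annual downtime is
8760/101 h, roughly 86.7 h.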

5.4.1.1  Cold, Warm, and Hot Standby Redundancy


Systems with stringent dependability requirements demand methods for detecting,
correcting, avoiding, and tolerating faults and failures. A failure in a large-scale sys-
tem can mean catastrophic losses. Many techniques have been proposed and adopted
to address dependability issues in computer systems in such a way that failures can
be tolerated and circumvented. Many of those techniques are based on redundancy,
i.e., the replication of components so that they work for a common purpose, ensuring
data security and availability even in the event of some component failure. Three
replication techniques deserve special attention due to their extensive use in clustered
server infrastructures [28]:

• Cold Standby: The backup nodes, or modules, are turned off while on standby
and are only activated if the primary node fails. One positive point
of this technique is that the secondary node has low energy consumption.
While in standby mode, the reliability of the unit is preserved, i.e., it will
not fail, or at least its mean time to failure is expected to be much higher
than that of a fully active component. On the other hand, the secondary node
needs significant time to be activated, and clients who were accessing information
on the primary node lose all information with the failure of the primary
node and must redo much of the work when the secondary node activates.
• Hot Standby: This type can be considered the most transparent of the
replication modes. The replicated modules are synchronized with the operating
module; thereby, the active and standby cluster participants are seen by
the end user as a single resource. After a node fails, the secondary node
is activated automatically, and the users accessing the primary node will
now access the secondary node without noticing the change of equipment.
• Warm Standby: This technique tries to balance the costs and the recovery
time delay of the cold and hot standby techniques. The secondary node is on
standby, but not completely turned off, so it can be activated faster than
in the cold standby technique, as soon as a monitor detects the failure of
the primary node. The replicated node is only partially synchronized with the
operating node, so users who were accessing information on the operating
node may lose some information that was being written close to the moment
when the primary node failed. It is common to assume that in such a state
the standby component has higher reliability than when receiving the
workload (i.e., properly working).

Figure 5.2 depicts an example SPN for a cold-standby server system comprising two
servers (S1 and S2). Two places (S1_Up and S1_Down) represent the
operational status of the primary server, indicating when it is working or has failed,
respectively. Three places (S2_Up, S2_Down, and S2_Waiting) represent the
operational status of the spare server, indicating when it is working, failed, or waiting
for activation in case of a primary server failure.
Notice that in the initial state of the cold-standby model, both places S1_Up and
S2_Waiting have one token, denoting that the primary server is up and the spare server
is in standby mode. The activation of the spare server occurs when the transition
S1_Fail fires, consuming the token from S1_Up. Once the place S1_Up is empty, the
transition S2_Switch_On becomes enabled, due to the inhibitor arc that connects it to
S1_Up. Hence, S2_Switch_On fires, removing the token from S2_Waiting and putting
one token in place S2_Up. This represents the switchover process from
the primary server to the secondary server, which takes an activation time specified
by the S2_Switch_On firing delay.
The repair of the primary server is represented by firing the S1_Repair transition.
The places S1_Down and S2_Up become empty, and S1_Up receives one token again.
As previously mentioned, the times to failure of the primary and secondary servers will
be different because the spare server is preserved from the effects of wear and tear when
it is shut off or in standby mode. The availability can be numerically obtained
from the expression:

$$A = P\big( (\#S1\_Up = 1) \lor (\#S2\_Up = 1) \big)$$
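
The failure and switchover steps described for the cold-standby net, together with the
marking predicate inside the availability expression, can be sketched as a small token
game. The encoding below is hypothetical: it covers only the two transitions whose arcs
are spelled out in the text, it assumes that S1_Fail deposits its token in S1_Down, and
it assumes all arc weights are 1. It ignores timing, so it illustrates the enabling
rule with an inhibitor arc rather than computing the availability itself.

```python
# Hypothetical encoding of part of the cold-standby net: each transition has
# input arcs, output arcs, and inhibitor arcs, all with weight 1.
TRANSITIONS = {
    "S1_Fail": {
        "inputs": {"S1_Up": 1},
        "outputs": {"S1_Down": 1},  # assumed from the place names
        "inhibitors": {},
    },
    "S2_Switch_On": {
        "inputs": {"S2_Waiting": 1},
        "outputs": {"S2_Up": 1},
        "inhibitors": {"S1_Up": 1},  # disabled while S1_Up holds a token
    },
}


def enabled(name, marking):
    """A transition is enabled if inputs are covered and no inhibitor blocks it."""
    t = TRANSITIONS[name]
    has_tokens = all(marking[p] >= w for p, w in t["inputs"].items())
    not_inhibited = all(marking[p] < w for p, w in t["inhibitors"].items())
    return has_tokens and not_inhibited


def fire(name, marking):
    """Return the marking reached by firing an enabled transition."""
    if not enabled(name, marking):
        raise ValueError(f"{name} is not enabled")
    t = TRANSITIONS[name]
    new_marking = dict(marking)
    for p, w in t["inputs"].items():
        new_marking[p] -= w
    for p, w in t["outputs"].items():
        new_marking[p] += w
    return new_marking


def system_up(marking):
    """Predicate inside the availability expression."""
    return marking["S1_Up"] == 1 or marking["S2_Up"] == 1


m0 = {"S1_Up": 1, "S1_Down": 0, "S2_Up": 0, "S2_Down": 0, "S2_Waiting": 1}
m1 = fire("S1_Fail", m0)       # primary fails; spare is still waiting
m2 = fire("S2_Switch_On", m1)  # inhibitor arc no longer blocks the switchover
print(system_up(m0), system_up(m1), system_up(m2))  # True False True
```

The availability is then the steady-state probability of the set of markings for which
`system_up` holds, which a tool obtains from the full timed model.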

Figure 5.3 depicts an example CTMC for a warm-standby server system, originally
shown in [49]. This model has many similarities to the SPN model for the
cold-standby system, despite the distinct semantics and notation. It might be
interesting to verify that both approaches can be used interchangeably, mainly when the
state-space size is not a major concern.

FIGURE 5.2 SPN for cold standby redundancy. (Places: S1_Up, S1_Down, S2_Up,
S2_Down, S2_Waiting; transitions: S1_Fail, S1_Repair, S2_Fail, S2_Repair, S2_Switch_On.)

FIGURE 5.3 CTMC for warm standby redundancy.

The CTMC has five states (UW, UF, FF, FU, and FW) and considers one primary
and one spare server. The first letter in each state indicates the primary server
status, and the second letter indicates the secondary server status. The letter U
stands for Up and active, F means Failed, and W indicates the Waiting condition (i.e.,
the server is up but in standby, waiting for activation). The shaded states represent
system failure (i.e., the system is not operational). In state UW, the primary
server (S1) is functional and the secondary server (S2) is in standby.
When S1 fails, the system goes to state FW, where the secondary server has not yet
detected the S1 failure. FU represents the state where S2 has left the waiting
condition and assumed the active role while S1 is failed. If S2 fails before taking the
active role, or before the repair of S1, the system goes to state FF, in which both
servers have failed. For this model, we consider a setup where the primary server
repair has priority over the secondary server repair. Therefore, when both servers
have failed (state FF), there is only one possible outgoing transition: from FF
to UF. If S2 fails when S1 is up, the system goes to state UF and returns to state
UW when the S2 repair is accomplished. Otherwise, if S1 also fails before S2 is
repaired, the system transitions to state FF. The failure rate of S1 and of S2, when
they are active, is denoted by λ1. The rate λ2 denotes the failure rate of the
secondary server when it is inactive. The repair rate assigned to both S1 and S2 is µ.
The rate α represents the switchover rate (i.e., the reciprocal of the mean time to
activate the secondary server after a failure of S1).
The warm standby system availability is computed from the CTMC model by
summing up the steady-state probabilities for the UW, UF, and FU states, which denote
the cases where the system is operational. Therefore, A = π_UW + π_UF + π_FU. System
unavailability might be computed as U = 1 − A, but also as U = π_FF + π_FW.
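As a rough illustration, the steady-state probabilities of the five-state chain can be obtained by solving the balance equations numerically. The sketch below makes several assumptions that are not in the text: the transition list is reconstructed from the description of Figure 5.3 (in particular, the repair of S1 in state FU is assumed to lead back to UW, and no repair is modeled out of FW), the parameter values are hypothetical, and the helper names are illustrative only.

```python
# Steady-state solution of the warm-standby CTMC, using only the standard library.
# Transition list reconstructed from the prose; parameter values are hypothetical.

STATES = ["UW", "FW", "FU", "FF", "UF"]


def build_rates(lam1, lam2, mu, alpha):
    return {
        ("UW", "FW"): lam1,   # active primary server fails
        ("UW", "UF"): lam2,   # standby server fails while inactive
        ("FW", "FU"): alpha,  # switchover completes
        ("FW", "FF"): lam2,   # spare fails before taking the active role
        ("FU", "FF"): lam1,   # active spare fails before S1 is repaired
        ("FU", "UW"): mu,     # assumption: S1 repaired, S2 returns to standby
        ("FF", "UF"): mu,     # primary repair has priority
        ("UF", "UW"): mu,     # S2 repaired
        ("UF", "FF"): lam1,   # S1 fails while S2 is still failed
    }


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def steady_state(states, rates):
    """Solve pi Q = 0 together with sum(pi) = 1."""
    n = len(states)
    idx = {s: i for i, s in enumerate(states)}
    q = [[0.0] * n for _ in range(n)]
    for (src, dst), rate in rates.items():
        q[idx[src]][idx[dst]] += rate
        q[idx[src]][idx[src]] -= rate
    # Transpose Q and replace the last balance equation by the normalization.
    a = [[q[j][i] for j in range(n)] for i in range(n)]
    a[-1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    return dict(zip(states, solve_linear(a, b)))


# Hypothetical rates per hour: MTTF 1,000 h (active), 5,000 h (standby),
# MTTR 8 h, mean activation time 1 minute.
pi = steady_state(STATES, build_rates(1 / 1000, 1 / 5000, 1 / 8, 60.0))
availability = pi["UW"] + pi["UF"] + pi["FU"]
unavailability = pi["FF"] + pi["FW"]
```

Changing `alpha` or `lam2` in the call to `build_rates` gives a quick way to see how the switchover time and the standby failure rate affect the result.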
A CTMC model for a cold standby system can be created with minor adjustments
to the warm standby model, described as follows. The switchover rate (α) must be
reduced to reflect a longer activation time. The transition from UW to
the UF state should be removed if the spare server is not assumed to fail while
inactive. If such a failure is possible, the failure rate (λ2) should be adjusted to
match the longer mean time to failure expected for a spare server that is partially or
entirely turned off.
A CTMC model for a hot standby system can also be derived from the warm
standby model by increasing the switchover rate (α) to reflect a shorter
activation time, or even by removing state FW to allow a direct transition from UW to
FU if the switching time from primary to spare server is negligible. In either
case, the failure rate of the spare server (λ2) should be replaced by the failure rate
of the primary server (λ1), since the mean time to failure is expected to be the same
for both components.

5.4.1.2  Active-Active and k-out-of-n Redundancy Mechanisms


Active-active redundancy means that two operational units share the workload, but
the workload can be served with acceptable quality by a single unit.
The concept of active-active redundancy can be generalized by assuming that a
system may depend strictly on only a subset of its components. Consider a system
composed of n identical and independent components that is operational if at least
k out of its n components are working correctly. This sort of redundancy is named
k-out-of-n.
Combinatorial models, such as RBD [62], are widely used for representing
k-out-of-n arrangements, but such arrangements might also be modeled and analyzed with
CTMC models with equivalent accuracy and even more flexibility [28,57]. Figure 5.4
depicts an example of a CTMC model for a 3-out-of-5 redundant server system.

FIGURE 5.4  CTMC for 3-out-of-5 redundancy.

In such a CTMC, the 5U state represents that all five servers are operational.
The failure rate of a single server is denoted by λ, whereas the repair rate is denoted
by µ. The transition from state 5U to state 4U occurs with rate 5λ: the failures of
the units are statistically independent and exponentially distributed, as assumed in a
Markov chain, so the time until the first of five units fails is itself exponentially
distributed with rate 5λ. Similarly, the model goes from state 4U to state 3U with a
rate of 4λ, since only four operational servers remain. If the model is in state 3U,
another failure (rate 3λ) brings it to the Down state, which represents that the whole
system is no longer operational; the other servers are turned off, and hence no other
failure can occur. Only the repair of at least one server can bring the system to an
operational state again. This model considers that only one server can be repaired at
a time, which may be the case in many companies where the maintenance team has a
limited number of members or limited equipment for the repair. For this reason, the
repair occurs with rate µ for all transitions outgoing from the Down, 3U, and 4U
states.
The availability for such a system may be computed as:

$$A = P\{5U\} + P\{4U\} + P\{3U\} \tag{5.10}$$

$$A = 1 - \frac{60\lambda^3}{60\lambda^3 + 20\lambda^2\mu + 5\lambda\mu^2 + \mu^3}$$
3

The capacity-oriented availability (COA) makes it possible to estimate how much
service the system can deliver considering failure states [63,64]:

$$COA = \frac{5 \times P\{5U\} + 4 \times P\{4U\} + 3 \times P\{3U\}}{5} \tag{5.11}$$

$$COA = \frac{\mu \left( 12\lambda^2 + 4\lambda\mu + \mu^2 \right)}{60\lambda^3 + 20\lambda^2\mu + 5\lambda\mu^2 + \mu^3}$$

The mean time to failure is:

$$MTTF = \frac{400\lambda^4 + 275\lambda^3\mu + 107\lambda^2\mu^2 + 13\lambda\mu^3 + \mu^4}{60\lambda^3 \left( 20\lambda^2 + 5\lambda\mu + \mu^2 \right)}$$

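The metrics of the 3-out-of-5 chain can be cross-checked numerically. The following sketch uses hypothetical values for λ and µ, computes the steady-state probabilities from the balance equations of the birth-death chain, and evaluates availability and COA from Equations 5.10 and 5.11. For the MTTF, the Down state is made absorbing and the expected absorption times are obtained from a small linear system; the closed-form MTTF above is reproduced when the starting state is drawn from the steady-state distribution restricted to the up states, which is our reading of the formula, since the text does not state the initial condition.

```python
import math


def steady_state(lam, mu):
    """Steady-state probabilities of the chain 5U <-> 4U <-> 3U <-> Down."""
    weights = {"5U": 1.0}
    weights["4U"] = weights["5U"] * 5 * lam / mu   # balance across 5U | 4U
    weights["3U"] = weights["4U"] * 4 * lam / mu   # balance across 4U | 3U
    weights["Down"] = weights["3U"] * 3 * lam / mu  # balance across 3U | Down
    total = sum(weights.values())
    return {state: w / total for state, w in weights.items()}


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mean_times_to_down(lam, mu):
    """Expected time to reach Down from 5U, 4U, and 3U (Down absorbing)."""
    q_up = [
        [-5 * lam, 5 * lam, 0.0],
        [mu, -(4 * lam + mu), 4 * lam],
        [0.0, mu, -(3 * lam + mu)],
    ]
    return solve_linear(q_up, [-1.0, -1.0, -1.0])


lam, mu = 1 / 1000, 1 / 10  # hypothetical failure and repair rates (per hour)

pi = steady_state(lam, mu)
availability = pi["5U"] + pi["4U"] + pi["3U"]
coa = (5 * pi["5U"] + 4 * pi["4U"] + 3 * pi["3U"]) / 5

denominator = 60 * lam**3 + 20 * lam**2 * mu + 5 * lam * mu**2 + mu**3
availability_closed = 1 - 60 * lam**3 / denominator

t5, t4, t3 = mean_times_to_down(lam, mu)
# Start in an up state chosen according to the conditional steady-state distribution.
mttf_steady_start = (pi["5U"] * t5 + pi["4U"] * t4 + pi["3U"] * t3) / availability
mttf_closed = (
    400 * lam**4 + 275 * lam**3 * mu + 107 * lam**2 * mu**2
    + 13 * lam * mu**3 + mu**4
) / (60 * lam**3 * (20 * lam**2 + 5 * lam * mu + mu**2))
```

Under the same assumptions, `t5` is the mean time to failure when the system starts with all five servers up, which is somewhat larger than `mttf_steady_start`.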
5.4.2 Examples of Models for Computational Systems
To demonstrate how to analyze the availability and reliability of computing systems,
the example architecture presented in Figure 5.5 is used. The system is composed
of a switch/router and a server subsystem. The system fails if the switch/router
fails or if the server subsystem fails. The server subsystem comprises two servers, S1
and S2. S1 is the main server, and S2 is the spare server. They are configured in a
cold standby arrangement, that is, S2 starts as soon as S1 fails. The start-up time of
S2 may be configured according to the adopted switching mechanism. If the start-up
time of S2 is equal to zero, the switching is perfect.
To compute the availability and reliability of such a system, a modeling strategy
consisting of Markov chains and SPN models is used.

5.4.2.1  Markov Chains


5.4.2.1.1  Availability CTMC Model
The architecture described in Figure 5.5 enables availability analysis through a
heterogeneous modeling approach. Many formalisms may be used to compute such
metrics. However, the redundancy mechanism used in the system requires the use
of state-based models, such as Markov chains or SPNs. Therefore, this example
depicts the use of a CTMC model to compute availability and reliability measures.
Figure 5.6 represents the CTMC availability model. The CTMC represents the
detailed behavior of the system, which employs redundancy, under the assumption that
the start-up time of S2 is zero. The CTMC has six states, each written as a tuple:
(D,S2,D), (S1,S2,D), (S1,S2,SR), (D,S2,SR), (D,D,D), and (D,D,SR). It considers one
primary and one spare server, S1 and S2 respectively, and one switch/router.

FIGURE 5.5  A simple example. (Elements shown: clients, switch/router, and servers S1
and S2.)

FIGURE 5.6  CTMC availability model. (States: (S1,S2,SR), (D,S2,SR), (D,D,SR),
(S1,S2,D), (D,S2,D), and (D,D,D); transitions labeled with the rates λ_S1, λ_S2, λ_SR,
µ_S1, µ_S2, µ_SR, and µ_S.)

Each state name comprises three parts. The first represents server one (S1),
the second denotes server two (S2), and the third describes the switch/router
component (SR). S1 denotes that server S1 is running and operational, S2 denotes that
server S2 is running and operational, and SR denotes that the switch/router is running
and operational. The letter D represents the failure state of the corresponding
component. The initial state (S1,S2,SR) represents that the primary server (S1) is
running and operational, the secondary server (S2) is the spare server, and the
switch/router (SR) is functional. When S1 fails, the system goes from (S1,S2,SR) to
(D,S2,SR); when S1 is repaired, the system returns to the initial state. Once in the
state (D,S2,SR), the system may go to the state (D,S2,D) through the SR failure, or to
the state (D,D,SR) through the S2 failure. In both cases, the system may return to the
previous state with the SR repair rate or the S2 repair rate, respectively. As soon as
the state (D,D,SR) is reached, the system may go to the state (D,D,D) with the SR
failure, or return to the initial state (S1,S2,SR) when the repair of both servers S1
and S2 is accomplished. The failure rates of S1, S2, and SR are denoted by λ_S1,
λ_S2, and λ_SR, respectively, and the corresponding repair rates by µ_S1,
µ_S2, and µ_SR. The rate µ_S denotes the repair rate when the two servers are in a
failure state.
The CTMC that represents the architecture enables obtaining a closed-form equation
to compute the availability (see Equation 5.12). It is important to stress that the
equation assumes µ_S1 = µ_S2 = µ_SR = µ and λ_S1 = λ_S2 = λ.

$$A = \frac{\mu \left( \mu(\mu + \mu_S) + \lambda(\mu + 2\mu_S) \right)}{(\lambda_{SR} + \mu) \left( \lambda^2 + \mu(\mu + \mu_S) + \lambda(\mu + 2\mu_S) \right)} \tag{5.12}$$

5.4.2.1.2  Reliability CTMC Model
Figure 5.7 depicts the CTMC reliability model for this architecture. The main
characteristic of a reliability model is the absence of repair from the system failure
state, i.e., once the system reaches the failure state, no repair is considered. This
makes it easier to compute the system mean time to failure and, subsequently,
the reliability metric. The reliability model has three states: (S1,S2,SR),
(D,S2,SR), and Down. The initial state (S1,S2,SR) represents all components
running. If S1 fails, the system goes to the (D,S2,SR) state; this event represents
that, even with the failure of the S1 server, the system may continue operating
with the secondary server (S2). When S1 is repaired, the system returns to the
initial state. The transition from (S1,S2,SR) to Down, which occurs when SR fails,
represents the system failure; the system is then offline and cannot provide the
service. Once in the (D,S2,SR) state, the system may go to the Down state with the S2
failure rate or the SR failure rate. The Down state is absorbing, so the probability of
not having reached it by time t gives the reliability metric. The up states of the
system are (S1,S2,SR) and (D,S2,SR).

FIGURE 5.7  CTMC reliability model. (Transitions: (S1,S2,SR) to (D,S2,SR) with rate
λ_S1; (D,S2,SR) to (S1,S2,SR) with rate µ_S1; (S1,S2,SR) to Down with rate λ_SR;
(D,S2,SR) to Down with rate λ_SR + λ_S2.)


5.4.2.1.3 Results
Table 5.1 presents the values of the failure and repair rates, which are the
reciprocals of the MTTF and mean time to repair (MTTR) of each component represented in
Figures 5.6 and 5.7. Those values were estimated and were used to compute the
availability and reliability metrics.
It is important to stress that µ_S is half of µ_S1 (i.e., the mean repair time is twice
as long), considering that just one maintenance team repairs both servers. The
availability and reliability measures were computed for the architecture described in
Figure 5.5, using the mentioned input parameters. The results are shown in Table 5.2,
including steady-state availability, number of nines, annual downtime, reliability,
and unreliability, considering 4,000 h of activity.
The downtime provides a view of how much time the system is unavailable to its
users over 1 year. The downtime value of 10.514278 h means that the system accumulates
about 10.5 hours of total outage per year, which indicates that the system can be
improved. At 4,000 h of activity, the system has a reliability a little over
80 percent.
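The figures in Table 5.2 can be approximately reproduced with a few lines of code. The sketch below evaluates Equation 5.12 with the rates of Table 5.1 and derives the number of nines and the annual downtime; a year of 8,760 h is an assumption. For the reliability, it uses the three-state model of Figure 5.7 and the standard closed form of the matrix exponential of a 2 × 2 generator restricted to the two up states. The function names are illustrative and are not taken from any tool mentioned in the chapter.

```python
import math

# Rates from Table 5.1 (per hour).
LAMBDA_SR = 1 / 20000
LAMBDA_S = 1 / 15000   # lambda_S1 = lambda_S2
MU = 1 / 24            # mu_S1 = mu_S2 = mu_SR
MU_S = 1 / 48
HOURS_PER_YEAR = 8760  # assumption: 365-day year


def availability(lam, lam_sr, mu, mu_s):
    """Steady-state availability according to Equation 5.12."""
    common = mu * (mu + mu_s) + lam * (mu + 2 * mu_s)
    return mu * common / ((lam_sr + mu) * (lam**2 + common))


def reliability(t, lam, lam_sr, mu):
    """Probability that Down has not been reached by time t (Figure 5.7)."""
    # Generator restricted to the up states (S1,S2,SR) and (D,S2,SR).
    a, b = -(lam + lam_sr), lam
    c, d = mu, -(mu + lam + lam_sr)
    trace, det = a + d, a * d - b * c
    root = math.sqrt(trace**2 - 4 * det)
    r1, r2 = (trace + root) / 2, (trace - root) / 2
    e1, e2 = math.exp(r1 * t), math.exp(r2 * t)
    # First row of exp(M t), since the system starts in (S1,S2,SR):
    # exp(M t) = [e1 (M - r2 I) - e2 (M - r1 I)] / (r1 - r2)
    p_both_up = (e1 * (a - r2) - e2 * (a - r1)) / (r1 - r2)
    p_s1_down = (e1 - e2) * b / (r1 - r2)
    return p_both_up + p_s1_down


A = availability(LAMBDA_S, LAMBDA_SR, MU, MU_S)
nines = -math.log10(1 - A)
downtime_hours_per_year = (1 - A) * HOURS_PER_YEAR
R_4000 = reliability(4000, LAMBDA_S, LAMBDA_SR, MU)
```

With these inputs the computed values should land close to those reported in Table 5.2; small differences in the last digits can come from rounding in the tool used by the authors.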

5.4.2.2  SPN Models


5.4.2.2.1  Availability SPN Model
An SPN model may be used to represent the same system already analyzed with the
CTMC model discussed in the previous section, and to obtain availability and
reliability measures similarly.

TABLE 5.1
Input Parameters
Variable                  Value (h^-1)
λ_SR                      1/20,000
λ_S1 = λ_S2               1/15,000
µ_S1 = µ_S2 = µ_SR        1/24
µ_S                       1/48

TABLE 5.2
CTMC Results
Metric                    Value
Availability              0.9987997
Number of nines           2.9207247
Downtime (h/yr)           10.514278
Reliability (4,000 h)     0.8183847
Unreliability (4,000 h)   0.1816152

The model represents the switch/router component and the redundancy mechanism formed
by the two servers, S1 and S2. The servers are configured in cold standby; that is, S2
starts as soon as S1 fails. The start-up time of S2 is represented by the
S2_SwitchingOn transition. Figure 5.8 shows the SPN model adopted to estimate the
availability and downtime of the servers with cold standby redundancy.
The markings of the places SR_OK and S1_OK denote the operational states of
the switch/router and the S1 server. The marking of the place S2_OFF indicates the
waiting state before the activation of the S2 server. When the place S2_OK is marked,
the server S2 is operational and in use. The places SR_F, S1_F, and S2_F indicate the
failure states of these components. When the main module (S1) fails, the transition
S2_SwitchingOn is enabled. Its firing represents the start of the spare (S2) in the
operational state. The mean delay of this transition is the Mean Time to Activate (MTA).
The following statements are adopted for estimating availability and unavailability:

$$A = P\big( (\#\text{SR\_OK} = 1) \land ( (\#\text{S1\_OK} = 1) \lor (\#\text{S2\_OK} = 1) ) \big)$$

$$UA = 1 - P\big( (\#\text{SR\_OK} = 1) \land ( (\#\text{S1\_OK} = 1) \lor (\#\text{S2\_OK} = 1) ) \big)$$

FIGURE 5.8  Availability SPN model. (Places: SR_OK, SR_F, S1_OK, S1_F, S2_OK, S2_F,
S2_OFF; transitions: SRF, SRR, S1F, S1R, S2F, S2R, S2_SwitchingOn.)


5.4.2.2.2  Reliability SPN Model


Figure 5.9 shows the SPN reliability model for the architecture presented in Figure 5.5.
The main difference between the models of Figures 5.8 and 5.9 is the absence of repair
once the system has failed, i.e., the system reliability considers the time until the
first system failure. The model represents an active/active redundancy: when both the
S1 and S2 servers have failed, the immediate transition Failure_sys is enabled and may
fire, marking the place System_OFF with a token. The following expressions are adopted
for estimating reliability and unreliability, respectively:

$$R(t) = 1 - P\big( (\#\text{SR\_F} = 1) \lor (\#\text{System\_OFF} = 1) \big)(t)$$

$$UR(t) = P\big( (\#\text{SR\_F} = 1) \lor (\#\text{System\_OFF} = 1) \big)(t)$$

FIGURE 5.9  SPN Reliability model. (Places: SR_OK, SR_F, S1_OK, S1_F, S2_OK, S2_F,
System_OFF; transitions: SRF, S1F, S1R, S2F, S2R, Failure_sys.)

5.4.2.2.3 Results
Table 5.3 presents the values of mean time to failure (MTTF) and mean time to
repair (MTTR) used for computing availability and reliability metrics for the SPN
models. We computed the availability and reliability measures using the mentioned
input parameters. The results are shown in Table 5.4, including steady-state
availability, number of nines, annual downtime, reliability, and unreliability,
considering 4,000 h of activity. The switching time considered is 10 minutes, which is
enough for the system startup and software loading.
This SPN model enables the computation of the reliability function of this system
over time, which is plotted in Figure 5.10, considering the baseline setup of
parameters shown in Table 5.3, and also a scenario with improved values for the
switch/router MTTF (30,000 h) and both servers' MTTR (8 h). It is noticeable that, in
the baseline setup, the system reliability reaches 0.50 at around 15,000 h, and after
60,000 h (about 7 years), the system reliability is almost zero. When the improved
version of the system is considered, the reliability has a smoother decay, reaching
0.50 only at around 25,000 h, and approaching zero only near 100,000 h. For the
sake of comparison, the reliability at 4,000 h is 0.8840773 in the improved setup,
whereas in the baseline setup it is 0.818385. Such an analysis might be valuable for
systems administrators to make decisions regarding system maintenance and replacement
of components to avoid failures that will cause significant damage to revenue,
customer satisfaction, or other corporate goals.

TABLE 5.3
Input Parameters for SPN Models
Transition          Value (h)   Description
SRF                 20,000      Switch/Router MTTF
S1F = S2F           15,000      Servers MTTF
SRR = S1R = S2R     24          MTTR
S2_SwitchingOn      0.17        MTA

TABLE 5.4
SPN Results
Metric                    Value
Availability              0.998799
Number of nines           2.920421
Downtime (h/yr)           10.521636
Reliability (4,000 h)     0.818385
Unreliability (4,000 h)   0.181615

FIGURE 5.10  Reliability function for the example system. (Reliability versus time in
hours, from 0 to 100,000 h, for the baseline setup and the improved setup.)

5.5  FINAL COMMENTS


The process of analytical modeling for computational systems must consider a variety
of strategies and the characteristics of each available formalism. The choice of one
type of model may involve accuracy issues, expressive power, accessible software
tools, and the complexity of the target system.
The concepts and examples presented in this chapter should be viewed as an
introduction and motivation regarding possible methods to select when studying
reliability and availability metrics of computing systems. The conciseness and power of
SPNs can be especially useful when complexity grows and many details must be
represented. Nevertheless, CTMCs remain an option that provides
enough resources for performing many kinds of analyses. Other modeling formalisms,
such as FTs, RBDs, reliability graphs, and stochastic automata networks, are
also significantly important and enable different views of the same dependability
concepts approached here.
The world is a place where information systems control almost every aspect
of daily life. The knowledge and framework presented here may be increasingly
required as regulatory agencies and big corporate customers demand estimates
of bounds on how dependable their systems are.

ACKNOWLEDGMENT
This work was supported by a grant of contract number W911NF1810413 from the
U.S. Army Research Office (ARO).

REFERENCES
1. Laprie, J.C. Dependable Computing and Fault Tolerance: Concepts and terminology.
Proceedings of the 15th IEEE International Symposium on Fault-Tolerant Computing.
1985.
2. Schaffer, S. Babbage’s Intelligence: Calculating Engines and the Factory System.
Critical Inquiry. 1994, Vol. 21, No. 1, 203–227.
3. Blischke, W.R., Murthy, D.P.  [ed.]. Case Studies in Reliability and Maintenance.
Hoboken, NJ: John Wiley & Sons, 2003. p. 661.
4. Stott, H.G. Time-Limit Relays and Duplication of Electrical Apparatus to Secure
Reliability of Services at New York. Transactions of the American Institute of Electrical
Engineers, 1905, Vol. 24, 281–282.
5. Stuart, H.R. Time-Limit Relays and Duplication of Electrical Apparatus to Secure
Reliability of Services at Pittsburg. Transactions of the American Institute of Electrical
Engineers, 1905, vol. XXIV, 281–282.
6. Board of Directors of the American Institute of Electrical Engineers. Answers to
Questions Relative to High Tension Transmission. s.l.: IEEE, 1902.
7. Ushakov, Igor. Is Reliability Theory Still Alive? Reliability: Theory & Applications.
2007, Vol. 2, No. 1, Mar. 2017, p. 10.
8. Bernstein, S. Sur l’extension du théorème limite du calcul des probabilités aux sommes
de quantités dépendantes. Mathematische Annalen. 1927, Vol. 97, 1–59.
http://eudml.org/doc/182666.
9. Basharin, G.P., Langville, A.N., Naumov, V.A. The  Life and Work of A.A. Markov.
Linear Algebra and Its Applications. 2004, Vol. 386, 3–26. doi:10.1016/j.laa.2003.12.041.
10. Principal Works of A. K. Erlang—The  Theory of Probabilities and Telephone
Conversations. First published in Nyt Tidsskrift for Matematik B. 1909, Vol. 20, 33–39.
11. Kotz, S., Nadarajah, S. (2000). Extreme Value Distributions: Theory and Applications.
Imperial College Press. doi:10.1142/9781860944024.
12. Kolmogoroff, A. Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung [in
German]  [Springer-Verlag]. Mathematische Annalen. 1931, Vol.  104, 415–458.
doi:10.1007/BF01457949.
13. Shannon, C.E. A Mathematical Theory of Communication. The Bell System Technical
Journal. 1948, Vol. 27, 379–423, 623–656.
14. Neumann, J.V. Probabilistic Logics and the Synthesis of Reliable Organisms from
Unreliable Components. Automata studies, 1956, Vol. 34, 43–98.
15. Moore, E.F. Gedanken-Experiments on Sequential Machines. The Journal of Symbolic
Logic. 1958, Vol. 23, No. 1, 60.
16. Cox, D. A Use of Complex Probabilities in the Theory of Stochastic Processes.
Mathematical Proceedings of the Cambridge Philosophical Society. 1955, Vol. 51,
No. 2, 313–319. doi:10.1017/S0305004100030231.
17. Avizienis, A. Toward Systematic Design of Fault-Tolerant Systems. IEEE Computer.
1997, Vol. 30, No. 4, 51–58.
18. Barlow, R.E. Mathematical Theory of Reliability. New York: John Wiley & Sons, 1967.
SIAM series in applied mathematics.
19. Barlow, R.E., Mathematical Reliability Theory: From the Beginning to the Present
Time. Proceedings of the Third International Conference on Mathematical Methods
In Reliability, Methodology and Practice. Trondheim, Norway, 2002.
20. Epstein, B., Sobel, M. Life Testing. Journal of the American Statistical Association.
1953, Vol. 48, No. 263, 486–502.
21. Gnedenko, B., Ushakov, I. A., & Ushakov, I. (1995). Probabilistic reliability engineering.
John Wiley & Sons.

22. Einhorn, S.J., Thiess, F.B. Intermittence as a Stochastic Process. NYU-RCA Working
Conference on Theory of Reliability. Ardsley-on-Hudson, New York, 1957.
23. Anselone, P.M. Persistence of an Effect of a Success in a Bernoulli Sequence. Journal
of the Society for Industrial and Applied Mathematics. 1960, Vol. 8, No. 2, 272–279.
24. Birnbaum, Z.W., Esary, J.D., Saunders, S.C. Multi-component Systems and Structures
and Their Reliability. Technometrics. 1961, Vol. 3, No. 1, 55–77.
25. Ericson, C. Fault Tree Analysis—A  History. Proceedings of the 17th International
Systems Safety Conference. Orlando, FL, 1999.
26. Pierce, W.H. Failure-Tolerant Computer Design. New  York: Academic Press, 1965,
65–69.
27. Avizienis A., Laprie J.C., Randell, B. Fundamental Concepts of Computer System
Dependability. IARP/IEEE-RAS Workshop on Robot Dependability: Technological
Challenge of Dependable Robots in Human Environments—Seoul, Korea, 2001
28. Maciel, P., Trivedi, K., Matias, R., Kim, D. Dependability Modeling. Performance and
Dependability in Service Computing: Concepts, Techniques and Research Directions
ed. Hershey, PA: IGI Global, 2011.
29. Laprie, J.C. Dependability: Basic Concepts and Terminology. New York:
Springer-Verlag, 1992.
30. Natkin, S.O. (1980). Les reseaux de Petri stochastiques et leur application a l’evaluation
des systemes informatiques. Conservatoire National des Arts et Metiers. PhD thesis.
CNAM. Paris, France.
31. Molloy, M.K. (1982). On The Integration of Delay and Throughput Measures in
Distributed Processing Models. PhD thesis. UCLA. Los Angeles, CA.
32. Symons, F.J.W. Modelling and Analysis of Communication Protocols using Numerical
Petri Nets. PhD Thesis, University of Essex, also Dept of Elec. Eng. Science
Telecommunications Systems Group Report No. 152, 1978.
33. Chiola, G., Franceschinis, G., Gaeta, R., Ribaudo, M. GreatSPN 1.7: Graphical Editor
and Analyzer for Timed and Stochastic Petri Nets. Performance Evaluation. Vol. 25,
No. 1–2, 47–68, 1995.
34. Haverkort, B.R. Markovian Models for Performance and Dependability Evaluation.
Lectures on Formal Methods and Performance Analysis. Berlin, Germany: Springer,
2001.
35. Seneta, E. Markov and the Creation of the Markov Chains. School of Mathematics and
Statistics, University of Sydney, NSW, Australia, 2006.
36. Trivedi, K.S. Probability and Statistics with Reliability, Queuing, and Computer
Science Applications, 2nd ed. Hoboken, NJ: John Wiley & Sons, 2001.
37. Parzen, E. Stochastic Processes. Dover Publications. San Francisco, CA, 1962.
38. Stewart, W.J. Probability, Markov Chains, Queues and Simulation. Princeton, NJ:
Princeton University Press, 2009.
39. Ash, R.B. Basic Probability Theory. New York: John Wiley & Sons, 1970.
40. Feller, W. An Introduction to Probability Theory and Its Applications. Vols. I, II.
New York: John Wiley & Sons, 1968.
41. Marsan, M.A., Conte, G., Balbo, G. A Class of Generalized Stochastic Petri Nets for the
Performance Evaluation of Multiprocessor Systems. ACM Transactions on Computer
System. 1984, Vol. 2, No. 2, 93–122.
42. Ajmone Marsan, M., Chiola, G. On Petri Nets with deterministic and exponentially
distributed firing times. In G. Rozenberg, editor, Advances in Petri Nets 1987, Lecture
Notes in Computer Science 266, pp. 132–145. Springer-Verlag, 1987.
43. Marsan, M.A., Balbo, G., Conte, G., Donatelli, S., Franceschinis, G. Modelling with
Generalized Stochastic Petri Nets. Wiley, 1995.

44. German, R., Lindemann, C. Analysis of Stochastic Petri Nets by the Method of
Supplementary Variables. Performance Evaluation. 1994, Vol. 20, No. 1, 317–335.
45. German, R. Performance Analysis of Communication Systems with Non-Markovian
Stochastic Petri Nets. New York: John Wiley & Sons, 2000.
46. Lindemann, C. (1998). Performance modelling with deterministic and stochastic Petri
nets. ACM sigmetrics performance evaluation review, 26(2), 3.
47. Molloy, M.K. Performance Analysis Using Stochastic Petri Nets. IEEE Transactions
on Computers. 1982, Vol. 9, 913–917.
48. Muppala, J., Ciardo, G., Trivedi, K.S. Stochastic Reward Nets for Reliability Prediction.
Communications in Reliability, Maintainability and Serviceability. 1994, Vol. 1, 9–20.
49. Matos, R., Dantas, J., Araujo, J., Trivedi, K.S., Maciel, P. Redundant Eucalyptus Private
Clouds: Availability Modeling and Sensitivity Analysis. Journal Grid Computing.
2017, Vol. 15, No. 1, 1–23.
50. Malhotra, M., Trivedi, K.S. Power-hierarchy of Dependability-Model Types. IEEE
Transactions on Reliability. 1994, Vol. 43, No. 3, 493–502.
51. Shooman, M.L. The  Equivalence of Reliability Diagrams and Fault-Tree Analysis.
IEEE Transactions on Reliability. 1970, Vol. 19, No. 2, 74–75.
52. Watson, J.R., Desrochers, A.A. Applying Generalized Stochastic Petri Nets to
Manufacturing Systems Containing Nonexponential Transition Functions. IEEE
Transactions on Systems, Man, and Cybernetics. 1991, Vol. 21, No. 5, 1008–1017.
53. O’Connor P, Kleyner A. Practical Reliability Engineering. John Wiley & Sons; 2012
Jan 30.
54. Beaudry, M.D. Performance-Related Reliability Measures for Computing Systems.
IEEE Transactions on Computers. 1978, Vol. 6, 540–547.
55. Dantas, J., Matos, R., Araujo, J., Maciel, P. Eucalyptus-based Private Clouds:
Availability Modeling and Comparison to the Cost of a Public Cloud. Computing. 2015,
Vol. 97, No. 11, 1121–1140.
56. Buzacott, J.A. Markov Approach to Finding Failure Times of Repairable Systems.
IEEE Transactions on Reliability. 1970, Vol. 19, No. 4, 128–134.
57. Maciel, P., Matos, R., Silva, B., Figueiredo, J., Oliveira, D., Fe, I., Maciel, R.,
Dantas, J. Mercury: Performance and Dependability Evaluation of Systems with
Exponential, Expolynomial, and General Distributions. In: The 22nd IEEE Pacific Rim
International Symposium on Dependable Computing (PRDC 2017). January 22–25,
2017. Christchurch, New Zealand.
58. Guedes, E., Endo, P., Maciel, P. An Availability Model for Service Function Chains
with VM Live Migration and Rejuvenation. Journal of Convergence Information
Technology. Volume 14 Issue 2, April, 2019. Pages 42–53.
59. Silva, B., Matos, R., Callou, G., Figueiredo, J., Oliveira, D., Ferreira, J., Dantas, J.,
Junior, A.L., Alves, V., Maciel, P. Mercury: An Integrated Environment for Performance
and Dependability Evaluation of General Systems. Proceedings of Industrial Track at
45th Dependable Systems and Networks Conference (DSN-2015). 2015. Rio de Jan