Maintenance Strategy Review Course
Module 1 – Introduction
Getting started
Course structure
The course tuition is divided into five lessons, as listed here. Click on each lesson name to view its learning
objectives.
1. On completing the lesson "What is MSR?" you will be able to discuss some of the shortcomings associated
with the traditional approach to maintenance. You will also be able to explain the purpose of a
maintenance strategy review, and to describe its place within the maintenance work management process.
2. The second lesson of the course focuses on Reliability Centred Maintenance, or RCM. On completion of
this lesson you will be able to define this term, and to discuss the origins and evolution of RCM in general
terms. You will be able to explain the basic methodology that underpins RCM. You will also be able to list
seven criteria that must be satisfied if a maintenance strategy review process is to be considered "true
RCM".
3. Lesson three looks at a number of processes that have been derived from the original RCM process. After
completing this lesson you will be able to outline the evolution of the SKF RCM® process and discuss the
ways in which it differs from the "classic" approach. You will also be able to explain the basic principles of
Risk Based Maintenance, and show how SKF's AMST software helps to apply these. Finally, you will be
able to discuss a cost-based approach to RCM that was developed by SKF for Shell Global Solutions and
Royal Dutch Shell.
4. In lesson four you'll be taken through an SKF MSR project. At the end of the session you will be able to list
and discuss the seven basic steps that this entails. You will also be able to list and explain five criteria that
should be applied when analysing non-critical plant items.
5. The final lesson looks at the implementation of the results of a review. On completion you will be able to
describe a simple process for task implementation, and discuss the merits of employing "standard job
plans". You will also be able to explain what is meant by a "living program", why it is needed, and describe a
simple process for its implementation.
End-of-course test
At the end of the course you can check what you learned by taking a test. If you pass the test on-line then you
will be able to download and print your own course completion certificate.
To begin the course, click “lessons”.
Module 2 – Lessons
Lesson 1: What is maintenance strategy review?
In order to understand the Maintenance Strategy Review process, it is first necessary to have an
understanding of what a good maintenance strategy is meant to achieve.
To-do List
The objective of a well-defined maintenance strategy must be to ensure that:
• the right work is done
• on the right equipment
• at the right time, by the right people
• and for the right reasons.
A maintenance strategy review should therefore be a systematic review of plant or equipment that evaluates the
manner in which it fails within a given operational context and the consequences of failure, and identifies
technically feasible and cost-effective maintenance strategies to minimize the consequences and/or
frequency of failure.
Your organisation
Before we go any further, think for a moment about the maintenance strategy that’s currently in place in your
organisation. Where did this come from?
Traditional maintenance - Often based on little more than judgment and experience.
For many plants, maintenance practices don’t result from a formal, structured strategy review project. More
typically they are based on little more than judgment and experience, taking account of a combination of
factors including past experience, vendor recommendations, company general practices, or response to
earlier failures without questioning the reason. In some cases these practices have been introduced or
modified to achieve compliance with legislation and standards. It’s also not unusual for plants to apply new
technology that they learned about or were “sold” without questioning whether this technology was
really needed or worth adopting.
3. Control: The “work control” facet of AEO is about planning and scheduling the required tasks. The input to
this part of the process is the work order from the “work identification” facet. In this facet the following
details are defined:
- When will the work be done?
- How will the work be done? What will be the method, the steps by which the work will be done; how
many hours it will take?
- What tools and equipment are needed?
- How will the quality of the work be verified?
4. Execute: The work is actually conducted in the “Work Execution” facet of the process. We also need to
record here what we actually did, and what it achieved.
5. Optimize: In the Optimize facet the information that is captured in the “Work execution” stage is used to
decide what (if anything) should be done differently in the future. The Optimize facet is the pro-active
domain, which drives continuous improvement. We sometimes refer to this continual updating of the
strategy as the “living program”.
Maintenance maturity
Very often, maintenance departments will be better at some activities than they are at others. Consider the
“maturity” pyramid shown here, which depicts stages of excellence. An organisation could have (for example)
best in class performance, (that is, “Innovating”), at the “maintenance strategy” facet of their management
process but could be failing to execute the defined tasks properly, and therefore “fire-fighting” in their
“work execution” facet. In other words, they do the right things, but don’t do them well enough to achieve the
expected result. Conversely, they could be world class in terms of work execution practices, but poor in the
maintenance strategy area. In other words, the maintenance work that they carry out is done well, but they’re
actually doing the wrong things. An understanding of maturity throughout all of the facets is therefore needed.
There is little point in developing a well-defined and justified maintenance strategy if lack of maturity in the
other facets prevents efficient and effective implementation. The goals that will be set must be matched to
current capabilities.
Assessing maturity
Some form of assessment should be undertaken to establish the organisation’s maturity in the various
aspects of its work management process. Assessments can take many forms. SKF’s client needs analysis
application employs forty searching and carefully targeted questions in order to achieve this. The application
facilitates graphical comparison of users’ responses with continuously updated benchmarks representing
performance of like companies. The assessment, however, does not end here; indeed, it's arguably just
beginning. The real challenge is to use the data that is represented by such graphics to drive a realistic
improvement program.
Getting started
We need to first understand the business that the maintenance organisation is intended to support. This is an
essential element in the process of identifying which pieces of equipment are critical to the business. A sound
understanding of the business goals and key performance indicators is the key to success in devising the
correct maintenance strategy for an organisation. This understanding provides the primary input to the
maintenance strategy review process, irrespective of the method that is chosen to perform the review.
The maintenance strategy review is then undertaken as a sub-process and can be accomplished using a
variety of techniques.
Transplanting best practices
Think about this for a moment. Can we optimize maintenance by simply adopting best practices from our own
experience, or from the experience of others?
What is RCM?
The principles of Reliability Centred Maintenance are central to a number of maintenance strategy review
processes, and so it’s important that we understand them, and how they were derived. So what is Reliability
Centred Maintenance (RCM)?
• It is a process used to determine what must be done to ensure that any physical asset continues to fulfil its
intended function(s) in its present operating context
Reliability in industry
Routine maintenance, then, should be performed to prevent the effects of failure, not necessarily the failure
itself (i.e., root cause) – some failures may be acceptable. It is most often not cost-effective to develop generic
maintenance programs for equipment types which do not consider the specific application, and it is not
necessary to have an initial set of good failure rate data to develop an effective maintenance program.
Vendors have only a limited role to play in the identification of optimum maintenance for their equipment.
Reliability theory
Documented research into equipment failure probability and advanced age has shown that such a view of
equipment life is over-simplistic and not typical of most machinery. Three major studies were conducted by
United Airlines, Bromberg, and the US Navy, the results of which have had a major impact on the way in
which maintenance is now regarded. These showed that the majority of equipment or components do not
follow all parts of the classic “bathtub curve”.
• Approximately 85% of equipment only shows infant and random failures.
• Only about 15% shows a significant increase in failure probability in later stages of its life.
• Many machines display a relatively constant failure rate throughout their service life.
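One common way to illustrate these failure patterns is the Weibull hazard function. This is an assumption made here for illustration; the course material does not name a specific model. In this sketch a shape parameter below 1 gives a falling "infant" failure rate, a value of 1 gives a constant "random" rate, and a value above 1 gives a rising "wear-out" rate. The function names and parameter values are hypothetical.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Instantaneous failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    if t <= 0 or shape <= 0 or scale <= 0:
        raise ValueError("t, shape and scale must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


# Illustrative shape values only; real values must come from failure data.
PATTERNS = {"infant": 0.5, "random": 1.0, "wear_out": 3.0}


def hazard_trend(shape, early=0.5, late=2.0):
    """Report whether the failure rate falls, stays flat or rises with age."""
    h_early = weibull_hazard(early, shape)
    h_late = weibull_hazard(late, shape)
    if abs(h_late - h_early) < 1e-12:
        return "constant"
    return "increasing" if h_late > h_early else "decreasing"


if __name__ == "__main__":
    for name, shape in PATTERNS.items():
        print(name, hazard_trend(shape))
```

Only the "wear_out" pattern would justify age-based replacement; for the other two, replacing a component on a fixed schedule does not lower its failure rate.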
Time-based maintenance
Given the relatively constant failure probability exhibited by most types of equipment, this illustration shows
the danger of intrusive maintenance. Maintenance activity can re-introduce the possibility of “infant failures”.
Thus the probability of failure actually increases as a result of the maintenance intervention, putting the plant
at greater risk of component failure.
“Classic” RCM
The early work resulted in a first set of guidelines issued in 1968 by the Maintenance Steering Group (MSG)
formed by the Federal Aviation Administration and the airline industry. These guidelines became known as
MSG-1 and were developed around the introduction of the Boeing 747. Guidelines MSG-2 and EMSG-2 were
issued in 1970 and 1972 respectively to cover the L-1011, DC-10, Concorde and Airbus aircraft.
MSG-1 and -2 marked the first time that people applied the concepts of “on-condition” and “condition
monitoring”, due to the findings that most equipment does not become more likely to fail as it gets older. The
best that can be done is monitor its condition and repair or replace when required. The original process that
evolved from the aviation industry is nowadays often referred to as “Classical RCM”.
Focus shift
Some notes on these methods show that MSG-1 and 2 used a “bottom-up” approach in which the integrity of
individual components was the focus. In 1980 guideline MSG-3 was introduced, bringing in the “top-down”
approach in which systems and their functions are looked at. Component failures are analyzed in terms of
their safety, operational and economic consequences through the function the system is intended to perform.
RCM Derivatives
Many commonly used MSR methods are derived from Reliability Centered Maintenance (RCM). Under close
scrutiny, however, it becomes questionable whether all of the processes that claim to be “RCM-based”
actually meet the intent of the original process. Taking short-cuts in certain parts of the process can be
lucrative in the short term, but can prove detrimental over the whole life cycle of plant assets. The
Society of Automotive Engineers (SAE) recognized that this was potentially a devastating problem for end
users. The society therefore developed standards to provide users with an understanding of what is meant by
RCM and what constitutes a genuine RCM process. The society’s standard JA1011 provides the
compliance criteria that a process must meet in order to be called a true RCM process. Standard JA1012
describes and further clarifies the JA1011 compliance criteria. When selecting an MSR method to use,
research should be conducted to find out whether the chosen process meets the conditions laid out in these
standards.
SAE JA1011
The RCM standard SAE JA1011 addresses seven key questions. The first four questions form what is
commonly known as a Failure Modes and Effects Analysis (FMEA) which identifies the critical plant failure
modes in a systematic and structured manner. It starts with functions, by asking what the users want or
expect the asset to do. Both primary and secondary functions should be considered. By way of an example,
consider a pump. Its primary function might be to deliver fluid to a process within specified limits of pressure
or flow. However, the pump also performs a secondary function as part of the containment system for the
fluid. A leak may therefore not be serious enough to impair the primary function but could impair the
secondary function. Obviously, this could represent a significant functional failure if the fluid were toxic or
corrosive. The last three questions prompt the user to think about the consequences of each functional failure.
Measures that might prevent the failure, or mitigate its consequences are identified. Where no proactive task
can be identified then there must be an action plan to deal with the failure.
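The chain of questions described above can be captured in a simple record structure. The following is a minimal sketch, not a format taken from SAE JA1011 or from any SKF tool: the class and field names are hypothetical, and the seal leak entry is an invented example added alongside the pump functions mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FailureMode:
    description: str                       # what causes the functional failure
    effect: str                            # what happens when it occurs
    consequence: str                       # in what way the failure matters
    proactive_task: Optional[str] = None   # task to predict or prevent it
    default_action: Optional[str] = None   # plan if no proactive task exists

    def is_resolved(self) -> bool:
        """A failure mode needs a proactive task or a default action."""
        return bool(self.proactive_task or self.default_action)


@dataclass
class FunctionalFailure:
    description: str
    failure_modes: List[FailureMode] = field(default_factory=list)


@dataclass
class AssetFunction:
    description: str
    primary: bool
    functional_failures: List[FunctionalFailure] = field(default_factory=list)


def unresolved_modes(functions):
    """List failure modes that have neither a task nor an action plan."""
    return [
        mode.description
        for fn in functions
        for ff in fn.functional_failures
        for mode in ff.failure_modes
        if not mode.is_resolved()
    ]


# Hypothetical pump analysis: one primary and one secondary function.
pump_functions = [
    AssetFunction(
        "Deliver fluid within specified pressure and flow limits",
        primary=True,
        functional_failures=[FunctionalFailure(
            "Delivers less than the required flow",
            [FailureMode("Impeller worn", "Flow falls below demand",
                         "operational",
                         proactive_task="Monitor pump performance")],
        )],
    ),
    AssetFunction(
        "Contain the process fluid",
        primary=False,
        functional_failures=[FunctionalFailure(
            "Fluid leaks to atmosphere",
            [FailureMode("Seal leaks", "Toxic or corrosive fluid released",
                         "safety/environmental")],
        )],
    ),
]
```

Calling `unresolved_modes(pump_functions)` flags the seal leak, because it has neither a proactive task nor a default action recorded yet.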
Reliability Centred Maintenance - Expanded definition
In summary, then, RCM is a methodical, logical process which meets specified standards, to identify the right
maintenance, on the right equipment, at the right time, for the right reason. In the next lesson, we’ll look at
some variations on the way in which the principles of RCM can be applied.
3. In the third part of the process, the changes to the existing PM program that have been identified and
agreed as necessary are implemented. A continuous improvement loop is also put in place to verify that
the implemented tasks achieve what was expected, and to ensure that the PM program continues to
evolve in line with the developing needs of the business.
Risk
Now let’s move on, and consider another variation on the RCM theme – Risk Based Maintenance (RBM).
We’ve already learned that RCM and its variations are intended to provide a maintenance program that will
maintain high system reliability and minimize costly failures. A failure results in a risk to this intention.
Risk Based Maintenance was developed to provide a means of quantifying that risk.
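A widely used way to quantify risk is to multiply how often a failure is expected by what it costs when it happens. The sketch below assumes that definition; the course material does not give a formula, and the function names and figures are hypothetical.

```python
def annual_risk(failures_per_year, consequence_cost):
    """Risk expressed as expected cost per year (frequency x consequence)."""
    if failures_per_year < 0 or consequence_cost < 0:
        raise ValueError("inputs must be non-negative")
    return failures_per_year * consequence_cost


def rank_by_risk(items):
    """Sort item names from highest to lowest annual risk.

    items maps a name to (failures_per_year, consequence_cost).
    """
    return sorted(items, key=lambda name: annual_risk(*items[name]),
                  reverse=True)


# Invented example values.
plant_items = {
    "pump A": (0.5, 40_000),
    "fan B": (2.0, 3_000),
    "valve C": (0.1, 500_000),
}
```

In this example the rarely failing valve ranks first, because its consequence cost outweighs its low failure frequency.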
RBM process
Here’s another view of the RBM process which follows on from the blocks on the previous page, showing in a
little more detail the process inputs and outputs.
Condition monitoring
RBM uses maintenance tasks to reduce the risk of failures, and typically condition monitoring is the backbone
of the resulting maintenance strategy. A range of condition monitoring tools and techniques exist that, when
properly applied, are very effective in identifying the onset of failure and are thus very cost effective. In RBM
the choice of technique is cost-benefit driven, with the frequency of monitoring being dictated by the warning
time thus provided.
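The link between warning time and monitoring frequency can be sketched as follows. The rule used here, monitoring at no more than a fixed fraction (by default half) of the warning time, is a common rule of thumb and an assumption of this example, not something stated in the course material.

```python
def monitoring_interval(warning_time_days, safety_fraction=0.5):
    """Suggest a monitoring interval as a fraction of the warning time.

    warning_time_days: time between a detectable symptom and failure.
    safety_fraction: assumed fraction of that time between inspections.
    """
    if warning_time_days <= 0:
        raise ValueError("warning time must be positive")
    if not 0 < safety_fraction <= 1:
        raise ValueError("safety_fraction must be in (0, 1]")
    return warning_time_days * safety_fraction
```

For example, a technique that gives about 60 days of warning would, under this assumption, be applied at least every 30 days.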
Protective systems
For many plant items, a failure to function may cause production loss but will frequently not impose any safety
hazard. However, where a product is used for a safety function (e.g. a fire detection system) then a functional
safety evaluation should be performed to ensure that the device offers an appropriate level of risk mitigation.
Standards exist that specify the methodology for this process, and that require personnel involved in
undertaking such assessments to be qualified to do so in terms of competence and training.
Cost-based Reliability Centred Maintenance - Basic steps
The basic steps in this Cost Based method are shown here, and the process generally follows other RCM
methods. What the Cost Based activity adds is the knowledge base of cost of failures, and cost of
maintenance. With this information the cost benefit of performing maintenance can be determined and
maintenance tasks chosen that are the most cost effective.
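The cost-benefit comparison described above can be sketched as a comparison of expected annual costs. This is a simplified illustration under stated assumptions: the function names and all figures are hypothetical, and a real study would draw the failure and maintenance costs from the knowledge base mentioned in the text.

```python
def annual_cost(failures_per_year, cost_per_failure,
                tasks_per_year=0, cost_per_task=0.0):
    """Expected yearly cost of failures plus the cost of maintenance tasks."""
    return (failures_per_year * cost_per_failure
            + tasks_per_year * cost_per_task)


def most_cost_effective(options):
    """Return the name of the option with the lowest annual cost."""
    return min(options, key=options.get)


# Invented figures for one component.
options = {
    "run to failure": annual_cost(2.0, 10_000),
    "monthly lubrication": annual_cost(0.5, 10_000, 12, 200),
    "annual overhaul": annual_cost(0.2, 10_000, 1, 8_000),
}
```

With these figures the monthly lubrication task costs less per year than either running to failure or the annual overhaul, so it would be the task selected.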
A “funnelling” process
SKF RCM® can be thought of as a funnel. In order for the SKF RCM® to be completed all of the groups
involved in maintenance have to agree on the final maintenance program. This means that the differing views
of maintenance that typically exist at facilities will have to become a single, consistent and coherent view. By
having the different groups involved in the collective decision process, the results will have to be consistent for
all.
SKF RCM® Process model
Do you remember this slide from the previous lesson? This is the process model for the SKF RCM® approach.
Viewed like this, in its entirety, it’s a little daunting, isn’t it? Let’s follow the process through, and try to break it
down into a number of individual steps.
FMEA and criticality analysis
In SKF RCM® the failure mode of a component is how it fails to function, not why it fails. In other words, a
pump’s failure mode is that it fails to deliver flow at the proper rate and pressure. We will come back to the
reasons why this happens (a worn impeller, for example), which are the failure causes of this failure mode. For each
functional failure, evaluate every component that could cause it. List the most significant (that is, likely or
dominant) failure modes for each component (in other words, how it fails). Then, list the most significant
effects of each component failure, including health, safety or environmental impacts (that’s what happens when
it fails). SKF’s Asset Management Support Tool (AMST) software aids this process by providing pick-lists of
component types with associated failure modes. The Criticality analysis then determines which pieces of
equipment, or components, are important to the operation of the system. SKF RCM® categorizes them as
"critical" (in cases where failure will result in an unacceptable effect), or "non-critical" (meaning that failures
can be tolerated). Please note that calling a component “non-critical” merely states that the component is not
functionally important. There may still be important reasons to maintain non-critical equipment, however; for
example, it may be an expensive piece of equipment to repair if maintenance is not performed. For critical
components, since their failure prevention is very important, the likely or dominant failure causes (piece part
failures) are determined – the AMST software offers pick lists of likely failure causes for the failure modes and
equipment type being analyzed.
Criticality Matrix
As a part of the SKF RCM® process, a criticality matrix, such as shown here, is often used to establish what
the dividing line is between tolerable and intolerable failure effects. The matrix shown is only an example and
must be customized for every organization. The purpose of the matrix is simple: if the effects on production or
HSE that occur upon equipment failure do not lead to significant impacts in the areas shown in red, then the
equipment is not critical – no matter how loudly people may argue. Use of such a matrix allows work to be
prioritized based on the criticality of the equipment to important business goals – in other words, the reason
for maintaining equipment is to help achieve the business goals. This also helps in the planning and
scheduling of work at a site because all corrective and preventive work can be prioritized based on functional
criticality and can even be automated in the computerized maintenance management system (or CMMS).
Simply said, the most important equipment should be repaired upon failure first, before less important
equipment, and critical equipment should be maintained before non-critical equipment. To assist in this
process, criticality codes are developed in SKF RCM® and can be imported from the AMST software into the
CMMS and then used by plant staff to prioritize work.
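A criticality matrix of this kind can be expressed as a simple lookup. The sketch below is hypothetical throughout: the severity and likelihood scales, the choice of "red" cells, and the function name are assumptions for illustration and, as noted above, a real matrix must be customized for each organization.

```python
SEVERITY = ("negligible", "minor", "major", "severe")
LIKELIHOOD = ("rare", "occasional", "frequent")

# Hypothetical "red" cells: (severity, likelihood) pairs deemed intolerable.
RED_CELLS = {
    ("severe", "rare"), ("severe", "occasional"), ("severe", "frequent"),
    ("major", "occasional"), ("major", "frequent"),
}


def is_critical(effects):
    """Return True if any impact area falls in a red cell.

    effects maps an impact area (for example "production" or "HSE")
    to a (severity, likelihood) pair.
    """
    for area, cell in effects.items():
        severity, likelihood = cell
        if severity not in SEVERITY or likelihood not in LIKELIHOOD:
            raise ValueError(f"unknown rating for {area}: {cell}")
        if (severity, likelihood) in RED_CELLS:
            return True
    return False
```

Equipment whose effects all fall outside the red cells is classed as non-critical, regardless of how frequently it fails in a minor way.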
Task hierarchy
When developing the tasks, we first start with tasks operators can already do as part of their rounds. Then we
move to tasks that operators can do but which aren’t normally part of a round such as functional testing a
backup pump (starting it up) or switching the operating pumps from the A pump to the B pump. Next, we
select simple tasks like lubrication and filter changes before moving on to consider tasks requiring a skilled
craftsperson. The time-directed rebuilding or replacement of equipment is considered only as a last resort
and, even then, only if the task types above are ineffective. It is important to note that scheduled
replacements or rebuilds require evidence to show that the equipment in question has a defined life that is
known and repeatable. For example, a well-known mean time between failures is necessary in order to make
scheduled repairs / replacements workable.
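The task hierarchy described above amounts to an ordered search that stops at the first effective task type. The following is a minimal sketch; the constant names and the `select_task` function are hypothetical, and the judgement of which task types are effective is assumed to come from the analysis team.

```python
OPERATOR_ROUND = "operator round task"
OPERATOR_OTHER = "operator task outside normal rounds"
SIMPLE_TASK = "simple task (lubrication, filter change)"
SKILLED_TASK = "skilled craft task"
TIME_DIRECTED = "time-directed rebuild or replacement"

# Order of preference, from first choice to last resort.
TASK_HIERARCHY = (OPERATOR_ROUND, OPERATOR_OTHER, SIMPLE_TASK,
                  SKILLED_TASK, TIME_DIRECTED)


def select_task(effective_types, has_known_repeatable_life=False):
    """Pick the first effective task type in the hierarchy.

    Time-directed work is accepted only when there is evidence of a
    known, repeatable life. Returns None if nothing qualifies.
    """
    for task_type in TASK_HIERARCHY:
        if task_type not in effective_types:
            continue
        if task_type == TIME_DIRECTED and not has_known_repeatable_life:
            continue
        return task_type
    return None
```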
Maintenance template
When selecting tasks in SKF RCM®, a maintenance standard library of maintenance templates is available
within the AMST software. This feature is used to simplify the process and make the resulting maintenance
more consistent. This template is only a starting point when discussing tasks and it is segregated by
equipment type and application. However, when using the standard, particular attention must be paid by the
analyst and the team to look for differences on the equipment being studied and where the maintenance
standard must be modified, either in scope or frequency.
Non-critical analysis
The non-critical analysis within SKF RCM® is the process that is used to determine whether maintenance is
justified on non-critical components. By using this process, a complete and fully-documented basis for the
maintenance program is developed. The non-critical analysis evaluates components that do not support an
“important function”. Remember our earlier discussion about the meaning of “non-critical”? It simply means
that the component is not functionally important. This does NOT mean that no maintenance is justified. There
may still be important reasons to maintain non-critical equipment, for example it may be an expensive piece of
equipment to repair if maintenance is not performed. In cases where no maintenance is deemed justifiable,
the non-critical analysis fully documents the basis for a “run-to-failure” decision.
Five criteria
In the non-critical analysis five questions should be addressed. If the answer to any of these five questions is
“yes”, then a preventive maintenance task should be selected. Click each question for more information.
1. First, what is the repair or replacement cost? For example, in a system with 3 redundant compressors
where only 1 is required for the process, any one of the compressors could be determined to be non-
critical (for production reasons, though possibly not for HSE). However, although the compressor may not
be critical to the business goals, it may still be quite expensive to repair if no maintenance is performed.
2. Question 2 asks if there is a simple (cost effective) task that can be performed that will maintain the
component at its peak. For example, lubrication of a valve stem is simple, but assures the operability of the
component.
3. Question 3 considers possible “knock-on” effects from failure. For example, a leaking valve stem may drip
water on electrical components, thereby inducing other (possibly critical) failures. Another example might
apply to protective instrumentation, where the “fail safe” failure of a logic component might result in a
spurious trip situation. Another example might be the failure of a heat exchanger’s isolation valve. In itself
the isolation valve is not critical, but its failure might prevent maintenance of the critical heat exchanger.
4. Question 4 considers the possible increase in personnel hazards that might arise from failure of systems
that are not critical to business goals.
5. Finally, question 5 considers the past history of non-critical components. If there is a history of frequent or
costly corrective maintenance then consideration should be given to planned maintenance tasks that might
reduce the cost and / or frequency of the failures.
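The five questions above form a simple "any yes" decision rule, which can be sketched as follows. The class, field and function names are hypothetical; the answers themselves are assumed to come from the review team.

```python
from dataclasses import astuple, dataclass


@dataclass
class NonCriticalAnswers:
    high_repair_or_replacement_cost: bool         # question 1
    simple_cost_effective_task_exists: bool       # question 2
    failure_has_knock_on_effects: bool            # question 3
    failure_increases_personnel_hazard: bool      # question 4
    history_of_frequent_or_costly_repairs: bool   # question 5


def non_critical_decision(answers):
    """A "yes" to any question means a PM task should be selected."""
    if any(astuple(answers)):
        return "select preventive maintenance task"
    return "run to failure (basis documented)"


# Example: a redundant compressor that is expensive to repair.
compressor = NonCriticalAnswers(True, False, False, False, False)
```

Either outcome is recorded, so that a run-to-failure decision has a documented basis just as a selected task does.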
Task comparison
The purpose of the task comparison is to compare the output from the SKF RCM® process with the
incumbent maintenance task list. It provides information to the plant about whether to
• ADD new tasks identified in SKF RCM®
• DELETE existing tasks from the plant's present program that were not identified as necessary in the SKF
RCM® work.
• RETAIN an existing task because it matches a recommended task from the SKF RCM® analysis, perhaps
making changes to the task’s content or frequency as recommended by SKF RCM®
Prerequisites to the task comparison are:
• A List of all current PMs
• A List of all existing failure-finding tasks
• Details of existing operator rounds
The Task Comparison should not require much time if the data is electronically available, but it may be difficult
to do if the PMs are not stored electronically and sorted by tag number. When comparing tasks, it is often
difficult to obtain a complete list of present tasks performed by the plant on the system components being
analyzed. This is because the different groups of people involved in these activities often have their own ways
of identifying work and plant items, and frequently do not employ the same documentation systems.
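When the existing tasks are available electronically and keyed by tag number, the ADD / DELETE / RETAIN comparison reduces to set operations. This is a minimal sketch with hypothetical tag numbers and task names; it assumes both lists use the same tags and task wording, which, as noted above, is often not the case in practice.

```python
def compare_tasks(existing, recommended):
    """Classify (tag, task) pairs as ADD, DELETE or RETAIN.

    existing: current PMs, failure-finding tasks and operator rounds.
    recommended: tasks produced by the review.
    """
    existing, recommended = set(existing), set(recommended)
    return {
        "ADD": sorted(recommended - existing),
        "DELETE": sorted(existing - recommended),
        "RETAIN": sorted(existing & recommended),
    }


# Invented example data.
existing_tasks = [("P-101", "lubricate bearings"), ("P-101", "annual overhaul")]
recommended_tasks = [("P-101", "lubricate bearings"),
                     ("P-101", "vibration monitoring")]
result = compare_tasks(existing_tasks, recommended_tasks)
```

Retained tasks may still need their content or frequency adjusted; this sketch only matches them by name.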
Task prioritization
Part of the AEO model's strategy facet is the use of the criticality developed in the SKF RCM® analysis to
determine the actual priority of the work that is performed in the facility. In other words, a priority scheme can
be developed that allows you to place higher priority (and hence focus more maintenance effort) on more
critical equipment and less focus on less important equipment.
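A priority scheme of this kind can be sketched as a sort on the criticality code attached to each work order. The two-level code and the dictionary layout below are assumptions for illustration; a real CMMS would hold its own codes and fields.

```python
# Hypothetical criticality codes; lower rank means higher priority.
CRITICALITY_RANK = {"critical": 0, "non-critical": 1}


def prioritize(work_orders):
    """Order work so that critical equipment comes first.

    The sort is stable, so work orders with the same criticality
    keep their original (for example, date-raised) order.
    """
    return sorted(work_orders,
                  key=lambda wo: CRITICALITY_RANK[wo["criticality"]])


# Invented example backlog.
backlog = [
    {"id": "WO-1", "tag": "F-200", "criticality": "non-critical"},
    {"id": "WO-2", "tag": "P-101", "criticality": "critical"},
    {"id": "WO-3", "tag": "V-310", "criticality": "non-critical"},
    {"id": "WO-4", "tag": "C-400", "criticality": "critical"},
]
```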
Step 7. Implementation and living program
For the study to be effective we must ensure that the resulting recommendations are faithfully
implemented. But that’s not the end of the review process. We must also put in place a process to monitor the
effects of the changes that we made and to make any adjustments to the strategy that might be dictated by
real experience. Furthermore, the needs of the business often change over time, so our process must also
ensure that the strategy remains aligned to the business goals. So, as we undertake the tasks identified
through the SKF RCM® process, we need to look at what we did, and understand if it achieved the intended
result. We must then learn from that experience and adjust our strategy accordingly. This step of the process
is known as the “living program”. Issues associated with this step of the process will be dealt with separately, in the
final lesson of this course.
Lesson 5: Implementation
Typical steps
The most important part of any study at a plant, or facility, is the actual implementation of the results. If the
results of the study are not used, then all of the effort that was put into the study is wasted. Specialist
(external) consultants can provide valuable support and input to the maintenance strategy review process, but
scope for their involvement in the implementation stage is typically more restricted. Implementation will NOT
be successful, nor will the original project, if you do not have management support, and support from all
affected departments:
• Maintenance
• Engineering
• Operations
• Outage/Turnaround Management
• Others influenced by an integrated PM program
To get the program successfully implemented in the CMMS, support will be needed from each of these
groups, and possibly more.
Job plan
In some cases, the maintenance task recommendation from the SKF RCM® analysis is not detailed enough
for the plant to actually see the benefit. In that case the use of "job plans" should be considered. As described here
and in the next few slides, the purpose of a job plan is not to tell a craft person how to do their work, it is to tell
them what is required of the task recommendation; to be clear on what work is intended, what inspections are
required, what measurements are to be taken, and other details needed to ensure a consistent task
performance, irrespective of who actually does the work. This is arguably a “knowledge management”
exercise. Experienced personnel know what to do and how to do it and a job plan is a good way to capture
and retain that knowledge for future use. If left to the individual then sometimes the work will be inadequate,
and on other occasions it will be “overkill”. As tasks are executed there may also be a need to capture specific
feedback information on PM and CM work orders. For example, consider a simple task, such as “Clean,
inspect, lubricate every 6 months”. The experience of skilled personnel can be of great help in specifying
precisely what needs to be inspected and recorded. The work order may also make provision for the person
who is actually executing this task to record what they found, whether or not they had to clean, and whether or
not lubrication was found to be necessary.
Standard job plans
After a system analysis is completed and a certain number of maintenance tasks are developed into job plans,
it will be recognized that the job plans for one system may, in part, be applicable to tasks assigned to
components in other systems. In that case, it may be desirable to develop standard job plans that are more
generic and can be used wherever applicable with little modification. This will eliminate the need to write
individual job plans for every task on every component. A “standard job plan”, then, is a generic template for a
job that will be done repetitively. Its purpose is to facilitate speedy production of job plans for specific
maintenance tasks. Maintaining a library of job plans will allow a pool of information to be developed for use in
any application. For example, it could be shared across different plants in the same company to be more
efficient and more consistent in developing work plans. Standard job plans should also include a post-
maintenance testing step.
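The idea of a generic template that is turned into component-specific job plans can be sketched as follows. The `JobPlan` structure, the step wording and the tag number are hypothetical; the sketch only assumes that the template ends with a post-maintenance testing step and that component-specific steps are inserted before it.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class JobPlan:
    title: str
    steps: Tuple[str, ...]     # what is required, not how to do it
    records: Tuple[str, ...]   # findings to capture on the work order
    component_tag: str = "GENERIC"


STANDARD_CLEAN_INSPECT_LUBRICATE = JobPlan(
    title="Clean, inspect, lubricate",
    steps=(
        "Clean the component",
        "Inspect for wear, damage and leaks",
        "Lubricate if required",
        "Post-maintenance test: confirm the component operates correctly",
    ),
    records=(
        "As-found condition",
        "Cleaning required (yes/no)",
        "Lubrication required (yes/no)",
    ),
)


def plan_for(standard, component_tag, extra_steps=()):
    """Create a component-specific plan from a standard job plan.

    Extra steps are placed before the final post-maintenance test.
    """
    *work_steps, final_test = standard.steps
    steps = tuple(work_steps) + tuple(extra_steps) + (final_test,)
    return replace(standard, component_tag=component_tag, steps=steps)


valve_plan = plan_for(STANDARD_CLEAN_INSPECT_LUBRICATE, "V-101",
                      ["Check packing gland adjustment"])
```

Because the template is immutable, each specific plan is a modified copy and the library entry itself stays unchanged.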
Constraints
Going back to the SKF RCM® process, whether job plans are developed or not, it should be noted that there
will be obstacles to implementation. This slide highlights some of the more common ones that have been seen
in the past and will likely be seen in future projects. Of these constraints to implementation, the most important
is “lack of personnel buy-in and management support”. The next slide offers some suggestions for addressing
these issues.
Ensuring “buy-in”
With these steps, it will be more likely that the results will be implemented and the project will be considered a
success. An important element of this list is the training of plant staff and management. When
management is given brief overview training on the process, they will understand what is at stake and the
benefits to be derived from implementing the results.
Exercise
At what point would an MSR study be considered “finished”?
Assign responsibility
The primary attribute of a living program is to have someone responsible for it. Without responsibility and
accountability, there will be no living program. A living program, where continuous improvements are to be
made, requires that reviews are made at least on some periodic basis. Review should be undertaken as
needed based on:
• Failure history
• Design changes
• Operational changes
• General Industry concerns
• New developments in maintenance techniques or technology
The living program should be fully integrated with the maintenance process, making use of documented feedback.
Living program
So what is necessary for a living program? This slide lists some of the elements including what data sources
are useful. “Feedback” is also a very useful tool to a good living program – it collects input from the
maintenance staff who are the closest to the equipment and always have a good sense of what is happening
and what should be done about it. The following slide is an example of a Feedback form that is intended to get
input from the maintenance staff to improve the program.
Feedback forms
Note that information collected from such a feedback mechanism must be taken seriously and
acknowledgment provided to whoever submitted it or this feature of the program will die quickly. It should be a
responsibility of the person in charge of the living program to see that the maintenance staff is promptly
informed of receipt and action on any feedback items provided.
Keeping track
In order to implement a living program it is necessary to provide a means of evaluating the items discussed
previously in this presentation. The evaluations should be developed for program changes, improvements,
and expansion. Another idea to consider may be the need for in-depth analyses requiring study of various
sources including outside consultants. This type of activity borders on root cause analysis with respect to
maintenance activities. Also, certain areas to consider, particularly with new technology or methods, might be
generic and apply to many components. As can be imagined, it will be necessary to keep track of all of this
review activity. Some sort of tracking form is recommended that will track the review for changes as well as
tracking implementation of the changes themselves.
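A tracking form of the kind recommended above could hold one record per review item, following it from the point it is raised through to implementation. This is a hypothetical sketch: the field names and status wording are assumptions, and the course material does not prescribe a particular layout.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ReviewItem:
    item_id: str
    trigger: str                 # e.g. "failure history", "design change"
    description: str
    raised_on: date
    acknowledged_on: Optional[date] = None   # receipt confirmed to originator
    decision: Optional[str] = None           # agreed change to the strategy
    implemented_on: Optional[date] = None

    def status(self):
        """Report how far the item has progressed."""
        if self.implemented_on:
            return "implemented"
        if self.decision:
            return "decided, awaiting implementation"
        if self.acknowledged_on:
            return "acknowledged, under review"
        return "raised"


def open_items(items):
    """Return the items that have not yet been implemented."""
    return [item for item in items if item.status() != "implemented"]
```

Recording the acknowledgement date separately reflects the point made earlier that feedback must be promptly acknowledged if the feedback mechanism is to survive.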
Module 3: Test
End of course test
Now it’s time to see what you learned.
If you pass the test on-line then you’ll be able to download and print your course completion certificate.