Methods and Tools in Maintenance

The document outlines various methods and tools used in maintenance engineering to analyze asset failures and improve decision-making, including Simplified Failure Mode and Effects Analysis (SFMEA), Mean Time Between Failures (MTBF), and Root Cause Analysis (RCA). SFMEA helps identify failure modes and their impacts, while MTBF measures the average time between failures, and RCA provides a structured approach to identify root causes of failures. Additionally, the document discusses the importance of visual inspections and the use of Pareto charts to prioritize issues based on their frequency and significance.

Uploaded by

Benny Mugo

Introduction

A primary responsibility of the maintenance engineer is to identify and analyze asset
failures and deviations from optimum performance. This responsibility requires tools or
methods that the engineer can use to effectively determine the potential failure modes of
assets, as well as determine the true root cause of the problem.

Tools and methods aid in identifying, analyzing, and evaluating various types of risk, and
thus contribute to improved decision-making for reduction of equipment losses.

I) Simplified Failure Mode and Effects Analysis (SFMEA)

SFMEA is a process that permits plant personnel, such as maintenance engineers,
craftspersons, and operators, to identify the common failure modes of manufacturing and
process systems and their associated components.

An SFMEA evaluates three criteria:

1) severity, or the impact a failure will have on the plant’s ability to achieve the
capacity or throughput needed to meet delivery requirements;
2) the probability that a specific failure mode will occur, based on in-plant history or
industrial statistics; and
3) the probability that current maintenance methods will detect the specific failure
mode before it occurs.

These three factors are combined into one number, the risk priority number (RPN), which
reflects the priority of the failure modes identified. The RPN is calculated by multiplying
the severity rating by the occurrence probability rating and by the detection probability
rating:

RPN = severity rating × occurrence probability rating × detection probability rating
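
The RPN calculation can be sketched in a few lines of Python. This is a minimal
illustration, not a prescribed worksheet: the failure modes and ratings are hypothetical,
the 1 to 10 scale is assumed for all three ratings (the text states it only for severity),
and the detection rating is assumed to rise as detection becomes less likely.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1-10 rating of the impact on the plant
    occurrence: int  # assumed 1-10 rating of how likely the failure mode is
    detection: int   # assumed 1-10 rating; higher = less likely to be detected

    @property
    def rpn(self) -> int:
        # RPN = severity x occurrence x detection
        return self.severity * self.occurrence * self.detection


# Hypothetical ratings, for illustration only.
modes = [
    FailureMode("seal leak", severity=3, occurrence=7, detection=2),
    FailureMode("bearing seizure", severity=8, occurrence=4, detection=6),
    FailureMode("coupling misalignment", severity=6, occurrence=5, detection=3),
]

# Highest RPN first: these failure modes deserve attention first.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(f"{m.name}: RPN = {m.rpn}")
```

Sorting by RPN puts the failure modes that most need attention at the top of the list.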

Steps in FMEA

a) For each of the potential failure modes, determine the anticipated potential effect(s)
of failure and the severity that the failure would have on the plant’s ability to meet
its mission, that is, its impact on production capacity, product quality, or total
operating cost; as well as safety or environmental impacts. Relative severity or
impact is ranked on a 1 to 10 scale.
b) Determine the more probable cause or causes of each specific failure mode. For
example, total loss of function might be caused by loss of motive power,
mechanical binding, or other causes.
c) Determine the probability that each of the causes could occur. This determination is
based on known failures drawn from industrial or plant-specific histories.
d) Determine whether or not the current methods used to monitor the system and its
components will detect each of the failure modes before failure or serious damage
could occur. A column is provided to define the specific preventive maintenance or
system monitoring method that is currently used to monitor for the specific failure
mode.
e) Calculate a risk priority number (RPN) for each failure mode.
f) Lastly, evaluate the potential for lowering the calculated RPN by improving the
preventive maintenance or system monitoring methods. For example, the addition
of predictive maintenance technologies could be used to provide early detection of
failure modes or forcing functions such as misalignment, abnormal loading, and
many others.
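
Step f) can be illustrated with a small what-if comparison. The sketch below assumes that
adding a predictive technology changes only the detection rating, and that a lower
detection rating means earlier detection. The failure mode and all ratings are
hypothetical.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk priority number: the product of the three ratings."""
    return severity * occurrence * detection


# Hypothetical failure mode: misalignment of a pump drive train.
severity, occurrence = 7, 5

detection_current = 8   # assumed rating with time-based inspection only
detection_improved = 3  # assumed rating after adding vibration monitoring

rpn_before = rpn(severity, occurrence, detection_current)
rpn_after = rpn(severity, occurrence, detection_improved)
reduction_pct = 100 * (rpn_before - rpn_after) / rpn_before

print(f"RPN before: {rpn_before}, after: {rpn_after} ({reduction_pct:.1f}% lower)")
```

Because the RPN is a product, an improvement in any one rating lowers it proportionally;
here only the detection rating changes.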

SFMEA is an ideal tool that can be used to establish or enhance preventive and predictive
maintenance programs. It can identify most, if not all, of the more probable failure modes
and forcing functions that would occur. This information can then be used to develop
specific preventive or predictive maintenance tasks that will eliminate or substantially
reduce the potential for these problems.

II) Mean Time Between Failures (MTBF)

MTBF is the average operating time between successive failures of a repairable piece of
equipment in a plant. Both the predicted failure rate and the MTBF are usually expressed
in terms of in-operation hours, not calendar time.

Failure rate = number of failures / operating time. This is based on historical data.

The system’s predicted MTBF should be within acceptable limits; a low MTBF means that
you can expect your system to fail more often, and you may need to take steps to improve
it. By making changes to your design, for example, lowering temperatures or stress
levels, the predicted MTBF may improve and better product reliability can be expected.

Mean time between failures (MTBF) = number of operating hours / number of breakdowns

When both are measured over the same operating hours, MTBF is thus the inverse of the
failure rate.
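
The two formulas above can be checked with a short Python sketch. The operating history
used here (4,000 operating hours, 8 breakdowns) is hypothetical, and the function names
are chosen only for illustration.

```python
def failure_rate(failures: int, operating_hours: float) -> float:
    """Failures per operating hour."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return failures / operating_hours


def mtbf(operating_hours: float, breakdowns: int) -> float:
    """Mean time between failures, in operating hours."""
    if breakdowns <= 0:
        raise ValueError("MTBF is undefined without at least one breakdown")
    return operating_hours / breakdowns


# Hypothetical history: 4,000 operating hours with 8 breakdowns.
hours, breakdowns = 4000.0, 8
rate = failure_rate(breakdowns, hours)
mean_time = mtbf(hours, breakdowns)

print(f"failure rate = {rate} per hour, MTBF = {mean_time} hours")
```

Multiplying the two results gives 1, which is the inverse relationship in numeric form.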

III) Root Cause Analysis (RCA)

Definition: Root Cause Analysis (RCA) is a step-by-step method used to analyze failures
and problems down to their root cause.

RCA is based on two premises:

a) that there are always one or more reasons that a deviation from an established or
acceptable norm, including equipment or system failure, occurs. Therefore, the first
requirement for effective root cause analysis is the ability to differentiate
between normal and abnormal.
b) that these problems or deviations from the acceptable norm do not just happen;
something changed, and that change caused the deviation or failure. Therefore, a
process must be developed and followed that will permit accurate, effective
identification of the change or changes that resulted in the observed deviation,
problem, or failure. That process is called RCA.

Simple Analysis

Many of the equipment-related problems that plague industrial plants and facilities can be
resolved by visual inspection of the failed parts. For example, premature failure of rolling
element bearings is a common problem in most plants and facilities. Too many plants
simply replace the bearing and throw the failed bearing in the nearest trash bin. This
approach does little to eliminate the real reason that the bearing failed and there is a high
probability that the failure will recur.

Visual Inspection

A simple visual inspection of the failed bearing will, in most cases, permit plant personnel
to identify the underlying reason, that is, the root cause, of the premature bearing failure.
Using the information obtained by the visual inspection, the investigator, using root cause
logic, can look for changes in the design, installation, mode of operation, and other factors
that could have produced the observed damage (for example, an abnormal loading pattern
that causes excessive wear on one side of the bearing).

In many cases, this can be accomplished by a few simple tests that will isolate the root
cause of the load shift and the resultant bearing failure. One might discover that, for
instance, excessive V-belt tension was the actual factor that caused the bearing to fail; but
the root causes were:

• Improper maintenance procedures. The preventive maintenance task list failed to
provide adequate instructions for proper tensioning of the V-belt.
• Improper technician training. Technicians were not given proper training in the
correct methods required to tension V-belts.
• Inadequate supervision. Supervisors with the proper training were not available
to ensure that maintenance technicians followed best practices.

Simply correcting the V-belt tension in this example would not correct the true root causes
of the problem. Unless the real factors that caused the problem are corrected, the problem
will recur in this specific application, as well as other V-belt-driven equipment in the
facility.

Visual inspection of other failed components, such as gears, will also provide insight into
the potential root cause of premature failures. Figures 1 and 2 illustrate the more common
failure modes of gears.

Figure 1: Abrasive wear caused by contamination

Figure 2: Severe pitting caused by overloading gear.

Figure 1 clearly indicates that abrasive contaminants within the lube oil system have
damaged the gears. The pattern from root to tip of the gear face could only be caused by
abrasives trapped between the mating gears as they roll into and out of mesh.

The severe pitting in Figure 2 is indicative of the implosion of the lubricating film in an
overloaded gearbox. The excessive backpressure caused by the overload results in multiple
implosions of the oil film and the resultant removal of metal from the gear teeth.

Even simple analysis requires verification of the assumed root cause. This verification may
be as simple as using vibration analysis to confirm the forcing function (root problem) that
caused the failure; or more detailed tests designed to eliminate or confirm the assumed root
cause. Do not assume anything. Always verify your assumptions.

The Formal RCA Process

A formal root cause failure analysis will require an investment in both time and manpower.
Typically, it will require a two- to four-person team between 5 and 15 days to complete. If
plant personnel cannot resolve the problem within this timeframe, it is unlikely that it will
ever be solved without expert assistance.

Case Study

In a classic example, an in-house problem-solving team of six engineers, twelve
technicians, and three experts from the turbine vendor worked on a chronic steam
turbine-generator problem for more than 10 years without resolution. Each turbine
exhibited chronic failure of the coupling that connected it to an electrical generator.
Over a period of 5 years, each of the six turbines had at least one failure of its
coupling. In an attempt to resolve the problem, the team had replaced, modified, or
changed essentially every part of the turbine-generator drive train without success;
in fact, the overall reliability of the machine train declined. The incremental cost
of this 10-year exercise was well over $1 million, with no measurable benefits.

When proper root cause analysis techniques were applied, the problem—lack of
sufficient foundation and structural support—was resolved within 7 days.

Root Cause: The turbine-generators, mounted on the second floor of the steam
plant, were flexing during normal operation, but the most serious damage to the
couplings occurred during startup and coast-down. During these transients, the lack of
rigidity in the floor and support structure permitted the entire drive train to move
or displace in the horizontal plane. This radical misalignment resulted in premature
coupling failure. Unfortunately, the cost required to undo all of the modifications
and changes that had been made during the 10-year troubleshooting exercise was
substantial, almost $750,000. The actual cost to correct the real root cause was less
than $5,000 per turbine-generator, or less than $30,000 for all six units.

In formal RCA, the investigating team will need input from all plant personnel who may
have direct or indirect knowledge of the deviation, event, or problem that is being
investigated. This input may be limited to interviews, either individually or in groups,
but could also entail additional support in gathering data, records, and other pertinent
information.

Methodology

RCA follows a logical sequence or methodology, shown in Figure 3, which is designed to
facilitate the solution of the investigated problem, deviation, or event. The following steps
are involved.

Step 1. Problem definition and data gathering.

This step entails collecting information on conditions before, during, and after the
occurrence; personnel involvement (including actions taken); environmental factors; and
other information relevant to the condition or problem. Questions to be asked include:

a) What questions
- What happened?
- What are the symptoms?
- What is the complaint?
- What went wrong?
- What is the undesirable event or behavior?
b) When questions
- When did it occur: what date and what time?
- During what phase of the production process?
c) Where questions
- What plant?
- Where did it happen?
- What process?
- What production stream?
- What equipment?
d) How questions
- How was the situation before the incident?
- What happened during the incident?
- How is the situation after the incident?
- What is the normal operating condition?
- Is there any injury, shutdown, trip, or damage?
- How frequent is the problem?
- How many other processes, pieces of equipment, or items are affected by this
incident?

Step 2. Control barriers

Control barriers are administrative or physical aids that are made part of work conditions.
They are devices employed to protect employees or equipment and enhance the safety and
performance of the machine system. The purpose of checking control barriers in a failure
investigation process is to determine if all the control barriers pertaining to the failure under
investigation are present and effective.

Examples of physical control barriers include conservative design allowance, engineered
safety features, fire barriers and seals, ground fault protection, locked doors, valves,
breakers, and controls, insulation, redundant systems, and emergency shutdown systems.

Examples of administrative control barriers include alarms, safety rules and procedures,
certification of operators and engineers, methods of communication, policies and
procedures, work permits, standards, and training and education.

Step 3. Event and causal factor charting

Event and causal factor charting is an analysis tool whereby events, their relations,
conditions, changes, barriers, and causal factors are charted on a timeline using the
standard symbols shown in Figure 3.

Figure 3: Standard symbols for factor charting
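
The idea of charting on a timeline can be sketched as a simple data structure. This is
only an illustration of the ordering step, not the graphical notation of Figure 3. The
entry kinds follow the list in the text; the dates, times, and descriptions are
hypothetical and loosely follow the V-belt bearing example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ChartEntry:
    when: datetime
    kind: str  # "event", "condition", "change", "barrier" or "causal factor"
    description: str


# Hypothetical entries, recorded in the order they were reported.
entries = [
    ChartEntry(datetime(2024, 3, 4, 14, 30), "event", "Bearing fails; fan trips"),
    ChartEntry(datetime(2024, 3, 1, 9, 0), "change", "V-belt replaced and re-tensioned"),
    ChartEntry(datetime(2024, 3, 1, 9, 0), "causal factor", "Belt tension set too high"),
    ChartEntry(datetime(2024, 3, 3, 22, 15), "condition", "Bearing temperature rising"),
]

# Chart order is chronological. sorted() is stable, so entries with the
# same timestamp keep the order in which they were recorded.
timeline = sorted(entries, key=lambda e: e.when)
causal_factors = [e.description for e in timeline if e.kind == "causal factor"]

for e in timeline:
    print(f"{e.when:%Y-%m-%d %H:%M}  [{e.kind}] {e.description}")
```

Once the entries are in time order, the causal factors can be picked out for the cause
and effect analysis in the next step.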


Step 4. Cause and Effect Analysis

When the entire occurrence has been charted out, the investigators are in a good position
to identify the major contributors to the problem, the causal factors. The diagram will help
to show the cause and effect relationship between factors, even if significantly removed
from each other in the system.

Step 5. Root cause identification

After identifying all causal factors, the team begins the root cause identification. This step
generally involves the use of a decision diagram or “fishbone” diagram. This diagram
structures the reasoning process of the investigators by helping them answer questions
about why a particular causal factor exists or occurred. For every event there will likely be
a number of causal factors. For each causal factor there will likely be a number of root
causes.
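
The relationship between an event, its causal factors, and their root causes can be
sketched as a nested mapping. The content below is taken from the V-belt bearing example
given earlier; the structure and the helper function name are illustrative assumptions,
not a standard RCA format.

```python
# Event -> causal factors -> root causes, as a nested mapping.
analysis = {
    "event": "Premature bearing failure",
    "causal_factors": {
        "Excessive V-belt tension": [
            "Improper maintenance procedures",
            "Improper technician training",
            "Inadequate supervision",
        ],
    },
}


def root_causes(analysis: dict) -> list[str]:
    """Collect every root cause once, in first-seen order."""
    seen: list[str] = []
    for causes in analysis["causal_factors"].values():
        for cause in causes:
            if cause not in seen:
                seen.append(cause)
    return seen


# Walk the structure the way the "why?" questioning proceeds.
print(analysis["event"])
for factor, causes in analysis["causal_factors"].items():
    print(f"  why? -> {factor}")
    for cause in causes:
        print(f"    why? -> {cause}")
```

A root cause that sits behind several causal factors is listed only once by
`root_causes`, which helps when corrective actions are assigned later.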

Step 6. Corrective actions effectiveness assessment

The next step of the process is to generate recommendations for corrective action, taking
into consideration the following questions:
• What can be done to prevent the problem from happening again?
• How will the solution be implemented?
• Who will be responsible for it? and
• What are the risks of implementing the solution?

Step 7. Report generation

It is important to report and document the RCA process including a discussion of corrective
actions, management and personnel involved. Information of interest to other facilities
should also be included in the report. The study report should include:

• Problem definition;
• Event and causal factors chart;
• Cause and effect analysis;
• Root cause(s) of the problem;
• Problem solution; and
• Implementation plan with clear responsibilities and follow-up.

IV) Pareto Chart

The Pareto chart is one of the seven basic tools of quality control, which include the
histogram, Pareto chart, check sheet, control chart, cause-and-effect diagram, flowchart,
and scatter diagram.

The chart is named after Vilfredo Pareto, the Italian economist who noted that 80% of the
income in Italy went to 20% of the population. The Pareto principle generalizes this
observation: as a rule of thumb, roughly 80% of the problems stem from 20% of the causes.

A Pareto Chart is a bar graph made of a series of bars whose heights reflect the frequency
of problems or causes. The bars are arranged in descending order of height from left to
right. This means the factors represented by the tall bars on the left are relatively more
significant than those on the right. This helps sort out the important few from the trivial
many so that resources and efforts are focused where we can obtain maximum returns.
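
The data behind a Pareto chart can be prepared with a short Python sketch. The failure
log below is hypothetical, and the 80% cut-off used to pick the important few is a
convention borrowed from the Pareto principle, not a fixed rule. The bars are printed as
text instead of being drawn with a plotting library.

```python
from collections import Counter

# Hypothetical failure log: one entry per recorded breakdown cause.
log = (
    ["lubrication"] * 27
    + ["misalignment"] * 42
    + ["operator error"] * 6
    + ["overload"] * 15
    + ["contamination"] * 10
)

counts = Counter(log).most_common()  # sorted by frequency, highest first
total = sum(n for _, n in counts)

# Each row: (cause, frequency, cumulative percentage of all failures).
rows = []
running = 0
for cause, n in counts:
    running += n
    rows.append((cause, n, 100 * running / total))

for cause, n, cum_pct in rows:
    bar = "#" * (n // 2)
    print(f"{cause:<15} {n:>3} {cum_pct:6.1f}%  {bar}")

# The important few: causes needed to reach 80% of all failures.
vital_few = []
for cause, _, cum_pct in rows:
    vital_few.append(cause)
    if cum_pct >= 80:
        break
```

With these sample numbers, three of the five causes account for more than 80% of the
recorded failures, so effort would be focused there first.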

Homework: Read more on the steps used to construct Pareto charts and fishbone diagrams.
