Failure Mode and Effects Analysis (FMEA) is a structured, proactive method for identifying potential failure modes in a system, process, or product — and evaluating their consequences before those failures actually occur. Originally developed by the U.S. military in the 1940s and later adopted by NASA and the automotive industry, FMEA has become one of the most trusted reliability engineering tools used by maintenance and engineering teams worldwide. According to a study published in the Reliability Web, organizations that apply FMEA systematically can reduce unplanned equipment failures by up to 40%.
A complete FMEA involves five core activities: defining the scope, identifying potential failure modes, evaluating their severity and likelihood, calculating a Risk Priority Number (RPN), and implementing corrective actions. This guide walks you through each step with a practical example so your team can start applying FMEA immediately.
Failure Mode and Effects Analysis (FMEA) is a systematic technique that examines each component or process step, asks "how could this fail?", and assesses the impact of that failure on the broader system. The goal is to catch high-risk failure modes early — before they cause costly downtime, safety incidents, or quality defects.
FMEA is widely used in manufacturing, oil and gas, aerospace, healthcare, and facility management. It is not just a design tool. Maintenance teams use FMEA to prioritize which assets need the most attention, shape preventive maintenance schedules, and justify maintenance investments to leadership.
The International Electrotechnical Commission standard IEC 60812 defines FMEA as a method for identifying failure modes in a system and their potential effects, forming the basis for risk mitigation activities.
There are three main types of FMEA. Understanding which type applies to your situation is the first step before you start any analysis:
For most industrial maintenance teams, Maintenance FMEA is the most relevant type. It feeds directly into your preventive maintenance software schedules and asset risk assessments.
Every FMEA is built around a structured worksheet. Before you run your first session, your team needs to understand what each column means:
The Risk Priority Number (RPN) is the central metric in any FMEA. It is calculated as:
RPN = Severity (S) × Occurrence (O) × Detection (D)
Each factor is scored on a scale of 1 to 10. The maximum RPN is 1,000 (10 × 10 × 10), representing the most critical risk scenario. In practice, many maintenance teams treat any RPN above 100–125 as requiring a corrective action plan, and anything above 200 as demanding immediate action.
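The formula is simple enough to check by hand, but a small helper makes the 1 to 10 scoring range explicit. The following is a minimal sketch; the function name `risk_priority_number` is an illustrative choice, not part of any FMEA standard or tool.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Return RPN = S x O x D, with each factor scored from 1 to 10."""
    scores = (("severity", severity), ("occurrence", occurrence), ("detection", detection))
    for name, score in scores:
        # Reject anything outside the 1-10 scale described in the text.
        if not isinstance(score, int) or not 1 <= score <= 10:
            raise ValueError(f"{name} must be an integer from 1 to 10, got {score!r}")
    return severity * occurrence * detection


print(risk_priority_number(10, 10, 10))  # 1000, the maximum possible RPN
print(risk_priority_number(1, 1, 1))     # 1, the minimum possible RPN
```

Validating the inputs up front catches scoring typos (for example, a 0 or an 11) before they distort the ranking.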
An important nuance: a high Severity score alone — even with low Occurrence and Detection scores — should still trigger action. According to the SAE J1739 FMEA standard, any failure mode with a Severity score of 9 or 10 must be addressed regardless of the overall RPN, because the consequences of that failure are too serious to ignore.
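One way to encode this rule is to check Severity separately from the RPN threshold. The sketch below assumes an RPN action threshold of 100 (the lower end of the 100–125 range mentioned above); both the threshold and the function name `requires_action` are illustrative assumptions that a team would tune to its own risk policy.

```python
SEVERITY_OVERRIDE = 9        # from the text: Severity 9 or 10 always triggers action
RPN_ACTION_THRESHOLD = 100   # assumption: lower end of the 100-125 range


def requires_action(severity: int, occurrence: int, detection: int,
                    rpn_threshold: int = RPN_ACTION_THRESHOLD) -> bool:
    """Return True if the failure mode needs action by RPN or by Severity alone."""
    rpn = severity * occurrence * detection
    return severity >= SEVERITY_OVERRIDE or rpn > rpn_threshold


# Severity 10, rare and easy to detect: the RPN is only 20, but action is still required.
print(requires_action(10, 1, 2))  # True
# Moderate scores: RPN is 60 and Severity is below 9, so no action is triggered.
print(requires_action(4, 3, 5))   # False
```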
Follow these eight steps to run a thorough FMEA from start to finish:
Decide which system, asset, or process you are analyzing. Be specific — "entire facility" is too broad. "Cooling water pump circuit in Building 3" is a workable scope. Assemble a cross-functional team that includes maintenance technicians, engineers, operations supervisors, and safety personnel. FMEA works best when people with hands-on equipment knowledge participate alongside process engineers.
For each component within your defined scope, clearly state what it is supposed to do — its intended function under normal operating conditions. This baseline helps the team think about what "failure" actually means for each part. Avoid vague statements like "works correctly." Use precise functional language: "Maintains fluid pressure at 6–8 bar to supply downstream heat exchangers."
For each function, ask: "In what ways could this function fail, either partially or completely?" Failure modes are not causes — they are descriptions of how the component stops performing its function. Common failure modes include fracture, wear, corrosion, leakage, binding, overheating, and short circuit. One component can have multiple failure modes. Document all of them.
For each failure mode, describe what happens when that failure occurs. Consider effects at three levels: local (on the component itself), next-level (on connected equipment), and end-user or production (on output quality, safety, or uptime). This step helps the Severity score reflect the true business impact.
This is the most judgment-intensive step. The team scores each failure mode across all three dimensions, Severity (S), Occurrence (O), and Detection (D), each on a scale of 1 to 10:
Multiply S × O × D for each failure mode. Sort the worksheet by RPN, highest to lowest. Focus your resources on the top failure modes, typically those in the top 20% of RPN scores, or any with a Severity score ≥ 9. A failure mode with RPN = 300 deserves more attention than one with RPN = 80, but the lower-scoring mode should still be documented and monitored rather than dropped from the worksheet.
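The ranking and selection rule in this step can be sketched in a few lines of Python. The `FailureMode` class, the `prioritize` function, and every row in the sample worksheet are hypothetical and exist only to illustrate the sorting logic; they are not real scoring data.

```python
import math
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


def prioritize(modes, top_fraction=0.2, severity_floor=9):
    """Return modes in the top fraction by RPN, plus any at or above the severity floor."""
    ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
    top_n = max(1, math.ceil(len(ranked) * top_fraction))
    return [m for i, m in enumerate(ranked) if i < top_n or m.severity >= severity_floor]


# Hypothetical worksheet rows, for illustration only.
worksheet = [
    FailureMode("Seal leak", 8, 6, 7),
    FailureMode("Bearing wear", 6, 4, 5),
    FailureMode("Coupling misalignment", 5, 3, 4),
    FailureMode("Motor short circuit", 9, 2, 2),
    FailureMode("Impeller corrosion", 7, 3, 6),
]

for mode in prioritize(worksheet):
    print(mode.name, mode.rpn)
```

With these sample rows, the seal leak is selected because it has the highest RPN, and the motor short circuit is selected despite a low RPN because its Severity score is 9.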
For each high-priority failure mode, the team defines specific corrective actions. Good corrective actions target one of the three levers behind the RPN: lowering Severity, lowering Occurrence, or improving Detection:
Assign each action an owner and a due date. Track these as work orders in your work order management system to ensure accountability.
After corrective actions are implemented, the team revisits the FMEA worksheet and re-scores each affected failure mode. Calculate the revised RPN to confirm that risk has been reduced. FMEA is a living document — update it whenever a new failure occurs, a design change is made, or a process modification is introduced. Schedule formal FMEA reviews annually or after any significant incident.
Consider a centrifugal pump that supplies cooling water to a manufacturing line. The maintenance team runs an FMEA and identifies the following failure mode:
With Severity at 8, Occurrence at 6, and Detection at 7, the initial RPN is 8 × 6 × 7 = 336. The team's corrective actions: install a coolant filtration unit (reduces Occurrence from 6 to 3) and add a vibration sensor on the pump to detect early seal wear (reduces Detection from 7 to 3). Revised RPN = 8 × 3 × 3 = 72, a 79% reduction from the initial score. This kind of measurable result is exactly what FMEA delivers when executed correctly.
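The arithmetic in this example can be verified with a short script. The initial scores (Severity 8, Occurrence 6, Detection 7) are taken from the corrective actions described above; the variable names are illustrative.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    return severity * occurrence * detection


initial = rpn(8, 6, 7)   # before corrective actions
revised = rpn(8, 3, 3)   # after filtration (Occurrence 6 -> 3) and vibration sensor (Detection 7 -> 3)
reduction = (initial - revised) / initial

print(initial)              # 336
print(revised)              # 72
print(f"{reduction:.0%}")   # 79%
```

Note that Severity stays at 8 in both calculations: the corrective actions make the failure less likely and easier to catch, but they do not change how serious it would be if it occurred.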
Even experienced teams make these errors. Knowing them in advance saves time and improves results:
FMEA identifies what could go wrong — but it takes a reliable maintenance management system to execute the corrective actions that prevent those failures. Cryotos CMMS gives maintenance teams the tools to turn FMEA findings into operational reality:
FMEA should be conducted at the design or early operational stage of a new asset or process, after a significant equipment failure or near-miss, when introducing changes to an existing process or system, and during periodic reliability reviews — typically annually for critical assets. The earlier FMEA is applied in the asset lifecycle, the less expensive it is to address findings.
There is no universal "good" RPN because acceptable risk levels vary by industry and asset criticality. As a general guideline, an RPN below 50 is considered low risk, 50–100 requires monitoring, 100–200 warrants corrective action planning, and above 200 demands immediate action. Any failure mode with a Severity score of 9 or 10 should be addressed regardless of the calculated RPN.
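These guideline bands can be expressed as a small lookup. The sketch below is one possible reading of the ranges: because the stated bands share their endpoints, it assumes that an RPN of exactly 50 falls in "monitor", and that 100 and 200 both fall in "corrective action planning". The function name `classify` and the band labels are illustrative.

```python
def classify(severity: int, occurrence: int, detection: int):
    """Return (rpn, band, must_address) using the guideline bands above."""
    rpn = severity * occurrence * detection
    if rpn > 200:
        band = "immediate action"
    elif rpn >= 100:          # assumption: 100 and 200 belong to this band
        band = "corrective action planning"
    elif rpn >= 50:           # assumption: 50 belongs to this band
        band = "monitor"
    else:
        band = "low risk"
    # Severity 9 or 10 must be addressed regardless of the calculated RPN.
    must_address = severity >= 9
    return rpn, band, must_address


print(classify(8, 6, 7))   # (336, 'immediate action', False)
print(classify(8, 3, 3))   # (72, 'monitor', False)
print(classify(10, 1, 2))  # (20, 'low risk', True)
```

Returning the Severity flag separately from the band keeps the two rules visible: a failure mode can sit in the lowest RPN band and still require attention.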
A focused FMEA for a single asset or process subsystem typically takes 4–8 hours across one or two sessions. A system-level FMEA covering an entire production line may take several days. The time investment depends on team experience, data availability, and the complexity of the system being analyzed. Teams with access to historical CMMS data can complete scoring steps significantly faster.
FMEA is proactive — it identifies and prioritizes potential failure modes before failures occur. Root cause analysis (RCA) is reactive — it investigates failure modes that have already happened to determine why they occurred and how to prevent recurrence. The two tools complement each other: FMEA prioritizes where to apply preventive effort, while RCA deepens understanding when a failure actually occurs.
If your maintenance team is ready to move from reactive firefighting to proactive reliability management, FMEA is one of the most powerful tools available. Cryotos CMMS gives you the data, workflows, and visibility to execute FMEA findings at scale — from corrective work orders to predictive sensor alerts. Explore how Cryotos supports reliability-centered maintenance and start building a more resilient operation today.

