How to Conduct an FMEA Analysis? A Step-by-Step Guide

Duration:
9 min read
Published on
June 3, 2026

Failure Mode and Effects Analysis (FMEA) is a structured, proactive method for identifying potential failure modes in a system, process, or product — and evaluating their consequences before those failures actually occur. Originally developed by the U.S. military in the 1940s and later adopted by NASA and the automotive industry, FMEA has become one of the most trusted reliability engineering tools used by maintenance and engineering teams worldwide. According to a study published in the Reliability Web, organizations that apply FMEA systematically can reduce unplanned equipment failures by up to 40%.

A complete FMEA analysis involves five core activities: defining the scope, identifying potential failure modes, evaluating their severity and likelihood, calculating a Risk Priority Number (RPN), and implementing corrective actions. This guide walks you through each step with a practical example so your team can start applying FMEA immediately.

What is FMEA?

Failure Mode and Effects Analysis (FMEA) is a systematic technique that examines each component or process step, asks "how could this fail?", and assesses the impact of that failure on the broader system. The goal is to catch high-risk failure modes early — before they cause costly downtime, safety incidents, or quality defects.

FMEA is widely used in manufacturing, oil and gas, aerospace, healthcare, and facility management. It is not just a design tool. Maintenance teams use FMEA to prioritize which assets need the most attention, shape preventive maintenance schedules, and justify maintenance investments to leadership.

The International Electrotechnical Commission standard IEC 60812 defines FMEA as a method for identifying failure modes in a system and their potential effects, forming the basis for risk mitigation activities.

Types of FMEA

There are three main types of FMEA. Understanding which type applies to your situation is the first step before you start any analysis:

  • Design FMEA (DFMEA): Focused on product design. Engineers use this during the design phase to catch failure modes before the product is manufactured. Common in automotive and aerospace industries.
  • Process FMEA (PFMEA): Focused on manufacturing or service processes. It identifies where a process step could go wrong and affect product quality or safety.
  • Maintenance FMEA (often run as a functional or system-level FMEA): Focused on operational assets. Maintenance teams use this to determine the most likely failure modes for equipment in service, prioritize maintenance activities, and develop or refine preventive maintenance plans.

For most industrial maintenance teams, Maintenance FMEA is the most relevant type. It feeds directly into your preventive maintenance software schedules and asset risk assessments.

Key Components of an FMEA Worksheet

Every FMEA is built around a structured worksheet. Before you run your first session, your team needs to understand what each column means:

  • Item / Function: The component being analyzed and what it’s supposed to do. Example: "Pump bearing — supports shaft rotation."
  • Failure Mode: How the component could fail. Example: "Bearing seizure due to lubrication loss."
  • Effect of Failure: What happens to the system or process when this failure occurs. Example: "Pump shutdown, production line stops."
  • Cause of Failure: The root cause or mechanism that leads to the failure mode. Example: "Inadequate lubrication interval, contaminated grease."
  • Severity (S): Rated 1–10. How serious is the effect if this failure occurs? A 10 means a safety-critical or catastrophic outcome with no warning.
  • Occurrence (O): Rated 1–10. How likely is this failure to occur? A 10 means failure is almost certain without controls in place.
  • Detection (D): Rated 1–10. How likely are current controls to miss the failure before it affects the customer or operation? Note the direction of the scale: a 1 means the failure is almost certain to be detected, and a 10 means it’s virtually impossible to detect.
  • Risk Priority Number (RPN): Severity × Occurrence × Detection. The higher the RPN, the more urgently the team needs to act.
  • Recommended Actions: What the team will do to reduce severity, lower occurrence, or improve detection.
  • Revised RPN: After implementing corrective actions, the team re-scores each factor to confirm improvement.
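
To make the columns concrete, here is a minimal sketch of one worksheet row as a Python data structure. The class name `FmeaRow` and its field names are illustrative assumptions, not a standard schema, and the three scores in the usage example are hypothetical placeholders rather than values from a real analysis.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    """One worksheet row; field names are illustrative, not a standard schema."""

    item_function: str
    failure_mode: str
    effect: str
    cause: str
    severity: int    # S, 1-10 (10 = most serious)
    occurrence: int  # O, 1-10 (10 = almost certain)
    detection: int   # D, 1-10 (10 = hardest to detect)
    recommended_action: str = ""

    def __post_init__(self) -> None:
        # Reject scores outside the 1-10 scale used in this guide.
        for name in ("severity", "occurrence", "detection"):
            score = getattr(self, name)
            if not 1 <= score <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {score}")

    @property
    def rpn(self) -> int:
        # Risk Priority Number = Severity x Occurrence x Detection
        return self.severity * self.occurrence * self.detection


# Text fields reuse the column examples above; the scores are hypothetical.
row = FmeaRow(
    item_function="Pump bearing: supports shaft rotation",
    failure_mode="Bearing seizure due to lubrication loss",
    effect="Pump shutdown, production line stops",
    cause="Inadequate lubrication interval, contaminated grease",
    severity=7,
    occurrence=4,
    detection=5,
)
print(row.rpn)  # 7 * 4 * 5 = 140
```

Validating the scores when the row is created keeps an out-of-range entry from silently distorting the RPN later in the session.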

How to Calculate the Risk Priority Number (RPN)

The Risk Priority Number (RPN) is the central metric in any FMEA. It is calculated as:

RPN = Severity (S) × Occurrence (O) × Detection (D)

Each factor is scored on a scale of 1 to 10, so the RPN ranges from 1 (1 × 1 × 1) to a maximum of 1,000 (10 × 10 × 10), the most critical risk scenario. In practice, many maintenance teams treat any RPN above 100–125 as requiring a corrective action plan, and anything above 200 as requiring immediate action. These cutoffs are conventions rather than fixed rules, so set them against your own asset criticality.

An important nuance: a high Severity score alone — even with low Occurrence and Detection scores — should still trigger action. According to the SAE J1739 FMEA standard, any failure mode with a Severity score of 9 or 10 must be addressed regardless of the overall RPN, because the consequences of that failure are too serious to ignore.
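
The calculation and the severity override can be sketched in a few lines of Python. The function names `rpn` and `requires_action` are hypothetical, and the default cutoff of 125 is an assumption taken from the upper end of the 100–125 range mentioned above; substitute whatever threshold your team has agreed on.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Return S x O x D after checking each score is on the 1-10 scale."""
    scores = {"severity": severity, "occurrence": occurrence, "detection": detection}
    for name, score in scores.items():
        if not 1 <= score <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {score}")
    return severity * occurrence * detection


def requires_action(
    severity: int,
    occurrence: int,
    detection: int,
    rpn_threshold: int = 125,  # assumed cutoff; adjust to your own policy
    severity_floor: int = 9,   # Severity 9-10 triggers action on its own
) -> bool:
    """Flag a failure mode if its RPN exceeds the threshold or its Severity is high."""
    score = rpn(severity, occurrence, detection)
    return severity >= severity_floor or score > rpn_threshold


print(rpn(10, 10, 10))            # 1000, the maximum possible RPN
print(requires_action(10, 2, 2))  # True: RPN is only 40, but Severity is 10
print(requires_action(5, 4, 5))   # False: RPN is 100 and Severity is below 9
```

Checking Severity separately from the product is what keeps a rare but catastrophic failure mode from being hidden by low Occurrence and Detection scores.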

Step-by-Step: How to Conduct an FMEA Analysis

Follow these eight steps to run a thorough FMEA from start to finish:

Step 1: Define the Scope and Assemble Your Team

Decide which system, asset, or process you are analyzing. Be specific — "entire facility" is too broad. "Cooling water pump circuit in Building 3" is a workable scope. Assemble a cross-functional team that includes maintenance technicians, engineers, operations supervisors, and safety personnel. FMEA works best when people with hands-on equipment knowledge participate alongside process engineers.

Step 2: List Functions and System Boundaries

For each component within your defined scope, clearly state what it is supposed to do — its intended function under normal operating conditions. This baseline helps the team think about what "failure" actually means for each part. Avoid vague statements like "works correctly." Use precise functional language: "Maintains fluid pressure at 6–8 bar to supply downstream heat exchangers."

Step 3: Identify All Potential Failure Modes

For each function, ask: "In what ways could this function fail, either partially or completely?" Failure modes are not causes — they are descriptions of how the component stops performing its function. Common failure modes include fracture, wear, corrosion, leakage, binding, overheating, and short circuit. One component can have multiple failure modes. Document all of them.

Step 4: Determine the Effects of Each Failure Mode

For each failure mode, describe what happens at the system level when that failure occurs. Consider effects at three levels: local (on the component itself), next-level (on connected equipment or the subsystem the component belongs to), and end-user or production (on output quality, safety, or uptime). This step helps the Severity score reflect the true business impact.

Step 5: Score Severity, Occurrence, and Detection

This is the most judgment-intensive step. The team scores each failure mode across all three dimensions:

  • Severity: Based on the worst realistic outcome if the failure occurs. Reference past incidents, safety data sheets, and near-miss records.
  • Occurrence: Based on historical failure frequency data, manufacturer specifications, or engineering experience. A CMMS with detailed downtime tracking history makes this step much more accurate.
  • Detection: Based on your existing inspection routines, sensor coverage, and operator awareness. If you have no current controls in place, Detection defaults to a high score (8–10).
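
If your CMMS can report how many times a failure has occurred over a known period, the Occurrence score can be derived from that rate instead of estimated from memory. The sketch below shows one way to do this. The band table `OCCURRENCE_BANDS` is entirely hypothetical and is not taken from any standard; replace it with the ranking table your organization actually uses.

```python
# Hypothetical bands: (minimum failures per year, Occurrence score).
# These cutoffs are placeholders, not values from a published standard.
OCCURRENCE_BANDS = [
    (12.0, 10),
    (6.0, 9),
    (4.0, 8),
    (2.0, 7),
    (1.0, 6),
    (0.5, 5),
    (0.2, 4),
    (0.1, 3),
    (0.05, 2),
]


def failures_per_year(failure_count: int, window_months: float) -> float:
    """Convert a failure count over an observation window into an annual rate."""
    if window_months <= 0:
        raise ValueError("window_months must be positive")
    return failure_count * 12.0 / window_months


def occurrence_score(rate_per_year: float) -> int:
    """Return the score of the first band whose minimum rate is met, else 1."""
    for min_rate, score in OCCURRENCE_BANDS:
        if rate_per_year >= min_rate:
            return score
    return 1


rate = failures_per_year(failure_count=2, window_months=18)
print(round(rate, 2), occurrence_score(rate))  # 1.33 6
```

Writing the bands down, whatever values you choose, makes scoring repeatable from one FMEA session to the next.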

Step 6: Calculate RPN and Prioritize

Multiply S × O × D for each failure mode. Sort the worksheet by RPN, highest to lowest. Focus your resources on the top failure modes, typically those in the top 20% of RPN scores plus any with a Severity score ≥ 9. A failure mode with RPN = 300 deserves attention before one with RPN = 80, but the lower-scoring item should still stay on the worksheet and be monitored rather than dropped.
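
As a rough sketch of this step, the following Python sorts a small worksheet by RPN and flags the top 20% together with any Severity 9–10 items. The rows and their scores are invented for illustration, and `prioritize` is a hypothetical helper name.

```python
import math

# Hypothetical worksheet rows: (failure mode, S, O, D). Scores are invented.
rows = [
    ("Seal leakage", 8, 6, 7),
    ("Bearing seizure", 7, 4, 5),
    ("Impeller erosion", 5, 3, 4),
    ("Coupling misalignment", 6, 3, 3),
    ("Motor winding short circuit", 9, 2, 2),
]


def prioritize(rows, top_fraction=0.2, severity_floor=9):
    """Sort rows by RPN (high to low) and tag each one as priority or not."""
    ranked = sorted(rows, key=lambda r: r[1] * r[2] * r[3], reverse=True)
    top_n = math.ceil(len(ranked) * top_fraction)
    result = []
    for index, (mode, s, o, d) in enumerate(ranked):
        # Priority if in the top slice by RPN, or if Severity alone is 9-10.
        is_priority = index < top_n or s >= severity_floor
        result.append((mode, s * o * d, is_priority))
    return result


for mode, score, is_priority in prioritize(rows):
    label = "PRIORITY" if is_priority else "monitor"
    print(f"{score:>4}  {label:<8}  {mode}")
```

With these sample rows, the seal leakage (RPN 336) is flagged by rank, and the motor winding short circuit is flagged by Severity even though its RPN of 36 is the lowest on the list.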

Step 7: Define and Assign Corrective Actions

For each high-priority failure mode, the team defines specific corrective actions. Good corrective actions target one of three levers:

  • Reduce Severity: Design changes, safety interlocks, or redundant systems that limit the damage when failure occurs.
  • Reduce Occurrence: Improved lubrication schedules, better operating procedures, or upgraded component specifications that make the failure less likely.
  • Improve Detection: Adding vibration sensors, thermal cameras, or more frequent inspection intervals so the failure is caught earlier.

Assign each action an owner and a due date. Track these as work orders in your work order management system to ensure accountability.

Step 8: Review, Validate, and Update the FMEA

After corrective actions are implemented, the team revisits the FMEA worksheet and re-scores each affected failure mode. Calculate the revised RPN to confirm that risk has been reduced. FMEA is a living document — update it whenever a new failure occurs, a design change is made, or a process modification is introduced. Schedule formal FMEA reviews annually or after any significant incident.

FMEA in Action — A Real-World Maintenance Example

Consider a centrifugal pump that supplies cooling water to a manufacturing line. The maintenance team runs an FMEA and identifies the following failure mode:

  • Component: Pump mechanical seal
  • Failure Mode: Seal leakage
  • Effect: Coolant loss leads to overheating of downstream equipment; production line shutdown
  • Cause: Seal face wear from particulate contamination in coolant
  • Severity: 8 (significant production impact, potential equipment damage)
  • Occurrence: 6 (has occurred twice in the past 18 months)
  • Detection: 7 (no inline sensors; detected only during operator rounds every 4 hours)
  • RPN: 8 × 6 × 7 = 336 — high priority

The team defines two corrective actions: install a coolant filtration unit (reduces Occurrence from 6 to 3) and add a vibration sensor on the pump to detect early seal wear (reduces Detection from 7 to 3). Severity stays at 8 because neither action changes the consequence of a leak. Revised RPN = 8 × 3 × 3 = 72, a 79% reduction in RPN. This kind of measurable result is exactly what FMEA delivers when executed correctly.
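
The arithmetic in this example can be checked with a short script. The dictionary layout and variable names are illustrative assumptions; the scores are the ones given in the example.

```python
# Scores from the pump seal example, before and after corrective actions.
before_scores = {"severity": 8, "occurrence": 6, "detection": 7}
after_scores = {"severity": 8, "occurrence": 3, "detection": 3}


def rpn(scores: dict) -> int:
    """Risk Priority Number = Severity x Occurrence x Detection."""
    return scores["severity"] * scores["occurrence"] * scores["detection"]


before = rpn(before_scores)
after = rpn(after_scores)
reduction_pct = (before - after) / before * 100

print(before, after, round(reduction_pct, 1))  # 336 72 78.6
```

The reduction works out to about 78.6%, which rounds to the 79% quoted above.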

Common FMEA Mistakes to Avoid

Even experienced teams make these errors. Knowing them in advance saves time and improves results:

  • Listing causes as failure modes: "Inadequate lubrication" is a cause, not a failure mode. The failure mode is "bearing seizure." Keep these columns distinct.
  • Over-relying on RPN alone: A failure mode with Severity = 10, Occurrence = 2, and Detection = 2 has an RPN of only 40, yet it still demands urgent action. Never ignore high-severity items just because the RPN is low.
  • Running FMEA as a solo activity: FMEA accuracy depends on diverse perspectives. A session run by one engineer without technician input will miss field-level failure modes that only operators observe.
  • Treating FMEA as a one-time exercise: Equipment ages, processes change, and new failure modes emerge. An FMEA that isn’t reviewed and updated becomes misleading rather than helpful.
  • Skipping the revised RPN step: Without re-scoring after corrective actions, you cannot confirm whether risk has actually been reduced. This step closes the loop and demonstrates ROI.

How Cryotos CMMS Supports Your FMEA Process

FMEA identifies what could go wrong — but it takes a reliable maintenance management system to execute the corrective actions that prevent those failures. Cryotos CMMS gives maintenance teams the tools to turn FMEA findings into operational reality:

  • Historical failure data for accurate scoring: Cryotos tracks every breakdown, work order, and asset failure over time. When your team sits down to score Occurrence in the FMEA worksheet, you have real data rather than guesswork. The BI Dashboard makes this data accessible at the asset, department, and facility level.
  • Maintenance checklists aligned to FMEA corrective actions: Once your FMEA defines new inspection frequencies or procedures, you can build these directly into maintenance checklists within Cryotos — ensuring technicians follow the updated protocol every time.
  • Work order tracking for corrective action accountability: Every corrective action identified in an FMEA becomes a trackable work order with an assigned owner, due date, and completion status.
  • Root cause analysis integration: Cryotos includes a built-in 5 Whys root cause analysis tool. When a failure mode from your FMEA actually occurs, technicians can document the root cause investigation directly in the platform — feeding that data back into future FMEA reviews.
  • IoT sensor alerts to improve Detection scores: By connecting IoT meter reading and sensor data to Cryotos, maintenance teams can lower Detection scores — and therefore overall RPN — for critical assets, making preventive action faster and more reliable.

Frequently Asked Questions

When should FMEA be conducted?

FMEA should be conducted at the design or early operational stage of a new asset or process, after a significant equipment failure or near-miss, when introducing changes to an existing process or system, and during periodic reliability reviews — typically annually for critical assets. The earlier FMEA is applied in the asset lifecycle, the less expensive it is to address findings.

What is a good RPN score?

There is no universal "good" RPN because acceptable risk levels vary by industry and asset criticality. As a general guideline, an RPN below 50 is considered low risk, 50–100 requires monitoring, 100–200 warrants corrective action planning, and above 200 demands immediate action. Any failure mode with a Severity score of 9 or 10 should be addressed regardless of the calculated RPN.
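
The guideline above can be expressed as a small lookup. This is a sketch only: the function name `rpn_band` is hypothetical, and because the stated ranges share their endpoints, the code assumes that an RPN of exactly 100 falls in the corrective-action band and an RPN of exactly 200 does not yet count as "above 200".

```python
def rpn_band(rpn: int, severity: int) -> str:
    """Map an RPN and Severity score to the guideline bands described above."""
    if severity >= 9:
        # Severity 9-10 overrides the RPN bands entirely.
        return "address regardless of RPN"
    if rpn > 200:
        return "immediate action"
    if rpn >= 100:  # assumption: exactly 100 belongs to this band
        return "corrective action planning"
    if rpn >= 50:
        return "monitor"
    return "low risk"


print(rpn_band(336, 8))  # immediate action
print(rpn_band(72, 8))   # monitor
print(rpn_band(40, 10))  # address regardless of RPN
```

Applied to the pump seal example, the original RPN of 336 lands in the immediate-action band and the revised RPN of 72 drops to monitoring.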

How long does an FMEA take?

A focused FMEA for a single asset or process subsystem typically takes 4–8 hours across one or two sessions. A system-level FMEA covering an entire production line may take several days. The time investment depends on team experience, data availability, and the complexity of the system being analyzed. Teams with access to historical CMMS data can complete scoring steps significantly faster.

What is the difference between FMEA and root cause analysis?

FMEA is proactive — it identifies and prioritizes potential failure modes before failures occur. Root cause analysis (RCA) is reactive — it investigates failure modes that have already happened to determine why they occurred and how to prevent recurrence. The two tools complement each other: FMEA prioritizes where to apply preventive effort, while RCA deepens understanding when a failure actually occurs.

If your maintenance team is ready to move from reactive firefighting to proactive reliability management, FMEA is one of the most powerful tools available. Cryotos CMMS gives you the data, workflows, and visibility to execute FMEA findings at scale — from corrective work orders to predictive sensor alerts. Explore how Cryotos supports reliability-centered maintenance and start building a more resilient operation today.
