How Are FMEA and RCA Connected? A Practical Guide for Maintenance Teams

Calendar
Duration:
8 min read
calendar today
Published on
June 1, 2026
Featured Image

FMEA (Failure Mode and Effects Analysis) and RCA (Root Cause Analysis) are two of the most powerful tools in a maintenance team's reliability toolkit — and they're more closely connected than most people realize. FMEA is a proactive method that predicts failures before they happen, while Root Cause Analysis is a reactive investigation that digs into why a failure already occurred. Together, they form a continuous improvement loop: FMEA helps you prevent failures, and RCA helps you learn from the ones that slip through. According to a Plant Engineering report, unplanned downtime costs industrial manufacturers an average of $250,000 per hour. Using both FMEA and RCA together is one of the most effective ways to reduce that number. This guide explains what each method does, how they differ, and how to use them as a connected system.

What Is FMEA?

The 5 core steps of FMEA: Define Scope, List Failure Modes, Assess Effects, Score RPN, Define Actions | Cryotos

Failure Mode and Effects Analysis (FMEA) is a structured, proactive risk assessment method. Maintenance and reliability engineers use it to identify every possible way an asset or process could fail, assess the consequences of each failure, and prioritize corrective actions before anything breaks down. The goal is to catch high-risk failure modes on paper — before they become unplanned downtime events on the plant floor.

FMEA was first developed by the U.S. military in the 1940s and has since become a standard practice in industries ranging from automotive to aerospace to manufacturing. The ISO 31000 risk management framework and the SAE J1739 standard both provide formal guidance on how to conduct FMEA correctly.

The 5 Core Steps of FMEA

  • Define the scope: Identify the system, asset, or process you are analyzing. Be specific — a centrifugal pump, a conveyor belt system, or a specific production line segment.
  • List potential failure modes: For each component, ask: "In what ways could this part fail?" Examples include bearing seizure, seal leak, motor overheating, or blade erosion.
  • Assess effects and causes: For each failure mode, document what would happen if it occurred and what conditions or root causes could trigger it.
  • Score the Risk Priority Number (RPN): Multiply Severity (S) × Occurrence (O) × Detection (D) scores on a 1–10 scale. Higher RPN scores demand immediate attention.
  • Define and assign actions: For high-RPN failure modes, assign preventive maintenance tasks, design changes, or inspection intervals to reduce risk.

Risk Priority Number (RPN) Explained

The Risk Priority Number is the numerical backbone of FMEA. It combines three scores — each rated 1 to 10 — to produce a priority ranking for every failure mode on your list.

  • Severity (S): How serious is the impact if this failure occurs? A score of 1 means negligible; 10 means a safety hazard or complete system failure.
  • Occurrence (O): How often is this failure mode likely to happen? A score of 1 means extremely rare; 10 means near-certain under normal operating conditions.
  • Detection (D): How easy is it to detect this failure before it causes harm? A score of 1 means it's almost always caught early; 10 means it's virtually undetectable.

An RPN above 100 is typically flagged as high-priority. Teams then focus their preventive efforts on the top-ranked failure modes first, making the best use of limited maintenance resources.

What Is Root Cause Analysis (RCA)?

Root Cause Analysis is a structured investigation method used after a failure, incident, or near-miss. The goal is not just to fix the symptom — it's to find and eliminate the underlying cause so the same problem doesn't come back. Done properly, RCA turns every failure into a learning event that makes the system more reliable over time.

Where FMEA asks "what could go wrong?", RCA asks "what actually went wrong, and why?" The two questions are complementary, not competing.

Common RCA Methods Used in Maintenance

  • 5 Whys: Ask "why?" five times in succession to peel back the layers of a problem. Best for straightforward failures with a clear cause chain. Simple, fast, and requires no special tools.
  • Maps out all possible categories of causes — Man, Machine, Method, Material, Measurement, Environment — in a visual format. Effective for complex failures with multiple contributing factors.
  • Fault Tree Analysis (FTA): A top-down, logic-based method that traces all paths that could lead to a specific failure event. Used widely in high-consequence industries like oil and gas, aviation, and nuclear power.
  • FRACAS (Failure Reporting, Analysis and Corrective Action System): A formal reporting loop where failures are logged, analyzed, and acted upon systematically across an organization.

When Should You Trigger an RCA?

Not every failure warrants a full RCA. Reserve formal investigations for events that meet one or more of these criteria: the failure caused or nearly caused a safety incident, the downtime cost exceeded a defined threshold (e.g., more than four hours of lost production), the same failure has recurred two or more times in a 90-day window, or the failure damaged a critical asset with long lead times for replacement parts. For minor failures, a quick 5 Whys is usually enough. For major or repeat failures, a structured method like Fishbone or FTA gives better results.

How FMEA and RCA Differ — and Why That Matters

Understanding the distinction between these two methods helps you know which one to reach for at any given point in the maintenance cycle. The key difference is timing: FMEA works before a failure; RCA works after one.

Criterion FMEA RCA
Purpose predict and prevent failures before they occur investigate and eliminate the root cause of a failure that already happened
Timing proactive, conducted during design, commissioning, or PM planning reactive, triggered after a failure, incident, or near-miss
Input engineering knowledge, historical data, and risk judgment failure evidence, maintenance logs, sensor data, and witness accounts
Output ranked list of failure modes with prevention tasks documented root cause with corrective and preventive actions
Frequency periodic review (often annually or after major changes) event-driven, triggered by each qualifying failure
Team involved reliability engineers, design engineers, operations maintenance technicians, engineers, supervisors, and safety team

    Neither tool is superior to the other. They serve different phases of the reliability cycle. A maintenance program that uses only FMEA has a prevention strategy but no learning mechanism. One that uses only RCA is always reactive with no structured way to anticipate risk.

    How FMEA and RCA Work Together

    FMEA and RCA feedback loop: Prevention, Failure, RCA Investigation, Feed Findings Back, Updated FMEA | Cryotos

    The real power of these two methods comes from using them as a connected system — not as isolated exercises. Here is how that connection works in practice.

    FMEA Before the Failure (Prevention)

    During asset commissioning or annual maintenance planning, your team runs a full FMEA on critical equipment. You identify a bearing failure mode on a centrifugal pump, score it a high RPN of 140 (Severity 7 × Occurrence 5 × Detection 4), and assign a quarterly vibration analysis check plus a biannual bearing replacement. The FMEA has just created a structured preventive maintenance plan based on risk — not guesswork.

    RCA After the Failure (Investigation)

    Despite the PM schedule, the pump's bearing fails unexpectedly. Your team triggers an RCA. Using the 5 Whys method, they trace the failure chain: the bearing failed → because it overheated → because the lubrication interval was too long for the actual operating conditions → because the FMEA used design-spec data rather than real-world runtime hours. The root cause was a gap between assumed and actual operating conditions — something the original FMEA did not capture.

    Closing the Loop: Feeding RCA Findings Back Into FMEA

    This is the step most teams skip — and it's the most important one. After completing the RCA, the findings should directly update the existing FMEA. In the pump example, the Occurrence score for the bearing failure mode should increase (because real-world data shows it fails more often than assumed), and the maintenance action should change (shorter lubrication interval based on actual runtime). This FMEA update also reduces the RPN going forward — because the corrective action improves detection and occurrence scores. The result is a living FMEA document that gets smarter with every failure your team investigates. According to Reliable Plant, organizations that systematically feed RCA findings back into their FMEA documents reduce repeat failures by up to 40%.

    A Real-World Example: Pump Failure in a Manufacturing Plant

    Here is a complete walkthrough of how FMEA and RCA work as a connected system in a real maintenance scenario.

    Step 1 — Initial FMEA (before operations begin): A reliability engineer runs a process FMEA on a cooling water pump. They identify 12 potential failure modes. The top-ranked one is mechanical seal failure (RPN = 168). They assign monthly visual inspections and a 6-month seal replacement PM.

    Step 2 — Failure occurs (Month 4): The mechanical seal fails ahead of schedule. Cooling water leaks, causing a 6-hour production shutdown costing $87,000 in lost output.

    Step 3 — RCA is triggered: The team uses a Fishbone diagram. Contributing causes identified: abnormal vibration levels not flagged in the previous inspection, a process fluid pH outside the seal's rated operating range, and an inspection checklist that did not include pH monitoring.

    Step 4 — FMEA is updated: The seal failure mode is revised. Occurrence score rises from 4 to 7 (real-world data shows higher frequency). Detection score drops from 6 to 3 (because the new checklist now includes pH and vibration thresholds). The updated RPN is 147 — still high, but now backed by accurate data and a stronger prevention action.

    Step 5 — System improves: Eighteen months later, the plant has had zero seal failures. The maintenance cost for that pump has dropped by 34% because the PM tasks now match actual risk — not generic manufacturer recommendations.

    FMEA + RCA in a CMMS: How Software Supports Both

    How CMMS supports FMEA and RCA: Failure Logging, Built-in RCA, PM Schedule Updates, BI Dashboard Trends | Cryotos

    Running FMEA and RCA manually — through spreadsheets and shared drives — creates version control problems, incomplete records, and no automatic link between a failure event and the FMEA it should update. A maintenance management software system closes those gaps by connecting failure data, work orders, and asset history in one place.

    With Cryotos CMMS, your team can log every failure event directly against the affected asset, with photo attachments, technician notes, and timestamps. The built-in 5 Whys RCA tool guides investigators through the root cause process and stores the findings in the asset's history. When the RCA is complete, maintenance planners can immediately update the asset's PM schedules — adjusting intervals, checklists, or task types — based on what the RCA uncovered.

    Cryotos's BI Dashboard tracks repeat failures by asset, failure mode, and location — giving reliability teams the trend data they need to know when an FMEA review is overdue. Teams using this approach have reported a 30% reduction in unplanned downtime and 25% faster repair times, because they are fixing the right things in the right sequence. You can also use the Root Cause Analysis Investigation Checklist to standardize how your team conducts investigations across shifts and departments.

    Frequently Asked Questions

    Does FMEA replace RCA?

    No. FMEA and RCA serve different purposes and different phases of the reliability cycle. FMEA is a proactive risk assessment done before failures occur. RCA is a reactive investigation done after a failure happens. You need both. FMEA helps you prevent the most critical failures; RCA helps you learn from the ones that still occur so your FMEA gets more accurate over time.

    Which comes first — FMEA or RCA?

    FMEA always comes first chronologically — it is conducted during asset design, commissioning, or maintenance planning, before any failure has occurred. RCA comes after a specific failure event. However, in practice, the two tools work in a cycle: FMEA → operations → failure → RCA → updated FMEA → improved operations. There is no fixed endpoint; the cycle repeats throughout the asset's life.

    Can FMEA and RCA be done at the same time?

    Not in the same session, but they can run in parallel on different failure modes. For example, your reliability team might be conducting an FMEA on a newly commissioned compressor while your maintenance team is simultaneously running an RCA on a separate pump failure. The outputs from the RCA might even inform how the FMEA team scores similar failure modes on the compressor. The key is to have a clear process for sharing findings between both activities.

    What tools are used in RCA?

    The most common RCA tools in maintenance are the 5 Whys, the Fishbone (Ishikawa) Diagram, Fault Tree Analysis, and FRACAS. The right tool depends on the complexity of the failure. Simple, linear failures are best handled with 5 Whys. Complex failures with multiple contributing factors need a Fishbone or FTA. FRACAS is best for organizations that want a formal, documented system for tracking all failures and corrective actions over time.

    How do you update an FMEA after an RCA?

    After completing an RCA, review the FMEA document for the affected asset. Find the failure mode that matches the failure you investigated. Update the Occurrence score based on real-world failure frequency data from the RCA. Update the Detection score if new inspection steps were added. Revise the recommended action column to reflect the corrective actions the RCA produced. Recalculate the RPN and re-rank all failure modes. Document the update date and the RCA reference number so future reviewers can trace the change.

    If your maintenance team is ready to connect FMEA and RCA into a single, data-driven reliability system, Cryotos CMMS gives you the tools to do it — from structured work order management and built-in RCA workflows to asset-level failure tracking and PM scheduling. Book a demo to see how leading maintenance teams use Cryotos to turn every failure into a smarter maintenance plan.

    Want to Try Cryotos CMMS Today?

    Get Free Demo

    Let AI Take Control of Your Maintenance

    Cryotos AI predicts failures, automates work orders, and simplifies maintenance—before problems slow you down.

    Try AI-Powered CMMS
    🡢