
FMEA (Failure Mode and Effects Analysis) and RCA (Root Cause Analysis) are two of the most powerful tools in a maintenance team's reliability toolkit — and they're more closely connected than most people realize. FMEA is a proactive method that predicts failures before they happen, while Root Cause Analysis is a reactive investigation that digs into why a failure already occurred. Together, they form a continuous improvement loop: FMEA helps you prevent failures, and RCA helps you learn from the ones that slip through. According to a Plant Engineering report, unplanned downtime costs industrial manufacturers an average of $250,000 per hour. Using both FMEA and RCA together is one of the most effective ways to reduce that number. This guide explains what each method does, how they differ, and how to use them as a connected system.

Failure Mode and Effects Analysis (FMEA) is a structured, proactive risk assessment method. Maintenance and reliability engineers use it to identify every possible way an asset or process could fail, assess the consequences of each failure, and prioritize corrective actions before anything breaks down. The goal is to catch high-risk failure modes on paper — before they become unplanned downtime events on the plant floor.
FMEA was first developed by the U.S. military in the 1940s and has since become a standard practice in industries ranging from automotive to aerospace to manufacturing. The ISO 31000 risk management framework and the SAE J1739 standard both provide formal guidance on how to conduct FMEA correctly.
The Risk Priority Number is the numerical backbone of FMEA. It combines three scores — each rated 1 to 10 — to produce a priority ranking for every failure mode on your list.
An RPN above 100 is typically flagged as high-priority. Teams then focus their preventive efforts on the top-ranked failure modes first, making the best use of limited maintenance resources.
Root Cause Analysis is a structured investigation method used after a failure, incident, or near-miss. The goal is not just to fix the symptom — it's to find and eliminate the underlying cause so the same problem doesn't come back. Done properly, RCA turns every failure into a learning event that makes the system more reliable over time.
Where FMEA asks "what could go wrong?", RCA asks "what actually went wrong, and why?" The two questions are complementary, not competing.
Not every failure warrants a full RCA. Reserve formal investigations for events that meet one or more of these criteria: the failure caused or nearly caused a safety incident, the downtime cost exceeded a defined threshold (e.g., more than four hours of lost production), the same failure has recurred two or more times in a 90-day window, or the failure damaged a critical asset with long lead times for replacement parts. For minor failures, a quick 5 Whys is usually enough. For major or repeat failures, a structured method like Fishbone or FTA gives better results.
Understanding the distinction between these two methods helps you know which one to reach for at any given point in the maintenance cycle. The key difference is timing: FMEA works before a failure; RCA works after one.
| Criterion | FMEA | RCA |
|---|---|---|
| Purpose | predict and prevent failures before they occur | investigate and eliminate the root cause of a failure that already happened |
| Timing | proactive, conducted during design, commissioning, or PM planning | reactive, triggered after a failure, incident, or near-miss |
| Input | engineering knowledge, historical data, and risk judgment | failure evidence, maintenance logs, sensor data, and witness accounts |
| Output | ranked list of failure modes with prevention tasks | documented root cause with corrective and preventive actions |
| Frequency | periodic review (often annually or after major changes) | event-driven, triggered by each qualifying failure |
| Team involved | reliability engineers, design engineers, operations | maintenance technicians, engineers, supervisors, and safety team |
Neither tool is superior to the other. They serve different phases of the reliability cycle. A maintenance program that uses only FMEA has a prevention strategy but no learning mechanism. One that uses only RCA is always reactive with no structured way to anticipate risk.

The real power of these two methods comes from using them as a connected system — not as isolated exercises. Here is how that connection works in practice.
During asset commissioning or annual maintenance planning, your team runs a full FMEA on critical equipment. You identify a bearing failure mode on a centrifugal pump, score it a high RPN of 140 (Severity 7 × Occurrence 5 × Detection 4), and assign a quarterly vibration analysis check plus a biannual bearing replacement. The FMEA has just created a structured preventive maintenance plan based on risk — not guesswork.
Despite the PM schedule, the pump's bearing fails unexpectedly. Your team triggers an RCA. Using the 5 Whys method, they trace the failure chain: the bearing failed → because it overheated → because the lubrication interval was too long for the actual operating conditions → because the FMEA used design-spec data rather than real-world runtime hours. The root cause was a gap between assumed and actual operating conditions — something the original FMEA did not capture.
This is the step most teams skip — and it's the most important one. After completing the RCA, the findings should directly update the existing FMEA. In the pump example, the Occurrence score for the bearing failure mode should increase (because real-world data shows it fails more often than assumed), and the maintenance action should change (shorter lubrication interval based on actual runtime). This FMEA update also reduces the RPN going forward — because the corrective action improves detection and occurrence scores. The result is a living FMEA document that gets smarter with every failure your team investigates. According to Reliable Plant, organizations that systematically feed RCA findings back into their FMEA documents reduce repeat failures by up to 40%.
Here is a complete walkthrough of how FMEA and RCA work as a connected system in a real maintenance scenario.
Step 1 — Initial FMEA (before operations begin): A reliability engineer runs a process FMEA on a cooling water pump. They identify 12 potential failure modes. The top-ranked one is mechanical seal failure (RPN = 168). They assign monthly visual inspections and a 6-month seal replacement PM.
Step 2 — Failure occurs (Month 4): The mechanical seal fails ahead of schedule. Cooling water leaks, causing a 6-hour production shutdown costing $87,000 in lost output.
Step 3 — RCA is triggered: The team uses a Fishbone diagram. Contributing causes identified: abnormal vibration levels not flagged in the previous inspection, a process fluid pH outside the seal's rated operating range, and an inspection checklist that did not include pH monitoring.
Step 4 — FMEA is updated: The seal failure mode is revised. Occurrence score rises from 4 to 7 (real-world data shows higher frequency). Detection score drops from 6 to 3 (because the new checklist now includes pH and vibration thresholds). The updated RPN is 147 — still high, but now backed by accurate data and a stronger prevention action.
Step 5 — System improves: Eighteen months later, the plant has had zero seal failures. The maintenance cost for that pump has dropped by 34% because the PM tasks now match actual risk — not generic manufacturer recommendations.

Running FMEA and RCA manually — through spreadsheets and shared drives — creates version control problems, incomplete records, and no automatic link between a failure event and the FMEA it should update. A maintenance management software system closes those gaps by connecting failure data, work orders, and asset history in one place.
With Cryotos CMMS, your team can log every failure event directly against the affected asset, with photo attachments, technician notes, and timestamps. The built-in 5 Whys RCA tool guides investigators through the root cause process and stores the findings in the asset's history. When the RCA is complete, maintenance planners can immediately update the asset's PM schedules — adjusting intervals, checklists, or task types — based on what the RCA uncovered.
Cryotos's BI Dashboard tracks repeat failures by asset, failure mode, and location — giving reliability teams the trend data they need to know when an FMEA review is overdue. Teams using this approach have reported a 30% reduction in unplanned downtime and 25% faster repair times, because they are fixing the right things in the right sequence. You can also use the Root Cause Analysis Investigation Checklist to standardize how your team conducts investigations across shifts and departments.
No. FMEA and RCA serve different purposes and different phases of the reliability cycle. FMEA is a proactive risk assessment done before failures occur. RCA is a reactive investigation done after a failure happens. You need both. FMEA helps you prevent the most critical failures; RCA helps you learn from the ones that still occur so your FMEA gets more accurate over time.
FMEA always comes first chronologically — it is conducted during asset design, commissioning, or maintenance planning, before any failure has occurred. RCA comes after a specific failure event. However, in practice, the two tools work in a cycle: FMEA → operations → failure → RCA → updated FMEA → improved operations. There is no fixed endpoint; the cycle repeats throughout the asset's life.
Not in the same session, but they can run in parallel on different failure modes. For example, your reliability team might be conducting an FMEA on a newly commissioned compressor while your maintenance team is simultaneously running an RCA on a separate pump failure. The outputs from the RCA might even inform how the FMEA team scores similar failure modes on the compressor. The key is to have a clear process for sharing findings between both activities.
The most common RCA tools in maintenance are the 5 Whys, the Fishbone (Ishikawa) Diagram, Fault Tree Analysis, and FRACAS. The right tool depends on the complexity of the failure. Simple, linear failures are best handled with 5 Whys. Complex failures with multiple contributing factors need a Fishbone or FTA. FRACAS is best for organizations that want a formal, documented system for tracking all failures and corrective actions over time.
After completing an RCA, review the FMEA document for the affected asset. Find the failure mode that matches the failure you investigated. Update the Occurrence score based on real-world failure frequency data from the RCA. Update the Detection score if new inspection steps were added. Revise the recommended action column to reflect the corrective actions the RCA produced. Recalculate the RPN and re-rank all failure modes. Document the update date and the RCA reference number so future reviewers can trace the change.
If your maintenance team is ready to connect FMEA and RCA into a single, data-driven reliability system, Cryotos CMMS gives you the tools to do it — from structured work order management and built-in RCA workflows to asset-level failure tracking and PM scheduling. Book a demo to see how leading maintenance teams use Cryotos to turn every failure into a smarter maintenance plan.
Cryotos AI predicts failures, automates work orders, and simplifies maintenance—before problems slow you down.

