Risk Assessment and FMEA-Based Analysis

Risk Assessment is a structured evaluation of potential failure modes associated with GMP-relevant systems, equipment, utilities, and processes that may impact product quality, patient safety, data integrity, or regulatory compliance. Within a validation lifecycle, risk assessment is used to define validation scope, testing depth, and control strategy, ensuring that validation effort is proportionate to system criticality and intended use.

In this article, Risk Assessment refers primarily to an FMEA-based analysis. Supporting tools such as flow diagrams, checklists, or Fault Tree Analysis may be used to inform the assessment, but they do not replace FMEA as the primary risk evaluation method.

A risk assessment is recommended for all validation activities. Risk assessments previously performed for existing equipment may be applied to functionally equivalent new equipment when equivalency is justified and documented.

Failure Mode and Effects Analysis Process Flow

Purpose of Risk Assessment in Validation

The objective of risk assessment is to understand and manage risk, not to eliminate it entirely. By identifying credible failure modes and evaluating their potential impact, organizations can focus validation activities on what truly matters to product quality and patient safety.

Risk assessment supports:

Selection of full versus lean validation approaches
Identification of critical components, parameters, and controls
Definition of qualification and validation requirements
Integration of risk controls into lifecycle management

When Risk Analysis Is Required

A formal risk analysis shall be performed when:

A lean validation approach is planned
The quality or business impact of a new system or utility is unknown
Changes are introduced under change control and the impact to quality, compliance, or business continuity is unclear

In these cases, documented risk analysis provides the justification for validation strategy and downstream controls.
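The three triggers listed above can be sketched as a simple check. This is a minimal illustration only: the function name, parameter names, and the sample change request are hypothetical and not part of any defined procedure.

```python
from typing import List


def risk_analysis_triggers(
    lean_approach_planned: bool,
    new_system_impact_unknown: bool,
    change_impact_unclear: bool,
) -> List[str]:
    """Return the reasons a formal risk analysis is required (empty if none apply)."""
    triggers = []
    if lean_approach_planned:
        triggers.append("Lean validation approach is planned")
    if new_system_impact_unknown:
        triggers.append("Impact of new system or utility is unknown")
    if change_impact_unclear:
        triggers.append("Impact of change under change control is unclear")
    return triggers


# Hypothetical case: full validation planned, existing system,
# but the impact of a change under change control is unclear.
reasons = risk_analysis_triggers(False, False, True)
print("Formal risk analysis required:", bool(reasons))
for reason in reasons:
    print(" -", reason)
```

Returning the list of reasons, rather than a bare yes/no, keeps the justification available for the documented risk analysis.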

FMEA as the Primary Risk Analysis Method

Failure Mode and Effects Analysis (FMEA) is the primary method used to evaluate risk during validation. FMEA systematically identifies potential failure modes, their causes, and their effects on system performance and product quality. It relies on sound understanding of system design, intended use, operating conditions, and historical performance.

Once failure modes are identified, risk reduction measures may be applied to eliminate, reduce, contain, or control risk through engineering controls, procedural controls, monitoring, maintenance, training, or validation activities.

Risk Assessment Methodology

The Risk Assessment identifies, analyzes, and evaluates parameters that are critical to GMP compliance and system performance.

The assessment typically begins with system requirements derived from the User Requirements Specification (URS), when available. Each GMP-critical requirement is evaluated, and one or more risk scenarios are defined.

The Risk Assessment team is responsible for:

Identifying potential failure modes
Assigning Severity (S), Probability (P), and Detectability (D) scores
Applying the defined scoring model consistently
Documenting rationale and assumptions
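One way to support consistent scoring and documented rationale is to capture each failure mode in a small record that rejects out-of-range scores. The sketch below assumes the 1 to 5 scales defined in the Risk Scoring Model; the class name, field names, and sample entry are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureModeEntry:
    """One row of an FMEA worksheet (hypothetical structure)."""
    requirement_id: str   # reference to the URS requirement, when available
    failure_mode: str
    severity: int         # S, 1 to 5
    probability: int      # P, 1 to 5
    detectability: int    # D, 1 to 5
    rationale: str        # justification and assumptions behind the scores

    def __post_init__(self):
        # Enforce the scoring model so every entry uses the same scale.
        for name in ("severity", "probability", "detectability"):
            value = getattr(self, name)
            if not isinstance(value, int) or not 1 <= value <= 5:
                raise ValueError(f"{name} must be an integer from 1 to 5, got {value!r}")
        # An entry without a rationale is not a documented assessment.
        if not self.rationale.strip():
            raise ValueError("rationale must not be empty")


# Illustrative entry only; the identifier and scores are invented for this sketch.
entry = FailureModeEntry(
    requirement_id="URS-EXAMPLE-01",
    failure_mode="Temperature alarm does not trigger on excursion",
    severity=4,
    probability=2,
    detectability=3,
    rationale="Illustrative scores only; not taken from a real assessment.",
)
print(entry)
```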

Risk Scoring Model

Severity (S)

Score	Classification	Description
1	Negligible	Failure causes no impact on product quality, no interruption to manufacturing, and no compliance risk (e.g., non-critical display light failure).
2	Minor	Failure affects a non-GMP utility or secondary function. No direct product or data impact, minimal downtime (e.g., minor HVAC fluctuation outside manufacturing area).
3	Moderate	Failure impacts a non-critical step but could lead to deviation or rework. Possible short production delay (e.g., buffer tank temperature drift detected and corrected before batch impact).
4	Major	Failure affects a GMP-critical utility (WFI loop, clean steam) or key equipment control, likely to lead to batch rejection or process deviation requiring investigation.
5	Critical	Failure results in confirmed product contamination, sterility breach, or major data integrity issue. Could trigger regulatory action or product recall.

Severity rates the potential impact of the failure on product quality, patient safety, or regulatory compliance.

Probability of Occurrence (P)

Score	Classification	Description
1	Remote	Failure has never been observed; robust preventive maintenance (PM) and monitoring in place (e.g., validated UPS for control systems).
2	Unlikely	Failure could occur due to unusual conditions; historical data shows rare events (e.g., filter housing gasket failure once in 5 years).
3	Possible	Occasional failure modes documented; some known wear points (e.g., autoclave door seal replacement required once or twice a year).
4	Likely	Frequent operational issues; dependent on manual intervention or aging components (e.g., recurring chiller trips, known PLC faults).
5	Frequent	High likelihood of recurring failure unless mitigated (e.g., known history of valve sticking or flow meter sensor drift impacting batches).

Probability rates the likelihood that a specific failure mode will occur under normal operating conditions.

Detectability (D)

Score	Classification	Description
1	High Detectability	Automatic alarms, interlocks, or monitoring systems reliably catch failures (e.g., system temperature deviation alarms).
2	Good	Single automated or manual control that typically detects failure (e.g., in-process pH checks).
3	Moderate	Failure might go unnoticed until later QA checks or trending (e.g., pressure drop across filters reviewed post-run).
4	Low	Detection is only possible via operator observation or delayed test results (e.g., microbial excursions in WFI).
5	Undetectable	No reliable detection until after product release or significant impact (e.g., hidden PLC logic error with no alarms).

Detectability rates the likelihood that a failure or its effect will be detected before adverse impact; a higher score means the failure is harder to detect.

Risk Priority Number (RPN)

After assigning individual scores for Severity (S), Probability (P), and Detectability (D), the Risk Priority Number (RPN) is calculated as: RPN = S × P × D

The RPN is not an absolute measure of risk, but a prioritization tool used to focus resources on higher-priority failure modes.
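The calculation can be sketched in a few lines. The function name is hypothetical, and the range check assumes the 1 to 5 scales defined above. The last two calls illustrate why the RPN is a prioritization aid rather than an absolute measure: very different failure modes can produce the same number.

```python
def risk_priority_number(severity: int, probability: int, detectability: int) -> int:
    """Compute RPN = S x P x D for scores on the 1 to 5 scales."""
    scores = (
        ("severity", severity),
        ("probability", probability),
        ("detectability", detectability),
    )
    for name, score in scores:
        if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
            raise ValueError(f"{name} must be an integer from 1 to 5, got {score!r}")
    return severity * probability * detectability


print(risk_priority_number(1, 1, 1))  # lowest possible RPN: 1
print(risk_priority_number(5, 5, 5))  # highest possible RPN: 125
print(risk_priority_number(5, 1, 4))  # critical but remote failure: 20
print(risk_priority_number(1, 5, 4))  # negligible but frequent failure: 20
```

Because a Severity 5 and a Severity 1 failure mode can share an RPN, the individual scores should be reviewed alongside the product.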

RPN Priority Categories

Priority	RPN Range	Actions
Low Risk	1-19	Acceptable; monitor via PM and trending.
Medium Risk	20-39	Mitigation or additional control required before qualification.
High Risk	40-125	Mitigation mandatory; requires CAPA, design change, or enhanced validation approach.
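The priority bands above translate directly into a lookup. The sketch below is illustrative; the function name is hypothetical, and the thresholds are taken from the table.

```python
def rpn_priority(rpn: int) -> str:
    """Map an RPN (1 to 125) to the priority category from the table above."""
    if not 1 <= rpn <= 125:
        raise ValueError(f"RPN must be between 1 and 125, got {rpn!r}")
    if rpn <= 19:
        return "Low Risk"      # acceptable; monitor via PM and trending
    if rpn <= 39:
        return "Medium Risk"   # mitigation or additional control before qualification
    return "High Risk"         # mitigation mandatory


# Check the band edges, where misclassification is most likely.
for value in (1, 19, 20, 39, 40, 125):
    print(value, rpn_priority(value))
```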

Risk Classification and Mitigation

Failure modes with potential adverse impact on product quality or patient safety are classified as Critical or Direct Impact. Failure modes with no such impact are classified as Non-Critical or No Impact.

For all Critical or Direct Impact risks, mitigation measures shall be defined to reduce severity or probability, or to improve detectability. These measures must be necessary, appropriate, and integrated into qualification, validation, and operational controls.

Documentation

All identified failure modes, risk scores, mitigation measures, and supporting justifications shall be documented as part of the formal Risk Assessment record. Risk Assessments are subject to review and approval by the System Owner, appropriate Subject Matter Experts, and Quality Assurance to ensure accuracy, consistency, and regulatory compliance.

Risk assessments previously executed for existing equipment or systems may be applied to functionally equivalent systems, provided that equivalency is appropriately evaluated, justified, and documented.

The table below is provided as an example to demonstrate how Failure Mode and Effects Analysis (FMEA) outputs may be documented and evaluated within a validation risk assessment. The example demonstrates the application of Severity, Probability, and Detectability scoring, the calculation of the Risk Priority Number (RPN), and the reassessment of residual risk following the implementation of mitigation measures.

This example does not represent a complete or prescriptive risk assessment and does not establish acceptance criteria for any specific system or process. Actual risk assessments shall be performed based on system-specific requirements, intended use, operating conditions, historical performance, and Quality oversight.
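The reassessment of residual risk can also be sketched numerically. Everything in the block below is hypothetical: the failure mode, the scores before and after mitigation, and the helper names are invented for illustration and do not come from any actual assessment. The priority bands are those given under RPN Priority Categories.

```python
def rpn(severity: int, probability: int, detectability: int) -> int:
    """RPN = S x P x D."""
    return severity * probability * detectability


def priority(value: int) -> str:
    """Priority bands: 1-19 Low, 20-39 Medium, 40-125 High."""
    if value <= 19:
        return "Low Risk"
    if value <= 39:
        return "Medium Risk"
    return "High Risk"


# Hypothetical failure mode with invented scores.
failure_mode = "Sensor drift not detected during operation"
initial = {"S": 4, "P": 3, "D": 4}

# Hypothetical mitigation: added alarm and calibration checks.
# Severity is left unchanged; probability and detectability improve.
residual = {"S": 4, "P": 2, "D": 2}

initial_rpn = rpn(initial["S"], initial["P"], initial["D"])
residual_rpn = rpn(residual["S"], residual["P"], residual["D"])

print(failure_mode)
print("Initial RPN: ", initial_rpn, priority(initial_rpn))     # 48, High Risk
print("Residual RPN:", residual_rpn, priority(residual_rpn))   # 16, Low Risk
```

In this sketch the mitigation changes only Probability and Detectability, so the Severity score of 4 still warrants attention even though the residual RPN falls in the low band.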