RAMS: Integrating System Safety

Introduction

RAMS is the acronym for Reliability, Availability, Maintainability, and Safety, which are important quality attributes of all technical systems.

Characteristics of RAMS

RAMS is characterised by both qualitative and quantitative indicators, depending on how much information is available about a system or subsystem.

Qualitative indicators are used when system safety tasks, such as assigning mishap risks to a hazard, are being carried out. They focus on analysing the relevant processes and the possible root causes that could result in an identified hazard. Once all probable root causes have been validated, they are prioritised, and the design solutions needed to minimise the occurrence of the hazard are identified.

Quantitative indicators are used when performing system RAM calculations and predictions. They concentrate on determining the Reliability, Availability and Maintainability of a system, based on metrics such as Mean Time Between Failures (MTBF), Mean Time to Repair (MTTR) and Maintenance Ratio (MR).
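As a minimal sketch of how such metrics combine, the snippet below computes steady-state inherent availability as MTBF / (MTBF + MTTR) and a maintenance ratio taken here as maintenance man-hours per operating hour. The function names and the input figures are hypothetical and chosen only for illustration; a real programme would use the definitions fixed in its own RAM specification.

```python
def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state inherent availability: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def maintenance_ratio(maintenance_man_hours: float, operating_hours: float) -> float:
    """Maintenance man-hours spent per operating hour (assumed definition)."""
    if operating_hours <= 0:
        raise ValueError("operating hours must be positive")
    return maintenance_man_hours / operating_hours


# Hypothetical figures, not taken from any real system.
availability = inherent_availability(mtbf_hours=500.0, mttr_hours=5.0)
ratio = maintenance_ratio(maintenance_man_hours=50.0, operating_hours=1000.0)
print(f"Inherent availability: {availability:.4f}")
print(f"Maintenance ratio: {ratio:.3f} man-hours per operating hour")
```

Operational availability would additionally account for logistics and administrative delays, which this sketch leaves out.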

The relation between system safety and system RAM

System safety performance shall be considered together with system RAM performance when design targets are specified. The link between the two is the Failure Mode, Effects and Criticality Analysis (FMECA), whose data are a key input to the analysis of reliability and maintenance requirements as well as to the hazard and risk analysis process.

As illustrated in Figure 1, the FMECA involves the systematic identification of potential failure modes for each component of a system or subsystem. Each failure mode is then evaluated for its safety implications, its potential to cause a mission abort, and its effects on the environment. Failures that expose people to an unacceptable level of risk, or that have an undesirable impact on system assets or the environment, should be eliminated or mitigated so that their occurrence is reduced to an acceptable level.

Typically, two types of failures will occur:
(A) System safety failure: If the functional failure adversely affects people’s health or causes monetary loss to the system, its effects are categorised as a safety hazard. Consequences of such failures include damage to property and equipment, injury to operators or other personnel in the vicinity and, in extreme cases, death.

To eliminate such failures, or at least keep their occurrence to a minimum, corrective actions should be taken with reference to safety standards and guidelines. In the local context, a safety standard commonly adopted by the Singapore Armed Forces is MIL-STD-882.

MIL-STD-882 sets out a systems engineering approach to eliminating hazards where possible and minimising risks where those hazards cannot be eliminated. The hazard and risk analysis process generally consists of three main stages (Figure 2):

  • Identify any possible hazards
  • Provide solutions or means to mitigate the hazards
  • Verify that the design solutions or means are effective
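The three stages above can be tracked in a simple hazard log. The sketch below is illustrative only: the `Hazard` record, the `Stage` names and the example entries are hypothetical and are not taken from MIL-STD-882, which defines its own documentation and risk assessment requirements.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Stage(Enum):
    IDENTIFIED = 1   # hazard recorded, no mitigation yet
    MITIGATED = 2    # at least one mitigation proposed
    VERIFIED = 3     # mitigation shown to be effective


@dataclass
class Hazard:
    description: str
    mitigations: List[str] = field(default_factory=list)
    verified: bool = False

    @property
    def stage(self) -> Stage:
        """Derive the stage from the recorded mitigations and verification."""
        if self.mitigations and self.verified:
            return Stage.VERIFIED
        if self.mitigations:
            return Stage.MITIGATED
        return Stage.IDENTIFIED


def open_hazards(log: List[Hazard]) -> List[Hazard]:
    """Return hazards that have not yet completed the verification stage."""
    return [h for h in log if h.stage is not Stage.VERIFIED]


# Hypothetical log entries for illustration.
log = [
    Hazard("Hydraulic line rupture near operator station"),
    Hazard("Battery overheating", ["Add thermal cut-off"], verified=True),
    Hazard("Pinch point at access hatch", ["Fit guard"]),
]
for hazard in open_hazards(log):
    print(f"{hazard.stage.name}: {hazard.description}")
```

Deriving the stage from the recorded data, rather than storing it separately, keeps a hazard from being marked verified without any mitigation on record.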

(B) Operational failure: A failure is categorised as an operational failure when it prevents the end system from completing a mission but has no adverse effect on safety. For many end systems, such a failure incurs repair or replacement costs and also results in loss of revenue. Therefore, to reduce the likelihood of such failures, the reliability and availability of the system must be calculated to ensure that the operational profile defined for the mission is fulfilled, with the results validated against data whenever available.

Corrective actions may be needed when the predicted reliability falls short of the required level of performance, or when the FMECA or hazard analysis reveals critical failure modes.
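A minimal sketch of the reliability check described above follows. It assumes a constant failure rate (exponential model), so that mission reliability is exp(-t / MTBF); this model, the function names and all figures are assumptions made for illustration, and a real assessment would use the failure distribution supported by the available data.

```python
import math


def mission_reliability(mtbf_hours: float, mission_hours: float) -> float:
    """Probability of completing the mission without failure,
    assuming a constant failure rate of 1 / MTBF."""
    if mtbf_hours <= 0 or mission_hours < 0:
        raise ValueError("MTBF must be positive and mission time non-negative")
    return math.exp(-mission_hours / mtbf_hours)


def needs_corrective_action(predicted: float, required: float) -> bool:
    """True when the predicted reliability falls short of the requirement."""
    return predicted < required


# Hypothetical values: 500 h MTBF, 24 h mission, 0.98 required reliability.
predicted = mission_reliability(mtbf_hours=500.0, mission_hours=24.0)
print(f"Predicted mission reliability: {predicted:.4f}")
print("Corrective action needed:", needs_corrective_action(predicted, 0.98))
```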

Potential sources of improvement include:

  • Replacing the item initially selected as part of the design with an item that has better reliability characteristics
  • Changing the design to reduce reliance on items with relatively low levels of reliability
  • Introducing redundancy in critical applications, where possible
  • Designing for “graceful degradation”, including diagnostic and self-test functionality as well as reducing proof test intervals
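To illustrate the effect of redundancy, the sketch below computes the reliability of n identical units in active parallel, 1 - (1 - r)^n, and the smallest number of units that meets a target. It assumes independent failures and that one working unit is enough for the function to succeed; the function names and figures are hypothetical.

```python
from typing import Optional


def parallel_reliability(unit_reliability: float, units: int) -> float:
    """Reliability of `units` identical, independent units in active
    parallel, where the function succeeds if at least one unit works."""
    if not 0.0 <= unit_reliability <= 1.0 or units < 1:
        raise ValueError("reliability must be in [0, 1] and units >= 1")
    return 1.0 - (1.0 - unit_reliability) ** units


def units_needed(unit_reliability: float, target: float,
                 max_units: int = 10) -> Optional[int]:
    """Smallest number of parallel units meeting the target, or None
    if the target is not reached within `max_units`."""
    for n in range(1, max_units + 1):
        if parallel_reliability(unit_reliability, n) >= target:
            return n
    return None


# Hypothetical example: each unit has reliability 0.90 over the mission.
for n in (1, 2, 3):
    print(n, "unit(s):", round(parallel_reliability(0.90, n), 4))
print("Units needed for 0.995:", units_needed(0.90, 0.995))
```

The gain from each added unit shrinks quickly, and common-cause failures (which this sketch ignores) limit it further, so redundancy is usually reserved for critical functions.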

Integrating system safety into RAM

  1. All subsystems are identified through an Indenture List, serving as a checklist to ensure no subsystems are accidentally omitted from the assessment.
  2. The Functional Hazard Analysis (FHA) is performed to identify any safety hazards arising from each functional failure mode of each subsystem. Concurrently, FMECA is performed on each subsystem to identify failure modes down to component level and to assess whether each mode poses a safety hazard, affects the mission, or even causes the mission to be aborted.
  3. Each failure mode identified through the FMECA that could lead to a safety hazard is extracted and included in the hazard analysis. In addition, Reliability Block Diagrams (RBD) can be developed using inputs from the FMECA to obtain reliability parameters. Other parameters, such as maintenance ratios and failure rates, shall be furnished by the respective OEMs for their subsystems. Together these form the data from which the system’s reliability and operational availability can be calculated.
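The steps above can be sketched in code. The example below filters FMECA records for safety-relevant failure modes to feed the hazard analysis, and evaluates a small Reliability Block Diagram from series and parallel blocks. The record fields, subsystem names and reliability values are hypothetical, and the RBD arithmetic assumes independent block failures.

```python
from dataclasses import dataclass
from math import prod
from typing import Iterable, List


@dataclass
class FailureMode:
    subsystem: str
    component: str
    mode: str
    safety_hazard: bool   # could this mode lead to a safety hazard?
    mission_abort: bool   # could this mode cause a mission abort?


def extract_for_hazard_analysis(fmeca: Iterable[FailureMode]) -> List[FailureMode]:
    """Select the failure modes that must be carried into the hazard analysis."""
    return [fm for fm in fmeca if fm.safety_hazard]


def series(reliabilities: Iterable[float]) -> float:
    """All blocks must work: product of block reliabilities."""
    return prod(reliabilities)


def parallel(reliabilities: Iterable[float]) -> float:
    """At least one block must work: 1 - product of block unreliabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical FMECA rows.
fmeca = [
    FailureMode("Power", "Battery", "Thermal runaway", True, True),
    FailureMode("Comms", "Radio", "Loss of signal", False, True),
    FailureMode("Drive", "Brake actuator", "Fails to engage", True, False),
]
for fm in extract_for_hazard_analysis(fmeca):
    print(f"Hazard analysis input: {fm.subsystem} / {fm.component} / {fm.mode}")

# Hypothetical RBD: power in series with two redundant radios and the drive.
system_reliability = series([0.99, parallel([0.95, 0.95]), 0.98])
print(f"System reliability: {system_reliability:.4f}")
```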

Conclusion

In conclusion, this approach allows each subsystem to be assessed, through RAM analysis, in terms of the failure costs arising from system downtime, equipment repair and personnel. Through system safety engineering, it also assesses the impact of a design failure on the lives of the personnel operating the system and on the system itself.