Reliability Engineering - Overview

Reliability may be defined in several ways:

  • The idea that an item is fit for a purpose with respect to time. In the most discrete and practical sense: "Items that do not fail in use are reliable" and "Items that do fail in use are not reliable".
  • The capacity of a designed, produced or maintained item to perform as required over time.
  • The capacity of a population of designed, produced or maintained items to perform as required over time.
  • The resistance to failure of an item over time.
  • The probability that an item will perform a required function under stated conditions for a specified period of time.
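
The last, probabilistic definition lends itself to a small numerical illustration. The sketch below assumes a constant failure rate (the exponential model), which is only one of several possible models and is not prescribed by the definitions above; the function name and the example figures are hypothetical.

```python
import math


def reliability_exponential(failure_rate_per_hour: float, hours: float) -> float:
    """Return R(t) = exp(-lambda * t), the probability of running for
    `hours` without failure, ASSUMING a constant failure rate."""
    if failure_rate_per_hour < 0 or hours < 0:
        raise ValueError("failure rate and time must be non-negative")
    return math.exp(-failure_rate_per_hour * hours)


# Hypothetical item: 1 failure per 10,000 operating hours, 1,000-hour period.
r = reliability_exponential(1e-4, 1000.0)
print(f"R(1000 h) = {r:.4f}")  # prints "R(1000 h) = 0.9048"
```

Under these assumed figures the item has roughly a 90% chance of performing its function for the whole period; a different failure model (for example one with wear-out) would give a different curve.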

Reliability engineering draws on many techniques, such as reliability hazard analysis, failure mode and effects analysis (FMEA), fault tree analysis (FTA), material stress and wear calculations, fatigue and creep analysis, finite element analysis, reliability prediction, thermal (stress) management, corrosion analysis, human error analysis, reliability testing, statistics, Monte Carlo simulation, design of experiments, reliability-centered maintenance (RCM), failure reporting and corrective action management. Because the techniques are numerous and costly, and because the degree of reliability required varies between situations, most projects develop a reliability program plan that specifies the reliability tasks to be performed for that specific system.

As with the safety cases created in safety engineering, the goal is to provide a robust set of qualitative and quantitative evidence that an item or system does not carry unacceptable risk. The basic steps are to:

  • First, thoroughly identify as many reliability hazards as possible (e.g. relevant system failure scenarios, item failure modes, and the underlying failure mechanisms and root causes) by specific analyses or tests.
  • Assess the risk associated with them by analysis and testing.
  • Propose mitigations by which the risks may be lowered and controlled to an acceptable level.
  • Select the best mitigations and get agreement on the final (accepted) risk levels, possibly based on cost-benefit analysis.
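
The last two steps can be sketched as a simple cost-benefit comparison. This is a minimal illustration, not a prescribed method: it assumes that risk can be expressed as probability times consequence cost, and every class name, field, and figure below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hazard:
    name: str
    probability: float  # assumed probability of the failure scenario in the period
    severity: float     # assumed consequence cost if the scenario occurs


@dataclass
class Mitigation:
    name: str
    cost: float                  # cost of implementing the mitigation
    residual_probability: float  # assumed failure probability once mitigated


def risk(probability: float, severity: float) -> float:
    """Simplifying assumption: risk is expected loss (probability x cost)."""
    return probability * severity


def best_mitigation(hazard: Hazard, options: list[Mitigation]):
    """Return (mitigation, net_benefit) for the option with the largest
    positive net benefit, or (None, 0.0) if no option pays for itself."""
    best, best_net = None, 0.0
    baseline = risk(hazard.probability, hazard.severity)
    for option in options:
        reduction = baseline - risk(option.residual_probability, hazard.severity)
        net = reduction - option.cost
        if net > best_net:
            best, best_net = option, net
    return best, best_net


# Hypothetical hazard and candidate mitigations.
hazard = Hazard("pump seal leak", probability=0.10, severity=50_000.0)
options = [
    Mitigation("upgraded seal", cost=1_000.0, residual_probability=0.02),
    Mitigation("redundant pump", cost=6_000.0, residual_probability=0.005),
]
choice, net = best_mitigation(hazard, options)
print(choice.name, round(net))
```

With these made-up numbers the cheaper mitigation wins, because the redundant pump reduces risk further but costs more than the risk it removes. In practice the accepted risk level is negotiated, so a purely monetary comparison like this is only one input to the decision.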

Risk is the combination of the probability and the severity of a failure incident (scenario) occurring. The severity of a failure includes the cost of spare parts, man-hours, logistics, damage (secondary failures) and machine downtime, which may cause production loss. What is acceptable is determined by the managing authority or the customers. Residual risk is the risk that remains after all reliability activities have finished; because it includes unidentified risk, it is not fully quantifiable.
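
One common way to combine probability and severity is a risk matrix with ordinal levels. The sketch below is illustrative only: the 1 to 5 scales, the product rule, the acceptance threshold, and the scenarios are all assumptions, since in practice the scales and the acceptable level are set by the managing authority or the customers.

```python
# Hypothetical threshold: indices above this are treated as unacceptable.
ACCEPTABLE_MAX = 6


def risk_index(probability_level: int, severity_level: int) -> int:
    """Combine two ordinal levels (1 = lowest, 5 = highest) into one index.
    The product rule is an assumption; other matrices use lookup tables."""
    for level in (probability_level, severity_level):
        if not 1 <= level <= 5:
            raise ValueError("levels must be between 1 and 5")
    return probability_level * severity_level


def is_acceptable(probability_level: int, severity_level: int) -> bool:
    return risk_index(probability_level, severity_level) <= ACCEPTABLE_MAX


# Hypothetical identified scenarios: (probability level, severity level).
scenarios = {
    "bearing wear": (4, 2),
    "control software hang": (2, 5),
    "cosmetic panel crack": (3, 1),
}

# Rank identified scenarios from highest to lowest risk index.
ranked = sorted(scenarios.items(), key=lambda kv: risk_index(*kv[1]), reverse=True)
for name, (p, s) in ranked:
    status = "acceptable" if is_acceptable(p, s) else "needs mitigation"
    print(f"{name}: index {risk_index(p, s)} ({status})")
```

A table like this covers only the scenarios that were identified; the unidentified part of the residual risk cannot appear in it, which is why the residual risk cannot be fully quantified.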
