Common Cause and Special Cause (statistics) - in Engineering

Common mode failure, or common cause failure, has a more specific meaning in engineering. It refers to events that are not statistically independent: failures in multiple parts of a system that are caused by a single fault, particularly random failures due to environmental conditions or aging. An example is a fire sprinkler system whose pumps are all located in one room. If the room becomes too hot for the pumps to operate, they will all fail at essentially the same time, from one cause (the heat in the room).

Another example is an electronic system wherein a fault in a power supply injects noise onto a supply line, causing failures in multiple subsystems.

This is particularly important in safety-critical systems that use multiple redundant channels. If the probability of failure in one subsystem is p, an N-channel system would be expected to have a probability of failure of p^N. In practice, however, the probability of failure is much higher because the channel failures are not statistically independent; for example, ionizing radiation or electromagnetic interference (EMI) may affect all channels at once.
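The gap between the ideal p^N figure and real behavior can be illustrated with a simple shock model. The sketch below is not taken from the source: it assumes a hypothetical parameter c, the probability of a single common-cause event (such as an EMI burst) that disables every channel at once, and otherwise treats the channels as independent.

```python
def system_failure_probability(p: float, n: int, c: float = 0.0) -> float:
    """Probability that all n redundant channels fail.

    p: probability that one channel fails on its own (independent cause)
    n: number of redundant channels
    c: ASSUMED probability of a common-cause event that takes out every
       channel at once (illustrative parameter, not from the source text)
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= c <= 1.0):
        raise ValueError("p and c must be probabilities in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    # Either the common-cause event occurs, or it does not and all n
    # channels happen to fail independently.
    return c + (1.0 - c) * p ** n


if __name__ == "__main__":
    independent = system_failure_probability(1e-3, 2)
    with_common_cause = system_failure_probability(1e-3, 2, c=1e-4)
    print(f"independent only:  {independent:.3e}")
    print(f"with common cause: {with_common_cause:.3e}")
```

Under these assumed numbers, a common-cause probability of only one in ten thousand dominates the result: the two-channel system is roughly a hundred times more likely to fail than the independent estimate suggests.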

The principle of redundancy states that, when the failures of individual components are statistically independent, the probabilities of their joint occurrence multiply. For instance, if the probability that a component of a system fails is one in one thousand per year, the probability that two such components both fail is one in one million per year, provided that the two events are statistically independent. This principle favors the strategy of using redundant components. One place this strategy is implemented is RAID 1, where two hard disks store a computer's data redundantly.
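Written out as a worked equation, the arithmetic in the paragraph above is as follows. The event labels A and B (failure of the first and second component in a given year) are introduced here for illustration only.

```latex
P(A \cap B) = P(A)\,P(B) = 10^{-3} \times 10^{-3} = 10^{-6}
```

The first equality holds only under statistical independence; the rest of this section describes how that assumption can break down.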

Even so, many common modes can remain. Consider a RAID 1 array in which two disks are purchased online and installed in the same computer:

  • The disks are likely to be from the same manufacturer and of the same model, therefore they share the same design flaws.
  • The disks are likely to have similar serial numbers, thus they may share any manufacturing flaws affecting production of the same batch.
  • The disks are likely to have been shipped at the same time, thus they are likely to have suffered from the same transportation damage.
  • As installed, both disks are attached to the same power supply, making them vulnerable to the same power supply issues.
  • As installed, both disks are in the same case, making them vulnerable to the same overheating events.
  • Both disks will be attached to the same card or motherboard and driven by the same software, which may have the same bugs.
  • Because of the very nature of RAID 1, both disks will be subjected to the same workload and very similar access patterns, stressing them in the same way.

At the other extreme, if the failures of two components are maximally statistically dependent, the probability that both fail is equal to the probability that either one fails individually. In such a case, the advantages of redundancy are negated. Strategies for avoiding common mode failures include keeping redundant components physically isolated.
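The two extremes, independent and maximally dependent failures, can be connected with a single parameter. The sketch below is an illustration rather than a method from the source: it assumes two components with the same failure probability p and uses a hypothetical correlation coefficient rho between their failure indicators, restricted to the range 0 to 1.

```python
def joint_failure_probability(p: float, rho: float) -> float:
    """P(both components fail) for two components with equal failure
    probability p and ASSUMED correlation rho between their failure
    indicators (0 = independent, 1 = maximally dependent).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this sketch only covers rho in [0, 1]")
    # For Bernoulli indicators X and Y with mean p:
    #   Cov(X, Y) = rho * p * (1 - p)  and  E[XY] = Cov(X, Y) + p**2
    return p * p + rho * p * (1.0 - p)


if __name__ == "__main__":
    p = 1e-3
    for rho in (0.0, 0.01, 0.1, 1.0):
        print(f"rho={rho:<5} P(both fail)={joint_failure_probability(p, rho):.3e}")
```

With rho = 0 the function returns p squared, the independent case; with rho = 1 it returns p, so the second component adds no protection. Even a small positive correlation moves the result far from the independent figure when p is small.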

A prime example of redundancy with isolation is a nuclear power plant. The new Advanced Boiling Water Reactor (ABWR) has three divisions of Emergency Core Cooling Systems, each with its own generators and pumps and each isolated from the others. The new European Pressurized Reactor has two containment buildings, one inside the other. However, even here a common mode failure is not impossible (for example, one caused by a highly unlikely magnitude 9 earthquake).
