Failure Transparency

In a distributed system, failure transparency refers to the extent to which errors and subsequent recoveries of hosts and services within the system are invisible to users and applications. For example, if a server fails, but users are automatically redirected to another server and never notice the failure, the system is said to exhibit high failure transparency.
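The redirection described above can be illustrated with a minimal sketch. It assumes servers are modeled as plain Python callables and that a failed server raises `ConnectionError`. The names `call_with_failover`, `AllReplicasFailed`, `broken_server`, and `healthy_server` are hypothetical and exist only for this illustration.

```python
from typing import Callable, Sequence


class AllReplicasFailed(Exception):
    """Raised only when no replica could serve the request."""


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each replica in order and return the first successful reply."""
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError:
            # Mask the failure: silently move on to the next replica.
            continue
    # Transparency breaks down only when every replica has failed.
    raise AllReplicasFailed(request)


def broken_server(request: str) -> str:
    # Hypothetical stand-in for a server that is down.
    raise ConnectionError("server is down")


def healthy_server(request: str) -> str:
    # Hypothetical stand-in for a working replica.
    return f"reply to {request}"


# The caller receives a normal reply and never sees the first server's failure.
result = call_with_failover([broken_server, healthy_server], "GET /index")
```

In this sketch the failure stays invisible as long as at least one replica answers. Once all of them fail, the error has to surface to the caller.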

Failure transparency is one of the hardest types of transparency to achieve, because it is often impossible to tell whether a server has actually failed or is simply responding very slowly. Full failure transparency is also generally unattainable in a distributed system, because the network itself is unreliable: messages can be lost or delayed, so some failures cannot be hidden from users.
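The ambiguity between a failed server and a slow one can be shown with a small sketch of a timeout-based check. It assumes the client's only evidence is whether a reply arrives within a fixed timeout. The function `observe` and the delay values are hypothetical, chosen for illustration.

```python
import math


def observe(reply_delay_s: float, timeout_s: float) -> str:
    """Return what a client can conclude after waiting at most timeout_s seconds."""
    return "alive" if reply_delay_s <= timeout_s else "suspected"


CRASHED = math.inf  # a crashed server never replies
SLOW = 5.0          # a slow server replies, but only after the timeout expires
FAST = 0.1

timeout = 2.0
observations = {
    "crashed": observe(CRASHED, timeout),
    "slow": observe(SLOW, timeout),
    "fast": observe(FAST, timeout),
}
# The crashed and the slow server produce the same observation, so the
# client cannot tell them apart from the timeout alone.
```

Raising the timeout lowers the chance of wrongly suspecting a slow server, but it also delays the detection of a real crash.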

There is also usually a trade-off between achieving a high level of failure transparency and maintaining an adequate level of system performance. For example, if a distributed system attempts to mask a transient server failure by having the client retry the failed server several times, each retry adds delay and the performance of the system may suffer. If the server does not recover, it would have been preferable to give up earlier and try another server.
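The cost of retrying can be made concrete with a rough worked example. It assumes the primary server stays down, every attempt against it waits for the full timeout, and the backup server then answers normally. The function name and all numbers are hypothetical.

```python
def latency_when_primary_is_down(timeout_ms: int, attempts: int, backup_latency_ms: int) -> int:
    """Time until the client gets a reply if it tries the failed primary
    `attempts` times before failing over to a working backup."""
    return attempts * timeout_ms + backup_latency_ms


# Persistent retrying: three timed-out attempts before failing over.
patient = latency_when_primary_is_down(timeout_ms=2000, attempts=3, backup_latency_ms=100)

# Giving up early: a single timed-out attempt before failing over.
impatient = latency_when_primary_is_down(timeout_ms=2000, attempts=1, backup_latency_ms=100)

extra_delay_ms = patient - impatient  # the price paid for the two extra retries
```

Under these assumptions, each additional retry against a server that is really down costs one full timeout. If the failure were transient, a retry could succeed and save the failover altogether, which is why the choice is a trade-off and not a fixed rule.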
