Fault Management

In network management, fault management is the set of functions that detect, isolate, and correct malfunctions in a telecommunications network, compensate for environmental changes, and include maintaining and examining error logs, accepting and acting on error detection notifications, tracing and identifying faults, carrying out sequences of diagnostics tests, correcting faults, reporting error conditions, and localizing and tracing faults by examining and manipulating database information.

When a fault or event occurs, a network component will often send a notification to the network operator using a protocol such as SNMP. An alarm is a persistent indication of a fault that clears only when the triggering condition has been resolved. A current list of problems occurring on the network component is often kept in the form of an active alarm list such as is defined in RFC 3877,the Alarm MIB. A list of cleared faults is also maintained by most network management systems.

Fault management systems may use complex filtering systems to assign alarms to severity levels. These can range in severity from debug to emergency, as in the syslog protocol. Alternatively, they could use the ITU X.733 Alarm Reporting Function's perceived severity field. This takes on values of cleared, indeterminate, critical, major, minor or warning. Note that the latest version of the syslog protocol draft under development within the IETF includes a mapping between these two different sets of severities. It is considered good practice to send a notification not only when a problem has occurred, but also when it has been resolved. The latter notification would have a severity of clear.

A fault management console allows a network administrator or system operator to monitor events from multiple systems and perform actions based on this information. Ideally, a fault management system should be able to correctly identify events and automatically take action, either launching a program or script to take corrective action, or activating notification software that allows a human to take proper intervention (i.e. send e-mail or SMS text to a mobile phone). Some notification systems also have escalation rules that will notify a chain of individuals based on availability and severity of alarm.

Read more about Fault Management:  Types

Famous quotes containing the words fault and/or management:

    I often accuse my finest acquaintances of an immense frivolity; for, while there are manners and compliments we do not meet, we do not teach one another the lessons of honesty and sincerity that the brutes do, or of steadiness and solidity that the rocks do. The fault is commonly mutual; however, for we do not habitually demand any more of each other.
    Henry David Thoreau (1817–1862)

    The Management Area of Cherokee
    National Forest, interested in fish,
    Has mapped Tellico and Bald Rivers
    And North River, with the tributaries
    Brookshire Branch and Sugar Cove Creed:
    A fishy map for facile fishery....
    Allen Tate (1899–1979)