Problem Management - Problem Investigation and Diagnosis

Problem Investigation and Diagnosis

The result of a problem investigation should be a diagnosis of the root cause, documented in a root cause analysis (RCA) report. The resources and skills applied to finding the resolution should be appropriate to the problem. A number of problem-solving techniques can help to diagnose and resolve problems.

  • The Configuration Management System (CMS) must be used to help determine the level of impact and to pinpoint the point of failure.
  • The Known Error Database (KEDB) should be checked to find out whether the problem has occurred before; if it has, a resolution should already be in place.
  • Chronological analysis: the events that triggered the problem are placed in chronological order to build a timeline. The purpose is to see which event triggered the next one, and so on, or to rule out some possible causes.
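Chronological analysis amounts to sorting event records by time and looking at the gaps between them. The following Python sketch assumes each event is a (timestamp, description) pair; the sample events and the `build_timeline` helper are hypothetical, not part of any ITIL tool.

```python
from datetime import datetime

# Hypothetical event records: (timestamp, description). In practice these
# would come from monitoring tools, logs or incident records.
events = [
    (datetime(2024, 5, 1, 9, 14), "Users report slow file access"),
    (datetime(2024, 5, 1, 9, 2), "Disk latency alarm on file server"),
    (datetime(2024, 5, 1, 9, 20), "File server stops responding"),
]


def build_timeline(events):
    """Sort events by time and pair each one with the minutes since the previous event."""
    ordered = sorted(events, key=lambda e: e[0])
    timeline = []
    previous = None
    for when, what in ordered:
        gap = (when - previous).total_seconds() / 60 if previous is not None else 0.0
        timeline.append((when, what, gap))
        previous = when
    return timeline


for when, what, gap in build_timeline(events):
    print(f"{when:%H:%M}  (+{gap:.0f} min)  {what}")
```

Reading the output top to bottom shows the disk latency alarm preceding the user reports, which is the kind of ordering that suggests which event triggered the next.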

Pain Value Analysis takes a broader view of the impact of an incident or problem on the business. Rather than analysing the number of incidents or problems of a particular type in a given time interval, the technique focuses on an in-depth analysis of how much pain these incidents or problems have caused the business. A formula to calculate the level of pain should take into account:

  • the number of people affected
  • the duration of the downtime caused
  • the cost to the business
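The text does not prescribe a formula, so the following Python sketch assumes one purely for illustration: lost person-hours (people affected × downtime) priced at a flat, hypothetical rate, plus any direct cost to the business. The `Incident` type, the `COST_PER_PERSON_HOUR` value and the sample data are invented. Summing the pain per problem shows how a problem with fewer incidents can still rank as more painful.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    problem: str           # problem record the incident is linked to
    people_affected: int
    downtime_hours: float
    direct_cost: float     # e.g. lost sales or penalties, in currency units


# Assumed cost of one person-hour of lost work; an illustrative value only.
COST_PER_PERSON_HOUR = 50.0


def pain_value(incident: Incident) -> float:
    """Hypothetical pain formula: lost person-hours priced at a flat rate, plus direct cost."""
    lost_work = incident.people_affected * incident.downtime_hours * COST_PER_PERSON_HOUR
    return lost_work + incident.direct_cost


def rank_problems(incidents):
    """Total the pain per problem and sort problems from most to least painful."""
    totals = {}
    for inc in incidents:
        totals[inc.problem] = totals.get(inc.problem, 0.0) + pain_value(inc)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


incidents = [
    Incident("Printer driver fault", 1, 0.5, 0.0),
    Incident("Printer driver fault", 2, 0.5, 0.0),
    Incident("Printer driver fault", 1, 1.0, 0.0),
    Incident("Payment gateway outage", 200, 2.0, 10_000.0),
]
print(rank_problems(incidents))
```

Here the printer fault has three incidents against one for the payment gateway, yet the gateway outage scores far higher (30,000 against 125), which is the contrast with simply counting incidents.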

The Kepner-Tregoe method is used to investigate deeper-rooted problems. Kepner and Tregoe defined the following stages:

  • defining the problem
  • describing the problem in terms of identity, location, time (duration) and size (impact)
  • establishing possible causes
  • testing the most probable cause
  • verifying the true cause
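The "testing the most probable cause" stage can be sketched as checking each candidate against the problem description. This Python sketch assumes the common Kepner-Tregoe practice of splitting each dimension into what the problem IS and what it IS NOT, and treats a cause as plausible only if it accounts for every dimension. The sample data and helper names are hypothetical, and verifying the true cause (for example by reproducing the fault in a test environment) still has to happen outside the code.

```python
from dataclasses import dataclass


@dataclass
class Dimension:
    """One dimension of the problem description: what the problem IS and IS NOT."""
    name: str
    is_: str
    is_not: str


@dataclass
class PossibleCause:
    name: str
    # Names of the dimensions this cause accounts for (both the IS and the IS NOT).
    explains: frozenset


def screen_causes(description, causes):
    """Keep only the causes that account for every dimension; the rest are ruled out."""
    required = {d.name for d in description}
    return [c for c in causes if required <= c.explains]


# Hypothetical problem description using the four dimensions named above.
description = [
    Dimension("identity", "nightly backup job fails", "other scheduled jobs"),
    Dimension("location", "file server FS01", "file server FS02"),
    Dimension("time", "since the 2 May patch", "before 2 May"),
    Dimension("size", "every night, all shares", "only some shares"),
]
causes = [
    PossibleCause("Backup agent incompatible with 2 May patch",
                  frozenset({"identity", "location", "time", "size"})),
    PossibleCause("Network congestion", frozenset({"time"})),
    PossibleCause("Backup target disk full", frozenset({"identity", "size"})),
]
survivors = screen_causes(description, causes)
print([c.name for c in survivors])
```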

Pareto Analysis is a technique for separating important potential causes from trivial issues. The following steps should be taken:

  1. Form a table listing the causes and their frequency as a percentage
  2. Arrange the rows in the decreasing order of importance of the causes (the most important cause first)
  3. Add a cumulative percentage column to the table
  4. Create a bar chart of the causes, in order of their percentage of the total
  5. Draw a horizontal line at 80% on the Y-axis across to the cumulative percentage curve, then drop a vertical line from the point of intersection down to the X-axis. The causes to the left of that line are the primary causes of the network failures in the example below, and these should be targeted first.
Network failures
Cause                 Percentage of total   Computation   Cumulative %
Network Controller    35%                   0% + 35%      35%
File corruption       26%                   35% + 26%     61%
Server OS             6%                    61% + 6%      67%
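Steps 1 to 3 and the cut-off in step 5 can be sketched in Python using the percentages from the network failures table. The helper names are hypothetical, and the chart in step 4 is left out to keep the example dependency-free. Because the table lists only three causes, which together reach 67%, every listed cause falls short of the 80% line here; a complete table would be needed to see where the cut-off actually falls.

```python
def pareto_table(percentages):
    """Steps 1-3: sort causes by share (largest first) and add a cumulative column."""
    rows = []
    cumulative = 0
    for cause, pct in sorted(percentages.items(), key=lambda item: item[1], reverse=True):
        cumulative += pct
        rows.append((cause, pct, cumulative))
    return rows


def vital_few(rows, threshold=80):
    """Step 5: take causes from the top until the cumulative share reaches the threshold."""
    selected = []
    for cause, _pct, cumulative in rows:
        selected.append(cause)
        if cumulative >= threshold:
            break
    return selected


# Percentages from the "Network failures" table (only the rows it lists).
network_failures = {"Server OS": 6, "Network Controller": 35, "File corruption": 26}
rows = pareto_table(network_failures)
for cause, pct, cumulative in rows:
    print(f"{cause:<20}{pct:>4}%{cumulative:>6}%")
print(vital_few(rows))
```

Lowering the threshold shows the cut-off at work: with `threshold=60`, only Network Controller and File corruption are selected.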
