Incidents, Problems, Known Errors and Changes
Incident and Problem Management are valuable process domains in ITIL. This column explores how they're applied to real-world challenges.
By George Spafford
The relationship among these terms is worth discussing because it lets us trace how a service error is handled through the incident management process and where the opportunities for improvement lie. *
An incident is any event that is not part of the normal operation of a service and impacts, or threatens to impact, the quality of the service delivered. In response, IT opens an incident record to try to quickly restore the service to operating within the parameters of the service level agreement (SLA).
The response is grounded in the SLA because the SLA should state performance expectations from the customer's point of view, not just IT's. This reflects the need to support the business, not just push technology.
If the cause is readily apparent and can be corrected, then a work-around is developed or a request for change (RFC) created. Some corrections can be done without change -- such as resetting a device -- necessitating only a work-around.
On the other hand, if a change is required, it needs to be handled through the proper change management processes. Even though incident management's goal is the speedy restoration of service, it must not bypass change management; doing so causes production build configurations to drift from their established baselines.
If the cause of the error is not readily apparent or it is felt that an investigation is required, then a problem record should be opened. This new problem record is then independent of the incident because the incident management function is tasked with restoring service as quickly as possible.
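The routing decisions described above can be sketched as a small function. This is only an illustration of the logic in the text, not a prescribed ITIL implementation; the names `Incident`, `NextStep` and `triage`, and the three boolean fields, are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from enum import Enum


class NextStep(Enum):
    WORKAROUND = "apply work-around"
    RFC = "file request for change"
    PROBLEM_RECORD = "open problem record"


@dataclass
class Incident:
    # Hypothetical fields, invented for illustration only.
    summary: str
    cause_apparent: bool
    needs_investigation: bool = False
    requires_change: bool = False


def triage(incident: Incident) -> NextStep:
    """Route an incident according to the rules described in the text."""
    # Unclear cause, or a judgment that investigation is needed:
    # hand the analysis to problem management.
    if not incident.cause_apparent or incident.needs_investigation:
        return NextStep.PROBLEM_RECORD
    # A correction that alters production must go through change management.
    if incident.requires_change:
        return NextStep.RFC
    # Otherwise a work-around (for example, resetting a device) is enough.
    return NextStep.WORKAROUND
```

For instance, `triage(Incident("printer offline", cause_apparent=True))` returns `NextStep.WORKAROUND`. Opening a problem record does not pause the incident; in this sketch it only signals that the investigation is handed off while service restoration continues.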
In contrast, the problem management function is tasked with identifying the underlying causal factor, which may relate to multiple incidents. It may take several incidents to transpire before problem management has enough data to understand the root cause. Once problem management identifies the causal factor and develops a work-around, then the problem becomes a known error.
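One way to picture this relationship is a problem record that collects related incidents and is treated as a known error once both a causal factor and a work-around have been recorded. The sketch below assumes a hypothetical `ProblemRecord` class; its field names and the ticket identifiers are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProblemRecord:
    # Hypothetical structure; field names are assumptions for illustration.
    title: str
    incident_ids: List[str] = field(default_factory=list)
    root_cause: Optional[str] = None
    workaround: Optional[str] = None

    def link_incident(self, incident_id: str) -> None:
        """Attach another incident; one problem may underlie many incidents."""
        if incident_id not in self.incident_ids:
            self.incident_ids.append(incident_id)

    @property
    def is_known_error(self) -> bool:
        """True once both the causal factor and a work-around are recorded."""
        return self.root_cause is not None and self.workaround is not None


problem = ProblemRecord("Intermittent mail outage")
problem.link_incident("INC-1")
problem.link_incident("INC-2")
problem.root_cause = "Disk fills up during nightly backup"
problem.workaround = "Purge the backup staging area"
```

After the last two assignments, `problem.is_known_error` evaluates to `True`; with only the root cause recorded it would still be `False`.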
The fact that sometimes problem management cannot immediately identify the root cause and establish a corrective action puts the two groups at odds, as incident management wants a quick fix, or work-around. If the incident management team develops a work-around, then the problem management record should be updated with the information so the problem management team can leverage the additional data.
In reviewing the incident management team's work-around, problem management may elect to accept the work as the resolution because it addresses the root cause. If it does not, then problem management will dig deeper. If problem management develops a work-around that addresses the incident without solving the root cause, then the incident becomes a known error.
As mentioned above, if a change is needed, then an RFC must be filed and handled through change management. If problem management establishes the root cause and a resolution, they need to alert incident management so the known error tickets can benefit from the resolution and have their status shifted to closed once the corrective work is completed.
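The final hand-off, in which known error tickets are closed only after the corrective work is finished, might look like the following. This is a rough sketch; `Ticket`, `Status` and `close_resolved` are hypothetical names, and a real service desk tool would track far more state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Status(Enum):
    OPEN = "open"
    KNOWN_ERROR = "known error"
    CLOSED = "closed"


@dataclass
class Ticket:
    # Hypothetical ticket shape, for illustration only.
    ticket_id: str
    status: Status = Status.OPEN


def close_resolved(tickets: Iterable[Ticket], corrective_work_done: bool) -> List[str]:
    """Close known-error tickets, but only after the corrective work is complete."""
    closed: List[str] = []
    if not corrective_work_done:
        return closed
    for ticket in tickets:
        if ticket.status is Status.KNOWN_ERROR:
            ticket.status = Status.CLOSED
            closed.append(ticket.ticket_id)
    return closed
```

Tickets that are not flagged as known errors are left untouched, which mirrors the point that only incidents tied to the resolved problem benefit from the resolution.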
Opportunities for Improvement
The above outlines the relationships between incidents, problems, known errors, RFCs and, finally, resolutions. Building on the topics discussed above, there are several opportunities for process improvement:
Summary
Incident and Problem Management are valuable process domains in ITIL. As IT becomes more pervasive in mission-critical aspects of the business, that value will continue to grow. As organizations look to ITIL to improve their processes, they will need to understand the relationship between incidents, problems, known errors, requests for change and resolutions.
* This article focuses on the relationship of the terminology used to denote incidents, problems, known errors, requests for change and resolutions. For details on the processes, review the ITIL Service Support volume or go to the Incident and Problem Service Management Functions of Microsoft Operations Framework site or to the Reactive and Proactive portions of the BECTA site for Incident and Problem Management.