This post is part two, and the concluding part, of a series on the Major Incident Review process and how to put one together. In part one, we discussed the elements and considerations that should go into the process design, and elaborated on those considerations with a sample process flow. In this post, we describe the process activities further, along with a reporting template you can use to implement the process.
The process design document provides a detailed description of the fields within the report template, so I will not repeat it here. I think there are two factors to keep in mind when undertaking such a process. First, don’t do the process just for the sake of doing it. Do it because your organization genuinely wants to improve service by eliminating as many of these incidents as it can over the long term. If the organization chooses not to implement a certain solution for whatever reason (cost, technical complexity, longevity of the technology, regulatory or compliance constraints, or something else), at least document the discussion. That way, there is a record showing that the organization understood the risks and chose to accept them.
Second, perform meaningful measurements and, again, use the statistics to improve service. For example, if the majority of the incidents are reported by end users, perhaps that is a clue that we should be more proactive and beef up the automated monitoring. If a particular technology area has been experiencing more major incidents than other areas, perhaps we should figure out what ills are plaguing that area and fix what is broken. If a particular business unit or segment has been experiencing more major incidents than other segments, perhaps we owe it to that business community to figure out what we can do to make things better. The business impact information we capture will enhance our understanding of the incidents and help us formulate solutions that make sense for the business.
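To make those measurements concrete, here is a minimal sketch in Python of how the three breakdowns mentioned above (who reported the incident, which technology area, which business unit) might be tallied. The `MajorIncident` record, its field names, and the sample data are all hypothetical; they are assumptions for illustration and would need to be mapped to the fields in your own report template.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MajorIncident:
    # All field names are hypothetical; map them to your own report template.
    incident_id: str
    reported_by: str       # e.g. "end_user" or "monitoring"
    technology_area: str   # e.g. "network", "database"
    business_unit: str     # e.g. "sales", "finance"


def share_by(incidents, field):
    """Return {value: fraction of incidents} for one field, largest share first."""
    counts = Counter(getattr(incident, field) for incident in incidents)
    total = sum(counts.values())
    # An empty incident list yields an empty dict, so no division by zero occurs.
    return {value: count / total for value, count in counts.most_common()}


# Made-up sample data, for illustration only.
incidents = [
    MajorIncident("INC-1", "end_user", "network", "sales"),
    MajorIncident("INC-2", "end_user", "database", "sales"),
    MajorIncident("INC-3", "monitoring", "network", "finance"),
    MajorIncident("INC-4", "end_user", "network", "sales"),
]

for field in ("reported_by", "technology_area", "business_unit"):
    print(field, share_by(incidents, field))
```

In this made-up sample, three of the four incidents were reported by end users, which is the kind of signal that might prompt a closer look at monitoring coverage. A spreadsheet pivot table would serve equally well; the point is to review the breakdowns regularly rather than to use any particular tool.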
Most organizations I know practice some type of incident review process, so I hope the information presented so far has been helpful. Please feel free to suggest other approaches that have worked for your organization.