In the previous post, we discussed the first several topics (scope, data considerations, and external dependencies) that should go into a continuity plan for operating ITSM tools during a disaster recovery (DR) scenario. In this post, we conclude the discussion by covering the people, procedures, checklist, and validation sections of the plan.
Recovery Team and Communication
This section spells out the key roles and the staff members who will be responsible for implementing the plan. It includes the typical information such as name, organizational affiliation, location, and contact details. This section should also include the agreed-upon communication protocols for the duration of the recovery process. For example, teleconference bridge information is useful to include if that is your organization's standard practice. Also, if your organization has a major incident handling practice with standard communication and reporting requirements, you can cover the pertinent details in this section or at least reference them.
Recovery Procedure and Configuration Details
This section outlines the recovery instructions and procedures, with as much detail as necessary for a successful recovery. It also provides all the configuration details that need to be in place for the recovery solution to work as designed. Just how much information do you really need to include in the document? I believe the level of detail will largely depend on who your organization anticipates will execute the plan, and it may also be heavily influenced by your organization's overall disaster recovery plan.
By default, I would recommend setting the level of detail on the assumption that the people who will execute the plan are NOT the same folks who operated the production environment before the disaster. In my opinion, this is the safer route.
For example, suppose your organization uses ITSM product XYZ, made by company ABC and operated by an off-shore team from vendor OPQ. When a disaster strikes your data center, the network connectivity between your data center and your production personnel at OPQ is cut off. However, OPQ has another team at a different location that can access the recovery site and knows your ITSM product. With a properly documented recovery plan at the right level of detail, you can quickly bring in the vendor's second team to start the recovery without any help from the production team.
Checklist of Key Actions or Milestones
To help everyone who needs to understand and execute the plan, I recommend including a section that outlines the key tasks and activities so that nothing important is missed. The checklist can also be used to show the various communication and escalation points.
Testing and Validation Steps
This section outlines the detailed testing strategy and the validation activities required to verify the recovery plan. As with the recovery procedures section, it is highly recommended that these instructions be comprehensive, so that the recovery solution can be fully tested by people who are not part of your production crew. The section should also account for all the conditions, events, and scenarios within the plan's scope, and it should contain the information needed during testing, such as:
- What are the success criteria for the test objectives?
- What are the assumptions defined for this test, if any?
- When is the testing window?
- How should the security setup be taken into consideration?
- Which tool modules will be tested?
- Who will be conducting the test?
- What are the steps to validate the application and data integrity?
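To make the data integrity question above concrete, here is a minimal sketch of one way to compare a production export against the data restored at the DR site. It assumes, hypothetically, that both sides can be exported as dictionaries keyed by record ID (for example, ticket numbers), with each record as a dictionary of field values; `fingerprint` and `compare_exports` are illustrative names, not part of any ITSM product's API.

```python
import hashlib
import json
from typing import Any, Dict, List

Record = Dict[str, Any]


def fingerprint(record: Record) -> str:
    """Return a stable SHA-256 hash of a record; field order does not matter."""
    canonical = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_exports(source: Dict[str, Record],
                    restored: Dict[str, Record]) -> Dict[str, List[str]]:
    """Compare two exports keyed by record ID (hypothetical export format).

    missing:    IDs present in the source but absent after restore
    unexpected: IDs present after restore but absent from the source
    mismatched: IDs present in both whose field values differ
    """
    missing = sorted(set(source) - set(restored))
    unexpected = sorted(set(restored) - set(source))
    mismatched = sorted(
        rid for rid in set(source) & set(restored)
        if fingerprint(source[rid]) != fingerprint(restored[rid])
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}
```

A test run could treat empty `missing` and `mismatched` lists as one of its success criteria. Hashing each record keeps the comparison cheap even when the two exports hold many fields per ticket.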
Return to Normal Operation
This section describes the instructions and procedures needed to return to the normal production environment once it has been restored. Probably the most important information to explain here is how the data will be synchronized between the production and DR systems. Again, the key objective is to avoid losing any data captured by the ITSM tools while they were operating in the DR setup. This section also includes the testing and validation steps needed to confirm that the production system is back in full operation, so that the DR system can be shut down and returned to standby mode.
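As one illustration of the synchronization step, the sketch below decides which records captured in DR need to go back to production. It assumes, hypothetically, that both systems share the same ticket IDs, that each record carries a last-updated timestamp from synchronized clocks, and that the failover time is known; `plan_failback` is an illustrative name, and a real tool would have its own replication or import mechanism.

```python
from datetime import datetime
from typing import Dict, List


def plan_failback(production: Dict[str, datetime],
                  dr: Dict[str, datetime],
                  failover_at: datetime) -> Dict[str, List[str]]:
    """Classify DR records that changed after failover (hypothetical inputs).

    production / dr map ticket ID -> last-updated timestamp in each system.
    insert:   created in DR, unknown to production
    update:   changed in DR, untouched in production since failover
    conflict: changed on both sides since failover; needs manual review
    """
    plan: Dict[str, List[str]] = {"insert": [], "update": [], "conflict": []}
    for ticket_id in sorted(dr):
        if dr[ticket_id] < failover_at:
            continue  # not touched in DR after failover: nothing to copy back
        prod_updated = production.get(ticket_id)
        if prod_updated is None:
            plan["insert"].append(ticket_id)
        elif prod_updated >= failover_at:
            plan["conflict"].append(ticket_id)
        else:
            plan["update"].append(ticket_id)
    return plan
```

Flagging conflicts rather than letting one side silently overwrite the other reflects the goal stated above: no data captured during the DR period should be lost.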
I hope this discussion of the ITSM tools continuity plan gives your organization some useful ideas to think about and plan for. Given how important the data and transactions captured by ITSM tools are to many organizations, a workable continuity plan is really no longer optional. If you don't have anything in place now, I recommend starting small: draft a plan and keep the scope compact enough to cover just a few high-probability, high-impact scenarios. Work with the person in your IT organization who is responsible for the overall IT continuity plan, so that your plan integrates well with the overall IT and business objectives.