If your IT organization is like most others, you rely heavily on your IT service management (ITSM) tools to deliver IT services to your business customers or constituents. Many IT shops also run a comprehensive suite of ITSM tools across the various aspects of their operation. In my view, ITSM tools are to IT organizations what ERP systems are to many businesses: they provide a critical service.
While many ITSM products on the market today are well made and come with industrial-strength resiliency, technology failures or other disasters can still make the tools unavailable. When an outage stretches from minutes into hours or days, you need a continuity plan to restore the tool services so your IT organization’s operations can continue normally. The ITSM tools operation continuity plan need not be fancy or sophisticated, but it does need to be well thought out, with as many details called out beforehand as possible. This is part one of a two-part post in which we will go over the components that should go into such a plan.
This section provides an overview of the plan. Why does the plan exist? Who is the owner accountable for drafting and/or executing the plan? How will the plan be maintained and tested for validity and accuracy? Any other high-level information about the plan is also helpful to include here.
This section describes the conditions that will trigger the invocation of this plan. It is important to state who is authorized to invoke and to implement the plan. It is also important to outline the availability requirements and targets that apply once the plan is invoked.
Scope
This section describes the ITSM modules, systems, infrastructure, services and facilities that will be part of this plan. Few ITSM systems operate in isolation these days, so identifying every component required for a functional ITSM system can be daunting. That is OK. Just have a boundary in mind and do your best. Provide as much information as is feasible on the infrastructure that hosts the ITSM products. This could include the actual server names, databases, and other components deemed essential to the operation of the ITSM tools. If you have a CMDB with the relationships documented, the relationships between system components recorded there and those described in your plan should be consistent with each other.
Depending on your operation, not all ITSM modules need to be part of this continuity plan. For example, I would expect modules or services such as Incident Management, Change Management, or anything the Service Desk uses to be high on the priority list for restoration. The Problem Management module can probably wait and be restored as part of the normal system recovery cycle.
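If it helps to make the scope concrete, it can be kept as a small, machine-readable inventory alongside the written plan. The following is only a minimal sketch: the module names come from the example above, while the priority tiers, the host names, and the `restore_order` helper are hypothetical placeholders and not part of any particular ITSM product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScopeItem:
    module: str                 # ITSM module or service name
    priority: int               # 1 = restore first (hypothetical tiering)
    in_plan: bool               # whether this continuity plan covers the module
    hosts: List[str] = field(default_factory=list)  # hypothetical server names


# Hypothetical inventory; replace with the components in your own scope.
SCOPE = [
    ScopeItem("Problem Management", 3, False, ["itsm-app-02"]),
    ScopeItem("Incident Management", 1, True, ["itsm-app-01", "itsm-db-01"]),
    ScopeItem("Change Management", 2, True, ["itsm-app-01", "itsm-db-01"]),
]


def restore_order(items):
    """Return the names of in-plan modules, highest priority first."""
    in_plan = [item for item in items if item.in_plan]
    return [item.module for item in sorted(in_plan, key=lambda item: item.priority)]


print(restore_order(SCOPE))
```

A list like this is easy to compare against CMDB records when the plan is reviewed, which helps keep the two consistent.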
Data Dependencies and Considerations
This section covers the data requirements that must be met before the recovery plan can be implemented. What data is needed for the recovery, and what preparation activities are required to get that data in place? This is more than just calling out which database servers are needed for recovery, which should have been discussed in the Scope section. I am talking about things such as how current the data needs to be before the recovery procedures can be executed. Another consideration is how the data captured during the recovery phase will be incorporated back into the main database once the original systems come back online. The key objective here is not to lose data, either during or after recovery.
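The "how current" question can be written down as an explicit threshold and checked before recovery begins. This is a rough sketch under stated assumptions: the four-hour limit, the timestamps, and the `data_is_current` function are made up for illustration, and your own plan would define the real target.

```python
from datetime import datetime, timedelta

# Hypothetical target: recovery data may be at most 4 hours old.
MAX_DATA_AGE = timedelta(hours=4)


def data_is_current(last_backup, now, max_age=MAX_DATA_AGE):
    """Return True if the most recent backup is recent enough to recover from."""
    return now - last_backup <= max_age


# Illustrative timestamps only.
now = datetime(2024, 1, 1, 12, 0)
print(data_is_current(datetime(2024, 1, 1, 9, 30), now))  # 2.5 hours old -> True
print(data_is_current(datetime(2024, 1, 1, 6, 0), now))   # 6 hours old -> False
```

Stating the threshold this plainly also makes it easy to verify during plan testing.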
Security and Access Considerations
This section covers the important details of security and access. For example, what access rights will your systems and personnel require in order to fully execute the plan? Often we put security and access considerations on the back burner and forget about them. Then, during the recovery phase, things do not work as expected and, after many rounds of discussion and troubleshooting, we realize that security access might be what is preventing things from working. Don’t put yourself in the position of being unprepared and wasting time. Figure out those security and access details beforehand and document them in the plan.
External Dependencies and Considerations
This section calls out the systems, infrastructure, services, facilities or interfaces that are external to the ITSM system but have inter-system dependencies that should be documented. Essentially, anything that has not been identified in the Scope section but is still required for recovery should be mentioned here. That way, all dependent systems and the nature of each dependency can be identified and taken into account during plan execution. For example, we might want to include information about the email system and its key interface points, because most ITSM systems rely on email for communication.
That is all for now. In the next post, we will conclude the discussion of the plan and cover the remaining topics:
- Recovery Team and Communication
- Recovery Procedure and Configuration Details
- A Checklist of Key Actions or Milestones
- Testing and Validation
- Return-To-Normal-Operation Procedure