Data Center Hub

Internet Data Center and Hosting News and Views

Six steps for implementing an incident recovery plan

Filed under: Jesus Factor — Bill Laakkonen at 6:47 am on Thursday, May 31, 2007

I got a call from one of my customers last Friday; the customer told me that they were without power. There was no storm; in fact the weather was bright and sunny in Florida. It wasn’t even a particularly hot day, so there was no unusual heat or power load in their office either. Unfortunately, the power transformer that feeds their building was over 40 years old, and it had failed.

This customer happens to be one of the few customers of mine who plan ahead for contingencies. They have a beautiful office on the Caloosahatchee in Fort Myers, Florida. In 1998 their office flooded; as a result, the business owner had the building elevated roughly 8 feet above its previous height, so it is not likely to flood again. In 2004 a hurricane hit Fort Myers, and the office was without power for several days. To keep that from happening again, the customer installed a propane-powered generator system, complete with an automatic switch, to provide power whenever mains power fails. Even though many businesses were down after Hurricane Wilma hit Florida in 2005, this customer did not skip a beat. Planning and implementing ahead of time does pay off.

Inside their office, all the equipment is housed in a rack-mount case and protected by HP/Compaq UPS systems. While the customer has a maintenance subscription for their servers, workstations, and network equipment, they have no such maintenance contract for the generator system itself. The generator was initially installed and configured to test itself each Tuesday. At some point in the last few months the tests were still running, yet they were failing each time. Apparently nobody in the office noticed, and of course the generator has no way to inform anyone of its failure other than the total lack of power.
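A silent self-test failure like this one is the kind of thing a small scheduled check can catch. The following is only a rough sketch of the idea in Python: it assumes the self-test results can be collected somewhere as simple records, and the `SelfTestResult` type, the `generator_alert` function, and the 8-day staleness limit are all hypothetical choices for illustration, not features of any particular generator.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class SelfTestResult:
    """One generator self-test record (hypothetical format)."""
    ran_at: datetime
    passed: bool


def generator_alert(results: List[SelfTestResult],
                    now: datetime,
                    max_age: timedelta = timedelta(days=8)) -> Optional[str]:
    """Return an alert message if the generator needs attention, else None.

    max_age is an assumed limit: a weekly test plus one day of slack.
    """
    if not results:
        return "No generator self-test has ever been recorded."
    latest = max(results, key=lambda r: r.ran_at)
    if not latest.passed:
        return f"Generator self-test FAILED on {latest.ran_at:%Y-%m-%d}."
    if now - latest.ran_at > max_age:
        return f"No generator self-test since {latest.ran_at:%Y-%m-%d}."
    return None
```

Whatever message comes back would still need to reach a person, for example by email or a text to whoever is responsible for the generator; the point is simply that a failed or missing test should generate noise rather than silence.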

It’s important to be prepared; it’s also important to plan in advance how an event will be handled and to include scenarios for unlikely situations. In this case I received the call from the customer after the power went out, after the generator had failed, and after the UPS had exhausted its batteries. It is important to shut down complex systems in an orderly fashion rather than simply letting the power drain away. An orderly shutdown allows the cache to be flushed properly and leaves the system in a known state. Failure to do so may result in data corruption and even hardware failure. Even though the UPS had the ability to shut down the server, it was not connected properly, and in this particular situation the uncontrolled shutdown resulted in a hardware failure that brought down the customer’s entire domain and required several hours of onsite repair. In the future the UPS should be able to shut down the server BEFORE the UPS batteries run out (now that the pesky serial cable is reattached). As bad as it sounds, there was no permanent damage, only the inconvenience and expense of weekend service calls. It does bring to light the need to plan better and to keep a document on hand that lists the contacts who may be involved and the course of action for each of them.
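The rule for an orderly shutdown on battery power can be written down in a few lines. This is a minimal sketch, assuming the UPS monitoring software reports whether the UPS is on battery and how many seconds of runtime remain; the function name, the parameter names, and the 120-second safety margin are made up for illustration and would need to be tuned to the actual equipment.

```python
def should_start_shutdown(on_battery: bool,
                          runtime_remaining_s: float,
                          shutdown_duration_s: float,
                          safety_margin_s: float = 120.0) -> bool:
    """Decide whether to begin an orderly server shutdown now.

    The shutdown starts while the UPS still has enough runtime left to
    cover the time the server needs to stop cleanly, plus a safety margin.
    All values are in seconds; the default margin is an assumption.
    """
    if not on_battery:
        return False  # mains power is present, nothing to do
    return runtime_remaining_s <= shutdown_duration_s + safety_margin_s
```

For example, if the server takes about five minutes (300 seconds) to stop cleanly, this rule would begin the shutdown once the UPS reports seven minutes or less of runtime. None of it helps, of course, if the cable between the UPS and the server is not connected.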

An incident response plan is more general in nature and broader in scope than a Business Continuity Plan or Disaster Recovery Plan. The plan should cover events such as a network meltdown or a lightning strike, as well as the broad steps to recover from a security breach.

So here are six steps to take in developing an incident response plan.

  1. Plan for events: security breaches, loss of power, data connection failures, toll-free line failures. Try to identify as much as possible of what could happen and plan around each possibility.
  2. Identify responders and roles. Choose the people for your incident responses before the events occur, including backup people (even if they come from outside your employee pool). Look beyond the IT department and include HR and vendors as well.
  3. Create a backup communications plan: perhaps an off-site SharePoint-based web site, conference bridge lines, and VoIP systems in case your staff need to work from alternate locations. Don’t forget to create a contact list that includes mobile and home numbers as well as alternate email addresses.
  4. Decide who is in charge of what in advance. Plan ahead to delegate authority for things such as emergency purchase orders or press releases.
  5. Plan for an alternate work site. Choose a location in advance so that if your regular work location is not available, you and your staff will have a place to work. Make sure you have connectivity, power, and adequate cooling or heating as needed. Consider partnering with a vendor of roughly equal size and setting up a reciprocal arrangement with them. For example, you could make a reciprocal agreement with a supplier located 200 miles from your office so that in a time of disaster you can temporarily route calls through their site, or relocate there, until normalcy is restored at your regular office.
  6. Test your plan ahead of time; there is nothing worse than finding out you missed an important detail while the incident is occurring. Start your evaluation with a quick discussion of events and scenarios, and later work your way up to a simulation. You should also make sure your plan is kept safe (for example, encrypted) and accessible; keep copies in multiple physical locations so it is ready when needed.
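One inexpensive way to start testing a plan is to keep the responder assignments and contact list in a structured form and check them for obvious holes. The sketch below, in Python, is only an illustration of that idea: the `Contact` and `EventPlan` records and the `find_gaps` function are hypothetical, and the rules it checks (a primary and a backup responder for every event, a mobile number and an alternate email for every contact) are taken from steps 2 and 3 above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Contact:
    """One entry in the contact list (hypothetical layout)."""
    name: str
    mobile: Optional[str] = None
    home_phone: Optional[str] = None
    alternate_email: Optional[str] = None


@dataclass
class EventPlan:
    """Who responds to one kind of event, e.g. "power failure"."""
    event: str
    primary: Optional[str] = None  # name of the primary responder
    backup: Optional[str] = None   # name of the backup responder


def find_gaps(events: List[EventPlan],
              contacts: Dict[str, Contact]) -> List[str]:
    """List missing responders and missing contact details."""
    gaps = []
    for plan in events:
        for role, name in (("primary", plan.primary), ("backup", plan.backup)):
            if name is None:
                gaps.append(f"{plan.event}: no {role} responder assigned")
            elif name not in contacts:
                gaps.append(f"{plan.event}: {role} responder {name} "
                            "has no contact entry")
    for contact in contacts.values():
        if not contact.mobile:
            gaps.append(f"{contact.name}: no mobile number")
        if not contact.alternate_email:
            gaps.append(f"{contact.name}: no alternate email")
    return gaps
```

A check like this does not replace a discussion or a simulation; it only catches the clerical gaps, such as an event with no backup person, before an incident does.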

There is always the temptation to ignore planning while everything is running smoothly, but don’t give in to it: the flick of a switch can set you scrambling in ways that might cause you to lose sleep at night. Plan ahead and sleep soundly. Don’t forget, too, that over time you will need to revisit your plans and revise them as needed.
 

  
