A previous tip (‘The deciding factor’) discussed a common difference of opinion between the folks in IT and those in the business as to what constitutes a recovery time objective for an IT system. It suggested that both parties need to understand what’s involved in meeting an IT recovery time objective, from a process rather than a technical point of view, along with the time needed to carry out each part of the process.
Because, if we know how long it takes to recover a system (and we should, because we’ve tested it, haven’t we?), and we know how long it takes to make the environment available, deliver media or equipment, relocate people to the recovery site, or whatever else we need to do (all of which we’ve also tested, haven’t we?), we can work backwards from our recovery time objective and know exactly how long we have to make the decision to invoke our plans.
As an example, let’s assume we have a recovery time objective of twelve hours for a particular IT system. And let’s assume that the recovery strategy involves restoring the operating system, the application and its data onto equipment at another location, from our offsite backups (disk, tape or whatever; it doesn’t really matter for the purposes of this example). Because we’ve tested (haven’t we?), we know it takes two hours for our technical boffins to travel to the recovery site, an hour to configure the hardware and a further six hours to restore the operating system, application and data. In total, that’s nine hours from the time that the proverbial button is pushed and the boffins are given the go-ahead. So we now know that we have a maximum of three hours from the point of failure to make the decision to invoke. And if we don’t decide within three hours, we can’t possibly meet our twelve-hour recovery time objective. Full stop. No ifs or buts.
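For anyone who likes to see the sums written down, the working-backwards calculation above can be sketched in a few lines of Python. This is only an illustration, not a tool from the original tip: the function name `decision_window` and the step labels are made up for the example, and it assumes the recovery steps happen one after another rather than in parallel.

```python
from datetime import timedelta


def decision_window(rto, step_durations):
    """Return the time available to decide, counted from the point of failure.

    Assumes (for illustration) that the recovery steps run sequentially,
    so their tested durations simply add up.
    """
    total_recovery = sum(step_durations.values(), timedelta())
    window = rto - total_recovery
    if window < timedelta(0):
        # The tested steps alone already exceed the objective.
        raise ValueError("Recovery steps take longer than the recovery time objective")
    return window


# Figures from the example: a twelve-hour objective and three tested steps.
rto = timedelta(hours=12)
steps = {
    "travel to recovery site": timedelta(hours=2),
    "configure hardware": timedelta(hours=1),
    "restore OS, application and data": timedelta(hours=6),
}

window = decision_window(rto, steps)
print(window)  # 3:00:00
```

If a later test shows that one of the steps takes longer than expected, plugging in the new figure shows straight away how much the decision window has shrunk, or whether the objective can still be met at all.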
This basic, but important, bit of understanding is a great help to both the decision-makers and the IT people. It helps the decision-makers because they now know exactly how long they have to make their decision. In the above example, they only have three hours. But they do have three hours, so they can take a little time, if necessary, to make a reasoned decision. And it helps the IT people because, if the decision-makers can’t decide within three hours, it’s them, not the IT folks, who get to carry the can if the recovery time objective isn’t then met.