How can I not have an article on Disaster Recovery and Business Continuity Planning? It is a must-have understanding for anyone in security.
If you are a security professional with years of experience, you are already very familiar with the fundamental metrics used in developing a Business Impact Analysis (BIA) report. The BIA identifies your business processes and the resources required to recover them in the event of a disaster, and it becomes part of your Business Continuity Plan (BCP).
The metrics I am referring to are RPO, RTO, and WRT, along with Maximum Tolerable Downtime (MTD). I hope someone who is just getting into security and trying to grasp these concepts will find this explanation useful.
Let us assume a business that is operating normally, as represented by the following chart. Note that the X axis represents time, because the concepts we are going to learn are all functions of time. Time scale = 1 hr.
[Figure: timeline with the markers "Recovery Efforts Begin" and "Normal Operation Resumes"]
Suppose a disaster hits a normally operating business at 3 am, recovery starts at 6 am, and normal operation resumes at 8 am. We can then define the terms as follows:
- Recovery Point Objective (RPO) is the maximum acceptable amount of data loss, expressed as the point in time to which data must be recovered.
- Recovery Time Objective (RTO) is the maximum time needed for data recovery.
- Work Recovery Time (WRT) is the maximum amount of time needed to verify data integrity so that operation can resume.
- Maximum Tolerable Downtime (MTD) is the amount of time a business process can be disrupted without causing significant harm to the organization's mission.
For this particular example, Figure 4 shows an RTO of 3 hrs and a WRT of 2 hrs. The MTD is calculated as follows:
MTD = RTO + WRT
MTD = 3 hrs. + 2 hrs.
MTD = 5 hrs.
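The same arithmetic can be sketched in a few lines of Python, using the 3 am, 6 am, and 8 am points from the example. This is only an illustration, not a planning tool: the variable names and the calendar date are arbitrary, and the last-backup time and 4-hour RPO are hypothetical values I made up to show how an RPO check could look, since the example above does not specify them.

```python
from datetime import datetime, timedelta

# Timeline from the example above. The calendar date is arbitrary.
disaster = datetime(2024, 1, 1, 3, 0)        # disaster hits at 3 am
recovery_marker = datetime(2024, 1, 1, 6, 0)  # 6 am point on the timeline
resumed = datetime(2024, 1, 1, 8, 0)          # normal operation resumes at 8 am

rto = recovery_marker - disaster   # 3 hrs in this example
wrt = resumed - recovery_marker    # 2 hrs in this example
mtd = rto + wrt                    # MTD = RTO + WRT


def hours(delta: timedelta) -> float:
    """Convert a timedelta to hours."""
    return delta.total_seconds() / 3600


# Hypothetical RPO check: these two values are NOT from the example.
last_backup = datetime(2024, 1, 1, 1, 0)  # assumed last good backup at 1 am
rpo = timedelta(hours=4)                  # assumed RPO of 4 hrs

data_loss_window = disaster - last_backup  # data written since the last backup
rpo_met = data_loss_window <= rpo

print(f"RTO = {hours(rto):g} hrs, WRT = {hours(wrt):g} hrs, MTD = {hours(mtd):g} hrs")
print(f"Data loss window = {hours(data_loss_window):g} hrs, RPO met: {rpo_met}")
```

With the assumed backup at 1 am, the data loss window is 2 hrs, which falls within the assumed 4-hour RPO. Moving the last backup earlier than 11 pm the previous day would make the check fail.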
This is a very simple example for understanding how the Maximum Tolerable Downtime is calculated. For a deeper understanding, I recommend delving into books and other material written on DR and BC. Note that the line between resuming normal business operation and switching back to the primary site is very thin and can get blurred. For practical purposes, getting back to normal operation is more critical than returning to the primary site.
If you would like to learn more about these topics, please see the following references:
A technical article on RTO vs. RPO by msp360.com
A blog post from Default Reasoning by Marek Zdrojewski