Who is a Network Specialist?
I typically don’t delve into this topic unless I am asked to speak about it at a conference or group discussion. While I admit I do not have all the certifications needed to call myself a subject matter expert, I certainly feel that to understand the importance of Cybersecurity Frameworks in OT Security, you will need to understand some basic math and statistics. This post introduces you to these fundamental concepts.
Where do I begin?
It all starts with understanding that there are many thoughts, ideas and methodologies in cybersecurity practice. One thing is for sure: you will need some basic understanding of math. If I were starting from nowhere, I would pick up a book on statistics and probability. Even before that, however, a few basic concepts such as uncertainty and risk are important.
There is one vital concept that you need to understand very well: uncertainty. In the case of a data breach, we are not certain, or we lack the data to calculate, what the true outcome of the breach will be or when it will actually occur. For example:
“There is a 35% chance that company ABC will have a data breach / data leak incident sometime in the next four years”
The objective is to be able to measure something and predict an outcome. In this particular example, the uncertainty is about whether and when something will happen.
“There is a 30% chance that company ABC will have a cyberattack in the form of data breach or data leak in the next three years”
Here is the same kind of statement applied to the loss that the organization would suffer as a result of the data breach.
“There is a 20% chance that a data breach or data leak will result in a fine from GDPR regulation in the amount of $5 million dollars for the company ABC”
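Statements like these become easier to compare once the probability is combined with the size of the loss. The following is a minimal sketch, assuming the common convention of expected loss = probability × impact. This post has not introduced that formula, so treat it as an illustration only. The function name `expected_loss` is hypothetical; the figures come from the GDPR example above.

```python
def expected_loss(probability: float, impact: float) -> float:
    """Return the probability-weighted loss (probability between 0 and 1)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * impact


# Figures from the GDPR example: a 20% chance of a $5 million fine.
gdpr_expected_loss = expected_loss(0.20, 5_000_000)
print(f"Expected loss: ${gdpr_expected_loss:,.0f}")
```

The result is not a prediction of what the company will actually pay. It is simply a single number that lets two uncertain outcomes be ranked against each other.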
For cybersecurity and other risk management methodologies, understanding terms such as Vulnerability (V), Threat (T), Impact (I) and Likelihood (L) is essential to being able to measure risk and apply countermeasures. I also want to point out that there are two important approaches to risk management: a qualitative approach (subjective) and a quantitative approach. Which approach is better? It is tempting to assume the familiar qualitative risk matrix is good enough, but there are studies out there that suggest otherwise. Read What’s Wrong with Risk Matrices? by Tony Cox (Link to original publication)
However, if you are a beginner in risk analysis, I recommend you start with qualitative analysis to build your understanding. Besides, choosing between the two is like choosing between a scoop of vanilla ice cream in a cup and a spoonful of vanilla Dippin’ Dots.
In order to understand both qualitative and quantitative approaches of risk analysis, we have some key risk terminologies that one needs to understand.
The case of the Sandwich Thief.
Let us say I have a special sandwich, my asset, which is very valuable and may have some secret sauce or ingredient. The value of this asset is $10; this is how much it cost me to make it. Now, if I really want to protect it from a sandwich thief, I would like to know how valuable it is. Did I not say $10? Yes, but that is not the true value. If I were to lose the $10 sandwich, it might cost me $30. Why? Well, when the sandwich was made the ingredients were cheaper, maybe I got a discounted price, or I might have spent $20 on refrigeration over its shelf life. Understanding this value is very important because, if I were to put a countermeasure in place, such as a lock on a cabinet or something more secure than that, I want to make sure I am not spending more than the sandwich is worth. Who makes that decision?
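The reasoning above can be sketched in a few lines. This is only an illustration. It assumes the $30 true value is the $10 production cost plus $20 of refrigeration, which is one reading of the example. The function names and the two countermeasure prices are hypothetical.

```python
def true_asset_value(production_cost: float, carrying_cost: float) -> float:
    """What losing the asset would really cost: making it plus keeping it."""
    return production_cost + carrying_cost


def countermeasure_is_worthwhile(countermeasure_cost: float, asset_value: float) -> bool:
    """A countermeasure only makes sense if it costs less than what it protects."""
    return countermeasure_cost < asset_value


# Assumed breakdown of the sandwich example: $10 to make, $20 to refrigerate.
sandwich_value = true_asset_value(production_cost=10, carrying_cost=20)

# Hypothetical countermeasure prices, not taken from the post.
print(countermeasure_is_worthwhile(8, sandwich_value))   # an $8 padlock
print(countermeasure_is_worthwhile(45, sandwich_value))  # a $45 safe
```

Under these assumptions the padlock passes the check and the safe does not, because the safe costs more than the sandwich it protects.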
In cybersecurity, a vulnerability is a flaw, a weakness, a missing defense. This can be accidental or intentionally put in place.
An analogy from the real world: “Padlocks are among the easiest locks to pick, as they have a massive vulnerability in the form of easy access to the locking pin and cylinder mechanism, which can be aligned to pop open the lock.”
A threat is the potential for a vulnerability to be exploited, which could result in a negative outcome. In cybersecurity, a threat is the potential exploitation of a vulnerability in a network, software or hardware that would allow a threat actor to gain privileged access to the system.
For example, a sandwich thief (threat actor) is a threat to your sandwich that is stored in a kitchen cabinet with a padlock. How do we compute risk from threat and vulnerability?
Before we are able to define the risk, we also need to know what this incident would cost us. Impact is the magnitude of harm that can be expected to result from a threat exploiting a vulnerability.
A sandwich stolen from the locked kitchen cabinet will result in a loss of $30 to your net worth.
We are almost ready to calculate risk. Picking the padlock to exploit the vulnerability may seem easy enough for the thief, but what if I told you the kitchen is located in an armed location with 24/7 surveillance and monitoring? Then what is the likelihood of such an event (incident) even taking place? You could say improbable, or no chance at all, and the impact would be moderate (subjective analysis).
We can express impact subjectively on a five-point scale:
Negligible-1, Minor-2, Moderate-3, Significant-4, Severe-5
or on a simpler three-point scale:
Low-1, Moderate-2, High-3
Likelihood is the probability that a threat will exploit the vulnerability. It is usually not a specific number but a range, for example:
Frequently-5, Likely-4, Occasionally-3, Very Seldom-2, Not Likely at all-1
or, with different labels:
1=very unlikely, 2=low likelihood, 3=likely, 4=highly likely, and 5=near certain.
Risk Matrix / Risk Heat Map:
From the impact and likelihood scales we can then draw a Risk Matrix, also called a Risk Heat Map.
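Since the matrix is easier to grasp when built by hand, here is a minimal sketch that generates one from the five-point impact and likelihood scales above. Two things in it are assumptions and do not come from this post: scoring a cell as impact × likelihood, and the LOW / MEDIUM / HIGH cut-offs. Every organization sets its own bands.

```python
# Five-point scales taken from the lists above.
IMPACT = {"Negligible": 1, "Minor": 2, "Moderate": 3, "Significant": 4, "Severe": 5}
LIKELIHOOD = {
    "Not Likely at all": 1,
    "Very Seldom": 2,
    "Occasionally": 3,
    "Likely": 4,
    "Frequently": 5,
}


def risk_score(impact: str, likelihood: str) -> int:
    """Assumed scoring rule: multiply the two ordinal ratings."""
    return IMPACT[impact] * LIKELIHOOD[likelihood]


def risk_level(score: int) -> str:
    """Assumed bands; adjust the cut-offs to your own risk appetite."""
    if score <= 6:
        return "LOW"
    if score <= 14:
        return "MEDIUM"
    return "HIGH"


def print_risk_matrix() -> None:
    """Print one row per likelihood (highest first), one column per impact."""
    print(" " * 18 + " ".join(f"{name:<11}" for name in IMPACT))
    for l_name, l_val in sorted(LIKELIHOOD.items(), key=lambda kv: -kv[1]):
        cells = [risk_level(l_val * i_val) for i_val in IMPACT.values()]
        print(f"{l_name:<18}" + " ".join(f"{cell:<11}" for cell in cells))


print_risk_matrix()

# The sandwich case: moderate impact, very unlikely to happen.
print(risk_level(risk_score("Moderate", "Not Likely at all")))
```

With these assumed bands, the sandwich theft lands in the LOW region of the matrix.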
As you can see, the qualitative analysis process involves judgment, intuition, and experience. For example, if I am a CSSP (Certified Sandwich Security Professional), I can use my intuition and judgment to categorize the risk of a sandwich thief stealing the sandwich as LOW. This is based on my understanding that it is unlikely for the thief to get into the kitchen and steal the sandwich, and that the potential loss of $30 is moderate. So, would I invest in any countermeasures? As this risk is low, I would not; I would accept the low risk.
End of this lesson, keep an eye out for more. Next- Quantitative Approach.
Before you start reading about the core concepts of Business Continuity Planning (BCP), Disaster Recovery Planning (DRP) and Contingency Planning, keep in mind that these are very important concepts that are interpreted differently by different organizations, individuals and security professionals. The main reason is that we as humans think differently about countermeasures and have different risk appetites, and so do the organizations in which those individuals hold the key positions to propose, accept and finalize business and operational contingency plans.
Before we begin, let us understand some of the core concepts.
What is a Plan?
Oxford Dictionary defines Planning as “an intention or decision about what one is going to do”.
So what is Contingency planning?
“A contingency plan is a plan devised for an outcome other than in the usual (expected) plan” – From Wikipedia.
Before we get into what is included in each of the plans, let us look into some definitions.
According to the NIST Special Publication 800-34, IT contingency planning refers to a coordinated strategy involving plans, procedures, and technical measures that enable the recovery of IT systems, operations, and data after a disruption.
Contingency planning generally includes one or more of the following approaches to restore disrupted IT services:
- Restoring IT operations at an alternate location (Example: Hot Site, Warm Site and Cold Site)
- Recovering IT operations using alternate equipment (Example: Secondary Server, High Availability Configuration)
- Performing some or all of the affected business processes using non-IT (manual) means. (Example: Manually collecting a customer’s credit card information over the phone)
Because a Contingency Plan has a broad scope that covers recovery, continuity and response to business needs, business threats and emergencies, it is important to note that an organization may choose to implement it in many different ways. This is when we start talking about BCP, DRP, COOP, IRP, etc. There are more. See Appendix A for expansion of these acronyms.
For a CISSP, it is important to understand the main differences between various types of plans.
What is a BCP?
Business Continuity Planning (BCP) is the process of putting in place systems and mechanisms for the prevention of, and recovery from, potential threats to a business goal.
A Business Continuity Plan is a formal document consisting of a set of processes, drawings, flow charts, ordered lists, etc. that helps a business navigate through business interruptions by providing tested and proven methods to prevent, and recover from, a potential threat to the existence of the business. A BCP can have other plans included as part of its scope.
What is a DRP?
A Disaster Recovery Plan (DRP) is a very detailed, hands-on plan when compared to a Business Continuity Plan. It is highly reactive. It contains detailed instructions on how to respond to unplanned incidents such as hurricanes, flooding, earthquakes, power outages, cyber attacks and any other event that will disrupt business operations. The plan contains strategies for minimizing the effects of a disaster, so that an organization can continue to operate or quickly resume key operations.
Contingency plans such as BCP, DRP and BRP help you continue to operate or sustain your business goals, and they can be reactive. Parts of these plans can be proactive as well. For example, if you have servers configured in High Availability (HA) mode, you limit downtime and improve performance. This is a proactive approach. If you have a backup server or a warm site, you are making sure you can continue to operate when servers are down. This represents a reactive approach.
Appendix A
- BCP : Business Continuity Planning
- DRP : Disaster Recovery Planning
- BRP : Business Recovery Planning
- COOP : Continuity of Operations Plan
- IRP : Incident Response Plan
- OEP : Occupant Emergency Plan
This was chapter 1 on Contingency Planning. If you have any comments or questions, leave them below or message me!
Over and Out! Stay safe, think before you click (anywhere).
How can I not have an article on Disaster Recovery and Business Continuity Planning? It is a must-have understanding for anyone in security.
If you are a security professional with years of experience, then you are very familiar with the fundamental metrics used in developing a Business Impact Analysis (BIA) report. The BIA identifies your business processes and the resources required to recover them in the event of a disaster, and it becomes part of your Business Continuity Plan (BCP).
The metrics I am referring to are RPO, RTO and WRT, along with Maximum Tolerable Downtime (MTD). I hope anyone who is just getting into security and trying to grasp these concepts will find this explanation useful.
Let us assume a business which is operating normally represented by the following chart. Note, the X axis represents Time. The concepts that we are going to learn are a function of time. Time scale = 1 hr
[Figure: timeline chart with the markers “Recovery Efforts Begin” and “Normal Operation Resumes”]
Suppose a disaster hits a business under normal operation at 3 am, recovery starts at 6 am, and normal operation resumes at 8 am. Then we can define the terms as follows:
- Recovery Point Objective (RPO) is the maximum acceptable amount of data loss, expressed as the point in time to which data must be recovered.
- Recovery Time Objective (RTO) is the maximum time needed for data recovery.
- Work Recovery Time (WRT) is the maximum amount of time needed to verify data integrity before operation resumes.
- Maximum Tolerable Downtime (MTD) is the amount of time a business process can be disrupted without causing significant harm to the organization’s mission.
For this particular example, Figure 4 shows an RTO of 3 hrs and a WRT of 2 hrs. The MTD is calculated as follows:
MTD = RTO + WRT
MTD = 3 hrs. + 2 hrs.
MTD = 5 hrs.
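The same arithmetic can be reproduced from the three timestamps in the example. This is a minimal sketch that follows the post's reading of the timeline: RTO runs from the disaster to the start of recovery, and WRT from there to the resumption of normal operation. The calendar date is an arbitrary placeholder, and the variable names are hypothetical.

```python
from datetime import datetime, timedelta

# Times from the example; the date itself is an arbitrary placeholder.
disaster_hits = datetime(2024, 1, 1, 3, 0)      # 3 am
recovery_begins = datetime(2024, 1, 1, 6, 0)    # 6 am
normal_resumes = datetime(2024, 1, 1, 8, 0)     # 8 am

rto = recovery_begins - disaster_hits    # 3 hrs in this example
wrt = normal_resumes - recovery_begins   # 2 hrs in this example
mtd = rto + wrt                          # MTD = RTO + WRT


def as_hours(delta: timedelta) -> float:
    """Convert a timedelta to hours for display."""
    return delta.total_seconds() / 3600


print(f"RTO = {as_hours(rto):.0f} hrs, WRT = {as_hours(wrt):.0f} hrs, MTD = {as_hours(mtd):.0f} hrs")
```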
This is a very simple example for understanding how Maximum Tolerable Downtime is calculated. For a deeper understanding I recommend digging into books and materials written on DR and BC. Note that there is a very thin line, which can easily blur, between resuming normal business operation and resuming total normal operation, which may mean that you have switched back to the primary site. For practical purposes, getting back to normal operation is more critical than returning to the primary site.
If you would like to get more understanding of these topics please see the following references:
I have been asked several times, on several occasions, about this mysterious term called OT. It stands for Operational Technology, but what is Operational Technology?
A few years ago, I stood in front of a large audience from diverse backgrounds such as process control, maintenance, IT and management, delivering a motivational speech on cybersecurity for manufacturing. I never used the word OT in the context of Industrial Control Systems; the word existed, but it was not used as commonly as it is today.
While different individuals use it differently to describe their trade, especially in industrial cybersecurity practice, the word itself has become something of an open secret: we think we know what it means, but do we really? To tackle this problem, I asked a number of people in my closest professional circle and tried to define it myself. The definition of OT might change, but as far as I am concerned what I am about to tell you will remain relevant because, instead of listing all the different systems that fall under Operational Technology, I will simply define what it represents.
So here you go:
Operational technology (OT) is a set of hardware, software, and communication systems that are used to monitor, control, and automate industrial processes. OT systems are typically used in critical infrastructure industries such as manufacturing, energy, and transportation.
These can include IACS and control system components, Information Technology components that are part of the control systems, etc., but the scope is defined by the organization so that it can apply the necessary security measures and controls appropriately.
Let’s look at a diagram. Yes, that is MS Paint, and I did create it 5 years ago. As you can see, Operational Technology (OT) is an umbrella term defined by the organization to include systems such as Industrial Automation and Control Systems (IACS), Fire Systems, Access Control Systems, Lighting Controls, etc.
What is the relevance of defining Operational Technology?
The importance of defining OT for your organization is simply to be able to develop / design and implement appropriate security controls and measures to protect your business operations. Properly identifying what falls inside the OT environment and what falls outside, what is included and what is excluded, will provide you with the right information to develop your OT Cybersecurity Strategy.
Do you have any comments or suggestions to improve this definition? Send me a message.