Confusion Matrix and Cyber Attacks
Cybercrime is any unlawful act against a person carried out using a computer, its systems, or its online or offline applications. Fraud committed by manipulating a computer network is one example. Common modes of cyber attack include 1) hacking, 2) denial-of-service attacks, 3) software piracy, 4) phishing, and 5) spoofing.
Types of cyber attacks
According to Gartner, the worldwide information security market is forecast to reach $170.4 billion in 2022. Around 88% of organizations worldwide experienced spear phishing attempts in 2019. Data breaches exposed 36 billion records in the first half of 2020. 86% of breaches were financially motivated and 10% were motivated by espionage. 45% of breaches featured hacking, 17% involved malware and 22% involved phishing.
15% of breaches involved healthcare organizations, 10% the financial industry, and 16% the public sector. The healthcare industry lost an estimated $25 billion to ransomware attacks in 2019. The average cost of a financial services data breach is $5.85 million.
Detecting cyber attacks in a network is therefore essential, and this is where machine learning comes into play in building an effective Intrusion Detection System (IDS). A binary classification model can be used to identify what is happening in the network, i.e., whether or not an attack is under way.
Understanding the raw security data is the first step in building an intelligent security model that can make predictions about future incidents. The data falls into two categories: normal and anomaly. After selecting the security features and performing all preprocessing steps, we train a model that can detect whether a test case is normal or an anomaly. One of the tools used to evaluate the model is the confusion matrix.
A confusion matrix is a tabular summary of the number of correct and incorrect predictions made by a classifier. It is used to evaluate the performance of a classification model through metrics such as accuracy, precision, recall, and F1-score.
Let’s understand the terms used here:
- In a two-class problem such as attack detection, we assign the event normal as “positive” and anomaly as “negative”. Many IDS references use the opposite assignment, with the attack as the positive class; the definitions here follow this article's choice.
- True Negative: The model predicted negative, and the actual value was also negative.
- True Positive: The model predicted positive, and the actual value was also positive.
- False Negative: The model predicted negative, but the actual value was positive. This is also called a Type II error.
- False Positive: The model predicted positive, but the actual value was negative. This is also called a Type I error.
A confusion matrix therefore records two types of error: Type I and Type II.
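To make the four terms concrete, here is a minimal sketch that counts them from paired actual and predicted labels. It assumes the convention used in this article (normal is positive, anomaly is negative); the function name `confusion_counts` and the six sample labels are made up for illustration.

```python
from collections import Counter

POSITIVE = "normal"   # this article's convention: no attack
NEGATIVE = "anomaly"  # this article's convention: attack


def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for paired actual/predicted labels."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == POSITIVE:
            # Predicted "no attack": correct if the packet really was normal
            counts["TP" if a == POSITIVE else "FP"] += 1
        else:
            # Predicted "attack": correct if the packet really was an anomaly
            counts["TN" if a == NEGATIVE else "FN"] += 1
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}


# Hypothetical labels for six packets (illustrative only)
actual = ["normal", "normal", "anomaly", "anomaly", "normal", "anomaly"]
predicted = ["normal", "anomaly", "anomaly", "normal", "normal", "anomaly"]

counts = confusion_counts(actual, predicted)
print(counts)
```

If the attack class were treated as positive instead, swapping the two constants would be enough to flip the labels.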
Now let’s look at these terms and their significance in the context of cyber attack prediction.
Cyber attack prediction model
An Intrusion Detection System (IDS) checks for malicious activity on a system. It monitors the packets arriving over the internet using an ML model and predicts whether each one is normal or an anomaly.
Let’s say the model in our IDS analyzed a total of 165 packets and its confusion matrix contained the following counts: 100 true positives, 10 false positives, 5 false negatives, and 50 true negatives.
- “Positive” -> Model predicted no attack.
- “Negative” -> Model predicted attack.
- True Negative: Of the 55 times the model predicted that an attack would take place, 50 predictions were correct, meaning an attack actually took place 50 times. Because of these predictions, the Security Operations Centre (SOC) receives a notification and can prevent the attack.
- False Negative: Of the 55 times the model predicted an attack, the attack did not happen 5 times. These can be considered false alarms and are Type II errors.
- True Positive: The model predicted 110 times that no attack would take place, and in 100 of those cases no attack actually happened. These are correct predictions.
- False Positive: 10 times an attack actually took place when the model had predicted that no attack would happen. This is also called a Type I error.
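The counts in this example can be laid out as a small table and checked against the totals quoted in the text. The sketch below uses only the four numbers from the list above; the variable names and the row/column orientation (actual class in rows, predicted class in columns) are choices made here for illustration.

```python
# Counts from the 165-packet example
# (positive = no attack, negative = attack, as defined in this article)
TP, FP, FN, TN = 100, 10, 5, 50

predicted_no_attack = TP + FP  # packets the model called normal
predicted_attack = TN + FN     # packets the model flagged as attacks
actual_attack = TN + FP        # packets that really were attacks
actual_no_attack = TP + FN     # packets that really were normal
total = TP + FP + FN + TN

# Print a plain-text confusion matrix: rows are actual, columns are predicted
print(f"{'':<18}{'pred: no attack':>16}{'pred: attack':>14}")
print(f"{'actual: no attack':<18}{TP:>16}{FN:>14}")
print(f"{'actual: attack':<18}{FP:>16}{TN:>14}")
print(f"total packets: {total}")
```

The column sums reproduce the 110 "no attack" predictions and the 55 "attack" predictions, and the row sums show that 60 of the 165 packets were real attacks.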
Confusion matrix analysis
Type I error (False Positive)
This type of error can be very dangerous. The system predicted no attack, but an attack actually took place. In that case no notification reaches the security team, and nothing can be done to prevent the attack. The False Positive cases above fall into this category, so one of the aims of the model is to minimize this value.
Type II error (False Negative): False Alarm
This type of error is not very dangerous: the model predicted an attack, but in reality the system was safe. The team gets notified and checks for malicious activity, so no harm is done. These cases can be termed false alarms.
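One way to compare the two error types is to express each as a rate over the packets where it could have occurred. The following sketch does this for the 165-packet example. The names `missed_attack_rate` and `false_alarm_rate` are introduced here for illustration and are not terms from the article.

```python
# Counts from the 165-packet example
# (positive = no attack, negative = attack, as defined in this article)
TP, FP, FN, TN = 100, 10, 5, 50

# Type I errors (missed attacks): share of real attacks predicted as normal
missed_attack_rate = FP / (FP + TN)

# Type II errors (false alarms): share of normal packets flagged as attacks
false_alarm_rate = FN / (FN + TP)

print(f"missed-attack rate: {missed_attack_rate:.3f}")
print(f"false-alarm rate:   {false_alarm_rate:.3f}")
```

With these counts, about 16.7% of real attacks go unnoticed while about 4.8% of normal packets raise a false alarm, so the more dangerous error is also the more frequent one in this example.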
We can use the confusion matrix to calculate various metrics:
1. Accuracy: The ratio of all correct predictions to the total number of predictions.
Accuracy = (TP + TN) / (TP + TN + FP + FN)
2. Precision: (true positives / predicted positives) = TP / (TP + FP)
3. Recall: (true positives / all actual positives) = TP / (TP + FN)
4. Specificity: (true negatives / all actual negatives) = TN / (TN + FP)
5. Misclassification: (all incorrect / all) = (FP + FN) / (TP + TN + FP + FN)
It can also be calculated as 1 - Accuracy.
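As a worked example, the sketch below applies these formulas to the counts from the 165-packet example (TP = 100, TN = 50, FP = 10, FN = 5). The function name `metrics` is hypothetical. F1-score, which is mentioned earlier but not listed above, is included using its standard definition as the harmonic mean of precision and recall.

```python
def metrics(tp, tn, fp, fn):
    """Compute common confusion-matrix metrics from the four counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "misclassification": (fp + fn) / total,
        # Standard F1 definition (not given in the list above)
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts from the 165-packet example
results = metrics(tp=100, tn=50, fp=10, fn=5)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Keep in mind that "positive" here means "no attack", so the specificity of roughly 0.833 is the share of real attacks the model caught, which is often the figure a security team cares about most.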