Cyber Crime Cases and Confusion Matrix

Prince Chaturvedi
7 min read · Jun 6, 2021

What is a Confusion Matrix?

A confusion matrix is a summary comparing the predicted results with the actual results in a classification problem. This comparison is essential for determining the performance of a model after it has been trained on training data.

For a binary classification use case, a confusion matrix is a 2×2 matrix.

In this matrix we have:

Actual Class 1 value = 1, which corresponds to the positive value in a binary outcome.

Actual Class 2 value = 0, which corresponds to the negative value in a binary outcome.

The rows of the confusion matrix indicate the actual values and the columns indicate the predicted values.

There are various components that exist when we create a confusion matrix. The components are mentioned below:

  • Positive (P): the predicted result is positive (example: the image is a cat).
  • Negative (N): the predicted result is negative (example: the image is not a cat).
  • True Positive (TP): both the predicted and the actual value are 1 (positive), so the prediction is correct.
  • True Negative (TN): both the predicted and the actual value are 0 (negative), so the prediction is correct.
  • False Negative (FN): the predicted value is 0 (negative) but the actual value is 1. The values do not match, hence it is a false negative.
  • False Positive (FP): the predicted value is 1 (positive) but the actual value is 0. Again the values do not match, hence it is a false positive.
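The four components above can be counted directly from a list of predictions. Here is a minimal sketch using made-up illustration labels (not output from a real model):

```python
# Count TP, TN, FP, FN for a binary classifier by comparing
# each predicted label against the corresponding actual label.
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=3 FP=1 FN=1
```

The same counts are what libraries such as scikit-learn return from their confusion-matrix helpers; computing them by hand once makes the layout of the 2×2 matrix concrete.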

Accuracy and Components of Confusion Matrix

After the confusion matrix is created and all the component values are determined, it becomes quite easy to calculate the accuracy. So, let us have a look at the components to understand this better.

Classification Accuracy

Accuracy = (TP + TN) / (TP + TN + FP + FN)

In this formula, the sum of TP (true positives) and TN (true negatives) counts the correctly predicted results. To express accuracy as a percentage, we divide that sum by the total of all four components. However, accuracy has some problems and we cannot completely depend on it.

Let us consider that our dataset is completely imbalanced. In this scenario, 98% accuracy can be good or bad depending on the problem statement. Hence there are some more key metrics which help us judge whether the accuracy we calculate is meaningful. The terms are given below:
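A quick sketch of why accuracy misleads on imbalanced data: with 98 negatives and 2 positives, a model that always predicts "negative" still scores 98%, despite never catching a single real positive. The counts here are made up for illustration.

```python
# 2 positives, 98 negatives -- a severely imbalanced dataset.
actual    = [1] * 2 + [0] * 98
predicted = [0] * 100            # the model never flags a positive

correct  = sum(1 for a, p in zip(actual, predicted) if a == p)
accuracy = correct / len(actual)
print(accuracy)  # 0.98 -- yet both real positives were missed (2 false negatives)
```

This is exactly the case the rate metrics below expose: the true positive rate of this model is 0, even though its accuracy looks excellent.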

TPR (True Positive Rate) or Sensitivity:

True positive rate, also known as sensitivity, measures the proportion of true positives with respect to the total actual positives: TPR = TP / (TP + FN).

TNR (True Negative Rate) or Specificity:

True negative rate, or specificity, measures the proportion of true negatives with respect to the total actual negatives: TNR = TN / (TN + FP).

False Positive Rate(FPR):

False positive rate is the proportion of false positives (FP) to the total actual negatives: FPR = FP / (FP + TN).

False Negative Rate (FNR):

False negative rate is the proportion of false negatives (FN) to the total actual positives: FNR = FN / (FN + TP).
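The four rates above follow directly from the confusion-matrix components. A short sketch using hypothetical counts:

```python
# Hypothetical confusion-matrix counts, chosen only for illustration.
tp, tn, fp, fn = 90, 50, 10, 5

tpr = tp / (tp + fn)   # sensitivity: share of actual positives caught
tnr = tn / (tn + fp)   # specificity: share of actual negatives caught
fpr = fp / (fp + tn)   # share of actual negatives wrongly flagged
fnr = fn / (fn + tp)   # share of actual positives missed

print(round(tpr, 3), round(tnr, 3), round(fpr, 3), round(fnr, 3))
# 0.947 0.833 0.167 0.053
```

Note that the pairs are complements: FPR = 1 − TNR and FNR = 1 − TPR, which is a handy sanity check when implementing these metrics.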

An Overview of False Positives and False Negatives

Understanding the differences between false positives and false negatives, and how they relate to cybersecurity, is important for anyone working in information security. Why? Investigating false positives wastes time and resources and distracts your team from focusing on real cyber incidents (alerts) originating from your SIEM.

On the flip side, missing false negatives (uncaught threats) increases your cyber risk, reduces your ability to respond to those attackers, and in the event of a data breach could lead to the end of your business.

What Are False Positives?

False positives are mislabeled security alerts, indicating there is a threat when in actuality there isn't. These false, non-malicious alerts (SIEM events) increase noise for already overworked security teams and can be triggered by software bugs, poorly written software, or unrecognized network traffic.

By default, most security teams are conditioned to ignore false positives. Unfortunately, this practice of ignoring security alerts, no matter how trivial they may seem, can create alert fatigue and cause your team to miss actual, important alerts related to real, malicious cyber threats (as was the case with the Target data breach).

These false alarms account for roughly 40% of the alerts cybersecurity teams receive on a daily basis; at large organizations they can be overwhelming and a huge waste of time.

What Are False Negatives?

False negatives are uncaught cyber threats, overlooked by security tooling because they are dormant, highly sophisticated (i.e. fileless or capable of lateral movement), or because the security infrastructure in place lacks the technological ability to detect them.

These advanced/hidden cyber threats are capable of evading prevention technologies, like next-gen firewalls, antivirus software, and endpoint detection and response (EDR) platforms trained to look for “known” attacks and malware.

No cybersecurity or data breach prevention technology can block 100% of the threats it encounters. False negatives are the roughly 1% of malware and cyber threats that most methods of prevention are prone to miss.

Strengthening Your Cybersecurity Posture

The existence of both false positives and false negatives begs the question: does your cybersecurity strategy include proactive measures? Most security programs rely on preventative and reactive components, establishing strong defenses against the attacks those tools know exist. Proactive security measures, on the other hand, include implementing incident response policies and procedures and actively hunting for hidden or unknown attacks.

Here are a few simple rules to help govern your approach to cybersecurity with a preventative, reactive, and proactive mindset:

  • Assume you’re breached and begin your offensive (proactive) initiatives with the goal of finding those breaches. By doing so, you’ll seek to validate the strength of your defensive/prevention tools with the understanding that none of them are 100% effective.
  • Use asset discovery tools to discover the hosts, systems, servers, and applications within your network environment, because you can’t protect what you don’t know exists.
  • Execute regular compromise assessments (we recommend at least once a week) and inspect every asset residing on your network.
  • Define security policies and procedures, and implement educational/training requirements so your entire team knows what to do in the event you discover a hidden breach, or worse, fall victim to a data breach.
  • Time is your most valuable asset, so implementing tools and technology that improve your speed of detection and time to respond is key and can help your security team prevent a data breach.

If your team lacks the resources to proactively detect and respond to advanced persistent threats, consider outsourcing your security services to a Managed Detection and Response (MDR) provider. MDR companies independently advise and alert you of immediate threats and provide assistance in responding to and eliminating those threats.

Steps to Improve Forensic Analytics

Forensic analytics — the combination of advanced analytics, forensic accounting and investigative techniques — is making breakthroughs every day in identifying rare events of fraud, corruption and other schemes. To meet rising regulatory and customer demand for fraud mitigation, forensic analytics can reveal signals of emerging risks months — or sometimes even years — before they happen. Of course, predicting anomalous events can also create false positives.

In an effort to reduce false positives in fraud investigations, careful attention should be paid to steps including:

  1. Create an analytics repository — Consolidate and integrate data from disparate sources so analytical models can take an enterprise-wide approach to anomalous activity detection.
  2. Employ network mapping and analysis — Explore fraudsters’ networks, affinities and relationships, as well as others committing similar illicit acts.
  3. Leverage both supervised and unsupervised modeling — Supervised modeling employs algorithms to sift through data, applying historical fraud patterns and digital fingerprints of fraudsters to new data and scoring the level of risk involved in new events based on historical data. Unsupervised modeling uses algorithms to sift through data independent of patterns relating to known historical cases, looking for new events following unprecedented patterns.
  4. Use natural language processing (NLP) — Sift through unstructured data, including emails, messaging, audio and video files to unearth unexpected nuance to communication or connections otherwise unclear in structured, text-only data. For example, the ability of NLP to analyze word choice, tone and possible stress levels expressed in a voicemail can sometimes offer more insight during investigations than text on page alone could offer.
  5. Training and self-learning — Train analytics to learn from a variety of data sources, such as risk issues the organization has confronted in the past. The corresponding models can adapt over time to future risks.
  6. Back testing — Scientifically test forensic analytics performance to evaluate its continued use. Backtesting can help establish confidence that pattern recognition models and algorithms work well and are effective in finding suspicious patterns of interest.
  7. Iterative approach — Iteratively develop, adapt and scale forensic analytics models so they respond to new and evolving fraud patterns. At the same time, develop a broader view of the risks an enterprise may face. This approach enables an organization to build the forensic analytics platform in stages — one step at a time with input and validation from the business stakeholders — while still staying a step ahead of bad actors.
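Step 3 above, the contrast between supervised and unsupervised modeling, can be sketched in a few lines. This is purely an illustrative toy, not a production fraud model: the transaction amounts, labels, and thresholds are all made up, and real systems would use far richer features and proper learning algorithms.

```python
# Toy contrast between supervised and unsupervised fraud scoring.
# Supervised: learn a decision threshold from labeled historical fraud.
# Unsupervised: flag statistical outliers without using any labels.
from statistics import mean, stdev

history_amounts = [20, 35, 50, 900, 40, 25, 1100, 30]   # past transactions
history_fraud   = [0,  0,  0,  1,   0,  0,  1,    0]    # known outcomes

# Supervised: threshold halfway between mean legit and mean fraud amount.
legit = [x for x, y in zip(history_amounts, history_fraud) if y == 0]
fraud = [x for x, y in zip(history_amounts, history_fraud) if y == 1]
threshold = (mean(legit) + mean(fraud)) / 2

def supervised_flag(amount):
    return amount > threshold            # mimics known historical patterns

# Unsupervised: flag anything more than 2 standard deviations from the mean.
mu, sigma = mean(history_amounts), stdev(history_amounts)

def unsupervised_flag(amount):
    return abs(amount - mu) > 2 * sigma  # catches unprecedented patterns

new_amounts = [45, 950, 3000]
print([supervised_flag(a) for a in new_amounts])    # [False, True, True]
print([unsupervised_flag(a) for a in new_amounts])  # [False, False, True]
```

Note how the two approaches disagree on the 950 transaction: the supervised rule flags it because it resembles past fraud, while the unsupervised rule does not because it is within the historical spread. This is why the article recommends leveraging both.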
