Harnessing Data Science and AI in Cybersecurity

Ravisha Chugh, an Email Security Evangelist at Fortra, explains how companies harness data science and AI technologies in their cybersecurity initiatives. This article originally appeared in Insight Jam, an enterprise IT community that enables human conversation on AI.
In today’s digital landscape, email remains a primary communication channel. However, its widespread use makes it a prime target for advanced email threats. In recent years, the industry has seen a fundamental shift: attackers have moved from trying to deceive the email environment to deceiving human beings. These modern attacks leverage impersonation techniques, where the attacker sends a message that appears to come from a known identity (an individual, organization, or consumer brand) that the recipient inherently trusts.
This shift is clearly illustrated by the kinds of threats that are currently reaching enterprise user inboxes. An in-depth analysis of threats observed in inboxes found that over 98 percent of threats getting past enterprise email security controls are impersonation threats like Business Email Compromise (BEC) and credential theft phishing lures. While they can consistently detect malware payloads, Secure Email Gateways and “baked-in” cloud add-ons do not reliably stop impersonation tactics and social engineering threats.
Leveraging data science and artificial intelligence (AI) has become crucial to combat these evolving challenges. Advanced threat detection can be achieved through machine learning algorithms trained on vast data sets of known phishing, spam, and legitimate emails.
One way to achieve this is to use natural language processing (NLP) techniques, which enable AI to understand and analyze the content of emails by detecting suspicious language, unusual requests, potential phishing attempts, or deviations from an established pattern that may be considered anomalies. NLP does this by parsing email text to identify keywords and phrases commonly associated with scams. In addition, identifying unknown threats requires a combination of machine learning models, large language models (LLMs), and neural networks.
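As a rough illustration of the keyword-and-phrase parsing described above, the following minimal sketch flags phrases often seen in scam emails. The phrase list and the function name `find_suspicious_phrases` are hypothetical and chosen only for this example; a real NLP pipeline would learn such signals from labeled data rather than rely on a fixed list.

```python
import re

# Hypothetical phrase list for illustration only; a production system would
# learn such signals from labeled data rather than hard-code them.
SUSPICIOUS_PHRASES = [
    "wire transfer",
    "gift cards",
    "verify your account",
    "urgent",
    "password expires",
]

def find_suspicious_phrases(text: str) -> list[str]:
    """Return the phrases from SUSPICIOUS_PHRASES that occur in the text."""
    # Lowercase and collapse runs of whitespace so phrases split across
    # extra spaces or line breaks still match.
    normalized = re.sub(r"\s+", " ", text.lower())
    return [
        phrase
        for phrase in SUSPICIOUS_PHRASES
        if re.search(r"\b" + re.escape(phrase) + r"\b", normalized)
    ]

body = "URGENT: please buy gift  cards today and reply with the codes."
hits = find_suspicious_phrases(body)
print(hits)
```

Simple matching like this catches only known wording, which is why the article pairs it with models that can generalize to unseen threats.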
How Does Machine Learning (ML) Work in Cybersecurity?
To explain how AI works behind the scenes, it helps to break down the parts of a message that are analyzed and how machine learning models can be applied to them. These parts include the various components embedded in email header data:
- The “From” field, or Sender Display Name
- The “To” field, or Recipient Display Name
- Subject Line
- Local Part, or Email Prefix
- Email Sending Domain or Webmail Address
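The header components listed above can be pulled out of a raw message with Python's standard `email` package. This is a minimal sketch, not any vendor's actual implementation: the sample message, its addresses, and the function name `header_components` are made up for illustration.

```python
from email import message_from_string
from email.utils import parseaddr

# Made-up sample message; the addresses and domains are placeholders.
RAW_MESSAGE = (
    "From: Jane Doe <jane.doe@example-mail.test>\n"
    "To: Finance Team <finance@example.com>\n"
    "Subject: Quick request\n"
    "\n"
    "Are you at your desk?\n"
)

def header_components(raw: str) -> dict:
    """Split a raw message into the header fields a model might analyze."""
    msg = message_from_string(raw)
    sender_name, sender_addr = parseaddr(msg.get("From", ""))
    recipient_name, _ = parseaddr(msg.get("To", ""))
    # Everything before the last "@" is the local part; the rest is the domain.
    local_part, _, domain = sender_addr.rpartition("@")
    return {
        "sender_display_name": sender_name,
        "recipient_display_name": recipient_name,
        "subject": msg.get("Subject", ""),
        "local_part": local_part,
        "sending_domain": domain.lower(),
    }

components = header_components(RAW_MESSAGE)
for field, value in components.items():
    print(f"{field}: {value}")
```

Keeping each component as a separate field lets downstream models compare them, for example checking whether a display name is consistent with the sending domain.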
In addition, ML models scan any contextual data like text and metadata. These include:
- The sender’s infrastructure
- The number of days the IP address has been used to send on behalf of the domain
- The number of emails sent using the same Local Part of the email
- The number of emails using the same Display Name
- The intent of the email (inferred from the nature of the content, such as the Subject Line)
- Matching Address Group and Display Name
- Character set used in the Local Part of the email (Latin vs. Cyrillic)
- SPF (Sender Policy Framework) / DKIM (DomainKeys Identified Mail) / DMARC (Domain-based Message Authentication, Reporting & Conformance) records
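Two of the contextual signals above lend themselves to a short sketch: detecting mixed Latin and Cyrillic characters in a Local Part, and reading SPF, DKIM, and DMARC verdicts. The example assumes the receiving mail server has already recorded those verdicts in an `Authentication-Results` header; the sample header value and the function names are hypothetical, and the regular expression is a simplification rather than a full parser for that header.

```python
import re
import unicodedata

def scripts_in(text: str) -> set[str]:
    """Return the scripts (e.g. LATIN, CYRILLIC) of the letters in text."""
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue  # digits and punctuation carry no script information here
        name = unicodedata.name(ch, "")
        # Unicode names for letters start with the script, e.g.
        # "CYRILLIC SMALL LETTER A".
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts

def is_mixed_script(local_part: str) -> bool:
    """True when letters from more than one script appear together."""
    return len(scripts_in(local_part)) > 1

def auth_verdicts(authentication_results: str) -> dict:
    """Extract spf/dkim/dmarc results from an Authentication-Results value."""
    pairs = re.findall(r"\b(spf|dkim|dmarc)=(\w+)", authentication_results.lower())
    return dict(pairs)

# "p\u0430ypal" uses a Cyrillic "a" (U+0430) that looks like the Latin one.
lookalike = "p\u0430ypal"
print(is_mixed_script(lookalike), is_mixed_script("paypal"))

# Hypothetical header value for illustration.
sample_header = (
    "mx.example.com; spf=pass smtp.mailfrom=example.com; "
    "dkim=pass header.d=example.com; dmarc=fail header.from=example.com"
)
verdicts = auth_verdicts(sample_header)
print(verdicts)
```

Signals like these are typically fed to a model as features alongside the others in the list, rather than used as standalone verdicts.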
In this vein, a pre-train and fine-tune paradigm can come in handy: pre-trained models are adapted to various downstream tasks, such as:
- Analyzing specific text-based email components, such as Subject Lines and body content
- Determining message type or level of suspiciousness
- Identifying groupings or clusters in email data
- Creating data lakes with labeled data that other models can use
- Ingesting feeds of useful data, like lists of suspicious domains and IP address groups
In practice, AI can initiate takedowns of malicious domains and enforce policies like DMARC to prevent future email abuse from those domains. AI can also automate incident response, giving systems the ability to quarantine suspicious emails automatically before they reach the intended recipient. This real-time intervention significantly reduces the risk of successful phishing attacks and enables quicker mitigation of threats, minimizing the window of opportunity for attackers to exploit vulnerabilities.
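The automated quarantine step can be pictured as a simple policy applied to a model's output. This is a minimal sketch under stated assumptions: the model is assumed to emit a suspiciousness score between 0 and 1, and the thresholds, the `Verdict` class, and the action names are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    message_id: str
    score: float  # assumed model output in [0, 1]; higher means more suspicious

# Hypothetical thresholds; real deployments would tune these against
# false-positive and false-negative rates.
QUARANTINE_THRESHOLD = 0.9
WARN_THRESHOLD = 0.5

def choose_action(verdict: Verdict) -> str:
    """Map a suspiciousness score to a delivery action."""
    if verdict.score >= QUARANTINE_THRESHOLD:
        return "quarantine"
    if verdict.score >= WARN_THRESHOLD:
        return "deliver_with_warning"
    return "deliver"

verdicts = [
    Verdict("msg-001", 0.97),
    Verdict("msg-002", 0.62),
    Verdict("msg-003", 0.08),
]
actions = {v.message_id: choose_action(v) for v in verdicts}
print(actions)
```

A middle tier such as a warning banner is one way to limit the cost of false positives, since only the highest-confidence detections are withheld from the recipient.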
Though every machine learning paradigm has strengths and weaknesses, the field continues to push forward and create opportunities to perform tasks more accurately and efficiently. For these reasons, it is critical to understand the task objective and select the machine learning paradigm best suited to it.
This is especially true for machine learning applications that affect the efficiency of business operations, where a poor fit could bottleneck the flow of inbound and outbound communications. AI automation in detecting and responding to threats reduces the need for extensive manual intervention. This leaves security teams free to focus on more strategic tasks, improving overall operational efficiency.
Overall, using data science in email security can enhance an organization’s security posture by reducing phishing incidents, which in turn leads to fewer data breaches and financial losses for enterprise organizations.