New Predictive Analytics Techniques Leading in Fraud Prevention
Solutions Review’s Contributed Content Series is a collection of contributed articles written by thought leaders in enterprise technology. In this feature, dotData CEO Ryohei Fujimaki offers commentary on how new predictive analytics techniques are leading in fraud prevention.
For years we’ve known that fraud is so widespread that it affects national GDPs and that it increases with financial downturns. As all industries and sectors embrace digital transformation, the digital attack surface expands, presenting new fraud opportunities for cybercriminals. A study from Juniper Research projects that online payment fraud losses between 2023 and 2027 will exceed $343 billion.
New machine learning (ML) models, AI applications, and predictive analytics techniques are being leveraged to combat the rising criminal trend, taking offensive approaches to make an impact. From identifying patterns and anomalies to detecting risks in large and complex data sets, predictive analytics can shut down fraud before it happens.
Predictive Analytics Fraud Prevention
Feature Engineering for Fraud Detection
One of the critical challenges in fraud detection is to extract relevant and informative features from the data that can capture the characteristics and behaviors of fraudsters.
Feature engineering is often manual and time-consuming, requiring domain knowledge and expertise. However, some recent advances in ML have enabled automated feature engineering methods that can reduce human effort and improve the quality of the features. Let’s look at some examples.
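To make the idea of a "feature" concrete, here is a minimal sketch of the kind of hand-written feature engineering that automated methods aim to replace. It derives two features per transaction: how far the amount deviates from the customer's earlier amounts, and how many transactions the customer made in the preceding hour. The field names (customer, amount, time), the one-hour window, and the function name are assumptions made for illustration only.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev

def engineer_features(transactions, window=timedelta(hours=1)):
    """Return one feature dict per transaction, in chronological order.

    Each transaction is a dict with hypothetical keys:
    'customer', 'amount', 'time' (a datetime).
    """
    history = {}  # customer -> list of (time, amount) seen so far
    features = []
    for tx in sorted(transactions, key=lambda t: t["time"]):
        past = history.setdefault(tx["customer"], [])
        amounts = [a for _, a in past]
        # Deviation of this amount from the customer's earlier amounts.
        if len(amounts) >= 2 and pstdev(amounts) > 0:
            z = (tx["amount"] - mean(amounts)) / pstdev(amounts)
        else:
            z = 0.0  # not enough history to judge
        # Velocity: number of earlier transactions inside the time window.
        recent = sum(1 for t, _ in past if tx["time"] - t <= window)
        features.append({"amount_zscore": z, "tx_last_hour": recent})
        past.append((tx["time"], tx["amount"]))
    return features
```

Features like these would then be passed to a model; the automated approaches described next try to discover such transformations without a person writing each one by hand.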
AutoML
AutoML is a framework that automates the end-to-end process of ML model development, including data preprocessing, feature engineering, model selection, hyperparameter tuning, and model evaluation. AutoML can help find the optimal combination of features and models for a given problem without human intervention.
Deep Learning
Deep learning is also being used in fraud detection. As a branch of ML that uses artificial neural networks (ANNs), the technique can be used to learn complex and nonlinear patterns from the data. Deep learning can perform feature engineering implicitly by learning high-level representations or embeddings from the raw data, such as images, text, or audio. Deep learning can also perform feature engineering explicitly by using techniques such as autoencoders or generative adversarial networks (GANs) to create new features from the data.
Reinforcement Learning
Reinforcement learning can perform feature engineering using techniques such as policy gradients or deep Q-networks (DQNs) to learn optimal policies or strategies for feature selection or generation.
Model Building for Fraud Detection
Building accurate and robust ML models capable of handling the complexity and dynamics of historical and live fraud detection scenarios is exceptionally challenging. These models must deal with big data, imbalanced data, drift, and adversarial attacks.
As fraud patterns can change over time due to changes in customer behavior, business environment, or fraudster tactics, models must be updated and well-maintained. They also need to be resilient to criminal interference and manipulation of data, adding layers of security where possible to recognize evasion techniques.
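One common way to notice that a model needs updating is to compare the distribution of a feature in recent data against the data the model was trained on. The sketch below uses the population stability index (PSI) for that purpose. It is an illustrative choice, not a method named in this article; the bin count and the function name are assumptions.

```python
import math

def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample and a recent sample of one feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps avoids log(0) for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

A PSI near zero means the two samples look alike. A frequently quoted rule of thumb treats values above roughly 0.25 as a sign of significant drift, though the right threshold depends on the application.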
To address these issues, some of the latest techniques for building predictive models for fraud detection include ensemble learning, active learning, semi-supervised learning, and others.
Ensemble learning is a technique that combines multiple models to create a more robust model. This strategy can improve the performance and stability of fraud detection models by reducing variance, bias, or overfitting. Ensemble learning can combine different models or algorithms through methods such as bagging, boosting, stacking, or voting.
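Voting is the simplest of these to illustrate. In the sketch below, three deliberately simple rule-based "models" each label a transaction, and the ensemble returns the majority label. The rules, field names, and thresholds are hypothetical; real base models would be trained classifiers.

```python
from collections import Counter

def majority_vote(models, sample):
    """Combine the predictions of several models by simple majority."""
    votes = Counter(model(sample) for model in models)
    label, _ = votes.most_common(1)[0]
    return label

# Three deliberately simple, hypothetical base models.
def high_amount(tx):
    return "fraud" if tx["amount"] > 1000 else "legit"

def foreign_country(tx):
    return "fraud" if tx["country"] != tx["home_country"] else "legit"

def new_device(tx):
    return "fraud" if tx["device_known"] is False else "legit"
```

Using an odd number of base models with two possible labels avoids ties. No single rule decides the outcome alone, which is the source of the added stability.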
Active learning selects the most informative samples from a large pool of unlabeled data for human annotation, while semi-supervised learning leverages both labeled and unlabeled data to train a model.
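A common way to decide which samples are "most informative" in active learning is uncertainty sampling: send the cases the model is least sure about to a human reviewer. The sketch below assumes the model already outputs a fraud probability per sample; the function name and the labeling budget are illustrative.

```python
def select_for_labeling(scores, budget):
    """Return indices of the `budget` samples whose predicted fraud
    probability is closest to 0.5, i.e. where the model is least certain."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    return ranked[:budget]
```

For example, with scores of 0.02, 0.48, 0.95, 0.60, and 0.10 and a budget of two, the samples scored 0.48 and 0.60 would be chosen for annotation.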
If a company struggles with a scarcity of labeled data, semi-supervised learning can be used, as it draws on the abundant unlabeled data to work around the problem. Semi-supervised learning can use self-training, co-training, or graph-based methods to propagate labels from labeled examples to unlabeled ones.
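Self-training can be sketched in a few lines: train on the labeled data, assign pseudo-labels to the unlabeled points the model is confident about, add them to the training set, and repeat. The example below uses a one-dimensional nearest-centroid classifier purely to keep the code short; the confidence rule (margin) and all names are assumptions, and it expects at least one labeled example of each class.

```python
from statistics import mean

def self_train(labeled, unlabeled, margin=2.0, max_rounds=10):
    """Self-training with a one-dimensional nearest-centroid classifier.

    labeled:   list of (value, label) pairs, label in {0, 1}
    unlabeled: list of values
    A point is pseudo-labeled only when it is at least `margin` times
    closer to one class centroid than to the other.
    """
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        c0 = mean(v for v, y in labeled if y == 0)
        c1 = mean(v for v, y in labeled if y == 1)
        confident, rest = [], []
        for v in pool:
            d0, d1 = abs(v - c0), abs(v - c1)
            if d0 * margin <= d1:
                confident.append((v, 0))
            elif d1 * margin <= d0:
                confident.append((v, 1))
            else:
                rest.append(v)
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = rest
    return labeled, pool
```

Points that never clear the confidence margin stay unlabeled, which limits the risk of the model reinforcing its own mistakes.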
Deep learning is a technique that uses multiple layers of neural networks to learn complex and nonlinear features from the data.
Deep learning can capture fraud data’s high-dimensional and heterogeneous nature by extracting abstract and meaningful representations. This approach can use architectures such as autoencoders, convolutional neural networks, recurrent neural networks, or attention mechanisms to model different types of fraud data.
Different Sectors, Different Approaches to Fraud
It is also important to note that fraud techniques vary depending on their target sector. Each industry must adapt its countermeasures accordingly.
Banks and digital finance are among the most likely to be exposed to fraud attacks, as they involve high-value transactions, sensitive data, and multiple channels and parties. Cybercriminals will adopt a wide array of techniques to launch fraud campaigns in this sector. These include mass phishing, whaling, and spear phishing to deceive customers or employees, as well as identity theft, account takeover, card skimming, money laundering, and others. Predictive analytics can help banks detect and prevent fraud in several ways.
One of the most effective methods to detect banking fraud is scoring transactions by risk level. ML models, trained on historical data and fed real-time features (amount, location, device, behavior, etc.), can generate a risk score for each transaction, which, in turn, can be used to flag and shut down fraud attacks. Scoring has become a mainstream fraud security feature. Companies like IBM, through Watson Studio, provide a platform that supports visual programming and deep learning to develop and deploy ML models for fraud scoring.
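As a rough sketch of how a risk score turns into an action, the example below applies a logistic function to a weighted sum of transaction features and compares the result against two thresholds. The weights, bias, feature names, and thresholds are all hypothetical; in practice the weights would be learned from historical, labeled transactions, and this is not a description of any vendor's product.

```python
import math

# Hypothetical weights; in practice these would be learned from
# historical, labeled transactions.
WEIGHTS = {"amount_zscore": 0.8, "new_device": 1.5, "foreign_ip": 1.2}
BIAS = -4.0

def risk_score(features):
    """Map transaction features to a fraud probability in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def decide(features, review_at=0.5, block_at=0.9):
    """Turn a score into an action using two illustrative thresholds."""
    score = risk_score(features)
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"
    return "allow"
```

Using two thresholds rather than one leaves a middle band for manual review, so that borderline transactions are not automatically blocked.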
Another way to go is to segment customers using profile data and behavior. In this strategy, ML models using techniques like clustering or classification identify normal and abnormal patterns and flag suspicious activities. Mastercard has been using this technique since 2019, segmenting customers into groups based on their spending habits and preferences to monitor their transactions for deviations from their usual patterns.
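The segmentation idea can be sketched with a tiny clustering routine: group customers by average spend, then flag a transaction that is far above what is typical for the customer's segment. This is a simplified illustration under stated assumptions (one feature, a hand-picked tolerance multiplier, hypothetical function names) and not a description of how any particular company implements it.

```python
from statistics import mean

def kmeans_1d(values, k=2, rounds=20):
    """Tiny one-dimensional k-means (Lloyd's algorithm)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    for _ in range(rounds):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return centroids

def is_unusual(amount, customer_avg, centroids, tolerance=3.0):
    """Flag an amount far above the typical spend of the customer's segment."""
    segment = min(centroids, key=lambda c: abs(customer_avg - c))
    return amount > tolerance * segment
```

With this approach, the same purchase amount can be routine for a customer in a high-spend segment and suspicious for one in a low-spend segment.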
Natural language processing (NLP) and natural language generation (NLG) techniques can also be leveraged in ML models to generate alerts and recommendations and inform stakeholders of possible or existing fraud. The most considerable disruption in this area is ChatGPT, a conversational tool built on a large language model that can be used as a novel aid in fraud detection and investigation. NLP can identify patterns and anomalies in text, helping fraud investigators streamline their anti-fraud work.
But other sectors are not immune to fraud. In e-commerce, where online transactions are often anonymous, cross-border, and subject to chargebacks, criminals exploit online platforms and merchants using fake accounts, stolen credit cards, and refund abuse.
ML models that leverage biometric data (such as face or voice recognition), behavioral data (such as keystrokes or mouse movements), or device data (such as IP address or browser fingerprint) can be used to verify the identity and authenticity of customers.
In the insurance sector, ML developers need to adapt their models to claims that are complex, subjective, or delayed. Claims can be inflated or falsified, accidents can be staged, injuries exaggerated, and documents fabricated.