
Can We Detect Digital Fraud in a Cashless Post COVID-19 Economy Using AI?


COVID-19 has changed the way we pay: the increasing use of digital payments has pushed the potential for digital fraud to an all-time high in our soon-to-be cashless economy.




Photo by Muhammad Raufan Yusup & Pixabay on Unsplash & Pexels

COVID-19’s Acceleration of Digitization

After the World Health Organization announced that cash could harbor the coronavirus, several countries took immediate measures, quarantining or destroying large portions of their money supply, with some going so far as to ban the use of cash altogether, forcing customers to fully embrace digital payments.

Is This The End For Cash?


Photo by Blake Wisz on Unsplash

Analysts now expect global non-cash transactions to surpass the one-trillion-transaction milestone by 2024. Moreover, this rapid increase in both the number and dollar value of electronic transactions has analysts predicting that governments may legislate cash out of existence entirely in the near future to limit the spread of future pandemics.

Additionally, this rapid ascent of digital payments is transforming not only how consumers, businesses, and governments move money, but also how criminals steal it: digital fraud.

Digital Fraud’s Growth

Online fraud grew by 13% to $16.9B in 2019. Even as instances of fraud fell from 14.4M to 13M, hackers shifted their focus to higher-value fraud rather than multiple lower-value occurrences, stealing an extra $3.5B in a year across 1.4M fewer fraudulent transactions.


Photo by Clint Patterson on Unsplash

This steady uptrend in fraud is now accelerating as quarantined people turn to online platforms more than ever before, and attempted online payment fraud is expected to increase by at least 73% in 2020.

Building An Early Warning System — Digital Fraud

To better prepare ourselves for the threats the digital era is bringing, we decided to create an autoencoder fraud detection model that will not only detect fraud, but also simulate rare fraudulent cases, creating more “anomaly” transactions to examine.


Photo by Ales Nesetril on Unsplash

Problem: Imbalanced Dataset

The dataset we are using contains credit card transactions that occurred over two days, with 492 frauds out of 284,807 transactions, which means that only 0.17% of our dataset consists of fraud instances.

In essence, our dataset is highly imbalanced, which means our model would learn how to better identify normal transactions as opposed to fraudulent ones, making it entirely useless when applied against new cases of fraud.

Tradeoff: Recall vs. Precision

Our objective is to maximize recall and trade away a bit of precision, as it is less financially damaging to flag non-fraudulent transactions as “fraud” than to miss any fraudulent ones.

Solution: Autoencoders

Autoencoders are unsupervised artificial neural networks that learn how to efficiently compress and encode data, and then reconstruct the data from that encoding.

In essence, an autoencoder rebuilds the data from the reduced encoded representation into a representation that is as close a replication of the original input as possible.

It does this largely by learning to ignore the noise in the data while reducing its dimensionality.


Autoencoder: Example of Input / Output Image From MNIST Dataset

Planning Our Model

We will train an autoencoder neural network in an unsupervised manner; our simulated rare events will vary slightly from the original ones, and the model will be able to predict whether a case is fraudulent from the input alone.

Evaluating Our Model

The main metric used in our project to determine whether a transaction is fraudulent (1) or normal (0) is the reconstruction error, which the model minimizes during training.

This allows our autoencoder to learn the important features of fraud present in the data, because when a representation allows a good reconstruction of its input, it has retained much of the information present in that input.
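Concretely, the reconstruction error of a single transaction is just the mean squared difference between the input vector and the autoencoder’s output. A minimal sketch in NumPy (the function name is illustrative):

import numpy as np

def reconstruction_error(x, x_hat):
    # Mean squared difference between the original transaction vector
    # and the autoencoder's reconstruction of it
    return np.mean(np.square(x - x_hat))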

Exploratory Data Analysis

A quick summary of the dataset shows 31 columns, of which two are Time and Amount.

Class (Target Variable)

1: Fraudulent transaction

0: Normal/Non-fraudulent transaction

The remaining 28 feature variables (V1 through V28) come from a PCA transformation and have been transformed for security purposes.

Variables — Digital Fraud Model
There are no missing values, so we can proceed to plot the data.
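As a sketch, this quick check can be reproduced with pandas, assuming the standard Kaggle “creditcard.csv” file:

import pandas as pd

# Load the credit card transactions dataset
df = pd.read_csv("creditcard.csv")

print(df.shape)                    # (284807, 31)
print(df.isnull().sum().max())     # 0 -> no missing values
print(df["Class"].value_counts())  # 284315 normal (0) vs. 492 fraudulent (1)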

Visualizing The Imbalanced Dataset


Highly Imbalanced Dataset — Normal to Fraud

Do Fraudulent Transactions Occur At Specific Timeframes?


Fraudulent Transactions — Timeframe Analysis
No clear insight can be extracted from the Time variable, as transaction times appear to be spread similarly across both transaction types.

Data Preprocessing

Data Scaling

The Time variable is dropped as irrelevant, and the Amount values are standardized in preparation for our autoencoder model.
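A minimal sketch of this preprocessing step with scikit-learn, reusing the df loaded earlier:

from sklearn.preprocessing import StandardScaler

# Drop Time; standardize Amount so it sits on the same scale as the PCA features
data = df.drop(["Time"], axis=1)
data["Amount"] = StandardScaler().fit_transform(
    data["Amount"].values.reshape(-1, 1))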

Train-Test Split [80:20]

Unlike most models, our primary focus isn’t building a classification model; it is detecting anomalies, hence our train and test split will be slightly different.

To account for the imbalanced dataset, we will train our model only on normal transactions. However, we will refrain from modifying the test set: it will maintain the original class split to retain an accurate and unbiased evaluation of our model’s performance.
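A sketch of one common way to implement this split (random_state and the variable names are illustrative choices):

from sklearn.model_selection import train_test_split

# 80:20 split; random_state is an arbitrary choice for reproducibility
X_train, X_test = train_test_split(data, test_size=0.2, random_state=42)

# Train only on normal transactions: the autoencoder learns what "normal" looks like
X_train = X_train[X_train["Class"] == 0]
X_train = X_train.drop(["Class"], axis=1)

# The test set keeps its original class mix for an unbiased evaluation
y_test = X_test["Class"]
X_test = X_test.drop(["Class"], axis=1)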

Building Our Model

Model Setup

Next, the autoencoder model is set up: the 29-dimensional input is fed through 4 fully connected layers of sizes 14, 7, 7, and 29 respectively, with the final layer returning to the input dimension.

As mentioned earlier, the first 2 layers represent the encoding part and the remaining 2 layers represent the decoding part.

To build a less complex model and address over-fitting and feature selection, we incorporate Lasso (L1) regularization.

Hyperparameters for each layer are specified, with the kernel initializer set to glorot_uniform and alternating sigmoid and ReLU activation functions.

We picked these hyperparameters because they tend to perform well and are considered the industry standard.

from keras.models import Model, load_model
from keras.layers import Input, Dense
from keras import regularizers

X_train = X_train.values
X_test = X_test.values

input_dim = X_train.shape[1]  # 29 features after preprocessing
encoding_dim = 14

input_layer = Input(shape=(input_dim,))

# Encoder: compress the input, with L1 activity regularization on the first layer
encoder1 = Dense(encoding_dim, activation="sigmoid",
                 kernel_initializer="glorot_uniform",
                 activity_regularizer=regularizers.l1(0.0003))(input_layer)
encoder2 = Dense(7, activation="relu",
                 kernel_initializer="glorot_uniform")(encoder1)

# Decoder: reconstruct the original input from the compressed representation
decoder1 = Dense(7, activation="sigmoid",
                 kernel_initializer="glorot_uniform")(encoder2)
decoder2 = Dense(input_dim, activation="relu",
                 kernel_initializer="glorot_uniform")(decoder1)

autoencoder = Model(inputs=input_layer, outputs=decoder2)

Model Training

The model is trained for 20 epochs with a batch size of 32 samples to allow the model to learn the best weights. The best model weights are defined as the weights that minimize the loss function (reconstruction error).

The best model is saved using the ModelCheckpoint callback, and training is logged with the TensorBoard callback.

from keras.callbacks import ModelCheckpoint, TensorBoard

autoencoder.compile(optimizer='adam',
                    loss='mean_squared_error',
                    metrics=['accuracy'])

# Save only the best weights seen so far (lowest validation loss)
checkpoint = ModelCheckpoint(filepath=r"C:\Users\Ramy\Desktop\AI\autoencode.h",
                             verbose=0,
                             save_best_only=True)

# Log training runs for later inspection in TensorBoard
tensorboard = TensorBoard(log_dir=r"C:\Users\Ramy\Desktop\AI\logs",
                          histogram_freq=0,
                          write_graph=True,
                          write_images=True)

# early_stop = EarlyStopping(monitor='loss', patience=2, verbose=0, mode='min')

history = autoencoder.fit(X_train, X_train,
                          epochs=20,
                          batch_size=32,
                          shuffle=True,
                          validation_data=(X_test, X_test),
                          verbose=1,
                          callbacks=[checkpoint, tensorboard]).history

Results

To evaluate our model’s learning, we plot the training and test losses to verify that the error decreases as the number of epochs increases.

v2yqYr6.png!web

Model Loss — Epochs vs Loss

Our MSE here serves as the reconstruction error, and it converges well on both the training and test sets.
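A sketch of how these loss curves can be plotted with matplotlib, using the history dict returned by fit above:

import matplotlib.pyplot as plt

plt.plot(history["loss"], label="Train")
plt.plot(history["val_loss"], label="Test")
plt.xlabel("Epoch")
plt.ylabel("Reconstruction error (MSE)")
plt.legend()
plt.show()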

Summary Statistics of Reconstruction Error & True Class


Reconstruction Error vs. True Class
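A sketch of how these per-transaction errors and their summary statistics can be computed (error_df is an illustrative name; y_test holds the original test labels from the split above):

import numpy as np
import pandas as pd

# Per-transaction reconstruction error on the test set
predictions = autoencoder.predict(X_test)
mse = np.mean(np.power(X_test - predictions, 2), axis=1)

error_df = pd.DataFrame({"reconstruction_error": mse,
                         "true_class": y_test.values})
print(error_df.groupby("true_class")["reconstruction_error"].describe())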

We then plot the reconstruction errors for both class types (Normal and Fraudulent).


Reconstruction — Without Fraud


Reconstruction — With Fraud

Validation

Recall vs. Precision

High Precision: Low False Positive Rate

High Recall: Low False Negative Rate


Autoencoder Model: Recall vs. Precision

The plot shows that the values are quite extreme in this case: the model can do well on precision or recall alone, but not both at the same time.

Autoencoder Model’s Optimal Points:

Recall: 20%

Precision: 60%

A granular plot of Precision & Recall curves by threshold


Autoencoder Model: Precision & Recall by Threshold

The graph shows that as the reconstruction error threshold increases, the model’s precision increases as well, while recall falls.
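A sketch of how such a threshold plot can be generated with scikit-learn’s precision_recall_curve, reusing error_df from above:

from sklearn.metrics import precision_recall_curve
import matplotlib.pyplot as plt

precision, recall, thresholds = precision_recall_curve(
    error_df["true_class"], error_df["reconstruction_error"])

# precision and recall have one more entry than thresholds, so drop the last point
plt.plot(thresholds, precision[:-1], label="Precision")
plt.plot(thresholds, recall[:-1], label="Recall")
plt.xlabel("Reconstruction error threshold")
plt.legend()
plt.show()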

Model Testing & Evaluation

To finally differentiate between fraudulent and normal transactions, we introduce a threshold value: using each transaction’s reconstruction error, if the error is larger than the defined threshold, that transaction is marked as fraudulent.

Optimal Threshold Value = 3.2

We could also have estimated the threshold value from the test data; however, that would risk overfitting, which could prove detrimental in the long run.
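A minimal sketch of this decision rule, reusing error_df from earlier:

threshold = 3.2

# Flag a transaction as fraudulent (1) when its reconstruction error
# exceeds the chosen threshold
y_pred = (error_df["reconstruction_error"] > threshold).astype(int)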

Visualization of Division Between Normal & Fraudulent Transactions w/ Respect to Threshold Values


Reconstruction Error — Normal vs. Fraudulent

Confusion Matrix

This offers a more comprehensive overview of our model’s precision and recall values. Overall, our autoencoder model is robust, with high True Positive and True Negative rates (fraud vs. normal transaction detection rates).
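As a sketch, the matrix itself can be produced with scikit-learn, assuming the y_pred from the threshold step:

from sklearn.metrics import confusion_matrix

# Rows are the true classes, columns the predicted classes
conf_matrix = confusion_matrix(error_df["true_class"], y_pred)
print(conf_matrix)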


Fraudulent Transactions

Looking at the confusion matrix, we can see that there are 16 + 85 = 101 fraudulent transactions.

85 of them were correctly classified as fraudulent and 16 of them were incorrectly classified as normal transactions.

Normal Transactions

On the other hand, 1,159 normal transactions are incorrectly classified as fraudulent, equivalent to approximately 2% of the total normal cases.

Original Objective

In general, it’s much more costly to mistake a fraudulent transaction as a normal transaction as opposed to the reverse.

Solution

To make sure that this objective is satisfied, we try to boost the predictive power of detecting a fraudulent transaction by trading off our ability to accurately predict normal transactions.

Conclusion

Overall, the model is relatively robust, as we caught most of the fraudulent cases. However, it could be improved further if our dataset were more balanced.


