Credit Card Fraud Detection: Using Deep Learning Techniques

Sachin Lendis
Nov 19, 2021


With the rise of e-commerce and digital payment services in our “new normal” remote world, paying for goods and services over the internet has become the norm. However, this exposes us to hackers and consumer fraud at every scale.

For this case study we are going to take the role of an e-commerce store payment provider that has just clocked its first 1,000 transactions and needs to beef up security to combat online credit card fraud.

One of the biggest problems with credit card fraud is that anyone can steal a user's credit card details once they are exposed. Currently our e-commerce store has been using only rule-based fraud detection, specifically:

  • Is this the user's first purchase?
  • Is it over ZAR 8 000?

Other flags could be that the IP address originates in Nigeria while the credit card was issued in the UK and the customer's address is in South Africa.

Our task is to identify which transactions are fraudulent and which are non-fraudulent. This can prove very difficult: legitimate transactions can trigger fraud flags, and because genuine fraud is rare our data set is highly imbalanced. We therefore have to use various sampling techniques and metrics in order to achieve over 99% overall accuracy.

According to studies, roughly 0.1% of online transactions are fraudulent. That might seem minute, but with the growth of digital payments it equates to billions in losses, increasing year on year.

Source: Nilson Report

As we can see above, according to the Nilson Report, card fraud worldwide is set to increase through 2027.

Source: Statista

According to a Statista survey, as of August 2017 credit cards were South Africans' preferred online payment method, increasing the likelihood of online fraud and further proving our need for an automated fraud detector.

We will be using free sample data to train our model, which can be found here.

Our model will use deep learning techniques to classify our transactions as fraudulent or non-fraudulent, and we will determine which classifier obtains the highest overall accuracy using confusion matrices.

The visual below depicts a general perceptron and how it learns optimal weights (w1, w2).

X — input, W — weights, B — biases

Using example inputs such as IP address (x1) and customer address (x2), we multiply these input values by our weights and add our bias in order to get a binary output that classifies the transaction as fraudulent or non-fraudulent.
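To make this concrete, here is a minimal sketch of a single perceptron in Python. The feature values, weights, and bias are purely illustrative, not learned from our data.

```python
import numpy as np

# Illustrative inputs for one transaction: x1 could encode an IP risk
# score and x2 an address-mismatch score (values are made up).
x = np.array([0.8, 0.3])   # inputs x1, x2
w = np.array([1.5, 2.0])   # weights w1, w2
b = -1.0                   # bias

z = np.dot(w, x) + b            # weighted sum: w1*x1 + w2*x2 + b
output = 1 / (1 + np.exp(-z))   # sigmoid squashes z into (0, 1)

# Threshold at 0.5 for a binary fraud / non-fraud decision.
print("fraudulent" if output > 0.5 else "non-fraudulent")
```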

The idea of training a neural network is to find the optimal values of w1 and w2, as depicted in our visual example, not only for a single transaction but for our whole data set and future transactions as well.

Using one perceptron might not be enough to get an accurate output for our classification. Fortunately, we can combine perceptrons into dense (fully connected) layers, building more complex neural networks as seen below.

We will be using a sigmoid function and a ReLU function to train our classifier.
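For reference, both activations are one-liners; a quick sketch in NumPy:

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into (0, 1); used on the output layer
    # to produce a fraud probability.
    return 1 / (1 + np.exp(-z))

def relu(z):
    # Keeps positive values and zeroes out negatives; used in the
    # hidden layers.
    return np.maximum(0, z)

print(sigmoid(0.0))            # 0.5
print(relu(-2.0), relu(3.0))   # 0.0 3.0
```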

We will begin by importing our data. Below is the head of our data set; as the data set is quite large, for time's sake I have shown just the head.
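A minimal sketch of the import step, assuming the data ships as a CSV. The file name creditcard.csv and the binary Class label column are assumptions, based on the commonly used public credit card fraud data set.

```python
import pandas as pd

# Load the sample transactions (file name is an assumption).
df = pd.read_csv("creditcard.csv")

print(df.head())                    # the first five rows, as shown above
print(df["Class"].value_counts())   # inspect the class imbalance
```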

Now that our data is imported, we will split it into our training and testing sets using scikit-learn, after which the data sets are shaped into arrays using NumPy.
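Something like the following sketch; the 80/20 split ratio and the Class label column are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("creditcard.csv")  # hypothetical file name, as above

# Features are everything except the label; shape both into NumPy arrays.
X = df.drop("Class", axis=1).to_numpy()
y = df["Class"].to_numpy()

# Hold out 20% of transactions for testing (ratio assumed).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```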

For time's sake, this is a slice [:5] of our X_train data set as a visual example.
For time's sake, this is a slice [:5] of our X_test data set as a visual example.

Once our data is processed, it's time to use Keras to build our deep neural network: a Sequential model with dense layers, using ReLU activation for the first two layers, followed by a dropout layer, two more dense ReLU-activated layers, and lastly a sigmoid activation layer to get our desired binary output.
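A sketch of that architecture in Keras; the layer widths and dropout rate are assumptions, since they are not shown above.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sequential model matching the description above: two ReLU dense
# layers, a dropout layer, two more ReLU dense layers, and a sigmoid
# output. Layer sizes and the dropout rate are assumptions.
model = keras.Sequential([
    layers.Dense(32, activation="relu", input_shape=(X_train.shape[1],)),
    layers.Dense(32, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(16, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary fraud probability
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```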

A basic hypothetical example of how we will train our model: dense layers stacked on top of one another use a ReLU activation function to learn our weights, a dropout layer minimizes over-fitting, and the result, together with our bias, is finally fed into a sigmoid function to get our desired output.
Model training
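The training call that produces output like the above might look as follows; the batch size is an assumption.

```python
# Train for 5 epochs on the training arrays from the split above.
history = model.fit(X_train, y_train, epochs=5, batch_size=32)
```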

As seen above, we have trained our model on our training data set for 5 epochs, obtaining an overall accuracy of 0.9993.

We will now evaluate our model using our testing data sets.
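A sketch of the evaluation step:

```python
# Evaluate on the held-out test arrays from the split above.
loss, accuracy = model.evaluate(X_test, y_test)
print(f"test accuracy: {accuracy:.4f}")
```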

Model evaluation

Our evaluated model has obtained an accuracy of 0.9994.

Below we see our two confusion matrices; as depicted, a high number of fraudulent transactions have been marked as non-fraudulent.

Training data set
Full data set
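For reference, a confusion matrix like these can be produced by thresholding the sigmoid outputs at 0.5, roughly as follows:

```python
from sklearn.metrics import confusion_matrix

# Rows are actual classes, columns are predictions; the bottom-left
# cell counts fraudulent transactions marked as non-fraudulent.
y_pred = (model.predict(X_test) > 0.5).astype(int).ravel()
print(confusion_matrix(y_test, y_pred))
```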

We will now explore the random forest algorithm. Below is a visual depiction of how a random forest algorithm functions.

Using the same processed data and feeding it to our random forest classifier, our confusion matrix shows substantially improved results on the larger data set.
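A minimal sketch of that step; the number of trees is an assumption.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

# Fit a random forest on the same training arrays and inspect its
# confusion matrix on the test set.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
print(confusion_matrix(y_test, rf.predict(X_test)))
```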

Training data set
Full data set

We will now feed our data to a decision tree classifier. Decision trees are if/else trees: the model asks a series of questions about a transaction and decides 1 if fraudulent, else 0.
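A sketch of the decision tree step on the same split:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

# A single decision tree: a learned chain of if/else questions
# ending in a 0 (non-fraudulent) or 1 (fraudulent) leaf.
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
print(confusion_matrix(y_test, dt.predict(X_test)))
```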

Full data set

As we see above, the results are only slightly worse than our random forest classifier's, but still an improvement on our first dense layer classifier. Let's look at some sampling techniques to improve our dense layer neural network.

Sampling can be classified into two categories: undersampling and oversampling. As an example, suppose we have a data set with 6 cats and 3 dogs; the data set is clearly imbalanced.

If we undersample, we keep the 3 dogs and 3 of the 6 cats to balance our data set. If we oversample, we keep the 6 cats and duplicate the 3 dogs in order to balance our data set.

We will now use undersampling to see if we can improve our dense classifier.
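A minimal random-undersampling sketch in NumPy, assuming fraud is labelled 1 in y_train:

```python
import numpy as np

# Keep every fraud case and an equal-sized random sample of
# non-fraud cases, then shuffle the combined indices.
fraud_idx = np.where(y_train == 1)[0]
normal_idx = np.where(y_train == 0)[0]

rng = np.random.default_rng(42)
keep_normal = rng.choice(normal_idx, size=len(fraud_idx), replace=False)

balanced_idx = np.concatenate([fraud_idx, keep_normal])
rng.shuffle(balanced_idx)
X_under, y_under = X_train[balanced_idx], y_train[balanced_idx]
```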

Trained using a small balanced undersampled data set
Full data set

As we can see, our results are worse, as our undersampling discarded most of our training data. Let's explore an oversampling technique called SMOTE (Synthetic Minority Oversampling Technique). Below is a visual depiction of SMOTE.

SMOTE creates synthetic points along vectors between existing data points; as seen above, the light blue points are created based on the darker blue points.
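In code, SMOTE is available through the imbalanced-learn library; a minimal sketch:

```python
from imblearn.over_sampling import SMOTE

# Synthesize new fraud samples along line segments between existing
# fraud points until both classes are the same size.
smote = SMOTE(random_state=42)
X_smote, y_smote = smote.fit_resample(X_train, y_train)
```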
Training data set
Full data set

As we can see, SMOTE sampling has vastly improved our dense layer classification model.

Conclusion

In conclusion, we have explored a dense layer neural network alongside random forest and decision tree classifiers to train our fraud detection model. Using the SMOTE sampling technique to improve our dense layer model, we have achieved great results, but there is always room for improvement.

One way to improve would be to apply these sampling techniques to our other models to see if their accuracies can also be improved.

The Notebooks for the models can be found on my Github.
