“DETECTING ANOMALIES IN CREDIT CARD TRANSACTIONS”
A
PROJECT SYNOPSIS
Submitted in partial fulfilment of the Requirements
For the award of Bachelor Of Commerce
LNCT UNIVERSITY, BHOPAL (M.P.)
MAJOR PROJECT-I
(Data Analysis and Interpretation)
Submitted by
ROHIT KUMAR YADAV (LNCBBBAIA128)
Under the Guidance of
Prof. YOGESH PAYASI
BACHELOR OF BUSINESS ADMINISTRATION
LNCT UNIVERSITY
BHOPAL (M.P.)
BATCH 2021-2024
LNCT UNIVERSITY, BHOPAL
(M.P.)
BACHELOR OF BUSINESS ADMINISTRATION
CERTIFICATE
This is to certify that the work embodied in this MAJOR PROJECT-I
(Data Analysis and Interpretation), “DETECTING ANOMALIES IN
CREDIT CARD TRANSACTIONS”, has been satisfactorily completed
by ROHIT KUMAR YADAV (LNCBBBAIA128).
It is a bona fide piece of work, carried out under my guidance in the
BACHELOR OF BUSINESS ADMINISTRATION, LNCT
University, Bhopal, in partial fulfilment of the BACHELOR
OF BUSINESS ADMINISTRATION degree during the academic
session January-May, 2024.
Guided By
Prof. YOGESH PAYASI
Approved By
Head of Department
Forwarded By
Director Dr. ARVIND SINGH
LNCT UNIVERSITY, BHOPAL
LNCT UNIVERSITY BHOPAL
(M.P)
BACHELOR OF BUSINESS ADMINISTRATION
CERTIFICATE OF APPROVAL
The foregoing MAJOR PROJECT-I (Data Analysis and Interpretation)
is hereby approved as a creditable study of a BUSINESS
ADMINISTRATION subject carried out and presented in a manner
satisfactory to warrant its acceptance as a prerequisite to the
degree for which it has been submitted. It is understood that by this
approval the undersigned do not necessarily endorse or approve any
statement made, opinion expressed or conclusion drawn therein, but
approve the thesis only for the purpose for which it has been
submitted.
Signature Of The Supervisor
Date-
LNCT UNIVERSITY BHOPAL
(M.P)
BACHELOR OF BUSINESS ADMINISTRATION
DECLARATION
I, ROHIT KUMAR YADAV (LNCBBBAIA128), a student of
BACHELOR OF BUSINESS ADMINISTRATION, LNCT
University, Bhopal, hereby declare that the work presented in this
MAJOR PROJECT-I (Data Analysis and Interpretation) is the outcome of
my own work, is bona fide and correct to the best of my knowledge, and
has been carried out taking care of Engineering Ethics.
The work presented does not infringe any patented work and has not
been submitted to any University for the award of any degree or any
professional diploma.
ROHIT KUMAR YADAV
(LNCBBBAIA128)
Date -: 20/03/2024
LNCT UNIVERSITY BHOPAL
(M.P)
BACHELOR OF BUSINESS ADMINISTRATION
ACKNOWLEDGEMENT
We express our sincere indebtedness to our guide, Prof. YOGESH PAYASI,
BACHELOR OF BUSINESS ADMINISTRATION, LNCT UNIVERSITY,
Bhopal, for his invaluable guidance, suggestions and supervision throughout the
work. Without his kind patronage and guidance the project would not have taken
shape. We would also like to express our gratitude and sincere regards for his kind
approval of the project and for his counselling and advice from time to time. We
would also like to thank our Director, Dr. ARVIND SINGH, BACHELOR OF BUSINESS
ADMINISTRATION, LNCT UNIVERSITY, BHOPAL, for his expert advice
and counselling from time to time. We owe sincere thanks to all the faculty
members in the department of BACHELOR OF BUSINESS
ADMINISTRATION, LNCT UNIVERSITY, Bhopal, for their kind guidance
and encouragement from time to time.
CONTENTS
1. Abstract
2. Objectives
3. Introduction
4. System Design
5. Methodology
6. Challenges
7. Source Code
8. Output
9. Results
10. Conclusion
11. Bibliography
ABSTRACT
Credit cards are one of the most convenient ways to pay, for both online and
offline transactions. Online purchases rely heavily on the credit card number
alone, and that carries a risk. Several systems exist to identify fraudulent
transactions, but they only catch on when there are many of them. The green area's
layout includes the regulations and most of the difficulties. Existing fraud
detection systems can also flag a fraud only after the transaction has occurred.
Organizations keep detailed records of both genuine and fraudulent transactions,
and fraudulent transactions generally follow a particular pattern. Analyzing every
transaction among millions or billions of them is a difficult task. Predictive
algorithms could be an asset for detecting fraudulent transactions, and this is
where data mining is needed. A variety of statistical tests could be used to
prevent fraud. However, there is still no perfect method for detecting fraudulent
transactions, and for banks these frauds are a major financial issue. The data is
heavily skewed towards genuine transactions: according to one estimate, out of
12 billion transactions made in a year, 10 million are fraudulent. We use the
Isolation Forest algorithm and the Local Outlier Factor algorithm to analyze and
predict fraud, and we compute the accuracy and the number of errors of both
algorithms. To detect credit card fraud, this article also recommends the use of an
autoencoder, a deep learning algorithm, to identify transactions that relate to
certain coverage groups. Random forest and long short-term memory (LSTM) networks,
which may be used to solve the VAE problem, are effective for learning order
dependency in sequence prediction tasks. The LSTM autoencoder uses an LSTM
encoder-decoder to compress data and preserve its original structure when decoding
it.
OBJECTIVES
A credit card is a convenient mode of payment, useful for both online and offline
purchases. For online payments the credit card number alone is sufficient, and
that comes with a risk. Existing fraud detection systems can detect a fraud only
after the transaction has occurred. Organizations keep detailed records of both
genuine and fraudulent transactions, and fraudulent transactions generally follow
a particular pattern. Analyzing every transaction among millions or billions of
them is a difficult task. Predictive algorithms could be a valuable asset for
detecting fraudulent transactions, and this is where data mining is needed. A
variety of statistical tests could be used to prevent fraud. However, there is
still no perfect method for detecting fraudulent transactions, and for banks
these frauds are a major financial issue. The data is heavily skewed towards
genuine transactions: according to one estimate, out of 12 billion transactions
made in a year, 10 million are fraudulent. We use the Isolation Forest algorithm
and the Local Outlier Factor algorithm to analyze and predict fraud, and we
compute the accuracy and the number of errors of both algorithms.
Billions of dollars are lost to fraudulent credit card transactions every year.
Many of these transactions are never noticed, which puts tremendous pressure on
the financial and credit institutions concerned. In addition, the use of credit
cards, and with it e-business, is on the rise, while new methods of data
infringement are being developed in parallel. Research and progress in Machine
Learning (ML) algorithms have been seen as a useful tool for fraud investigators.
However, robust frameworks that provide accurate and reliable ML methods are
still lacking.
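The estimate quoted above can be turned into a quick calculation that shows how skewed the data is. The sketch below is illustrative only: the variable names are ours, and the "always genuine" baseline is a hypothetical classifier, included to show why accuracy on its own says little about fraud detection.

```python
# Figures quoted in the text: 12 billion transactions a year, 10 million frauds.
total_transactions = 12_000_000_000
fraud_transactions = 10_000_000

# Share of all transactions that are fraudulent.
fraud_rate = fraud_transactions / total_transactions

# Hypothetical baseline: a classifier that labels every transaction genuine
# is wrong only on the frauds, so its accuracy is 1 - fraud_rate.
baseline_accuracy = 1 - fraud_rate

print(f"Fraud rate: {fraud_rate:.4%}")                       # 0.0833%
print(f"'Always genuine' accuracy: {baseline_accuracy:.4%}")  # 99.9167%
```

A detector therefore has to beat roughly 99.9% accuracy before its accuracy figure carries any information, which is why precision and recall on the fraud class matter more.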
INTRODUCTION
Credit cards are used in our day-to-day lives to buy goods and services through
online or offline transactions. In an offline purchase, the customer uses the
physical card for the payment. For such a transaction to be fraudulent, the
attacker needs to steal the card. If the user is unaware that the card is lost,
the result is a financial loss for both the user and the credit card company. In
an online payment, the attacker needs only a little information, which could be
just the card number, to make a fraudulent transaction. The sole method of
detecting these types of fraud is to examine the transaction patterns of each
card and identify abnormalities with respect to the normal pattern. Frauds
detected with the help of the card user's purchase data can be used to reduce
fraudulent transactions. Every credit card user has a specific pattern, which
contains information such as the purchases made, the time elapsed since the last
purchase, and the amount of money spent. A deviation from this pattern is
recognized as a fraudulent transaction.
Fraud is a financial issue with many consequences. We can define fraud as
criminal deception aimed at financial gain. The frequent use of the internet has
led to a rise in online transactions using credit cards, which in turn attracts
more vulnerabilities and fraud events. Fraud mainly takes place because an
individual's credit card details are often misappropriated to make illegitimate
purchases or to withdraw money. Online shopping is one of the most popular
trends, and the available payment methods include net banking, debit cards, and
credit cards. These eliminate the need for a physical card, so if others come to
know the details, it becomes a risk. The card holder realizes the fraud only
after it has occurred. No existing system or model detects fraudulent
transactions perfectly.
In this project we use a dataset of about 29,000 transactions and more than one
unsupervised anomaly detection algorithm to detect transactions with a good
chance of being fraudulent. We also use F1 scores, recall, and precision to check
why the classification accuracy of the algorithms can be misleading. Further, we
would be.
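To make the point about misleading accuracy concrete, the following sketch computes accuracy, precision, recall, and F1 for the fraud class from two lists of labels. The helper `classification_scores` and the toy labels are assumptions made for illustration; they are not part of the project code or its dataset.

```python
def classification_scores(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1


# Toy labels, made up for illustration: 1 = fraud, 0 = genuine.
y_true = [0] * 995 + [1] * 5

# A detector that never raises an alarm.
never_flags = [0] * 1000
# A detector that raises 2 false alarms and catches 3 of the 5 frauds.
some_flags = [1] * 2 + [0] * 993 + [1] * 3 + [0] * 2

print(classification_scores(y_true, never_flags))
print(classification_scores(y_true, some_flags))
```

Both detectors score above 99% accuracy on these toy labels, yet the first one has zero recall on the fraud class. Only precision, recall, and F1 separate the two.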
METHODOLOGY
To solve an actual problem in an agency setting, a software engineer or a team of
engineers must adopt a development strategy that encompasses the process,
methods, tools, and generic phases. This strategy is often referred to as a process
model or a software engineering paradigm. A process model for software
engineering is chosen based on the nature of the project and application, the
methods and tools to be used, and the controls and deliverables that are required.
All software development can be characterized as a problem-solving loop in which
four distinct stages are encountered. Status quo represents the current state of
affairs; problem definition identifies the specific problem to be solved; technical
development solves the problem through the application of some technology; and
solution integration delivers the results to those who requested the solution in
the first place.
SOFTWARE AND HARDWARE REQUIREMENTS
Software Requirements:
Operating System: Windows (10, 11)
Web Browser: Mozilla, Google Chrome, Microsoft
Database Management System: MySQL
Web Development System: Visual Studio
Language and Libraries Used: Python, Matplotlib, Pandas, NumPy
Hardware Requirements:
RAM: Minimum 1 GB or higher.
HDD: Minimum 50 GB.
Processor: Intel Pentium 4 or AMD.
LAN: Version [Link] (for fixing up client disconnection).
SYSTEM DESIGN
Our fraud detection module works as follows:
1) The incoming transactions and their amounts are treated as credit card
transactions.
2) The incoming transactions are used as input to the machine learning
algorithms.
3) By examining the data, observing the patterns, and applying machine
learning algorithms for anomaly detection, namely the Isolation Forest
algorithm and the Local Outlier Factor algorithm, each transaction is
classified as either fraudulent or valid.
4) The alarm receives the fraudulent transactions and alerts the user that a
fraud has taken place, so that the card can be blocked to avoid further
financial losses to the user and the credit card company.
5) The genuine transactions output contains the valid transactions.
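The steps above can be sketched as a small routing function. This is a minimal illustration, not the project's implementation: `route_transactions`, the transaction fields, and the sample predictions are hypothetical. The only assumption taken from scikit-learn is its outlier-detector convention, in which a prediction of -1 marks an outlier and 1 marks an inlier.

```python
def route_transactions(transactions, raw_predictions):
    """Split transactions into an alarm list and a genuine list.

    raw_predictions follows the scikit-learn outlier-detector convention:
    -1 marks an outlier (treated here as fraud), 1 marks an inlier.
    """
    alarms, genuine = [], []
    for transaction, prediction in zip(transactions, raw_predictions):
        if prediction == -1:
            alarms.append(transaction)
        else:
            genuine.append(transaction)
    return alarms, genuine


# Hypothetical transactions; the fields are illustrative, not the dataset schema.
transactions = [
    {"id": "t1", "amount": 25.00},
    {"id": "t2", "amount": 4999.99},
    {"id": "t3", "amount": 12.50},
]
# Predictions as an anomaly detector might return them for the rows above.
raw_predictions = [1, -1, 1]

alarms, genuine = route_transactions(transactions, raw_predictions)
for transaction in alarms:
    print(f"ALERT: transaction {transaction['id']} flagged as possible fraud")
print(f"{len(genuine)} transactions passed as genuine")
```

In a real deployment the alarm branch would notify the card holder or block the card; here it only prints a message.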
CHALLENGES
Some of the challenges that we need to face are:
1) A huge amount of data is processed every day, so the system must be
fast enough to detect a scam in time.
2) The data is imbalanced, i.e. most of the transactions are genuine, which
makes it difficult to detect the fraudulent ones.
3) Data availability is a challenge because the data is mostly private.
4) Misclassified data is another major issue, as not every fraud is
caught.
5) Scammers use adaptive techniques against the system.
A few ways to tackle the challenges:
1) The system must be fast enough to detect an anomaly and classify it
as a fraud instantly.
2) To protect the privacy of the users, the dimensionality of the data
can be reduced.
3) A more trustworthy source can be used to double-check the data,
at least for training the model.
SOURCE CODE
# Import the required libraries
%matplotlib inline
import numpy as np
import pandas as pd
import sklearn
import scipy
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, accuracy_score
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM
from pylab import rcParams
rcParams['figure.figsize'] = 14, 8
RANDOM_SEED = 42
LABLES = ["NORMAL", "FRAUD"]
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
[Link]
import plotly.figure_factory as ff
from plotly.offline import init_notebook_mode, iplot
# Import the CSV data file
data=pd.read_csv(r'C:\Users\SOUMAN MANDAL\OneDrive\Desktop\[Link]')
data.head(21)
data1 = data.sample(frac=0.1, random_state=1)
[Link]
# Check for missing values
data.isnull().sum()
[Link]()
# Determine the number of fraud and valid transactions in the entire dataset
count_classes = data['Class'].value_counts(sort=True)
count_classes.plot(kind='bar', rot=0)
plt.title("Transaction Class Distribution")
plt.xticks(range(2), LABLES)
plt.xlabel("Class")
plt.ylabel("Frequency")
plt.show()
# Assign the transaction classes: 0 = normal, 1 = fraud
Normal=data[data['Class']==0]
Fraud=data[data['Class']==1]
[Link]
[Link]
# How different are the amounts of money used in the two transaction classes?
Fraud.Amount.describe()
Normal.Amount.describe()
# A more graphical representation of the data
f, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
f.suptitle('Amount per transaction by class')
bins = 50
ax1.hist(Fraud.Amount, bins=bins)
ax1.set_title('Fraud')
ax2.hist(Normal.Amount, bins=bins)
ax2.set_title('Normal')
plt.xlabel('Amount ($)')
plt.ylabel('Number of transactions')
plt.xlim((0, 20000))
plt.yscale('log')
plt.show()
# Graphical representation of the data
f, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
f.suptitle('Time of transaction vs Amount by class')
ax1.scatter(Fraud.Time, Fraud.Amount)
ax1.set_title('Fraud')
ax2.scatter(Normal.Time, Normal.Amount)
ax2.set_title('Normal')
plt.xlabel('Time (in Seconds)')
plt.ylabel('Amount')
plt.show()
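# Create a trace (assumed here to plot the fraud transactions)
trace = go.Scatter(
    x=Fraud.Time,
    y=Fraud.Amount,
    mode='markers'
)
# Use a separate name so the DataFrame `data` is not overwritten
plot_data = [trace]
iplot({
    "data": plot_data
})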
[Link]
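# Create a second trace (assumed here to plot the normal transactions)
trace = go.Scatter(
    x=Normal.Time,
    y=Normal.Amount,
    mode='markers'
)
# Use a separate name so the DataFrame `data` is not overwritten
plot_data = [trace]
iplot({
    "data": plot_data
})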
[Link]
#Determine the number of fraud and valid transactions in the dataset.
Fraud = data1[data1['Class']==1]
Valid = data1[data1['Class']==0]
outlier_fraction = len(Fraud)/float(len(Valid))
##Now let us print the outlier fraction and no of Fraud and Valid Transaction cases
print(outlier_fraction)
print("Fraud Cases : {}".format(len(Fraud)))
print("Valid Cases : {}".format(len(Valid)))
# Correlation matrix (assumed to be computed on the sampled frame data1)
correlation_matrix = data1.corr()
fig = plt.figure(figsize=(12, 9))
sns.heatmap(correlation_matrix, vmax=0.8, square=True)
plt.show()
# Get all the columns from the dataframe
columns = data1.columns.tolist()
# Filter the columns to remove data we do not want
columns = [c for c in columns if c not in ["Class"]]
# Store the variable we are predicting
target = "Class"
# Define a random state
state = np.random.RandomState(42)
X = data1[columns]
Y = data1[target]
X_outliers = state.uniform(low=0, high=1, size=(X.shape[0], X.shape[1]))
# Print the shapes of X and Y
print(X.shape)
print(Y.shape)
# Define the outlier detection methods
classifiers = {
    "Isolation Forest": IsolationForest(n_estimators=100, max_samples=len(X),
                                        contamination=outlier_fraction,
                                        random_state=state, verbose=0),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, algorithm='auto',
                                               leaf_size=30, metric='minkowski',
                                               p=2, metric_params=None,
                                               contamination=outlier_fraction),
    # OneClassSVM takes no random_state argument in current scikit-learn
    "Support Vector Machine": OneClassSVM(kernel='rbf', degree=3, gamma=0.1,
                                          nu=0.05, max_iter=-1),
}
# Fit the models
n_outliers = len(Fraud)
for i, (clf_name, clf) in enumerate(classifiers.items()):
    # Fit the data and tag outliers
    if clf_name == "Local Outlier Factor":
        y_pred = clf.fit_predict(X)
        scores_prediction = clf.negative_outlier_factor_
    elif clf_name == "Support Vector Machine":
        clf.fit(X)
        y_pred = clf.predict(X)
    else:
        clf.fit(X)
        scores_prediction = clf.decision_function(X)
        y_pred = clf.predict(X)
    # Reshape the prediction values: 0 for valid transactions, 1 for fraud
    y_pred[y_pred == 1] = 0
    y_pred[y_pred == -1] = 1
    n_errors = (y_pred != Y).sum()
    # Run classification metrics
    print("{}: {}".format(clf_name, n_errors))
    print("Accuracy Score:")
    print(accuracy_score(Y, y_pred))
    print("Classification Report:")
    print(classification_report(Y, y_pred))
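A note on the `contamination` value used in the listing: it is computed as the number of frauds divided by the number of valid transactions, whereas scikit-learn describes `contamination` as the proportion of outliers in the data set. The sketch below compares the two ratios. The class counts are hypothetical round numbers chosen for illustration, not values from the project output.

```python
# Hypothetical class counts for a sampled data set (not the project's output).
n_fraud = 50
n_valid = 28_450

# Ratio used in the listing: frauds relative to valid transactions.
ratio_to_valid = n_fraud / n_valid

# Proportion of the whole sample that is fraudulent.
share_of_sample = n_fraud / (n_fraud + n_valid)

print(f"fraud / valid:           {ratio_to_valid:.6f}")
print(f"fraud / (fraud + valid): {share_of_sample:.6f}")
print(f"difference:              {ratio_to_valid - share_of_sample:.2e}")
```

Because frauds are so rare, the two values differ only in the sixth decimal place for counts of this size, so either choice should give nearly the same detection threshold.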
OUTPUT
RESULTS
In complex datasets such as the one we have used, Isolation Forest proves to be a
good method, detecting fraudulent transactions about 30% of the time.
For the Local Outlier Factor algorithm, the total number of errors is 173, which
is comparatively high, and the accuracy is approximately 99.696%. Its F1-score and
precision are not that good: precision is 100% for class 0, but very few
fraudulent transactions are found for class 1.
For the Isolation Forest algorithm, the total number of errors is 127, which is
relatively low, and the accuracy is approximately 99.777%. We get 30%
precision for class 1, and the F1-scores are better than those of the Local
Outlier Factor algorithm.
The Isolation Forest method has given us better results.
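The error counts and accuracies reported above are linked by a simple relation: accuracy = 1 - errors / rows scored. The sketch below uses that relation to back out the number of rows implied by each reported pair. The function names are ours, and because the accuracies are rounded the implied sizes are only approximate.

```python
def accuracy_from_errors(n_errors, n_samples):
    """Accuracy as the share of transactions that were classified correctly."""
    return 1 - n_errors / n_samples


def implied_sample_size(n_errors, accuracy):
    """Number of scored rows implied by an error count and an accuracy."""
    return n_errors / (1 - accuracy)


# Error counts and accuracies as reported in the results above.
reported = {
    "Isolation Forest": (127, 0.99777),
    "Local Outlier Factor": (173, 0.99696),
}

for name, (n_errors, accuracy) in reported.items():
    n_rows = implied_sample_size(n_errors, accuracy)
    print(f"{name}: about {n_rows:,.0f} rows scored")
```

Both reported pairs imply roughly 57,000 scored rows. This makes a useful consistency check against the number of rows actually passed to the classifiers when the run is reproduced.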
We have also compared our methods, the Isolation Forest algorithm and the Local
Outlier Factor algorithm.
[Figure: Precision, recall, and F1-score for Isolation Forest, classes 0 and 1]
[Figure: Precision, recall, and F1-score for Local Outlier Factor, classes 0 and 1]
Algorithm               Accuracy (%)
Random Forest           95.5
Decision Tree           94.3
Logistic Regression     90
Isolation Forest        99.77
Local Outlier Factor    99.69
[Figure: Bar chart of accuracy (%) for Random Forest, Decision Tree, Logistic Regression, Isolation Forest, and Local Outlier Factor]
CONCLUSION
The dataset, a .csv file, was imported, pre-processed, explored, and described, and
histograms were plotted to check for unusual parameters. A correlation matrix was
computed to identify the parameters that are important for the class. The algorithms
we used for anomaly detection are the Isolation Forest algorithm and the Local Outlier
Factor algorithm. We have also understood the importance of examining precision as
well as the data itself. We have also noticed that, compared to the Local Outlier
Factor, Isolation Forest has relatively better efficiency, precision, F1, and recall
scores. Neural networks could be used in the future to train the system to be more
accurate [5]. Fraud detection in credit cards needs a lot of planning before machine
learning algorithms are applied to it. Hence, we can say that it is a complex issue.
However, it helps keep the card user's finances safe, so we can also say that it is an
application of machine learning and data science made for the welfare of the people.
Our proposed methods gave us the highest accuracies. Implementing the system with
neural networks, to obtain better accuracy, will be included in the future work.
The following are the advantages:
1) Reduced number of fraudulent transactions.
2) Credit cards can be used safely for online transactions by the
user.
3) There is more security.
There are a few disadvantages, as follows:
1) Machine learning algorithms work well with huge datasets. With a small
amount of data, the results might be inaccurate.
2) Quite a lot of data is needed for the machine learning algorithms to be
more accurate.
BIBLIOGRAPHY
1. Dataset collected from [Link]
2. A. Srivastava, M. Yadav, S. Basu, S. Salunkhe and M. Shabad, "Credit card fraud detection
at merchant side using neural networks," 2016 3rd International Conference on
Computing for Sustainable Global Development ([Link]), New Delhi, 2016, pp. 667-670.
3. W. Yu and N. Wang, "Research on Credit Card Fraud Detection Model Based on
Distance Sum," 2009 International Joint Conference on Artificial Intelligence, Hainan Island,
2009, pp. [Link]: 10.1109/JCAI.2009.146
4. I. Sohony, R. Pratap, and U. Nambiar, "Ensemble learning for credit card fraud detection,"
2018.
5. Eduonix. (2018, July 26). Eduonix/creditcardML. Retrieved from [Link]
6. [Link]
7. H. A. Shukur, "Credit Card Fraud Detection Using Machine Learning methodologies," 2019.
8. "Credit Card Fraud Detection: A Realistic Modeling and a Novel Learning Strategy," IEEE,
2018.
9. "[Link]." [Link] 17- [Link] [Accessed 6 December 2020].
10. R. Banerjee, G. Bourla, S. Chen, S. Purohit, and J. Battipagli, "Comparative Analysis of
Machine Learning Algorithm through Credit Card Fraud Detection," 2018.
11. D. Tripathi, T. Lone, Y. Sharma, and S. Dwivedi, "Credit Card Fraud Detection using Local
Outlier Factor," Int. J. Pure Appl. Math., 2018.
12. K. Randhawa, C. K. Loo, M. Seera, C. P. Lim, and A. K. Nandi, "Credit Card Fraud Detection
Using AdaBoost and Majority Voting," IEEE Access, 2018.