Python

The document is a project synopsis submitted by Rohit Kumar Yadav for the detection of anomalies in credit card transactions as part of his Bachelor of Business Administration degree at LNCT University, Bhopal. It outlines the objectives, methodology, and challenges faced in developing a fraud detection system using machine learning algorithms like isolation forest and local outlier factor. The project aims to improve the identification of fraudulent transactions amidst a vast amount of genuine data, addressing the financial impact of credit card fraud.


“DETECTING ANOMALIES IN CREDIT CARD TRANSACTIONS”
A
PROJECT SYNOPSIS
Submitted in partial fulfilment of the Requirements
For the award of Bachelor of Business Administration

LNCT UNIVERSITY, BHOPAL (M.P.)

MAJOR PROJECT-I
(Data Analysis and Interpretation)

Submitted by

ROHIT KUMAR YADAV (LNCBBBAIA128)

Under the Guidance of

Prof. YOGESH PAYASI

BACHELOR OF BUSINESS ADMINISTRATION


LNCT UNIVERSITY

BHOPAL (M.P.)

BATCH 2021-2024

LNCT UNIVERSITY, BHOPAL (M.P.)
BACHELOR OF BUSINESS ADMINISTRATION

CERTIFICATE

This is to certify that the work embodied in this MAJOR PROJECT-I
(Data Analysis and Interpretation) “DETECTING ANOMALIES IN
CREDIT CARD TRANSACTIONS” has been satisfactorily completed
by ROHIT KUMAR YADAV (LNCBBBAIA128).
It is a bona fide piece of work, carried out under my guidance in the
BACHELOR OF BUSINESS ADMINISTRATION, LNCT
University, Bhopal, for the partial fulfilment of the BACHELOR
OF BUSINESS ADMINISTRATION degree during the academic
session January-May, 2024.

Guided By

Prof. YOGESH PAYASI

Approved By

Head of Department

Forwarded By
Director Dr. ARVIND SINGH

LNCT UNIVERSITY, BHOPAL

LNCT UNIVERSITY BHOPAL


(M.P)
BACHELOR OF BUSINESS ADMINISTRATION

CERTIFICATE OF APPROVAL

The foregoing MAJOR PROJECT-I (Data Analysis and Interpretation)
is hereby approved as a creditable study of a BUSINESS
ADMINISTRATION subject carried out and presented in a manner
satisfactory to warrant its acceptance as a prerequisite to the
degree for which it has been submitted. It is understood that by this
approval the undersigned do not necessarily endorse or approve any
statement made, opinion expressed or conclusion drawn therein, but
approve the thesis only for the purpose for which it has been
submitted.

Signature Of The Supervisor

Date-

LNCT UNIVERSITY BHOPAL


(M.P)
BACHELOR OF BUSINESS ADMINISTRATION

DECLARATION

I, ROHIT KUMAR YADAV (LNCBBBAIA128), a student of
BACHELOR OF BUSINESS ADMINISTRATION, LNCT
University, Bhopal, hereby declare that the work presented in this
MAJOR PROJECT-I (Data Analysis and Interpretation) is the outcome of
my own work, is bona fide and correct to the best of my knowledge, and
has been carried out taking care of Engineering Ethics.
The work presented does not infringe any patented work and has not
been submitted to any University for the award of any degree or any
professional diploma.

ROHIT KUMAR YADAV (LNCBBBAIA128)
Date: 20/03/2024

LNCT UNIVERSITY BHOPAL


(M.P)
BACHELOR OF BUSINESS ADMINISTRATION

ACKNOWLEDGEMENT
We express our sincere indebtedness towards our guide Prof. YOGESH PAYASI,
BACHELOR OF BUSINESS ADMINISTRATION, LNCT UNIVERSITY,
Bhopal, for his invaluable guidance, suggestions and supervision throughout the
work. Without his kind patronage and guidance the project would not have taken
shape. We would also like to express our gratitude and sincere regards for his kind
approval of the project and for his counselling and advice from time to time. We
would also like to thank our Director Dr. ARVIND SINGH, BACHELOR OF BUSINESS
ADMINISTRATION, LNCT UNIVERSITY, BHOPAL, for his expert advice
and counselling from time to time. We owe sincere thanks to all the faculty
members in the department of BACHELOR OF BUSINESS
ADMINISTRATION, LNCT UNIVERSITY, Bhopal, for their kind guidance
and encouragement from time to time.

CONTENTS

1. Abstract

2. Objectives
3. Introduction
4. System Design
5. Methodology
6. Challenges
7. Source Code
8. Output
9. Results
10. Conclusion
11. Bibliography

ABSTRACT
Using a credit card is one of the most convenient ways to pay, and it is a handy tool
for both online and offline transactions. Credit card numbers are used extensively in
online purchases, and there is a danger associated with this practice. There are
different systems to identify fraudulent transactions, but they only catch on when
there are many of them. The green area's layout includes the regulations and most of
the difficulties. Existing fraud detection systems can detect fraud only after the
transactions have occurred. Organizations keep detailed data consisting of genuine
as well as fraudulent transactions, and fraudulent transactions are generally caught
because they follow a particular pattern. It is difficult to analyze each transaction
among millions or billions of them. Predictive algorithms could be an asset for
detecting fraudulent transactions, and this is where data mining is needed. A
variety of statistical tests could be used to prevent fraud events. However,
we still have no perfect method for detecting fraudulent transactions. To the banks,
these frauds are a major financial issue. The data is heavily skewed towards genuine
transactions: according to one estimate, out of 12 billion transactions made in a
year, 10 million are fraudulent. We use the isolation forest algorithm and the local
outlier factor algorithm to analyze and predict fraud, and we compute the accuracy
and the number of errors of both algorithms. To detect credit card fraud, this
article also recommends the use of an autoencoder, a deep learning algorithm, to
identify transactions that relate to certain coverage groups. In this neural net
study, fraud on credit cards is detected using neural nets. Random forest and long
short-term memory (LSTM), which may be used to solve the VAE problem, are effective
for learning order dependency in sequence prediction tasks. To compress data and
preserve its original structure when decoding it, the LSTM autoencoder uses an LSTM
encoder-decoder.

OBJECTIVES
A credit card is a convenient mode of payment, useful for both online and offline
payments. For online payments only the credit card number is needed. The number
alone is sufficient for an online transaction, and that comes with a risk. Existing
fraud detection systems can detect fraud only after the transactions have
occurred. Organizations keep detailed data consisting of genuine as well as
fraudulent transactions, and fraudulent transactions are generally caught because
they follow a particular pattern. It is difficult to analyze every transaction
among millions or billions of them. Predictive algorithms could be a valuable
asset for detecting fraudulent transactions, and this is where data mining is
needed. A variety of statistical tests could be used to prevent fraud events.
However, we still have no perfect method for detecting fraudulent transactions.
To the banks, these frauds are a major financial issue. The data is heavily skewed
towards genuine transactions: according to one estimate, out of 12 billion
transactions made in a year, 10 million are fraudulent. We use the isolation
forest algorithm and the local outlier factor algorithm to analyze and predict
fraud, and we compute the accuracy and the number of errors of both algorithms.
Billions of dollars are lost to fraudulent credit card transactions every year.
Many of these transactions are never noticed, which puts tremendous pressure on
the economic system and on the financial and credit institutions concerned. In
addition, the use of credit cards, and with it e-business, is on the rise, which
increases the threat in parallel with newly developed data breach methods.
Research and progress in Machine Learning (ML) algorithms have been seen as a
useful tool for fraud investigators. However, robust frameworks that provide
accurate and reliable ML methods are still lacking.
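
To make the imbalance concrete, the short sketch below works through the estimate quoted above (12 billion transactions a year, 10 million of them fraudulent). The function names `fraud_rate` and `baseline_accuracy` are ours and are introduced only for illustration. Under these figures, a detector that labels every transaction as genuine would still be more than 99.9% accurate while catching no fraud at all, which suggests why accuracy on its own says little about a fraud detector.

```python
# Figures quoted in the text above; the function names are illustrative only.
TOTAL_TRANSACTIONS = 12_000_000_000
FRAUD_TRANSACTIONS = 10_000_000


def fraud_rate(total: int, fraud: int) -> float:
    """Share of all transactions that are fraudulent."""
    return fraud / total


def baseline_accuracy(total: int, fraud: int) -> float:
    """Accuracy of a trivial detector that labels every transaction genuine."""
    return (total - fraud) / total


if __name__ == "__main__":
    rate = fraud_rate(TOTAL_TRANSACTIONS, FRAUD_TRANSACTIONS)
    acc = baseline_accuracy(TOTAL_TRANSACTIONS, FRAUD_TRANSACTIONS)
    print(f"Fraud rate: {rate:.4%}")
    print(f"Accuracy of an 'always genuine' detector: {acc:.4%}")
```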

INTRODUCTION

Credit cards are used in our daily lives to buy goods and services through
online or offline transactions. In an offline purchase, the customer uses the
physical card for the payment, so to make a fraudulent transaction the
attacker needs to steal the card. If the user is unaware that the card is
lost, the result is a financial loss for both the user and the credit card
company. In an online payment, the attacker needs only a little information
to carry out a fraudulent transaction. This ‘little information’ could be
the card number. The sole method of detecting these types of fraud is to
examine the transaction patterns of each card and identify abnormalities
with respect to the normal pattern. Frauds detected with the help of the
card user's purchase data can be used to reduce fraudulent transactions.
Every credit card user has a specific pattern, made up of information about
purchases, the time elapsed since the last purchase, the money spent on each
purchase, and so on. A deviation from this pattern is recognized as a
fraudulent transaction. These frauds are financial issues that can have many
consequences. We can define fraud as criminal deception aimed at financial
gain. The frequent use of the internet has resulted in a rise in online
transactions using credit cards, and credit cards also attract more
vulnerabilities and fraud events. Fraud mainly takes place because the
credit card details of an individual are often misappropriated to make
illegitimate purchases or to withdraw money. Online shopping is one of the
most popular trends, and its payment methods include net banking, debit
cards and credit cards. These methods eliminate the need for a physical
card, so if others come to know the details, it becomes a risk. The card
holder realizes the fraud only after it has occurred. No existing system or
model detects fraudulent transactions perfectly. In this project we use a
dataset of about 29,000 transactions and more than one unsupervised anomaly
detection algorithm to detect transactions with a good chance of being
fraudulent. We also use precision, recall and F1 scores to check why the
classification accuracy of the algorithms can be misleading. Further, we
would be.
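
The three scores named above can be computed directly from counts of correct and incorrect fraud predictions. The sketch below is a minimal illustration, not part of the project code: the helper `precision_recall_f1` is ours, and the counts in the usage example are invented (a hypothetical run over 10,000 transactions containing 20 frauds, of which the detector flags 30 transactions and gets 9 right).

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 for the fraud class.

    tp: frauds correctly flagged
    fp: genuine transactions wrongly flagged as fraud
    fn: frauds that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Invented counts, for illustration only.
    tp, fp, fn, tn = 9, 21, 11, 9959
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision, recall, f1 = precision_recall_f1(tp, fp, fn)
    print(f"accuracy={accuracy:.4f} precision={precision:.2f} "
          f"recall={recall:.2f} f1={f1:.2f}")
```

In this made-up case the accuracy is close to 1 even though more than half of the frauds are missed, which is the kind of gap the per-class scores are meant to expose.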

METHODOLOGY
To solve an actual problem in an agency setting, a software engineer or a team of
engineers must incorporate a development strategy that encompasses the process,
methods, tools and generic phases. This strategy is often referred to as a process
model or a software engineering paradigm. A process model for software
engineering is chosen based on the nature of the project and application, the
methods and tools to be used, and the controls and deliverables that are required.

All software development can be characterized as a problem-solving loop in which
four distinct stages are encountered. Status quo represents the current state of
affairs; problem definition identifies the specific problem to be solved; technical
development solves the problem through the application of some technology; and
solution integration delivers the results to those who requested the solution in
the first place.

SOFTWARE AND HARDWARE REQUIREMENT

Software Requirement:

Operating System: Windows (10, 11)

Web Browser: Mozilla, Google Chrome, Microsoft

Database Management System: MySQL

Web Development System: Visual Studio

Language and Libraries Used: Python, Matplotlib, Pandas, NumPy

Hardware Requirement:

RAM: 1 GB or higher.

HDD: Minimum 50 GB.

Processor: Intel Pentium 4 or AMD.

LAN: Version [Link] (for fixing up client disconnection).

SYSTEM DESIGN

Our fraud detection module works as follows:

1) The incoming transactions and their amounts are treated as credit card
transactions.

2) The incoming transactions are used as input to the machine learning
algorithms.

3) The machine learning algorithms, such as isolation forest and local
outlier factor, examine the data and observe its patterns to perform
anomaly detection. The output classifies each transaction as either
fraud or valid.

4) The alarm takes the fraud transactions and alerts the user that a fraud
transaction has taken place, so that the card can be blocked to avoid
further financial losses to the user and the credit card company.

5) The Genuine Transactions store holds the valid transactions.
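
The five steps above can be sketched end to end on synthetic data. This is a minimal illustration under stated assumptions, not the project's implementation: the function names `split_transactions` and `raise_alarm`, the two-column toy data and the `contamination` value are all ours, and a real alarm would notify the card holder rather than build strings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def split_transactions(X: np.ndarray, contamination: float = 0.01, seed: int = 42):
    """Steps 2-3: feed transactions to an Isolation Forest and split the output.

    Returns (fraud_idx, genuine_idx), the row indices flagged as fraud and valid.
    """
    model = IsolationForest(n_estimators=100, contamination=contamination,
                            random_state=seed)
    labels = model.fit_predict(X)  # scikit-learn convention: -1 outlier, 1 inlier
    fraud_idx = np.where(labels == -1)[0]
    genuine_idx = np.where(labels == 1)[0]
    return fraud_idx, genuine_idx


def raise_alarm(fraud_idx) -> list[str]:
    """Step 4: build one alert message per flagged transaction (placeholder)."""
    return [f"ALERT: transaction {i} flagged as possible fraud" for i in fraud_idx]


def make_toy_transactions(seed: int = 0) -> np.ndarray:
    """Step 1: synthetic stand-in for incoming transactions (two made-up features)."""
    rng = np.random.default_rng(seed)
    usual = rng.normal(loc=50.0, scale=10.0, size=(990, 2))
    unusual = rng.normal(loc=500.0, scale=50.0, size=(10, 2))  # rows 990-999
    return np.vstack([usual, unusual])


if __name__ == "__main__":
    X = make_toy_transactions()
    fraud_idx, genuine_idx = split_transactions(X, contamination=0.01)
    for message in raise_alarm(fraud_idx):
        print(message)
    print(f"{len(genuine_idx)} transactions kept as genuine")  # step 5
```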

CHALLENGES

Some of the challenges that we need to face are:

1) A huge amount of data is processed every day, so the system must
be fast enough to detect a scam in time.
2) The data is imbalanced, i.e. most of the transactions are genuine, which
makes it difficult to detect the fraudulent ones.
3) Data availability is a challenge because the data is mostly private.
4) Misclassified data is another major issue, as not every
fraud is caught.
5) Scammers use adaptive techniques against the system.

A few ways to tackle the challenges:

1) The system must be fast enough to detect the anomaly and classify it
as fraud instantly.
2) To protect the privacy of the users, the dimensionality of the data
can be reduced.
3) A more trustworthy source can be used to double-check the data,
at least for training the model.
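
One common way to reduce dimensionality is principal component analysis (PCA), which replaces the original columns with a smaller set of uncorrelated components whose meaning is no longer obvious to a reader of the data. The sketch below is an assumption on our part about how this could be done with scikit-learn. The helper name `reduce_dimensions`, the six made-up input columns and the choice of three components are illustrative only, and PCA by itself is not a privacy guarantee.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def reduce_dimensions(raw: np.ndarray, n_components: int) -> np.ndarray:
    """Standardise the raw features, then project them onto n_components axes."""
    scaled = StandardScaler().fit_transform(raw)
    return PCA(n_components=n_components, random_state=0).fit_transform(scaled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 200 hypothetical transactions with 6 raw numeric features each.
    raw = rng.normal(size=(200, 6))
    reduced = reduce_dimensions(raw, n_components=3)
    print(raw.shape, "->", reduced.shape)
```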

SOURCE CODE

# Import the required libraries
%matplotlib inline
import numpy as np
import pandas as pd
import sklearn
import scipy
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import classification_report, accuracy_score
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM
from pylab import rcParams

# Default figure size, random seed and class labels used in the plots
rcParams['figure.figsize'] = 14, 8
RANDOM_SEED = 42
LABLES = ["NORMAL", "FRAUD"]

# Plotly is used for the interactive scatter plots
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import plotly.figure_factory as ff
from plotly.offline import init_notebook_mode, iplot

# Importing the CSV data file
data = pd.read_csv(r'C:\Users\SOUMAN MANDAL\OneDrive\Desktop\[Link]')
data.head(21)

# Take a 10% random sample of the data
data1 = data.sample(frac=0.1, random_state=1)
data1.shape

# Checking for missing values
data.isnull().sum()
data.info()

# Determining the number of fraud and valid transactions in the entire dataset
count_classes = data['Class'].value_counts(sort=True)
count_classes.plot(kind='bar', rot=0)
plt.title("Transaction Class Distribution")
plt.xticks(range(2), LABLES)
plt.xlabel("Class")
plt.ylabel("Frequency")
plt.show()

# Assigning the transaction class: 0 = normal, 1 = fraud
Normal = data[data['Class'] == 0]
Fraud = data[data['Class'] == 1]
Normal.shape
Fraud.shape

# How different is the amount of money used in the two transaction classes?
Fraud.Amount.describe()
Normal.Amount.describe()

# A more graphical representation of the data
f, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
f.suptitle('Amount per transaction by class')
bins = 50
ax1.hist(Fraud.Amount, bins=bins)
ax1.set_title('Fraud')
ax2.hist(Normal.Amount, bins=bins)
ax2.set_title('Normal')
plt.xlabel('Amount ($)')
plt.ylabel('Number of transactions')
plt.xlim((0, 20000))
plt.yscale('log')
plt.show()

# Time of transaction against amount, plotted separately for each class
f, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
f.suptitle('Time of transaction vs Amount by class')
ax1.scatter(Fraud.Time, Fraud.Amount)
ax1.set_title('Fraud')
ax2.scatter(Normal.Time, Normal.Amount)
ax2.set_title('Normal')
plt.xlabel('Time (in Seconds)')
plt.ylabel('Amount')
plt.show()

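# Create an interactive scatter trace of the fraud transactions
trace = go.Scatter(
    x=Fraud.Time,
    y=Fraud.Amount,
    mode='markers'
)
# Use a separate name so the 'data' DataFrame is not overwritten
fraud_traces = [trace]
iplot({
    "data": fraud_traces
})

# The same plot for the normal transactions
trace = go.Scatter(
    x=Normal.Time,
    y=Normal.Amount,
    mode='markers'
)
normal_traces = [trace]
iplot({
    "data": normal_traces
})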

# Determine the number of fraud and valid transactions in the sampled dataset
Fraud = data1[data1['Class'] == 1]
Valid = data1[data1['Class'] == 0]
outlier_fraction = len(Fraud) / float(len(Valid))

# Print the outlier fraction and the number of fraud and valid cases
print(outlier_fraction)
print("Fraud Cases : {}".format(len(Fraud)))
print("Valid Cases : {}".format(len(Valid)))

# Correlation matrix
correlation_matrix = data1.corr()
fig = plt.figure(figsize=(12, 9))
sns.heatmap(correlation_matrix, vmax=0.8, square=True)
plt.show()

# Get all the columns from the dataframe
columns = data1.columns.tolist()

# Filter the columns to remove data we do not want
columns = [c for c in columns if c not in ["Class"]]

# Store the variable we are predicting
target = "Class"

# Define a random state
state = np.random.RandomState(42)
X = data1[columns]
Y = data1[target]
X_outliers = state.uniform(low=0, high=1, size=(X.shape[0], X.shape[1]))

# Print the shapes of X and Y
print(X.shape)
print(Y.shape)

# Define the outlier detection methods
classifiers = {
    "Isolation Forest": IsolationForest(n_estimators=100, max_samples=len(X),
                                        contamination=outlier_fraction,
                                        random_state=state, verbose=0),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, algorithm='auto',
                                               leaf_size=30, metric='minkowski',
                                               p=2, metric_params=None,
                                               contamination=outlier_fraction),
    # OneClassSVM takes no random_state argument in current scikit-learn
    "Support Vector Machine": OneClassSVM(kernel='rbf', degree=3, gamma=0.1,
                                          nu=0.05, max_iter=-1)
}

# Fit the models
n_outliers = len(Fraud)
for i, (clf_name, clf) in enumerate(classifiers.items()):
    # Fit the data and tag outliers
    if clf_name == "Local Outlier Factor":
        # LOF labels the training data in one step; its scores are an attribute
        y_pred = clf.fit_predict(X)
        scores_prediction = clf.negative_outlier_factor_
    elif clf_name == "Support Vector Machine":
        clf.fit(X)
        y_pred = clf.predict(X)
    else:
        clf.fit(X)
        scores_prediction = clf.decision_function(X)
        y_pred = clf.predict(X)

    # Reshape the prediction values: 0 for valid transactions, 1 for fraud
    y_pred[y_pred == 1] = 0
    y_pred[y_pred == -1] = 1
    n_errors = (y_pred != Y).sum()

    # Run classification metrics
    print("{}: {} errors".format(clf_name, n_errors))
    print("Accuracy Score:")
    print(accuracy_score(Y, y_pred))
    print("Classification Report:")
    print(classification_report(Y, y_pred))
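
The loop above treats Local Outlier Factor differently because, in scikit-learn's default mode, it labels the training data through `fit_predict` and keeps its scores in `negative_outlier_factor_`, whereas Isolation Forest is fitted first and then asked to `predict`. The sketch below replays that pattern on synthetic data so it can be run without the transaction file. The helper names (`to_fraud_labels`, `make_toy_data`, `compare_detectors`), the toy data and the contamination value are assumptions made for illustration, and the numbers it prints are not the project's results.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score
from sklearn.neighbors import LocalOutlierFactor


def to_fraud_labels(raw_pred: np.ndarray) -> np.ndarray:
    """Map scikit-learn's outlier labels (1 inlier, -1 outlier) to 0 valid, 1 fraud."""
    return np.where(raw_pred == -1, 1, 0)


def make_toy_data(seed: int = 1):
    """980 'valid' points near the origin and 20 'fraud' points far from it."""
    rng = np.random.default_rng(seed)
    valid = rng.normal(0.0, 1.0, size=(980, 4))
    fraud = rng.normal(6.0, 1.0, size=(20, 4))
    X = np.vstack([valid, fraud])
    y = np.array([0] * 980 + [1] * 20)
    return X, y


def compare_detectors(X: np.ndarray, y: np.ndarray, contamination: float) -> dict:
    """Return {name: (n_errors, accuracy)} for both detectors."""
    predictions = {}

    # Isolation Forest: fit, then predict on the same data.
    iso = IsolationForest(n_estimators=100, contamination=contamination,
                          random_state=42)
    iso.fit(X)
    predictions["Isolation Forest"] = to_fraud_labels(iso.predict(X))

    # Local Outlier Factor: fit and label in a single call.
    lof = LocalOutlierFactor(n_neighbors=20, contamination=contamination)
    predictions["Local Outlier Factor"] = to_fraud_labels(lof.fit_predict(X))

    return {name: (int((pred != y).sum()), float(accuracy_score(y, pred)))
            for name, pred in predictions.items()}


if __name__ == "__main__":
    X, y = make_toy_data()
    for name, (n_errors, accuracy) in compare_detectors(X, y, 0.02).items():
        print(f"{name}: {n_errors} errors, accuracy {accuracy:.4f}")
```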

OUTPUT

RESULTS
On a complex dataset like the one we have used, isolation forest proves to be a
good method, as it detects fraudulent transactions correctly about 30% of the time.

For the Local Outlier Factor algorithm, the total number of errors is 173, which
is comparatively high, and the accuracy is approximately 99.696%. Its F1-score and
precision are not that good: precision is 100% for class 0, but very few
fraudulent transactions are found for class 1.

For the Isolation Forest algorithm, the total number of errors is 127, which
is relatively low, and the accuracy is approximately 99.777%. We get 30%
precision for class 1, and the F1-scores are better than those of the Local
Outlier Factor algorithm.

The Isolation Forest method has given us better results.

We have also compared our two methods, the Isolation Forest algorithm and the
Local Outlier Factor algorithm.

[Figure: Precision, recall and F1-score of Isolation Forest for class 0 and class 1]

[Figure: Precision, recall and F1-score of Local Outlier Factor for class 0 and class 1]

Algorithm               Accuracy (%)
Random Forest           95.5
Decision Tree           94.3
Logistic Regression     90
Isolation Forest        99.77
Local Outlier Factor    99.69

[Figure: Accuracy (%) of Random Forest, Decision Tree, Logistic Regression, Isolation Forest and Local Outlier Factor]

CONCLUSION

The dataset (a .csv file) was imported, pre-processed, explored and described, and
histograms were plotted to check for unusual parameters. A correlation matrix was
computed to identify the parameters that matter for the class. The algorithms we used
for anomaly detection are Isolation Forest and Local Outlier Factor. We have also
understood the significance of examining precision and the data. We have also noticed
that, compared to Local Outlier Factor, Isolation Forest has relatively better
efficiency, precision, F1 and recall scores. Neural networks could be used in the
future to train the system to be more accurate [5]. Fraud detection in credit cards
needs a lot of planning before machine learning algorithms are applied to it. Hence,
we can say that it is a complex issue. However, it makes sure that the card user's
finances are safe, so we can also say that it is an application of machine learning
and data science made for the welfare of the people. Our proposed methods gave us the
highest accuracies. Implementing the system with neural networks, to obtain better
accuracy, will be included in future work.

The advantages are as follows:

1) Reduced number of fraudulent transactions.

2) Credit cards can be safely used for online transactions by the user.

3) There is more security.

There are a few disadvantages:

1) Machine learning algorithms work well on huge datasets. With a small amount of
data, the results might be inaccurate.

2) Quite a lot of data is needed for the machine learning algorithms to become
more accurate.

BIBLIOGRAPHY
1. Dataset collected from [Link]

2. A. Srivastava, M. Yadav, S. Basu, S. Salunkhe and M. Shabad, "Credit card fraud detection
at merchant side using neural networks," 2016 3rd International Conference on
Computing for Sustainable Global Development ([Link]), New Delhi, 2016, pp. 667-670.

3. W. Yu and N. Wang, "Research on Credit Card Fraud Detection Model Based on
Distance Sum," 2009 International Joint Conference on Artificial Intelligence, Hainan Island,
2009, pp. [Link]: 10.1109/JCAI.2009.146

4. I. Sohony, R. Pratap and U. Nambiar, "Ensemble learning for credit card fraud detection,"
2018.

5. Eduonix. (2018, July 26). Eduonix/creditcardML. Retrieved from [Link]

6. [Link]

7. H. A. Shukur, "Credit Card Fraud Detection Using Machine Learning methodologies," 2019.

8. "Credit Card Fraud Detection: A Realistic Modeling and a Novel Learning Strategy," IEEE,
2018.

9. "[Link]." [Link] 17- [Link] [Accessed 6 December 2020].

10. R. Banerjee, G. Bourla, S. Chen, S. Purohit and J. Battipagli, "Comparative Analysis of
Machine Learning Algorithm through Credit Card Fraud Detection," 2018.

11. D. Tripathi, T. Lone, Y. Sharma and S. Dwivedi, "Credit Card Fraud Detection using Local
Outlier Factor," Int. J. Pure Appl. Math., 2018.

12. C. P. Lim, M. Seera, A. K. Nandi, K. Randhawa and C. K. Loo, "Credit Card Fraud Detection
Using AdaBoost and Majority Voting," IEEE Access, 2018.
