ANALYSIS ON CREDIT CARD FRAUD DETECTION
USING MACHINE LEARNING APPROACHES
Submitted to
SAVEETHA INSTITUTE OF MEDICAL AND TECHNICAL SCIENCES
in partial fulfilment for the award of the degree of
BACHELOR OF ENGINEERING IN COMPUTER SCIENCE AND
ENGINEERING
By
PAWAN KUMAR
(Reg. No. 191511034)
Supervisor
Mr. FAHAD IQBAL
SAVEETHA SCHOOL OF ENGINEERING
SIMATS, CHENNAI – 602 105
JAN 2019
BONAFIDE CERTIFICATE
Certified that this project report “ANALYSIS ON CREDIT CARD FRAUD
DETECTION USING MACHINE LEARNING ALGORITHMS” is the
bonafide work of PAWAN KUMAR (Reg. No. 191511034), who carried out the
project work under my supervision.
SIGNATURE SIGNATURE
Dr. CHOKKALINGAM Mr. FAHAD IQBAL
HEAD OF THE DEPARTMENT SUPERVISOR
Department of CSE Assistant Professor, Dept. of CSE
Saveetha School of Engineering Saveetha School of Engineering
Saveetha University Saveetha University
INTERNAL EXAMINER EXTERNAL EXAMINER
DECLARATION BY THE CANDIDATE
I declare that the report entitled “ANALYSIS ON CREDIT CARD FRAUD
DETECTION USING MACHINE LEARNING ALGORITHMS” submitted by
me for the degree of Bachelor of Engineering is the record of the project work carried
out by me under the guidance of Mr. FAHAD IQBAL, and furthermore this work
has not formed the basis for the award of any degree or diploma in this or any other
University or other similar institution of higher learning.
SIGNATURE
PAWAN KUMAR
(Reg. No. 191511034)
ABSTRACT
Financial fraud is an ever-growing menace with far-reaching consequences in the financial industry. Data
mining has played an imperative role in the detection of credit card fraud in online transactions.
Credit card fraud detection, which is a data mining problem, is challenging for two major
reasons: first, the profiles of normal and fraudulent behaviours change constantly, and second,
credit card fraud data sets are highly skewed. The performance of fraud detection in credit card
transactions is greatly affected by the sampling approach applied to the dataset, the selection of
variables and the detection technique(s) used. This paper investigates the performance of naïve Bayes,
k-nearest neighbour and logistic regression on highly skewed credit card fraud data. The dataset of
credit card transactions is sourced from European cardholders and contains 284,807 transactions. A hybrid
technique of under-sampling and over-sampling is carried out on the skewed data. The three
techniques are applied to the raw and preprocessed data. The work is implemented in Python. The
performance of the techniques is evaluated based on accuracy, sensitivity, specificity, precision,
Matthews correlation coefficient and balanced classification rate. The results show that the optimal
accuracies of the naïve Bayes, k-nearest neighbour and logistic regression classifiers are 97.92%,
97.69% and 54.86% respectively. The comparative results show that k-nearest neighbour performs
better than the naïve Bayes and logistic regression techniques.
Keywords: credit card fraud; data mining; naïve Bayes; decision tree; logistic regression;
comparative analysis
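The evaluation measures named in the abstract can all be derived from the four confusion-matrix counts. The following is a minimal Python sketch, not the code used in this project: the function name confusion_metrics and the example counts are hypothetical, fraud is assumed to be the positive class, and the balanced classification rate is taken here as the mean of sensitivity and specificity.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute evaluation metrics from confusion-matrix counts.

    tp/fn: fraudulent transactions predicted as fraud / as normal.
    tn/fp: normal transactions predicted as normal / as fraud.
    Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Matthews correlation coefficient: ranges from -1 to +1, 0 means no skill.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Balanced classification rate (assumed: mean of sensitivity and specificity).
    bcr = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "mcc": mcc,
        "bcr": bcr,
    }


# Hypothetical counts for a classifier that detects most fraud cases.
good = confusion_metrics(tp=80, fp=20, tn=880, fn=20)

# Hypothetical counts for a classifier that labels every transaction as normal
# on a skewed sample with 10 fraud cases out of 1000.
trivial = confusion_metrics(tp=0, fp=0, tn=990, fn=10)

for name, result in (("good", good), ("trivial", trivial)):
    print(name, {k: round(v, 4) for k, v in result.items()})
```

In the second case accuracy is high even though no fraud is detected, while sensitivity and the Matthews correlation coefficient are both zero. This is why accuracy alone is a weak measure on highly skewed data and why the additional measures are reported.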
ACKNOWLEDGEMENT
This project work would not have been possible without the contribution of many people.
It gives me immense pleasure to express my profound gratitude to our honorable Chancellor
Dr. N. M. Veeraiyan, Saveetha University, for his blessings and for being a source of inspiration.
I sincerely thank our Vice Chancellor Dr. Jawahar Nesan for his visionary thoughts and support.
I am indebted to our Director, Mrs. Ramya Deepak, Saveetha
School of Engineering, for providing all the facilities and extended support to gain valuable
education and learning experience.
I register my special thanks to Dr. D. Dhanasekaran, Principal, Saveetha School of
Engineering, and Dr. S. P. Chokkalingam, HOD, Department of Computer Science and Engineering,
for the support given to me in the successful conduct of this project. I wish to express my sincere
gratitude to my supervisor, Mr. Fahad Iqbal, for his inspiring guidance, personal involvement
and constant encouragement during the entire course of this work.
I am grateful to Project Coordinators, Review Panel External and Internal Members and
the entire faculty of the Department of Computer Science and Engineering, for their constructive
criticisms and valuable suggestions which have been a rich source to improve the quality of this
work.
TABLE OF CONTENTS
CHAPTER NO. TITLE
ABSTRACT
LIST OF FIGURES
1 INTRODUCTION
1.1 PROBLEM STATEMENT
1.2 OBJECTIVES
1.3 SCOPE OF THE PROJECT
2 LITERATURE REVIEW
3 PROBLEM STATEMENT AND METHODOLOGY
3.1 PROBLEM STATEMENT
3.2 METHODOLOGY
3.2.1 EXISTING SYSTEM
3.2.2 PROPOSED SYSTEM
3.4 SYSTEM ARCHITECTURE
3.5 UML DIAGRAMS
3.5.1 USE CASE DIAGRAM
3.5.2 CLASS DIAGRAM
3.5.3 SEQUENCE DIAGRAM
3.5.4 ACTIVITY DIAGRAM
3.6 MODULES AND DESCRIPTION
3.6.1 HOME PAGE
3.6.2 COMPOSING MESSAGE
3.6.3 CREATING A SECRET CODE
3.6.4 DECRYPTING THE MESSAGE
3.6.5 ENCRYPTED IMAGE
CONCLUSION AND REFERENCES
LIST OF FIGURES
FIGURE NO. TITLE
3.2 Architecture Diagram
3.3.1 Use case Diagram
3.3.2 Class Diagram
3.3.3 Sequence Diagram
3.3.4 Collaboration Diagram
3.3.5 Deployment Diagram
3.3.6 DFD Diagram
LIST OF ABBREVIATIONS
INTRODUCTION