
Fraud Detection Using Machine Learning

This document outlines a project on fraud detection using machine learning, focusing on analyzing transaction patterns to predict fraudulent activities in real-time. It describes the objectives, tools used, dataset details, data analysis, feature engineering, model training with logistic regression, and the development of a Streamlit web application for predictions. The project demonstrates the complete cycle from data analysis to deployment, with future scope for real-time detection and advanced algorithms.

Uploaded by

Anwesha Jana

FRAUD DETECTION USING MACHINE LEARNING

NAME: ANWESHA JANA


BRANCH: CE32
ROLL_NO: 33

🧠 1. Introduction
In today’s digital world, online financial transactions have increased drastically, leading to a
parallel rise in fraudulent activities. Detecting these frauds in real time has become a critical
challenge for banks and payment systems.
This project, “Fraud Detection using Machine Learning”, aims to analyze transaction
patterns and build a machine learning model that can automatically predict whether a
transaction is fraudulent (1) or legitimate (0) based on several input features.

🎯 2. Objective
The main objective of this project is to:
• Analyze financial transaction data and identify behavioral patterns.
• Train a machine learning model that can distinguish between normal and fraudulent transactions.
• Build a user-friendly Streamlit web app that uses the trained model to predict fraud for new inputs.

🧰 3. Software & Tools Used


Tool / Library          Purpose
Python 3                Programming language used for analysis and model development
Jupyter Notebook        Used for interactive coding and data analysis
Streamlit               For building the web application interface
Pandas                  Data loading and manipulation
NumPy                   Numerical operations and array handling
Matplotlib & Seaborn    Visualization libraries for plotting graphs and insights
Scikit-learn            Machine learning library for model building and evaluation
Joblib                  Saving and loading trained models for reuse

🔍 4. Dataset Description
• Source: Kaggle Fraud Detection Dataset
• File Name: AIML Dataset.csv
• Total Records: Varies (~6 million in full dataset)
• Target Column: isFraud (1 = Fraudulent, 0 = Legitimate)
Key Features:

Column            Description
step              Time step (unit of time when transaction occurred)
type              Type of transaction (TRANSFER, CASH_OUT, PAYMENT, etc.)
amount            Transaction amount
nameOrig          Sender’s name
oldbalanceOrg     Sender’s balance before transaction
newbalanceOrig    Sender’s balance after transaction
nameDest          Receiver’s name
oldbalanceDest    Receiver’s balance before transaction
newbalanceDest    Receiver’s balance after transaction
isFraud           Target label (1 = Fraud, 0 = Not fraud)
isFlaggedFraud    Flag for suspicious transactions

📊 5. Data Analysis (EDA)


The analysis was performed using Pandas, Matplotlib, and Seaborn.
Main Insights:
• The dataset is highly imbalanced: only a small percentage of transactions are fraudulent.
• Fraud mainly occurs in TRANSFER and CASH_OUT transaction types.
• Fraudulent transactions often have a zero balance after transfer (suspicious pattern).
• Logarithmic transformation was used on the amount column for better visualization.
Key Visualizations:
• Bar chart of transaction types.
• Histogram of transaction amounts (log scale).
• Boxplot showing amount distribution in fraud vs non-fraud cases.
• Correlation heatmap showing relationships between numeric features.
• Line chart showing number of frauds over time.
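
The aggregates behind these insights can be sketched in a few lines of Pandas. This is a minimal illustration on a handful of made-up rows (not the Kaggle data); only the column names come from the dataset description, and np.log1p is one reasonable choice for the log transformation, assumed here because the report does not name the exact function.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; column names follow the dataset description.
df = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT",
             "CASH_IN", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "amount": [100.0, 5000.0, 5000.0, 20.0, 300.0, 90000.0, 1500.0, 75.0],
    "newbalanceOrig": [900.0, 0.0, 0.0, 480.0, 1300.0, 10000.0, 500.0, 25.0],
    "isFraud": [0, 1, 1, 0, 0, 0, 0, 0],
})

# Class imbalance: share of rows labelled as fraud.
fraud_rate = df["isFraud"].mean()

# Fraud count per transaction type (the numbers a bar chart would show).
fraud_by_type = df.groupby("type")["isFraud"].sum()

# Share of fraud rows where the sender's balance is zero after the transaction.
zero_balance_share = df.loc[df["isFraud"] == 1, "newbalanceOrig"].eq(0).mean()

# log1p compresses the long right tail of amounts and is defined at zero.
df["log_amount"] = np.log1p(df["amount"])

print(f"fraud rate: {fraud_rate:.2%}")
print(fraud_by_type)
print(f"fraud rows with zero sender balance: {zero_balance_share:.0%}")
```

On the real file, the same expressions would feed the histogram and bar chart listed above.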

⚙️ 6. Feature Engineering


New derived columns were created:
• balanceDiffOrig = oldbalanceOrg - newbalanceOrig
• balanceDiffDest = newbalanceDest - oldbalanceDest
These help the model understand money movement patterns for both sender and receiver.
Unnecessary columns like step, nameOrig, nameDest, and isFlaggedFraud were removed to
simplify the dataset.
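
As a rough sketch, the two derived columns and the column removal could look like this in Pandas. The two rows are invented for illustration; the column names and formulas are the ones given above.

```python
import pandas as pd

# Two invented rows with the dataset's column layout.
df = pd.DataFrame({
    "step": [1, 1],
    "type": ["TRANSFER", "PAYMENT"],
    "amount": [1000.0, 50.0],
    "nameOrig": ["C1", "C2"],
    "oldbalanceOrg": [1000.0, 200.0],
    "newbalanceOrig": [0.0, 150.0],
    "nameDest": ["C3", "M1"],
    "oldbalanceDest": [500.0, 0.0],
    "newbalanceDest": [1500.0, 0.0],
    "isFraud": [1, 0],
    "isFlaggedFraud": [0, 0],
})

# Money leaving the sender's account and arriving at the receiver's account.
df["balanceDiffOrig"] = df["oldbalanceOrg"] - df["newbalanceOrig"]
df["balanceDiffDest"] = df["newbalanceDest"] - df["oldbalanceDest"]

# Drop the columns the report lists as unnecessary.
df = df.drop(columns=["step", "nameOrig", "nameDest", "isFlaggedFraud"])

print(df[["type", "amount", "balanceDiffOrig", "balanceDiffDest", "isFraud"]])
```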

🤖 7. Machine Learning Model

Algorithm Used: Logistic Regression
• Logistic Regression was chosen because it:
o Is simple and fast to train.
o Is effective for binary classification problems.
o Provides interpretable coefficients (relationship strength).
Preprocessing Pipeline:
ColumnTransformer and Pipeline were used to:
• Standardize numeric features using StandardScaler.
• Encode the categorical variable type using OneHotEncoder.
• Chain preprocessing with the classifier, which was configured with class_weight="balanced" to compensate for the class imbalance.
Model Training Steps:
1. Split data into training and testing sets using train_test_split.
2. Build preprocessing + classifier pipeline.
3. Train model using .fit(X_train, y_train).
4. Predict using .predict(X_test).
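
The preprocessing pipeline and the four training steps can be sketched as follows. This is an illustration under stated assumptions, not the project's exact notebook: the data is randomly generated, and the feature list, the stratified 70/30 split, random_state, max_iter, and handle_unknown="ignore" are all choices made here rather than values taken from the report.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in rows (NOT the Kaggle data), shaped like the engineered dataset.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "type": rng.choice(["TRANSFER", "CASH_OUT", "PAYMENT"], size=n),
    "amount": rng.lognormal(mean=8.0, sigma=1.5, size=n),
    "oldbalanceOrg": rng.uniform(0, 100_000, size=n),
    "oldbalanceDest": rng.uniform(0, 100_000, size=n),
})
# Toy labelling rule: a minority of TRANSFER / CASH_OUT rows are marked as fraud.
risky = df["type"].isin(["TRANSFER", "CASH_OUT"])
df["isFraud"] = (risky & (rng.random(n) < 0.15)).astype(int)
# In this toy data, a fraud empties the sender's account.
remaining = (df["oldbalanceOrg"] - df["amount"]).clip(lower=0)
df["newbalanceOrig"] = remaining.where(df["isFraud"] == 0, 0.0)
df["newbalanceDest"] = df["oldbalanceDest"] + df["amount"]
df["balanceDiffOrig"] = df["oldbalanceOrg"] - df["newbalanceOrig"]
df["balanceDiffDest"] = df["newbalanceDest"] - df["oldbalanceDest"]

numeric_features = [
    "amount", "oldbalanceOrg", "newbalanceOrig",
    "oldbalanceDest", "newbalanceDest",
    "balanceDiffOrig", "balanceDiffDest",
]
categorical_features = ["type"]
X = df[numeric_features + categorical_features]
y = df["isFraud"]

# Step 1: split; stratify keeps the fraud share equal in both parts (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Step 2: preprocessing + classifier in one pipeline.
preprocessor = ColumnTransformer(transformers=[
    ("num", StandardScaler(), numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])
pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Steps 3 and 4: train, then predict on the held-out rows.
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
```

Keeping the scaler and encoder inside the pipeline means they are fitted on the training rows only, and the saved object can later accept raw input columns directly.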
Evaluation Metrics:
• Confusion Matrix
• Classification Report (Precision, Recall, F1-score)
• Accuracy Score
The model achieved good recall and precision for detecting fraud given the class imbalance.
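
The three metrics can be computed with Scikit-learn as sketched below. The labels are hand-written for illustration and are not the project's results; they are chosen to show why accuracy alone is a weak signal on imbalanced data.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hand-written example labels (1 = fraud), NOT results from the project.
y_test = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
y_pred = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
fraud_precision = precision_score(y_test, y_pred)  # of flagged rows, how many are fraud
fraud_recall = recall_score(y_test, y_pred)        # of fraud rows, how many were flagged

print(cm)
print(classification_report(y_test, y_pred, digits=3))
print(f"accuracy={accuracy:.2f} precision={fraud_precision:.2f} recall={fraud_recall:.2f}")
```

In this toy case accuracy is 0.80 even though only one of the two frauds is caught (recall 0.50), which is why precision and recall on the fraud class are the more informative numbers.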
Model Saving:
import joblib
joblib.dump(pipeline, "fraud_detection_pipeline.pkl")

This saved the trained model for reuse in the Streamlit app.
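
A quick way to check that a saved pipeline reloads correctly is a round trip through joblib. The sketch below uses a tiny stand-in pipeline and a temporary directory (both assumptions for illustration); only the file name matches the one used above.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Tiny stand-in pipeline trained on made-up numbers.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
])
pipeline.fit(X, y)

# Save to a temporary location, then load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "fraud_detection_pipeline.pkl")
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

# The restored object should predict exactly like the original.
same_predictions = bool((restored.predict(X) == pipeline.predict(X)).all())
print(same_predictions)
```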

💾 8. Application Development (Streamlit App)

File: fraud_detection.py
A simple user interface was built using Streamlit for real-time fraud prediction.
Steps:
1. Load the trained model using joblib.load().
2. Accept user input values such as:
o Transaction Type
o Amount
o Old/New Sender Balance
o Old/New Receiver Balance
3. On clicking Predict, create a DataFrame from input values.
4. Pass the DataFrame to model.predict().
5. Display the prediction result using st.success() or st.error() messages.
Run Command:
python -m streamlit run fraud_detection.py
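
The app steps above might be organized roughly as follows. This is a sketch, not the project's actual fraud_detection.py: the helper names build_input_frame and run_app, the widget labels, the default values, and the list of transaction types are hypothetical, and it assumes the saved pipeline was trained on the two derived balance columns in addition to the raw inputs.

```python
import pandas as pd

MODEL_PATH = "fraud_detection_pipeline.pkl"  # file name from the Model Saving step


def build_input_frame(tx_type, amount, old_orig, new_orig, old_dest, new_dest):
    """Return a one-row DataFrame using the dataset's column names."""
    row = {
        "type": tx_type,
        "amount": amount,
        "oldbalanceOrg": old_orig,
        "newbalanceOrig": new_orig,
        "oldbalanceDest": old_dest,
        "newbalanceDest": new_dest,
        # Assumption: the model expects the derived features as well.
        "balanceDiffOrig": old_orig - new_orig,
        "balanceDiffDest": new_dest - old_dest,
    }
    return pd.DataFrame([row])


def run_app():
    """Hypothetical entry point; Streamlit is imported here so the helper above stays importable without it."""
    import joblib
    import streamlit as st

    model = joblib.load(MODEL_PATH)  # step 1: load the trained pipeline

    st.title("Fraud Detection")
    # Step 2: collect inputs. Only the types named in the report are listed here.
    tx_type = st.selectbox("Transaction Type", ["TRANSFER", "CASH_OUT", "PAYMENT"])
    amount = st.number_input("Amount", min_value=0.0, value=1000.0)
    old_orig = st.number_input("Old Sender Balance", min_value=0.0, value=1000.0)
    new_orig = st.number_input("New Sender Balance", min_value=0.0, value=0.0)
    old_dest = st.number_input("Old Receiver Balance", min_value=0.0, value=0.0)
    new_dest = st.number_input("New Receiver Balance", min_value=0.0, value=0.0)

    if st.button("Predict"):
        # Steps 3 and 4: build the DataFrame and ask the model for a label.
        frame = build_input_frame(tx_type, amount, old_orig, new_orig, old_dest, new_dest)
        prediction = int(model.predict(frame)[0])
        # Step 5: show the result.
        if prediction == 1:
            st.error("This transaction may be fraudulent.")
        else:
            st.success("This transaction looks legitimate.")
```

In the script itself, a final line calling run_app() would start the interface when the file is launched with the run command above.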

🧩 9. Workflow Diagram

┌──────────────────────────┐
│    Dataset (CSV File)    │
└────────────┬─────────────┘
             ▼
📊 Data Cleaning & EDA
- Analyze features
- Visualize patterns
- Handle missing values
             ▼
⚙️ Feature Engineering
- Create new columns
- Drop irrelevant data
             ▼
🤖 Model Training
- Logistic Regression
- Preprocessing Pipeline
             ▼
🧪 Evaluation
- Accuracy, F1-score, Confusion Matrix
             ▼
💾 Save Model (.pkl)
- Using Joblib
             ▼
💻 Streamlit App
- Load model
- Take user inputs
- Predict fraud or not

📈 10. Results
• Fraud transactions are extremely rare compared to non-fraud ones.
• The trained model successfully identifies high-risk transactions.
• Streamlit interface allows quick and interactive predictions.

✅ 11. Conclusion
This project demonstrates how machine learning can be used to detect fraudulent financial
transactions efficiently.
It showcases the complete cycle:
• From data analysis and visualization
• To model training and evaluation
• To deployment in a working web application.
The model can be further improved using advanced algorithms (Random Forest, XGBoost)
and real-time streaming data integration.

🔮 12. Future Scope


• Implement real-time fraud detection using APIs.
• Use deep learning models like LSTM for sequential transaction data.
• Build dashboards for transaction monitoring and alerts.

📚 13. References
• Kaggle Dataset: https://www.kaggle.com/datasets/amanalisiddiqui/fraud-detection-dataset
• Scikit-learn Documentation: https://scikit-learn.org/
• Streamlit Documentation: https://docs.streamlit.io/
• Python Official Docs: https://docs.python.org/3/
