FRAUD DETECTION USING
MACHINE LEARNING
NAME: ANWESHA JANA
BRANCH: CE32
ROLL_NO: 33
🧠 1. Introduction
In today’s digital world, online financial transactions have increased drastically, leading to a
parallel rise in fraudulent activities. Detecting these frauds in real time has become a critical
challenge for banks and payment systems.
This project, “Fraud Detection using Machine Learning”, aims to analyze transaction
patterns and build a machine learning model that can automatically predict whether a
transaction is fraudulent (1) or legitimate (0) based on several input features.
🎯 2. Objective
The main objectives of this project are to:
- Analyze financial transaction data and identify behavioral patterns.
- Train a machine learning model that can distinguish between normal and fraudulent transactions.
- Build a user-friendly Streamlit web app that uses the trained model to predict fraud for new inputs.
🧰 3. Software & Tools Used
Tool / Library          Purpose
Python 3                Programming language used for analysis and model development
Jupyter Notebook        Used for interactive coding and data analysis
Streamlit               For building the web application interface
Pandas                  Data loading and manipulation
NumPy                   Numerical operations and array handling
Matplotlib & Seaborn    Visualization libraries for plotting graphs and insights
Scikit-learn            Machine learning library for model building and evaluation
Joblib                  Saving and loading trained models for reuse
🔍 4. Dataset Description
Source: Kaggle Fraud Detection Dataset
File Name: AIML Dataset.csv
Total Records: Varies (~6 million in full dataset)
Target Column: isFraud (1 = Fraudulent, 0 = Legitimate)
Key Features:
Column            Description
step              Time step (unit of time when transaction occurred)
type              Type of transaction (TRANSFER, CASH_OUT, PAYMENT, etc.)
amount            Transaction amount
nameOrig          Sender’s name
oldbalanceOrg     Sender’s balance before transaction
newbalanceOrig    Sender’s balance after transaction
nameDest          Receiver’s name
oldbalanceDest    Receiver’s balance before transaction
newbalanceDest    Receiver’s balance after transaction
isFraud           Target label (1 = Fraud, 0 = Not fraud)
isFlaggedFraud    Flag for suspicious transactions
📊 5. Data Analysis (EDA)
The analysis was performed using Pandas, Matplotlib, and Seaborn.
Main Insights:
- The dataset is highly imbalanced: only a small percentage of transactions are fraudulent.
- Fraud mainly occurs in the TRANSFER and CASH_OUT transaction types.
- Fraudulent transactions often leave the sender with a zero balance after the transfer (a suspicious pattern).
- A logarithmic transformation was applied to the amount column for better visualization.
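The insights above can be checked with a few Pandas operations. The following is a minimal sketch on a tiny made-up sample that only borrows the dataset's column names; the values are illustrative and are not taken from the real data. The use of np.log1p for the logarithmic transformation is an assumption, since the report does not state which log variant was applied.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample using the dataset's column names (not real data).
df = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT", "CASH_IN", "TRANSFER"],
    "amount": [120.0, 50000.0, 50000.0, 35.5, 900.0, 2500.0],
    "oldbalanceOrg": [500.0, 50000.0, 50000.0, 100.0, 0.0, 4000.0],
    "newbalanceOrig": [380.0, 0.0, 0.0, 64.5, 900.0, 1500.0],
    "isFraud": [0, 1, 1, 0, 0, 0],
})

# Class imbalance: fraction of rows labelled as fraud.
fraud_share = df["isFraud"].mean()

# Number of fraud cases per transaction type.
fraud_by_type = df.groupby("type")["isFraud"].sum()

# Fraction of fraud rows where the sender's balance is zero afterwards.
fraud_rows = df[df["isFraud"] == 1]
zero_after_share = (fraud_rows["newbalanceOrig"] == 0).mean()

# log1p (assumed variant) compresses the long right tail of amounts for plotting.
df["log_amount"] = np.log1p(df["amount"])

print(f"fraud share: {fraud_share:.3f}")
print(fraud_by_type)
print(f"fraud rows with zero sender balance afterwards: {zero_after_share:.2f}")
```

On the full dataset the same expressions would be run on the DataFrame loaded from the CSV file.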
Key Visualizations:
- Bar chart of transaction types.
- Histogram of transaction amounts (log scale).
- Boxplot showing amount distribution in fraud vs non-fraud cases.
- Correlation heatmap showing relationships between numeric features.
- Line chart showing number of frauds over time.
⚙️ 6. Feature Engineering
New derived columns were created:
balanceDiffOrig = oldbalanceOrg - newbalanceOrig
balanceDiffDest = newbalanceDest - oldbalanceDest
These help the model understand money movement patterns for both sender and receiver.
Unnecessary columns like step, nameOrig, nameDest, and isFlaggedFraud were removed to
simplify the dataset.
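A minimal sketch of this step is shown here. The helper name add_balance_features and the one-row sample are hypothetical and used only for illustration; the column names and the two formulas are the ones given in this section.

```python
import pandas as pd


def add_balance_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add the two balance-difference columns and drop the unused ones."""
    out = df.copy()
    # Money leaving the sender's account.
    out["balanceDiffOrig"] = out["oldbalanceOrg"] - out["newbalanceOrig"]
    # Money arriving in the receiver's account.
    out["balanceDiffDest"] = out["newbalanceDest"] - out["oldbalanceDest"]
    return out.drop(columns=["step", "nameOrig", "nameDest", "isFlaggedFraud"])


# One made-up row with the dataset's column layout (illustrative values).
sample = pd.DataFrame({
    "step": [1], "type": ["TRANSFER"], "amount": [181.0],
    "nameOrig": ["C1"], "oldbalanceOrg": [181.0], "newbalanceOrig": [0.0],
    "nameDest": ["C2"], "oldbalanceDest": [0.0], "newbalanceDest": [181.0],
    "isFraud": [1], "isFlaggedFraud": [0],
})

features = add_balance_features(sample)
print(features.columns.tolist())
```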
🤖 7. Machine Learning Model
Algorithm Used: Logistic Regression
Logistic Regression was chosen because it:
o Is simple and fast to train.
o Is effective for binary classification problems.
o Provides interpretable coefficients (relationship strength).
Preprocessing Pipeline:
ColumnTransformer and Pipeline were used to:
- Standardize numeric features using StandardScaler.
- Encode the categorical variable type using OneHotEncoder.
Class imbalance was handled by setting class_weight="balanced" on the classifier.
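One way such a pipeline could be assembled is sketched here. The numeric column list is an assumption (the balance and amount columns that remain after feature engineering), and the step names "prep" and "clf", handle_unknown="ignore", and max_iter=1000 are illustrative choices rather than settings confirmed by the project.

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed feature lists; adjust to the columns actually kept.
NUMERIC = [
    "amount", "oldbalanceOrg", "newbalanceOrig",
    "oldbalanceDest", "newbalanceDest",
    "balanceDiffOrig", "balanceDiffDest",
]
CATEGORICAL = ["type"]


def build_pipeline() -> Pipeline:
    """Preprocessing (scale + one-hot) followed by logistic regression."""
    preprocessor = ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), NUMERIC),
            # Unknown transaction types at prediction time are encoded as all zeros.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ]
    )
    return Pipeline(
        steps=[
            ("prep", preprocessor),
            # class_weight="balanced" reweights classes inversely to their frequency.
            ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
        ]
    )


pipeline = build_pipeline()
print(pipeline)
```

Keeping the preprocessing inside the pipeline means the same scaling and encoding are applied automatically at prediction time.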
Model Training Steps:
1. Split data into training and testing sets using train_test_split.
2. Build preprocessing + classifier pipeline.
3. Train model using .fit(X_train, y_train).
4. Predict using .predict(X_test).
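The four steps can be sketched end to end as follows. To keep the example self-contained it uses synthetic data with only two features (type and amount); the data, the stratified split, the test_size of 0.25, and the random seeds are all assumptions for illustration and do not reproduce the project's actual run.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic, imbalanced data: 40 fraud rows out of 400 (illustrative only).
rng = np.random.default_rng(0)
n = 400
y = np.array([1] * 40 + [0] * 360)
amount = np.where(y == 1, rng.exponential(50_000.0, n), rng.exponential(1_000.0, n))
X = pd.DataFrame({
    "type": rng.choice(["PAYMENT", "TRANSFER", "CASH_OUT"], size=n),
    "amount": amount,
})

# 1. Split into training and testing sets (stratified to keep the fraud ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# 2. Build the preprocessing + classifier pipeline.
pipeline = Pipeline([
    ("prep", ColumnTransformer([
        ("num", StandardScaler(), ["amount"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["type"]),
    ])),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# 3. Train the model.
pipeline.fit(X_train, y_train)

# 4. Predict on the held-out set.
y_pred = pipeline.predict(X_test)
print(y_pred[:10])
```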
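Evaluation Metrics:
- Confusion Matrix
- Classification Report (Precision, Recall, F1-score)
- Accuracy Score
The model achieved good recall and precision for detecting fraud given the class imbalance.
Because fraudulent transactions are so rare, accuracy alone can look high even when most
frauds are missed, so precision and recall on the fraud class are the more informative measures.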
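A small sketch of how these metrics are computed with Scikit-learn is given here. The label vectors are hypothetical and chosen by hand so the numbers are easy to follow; they are not the project's results.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical labels: 2 fraud cases among 10 transactions.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

acc = accuracy_score(y_true, y_pred)

print(cm)
print(classification_report(y_true, y_pred, target_names=["legitimate", "fraud"]))
print(f"accuracy: {acc:.2f}")
```

In this toy case the accuracy is 0.80 although only one of the two frauds is caught, and a model that labels every transaction as legitimate would also reach 0.80. This is why the fraud-class precision and recall matter more than accuracy on imbalanced data.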
Model Saving:
import joblib
joblib.dump(pipeline, "fraud_detection_pipeline.pkl")
This saved the trained model for reuse in the Streamlit app.
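The save and load round trip can be sketched as follows. A tiny stand-in model fitted on four made-up points takes the place of the trained pipeline, and a temporary directory is used so the example leaves no files behind; both are assumptions for illustration.

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LogisticRegression

# Stand-in model on made-up points; in the project this is the trained pipeline.
model = LogisticRegression().fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fraud_detection_pipeline.pkl")
    joblib.dump(model, path)        # serialize the fitted estimator to disk
    restored = joblib.load(path)    # read it back, as the app would

# The restored model should give the same predictions as the original.
probe = [[0.5], [2.5]]
same = bool((restored.predict(probe) == model.predict(probe)).all())
print(same)
```

A saved model should be loaded with the same Scikit-learn version that was used to train it, since pickled estimators are not guaranteed to be compatible across versions.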
💾 8. Application Development (Streamlit App)
File: fraud_detection.py
A simple user interface was built using Streamlit for real-time fraud prediction.
Steps:
1. Load the trained model using joblib.load().
2. Accept user input values such as:
o Transaction Type
o Amount
o Old/New Sender Balance
o Old/New Receiver Balance
3. On clicking Predict, create a DataFrame from input values.
4. Pass the DataFrame to model.predict().
5. Display the prediction result using st.success() or st.error() messages.
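Steps 3 and 4 can be sketched as a small helper that turns the form values into a one-row DataFrame. The function name build_input_row and its parameter names are hypothetical. The sketch also assumes that the saved pipeline expects the derived balanceDiffOrig and balanceDiffDest columns from the Feature Engineering section; if the pipeline was trained without them, those two entries should be dropped.

```python
import pandas as pd


def build_input_row(tx_type, amount, old_orig, new_orig, old_dest, new_dest) -> pd.DataFrame:
    """Build a one-row DataFrame in the column layout assumed for the pipeline."""
    return pd.DataFrame([{
        "type": tx_type,
        "amount": amount,
        "oldbalanceOrg": old_orig,
        "newbalanceOrig": new_orig,
        "oldbalanceDest": old_dest,
        "newbalanceDest": new_dest,
        # Derived features, computed the same way as during training.
        "balanceDiffOrig": old_orig - new_orig,
        "balanceDiffDest": new_dest - old_dest,
    }])


# In the app these values would come from the Streamlit input widgets.
row = build_input_row("TRANSFER", 5000.0, 5000.0, 0.0, 0.0, 5000.0)
print(row.T)

# With the loaded model, the prediction step would then be:
#   prediction = model.predict(row)[0]   # 1 = fraud, 0 = legitimate
```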
Run Command:
python -m streamlit run fraud_detection.py
🧩 9. Workflow Diagram
┌──────────────────────────┐
│ Dataset (CSV File) │
└───────────┬─────────────┘
│
▼
📊 Data Cleaning & EDA
- Analyze features
- Visualize patterns
- Handle missing values
│
▼
⚙️ Feature Engineering
- Create new columns
- Drop irrelevant data
│
▼
🤖 Model Training
- Logistic Regression
- Preprocessing Pipeline
│
▼
🧪 Evaluation
- Accuracy, F1-score, Confusion Matrix
│
▼
💾 Save Model (.pkl)
- Using Joblib
│
▼
💻 Streamlit App
- Load model
- Take user inputs
- Predict fraud or not
📈 10. Results
- Fraudulent transactions are extremely rare compared to legitimate ones.
- The trained model successfully identifies high-risk transactions.
- The Streamlit interface allows quick and interactive predictions.
✅ 11. Conclusion
This project demonstrates how machine learning can be used to detect fraudulent financial
transactions efficiently.
It showcases the complete cycle, from data analysis and visualization, through model
training and evaluation, to deployment in a working web application.
The model can be further improved using advanced algorithms (Random Forest, XGBoost)
and real-time streaming data integration.
🔮 12. Future Scope
- Implement real-time fraud detection using APIs.
- Use deep learning models like LSTM for sequential transaction data.
- Build dashboards for transaction monitoring and alerts.
📚 13. References
- Kaggle Dataset: https://www.kaggle.com/datasets/amanalisiddiqui/fraud-detection-dataset
- Scikit-learn Documentation: https://scikit-learn.org/
- Streamlit Documentation: https://docs.streamlit.io/
- Python Official Docs: https://docs.python.org/3/