Credit Card Fraud Detection
For this project, I built a machine learning model to detect fraudulent credit card
transactions. The dataset had 1,000 entries, and the target column is_fraud indicated whether a
transaction was fraudulent or legitimate.
Data Preprocessing
The dataset included both numerical and categorical columns. I used one-hot encoding for
categorical features like gender, transaction category, and state, while I applied frequency
encoding for city names to avoid a high number of dummy variables. I also removed columns like
latitude, longitude, and credit card number, which I felt were either too specific or irrelevant for
the model. After encoding, I checked for class imbalance and used SMOTE to oversample the
minority (fraud) class so the models would see more fraud examples during training.
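To make the two less familiar steps above concrete, here is a minimal standard-library sketch of frequency encoding and of the interpolation idea behind SMOTE. It is an illustration, not the code used in this project: the function names, city values, and feature rows are made up, and a real pipeline would more likely rely on a library implementation of SMOTE.

```python
import random
from collections import Counter


def frequency_encode(values):
    """Replace each category with its relative frequency in `values`."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (the core SMOTE idea).
    Assumes `minority` holds at least two rows of numeric features."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Made-up city column: one number per row instead of one dummy column per city.
cities = ["Austin", "Austin", "Reno", "Austin", "Boise"]
city_freq = frequency_encode(cities)

# Made-up fraud rows with two numeric features each.
fraud_rows = [[1.0, 2.0], [1.5, 1.8], [3.0, 4.0]]
new_rows = smote_like_oversample(fraud_rows, n_new=4)

print(city_freq)
print(new_rows)
```

Each synthetic row lies on a line segment between two real fraud rows, which is why oversampling this way adds variety rather than exact duplicates.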
Model Selection and Evaluation
I tried three models: Logistic Regression, Random Forest, and XGBoost. I chose these because
they’re commonly used for classification tasks and work well on tabular data. For evaluation, I
used accuracy, precision, recall, and F1-score. Because fraud detection is an imbalanced
classification problem, precision and recall matter more than accuracy alone.
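The point about accuracy can be shown with a small sketch. The labels below are invented for illustration (95 legitimate and 5 fraudulent transactions) and are not results from this project; the helper name classification_metrics is also hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive (fraud = 1) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up ground truth: 95 legitimate (0) and 5 fraudulent (1) transactions.
y_true = [0] * 95 + [1] * 5

# A "model" that never flags fraud.
always_legit = [0] * 100

# A model that raises 3 false alarms and catches 2 of the 5 frauds.
y_pred = [0] * 92 + [1] * 3 + [1] * 2 + [0] * 3

baseline = classification_metrics(y_true, always_legit)
model = classification_metrics(y_true, y_pred)
print(baseline)
print(model)
```

The never-flag baseline has the higher accuracy of the two, yet its recall and F1 are zero, so it is useless for catching fraud. The second model is slightly less accurate but actually finds some fraudulent transactions.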
Here’s what I found:
Random Forest gave the best result overall with an F1-score of ~0.28.
XGBoost followed closely.
Visualizations and Insights
I created several graphs during EDA. The most useful was the correlation heatmap, which shows
how strongly each pair of features is correlated. Interestingly, most features weren’t strongly
correlated, suggesting the model has to learn from patterns across multiple weak signals. I also
visualized transaction trends by day of the week and the fraud distribution, which helped guide
some of the preprocessing steps.
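The day-of-week view boils down to a simple aggregation, sketched here with the standard library only. The timestamp format and the sample rows are assumptions for illustration; the actual dataset's column layout may differ.

```python
from collections import defaultdict
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def fraud_rate_by_weekday(transactions):
    """Return {weekday name: share of that day's transactions that are fraud}.

    `transactions` is an iterable of (timestamp string, is_fraud) pairs,
    with timestamps assumed to look like "2024-01-06 02:30:00".
    """
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for timestamp, is_fraud in transactions:
        parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        day = DAY_NAMES[parsed.weekday()]  # weekday(): Monday is 0
        totals[day] += 1
        frauds[day] += is_fraud
    return {day: frauds[day] / totals[day] for day in totals}


# Made-up rows: (transaction time, is_fraud).
transactions = [
    ("2024-01-01 10:00:00", 0),
    ("2024-01-08 23:15:00", 1),
    ("2024-01-06 02:30:00", 1),
    ("2024-01-06 14:00:00", 0),
    ("2024-01-13 09:00:00", 0),
]

rates = fraud_rate_by_weekday(transactions)
print(rates)
```

Plotting the resulting dictionary as a bar chart gives the kind of per-day trend described above.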
Challenges and Learnings
One of the biggest challenges was handling the imbalanced dataset: many models performed
poorly on fraud cases despite decent accuracy. I also had to clean and encode the data carefully
to avoid errors such as unencoded string values crashing model training.
In just a week, I learned a lot. Even though the metrics weren’t perfect, this was a valuable
starting point and I’m excited to keep improving.