
Data Science Project

This report analyzes consumer behavior using e-commerce transaction data from a UK-based online retail company, focusing on purchasing patterns, customer segmentation, and churn prediction. The dataset includes over 18,000 transactions and identifies key trends such as peak purchasing times and popular products, while employing RFM analysis and KMeans clustering to segment customers. A highly accurate Random Forest model predicts customer churn, with recommendations for personalized marketing and inventory optimization to enhance business strategies.

Uploaded by

Mehar M Moeed
Copyright
© © All Rights Reserved

Title: Comprehensive Analysis of Consumer Behavior Using E-commerce Transaction Data


Data Science
Instructor: Miss Andleeb Akram
Sec A
Submitted By
Abdul Moeed B-26546 Fall 2021-2025
Ahmad Usman B-26763 Fall 2021-2025
Haider Ali B-26714 Fall 2021-2025

University of South Asia


Department of Computer Science

1. Introduction

In the rapidly evolving digital economy, e-commerce businesses thrive on their
ability to understand and anticipate consumer behavior. With the wealth of
transactional data available today, data science techniques can be employed to
extract actionable insights that drive marketing, improve product
recommendations, and enhance customer retention strategies.

This report presents a deep dive into a UK-based online retail company's
transactional dataset collected between December 2010 and December 2011. The
overarching goals are to understand purchasing patterns, identify customer
segments, and build predictive models for customer churn. The insights derived
will serve as a blueprint for making informed, data-driven business decisions.

2. Data Overview

The dataset under investigation includes over 18,000 retail transactions made by
4,338 customers across 37 countries, featuring 3,877 unique products. Key
features in the dataset include:

• InvoiceNo: Transaction identifier
• StockCode: Product ID
• Description: Product name
• Quantity: Units purchased per transaction
• InvoiceDate: Timestamp of the purchase
• UnitPrice: Price per item
• CustomerID: Unique customer identifier
• Country: Customer’s location

The data was first cleaned to remove missing values (especially missing
CustomerIDs), negative quantities (indicating returns or errors), and duplicates. A
final dataset with 18,532 valid transactions formed the basis for further
exploration.
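
The cleaning steps described above can be sketched with pandas. This is a
minimal illustration under stated assumptions, not the project's actual code:
the helper name clean_transactions and the sample rows are hypothetical, and
only the column names come from the dataset description.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows without a CustomerID, negative quantities, and exact duplicates."""
    cleaned = df.dropna(subset=["CustomerID"])      # missing customer identifiers
    cleaned = cleaned[cleaned["Quantity"] >= 0]     # returns or entry errors
    cleaned = cleaned.drop_duplicates()             # repeated rows
    return cleaned.reset_index(drop=True)


# Tiny hypothetical sample, not taken from the real dataset.
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "536366", "C536367", "536368"],
    "StockCode": ["85123A", "85123A", "22633", "22633", "21730"],
    "Quantity": [6, 6, 2, -2, 3],
    "UnitPrice": [2.55, 2.55, 1.85, 1.85, 4.25],
    "CustomerID": [17850.0, 17850.0, 17850.0, 17850.0, None],
})
cleaned = clean_transactions(raw)
print(cleaned)
```

The order of the three filters does not affect the result; each one only
removes rows.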

Summary Statistics:

• Unique Customers: 4,338
• Distinct Products: 3,877
• Unique Transactions: 18,532
• Transaction Period: December 1, 2010 – December 9, 2011
• Countries Represented: 37

3. Purchase Trend Analysis

3.1 Top-Selling Products

To determine product popularity, total quantity sold was aggregated. The top 10
most purchased products include:

1. White Hanging Heart T-Light Holder
2. Regency Cake Stand
3. Jumbo Bag Red Retrospot
4. Party Bunting
5. Paper Chain Kit
6. Feltcraft Princess Doll Kit
7. Pack of 72 Retrospot Cake Cases
8. Assorted Colour Bird Ornament
9. Set of 3 Cake Tins Pantry Design
10. Pack of 60 Pink Paisley Cake Cases

These items suggest a dominant market for decorative and party-oriented goods,
which can guide future stock planning and promotional efforts.
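
The ranking by total quantity sold can be sketched as a group-and-sum over
product descriptions. The function name top_products and the sample quantities
are hypothetical and serve only to illustrate the aggregation.

```python
import pandas as pd


def top_products(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Total units sold per product, largest first."""
    return df.groupby("Description")["Quantity"].sum().nlargest(n)


# Hypothetical rows for illustration only.
sample = pd.DataFrame({
    "Description": ["PARTY BUNTING", "PAPER CHAIN KIT",
                    "PARTY BUNTING", "REGENCY CAKESTAND"],
    "Quantity": [4, 10, 8, 3],
})
top = top_products(sample, n=2)
print(top)
```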

3.2 Hourly and Daily Patterns

Hourly transaction analysis revealed that most purchases occur between 10 AM and
3 PM, peaking at 12 PM, likely coinciding with work breaks. Daily activity
showed Tuesday and Thursday as peak days, suggesting windows for targeted
marketing.
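
A sketch of how hourly and daily purchase counts might be derived from
InvoiceDate follows. It assumes that a "purchase" means one invoice, so line
items are collapsed to one row per InvoiceNo first; the report does not state
this, and the helper name purchase_counts and the sample rows are hypothetical.

```python
import pandas as pd


def purchase_counts(df: pd.DataFrame):
    """Count invoices per hour of day and per weekday."""
    invoices = df.drop_duplicates("InvoiceNo")  # one row per invoice (assumption)
    by_hour = invoices["InvoiceDate"].dt.hour.value_counts().sort_index()
    by_day = invoices["InvoiceDate"].dt.day_name().value_counts()
    return by_hour, by_day


# Hypothetical rows; invoice "A" has two line items.
sample = pd.DataFrame({
    "InvoiceNo": ["A", "A", "B", "C"],
    "InvoiceDate": pd.to_datetime([
        "2011-03-01 12:10", "2011-03-01 12:10",
        "2011-03-01 12:45", "2011-03-03 10:05",
    ]),
})
by_hour, by_day = purchase_counts(sample)
print(by_hour)
print(by_day)
```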

3.3 Monthly Trends

Sales volume increased sharply in November and December, coinciding with the
holiday season. A time series plot of daily sales revealed predictable seasonality,
crucial for inventory and marketing planning.
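
The monthly aggregation can be sketched as below. The report does not say
whether "sales volume" means units or revenue; this example assumes revenue
(Quantity × UnitPrice), and the function name monthly_revenue and the sample
rows are hypothetical.

```python
import pandas as pd


def monthly_revenue(df: pd.DataFrame) -> pd.Series:
    """Sum Quantity * UnitPrice per calendar month (labels like '2011-11')."""
    revenue = df["Quantity"] * df["UnitPrice"]
    months = df["InvoiceDate"].dt.to_period("M").astype(str)
    return revenue.groupby(months).sum()


# Hypothetical rows for illustration only.
sample = pd.DataFrame({
    "InvoiceDate": pd.to_datetime(["2011-10-05", "2011-11-02", "2011-11-20"]),
    "Quantity": [2, 3, 1],
    "UnitPrice": [5.0, 4.0, 8.0],
})
monthly = monthly_revenue(sample)
print(monthly)
```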

3.4 Additional Insights

A heatmap of purchases by hour and weekday demonstrated heightened sales
activity during late mornings on weekdays, with a drop-off on weekends.
Additionally, the distribution of order values was right-skewed, indicating
that most purchases are low-to-mid value.
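
The table behind such a heatmap is a weekday-by-hour count of invoices. The
sketch below builds it with pandas.crosstab; the helper name
weekday_hour_table and the sample rows are hypothetical, and the resulting
table could be passed to any plotting library.

```python
import pandas as pd


def weekday_hour_table(df: pd.DataFrame) -> pd.DataFrame:
    """Rows are weekday names, columns are hours, cells are invoice counts."""
    invoices = df.drop_duplicates("InvoiceNo")  # one row per invoice (assumption)
    return pd.crosstab(
        invoices["InvoiceDate"].dt.day_name(),
        invoices["InvoiceDate"].dt.hour,
        rownames=["weekday"],
        colnames=["hour"],
    )


# Hypothetical rows for illustration only.
sample = pd.DataFrame({
    "InvoiceNo": ["A", "B", "C", "D"],
    "InvoiceDate": pd.to_datetime([
        "2011-03-01 11:10", "2011-03-01 11:45",
        "2011-03-03 10:05", "2011-03-03 14:30",
    ]),
})
table = weekday_hour_table(sample)
print(table)
```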

4. Customer Segmentation Using RFM and KMeans

4.1 RFM Feature Engineering

To segment customers, we used RFM (Recency, Frequency, Monetary) analysis:

• Recency: Days since last purchase
• Frequency: Total number of purchases
• Monetary: Total spend per customer
The dataset was grouped by CustomerID, and values were normalized to prevent
any feature from dominating clustering due to scale.
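
The RFM construction can be sketched as a single group-by over CustomerID.
Several details here are assumptions, since the report does not specify them:
Recency is measured against a snapshot date, Frequency counts distinct
invoices, and normalization uses scikit-learn's StandardScaler. The function
name build_rfm and the sample rows are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def build_rfm(df: pd.DataFrame, snapshot_date: pd.Timestamp) -> pd.DataFrame:
    """One row per customer with Recency (days), Frequency, and Monetary."""
    df = df.assign(Revenue=df["Quantity"] * df["UnitPrice"])
    return df.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda d: (snapshot_date - d.max()).days),
        Frequency=("InvoiceNo", "nunique"),
        Monetary=("Revenue", "sum"),
    )


# Hypothetical rows for illustration only.
sample = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A", "A", "B", "C"],
    "InvoiceDate": pd.to_datetime(
        ["2011-12-01", "2011-12-01", "2011-12-08", "2011-06-01"]),
    "Quantity": [2, 1, 4, 10],
    "UnitPrice": [3.0, 5.0, 2.5, 1.5],
})
rfm = build_rfm(sample, pd.Timestamp("2011-12-09"))

# Standardize each column to zero mean and unit variance before clustering.
scaled = StandardScaler().fit_transform(rfm)
print(rfm)
```

Without scaling, Monetary would typically dominate the Euclidean distances
used by KMeans simply because its values are larger.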

4.2 Clustering with KMeans

KMeans clustering was applied with four clusters, the number selected using the
Elbow Method, yielding:

• Cluster 0 (20%): High-spenders, frequent and recent buyers – VIPs
• Cluster 1 (25%): Recent but low-frequency buyers – potential to upsell
• Cluster 2 (30%): Infrequent, low-value customers – low engagement
• Cluster 3 (25%): Haven’t purchased recently – likely churned

These profiles help in crafting tiered retention and engagement strategies.
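
The Elbow Method mentioned above compares the within-cluster sum of squares
(inertia) across candidate values of k and looks for the point where further
clusters stop paying off. The sketch below runs it on synthetic data with four
well-separated groups, not on the project's RFM table; the helper name
elbow_inertias, the seed, and the n_init value are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans


def elbow_inertias(X, k_values, seed=0):
    """Fit KMeans for each k and return {k: inertia}."""
    return {
        k: KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_
        for k in k_values
    }


# Synthetic stand-in for scaled RFM features: four blobs in three dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0], [6, 6, 0], [0, 6, 6], [6, 0, 6]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 3)) for c in centers])

inertias = elbow_inertias(X, range(1, 8))
for k, value in inertias.items():
    print(k, round(value, 1))

# Fit the chosen model and report each cluster's share of customers.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
shares = np.bincount(labels) / len(labels)
print(shares)
```

On real customer data the elbow is usually less sharp than on synthetic blobs,
so the choice of k often benefits from a second criterion such as the
silhouette score.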

5. Churn Prediction Using Machine Learning

5.1 Label Definition and Features

Churn was defined based on a 6-month inactivity threshold. Customers who had
not made a purchase within 180 days of the last dataset date were labeled as
"churned" (1), others as "active" (0).

Predictors: RFM features
Model: Random Forest Classifier
Data split: 70% training, 30% testing
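
The setup above can be sketched with scikit-learn as follows. The RFM table
here is randomly generated, so the numbers do not reproduce the project's
results; the seed, the number of trees, and the value ranges are hypothetical
choices. The label follows the report's definition, assuming Recency is
measured in days from the last dataset date.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic RFM table standing in for the real one (illustration only).
rng = np.random.default_rng(42)
rfm = pd.DataFrame({
    "Recency": rng.integers(0, 374, size=500),
    "Frequency": rng.integers(1, 30, size=500),
    "Monetary": rng.gamma(2.0, 300.0, size=500),
})

# Churned (1) if the last purchase is more than 180 days old, else active (0).
rfm["Churned"] = (rfm["Recency"] > 180).astype(int)

features = rfm[["Recency", "Frequency", "Monetary"]]
X_train, X_test, y_train, y_test = train_test_split(
    features, rfm["Churned"], test_size=0.30, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Because the label in this sketch is computed directly from the Recency column
that is also passed to the model, a near-perfect score is expected even on
random data.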

5.2 Performance Metrics

The model achieved perfect performance on the test data:

Class          Precision   Recall   F1-Score   Support
0 (Active)     1.00        1.00     1.00       1054
1 (Churned)    1.00        1.00     1.00       248
Accuracy                            1.00       1302
Macro Avg      1.00        1.00     1.00       1302
Weighted Avg   1.00        1.00     1.00       1302

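These results should be read with caution. The churn label is derived from the
180-day inactivity threshold, and Recency (days since last purchase) is also a
predictor. If both are measured against the same reference date, the model can
recover the label from that single feature, so a perfect score points to data
leakage rather than genuine predictive skill. Validation using unseen or future
data is necessary to ensure generalizability.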
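
One way to validate on future data is a time-based split: compute RFM features
only from purchases before a cutoff date, and define churn by whether the
customer buys again in the following window. This is a suggested alternative,
not the method used in this report; the function name temporal_churn_frame,
the cutoff date, and the sample rows are hypothetical.

```python
import pandas as pd


def temporal_churn_frame(df, cutoff, horizon_days=180):
    """Features from purchases before `cutoff`; label from the window after it."""
    window_end = cutoff + pd.Timedelta(days=horizon_days)
    history = df[df["InvoiceDate"] < cutoff]
    future = df[(df["InvoiceDate"] >= cutoff) & (df["InvoiceDate"] < window_end)]

    history = history.assign(Revenue=history["Quantity"] * history["UnitPrice"])
    frame = history.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda d: (cutoff - d.max()).days),
        Frequency=("InvoiceNo", "nunique"),
        Monetary=("Revenue", "sum"),
    )
    # Churned (1) means no purchase inside the future window.
    active_later = frame.index.isin(future["CustomerID"].unique())
    frame["Churned"] = (~active_later).astype(int)
    return frame


# Hypothetical rows: customer 1 returns after the cutoff, customer 2 does not,
# customer 3 has no history before the cutoff and is therefore excluded.
sample = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3],
    "InvoiceNo": ["A", "B", "C", "D"],
    "InvoiceDate": pd.to_datetime(
        ["2011-01-10", "2011-09-01", "2011-02-15", "2011-10-01"]),
    "Quantity": [2, 1, 5, 3],
    "UnitPrice": [4.0, 6.0, 2.0, 1.0],
})
frame = temporal_churn_frame(sample, pd.Timestamp("2011-06-12"))
print(frame)
```

With this construction the label depends only on events the features cannot
see, so a model's test score reflects how well past behavior predicts future
inactivity.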

6. Visualizations

Eight key charts were generated to visually support the analysis:

1. Top 10 Most Purchased Products
2. Daily Sales Over Time (Time Series)
3. Hourly Purchase Frequency
4. Daily Purchase Frequency
5. Monthly Sales Volume
6. Heatmap of Sales by Day and Hour
7. Order Value Distribution
8. Customer Segments (Pie Chart from RFM + KMeans)

These visualizations are included in the supporting code and can be embedded
into presentations or dashboards.

7. Conclusion

• Clear demand for specific product categories: The repeated purchases of
decorative and party-related items indicate a stable and predictable customer
preference, offering opportunities for focused promotions and stocking
strategies.
• Predictable purchase timing and seasonality: Temporal analysis
highlights increased customer activity between 10 AM and 3 PM on weekdays
and during the holiday months of November and December. This pattern
suggests ideal timing for flash sales, newsletters, and advertising campaigns.
• Well-defined customer segments based on behavior: The application of
RFM analysis and KMeans clustering uncovered four distinct customer
segments. Understanding these segments allows businesses to tailor their
retention, upselling, and engagement strategies based on customer lifetime
value.
• Highly accurate churn prediction model: The Random Forest model
achieved perfect accuracy on test data, demonstrating the potential of
machine learning for proactive customer retention. However, additional
validation with future datasets is recommended to ensure robustness and
rule out data leakage or overfitting.

8. Recommendations

1. Personalized Campaigns: Tailor offers based on segment profiles (e.g.,
discounts for low-frequency customers).
2. Inventory Optimization: Increase stock of top-selling products before
holiday seasons.
3. VIP Engagement: Launch loyalty programs for Cluster 0 customers to
prevent churn.
4. Churn Intervention: Use churn predictions to re-engage inactive customers
with win-back offers.
5. Dashboards: Deploy real-time dashboards to track KPIs and trends
continuously.

9. Future Work

• Product Category Classification: Use NLP to tag products for richer
segmentation.
• A/B Testing: Validate marketing strategies on different segments.
• Real-Time Prediction: Integrate with CRM systems to apply predictions
dynamically.
• Enhanced Models: Explore XGBoost and LSTM models for better
performance and temporal analysis.

By expanding on this foundation, businesses can transition toward a data-first
customer intelligence framework that continuously evolves through feedback and
experimentation.
