Introduction
Task 1: Data Exploration and Customer Analytics.
In this blog post, we analyze customer purchase behavior using the R
programming language and several of its packages. Working with a
transaction dataset and a customer behavior dataset, our goal is to
understand customer segments, their spending patterns, and their
preferred brands and pack sizes.
Data Exploration
We start by importing the necessary R packages and loading the
transaction and customer behavior datasets. Using functions
like head() and str(), we explore the structure and contents of the
datasets to understand their dimensions, data types, and variables.
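# Exploring the transactions data
library(dplyr)  # %>% and summarize_all()
library(tidyr)  # gather()

str(trns_1)

# List every column next to its class
trns_1 %>%
  summarize_all(class) %>%
  gather(variable, class)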
Data Transformation
Next, we perform data transformation tasks such as formatting the
date column, handling outliers, cleaning and preprocessing text data
in the product name column, and checking for missing dates. These
transformations ensure that the data is in a suitable format for
analysis.
• The DATE column was stored as an integer rather than as a Date,
so it had to be converted.
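A minimal sketch of that conversion, using a toy table rather than the real data. It assumes the integers are Excel-style serial day counts (origin 1899-12-30), which the post does not state; if the integers use another encoding, the origin would need to change.

```r
# Toy stand-in for the transactions table; DATE holds integer day counts.
trns_demo <- data.frame(DATE = c(43390L, 43599L, 43605L))

# Assumption: the integers are Excel-style serial dates (origin 1899-12-30).
trns_demo$DATE <- as.Date(trns_demo$DATE, origin = "1899-12-30")

# Confirm the column is now a Date
str(trns_demo$DATE)
```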
Exploratory Data Analysis (EDA)
During the EDA phase, we delve deeper into the datasets to uncover
insights. We analyze summary statistics, identify outliers, explore
product names and pack sizes, check unique values in categorical
columns, visualize transaction trends over time, and check for
duplicates in the data. I found outliers in the Quantity and Sales
columns using a box-and-whisker plot and later removed them.
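One way to turn the box-and-whisker check into a filter is sketched below. It uses the same 1.5 x IQR rule that the plot uses to draw outlying points; the toy values and the column names PROD_QTY and TOT_SALES are illustrative assumptions, and the original analysis may have removed outliers differently.

```r
# Toy data; the column names PROD_QTY and TOT_SALES are hypothetical.
trns_demo <- data.frame(
  PROD_QTY  = c(2, 1, 2, 3, 2, 200, 1, 2),
  TOT_SALES = c(6.0, 3.0, 7.4, 9.9, 6.6, 650.0, 3.3, 5.8)
)

# Values that a box-and-whisker plot would draw as individual points
qty_out   <- boxplot.stats(trns_demo$PROD_QTY)$out
sales_out <- boxplot.stats(trns_demo$TOT_SALES)$out

# Keep only the rows that are not flagged in either column
is_outlier <- trns_demo$PROD_QTY %in% qty_out |
  trns_demo$TOT_SALES %in% sales_out
trns_clean <- trns_demo[!is_outlier, ]
```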
I then removed all non-chip products, identified from the product
name. While checking the date column, I found that one date was
missing; further analysis showed it to be 25 December. The plot of
transactions over time shows the sudden increase in sales and the
drop in the number of transactions.
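A missing date can be found by comparing the observed dates with a full calendar over the same range. The sketch below uses toy data with one day deliberately left out; the year is arbitrary and chosen only for illustration.

```r
# Toy data: one row per day, with 25 December deliberately absent.
all_days  <- seq(as.Date("2018-12-20"), as.Date("2018-12-30"), by = "day")
trns_demo <- data.frame(DATE = all_days[all_days != as.Date("2018-12-25")])

# Every calendar day between the first and last observed date
expected <- seq(min(trns_demo$DATE), max(trns_demo$DATE), by = "day")

# Calendar days that never appear in the data
missing_days <- expected[!expected %in% trns_demo$DATE]
print(missing_days)
```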
One duplicate record was found in the transaction dataset and
removed.
Distribution of Pack Size
The pack size frequencies look consistent: no pack size differs
significantly from the other observations.
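As a rough illustration of how a pack size distribution could be built, the sketch below parses the size from the product name. The product names are made up, and the pattern "digits followed by g" is an assumption about how the names are written.

```r
# Toy product names; the "digits followed by g" pattern is an assumption.
prod_name <- c("Twisties Cheese 270g", "Tyrrells Crisps Lightly Salted 165g",
               "Burger Rings 220g", "Kettle Original 175G")

# First run of digits that is immediately followed by g or G
size_txt  <- regmatches(prod_name,
                        regexpr("[0-9]+(?=[gG])", prod_name, perl = TRUE))
pack_size <- as.integer(size_txt)

# Frequency of each pack size, ready for a bar plot or histogram
table(pack_size)
```

Note that regmatches() silently drops names without a match, so on real data it is worth checking that length(pack_size) equals the number of rows.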
Data Analysis
In the data analysis section, we focus on calculating metrics related
to customer segments, such as total sales, chips bought per
customer, number of customers in each segment, and average sales
by customer segment. We use visualizations like bar plots and
histograms to compare sales across different customer segments and
analyze the chips bought per customer.
I created a column for brand names; there are 26 unique brands in
total.
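A brand column like this could be derived as sketched below, assuming the brand is the first word of the product name. That assumption, and the toy names, are illustrative only; real data may also need spelling variants of the same brand merged.

```r
# Toy product names for illustration
prod_name <- c("Twisties Cheese 270g", "TYRRELLS Crisps 165g",
               "Burger Rings 220g")

# Assumption: the brand is the first word of the product name
brand <- toupper(sub(" .*", "", prod_name))

# Distinct brands and how many there are
unique(brand)
length(unique(brand))
```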
I printed out the unique values in the categorical columns of the
customer behavior table. After cleaning and exploring the datasets,
I merged the customer behavior and transaction tables to make the
analysis easier.
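The merge could look something like the sketch below. The key column CARD_NBR and the toy rows are hypothetical; the real tables would be joined on whatever customer identifier they share.

```r
# Toy tables; CARD_NBR is a hypothetical shared customer key.
trns_demo <- data.frame(
  CARD_NBR  = c(1001, 1002, 1001, 1003),
  TOT_SALES = c(6.0, 3.0, 7.4, 9.9)
)
cust_demo <- data.frame(
  CARD_NBR         = c(1001, 1002, 1003),
  LIFESTAGE        = c("YOUNG SINGLES/COUPLES", "OLDER FAMILIES", "RETIREES"),
  PREMIUM_CUSTOMER = c("Mainstream", "Budget", "Premium")
)

# Left join: keep every transaction, attach customer attributes
merged <- merge(trns_demo, cust_demo, by = "CARD_NBR", all.x = TRUE)

# Transactions without a matching customer would show up as NA here
sum(is.na(merged$LIFESTAGE))
```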
For the analysis, I created four metrics:
1. Customer segments that spend the most.
Within premium_customer, Mainstream customers spend the most; within
LIFESTAGE, OLDER SINGLES/COUPLES spend the most. For the combined
segments, most sales come from Budget-Midage Singles/Couples,
followed by Mainstream-Young Singles/Couples.
2. Chips bought per customer by segment.
The Mainstream-Older Families segment buys the most chips per
customer, followed by Mainstream-Young Families.
3. Number of customers in each segment.
The Mainstream-Young Singles/Couples segment has the highest number
of customers, which explains the high sales in this segment. This is
not the case for the Budget-Midage segment.
4. Average sales by customer segment.
Mainstream-Young Singles/Couples and Midage Singles/Couples tend to
spend more per unit and contribute the most to sales.
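The four metrics above can be computed per segment roughly as follows. This is a base R sketch on toy data, not the original code; the column names CARD_NBR, PROD_QTY and TOT_SALES are hypothetical, and "average sales" is read here as price per unit.

```r
# Toy merged table; column names other than LIFESTAGE are hypothetical.
merged <- data.frame(
  CARD_NBR         = c(1, 1, 2, 3, 3, 4),
  PREMIUM_CUSTOMER = rep(c("Mainstream", "Budget"), each = 3),
  LIFESTAGE        = rep(c("YOUNG SINGLES/COUPLES", "OLDER FAMILIES"), each = 3),
  PROD_QTY         = c(2, 1, 2, 2, 2, 1),
  TOT_SALES        = c(8, 4, 8, 6, 6, 3)
)

# Metric 1 and the unit totals: sum sales and quantity per segment
sales <- aggregate(cbind(TOT_SALES, PROD_QTY) ~ PREMIUM_CUSTOMER + LIFESTAGE,
                   data = merged, FUN = sum)

# Metric 3: distinct customers per segment
custs <- aggregate(CARD_NBR ~ PREMIUM_CUSTOMER + LIFESTAGE, data = merged,
                   FUN = function(x) length(unique(x)))
names(custs)[3] <- "CUSTOMERS"

metrics <- merge(sales, custs, by = c("PREMIUM_CUSTOMER", "LIFESTAGE"))

# Metric 2: chips bought per customer; metric 4: average price per unit
metrics$UNITS_PER_CUSTOMER <- metrics$PROD_QTY / metrics$CUSTOMERS
metrics$PRICE_PER_UNIT     <- metrics$TOT_SALES / metrics$PROD_QTY

# Segments ordered by total sales, highest first
metrics[order(-metrics$TOT_SALES), ]
```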
Further analysis looked at the brands and pack sizes a segment
prefers. The findings are below:
The Mainstream-Young Singles/Couples segment buys TYRRELLS chips the
most and BURGER the least. They buy the 270g pack size the most and
the 220g pack size the least. Twisties Cheese is the brand that sells
270g chips.
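A simple way to rank brands within one segment is to total the quantity bought per brand and sort. The sketch below does this on made-up rows; ranking by quantity is an assumption, since the post does not say which measure of preference was used.

```r
# Toy purchases for a single segment; values are illustrative only.
seg_demo <- data.frame(
  BRAND    = c("TYRRELLS", "TYRRELLS", "TWISTIES", "BURGER", "TYRRELLS", "TWISTIES"),
  PROD_QTY = c(2, 2, 1, 1, 2, 2)
)

# Total quantity per brand, ordered from most to least bought
brand_qty <- tapply(seg_demo$PROD_QTY, seg_demo$BRAND, sum)
brand_qty <- brand_qty[order(brand_qty, decreasing = TRUE)]

most_bought  <- names(brand_qty)[1]
least_bought <- names(brand_qty)[length(brand_qty)]
```

The same pattern applies to pack sizes by swapping the BRAND column for a pack size column.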
Insights and Recommendations
• The analysis uncovers key insights such as the segments with
the highest sales, the preferred brands and pack sizes among
specific customer segments, and trends in spending per pack.
Sales increased significantly just before Christmas. These
insights provide valuable information for business
decision-making.
• The Category Manager can focus more on TYRRELLS chips, which
Mainstream-Young Singles/Couples tend to buy, by increasing
the product's visibility to attract customers in this
segment.
• Maintain sufficient stock to cover the increase in sales just
before Christmas.
Stay tuned for the next part of our data analysis journey in the
upcoming blog post!
Check out the detailed analysis on GitHub. Any thoughts or
suggestions are welcome in the comments, or you can message and
connect with me directly on LinkedIn.