BA End Term Q1

The document details an Exploratory Data Analysis (EDA) on a dataset of 100 entries with five numerical columns related to customer demographics and behavior. Key findings include the absence of missing values, the age range of customers being predominantly between 25 and 45, and a churn rate of 40%. Visualizations of age, annual income, and spending score distributions are also provided, highlighting trends in customer characteristics.

Uploaded by

vaisurithi

Question 1:

This answer performs an Exploratory Data Analysis (EDA) on the dataset: checking for missing values, computing basic statistics, and visualizing the distributions. We start by loading the dataset and summarizing it.

The dataset consists of 100 entries with 5 numerical columns:

● Customer ID (Unique identifier)

● Age

● Annual Income

● Spending Score

● Churn (1 = Churned, 0 = Not Churned)

There are no missing values.

All columns are of integer type.

Statistics for numerical data:

● Age: Ranges from 22 to 50; mean = 34.1.

● Annual Income: Ranges from $20,000 to $100,000; mean = $53,000.

● Spending Score: Ranges from 30 to 90; mean = 64.5.

● Churn Rate: 40% of customers churned.
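
The figures above could be reproduced with a few lines of base R. The following is only a minimal sketch: the data frame is a made-up five-row stand-in for the real file, and the column names (AnnualIncome, SpendingScore) are assumed to match those used in the plotting code of this answer.

```r
# Toy data frame standing in for the real file (values are made up for illustration).
df <- data.frame(
  CustomerID    = 1:5,
  Age           = c(22L, 30L, 35L, 41L, 50L),
  AnnualIncome  = c(20000L, 45000L, 53000L, 70000L, 100000L),
  SpendingScore = c(30L, 55L, 64L, 80L, 90L),
  Churn         = c(1L, 0L, 0L, 1L, 0L)
)

# Minimum, maximum and mean for each feature of interest.
num_cols <- c("Age", "AnnualIncome", "SpendingScore")
stats <- data.frame(
  min  = sapply(df[num_cols], min),
  max  = sapply(df[num_cols], max),
  mean = sapply(df[num_cols], mean)
)
print(stats)

# Because Churn is coded 0/1, its mean is the share of churned customers.
churn_rate <- mean(df$Churn)
cat(sprintf("Churn rate: %.0f%%\n", 100 * churn_rate))
```

On the real dataset, the same calls would be applied to the data frame returned by read.csv instead of the toy one.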

Histograms were used to visualize the distributions of Age, Annual Income and Spending Score.

Main conclusions:

● Age: Mostly between 25 and 45, with a slightly higher concentration in the 30-35 range.

● Annual Income: Spread widely, with most values between $30,000 and $70,000.


● Spending Score: Most customers have a spending score between 50 and 80.
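
Statements such as "mostly between 25 and 45" can be backed by a number as well as by a plot. The sketch below shows one possible way to do that; share_between is a hypothetical helper introduced here, the age vector is made up for illustration, and the bin edges are chosen by hand (ggplot2 may place its histogram bins differently).

```r
# Hypothetical helper: fraction of values in x that fall within [lo, hi].
share_between <- function(x, lo, hi) {
  mean(x >= lo & x <= hi)
}

# Made-up ages for illustration only; use df$Age on the real data.
age <- c(22, 27, 31, 33, 34, 38, 44, 47, 50, 29)

# Share of customers aged 25 to 45 (inclusive).
age_share <- share_between(age, 25, 45)
print(age_share)

# Counts per 5-year bin, using hand-picked edges at 20, 25, ..., 50.
bins <- table(cut(age, breaks = seq(20, 50, by = 5), include.lowest = TRUE))
print(bins)
```

The same helper could be applied to the income and spending score columns with their own ranges.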

Step 1: Load the Dataset and Perform EDA

# Load necessary libraries
library(ggplot2)
library(dplyr)
library(cluster)
library(caret)

# Read the dataset
df <- read.csv("Customer_Segmentation_and_Churn.csv")

# View basic information
str(df)
summary(df)

# Check for missing values
colSums(is.na(df))

# Visualize distributions
ggplot(df, aes(x = Age)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black") +
  ggtitle("Age Distribution")

ggplot(df, aes(x = AnnualIncome)) +
  geom_histogram(binwidth = 5000, fill = "green", color = "black") +
  ggtitle("Annual Income Distribution")

ggplot(df, aes(x = SpendingScore)) +
  geom_histogram(binwidth = 5, fill = "red", color = "black") +
  ggtitle("Spending Score Distribution")
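
The plots refer to columns named AnnualIncome and SpendingScore, so they work only if the CSV header uses exactly those names. By default, read.csv turns a header such as "Annual Income" into Annual.Income, so checking the names before plotting may save some confusion. The sketch below assumes, purely for illustration, a header with spaces; the real file's header is not shown in this answer.

```r
# Inline CSV text standing in for the real file; these header names are an assumption.
csv_text <- "Customer ID,Age,Annual Income,Spending Score,Churn
1,25,40000,60,0
2,38,65000,72,1"

df_default <- read.csv(text = csv_text)                       # check.names = TRUE by default
df_raw     <- read.csv(text = csv_text, check.names = FALSE)  # keep headers verbatim

print(names(df_default))  # spaces are replaced by dots
print(names(df_raw))      # original header text

# Report any columns the plotting code relies on that are absent.
required <- c("Age", "AnnualIncome", "SpendingScore")
missing_cols <- setdiff(required, names(df_default))
if (length(missing_cols) > 0) {
  message("Columns not found: ", paste(missing_cols, collapse = ", "))
}
```

If columns are reported as missing, either rename them after loading or adjust the aes() mappings to the names that read.csv actually produced.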
