shop-customer-data-analysis
March 16, 2023
[1]: #importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
[2]: #loading the dataset
df = pd.read_csv("/kaggle/input/customers-dataset/[Link]")
[3]: #extracting first-five rows
df.head()
[3]:    CustomerID  Gender  Age  Annual Income ($)  Spending Score (1-100)  \
     0           1    Male   19              15000                      39
     1           2    Male   21              35000                      81
     2           3  Female   20              86000                       6
     3           4  Female   23              59000                      77
     4           5  Female   31              38000                      40

           Profession  Work Experience  Family Size
     0     Healthcare                1            4
     1       Engineer                3            3
     2       Engineer                1            1
     3         Lawyer                0            2
     4  Entertainment                2            6
[4]: #extracting last-five rows
df.tail()
[4]:       CustomerID  Gender  Age  Annual Income ($)  Spending Score (1-100)  \
     1995        1996  Female   71             184387                      40
     1996        1997  Female   91              73158                      32
     1997        1998    Male   87              90961                      14
     1998        1999    Male   77             182109                       4
     1999        2000    Male   90             110610                      52

              Profession  Work Experience  Family Size
     1995         Artist                8            7
     1996         Doctor                7            7
     1997     Healthcare                9            2
     1998      Executive                7            2
     1999  Entertainment                5            2
[5]: #determining the shape
df.shape
[5]: (2000, 8)
[6]: #determining the size
df.size
[6]: 16000
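The size reported above is the number of rows times the number of columns (2000 × 8 = 16000). A minimal sketch of that relationship on a small stand-in frame; the `demo` frame and its values are made up for illustration and are not part of the customers dataset:

```python
import pandas as pd

# Hypothetical stand-in frame; not the customers dataset.
demo = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = demo.shape

# DataFrame.size counts every cell, so it equals rows * columns.
print(demo.size == n_rows * n_cols)
```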
[7]: #checking the null values
df.isnull().sum()
[7]: CustomerID                 0
     Gender                     0
     Age                        0
     Annual Income ($)          0
     Spending Score (1-100)     0
     Profession                35
     Work Experience            0
     Family Size                0
     dtype: int64
[8]: #determining mode of 'Profession' column
df["Profession"].mode()
[8]: 0 Artist
dtype: object
[9]: #replacing null values with mode
# assign back: fillna(inplace=True) on a column selection may not update df in recent pandas
df["Profession"] = df["Profession"].fillna("Artist")
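The two cells above look up the mode and then fill with the literal "Artist". A possible variant, sketched here on a small stand-in column, reads the mode programmatically so the fill value cannot drift from the data. The `demo` frame and the `most_common` name are assumptions introduced for illustration only:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'Profession' column; values are illustrative.
demo = pd.DataFrame({"Profession": ["Artist", "Doctor", np.nan, "Artist", np.nan]})

# mode() returns a Series (ties are possible); take the first entry.
most_common = demo["Profession"].mode().iloc[0]

# Assign the filled column back rather than filling in place on a selection.
demo["Profession"] = demo["Profession"].fillna(most_common)

print(demo["Profession"].isnull().sum())
```

Filling with the mode keeps all 2000 rows but slightly inflates the most frequent category; whether that is acceptable depends on how the column is used later.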
[10]: # checking the duplicates
df.duplicated().value_counts()
[10]: False 2000
dtype: int64
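A result containing only `False` means no row is an exact repeat of an earlier one. As a rough sketch, the same check can be reduced to a single count, and duplicates can be dropped if any appear. The `demo` frame below is hypothetical and deliberately contains one repeated row:

```python
import pandas as pd

# Hypothetical frame with one fully duplicated row; not the customers dataset.
demo = pd.DataFrame({
    "CustomerID": [1, 2, 2],
    "Gender": ["Male", "Female", "Female"],
})

# duplicated() marks rows that repeat an earlier row; sum() counts them.
n_dup = int(demo.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row.
deduped = demo.drop_duplicates()

print(n_dup, len(deduped))
```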
[11]: #checking the information
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 8 columns):
 #   Column                  Non-Null Count  Dtype
---  ------                  --------------  -----
 0   CustomerID              2000 non-null   int64
 1   Gender                  2000 non-null   object
 2   Age                     2000 non-null   int64
 3   Annual Income ($)       2000 non-null   int64
 4   Spending Score (1-100)  2000 non-null   int64
 5   Profession              2000 non-null   object
 6   Work Experience         2000 non-null   int64
 7   Family Size             2000 non-null   int64
dtypes: int64(6), object(2)
memory usage: 125.1+ KB
[12]: #extracting statistical summary
df.describe()
[12]:         CustomerID          Age  Annual Income ($)  Spending Score (1-100)  \
      count  2000.000000  2000.000000        2000.000000             2000.000000
      mean   1000.500000    48.960000      110731.821500               50.962500
      std     577.494589    28.429747       45739.536688               27.934661
      min       1.000000     0.000000           0.000000                0.000000
      25%     500.750000    25.000000       74572.000000               28.000000
      50%    1000.500000    48.000000      110045.000000               50.000000
      75%    1500.250000    73.000000      149092.750000               75.000000
      max    2000.000000    99.000000      189974.000000              100.000000

             Work Experience  Family Size
      count      2000.000000  2000.000000
      mean          4.102500     3.768500
      std           3.922204     1.970749
      min           0.000000     1.000000
      25%           1.000000     2.000000
      50%           3.000000     4.000000
      75%           7.000000     5.000000
      max          17.000000     9.000000
[13]: #creating the pairplot
sns.pairplot(df.drop("CustomerID", axis=1))
[13]: <seaborn.axisgrid.PairGrid at 0x7f21431e3c90>
[14]: # segment customers by gender
[Link](x='Gender', data=df)
plt.title('Customer Gender Distribution')
plt.show()
[28]: # segment customers by age
[Link](x='Age', data=df, color='purple', bins=20)
plt.title('Customer Age Distribution')
plt.show()
[29]: # segment by income
[Link](x='Annual Income ($)', data=df, color="green", fill=True)
plt.title('Income Distribution')
plt.show()
[17]: # segment customers by profession
[Link](x='Profession', data=df)
plt.xticks(rotation=45)
plt.title('Customer Profession Distribution')
plt.show()
[30]: # segment customers by work experience
[Link](x='Work Experience', data=df, color='red', fill=True)
plt.title('Work Experience Distribution')
plt.show()
[19]: # segment customers by family size
[Link](x='Family Size', data=df)
plt.title('Customer Family Size Distribution')
plt.show()
[20]: # spending score by gender
[Link](x='Gender', y='Spending Score (1-100)', data=df)
plt.title('Spending Score by Gender')
plt.show()
[31]: # spending behavior by age
[Link](x='Age', y='Spending Score (1-100)', color="orange", data=df)
plt.title('Spending Score by Age')
plt.show()
[22]: # analyze spending behavior by age and gender
[Link](x='Age', y='Spending Score (1-100)', hue='Gender', data=df)
plt.title('Spending Score by Age and Gender')
plt.show()
[23]: # spending behavior by income
plt.hexbin(x='Annual Income ($)', y='Spending Score (1-100)', data=df,
           gridsize=20, cmap='Blues')
plt.xlabel('Annual Income ($)')
plt.xticks(rotation=45)
plt.ylabel('Spending Score (1-100)')
plt.title('Spending Score by Income')
plt.colorbar()
plt.show()
[24]: # spending behavior by profession
[Link](x='Profession', y='Spending Score (1-100)', data=df)
plt.xticks(rotation=45)
plt.title('Spending Score by Profession')
plt.show()
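A plot of spending score by profession can be paired with a numeric summary so that group differences are read from numbers rather than from bar or box heights. The following is only a sketch under assumptions: the `demo` rows are invented for illustration, and the choice of mean, median, and count is one reasonable option rather than something the notebook prescribes.

```python
import pandas as pd

# Hypothetical rows for illustration; not taken from the customers dataset.
demo = pd.DataFrame({
    "Profession": ["Artist", "Doctor", "Artist", "Doctor"],
    "Spending Score (1-100)": [40, 60, 50, 80],
})

# Per-profession summary, sorted so the highest average spender comes first.
summary = (
    demo.groupby("Profession")["Spending Score (1-100)"]
    .agg(["mean", "median", "count"])
    .sort_values("mean", ascending=False)
)

print(summary)
```

The same pattern should carry over to the real frame by replacing `demo` with `df`, and to the other grouping columns (Gender, Family Size) by changing the `groupby` key.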
[32]: # spending behavior by work experience
[Link](x='Work Experience', y='Spending Score (1-100)', data=df)
plt.title('Spending Score by Work Experience')
plt.show()
[26]: # spending behavior by family size
[Link](x='Family Size', y='Spending Score (1-100)', data=df)
plt.title('Spending Score by Family Size')
plt.show()