Running RFM in Python
Importing Required Libraries
#import modules
import pandas as pd # for dataframes
import matplotlib.pyplot as plt # for plotting graphs
import seaborn as sns # for plotting graphs
import datetime as dt
Loading Dataset
data = pd.read_excel(r"C:\Users\siva\Desktop\Online_Retail.xlsx")  # raw string avoids backslash escapes in the Windows path
data.head()
data.tail()
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
InvoiceNo 541909 non-null object
StockCode 541909 non-null object
Description 540455 non-null object
Quantity 541909 non-null int64
InvoiceDate 541909 non-null datetime64[ns]
UnitPrice 541909 non-null float64
CustomerID 406829 non-null float64
Country 541909 non-null object
dtypes: datetime64[ns](1), float64(2), int64(1), object(4)
memory usage: 33.1+ MB
This material is not original work. This compilation draws heavily from various sources
Removing Missing Values
Rows without a CustomerID cannot be tied to a customer, so drop them:
data = data[pd.notnull(data['CustomerID'])]
Removing Duplicates
Sometimes you get a messy dataset. You may have to deal with duplicates, which will skew your
analysis. In Python, pandas offers the drop_duplicates() function, which drops repeated or
duplicate records.
filtered_data=data[['Country','CustomerID']].drop_duplicates()
filtered_data.Country.value_counts()
United Kingdom 3950
Germany 95
France 87
Spain 31
Belgium 25
Switzerland 21
Portugal 19
Italy 15
Finland 12
Austria 11
Norway 10
Denmark 9
Netherlands 9
Australia 9
Channel Islands 9
Sweden 8
Japan 8
Cyprus 8
Poland 6
Unspecified 4
Canada 4
Israel 4
Greece 4
USA 4
EIRE 3
Bahrain 2
United Arab Emirates 2
Malta 2
Lithuania 1
Singapore 1
Iceland 1
Lebanon 1
RSA 1
Saudi Arabia 1
Czech Republic 1
Brazil 1
European Community 1
filtered_data.Country.value_counts()[:10].plot(kind='bar')
filtered_data.Country.value_counts()[:5].plot(kind='bar')
Filtering Data for United Kingdom Customers
uk_data=data[data.Country=='United Kingdom']
The describe() function in pandas is convenient for getting various summary statistics. It
returns the count, mean, standard deviation, minimum and maximum values, and the
quantiles of the data.
uk_data.describe()
Quantity UnitPrice CustomerID
count 361878.000000 361878.000000 361878.000000
mean 11.077029 3.256007 15547.871368
std 263.129266 70.654731 1594.402590
min -80995.000000 0.000000 12346.000000
25% 2.000000 1.250000 14194.000000
50% 4.000000 1.950000 15514.000000
75% 12.000000 3.750000 16931.000000
max 80995.000000 38970.000000 18287.000000
Remove rows with a negative Quantity:
uk_data = uk_data[(uk_data['Quantity']>0)]
uk_data.describe()
Filter Required Columns
Here, you can filter the columns necessary for RFM analysis. You only need five columns:
CustomerID, InvoiceDate, InvoiceNo, Quantity, and UnitPrice. CustomerID uniquely identifies
each customer, InvoiceDate helps you calculate the recency of purchase, and InvoiceNo helps
you count the number of transactions performed (frequency). The Quantity purchased in each
transaction and the UnitPrice of each unit purchased by the customer will help you calculate
the total purchase amount (monetary value).
uk_data = uk_data[['CustomerID','InvoiceDate','InvoiceNo','Quantity','UnitPrice']]
uk_data['TotalPrice'] = uk_data['Quantity'] * uk_data['UnitPrice']
uk_data['InvoiceDate'].min(),uk_data['InvoiceDate'].max()
(Timestamp('2010-12-01 08:26:00'), Timestamp('2011-12-09 12:49:00'))
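The snapshot date PRESENT is set one day after the last invoice in the dataset, so recency is the whole-day gap between it and a customer's last purchase. A minimal sketch using the maximum invoice date shown above:

```python
import datetime as dt
import pandas as pd

# Snapshot date: one day after the last invoice (2011-12-09),
# so the most recent purchases get a recency of 0 days.
PRESENT = dt.datetime(2011, 12, 10)

last_purchase = pd.Timestamp('2011-12-09 12:49:00')
recency_days = (PRESENT - last_purchase).days
print(recency_days)  # 0
```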
PRESENT = dt.datetime(2011,12,10)
uk_data['InvoiceDate'] = pd.to_datetime(uk_data['InvoiceDate'])
uk_data.head()
CustomerID InvoiceDate InvoiceNo Quantity UnitPrice TotalPrice
0 17850.0 2010-12-01 08:26:00 536365 6 2.55 15.30
1 17850.0 2010-12-01 08:26:00 536365 6 3.39 20.34
2 17850.0 2010-12-01 08:26:00 536365 8 2.75 22.00
3 17850.0 2010-12-01 08:26:00 536365 6 3.39 20.34
4 17850.0 2010-12-01 08:26:00 536365 6 3.39 20.34
RFM Analysis
Here, you are going to perform the following operations:
For Recency, calculate the number of days between the present date and the date of each
customer's last purchase.
For Frequency, calculate the number of orders for each customer.
For Monetary, calculate the sum of purchase prices for each customer.
rfm = uk_data.groupby('CustomerID').agg({
    'InvoiceDate': lambda date: (PRESENT - date.max()).days,
    'InvoiceNo': lambda num: len(num),
    'TotalPrice': lambda price: price.sum()})
rfm.columns
Index(['InvoiceDate', 'InvoiceNo', 'TotalPrice'], dtype='object')
# Change the name of columns
rfm.columns=['recency','frequency','monetary']
rfm['recency'] = rfm['recency'].astype(int)
rfm.head()
recency frequency monetary
CustomerID
12346.0 325 1 77183.60
12747.0 2 103 4196.01
12748.0 0 4596 33719.73
12749.0 3 199 4090.88
12820.0 3 59 942.34
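The same table can also be built with pandas named aggregation, which replaces the count and sum lambdas with built-in reducers and names the output columns directly. A self-contained sketch on a tiny stand-in frame (the values below are hypothetical, only the column names match the real uk_data):

```python
import datetime as dt
import pandas as pd

PRESENT = dt.datetime(2011, 12, 10)

# Tiny stand-in for uk_data (hypothetical values, same column names).
uk_data = pd.DataFrame({
    'CustomerID': [1, 1, 2],
    'InvoiceDate': pd.to_datetime(['2011-12-01', '2011-12-05', '2011-11-01']),
    'InvoiceNo': ['A1', 'A2', 'B1'],
    'TotalPrice': [10.0, 20.0, 5.0],
})

# Named aggregation: each output column gets (source column, reducer).
rfm = uk_data.groupby('CustomerID').agg(
    recency=('InvoiceDate', lambda d: (PRESENT - d.max()).days),
    frequency=('InvoiceNo', 'count'),
    monetary=('TotalPrice', 'sum'),
)
print(rfm)
```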
Computing Quantile of RFM values
Customers with the lowest recency and the highest frequency and monetary amounts are
considered top customers.
qcut() is a quantile-based discretization function: it bins the data based on sample quantiles.
For example, 1000 values cut into 4 quantiles would produce a categorical object indicating
quartile membership for each customer.
rfm['r_quartile'] = pd.qcut(rfm['recency'], 4, ['1','2','3','4'])
rfm['f_quartile'] = pd.qcut(rfm['frequency'], 4, ['4','3','2','1'])
rfm['m_quartile'] = pd.qcut(rfm['monetary'], 4, ['4','3','2','1'])
rfm.head()
recency frequency monetary r_quartile f_quartile m_quartile
CustomerID
12346.0 325 1 77183.60 4 4 1
12747.0 2 103 4196.01 1 1 1
12748.0 0 4596 33719.73 1 1 1
12749.0 3 199 4090.88 1 1 1
12820.0 3 59 942.34 1 2 2
RFM Result Interpretation
Combine all three quartiles (r_quartile, f_quartile, m_quartile) into a single column; this
combined score will help you segment the customers into well-defined groups.
rfm['RFM_Score'] = rfm.r_quartile.astype(str) + rfm.f_quartile.astype(str) + rfm.m_quartile.astype(str)
rfm.head()
# Filter out Top/Best customers
rfm[rfm['RFM_Score']=='111'].sort_values('monetary', ascending=False).head()
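Beyond filtering the '111' segment, it can be useful to see how customers distribute across all combined scores. A sketch on hypothetical quartile labels (the four customers and their labels below are made up):

```python
import pandas as pd

# Hypothetical quartile labels for four customers.
rfm = pd.DataFrame({
    'r_quartile': ['1', '1', '4', '2'],
    'f_quartile': ['1', '2', '4', '1'],
    'm_quartile': ['1', '1', '4', '3'],
})

# Concatenate the three labels into one score string per customer,
# then count how many customers fall in each segment.
rfm['RFM_Score'] = rfm['r_quartile'] + rfm['f_quartile'] + rfm['m_quartile']
print(rfm['RFM_Score'].value_counts().to_dict())
```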