Thursday, November 21, 2019

How to automatically segment customers using purchase data and some Python lines



A small educational project for learning "customer segmentation" with a simple data analysis technique

Automatic customer segmentation using Recency / Monetary Matrix

Why should you worry about customer segmentation? Segmentation is the key to delivering personalized customer experiences. It can provide detailed information on customer behavior, habits and preferences, enabling you to offer personalized marketing campaigns that increase your chances of success and improve your customer experience with personalized content.

Customer segmentation

What will we build? Using transactional purchase data, we will build a 2 x 2 value matrix that defines 4 groups of customers. Each group differs from the others along 2 dimensions: (1) current customer value and (2) potential customer value.

Which technique will we use? We will use the RFM model to create the required features from transactional purchase data. RFM stands for:

  • Recency: when was the last time they bought?
  • Frequency: how often and for how long have they purchased?
  • Monetary Value / Sales: how much did they buy?

It is usually used to identify the highest-value customers at the intersection of these 3 questions. To construct the 2 x 2 matrix we will use only the R and the M of RFM.

RFM model

What data are we using? We will use the sample purchase data set provided by Tableau, also known as "Global Superstore". It is often used for forecasting and time series analysis. It contains over 1500 different customers and 4 years of purchase data. Since we are conducting a behavioral segmentation and not a demographic segmentation, we will eliminate some potential demographic bias by keeping only the B2C (Consumer) segment and the United States.

What approach are we taking?

  • Step 0: Load, filter, clean and aggregate the data at the customer level,
  • Step 1: Create the RFM features for each customer,
  • Step 2: To automate the segmentation, use the 80% quantile for Recency and Monetary (we could also have used k-means clustering or leveraged business knowledge to create the buckets; for example, Global Superstore business users consider an active customer to be someone whose last order was less than 100 days ago),
  • Step 3: Calculate the RM score and sort the customers,
  • Step 4: Visualize the value matrix and explore some key numbers.
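Step 2 notes that business knowledge could replace the quantile cut-off. As a minimal sketch of that alternative, the snippet below applies the 100-day activity rule quoted above to a hypothetical toy table. The data, the `toy` and `ACTIVE_DAYS` names, and the strict "less than" comparison are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: days since each customer's last order.
toy = pd.DataFrame(
    {"Recency": [12, 95, 100, 240, 400]},
    index=["C1", "C2", "C3", "C4", "C5"],
)

# Assumed business rule from the text: a customer is active when the
# last order is less than 100 days old.
ACTIVE_DAYS = 100

# 2 = active (recent), 1 = inactive, using the same 2/1 coding as the R score.
toy["R"] = np.where(toy["Recency"] < ACTIVE_DAYS, 2, 1)
print(toy)
```

A fixed threshold like this is easier to explain to business users, while a quantile adapts automatically when the data changes.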

How to do it in Python:

  • Step 0: Load, filter, clean and aggregate the data at the customer level
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
url = 'https://github.com/tristanga/Data-Analysis/raw/master/Global%20Superstore.xls'
df = pd.read_excel(url)
# Keep only B2C (Consumer) customers in the United States
df = df[(df.Segment == 'Consumer') & (df.Country == 'United States')]
df.head()
  • Step 1: Create the RFM features for each customer
# Recency: days between the latest order in the data set and the customer's last order
# Frequency: number of distinct orders; Monetary: total sales
df_RFM = df.groupby('Customer ID').agg({'Order Date': lambda y: (df['Order Date'].max().date() - y.max().date()).days,
                                        'Order ID': lambda y: len(y.unique()),
                                        'Sales': lambda y: round(y.sum(), 2)})
df_RFM.columns = ['Recency', 'Frequency', 'Monetary']
df_RFM = df_RFM.sort_values('Monetary', ascending=False)
df_RFM.head()
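To make the three features concrete, here is a small self-contained sketch that computes them on a hypothetical five-row order table. The data is made up, and the named-aggregation style and the `orders`, `snapshot` and `rfm` names are illustrative choices rather than the notebook's own code.

```python
import pandas as pd

# Hypothetical toy transactions (column names follow the Global Superstore file).
orders = pd.DataFrame({
    "Customer ID": ["A", "A", "A", "B", "B"],
    "Order ID": ["o1", "o1", "o2", "o3", "o4"],
    "Order Date": pd.to_datetime(
        ["2014-01-05", "2014-01-05", "2014-03-01", "2013-06-10", "2013-07-01"]
    ),
    "Sales": [10.0, 5.5, 20.0, 7.25, 3.0],
})

# Reference date: the most recent order in the whole data set.
snapshot = orders["Order Date"].max()

rfm = orders.groupby("Customer ID").agg(
    # Days between the reference date and the customer's last order.
    Recency=("Order Date", lambda d: (snapshot - d.max()).days),
    # Number of distinct orders (an order can span several rows).
    Frequency=("Order ID", "nunique"),
    # Total amount spent.
    Monetary=("Sales", "sum"),
).round({"Monetary": 2})
print(rfm)
```

Customer A ordered on the reference date itself, so its recency is 0; customer B's last order is several months older, which gives a much larger recency value.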
  • Step 2: To automate the segmentation, use the 80% quantile for Recency and Monetary
# We will use the 80% quantile for each feature
quantiles = df_RFM.quantile(q=[0.8])
print(quantiles)
# Score 2 or 1 depending on which side of the 80% quantile a customer falls
df_RFM['R'] = np.where(df_RFM['Recency'] <= int(quantiles['Recency'].iloc[0]), 2, 1)
df_RFM['F'] = np.where(df_RFM['Frequency'] >= int(quantiles['Frequency'].iloc[0]), 2, 1)
df_RFM['M'] = np.where(df_RFM['Monetary'] >= int(quantiles['Monetary'].iloc[0]), 2, 1)
df_RFM.head()
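As a worked illustration of the quantile cut-off, the sketch below scores five hypothetical customers. The numbers are made up, the `toy`, `r_cut` and `m_cut` names are illustrative, and unlike the code above it does not truncate the thresholds with int().

```python
import numpy as np
import pandas as pd

# Hypothetical toy RFM table (values are made up).
toy = pd.DataFrame(
    {
        "Recency": [5, 20, 40, 90, 300],
        "Monetary": [900.0, 150.0, 80.0, 60.0, 40.0],
    },
    index=["C1", "C2", "C3", "C4", "C5"],
)

# 80% quantile of each feature (pandas interpolates linearly by default).
r_cut = toy["Recency"].quantile(0.8)
m_cut = toy["Monetary"].quantile(0.8)

# Same comparisons as in Step 2: at or below the recency cut-off scores 2,
# at or above the monetary cut-off scores 2.
toy["R"] = np.where(toy["Recency"] <= r_cut, 2, 1)
toy["M"] = np.where(toy["Monetary"] >= m_cut, 2, 1)
print(r_cut, m_cut)
print(toy)
```

With this rule roughly 80% of customers get R = 2 while only the top 20% of spenders get M = 2, so the two scores split the customer base very differently.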
  • Step 3: Calculate the RM score and sort the customers
# To make the 2 x 2 matrix we will use only Recency & Monetary
df_RFM['RMScore'] = df_RFM.M.map(str) + df_RFM.R.map(str)
df_RFM = df_RFM.reset_index()
# Summarize each RMScore group: customer count and average of each feature
df_RFM_SUM = df_RFM.groupby('RMScore').agg({'Customer ID': lambda y: len(y.unique()),
                                            'Frequency': lambda y: round(y.mean(), 0),
                                            'Recency': lambda y: round(y.mean(), 0),
                                            'R': lambda y: round(y.mean(), 0),
                                            'M': lambda y: round(y.mean(), 0),
                                            'Monetary': lambda y: round(y.mean(), 0)})
df_RFM_SUM = df_RFM_SUM.sort_values('RMScore', ascending=False)
df_RFM_SUM.head()
  • Step 4: Visualize the value matrix and explore some key numbers
1) Average monetary matrix
df_RFM_M = df_RFM_SUM.pivot(index='M', columns='R', values='Monetary')
df_RFM_M = df_RFM_M.reset_index().sort_values(['M'], ascending=False).set_index(['M'])
df_RFM_M
2) Number of customers matrix
df_RFM_C = df_RFM_SUM.pivot(index='M', columns='R', values='Customer ID')
df_RFM_C = df_RFM_C.reset_index().sort_values(['M'], ascending=False).set_index(['M'])
df_RFM_C
3) Recency matrix
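The code for this third matrix is not shown here. By analogy with the two pivots above, it would presumably pivot the average Recency in the same way. The sketch below illustrates the idea on a hypothetical summary table; the values are made up and the `summary` and `recency_matrix` names are assumptions, not the notebook's.

```python
import pandas as pd

# Hypothetical summary table shaped like df_RFM_SUM (values are made up).
summary = pd.DataFrame({
    "RMScore": ["22", "21", "12", "11"],
    "M": [2, 2, 1, 1],
    "R": [2, 1, 2, 1],
    "Recency": [65.0, 420.0, 70.0, 450.0],
})

# Rows are the M score, columns are the R score,
# and each cell holds the average recency in days.
recency_matrix = summary.pivot(index="M", columns="R", values="Recency")

# Put the high-value row (M = 2) on top, as in the other two matrices.
recency_matrix = recency_matrix.sort_index(ascending=False)
print(recency_matrix)
```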
Final Matrix

Some takeaways / quick wins, with examples of very simple sales and marketing tactics:

  • There are few customers in the "Disengaged" bucket, and their average revenue is higher than that of the "Star" bucket. Since there are very few of them, it should be easy to work with the business to understand what happened at the customer level. Based on that analysis, there could be a simple quick win: reactivate some of them with a phone call or a meeting to push them back to the "Star" bucket (that is, engaged customers).
  • The average last order of the "Light" bucket is very old (more than 1 year, against 60-70 days for "engaged" customers). Launching a simple reactivation campaign with a coupon could lead to some new orders and help some of these customers move to the "New" bucket (that is, engaged customers).
Examples of simple tactics
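The bucket names used in these takeaways are not defined in the code. One possible mapping, inferred from the descriptions above and therefore an assumption, is: high value and recent = "Star", high value and not recent = "Disengaged", low value and recent = "New", low value and not recent = "Light". A minimal sketch that labels customers and selects those to reactivate, using a hypothetical `scored` table:

```python
import pandas as pd

# Assumed mapping from RMScore (M digit first, then R digit) to bucket names.
BUCKETS = {"22": "Star", "21": "Disengaged", "12": "New", "11": "Light"}

# Hypothetical scored customers, shaped like df_RFM after Step 3.
scored = pd.DataFrame({
    "Customer ID": ["A", "B", "C", "D"],
    "RMScore": ["22", "21", "12", "11"],
})
scored["Bucket"] = scored["RMScore"].map(BUCKETS)

# Customers targeted by the two reactivation tactics described above.
reactivate = scored[scored["Bucket"].isin(["Disengaged", "Light"])]
print(reactivate)
```

The resulting list could feed a call list for the "Disengaged" customers and a coupon campaign for the "Light" ones.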

The notebook is available on GitHub. Thanks for reading my post. If you liked it, please clap. Feel free to contact me if you want to run simple or more complex RFM segmentations within your organization.

More interesting readings to learn more about RFM with k-means for Python:

