Friday, December 4, 2020

# How to automatically segment customers using purchase data and a few lines of Python

## A small educational project for learning "customer segmentation" with a simple data analysis technique

Why should you care about customer segmentation? Segmentation is the key to delivering personalized customer experiences. It gives you detailed information on customer behavior, habits and preferences, so you can run targeted marketing campaigns that are more likely to succeed and serve content that fits each customer.

What will we build? Using transactional purchase data, we will build a 2 x 2 value matrix that splits customers into 4 groups. The groups differ along 2 dimensions: (1) current customer value and (2) potential customer value.

Which technique will we use? We will use the RFM model to create the required features from transactional purchase data. RFM stands for:

• Recency: when was the last time they bought?
• Frequency: how often and for how long have they purchased?
• Monetary Value / Sales: how much did they buy?

It is usually used to identify the highest-value customers, who sit at the intersection of these 3 questions. To build the 2 x 2 matrix we will use only the R and the M of RFM.
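To make the three definitions concrete, here is a minimal sketch on a made-up table of six order lines. The column names (`customer`, `order_id`, `order_date`, `sales`) and all values are hypothetical, and measuring recency against the latest order date in the data is an assumption about the reference point.

```python
import pandas as pd

# Hypothetical toy transactions: one row per order line.
orders = pd.DataFrame({
    "customer": ["A", "A", "A", "B", "B", "C"],
    "order_id": ["o1", "o2", "o2", "o3", "o4", "o5"],
    "order_date": pd.to_datetime([
        "2020-01-05", "2020-03-10", "2020-03-10",
        "2020-02-01", "2020-02-20", "2019-11-15",
    ]),
    "sales": [100.0, 40.0, 10.0, 250.0, 50.0, 30.0],
})

# Assumed reference date: the most recent order in the whole table.
snapshot = orders["order_date"].max()

rfm = orders.groupby("customer").agg(
    # Recency: days between the reference date and the customer's last order.
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    # Frequency: number of distinct orders (order lines are not double counted).
    frequency=("order_id", "nunique"),
    # Monetary: total sales for the customer.
    monetary=("sales", "sum"),
)
print(rfm)
```

Customer A bought on the reference date itself, so its recency is 0. Customer C has a single old order, so it scores worst on all three measures.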

What data are we using? We will use the sample purchase data set provided by Tableau, also known as "Global Superstore". It is often used for forecasting and time series analysis. It contains over 1500 different customers and 4 years of purchase data. Since we are doing a behavioral segmentation and not a demographic segmentation, we will remove some potential demographic bias by keeping only the B2C (Consumer) segment and the United States.

What approach are we taking?

• Step 0: Load, filter, clean and aggregate data at the customer level,
• Step 1: Create RFM features for each customer,
• Step 2: To automate segmentation we will use the 80% quantile for Recency and Monetary (we could also have used k-means clustering or leveraged business knowledge to create buckets; for example, Global Superstore business users consider a customer active if their last order was less than 100 days ago),
• Step 3: Calculate the RM score and order the customers,
• Step 4: View the Value Matrix and explore some key numbers.
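Before turning to the real data, Steps 2 and 3 can be sketched on a toy table. Everything here is an illustrative assumption and not necessarily the exact scoring used later: the customer values are made up, a score of 2 means "at or better than the 80% quantile threshold" (lower is better for Recency, higher is better for Monetary), and the RM score is simply the M and R scores concatenated.

```python
import numpy as np
import pandas as pd

# Hypothetical per-customer table; the values are made up for illustration.
rfm = pd.DataFrame(
    {
        "Recency": [5, 20, 40, 90, 300],                  # days since last order
        "Monetary": [900.0, 120.0, 60.0, 2000.0, 80.0],   # total sales
    },
    index=["c1", "c2", "c3", "c4", "c5"],
)

# Step 2: the 80% quantile of each dimension is the cut-off.
cut = rfm.quantile(0.8)

# Assumed direction: a low Recency is good, a high Monetary is good.
rfm["R"] = np.where(rfm["Recency"] <= cut["Recency"], 2, 1)
rfm["M"] = np.where(rfm["Monetary"] >= cut["Monetary"], 2, 1)

# Step 3 (assumed form): concatenate M and R into a two-character score,
# then sort so the best segment comes first.
rfm["RMScore"] = rfm["M"].astype(str) + rfm["R"].astype(str)
ranked = rfm.sort_values(["RMScore", "Monetary"], ascending=False)

# 2 x 2 matrix: number of customers in each (M, R) cell.
matrix = pd.crosstab(rfm["M"], rfm["R"])
print(ranked)
print(matrix)
```

A quantile cut-off adapts to whatever data is loaded, which is what makes the segmentation automatic. A fixed business rule, such as the 100-day threshold mentioned above, would replace `cut["Recency"]` with a constant.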

The Python way:

• Step 0: Load, filter, clean and aggregate data at the customer level

```python
import matplotlib.pyplot as plt
import numpy as np
# Jupyter notebook magic; remove this line when running as a plain script.
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
import pandas as pd

# Load the Global Superstore workbook (reading .xls files needs the xlrd package).
url = 'https://github.com/tristanga/Data-Analysis/raw/master/Global%20Superstore.xls'
df = pd.read_excel(url)

# Keep only B2C (Consumer) customers in the United States.
df = df[(df.Segment == 'Consumer') & (df.Country == 'United States')]
df.head()
```
• Step 1: Create RFM features for each customer

```python
df_RFM = df.groupby('Customer ID').agg({
    # Recency: days between the latest order in the data set and the customer's last order
    'Order Date': lambda y: (df['Order Date'].max().date() - y.max().date()).days,
    # Frequency: number of distinct orders
    'Order ID': lambda y: len(y.unique()),
    # Monetary: total sales, rounded to 2 decimals
    'Sales': lambda y: round(y.sum(), 2)})
df_RFM.columns = ['Recency', 'Frequency', 'Monetary']
df_RFM = df_RFM.sort_values('Monetary', ascending=False)
df_RFM.head()
```