This notebook tracks how customers move between RFM segments across two different time periods.

Prerequisite

This notebook can be used for brands whose RFM segments have been created by running the automated RFM notebook on Databricks (which uses K-means).

The notebook is present in the Segmentation section of the Standardized notebooks folder in the Knowledge Base repository. The name of the notebook is RFM Standardized Combined (R+F+M).

Input parameters

You need to provide the following details before running the Notebook.

  1. org_id: Enter the org_id of the brand for which RFM needs to be done.
  2. start_date: Enter the start date of the period for which RFM needs to be done.
    Format: yyyy-mm-dd. Ex: 2018-01-01.
  3. end_date: Enter the end date of the period for which RFM needs to be done.
    Format: yyyy-mm-dd. Ex: 2020-12-31.
  4. Quantile_Outlier:
    1. This is an exploratory step that helps exclude outliers from the data.
    2. The value entered in this field is used as a percentile cut-off: customers whose Frequency and Monetary values are higher than the cut-off are dropped from the analysis.
    3. The value can be decided by looking at the output of command 22.
      In that output, it can be seen that there is a huge difference between the 99th percentile of the frequency and monetary values and their maximum values.
    4. You can enter a value between 1 and 100.
      The aim is to exclude the customers whose Frequency and Monetary values are much higher than the rest.
  5. No. of Clusters:
    1. This is an exploratory step that will help us in deciding the no. of individual clusters for R, F, and M values.
    2. The number of clusters is decided based on the output of command 31.
      • This chart is an elbow curve. According to Wikipedia: “In cluster analysis, the elbow method is a heuristic used in determining the number of clusters in a data set. The method consists of plotting the explained variation as a function of the number of clusters, and picking the elbow of the curve as the number of clusters to use.”
      • The segmentation is first done individually on the Recency, Frequency, and Monetary features. The individual segment scores are then combined into an overall score.
    3. Overall score = R+F+M (In the following example, the range of overall score will be from 0 to 6).
  6. Cuts:
    1. This is an exploratory step, which will help us in deciding the final number of segments for our customer group.
    2. The visual in command 52 can help us decide the different cuts for the overall segments.
    3. As mentioned earlier, the overall score is equal to the sum of the Recency, Frequency, and Monetary scores. So, if we have 4 clusters for each of the variables, with the individual scores ranging from 0 to 3, the overall score can take values between 0 and 9.
      1. The x-axis is the overall score after combining the R, F, and M clusters. Each bar represents the % share of customers at each overall score, whereas the line represents the cumulative sales contribution.
      2. Enter the lowest possible score for each segment, separated by commas. For example: 0,3,5,7 or 0,5,7 etc.
      3. The cuts have to be entered in the increasing order of the segment’s value. The least valuable has to be the first and the most valuable has to be the last.
  7. RFM Labels:
    1. The values have to be entered in the comma-separated format.
    2. The no. of values entered has to be equal to the no. of cuts already entered. For example, if the cuts are 0,3,5,7 then the values in this field could be Base, Mid, Top, Premium.
    3. If the no. of values entered is less than or greater than the no. of cuts, the notebook will throw an error.
    4. The segment labels have to be entered in the increasing order of the segment’s value. The least valuable has to be the first and the most valuable has to be the last.
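
The Quantile_Outlier, Cuts, and RFM Labels inputs described above can be sketched in plain Python as follows. This is only an illustration and not the notebook's actual code: the function names (`percentile_cutoff`, `drop_outliers`, `assign_segment`), the dictionary keys, the nearest-rank percentile rule, and the choice to drop a customer when either value exceeds its cut-off are all assumptions.

```python
import bisect
import math


def percentile_cutoff(values, quantile_outlier):
    """Nearest-rank percentile of `values` (assumed rule; the notebook may differ)."""
    if not 1 <= quantile_outlier <= 100:
        raise ValueError("Quantile_Outlier must be between 1 and 100")
    ordered = sorted(values)
    rank = math.ceil(quantile_outlier * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def drop_outliers(customers, quantile_outlier):
    """Keep customers whose frequency and monetary values are within the cut-offs.

    `customers` is a list of dicts with hypothetical keys 'frequency' and 'monetary'.
    Assumption: a customer is dropped if either value exceeds its cut-off.
    """
    f_cut = percentile_cutoff([c["frequency"] for c in customers], quantile_outlier)
    m_cut = percentile_cutoff([c["monetary"] for c in customers], quantile_outlier)
    return [c for c in customers if c["frequency"] <= f_cut and c["monetary"] <= m_cut]


def assign_segment(overall_score, cuts, labels):
    """Map an overall score (R+F+M) to a label, given each segment's lowest score."""
    if len(cuts) != len(labels):
        raise ValueError("The number of labels must equal the number of cuts")
    if list(cuts) != sorted(cuts):
        raise ValueError("Cuts must be entered in increasing order")
    if overall_score < cuts[0]:
        raise ValueError("Score is below the lowest cut")
    # bisect_right counts the cuts that are <= score; the last of them owns the score.
    return labels[bisect.bisect_right(cuts, overall_score) - 1]


# Cuts and labels from the example in the text; scores 0 to 9 correspond to
# 4 clusters per feature.
cuts = [0, 3, 5, 7]
labels = ["Base", "Mid", "Top", "Premium"]
segments = [assign_segment(score, cuts, labels) for score in range(10)]
```

With these cuts, scores 0 to 2 fall in Base, 3 to 4 in Mid, 5 to 6 in Top, and 7 to 9 in Premium. Passing three labels with four cuts raises an error, mirroring the behaviour described for the RFM Labels field.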
  • Once the quantile_outlier, no. of individual clusters, cuts, and RFM labels have been decided, run the entire notebook again to get the final segments.
  • Repeat the above process separately for each of the two time periods.
  • Once you have the final segments for both time periods, input them into this notebook to get the movement of users across RFM segments.
  • Instructions for running the above notebook are in cmd 2 of the notebook.
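
Once the final segments for both periods are available, the movement step amounts to a cross-tabulation of each customer's segment in the first period against their segment in the second. A minimal sketch in plain Python, assuming each period's result is a mapping from customer ID to segment label. The `movement_matrix` name and the sample data are hypothetical, and only customers present in both periods are counted here, which may differ from how the notebook treats new or lapsed customers.

```python
from collections import Counter


def movement_matrix(period_1, period_2):
    """Count customers by (segment in period 1, segment in period 2)."""
    # Assumption: only customers that appear in both periods are compared.
    common = period_1.keys() & period_2.keys()
    return Counter((period_1[c], period_2[c]) for c in common)


# Illustrative data only, not real output.
period_1 = {"c1": "Mid", "c2": "Mid", "c3": "Base", "c4": "Top", "c5": "Premium"}
period_2 = {"c1": "Base", "c2": "Top", "c3": "Base", "c4": "Top", "c6": "Base"}

moves = movement_matrix(period_1, period_2)
for (old, new), count in sorted(moves.items()):
    print(f"{old} -> {new}: {count}")
```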

Sample Output

From the following output, we can observe that almost 18.8K users have downgraded from the Mid segment to the Base segment, whereas only ~5K have upgraded to either the Premium or the Top segment.
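
Readings like the one above can be derived from the movement counts by comparing each segment's position in the label order (least valuable first). A small sketch, assuming the counts are keyed by (old segment, new segment). The `summarize_movement` helper and the numbers are illustrative and are not taken from the notebook's output.

```python
def summarize_movement(moves, labels):
    """Total customers who downgraded, stayed, or upgraded.

    `labels` lists segments from least to most valuable, as in the RFM Labels input.
    """
    rank = {label: i for i, label in enumerate(labels)}
    summary = {"downgraded": 0, "unchanged": 0, "upgraded": 0}
    for (old, new), count in moves.items():
        if rank[new] < rank[old]:
            summary["downgraded"] += count
        elif rank[new] > rank[old]:
            summary["upgraded"] += count
        else:
            summary["unchanged"] += count
    return summary


labels = ["Base", "Mid", "Top", "Premium"]
# Hypothetical counts for customers who started in the Mid segment.
moves = {
    ("Mid", "Base"): 40,
    ("Mid", "Mid"): 25,
    ("Mid", "Top"): 7,
    ("Mid", "Premium"): 3,
}
summary = summarize_movement(moves, labels)
```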

Notebook Links

Open your cluster-specific link provided for the Notebook.

Notebook: Movement of Users Across RFM Segments (using Kmeans)
Cluster links: India, SEA, EMEA