This document describes the input fields that you need to fill in before running the notebook instance.

The notebook is present in the Segmentation section of the Standardized notebooks folder in the Knowledge Base repository. The name of the notebook is RFM Standardized Combined (R+F+M).

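Recency, frequency, monetary value (RFM) is a marketing analysis tool used to identify a company's or an organization's best customers by using certain measures. The RFM model is based on three quantitative factors: recency (how recently a customer made a purchase), frequency (how often the customer purchases), and monetary value (how much the customer spends).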


The methodology for RFM segmentation is as follows.

  1. Exploratory Data Analysis:
    1. The first step for doing RFM segmentation for any brand is to create a dataset where each customer’s Recency, Frequency and Monetary values are present.
    2. The second step is exploratory data analysis, that is, identifying the lapsation period and the drop rate of the brand.
    3. The lapsation period helps in identifying the recency threshold beyond which a customer is considered lapsed.
    4. The drop rate helps in identifying how many customers drop off after making their first visit.
       If the drop rate after the first visit is greater than 60%, then, as a standard practice, the RFM segmentation is done separately for customers who have made more than one visit and for customers who have made exactly one visit.
  2. K-Means Algorithm:
    1. Specify number of clusters K. Use the Elbow method or Silhouette Method to get the appropriate number of clusters. To know more, read Determine the Optimal K for K-Means.
    2. Initialize centroids by first shuffling the dataset and then randomly selecting K data points for the centroids without replacement.
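    3. Assign each data point to its nearest centroid, recompute each centroid as the mean of its assigned points, and keep iterating until there is no change to the centroids, that is, until the assignment of data points to clusters no longer changes.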
      To learn more, read Customer Segmentation Using K Means Clustering.
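
The dataset and the drop-rate check from step 1 can be sketched in Python as follows. This is a minimal illustration, not the notebook's actual code: the `build_rfm` and `drop_rate` helpers, the `(customer_id, visit_date, amount)` transaction layout, and the sample values are hypothetical, and each transaction is assumed to count as one visit.

```python
from collections import defaultdict
from datetime import date


def build_rfm(transactions, as_of):
    """Return {customer_id: (recency_days, frequency, monetary)}.

    Assumption: every transaction counts as one visit, and recency is
    measured in days between the last visit and `as_of`.
    """
    last_visit = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, visit_date, amount in transactions:
        frequency[customer_id] += 1
        monetary[customer_id] += amount
        if customer_id not in last_visit or visit_date > last_visit[customer_id]:
            last_visit[customer_id] = visit_date
    return {
        cid: ((as_of - last_visit[cid]).days, frequency[cid], monetary[cid])
        for cid in last_visit
    }


def drop_rate(rfm):
    """Share of customers who made exactly one visit."""
    one_timers = sum(1 for _, freq, _ in rfm.values() if freq == 1)
    return one_timers / len(rfm)


# Hypothetical sample data: (customer_id, visit_date, amount).
transactions = [
    ("c1", date(2020, 1, 5), 40.0),
    ("c1", date(2020, 6, 1), 60.0),
    ("c2", date(2020, 3, 10), 25.0),
    ("c3", date(2020, 11, 20), 80.0),
    ("c4", date(2020, 2, 14), 15.0),
]

rfm = build_rfm(transactions, as_of=date(2021, 1, 31))
rate = drop_rate(rfm)

# Standard practice from the text: segment one-timers and repeaters
# separately when the drop rate after the first visit exceeds 60%.
segment_separately = rate > 0.60
print(rfm["c1"], rate, segment_separately)
```

In this sample, three of the four customers visited only once, so the drop rate is 75% and the separate segmentation for one-timers and repeaters would apply.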

Input Parameters

You need to provide the following details before running the Notebook.

  1. org_id: Enter the org ID of the org/brand for which you want to do the RFM segmentation.
  2. start_date: This is a date field. Enter the start date of the duration for which you want to do the RFM segmentation.
    Format: yyyy-mm-dd. For example, 2018-01-01.
  3. end_date: This is a date field. Enter the end date of the duration for which you want to do the RFM segmentation.
    Format: yyyy-mm-dd. For example, 2021-01-31.
  4. Quantile_Outlier: This field helps you exclude outliers from the data. The value entered is a cut-off: customers whose frequency and monetary values are higher than the cut-off are excluded from the analysis.
    You can decide the value by looking at the output of command 22.

    The value entered is a percentile, with an upper limit of 100. It excludes the customers whose frequency and monetary values are far higher than the rest.
    From the above table, you can see a huge difference between the 99th percentile and the maximum value of both the frequency and the monetary values.
  5. Number of Clusters: This field sets the number of individual clusters for the R, F, and M (recency, frequency, and monetary) values. You can decide the number of clusters after looking at the output of command 31.

    The above chart is an elbow curve.
    Source: Wikipedia “In cluster analysis, the elbow method is a heuristic used in determining the number of clusters in a data set. The method consists of plotting the explained variation as a function of the number of clusters, and picking the elbow of the curve as the number of clusters to use.”

    The segmentation is first done individually for Recency, Frequency, and Monetary features. After that, you can combine the values of the individual segment scores to get an overall score.

    Overall score = R + F + M (in the following example, the overall score ranges from 0 to 6).
    R Score  Description     F Score  Description       M Score  Description
    0        High Recency    0        Low Frequency     0        Low Monetary
    1        Medium Recency  1        Medium Frequency  1        Medium Monetary
    2        Low Recency     2        High Frequency    2        High Monetary
  6. Cuts: This field sets the final number of segments for the customer group. The visual in command 52 can help you decide the cuts for the overall segments.
    As mentioned in step 5, the overall score is the sum of the Recency, Frequency, and Monetary scores. Therefore, if there are four clusters for each variable, with individual scores ranging from 0 to 3, the overall score can take values from 0 to 9.
    1. The x-axis shows the overall score after combining the R, F, and M clusters. Each bar represents the percentage share of customers under each overall score, whereas the line represents the cumulative sales contribution.
    2. You have to enter the lowest possible score for each segment separated by commas. For example, 0,3,5,7 or 0,5,7 etc.

    3. The cuts have to be entered in the increasing order of the segment’s value. The least valuable segment comes first, and the most valuable segment comes last in the order.

  7. RFM labels: Enter the values separated by a comma.

    The number of values entered must be equal to the number of cuts already entered. The notebook throws an error if the number of values is less than or greater than the number of cuts.

    For example, if you entered 0,3,5,7 as the cuts, the values in this field could be Base, Mid, Top, Premium.

    You need to enter the segment labels in the increasing order of the segment’s value. The least valuable segment comes first, and the most valuable segment comes last in the order.
    For example, in the above chart, if the overall score ranges from 0 to 9 and the values entered in the cuts and RFM labels fields are 0,3,5,7 and Base, Mid, Top, Premium, then:

    1. Customers in the Base segment are the ones who have an overall RFM score of 0, 1, or 2.
    2. Customers in the Mid segment are the ones who have an overall score of 3 or 4.
    3. Customers in the Top segment are the ones who have an overall score of 5 or 6.
    4. Customers in the Premium segment are the ones who have an overall score of 7, 8, or 9.
  8. FTP location: Enter the FTP location where you want to save the RFM output. Format: /DeliveryAPAC/Dominos_Internal.
    Do not enter a URL in front of the path.
  9. User name: Enter the username for the FTP.
  10. Password: Enter the password for the FTP.
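
For the Number of Clusters field, the K-means steps described earlier and the idea behind the elbow curve can be sketched as follows. This is a simplified, standard-library-only illustration on a single feature, not the notebook's implementation: the `kmeans_1d` helper and the sample values are hypothetical, and the within-cluster sum of squares (inertia) is used here as the quantity to compare across values of K, which is a common choice for elbow plots.

```python
import random


def kmeans_1d(values, k, seed=0):
    """Cluster 1-D values with K-means; return (centroids, labels, inertia)."""
    rng = random.Random(seed)
    # Shuffle the data, then take K data points (without replacement)
    # as the initial centroids.
    shuffled = list(values)
    rng.shuffle(shuffled)
    centroids = shuffled[:k]
    labels = None
    while True:
        # Assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: abs(v - centroids[j])) for v in values
        ]
        if new_labels == labels:
            break  # assignments no longer change, so the centroids are stable
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = sum(members) / len(members)
    # Inertia: sum of squared distances of points to their centroid.
    inertia = sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))
    return centroids, labels, inertia


# Hypothetical monetary values for six customers.
monetary = [1.0, 2.0, 3.0, 100.0, 101.0, 102.0]

# Inertia for each candidate K; the "elbow" is where the drop flattens out.
inertia_by_k = {k: kmeans_1d(monetary, k)[2] for k in range(1, 5)}
centroids, labels, inertia = kmeans_1d(monetary, 2)
print(sorted(centroids), inertia_by_k)
```

In this sample the inertia falls sharply from K = 1 to K = 2 and only marginally afterwards, so K = 2 would be read off as the elbow. The same procedure would be repeated separately for the recency, frequency, and monetary values.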

Once you have decided on the quantile_outlier, number of individual clusters, cuts, and RFM labels, run the entire notebook again to get the final segments.
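
How the cuts and RFM labels turn an overall score into a final segment can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the notebook's code: the `parse_cuts_and_labels` and `segment_for` helpers are hypothetical, and the inputs reuse the 0,3,5,7 and Base, Mid, Top, Premium example from the Cuts and RFM labels fields.

```python
from bisect import bisect_right


def parse_cuts_and_labels(cuts_text, labels_text):
    """Parse the comma-separated Cuts and RFM labels inputs and validate them."""
    cuts = [int(part) for part in cuts_text.split(",")]
    labels = [part.strip() for part in labels_text.split(",")]
    if len(cuts) != len(labels):
        raise ValueError("the number of labels must equal the number of cuts")
    if any(a >= b for a, b in zip(cuts, cuts[1:])):
        raise ValueError("cuts must be entered in increasing order")
    return cuts, labels


def segment_for(overall_score, cuts, labels):
    """Return the label whose cut is the largest one not above the score."""
    index = bisect_right(cuts, overall_score) - 1
    if index < 0:
        raise ValueError("score is below the lowest cut")
    return labels[index]


cuts, labels = parse_cuts_and_labels("0,3,5,7", "Base, Mid, Top, Premium")

# Overall score = R + F + M; with four clusters per variable (scores 0 to 3)
# the overall score ranges from 0 to 9.
r_score, f_score, m_score = 1, 2, 1
overall_score = r_score + f_score + m_score
segment = segment_for(overall_score, cuts, labels)

# Segment for every possible overall score.
segments = {score: segment_for(score, cuts, labels) for score in range(10)}
print(overall_score, segment)
```

Each cut is the lowest overall score of its segment, so a score of 4 falls in the Mid segment and a score of 7 or above falls in the Premium segment, matching the example above.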

The following are the different versions of the notebook.

  1. RFM segmentation (separate for one timer and repeaters): You can use this when the database has a very high share of customers who have made only one visit.
  2. RFM segmentation with OU: You can use this when RFM segmentation is required for a specific OU within an org.
  3. FM-based segmentation: You can use this when only Frequency and Monetary are used for segmentation.

Benefits and applications of RFM Segmentation

RFM segments allow orgs to do the following.

  1. Identify their top customers: customers who are the most loyal, spend the most, and visit frequently.
  2. Identify the fence sitters: customers who are on the verge of lapsing.
  3. Identify potential loyal customers: customers who can be converted to top customers if given the right nudge at the right time.
  4. Increase the repeat conversion by targeting the active one-timers with bounce-back campaigns and the lapsed one-timers with aggressive offers.

The following image shows a Sample Engagement strategy based on the RFM segmentation for an electronics brand.

Notebook Links

Open your cluster-specific link provided for the Notebook.

Notebooks                                                     Cluster links
RFM segmentation - together for one timer and repeaters       India  SEA
RFM segmentation - individually for one timer and repeaters   India  SEA
FM segmentation                                               India  SEA
RFM segmentation - adding OU                                  India  SEA