The Lapsation Analysis Notebook is used to understand what different factors are responsible for the increased Lapsed rate for a particular period.


Lapsation Analysis is a process where we analyze historical data of our lapsed users to determine factors causing their lapsation. In this process, we start with creating a broad single view of our user base for a defined period. This single view must include all the relevant attributes pertaining to our analysis. Here, we have included the following KPIs.

  • User-level KPIs: User id, User name, Mobile number
  • Transaction level KPIs:  first bill date, last bill date, total sales, total visits, total quantity, total bills, ATV, ABS, recency, latency, points_redeemed
  • Geographical KPIs: Last purchased store location (City, zone, region)
  • Product KPI: Last purchased product category

Using the above KPI’s we will find our active, lapsed & inactive users based on their recency. After this, we will continue our analysis on the lapsed and inactive customers.

Here, we consider four factors that affect our lapsed customers.

  • Seasonality: In this part, we are calculating monthly lapsed and inactive customers. Using this we will get to know if any particular month/quarter/year is increasing our lapsed rate.
    We will also check new customers, repeat customers and win back customers on a monthly basis. 
  • ATV and  Frequency level: In this section, we will analyze our lapsed customers based on their Average Transaction Value (ATV) and total visits made by them (Frequency).
    We will divide our user base into high, mid and low based on their ATV and Visits. This exploration will lead us to know users of which segment are getting lapsed. With this information, we can target those specific customers with proper engagement/communication to make them active again. 
  • Geographical Level: Here, we will take geographical level information into account to check which of our geography has maximum lapsed customers.
    This geographical level can be zone, region, city, area and so on (any 2 from these are explored in this analysis).
    We will get our top 10 geographies with maximum lapsed customers. We can then focus on these geographies to increase our user engagement.
  • Product Level: In this section, we will analyze the last product category/categories purchased by our lapsed/inactive users. This will include a combination of categories if our user has bought more than one product category in their last visit.
    Using this data, we can get our top categories or combination categories that was last purchased by our lapsed/inactive users. We can then know if there is any particular product category that is increasing our lapsed/inactive rate.
    This exploration is done for both one-timer users and repeat users separately.

Understanding the Notebook

The Notebook has 4 main sections:

  1. Yearly and Monthly Lapsed, Inactive and Winback Rate
    1. Yearly Active and Lapsed Rate (Output - cmd 14)
    2. Monthly Repeat Rate (Output - cmd 23) 
    3. Monthly New Enrollment Rate (Output - cmd 25)
    4. Monthly Lapsed Rate: Overall, Repeaters and Onetimers (Output - cmd 29,30,31)
    5. Monthly Inactive Rate: Overall, Repeaters and Onetimers (Output - cmd 34,35,36)
    6. Monthly Winback Rate: Overall, Repeaters and Onetimers(Output - cmd 39,40,41)
  2. Lapsed Users based on their Average Transaction value and Total Visits.(Output - cmd 50,52,53)
  3. Geographical factors:
    1. Geographical Level 1 wise Active and Lapsed Users:
      - One Timers. (Output - cmd 58,59)
      - Repeaters (Output - cmd 61,62)
    2. Geographical Level 2 wise Active and Lapsed Users.(Output - cmd 66,67)
  4. Product Category last purchased by Active and Lapsed users.(Output - cmd 73 to 79)

Understanding Output

The first part of point 1 is simply showing how the lapsed rate has changed in comparison to the previous year. The rest all the graphs for part 1 are a percentage distribution of users on a monthly basis, giving a yearly comparison. 

Sample output of Yearly Distribution of Lapsed and Active users:

For point 2, cmd 50 has the excel sheet for all the users' kpis and their respective ATV and Frequency tag. You can download this sheet to get user ids. Cmd 52 and 53 is a visual representation of User count and percentage distribution in each tag respectively.

Sample output of User Count based on ATV and Frequency Tags:

For point 3, cmd 58, 59,61,62 are graphs showing Active & Lapsed user count and percentage distribution for Onetimers and Repeaters respectively. These graphs will only show us the top 10 of each Geographical level based on maximum lapsed users present.

Sample output of Store State-wise Active and Lapsed Repeater User Distribution:

For point 4, cmd 73 and 76 will give us a table for active and lapsed users respectively. This table tells us which category combinations have a maximum of active users and which category combination was purchased last by our lapsed users. Cmd 74,75,78,79 are visual representations of top 5 and bottom 5 such combinations for Active and Lapsed Users.

Sample output of Top 5 purchased categories in the last visit by users who are active for an apparel brand:


Running notebook

The required inputs which need to be entered before running the notebook:

  • Org_id of the brand
  • Start Date and End Date - Duration for which the Lapsation notebook is to be run. Format: yyyy-mm-dd 
  • Inventory table database name and Inventory table name for product category wise analysis. 
  • Primary Key Column name to join it with bill_summary to get product details.
  • Main Category Column name - Column name of the main category of the product from the Inventory table
  • Frequency and ATV column name from the cmd 7 (single view)
  • Lapsed Period - Lapsed Period of the brand in terms of days
  • Inactive Period -  Inactive Period of the brand in terms of days
  • Geographical level 1 and 2 - Column names from zone_tills table for geographical analysis (For eg: state,store,city,zone,region,area,etc)

Steps to run the notebook

Notebook links: India Cluster  | APAC Cluster | EMEA Cluster

  1. Input the Org ID of the brand
  2. Start date and End date for your analysis
  3. ATV and Frequency column names from the below single view code
  4. Provide lapsed period and Inactive Period days of your brand
  5. Input the column name of geographical level 1 and geographical 2 from zone_tills table of your brand.
    This analysis will return  the top 10 in geographical level 1 and geographical level 2 (i.e city, region, state, zone) that have the most Lapsed users.
  6. In command 48 change ATV low & mid using the outputs from cmd 46 and for frequency low and mid use output from cmd 45.
    Run the notebook again after making these changes for final outputs.
  7. For product-wise analysis, provide the database and table name of your product master.
  8. Input the primary key column name to join with bill_lineitems item_code
  9. Input the column name for the main category from the product table.
    This column should not contain any duplicate values. For eg if your column has 'Gold' and 'GOLD' in the main category then kindly change it to either one before running this notebook otherwise we will get duplicate categories in our analysis.
  10. Make necessary changes in the SQL code to add any other filters for the main category tag in cmd 70 line 67 and line 139.
    The sample code has '-' and 'null' values tagged as others. You can change it according to your brand's data.
  11. Change the insights and observations below each graph according to your output.