In market basket analysis (also called association analysis or frequent itemset mining), you analyze which products are commonly purchased together. For example, people who buy bread and peanut butter might also buy jelly, and people who buy shampoo might also buy conditioner. The analysis aims to understand the relationships between items. Knowing what your customers tend to buy together can inform marketing efforts and store or website layout.

The main purpose of market basket analysis is to generate association rules between different products. Rule generation is a common task in frequent pattern mining.

The output of the analysis can be used to answer the following questions:



Process flow of the analysis



Apriori Algorithm
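Apriori finds frequent itemsets level by level. It first counts single items and keeps those whose support reaches the minimum. It then repeatedly combines the surviving k-item sets into (k+1)-item candidates, discards any candidate that has an infrequent subset (a superset of an infrequent item set cannot be frequent), and counts the rest. The pure-Python sketch below illustrates that search under a minimum support and a maximum item set length. It is not the notebook's implementation; the function name `apriori`, its parameters `min_support` and `max_len`, and the sample baskets are assumptions for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support, max_len):
    """Level-wise search for frequent item sets.

    Returns {frozenset(itemset): support} for every item set with at most
    max_len items whose support is >= min_support.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the item set.
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    current = {}
    for item in sorted({i for b in baskets for i in b}):
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 1
    while current and k < max_len:
        # Join: unite pairs of frequent k-item sets whose union has k + 1 items.
        candidates = {
            a | b for a, b in combinations(current, 2) if len(a | b) == k + 1
        }
        # Prune: a candidate with an infrequent k-item subset cannot be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k))
        }
        # Count: keep the candidates that meet the minimum support.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical sample baskets.
baskets = [
    {"bread", "peanut butter", "jelly"},
    {"bread", "peanut butter"},
    {"bread", "milk"},
    {"shampoo", "conditioner"},
    {"bread", "peanut butter", "jelly", "milk"},
]
frequent = apriori(baskets, min_support=0.4, max_len=3)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

With a minimum support of 0.4 and a maximum length of 3, these baskets yield four frequent single items, four frequent pairs and the single triple {bread, peanut butter, jelly}; shampoo and conditioner are dropped at the first level. The sketch rescans every basket for each candidate to stay short, so it is meant for understanding rather than for large transaction tables.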

Application of Market Basket Analysis


Market basket analysis is applied to various fields of the retail sector in order to boost sales and generate revenue by identifying the needs of the customers and making purchase suggestions to them. Some of the applications are listed below:

  • Cross-Selling: Cross-selling is a sales technique in which a seller suggests a related product to a customer who has just bought a product, encouraging the customer to spend more on items that complement the original purchase. For instance, if someone buys milk from a store, the seller recommends buying coffee or tea as well. Market basket analysis helps the retailer understand consumer behavior and identify which complementary products to suggest.
  • Product Placement: This refers to placing complementary goods (pen and paper) and substitute goods (tea and coffee) together so that the customer notices both and buys them together. Placing such goods side by side increases the probability that a customer will purchase them together. Market basket analysis helps the retailer identify the goods that customers tend to purchase together.

FAQs

Most common FAQs regarding Market Basket Analysis:

What is an item set?

  • An item set is a collection of unique items, such as the items bought by a customer in a single transaction.

What is a frequent itemset?

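  • A frequent itemset is an item set whose support, the proportion of transactions that contain it, is at least a chosen minimum support threshold (for example 0.02, i.e. 2% of transactions). It does not need to appear in the majority of transactions.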

What is an association rule?

  • An association rule is an implication expression of the form X→Y, where X and Y are disjoint item sets. A more concrete example based on consumer behavior would be {Baby Food} → {Diapers} suggesting that people who buy Baby Food are also likely to buy Diapers.

What are Support, Confidence and Lift?
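For a rule X → Y, the three metrics are commonly defined as follows: support(X) is the fraction of transactions that contain every item in X; confidence(X → Y) = support(X ∪ Y) / support(X), the share of transactions with X that also contain Y; and lift(X → Y) = confidence(X → Y) / support(Y), where a lift above 1 means Y is bought with X more often than it is bought overall. The minimal sketch below computes all three for the {Baby Food} → {Diapers} rule on a handful of made-up baskets; the function names and the data are illustrative assumptions.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for b in baskets if itemset <= set(b)) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Share of baskets containing the antecedent that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence relative to the consequent's overall support (1.0 = independence)."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


# Hypothetical sample baskets.
baskets = [
    {"Baby Food", "Diapers", "Wipes"},
    {"Baby Food", "Diapers"},
    {"Diapers", "Milk"},
    {"Baby Food", "Milk"},
    {"Bread", "Milk"},
]

print(support({"Baby Food", "Diapers"}, baskets))        # 0.4
print(confidence({"Baby Food"}, {"Diapers"}, baskets))   # about 0.667
print(lift({"Baby Food"}, {"Diapers"}, baskets))         # about 1.11
```

Here Baby Food appears in 3 of 5 baskets and Diapers in 2 of those 3, so the confidence is 2/3; Diapers appear in 3 of 5 baskets overall, giving a lift of (2/3) / 0.6, slightly above 1.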




Sample output




Notebook Workflow


  1. Deciding the maximum length of an item set and minimum support.

    The chart above helps the user decide the maximum item set length up to which the analysis needs to be done. The vertical axis represents the number of transactions, and the horizontal axis represents the number of items bought by the customer in a single transaction.

    In the example above, it makes sense to set the maximum item set length to 3, because few transactions contain more than 3 items.

    Min Support (Frequent Itemset): the support of an item set is the proportion of transactions that contain it, and the minimum support is the lowest support an item set must have to be kept as frequent. For example, if the item set {A, X, Y} has a support of 0.02, then 2% of all transactions contained A, X and Y together. Setting a minimum value for this metric removes the less popular item sets.

    The maximum length and minimum support are entered in command no. 15 of the notebook.
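The basket-size chart and the maximum-length decision in step 1 can be approximated as follows. This sketch counts transactions by the number of distinct items they contain, which is what the chart plots, and then picks the smallest basket size that covers a chosen share of transactions. The `coverage` cut-off (95% by default) is a hypothetical heuristic added for illustration, since the notebook relies on the user reading the chart; the function names and sample transactions are also assumptions.

```python
from collections import Counter


def basket_size_distribution(transactions):
    """Map number of distinct items in a transaction -> number of transactions."""
    return Counter(len(set(t)) for t in transactions)


def suggest_max_len(transactions, coverage=0.95):
    """Smallest basket size k such that at least `coverage` of all transactions
    contain k or fewer distinct items (hypothetical heuristic)."""
    dist = basket_size_distribution(transactions)
    total = sum(dist.values())
    covered = 0
    for size in sorted(dist):
        covered += dist[size]
        if covered / total >= coverage:
            return size
    return max(dist)


# Hypothetical sample transactions.
transactions = [
    ["A"], ["A", "B"], ["A", "B"], ["B", "C", "D"], ["A", "C"],
    ["A", "B", "C"], ["C"], ["A", "D"], ["B", "D"], ["A", "B", "C", "D", "E"],
]
print(sorted(basket_size_distribution(transactions).items()))  # [(1, 2), (2, 5), (3, 2), (5, 1)]
print(suggest_max_len(transactions, coverage=0.9))              # 3
```

In this sample, 9 of the 10 transactions contain 3 or fewer items, so a 90% coverage cut-off suggests a maximum length of 3, matching the kind of reading described for the chart.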

  2. Association Rules pruning:
    After the frequent itemsets have been generated using the minimum support and maximum length values, we create the candidate association rules from those frequent item sets. These rules help determine whether buying specific items prompts the customer to buy other items.
    However, not all rules are useful, and there are a few ways to keep only the best ones. Say we have the item set {X, Y, Z} and the association rule algorithm has given us three rules:
    1. {X, Y} → {Z}
    2. {X, Z} → {Y}
    3. {Y, Z} → {X}
      The task is to identify which of these is the best combination.
      This notebook currently uses 3 metrics to do so:
    1. Antecedent Support: the proportion of transactions that contain all the items on the left-hand side of the rule. We want this to be sufficiently high, because creating a cross-promotional offer on an item set that is rarely bought does not make sense.
    2. Consequent Support: the proportion of transactions that contain all the items on the right-hand side of the rule. The consequent support also needs to be sufficiently high, for the same reason.
      For now, the minimum antecedent support and consequent support are both fixed at 0.095, i.e. at least 9.5% of transactions must contain the items on each side of the rule. These values can be changed to suit business requirements in commands 19 and 20, respectively.
    3. Leverage: this metric measures the strength of an association rule by comparing how often the items on both sides of the rule are bought together with how often they would be bought together if they were bought independently: leverage(X → Y) = support(X ∪ Y) - support(X) × support(Y). As a rule of thumb, Leverage > 0 indicates that the items in the rule are bought together more often than independent buying would predict.
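The rule generation and pruning in step 2 can be sketched in pure Python as follows. For one item set, the sketch enumerates every split into a non-empty antecedent and consequent (this includes single-item antecedents such as {X} → {Y, Z}, not only the three rules listed above), computes antecedent support, consequent support and leverage = support(X ∪ Y) - support(X) × support(Y), and keeps the rules that pass the thresholds. The function names, the dictionary keys, the `min_leverage` parameter and the sample baskets are hypothetical; the default of 0.095 for both support thresholds mirrors the value quoted above.

```python
from itertools import combinations


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def generate_rules(itemset, baskets):
    """Yield (antecedent, consequent, metrics) for every split of itemset
    into two non-empty, disjoint sides."""
    items = frozenset(itemset)
    s_both = support(items, baskets)
    for size in range(1, len(items)):
        for lhs in combinations(sorted(items), size):
            antecedent = frozenset(lhs)
            consequent = items - antecedent
            s_a = support(antecedent, baskets)
            s_c = support(consequent, baskets)
            yield antecedent, consequent, {
                "antecedent_support": s_a,
                "consequent_support": s_c,
                "leverage": s_both - s_a * s_c,
            }


def prune(rules, min_antecedent=0.095, min_consequent=0.095, min_leverage=0.0):
    """Keep rules whose sides are common enough and whose leverage exceeds min_leverage."""
    return [
        (a, c, m)
        for a, c, m in rules
        if m["antecedent_support"] >= min_antecedent
        and m["consequent_support"] >= min_consequent
        and m["leverage"] > min_leverage
    ]


# Hypothetical sample baskets.
baskets = [
    {"X", "Y", "Z"}, {"X", "Y", "Z"}, {"X", "Y"}, {"X", "Y"}, {"X", "Y"},
    {"Z"}, {"Z"}, {"Z"}, {"Z", "W"}, {"X"},
]
kept = prune(generate_rules({"X", "Y", "Z"}, baskets))
for a, c, m in kept:
    print(sorted(a), "->", sorted(c), "leverage", round(m["leverage"], 2))
```

With these baskets, {Z} → {X, Y} and {X, Y} → {Z} have negative leverage (0.2 - 0.6 × 0.5 = -0.1) and are pruned, while the other four rules are kept.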


The standard notebook to do Market Basket Analysis for a brand: 


Notebook fields

The following fields need to be set before running this notebook:

  • Org_id: Enter the org_id for which the analysis needs to be done.
  • Start Date: Enter the first date of the period to be analyzed.
  • End Date: Enter the last date of the period to be analyzed.
  • Inventory DB: Enter the name of the database that contains the product inventory table.
  • Inventory Table: Enter the name of the table that holds the product inventory data.
  • Product Code Column
    • Enter the name of the column that holds the unique product code/ean_code.
    • This column must contain the same values that are stored in the ‘item_code’ field of the bill_lineitems fact.
  • Hierarchy Level Column: Enter the name of the product hierarchy column at which the analysis needs to be done. For example, if the brand wants to identify which two parent categories are usually bought together, enter the name of the column that holds the Parent Category of the products.
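Setting the Product Code Column and Hierarchy Level Column amounts to joining each bill line item to the inventory on the product code and replacing the product with its value at the chosen hierarchy level, so that baskets are built from, for example, parent categories instead of individual products. A minimal sketch, assuming line items arrive as (bill_id, item_code) pairs and the inventory as a dictionary keyed by product code; these shapes, the function name and the column names are hypothetical stand-ins for the real tables. Line items whose code has no match in the inventory are skipped, which is why the product code column must hold the same values as item_code.

```python
from collections import defaultdict


def build_baskets(line_items, inventory, hierarchy_column):
    """Group bill line items into baskets of hierarchy-level values.

    line_items: iterable of (bill_id, item_code) pairs (hypothetical shape)
    inventory: {product_code: {column_name: value}} (hypothetical shape)
    hierarchy_column: inventory column to analyze at, e.g. "parent_category"
    """
    baskets = defaultdict(set)
    for bill_id, item_code in line_items:
        product = inventory.get(item_code)
        if product is None:
            # No matching product code in the inventory: the line item is dropped.
            continue
        # A set keeps each hierarchy value once per bill.
        baskets[bill_id].add(product[hierarchy_column])
    return dict(baskets)


# Hypothetical inventory rows and bill line items.
inventory = {
    "8901": {"category": "Shampoo", "parent_category": "Hair Care"},
    "8902": {"category": "Conditioner", "parent_category": "Hair Care"},
    "8903": {"category": "Toothpaste", "parent_category": "Oral Care"},
}
line_items = [("B1", "8901"), ("B1", "8902"), ("B1", "8903"), ("B2", "8903"), ("B2", "9999")]

baskets = build_baskets(line_items, inventory, "parent_category")
print(baskets)
```

At the parent-category level, bill B1 collapses to {Hair Care, Oral Care} even though it holds three products, and the unmatched code 9999 in bill B2 is ignored.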