Fraud detection is a critical issue for retailers determined to prevent losses and preserve customer trust. Fraud can originate from customers or people posing as customers, store associates, or external criminals or hackers. The most prominent recent frauds involve stolen credit card information and fraudulent merchandise returns. Analyzing transactions and activities such as purchasing, accounts payable, POS, sales projections, warehouse movements, employee shift records, returns, store-level video and audio recordings, and other data across your company can help identify fraudulent activity and set appropriate priorities for case management and investigation.

For fraud detection, we have identified threshold values for certain KPIs that indicate whether a transaction is fraudulent. The rules considered in this fraud analysis are as follows.

Based on these rules, we use different KPIs to determine the threshold values.

Vintage and Visit: Vintage is the number of days since the first transaction date, and Visit is the number of times the customer has come to the store (the number of unique days on which the customer has purchased). The KPI for this rule is (Vintage) / (Visit), which must be less than a predefined threshold value for the customer to get the fraud status.

Transaction hours count in a day: Customers are prone to have fraud status if they cross a predefined number of transactions in a day (the transaction count here is the number of distinct transaction hours in a day).
Transaction Amount Limit: The system flags transactions whose amount exceeds the predefined threshold, and the customer concerned is tagged as fraud.
Top Customer Percentile: This is the rank of the customer when customers are sorted in descending order of lifetime purchase. Customers in the top 0.1 percentile have a higher probability of committing fraud by entering other customers' transactions into their own account.
Transaction Happening in 2 different regions/zones on the same day or with a gap of 6 hours (transactions in different zones on the same day): Customers with more than one transaction a day in different zones are prone to be tagged as fraud. Very few customers purchase in 2 different regions on the same day.
Latency: The latency is defined as (last bill date – joined date) / (visits – 1). The chances of a customer being a fraud are high if the customer has more visits than the predefined value and a lower visit latency than the predefined value.
Spike in Transaction Amounts: If any transaction amount is more than 10 times the average transaction amount, there is a high possibility that the customer is committing fraud.
Transaction per Day: The maximum number of transactions in a day should be less than a predefined critical value.
Transaction per Week: The maximum number of transactions in a week should be less than a predefined critical value.
Latency of Redemption: The latency of redemption is defined as (maximum date of redemption – minimum date of redemption) / (# unique days on which points are redeemed – 1). If the number of redeemed visits is more than a predefined value and the latency of redemption is less than a predefined value, the chances of fraud are high.
Number of Times the Points Get Redeemed: A customer is marked as fraud when the number of times the customer has redeemed points is more than a predefined value.
Redeeming Rate: This is the ratio of visit days on which points were redeemed to visit days on which points were awarded. A customer is marked as fraud when this ratio is more than a predefined value.
Total Points Redeemed: If the total points redeemed by the customer is more than a critical value, the customer is more prone to fraud.
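
Several of the KPIs above can be sketched in plain Python. This is a minimal illustration and not the notebook's code. The record layout, the function names customer_kpis and redemption_kpis, and the as_of reference date used for Vintage are all assumptions. The first purchase date also stands in for the joined date, which may differ in real data.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical transaction records: (customer_id, timestamp, zone, amount).
transactions = [
    ("C1", datetime(2023, 1, 1, 10), "North", 500.0),
    ("C1", datetime(2023, 1, 1, 16), "South", 450.0),
    ("C1", datetime(2023, 1, 5, 11), "North", 900.0),
    ("C2", datetime(2023, 1, 2, 9), "East", 300.0),
    ("C2", datetime(2023, 1, 20, 9), "East", 320.0),
]


def customer_kpis(rows, as_of):
    """Return per-customer transaction KPIs as a dict keyed by customer id."""
    by_customer = defaultdict(list)
    for cust, ts, zone, amount in rows:
        by_customer[cust].append((ts, zone, amount))

    kpis = {}
    for cust, txns in by_customer.items():
        days = sorted({ts.date() for ts, _, _ in txns})
        visits = len(days)                    # unique purchase days
        vintage = (as_of - days[0]).days      # days since first transaction
        zones_per_day = defaultdict(set)
        hours_per_day = defaultdict(set)
        for ts, zone, _ in txns:
            zones_per_day[ts.date()].add(zone)
            hours_per_day[ts.date()].add(ts.hour)
        kpis[cust] = {
            "vintage_per_visit": vintage / visits,
            # Assumption: first purchase day is used as the joined date.
            "latency": (days[-1] - days[0]).days / (visits - 1) if visits > 1 else None,
            "max_txn_hours_in_day": max(len(h) for h in hours_per_day.values()),
            "multi_zone_day": any(len(z) > 1 for z in zones_per_day.values()),
        }
    return kpis


def redemption_kpis(redeem_days, award_days):
    """Redemption KPIs for one customer from the sets of redeem and award days."""
    n_redeem = len(redeem_days)
    latency = None
    if n_redeem > 1:
        latency = (max(redeem_days) - min(redeem_days)).days / (n_redeem - 1)
    rate = n_redeem / len(award_days) if award_days else None
    return {"redeem_visits": n_redeem, "redemption_latency": latency, "redeeming_rate": rate}


kpis = customer_kpis(transactions, as_of=date(2023, 1, 31))
```

Each KPI would then be compared with its predefined threshold to decide whether the customer gets the fraud status. Customers with a single visit or a single redemption day have no latency, so the sketch returns None for them rather than dividing by zero.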

Input parameters

Input Param: Description
Org_id: The ID of the brand.
Start Date and End Date: The notebook's running duration. Format: YYYY-MM-DD
Bottom and Top Quantiles: Enter 3 different values for the bottom and top quantiles (Bottom_Quantile_1 and Top_Quantile_1, Bottom_Quantile_2 and Top_Quantile_2, Bottom_Quantile_3 and Top_Quantile_3).

Command 7: Update the single view with the org-specific store filters.


  • The KPI values are determined at the customer level by creating a Fraud Single View.
  • Box Plots are used to get an idea about the outlier customers.
  • After determining the customer-level data we identify 3 sets of top and bottom percentile values for the respective KPIs. In the notebook, the following quantiles are used.
    Option   Bottom Quantile   Top Quantile
    Set 1    0.01              0.99
    Set 2    0.05              0.995
    Set 3    0.1               0.999
  • Based on these quantile values, we calculate the customer count for each KPI. Comparing the 3 sets, we decide on the right cutoff. For example, if a bill amount is greater than 30000, the customer is marked as fraud.
    KPI           Cutoff   Customer Count
    Bill Amount   30000    5500


  • There are 3 sets of outputs of top and bottom quantiles.
  • The output is the quantile values and the number of customers above the top quantile value and the number of customers below the bottom quantile value.
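
The quantile step can be illustrated with a small standard-library sketch. It is not the notebook's implementation. The function names quantile and cutoff_report, the linear-interpolation quantile method, and the sample data are assumptions, since the notebook's quantile method is not stated here. The three quantile sets are the ones listed above.

```python
import math

# The three (bottom, top) quantile sets used in the notebook.
QUANTILE_SETS = {
    "Set 1": (0.01, 0.99),
    "Set 2": (0.05, 0.995),
    "Set 3": (0.1, 0.999),
}


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an ascending list, with 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def cutoff_report(kpi_values):
    """For each quantile set, return the cutoffs and the customer counts beyond them."""
    values = sorted(kpi_values)
    report = {}
    for name, (bottom_q, top_q) in QUANTILE_SETS.items():
        bottom = quantile(values, bottom_q)
        top = quantile(values, top_q)
        report[name] = {
            "bottom_cutoff": bottom,
            "top_cutoff": top,
            "customers_below_bottom": sum(v < bottom for v in values),
            "customers_above_top": sum(v > top for v in values),
        }
    return report


# Hypothetical KPI values for 1,000 customers: 0, 1, ..., 999.
report = cutoff_report(range(1000))
for name, row in report.items():
    print(name, row)
```

Comparing the customer counts across the three sets shows how many customers each candidate cutoff would flag, which is the information used to pick the final threshold for a KPI.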

Notebook Links