Depending on where the client-side source data is uploaded, there are two templates that you can use for data reconciliation - S3 or SFTP.

The steps to set up the data flows remain the same otherwise.

Connect-to-Source

Provide details on where the source data files (CSV format) are available (S3 or SFTP server). To identify the missing events, the source data file is compared with data in the Insights+ one by one.

For example, if the source is SFTP, you need to specify the location in SFTP where the source CSV files will be uploaded. This is same as what is seen in other templates.

Map-fields-for-reconciliation

Here, you need to specify the type of the source file and the matching criteria. Currently, the following are supported.

  • Regular transactions 
  • Regular transactions: Line Items
  • Return transactions
  • Return transactions: Line Items
  • Customers
  • Product

To better your understanding, consider a scenario where a client has mentioned that Capillary is missing ingestion of 10% of transactions which is leading to major customer issues on the loyalty program front. 

Let us also assume that on a weekly basis the client will share source data, which has both transaction and line-level details in one CSV file to a SFTP location and wants Capillary to ingest the missing transactions into CDP to trigger downstream loyalty events. The client needs the data to be ingested even on a weekly basis without fail.

Here are the steps:

  1. In the Connect-to-Source block, set up the location where the files would come from.
  2. Select the type of source data. Here, the file type is Regular Transactions.
Please note that we are looking to ingest missing transactions and corresponding line items of those transactions. Hence, it is appropriate to select Regular Transactions.


  1. In the Regular Transactions tab, map the source data file with the corresponding fields on Capillary CDP for comparison. 
  2. To identify transactions that are missing in the Capillary CDP  and mark them in the source file, enter the correct way to identify the bill. For example, the CDP field mapping options are:
  • Bill number
  • Bill date
  • Store Code
  • External ID
  • Mobile
  • Email

    Consider the following source file:


Here, you can see a single file with both transaction and line-level data. The the bill_number field is repeated in two different stores on two different days. Now, if we enter bill_number alone as the condition to identify whether a transaction in the source file exists in the Capillary CDP, it will be a problem. In this case, Connect+ will check for bill_number 29283288382 in the backend and see if the corresponding transaction exists. Since we can see that bill_number is not unique, another transaction with number 29283288382 might already exist in the backend. If it does exist, the CDP will mark this transaction as available already with 1 in the reconciled file as shown below (see last column):

To avoid this situation, you need to use multiple fields to uniquely identify the transaction. For example, you can use a combination of store_code and bill_number fields to uniquely identify a bill. In such a case, we have to map the source file headers store_code and bill_number to Store Code and Bill Number CDP fields in Connect+ Map-fields-for-reconciliation block. We can also use a combination of store_code, bill_number and bill_date if the previous combination does not suffice.

  • You can also see a Current Year option here. If you leave the value as is, it will compare source data with backend data up to current year and previous year. If you change it to say 2016, it will compare source data with backend data from 2016 up to current year. This option is available to speed up the comparison process.
  • Once the fields that need to be compared are mapped against the CDP fields in the, we need to specify the SFTP path where we want the reconciled file to be uploaded as well. This inherits the same credentials from Connect-to-source SFTP. 
  • There is also a date/time formatter if the bill date field is included for comparison.

Reconciliation-job

Retries refers to the number of times Connect+ should try a comparison, in case of any failures. Usually, you can leave this to the default value.

Trigger

Specify the frequency at which the data flow needs to be executed

After this is done, you can start the data flow. he dataflow, once started, will run at specified frequency, compare source data with data in Insights+ backend and then upload the source data with an additional field where we will find 1 or 0. 1 implies that the transaction is available in the backend. 0 implies that it is a missing transaction. The same holds true if we are comparing returns or customers.

Note: As specified throughout, Connect+ compares source data with backend (also referred to as CDP data here) of Insights+. Insights+ backend has data from CDP but in a normalized form for easy reporting/analysis. We are using Insights+ backend for reconciliation as it will be faster to compare source data with it as opposed to direct CDP data. However, the catch is that Insights+ backend data is always lagging behind by a day. ETL runs only once every night (according to org time zone). So today, we will be able to see data only up to yesterday assuming the ETL run for the previous day is complete. Sometimes, ETL runs get delayed or don’t even complete if there are issues. We will have to be mindful of this while setting up the reconciliation data flow. We are working on an enhancement to deal with this potential issue but it will come later in the year.