Association Mining: What products to recommend to your customers based on historic buying patterns?

"Association Rule Mining, a.k.a. Market Basket Analysis, extracts underlying patterns and relationships that are otherwise not apparent. The co-occurrences of data items can reveal inherent dependencies and establish rules of a specific strength, often useful as a recommendation mechanism. Here is how you can quickly implement this."

Measures of Association Rules

The following measures are used to evaluate the strength of an association. Suppose you are interested in the association between two events A and B:

  • Support = Number of rows having both A AND B / Total number of rows
  • Confidence = Number of rows having both A AND B / Number of rows with A
  • Expected Confidence = Number of rows with B / Total number of rows
  • Lift = Confidence / Expected Confidence

Lift is the factor by which the co-occurrence of A and B exceeds the probability expected if A and B were unrelated. In other words, the higher the lift (above 1), the higher the chance that B occurs together with A; a lift of 1 means A and B are independent.
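
To make the four measures concrete, here is a small worked example in base R. The five baskets are made-up toy data (an assumption for illustration only), so treat this as a sketch of the arithmetic rather than output from a real dataset.

```r
# Hypothetical toy data: 5 baskets, TRUE means the item is in the basket
baskets <- data.frame(
  A = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  B = c(TRUE, TRUE, FALSE, TRUE, FALSE)
)

n    <- nrow(baskets)              # total number of rows
both <- sum(baskets$A & baskets$B) # rows having both A AND B

support             <- both / n               # 2 / 5 = 0.4
confidence          <- both / sum(baskets$A)  # 2 / 3
expected_confidence <- sum(baskets$B) / n     # 3 / 5 = 0.6
lift                <- confidence / expected_confidence # about 1.11

print(c(support = support, confidence = confidence,
        expected_confidence = expected_confidence, lift = lift))
```

A lift slightly above 1 here says that B turns up a little more often in baskets containing A than it does overall.
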
# Load the libraries
library (arules)
library (arulesViz)
library (datasets)
data (Groceries) # Load the data set

The ‘Groceries’ dataset is already of class ‘transactions’. Since the ‘arules’ package is designed to work with the ‘transactions’ class, you should convert your own data frame to this class before mining it (its columns are expected to be factors or logicals). Here is how you can convert it.
transDat <- as (myDataFrame, "transactions") # convert to 'transactions' class
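
As a minimal sketch of the conversion, the example below builds a tiny data frame and a tiny list of baskets and coerces both. The data and the names `myDataFrame`, `basketList` and `transList` are hypothetical; the data frame columns are factors, which is what the coercion expects.

```r
library(arules)

# Hypothetical data frame: one row per transaction, factor columns only
myDataFrame <- data.frame(
  fruit = factor(c("apple", "banana", "apple")),
  drink = factor(c("milk", "milk", "juice"))
)
transDat <- as(myDataFrame, "transactions") # items are named like "fruit=apple"
inspect(transDat)

# Hypothetical basket data: one character vector of items per transaction
basketList <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("beer")
)
transList <- as(basketList, "transactions")
size(transList) # number of items in each basket
```

The list form is usually the more natural fit for real basket data, where each transaction holds a different number of items.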

Some Groundwork: Methods of the ‘transactions’ Class

inspect (transDat) # view the observations
length (transDat) # get number of observations
size (transDat) # number of items in each observation
LIST(transDat) # convert 'transactions' to a list, note the LIST in CAPS

Let's Apply the Apriori Algorithm

For illustrative purposes, let's continue to work with the ‘Groceries’ dataset from the ‘arules’ package.
frequentItems <- eclat (Groceries, parameter = list(supp = 0.07, maxlen = 15)) # calculates support for frequent items
itemFrequencyPlot (Groceries,topN=10,type="absolute") # plot frequent items
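
To look at the frequent itemsets as numbers rather than a plot, something along these lines should work. It is a sketch that reuses the same eclat() call; `milkSupport` is just an illustrative variable name.

```r
library(arules)
data(Groceries)

# Same call as above: itemsets that appear in at least 7% of transactions
frequentItems <- eclat(Groceries, parameter = list(supp = 0.07, maxlen = 15))

# List the five itemsets with the highest support
inspect(head(sort(frequentItems, by = "support"), 5))

# Relative frequency (support) of a single item
milkSupport <- itemFrequency(Groceries)[["whole milk"]]
milkSupport
```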

Choosing a low support threshold together with a high confidence threshold helps to extract strong relationships even among items that co-occur relatively rarely in the data.

rules <- apriori (Groceries, parameter = list(supp = 0.001, conf = 0.5)) # Min support as 0.001, min confidence as 0.5.
quality(rules) # show the support, lift and confidence for all rules

rules <- sort (rules, by="confidence", decreasing=TRUE) # sort first so that 'high-confidence' rules come on top
# Show the top 5 rules, but only 2 digits
options (digits=2)
inspect (rules[1:5])

How To Control The Number Of Rules in Output?

Adjust the ‘maxlen’ and ‘conf’ arguments in the apriori() call to control the number of rules generated. Use your best judgement here.
rules <- apriori (Groceries, parameter = list (supp = 0.001, conf = 0.5, maxlen=3)) # maxlen = 3 limits the elements in a rule to 3

  • To get ‘stronger’ rules, increase the value of the ‘conf’ parameter.
  • To get ‘longer’ rules, increase ‘maxlen’.
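
To see the effect of ‘conf’ on the size of the output, you can count the rules at a few thresholds. This is only a sketch: the three confidence levels and the names `confLevels` and `ruleCounts` are arbitrary choices for illustration.

```r
library(arules)
data(Groceries)

confLevels <- c(0.5, 0.7, 0.9) # arbitrary thresholds to compare

# Count the rules generated at each confidence threshold
ruleCounts <- sapply(confLevels, function(conf) {
  r <- apriori(Groceries,
               parameter = list(supp = 0.001, conf = conf, maxlen = 3),
               control = list(verbose = FALSE))
  length(r)
})

data.frame(conf = confLevels, rules = ruleCounts)
```

Raising the threshold can only keep or shrink the rule set, so the counts should never increase down the table.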

How To Remove Redundant Rules?

Use the code below to find and filter out the redundant rules. Here a rule counts as redundant when its items are a superset of the items of another rule.

redundant <- colSums (is.subset (rules, rules)) > 1 # TRUE for rules that are supersets of another rule
rules <- rules[!redundant] # remove redundant rules; logical indexing also works when nothing is redundant
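
As an alternative sketch, ‘arules’ also provides is.redundant(). Note that it uses a different definition from the code above: a rule is flagged when a more general rule (same right-hand side, fewer items on the left) has the same or a higher confidence. The variable names `keep` and `pruned` are illustrative.

```r
library(arules)
data(Groceries)

rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.5, maxlen = 3),
                 control = list(verbose = FALSE))

keep   <- !is.redundant(rules) # logical vector, one entry per rule
pruned <- rules[keep]

c(before = length(rules), after = length(pruned))
```

Because the two definitions differ, the two approaches will generally not remove the same rules.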


How to Find Rules Related To Given Item/s?

This method is the core of ‘market basket analysis’ and is useful for recommending new items to your users. It is achieved by setting the ‘appearance’ parameter in the apriori() function. For example:

Find what factors influenced an event ‘X’

To find out what customers bought together with ‘whole milk’, restrict the right-hand side (RHS) of the rules to ‘whole milk’. This will help you understand the patterns that led to the purchase of ‘whole milk’.

rules <- apriori (data=Groceries, parameter=list (supp=0.001,conf = 0.08), appearance = list (default="lhs",rhs="whole milk"), control = list (verbose=FALSE)) # get rules that lead to buying 'whole milk'

Find out what events were influenced by a given event

In this case, the question is: what else did customers who bought ‘whole milk’ buy? In the rule, ‘whole milk’ is on the LHS (left-hand side).
rules <- apriori (data=Groceries, parameter=list (supp=0.001,conf = 0.15,minlen=2), appearance = list (default="rhs",lhs="whole milk"), control = list (verbose=FALSE)) # those who bought 'whole milk' also bought..

Remove redundancies

Sort the rules, filter out the redundant ones and show the top 7 rules.

rules <- sort (rules, decreasing=TRUE,by="confidence")
redundant <- colSums (is.subset (rules, rules)) > 1 # TRUE for rules that are supersets of another rule
rules <- rules[!redundant] # remove redundant rules
inspect (head (rules, 7)) # head() is safe even if fewer than 7 rules remain
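
To turn such rules into a simple recommendation list, one option is to pull the right-hand-side labels and quality measures into a data frame. This is a sketch, not the only way to do it; the data frame `recs` and its column names are hypothetical.

```r
library(arules)
data(Groceries)

# Rules of the form {whole milk} => {X}
rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.15, minlen = 2),
                 appearance = list(default = "rhs", lhs = "whole milk"),
                 control = list(verbose = FALSE))
rules <- sort(rules, by = "lift", decreasing = TRUE)

# One row per rule: the item to recommend plus its confidence and lift
recs <- data.frame(
  recommend  = labels(rhs(rules)),
  confidence = quality(rules)$confidence,
  lift       = quality(rules)$lift
)
head(recs, 5)
```

Sorting by lift rather than confidence favours items that are bought with ‘whole milk’ more often than their overall popularity would suggest.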

Making Rules For Continuous Data

If you try to make rules on continuous variables, each value will be treated as a distinct item, causing an undesirable explosion of rules. So, convert the continuous variables to factors, which is easily done with the discretize() function.
discretize (x, method="cluster", breaks=3) # method can be "interval" (equal width), "frequency", "cluster" or "fixed"; older arules versions named this argument 'categories'
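
Here is a small sketch of discretize() on simulated data. The vector `x` is made up (three well-separated groups of random values), and the ‘breaks’ argument name assumes a reasonably recent version of ‘arules’.

```r
library(arules)

set.seed(42)
# Hypothetical continuous variable made of three well-separated groups
x <- c(rnorm(50, mean = 10), rnorm(50, mean = 50), rnorm(50, mean = 100))

# Cut x into 3 bins using k-means clustering
bins <- discretize(x, method = "cluster", breaks = 3)

table(bins) # number of values per interval
```

The result is a factor, so it can sit in a data frame column that is later coerced to ‘transactions’.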

Visualizing The Rules

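# Interactive Plot
# Note: recent arulesViz versions deprecate interactive=TRUE in favour of the 'engine' argument (e.g. engine="htmlwidget")
plot (head (rules, 25),method="graph",interactive=TRUE,shading="confidence") # feel free to expand and move around the objects in this plot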
plot (rules, measure=c("support", "lift"), shading="confidence")

[Figures: interactive graph plot of the association rules; scatter plot of support vs. lift]

More Useful Functions

affinity(transDat) # Calculates affinity - the 'nxn' Jaccard Index affinity matrix
transDat_c <- addComplement(transDat, "Item 1") # Adds the complement item "!Item 1" to every transaction that does not contain "Item 1"
duplicated(rules) # find out if any rule is duplicated

  • Edward Tamil

    How can I export more than 2000 rules to Excel? write.csv and write.table are not working above 2000 rules.