"Association rule mining, a.k.a. market basket analysis, extracts underlying patterns and relationships that are otherwise not apparent. The co-occurrences of data items can reveal inherent dependencies and establish rules of a specific strength, which are often useful as a recommendation mechanism. Here is how you can quickly implement it."

### Measures of Association Rules

The following measures are used to evaluate the strength of an association. Suppose you are interested in the association between two events A and B:

*Support = Number of rows having both A AND B / Total number of rows*

*Confidence = Number of rows having both A AND B / Number of rows with A*

*Expected Confidence = Number of rows with B / Total number of rows*

*Lift = Confidence / Expected Confidence*

Lift is the factor by which the co-occurrence of A and B exceeds the probability you would expect if A and B were unrelated. In other words, the higher the lift (> 1), the higher the chance that B co-occurs with A.
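
To make the four measures concrete, here is a minimal base-R sketch that computes them by hand. The `baskets` data frame and its values are made up for illustration; they are not taken from any dataset used in this post.

```r
# Toy data: each row is a transaction; TRUE means the item was bought.
# Item names and values are hypothetical.
baskets <- data.frame(
  A = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE),
  B = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

n_rows  <- nrow(baskets)                 # total number of rows
n_A     <- sum(baskets$A)                # rows with A
n_B     <- sum(baskets$B)                # rows with B
n_AandB <- sum(baskets$A & baskets$B)    # rows with both A AND B

support             <- n_AandB / n_rows
confidence          <- n_AandB / n_A
expected_confidence <- n_B / n_rows
lift                <- confidence / expected_confidence

round(c(support = support, confidence = confidence,
        expected_confidence = expected_confidence, lift = lift), 3)
```

Here 3 of the 6 rows contain both items, so support is 0.5, confidence is 3/4 = 0.75, and lift is 0.75 / (4/6) = 1.125, which is slightly above 1.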

`# Load the libraries`

`library(arules)`

`library(arulesViz)`

`library(datasets)`

`data(Groceries) # Load the data set`

The ‘Groceries’ dataset is already an object of class **‘transactions’**. Since the ‘arules’ package is designed to work with the ‘transactions’ class, it is best to convert your own data frame to this class. Here is how you can convert it.

`transDat <- as(myDataFrame, "transactions") # convert to 'transactions' class`
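
As a rough sketch of what the conversion looks like on real objects, the snippet below coerces both a list of baskets and a small data frame. `basketList`, the contents of `myDataFrame`, and the item names are hypothetical. It assumes the data frame columns are factors, which is the case the coercion handles most directly.

```r
library(arules)

# Hypothetical baskets: one character vector of items per transaction
basketList <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("beer")
)
transFromList <- as(basketList, "transactions")

# Hypothetical data frame: every column is a factor
myDataFrame <- data.frame(
  size  = factor(c("small", "large", "small")),
  color = factor(c("red", "red", "blue"))
)
transFromDf <- as(myDataFrame, "transactions")

length(transFromList)    # number of transactions
size(transFromList)      # number of items in each transaction
itemLabels(transFromDf)  # data frame items are encoded as "variable=value"
```

Note that a data frame produces items such as `size=small`, so every row becomes one transaction with one item per column.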

### Some Groundwork: Methods of ‘Transactions’ class dataset

`inspect(transDat) # view the observations`

`length(transDat) # get number of observations`

`size(transDat) # number of items in each observation`

`LIST(transDat) # convert 'transactions' to a list, note the LIST in CAPS`

### Let’s Apply the Apriori Algorithm

For illustration, let’s continue to work with the ‘Groceries’ dataset from the ‘arules’ package.

`frequentItems <- eclat(Groceries, parameter = list(supp = 0.07, maxlen = 15)) # finds frequent itemsets and calculates their support`

`itemFrequencyPlot(Groceries, topN = 10, type = "absolute") # plot the 10 most frequent items`

Combining a low minimum support with a high minimum confidence helps to extract strong relationships even among items that co-occur in relatively few transactions.

`rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.5)) # min support of 0.001, min confidence of 0.5`

`quality(rules) # show the support, lift and confidence for all rules`

`# Show the top 5 rules, but only 2 digits`

`options(digits = 2)`

`inspect(rules[1:5])`

`rules <- sort(rules, by = "confidence", decreasing = TRUE) # 'high-confidence' rules.`
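
Once the rules are mined, you can also narrow them down by their quality measures. The sketch below uses `subset()` from ‘arules’ for this; the thresholds (confidence above 0.8, lift above 3) are arbitrary values picked for illustration, and `strongRules` is a name introduced here.

```r
library(arules)
data(Groceries)

rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.5),
                 control = list(verbose = FALSE))

# Keep only rules that clear both thresholds (values chosen arbitrarily)
strongRules <- subset(rules, subset = confidence > 0.8 & lift > 3)

# Show up to five of them, highest lift first
inspect(head(sort(strongRules, by = "lift"), 5))
```

Filtering after mining like this lets you try several thresholds without re-running `apriori()` each time.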

### How To Control The Number Of Rules in Output?

Adjust the **maxlen** and **conf** arguments in the apriori statement to control the number of rules generated. Use your best judgement here.

`rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.5, maxlen = 3)) # maxlen = 3 limits the elements in a rule to 3`

- To get ‘**strong**’ rules, increase the value of the **‘conf’** parameter.
- To get ‘**longer**’ rules, increase **‘maxlen’**.
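
To see how these two arguments change the size of the output, you can count the rules for a few settings. This is only a sketch: `countRules()` is a hypothetical helper written for this example, not part of ‘arules’, and the parameter values are arbitrary.

```r
library(arules)
data(Groceries)

# Hypothetical helper: mine rules with the given settings and return the count
countRules <- function(conf, maxlen) {
  rules <- apriori(Groceries,
                   parameter = list(supp = 0.001, conf = conf, maxlen = maxlen),
                   control = list(verbose = FALSE))
  length(rules)
}

nLoose  <- countRules(conf = 0.5, maxlen = 3)  # baseline
nStrict <- countRules(conf = 0.8, maxlen = 3)  # higher conf: fewer, stronger rules
nLonger <- countRules(conf = 0.5, maxlen = 4)  # higher maxlen: longer rules allowed

c(loose = nLoose, strict = nStrict, longer = nLonger)
```

Raising `conf` can only shrink the rule set, while raising `maxlen` can only grow it, so comparing the three counts gives a quick feel for how aggressive each setting is.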

### How To Remove Redundant Rules?

Use the code below to find and filter out the redundant rules.

`redundant <- which(colSums(is.subset(rules, rules)) > 1) # get redundant rules in vector`

`if (length(redundant) > 0) rules <- rules[-redundant] # remove redundant rules; the check avoids dropping every rule when none are redundant`
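
As an alternative sketch, ‘arules’ also ships an `is.redundant()` function. It uses its own definition of redundancy (a rule is redundant if a more general rule with the same right-hand side does at least as well), so it will not necessarily flag the same rules as the snippet above. `prunedRules` is a name introduced here for illustration.

```r
library(arules)
data(Groceries)

rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.5),
                 control = list(verbose = FALSE))

isRedundant <- is.redundant(rules)   # logical vector, one entry per rule
prunedRules <- rules[!isRedundant]   # logical indexing is safe even if nothing is redundant

c(before = length(rules), after = length(prunedRules))
```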

### How to Find Rules Related To Given Item/s?

This method is the core of ‘market basket analysis’ and is useful for recommending new items to your users. It is done by setting the *‘appearance’* parameter of the *apriori()* function. For example,

#### Find what factors influenced an event ‘X’

To find out what customers purchased along with ‘whole milk’, put ‘whole milk’ on the RHS (right-hand side) of the rule. This will help you understand the patterns that led to the purchase of ‘whole milk’.

`rules <- apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.08), appearance = list(default = "lhs", rhs = "whole milk"), control = list(verbose = FALSE)) # get rules that lead to buying 'whole milk'`

#### Find out what events were influenced by a given event

In this case, you are looking for rules of the form: customers who bought ‘whole milk’ also bought some other item. Here, ‘whole milk’ is on the LHS (left-hand side) of the rule.

`rules <- apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.15, minlen = 2), appearance = list(default = "rhs", lhs = "whole milk"), control = list(verbose = FALSE)) # those who bought 'milk' also bought..`
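
To turn such rules into an actual list of suggestions, you can pull the right-hand sides out of the strongest rules. The sketch below ranks by lift and keeps at most five rules; both choices are assumptions made for this example, and `milkRules`, `topRules` and `recommended` are names introduced here.

```r
library(arules)
data(Groceries)

milkRules <- apriori(Groceries,
                     parameter = list(supp = 0.001, conf = 0.15, minlen = 2),
                     appearance = list(default = "rhs", lhs = "whole milk"),
                     control = list(verbose = FALSE))

# Keep up to five rules with the highest lift
topRules <- head(sort(milkRules, by = "lift"), 5)

# The right-hand sides are the items to recommend to 'whole milk' buyers
recommended <- labels(rhs(topRules))
recommended
```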

### Remove redundancies

Sort the rules, filter the redundant ones and show the Top 7 Rules.

`rules <- sort(rules, decreasing = TRUE, by = "confidence")`

`redundant <- which(colSums(is.subset(rules, rules)) > 1) # get redundant rules in vector`

`if (length(redundant) > 0) rules <- rules[-redundant] # remove redundant rules`

`inspect(head(rules, 7)) # show up to 7 rules; head() avoids an error when fewer remain`

### Making Rules For Continuous Data

If you try to make rules on continuous variables, each value will be treated as a distinct item, causing an undesirable explosion of rules. So, convert the continuous variables to factors first, which can be easily done using the **discretize()** function.

`discretize(x, method = "cluster", breaks = 3) # method can be "interval", "frequency", "cluster" or "fixed"; older arules versions name the 'breaks' argument 'categories'`
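
To illustrate what equal-width discretization does to a continuous variable, here is a small sketch that uses base R’s `cut()` as a stand-in for the "interval" method. The `age` values and the bin labels are hypothetical.

```r
# Hypothetical continuous variable (ages of six customers)
age <- c(18, 22, 35, 41, 58, 63)

# Base-R stand-in for equal-width binning into 3 intervals
ageGroup <- cut(age, breaks = 3, labels = c("young", "middle", "senior"))

# Each customer now carries one of three items instead of a unique number
table(ageGroup)
```

With these values the six distinct ages collapse into three bins of two customers each, so a rule miner sees three possible items rather than six.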

### Visualizing The Rules

`# Interactive Plot`

`plot(rules[1:25], method = "graph", interactive = TRUE, shading = "confidence") # feel free to expand and move around the objects in this plot`

`plot(rules, measure = c("support", "lift"), shading = "confidence")`

### More Useful Functions

`affinity(transDat) # calculates affinity - the 'nxn' Jaccard Index affinity matrix between items`

`transDat_c <- addComplement(transDat, "Item 1") # adds a complement item for "Item 1" to every transaction that does not contain "Item 1"`

`duplicated(rules) # find out if any rule is duplicated`