The What, Why, Where, When, Who and How of Association Rule Mining

What is it?

Association rule mining is an approach to discovering patterns of co-occurrence in a (large) dataset, by identifying entities that frequently appear together in a group. This can be done in the context of, for example, products bought together in each transaction or characteristics possessed by each individual.

Patterns of this type can be summarized by association rules, which predict the occurrence of one or more entities based on the occurrence of other entities in a certain grouping, such as a transaction or an individual.

For example, an association rule found in a grocery retailer database may be:

{toilet paper, cheese, milk} → {shampoo, apples, lettuce}

The rule can be interpreted in terms of conditional probability: a customer who buys toilet paper, cheese and milk is likely to also buy shampoo, apples and lettuce in the same purchase.
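To make the conditional-probability reading concrete, here is a minimal Python sketch that estimates it from data. The transactions are a made-up toy dataset, and the helper names `support` and `confidence` are illustrative choices rather than part of any particular library. "Confidence" is the usual name for the estimated probability of the right-hand side given the left-hand side.

```python
# Hypothetical toy data: each transaction is the set of items in one purchase.
transactions = [
    {"toilet paper", "cheese", "milk", "shampoo", "apples", "lettuce"},
    {"toilet paper", "cheese", "milk", "shampoo", "apples", "lettuce"},
    {"toilet paper", "cheese", "milk", "bread"},
    {"milk", "apples"},
    {"shampoo", "lettuce"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimate P(rhs | lhs) as support(lhs and rhs together) / support(lhs)."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        raise ValueError("left-hand side never occurs in the data")
    return support(set(lhs) | set(rhs), transactions) / lhs_support


lhs = {"toilet paper", "cheese", "milk"}
rhs = {"shampoo", "apples", "lettuce"}

rule_support = support(lhs | rhs, transactions)
rule_confidence = confidence(lhs, rhs, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this toy data, three purchases contain the left-hand side and two of those also contain the right-hand side, so the confidence is two thirds.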

It is important to note that an association rule does not imply a causal relationship between the itemsets on the left-hand and right-hand sides of the rule.

Why do it?

As an unsupervised learning technique, association rule mining can be used to identify novel patterns/relationships amongst entities in a large set of data.

When to use it?

  • Market basket analysis: Which items are frequently purchased together?

  • Churn analysis: What are the characteristics and behaviours of customers who are likely/unlikely to switch to other companies?

  • Selective marketing: Which customer groups are likely to purchase a new service or product?

  • Stock market analysis: What relationships exist between individual stocks, and between stocks and economic factors?

  • Medical diagnosis: What relationships exist between symptoms, test results and illness?

Where to learn about it?

Association Rule Mining with R [R and Data Mining]

How to do it?

Overall workflow of association rule mining:

  1. Identify all frequent itemsets
  2. Create association rules by binary partitions of each frequent itemset
  3. Filter the association rules by redundancy, statistical significance, and various measures of interestingness
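
The three steps above can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not a production implementation: the transactions and the thresholds `MIN_SUPPORT` and `MIN_CONFIDENCE` are hypothetical, step 1 enumerates every candidate itemset instead of using a pruning algorithm such as Apriori, and step 3 filters only on confidence and lift (redundancy and statistical significance checks are left out).

```python
from itertools import combinations

# Hypothetical toy data and thresholds, chosen only for illustration.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"milk", "apples"},
    {"milk", "bread"},
]
MIN_SUPPORT = 0.3
MIN_CONFIDENCE = 0.9


def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


# Step 1: identify all frequent itemsets by exhaustive enumeration.
items = sorted(set().union(*transactions))
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        itemset = frozenset(combo)
        s = support(itemset)
        if s >= MIN_SUPPORT:
            frequent[itemset] = s

# Step 2: create one candidate rule per binary partition of each frequent
# itemset into a non-empty left-hand side and a non-empty right-hand side.
rules = []
for itemset, s in frequent.items():
    for k in range(1, len(itemset)):
        for lhs_items in combinations(sorted(itemset), k):
            lhs = frozenset(lhs_items)
            rhs = itemset - lhs
            # Every subset of a frequent itemset is itself frequent,
            # so both lookups below are guaranteed to succeed.
            conf = s / frequent[lhs]
            lift = conf / frequent[rhs]
            rules.append({"lhs": lhs, "rhs": rhs, "support": s,
                          "confidence": conf, "lift": lift})

# Step 3: keep rules that are confident and positively associated (lift > 1).
kept = [r for r in rules
        if r["confidence"] >= MIN_CONFIDENCE and r["lift"] > 1]

for r in kept:
    print(sorted(r["lhs"]), "->", sorted(r["rhs"]),
          f"conf={r['confidence']:.2f} lift={r['lift']:.2f}")
```

Exhaustive enumeration grows exponentially with the number of distinct items, so it is only practical for tiny examples like this one.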