6.3.4.3. Association Rule Mining#
Association Rule Mining is the task of discovering interesting association rules from a transactional dataset.
But what is an Association Rule?
An association rule is an implication of the form: X → Y, where X and Y are itemsets (sets of items), and X ∩ Y = ∅.
For example: \(\{\text{Milk, Diaper}\} \Rightarrow \{\text{Beer}\}\)
is an association rule, which suggests that customers who buy milk and diapers are also likely to buy beer.
The primary objective of association rule mining is to uncover such patterns, helping us understand the relationships and co-occurrence of items in transactions. These rules are especially useful in applications like market basket analysis, recommendation systems, and inventory management.
Rule Evaluation Metrics#
When generating association rules of the form \(X \Rightarrow Y\), we use certain metrics to evaluate their usefulness, relevance, and strength.
1. Support#
Support measures how frequently the rule occurs in the dataset:

\[ \text{Support}(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{N} \]

Where:
\(\sigma(X \cup Y)\) = count of transactions containing both \(X\) and \(Y\)
\(N\) = total number of transactions
2. Confidence#
Confidence measures the likelihood of \(Y\) appearing in a transaction that contains \(X\). It is a conditional probability:

\[ \text{Confidence}(X \Rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)} \]
3. Lift#
Lift tells us how much more likely \(Y\) is to appear with \(X\) than by random chance:

\[ \text{Lift}(X \Rightarrow Y) = \frac{\text{Confidence}(X \Rightarrow Y)}{\text{Support}(Y)} \]
A lift value:

- \(= 1\) ⇒ \(X\) and \(Y\) are independent
- \(> 1\) ⇒ \(X\) and \(Y\) are positively correlated
- \(< 1\) ⇒ \(X\) and \(Y\) are negatively correlated
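These three metrics translate directly into code. The snippet below is a minimal sketch, assuming each transaction is represented as a Python set of items; the function names are illustrative, not a particular library's API:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    count = sum(1 for t in transactions if itemset <= t)
    return count / len(transactions)

def confidence(transactions, X, Y):
    """Conditional probability of Y given X: support(X ∪ Y) / support(X)."""
    return support(transactions, X | Y) / support(transactions, X)

def lift(transactions, X, Y):
    """How much more often X and Y co-occur than if they were independent."""
    return confidence(transactions, X, Y) / support(transactions, Y)
```

Here `X | Y` is Python set union, matching \(X \cup Y\) in the formulas, and `itemset <= t` tests the subset relation.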
Rule Evaluation Example#
| TID | Items |
|---|---|
| 1 | Bread, Milk |
| 2 | Bread, Diaper, Beer, Eggs |
| 3 | Milk, Diaper, Beer, Coke |
| 4 | Bread, Milk, Diaper, Beer |
| 5 | Bread, Milk, Diaper, Coke |
Consider the rule: {Milk, Diaper} ⇒ {Beer}
From the dataset:
\(\sigma(\text{Milk, Diaper}) = 3\) (T3, T4, T5)
\(\sigma(\text{Milk, Diaper, Beer}) = 2\) (T3, T4)
\(\sigma(\text{Beer}) = 3\) (T2, T3, T4)
Total Transactions \(= 5\)
Now compute:

Support:

\[ s = \frac{\sigma(\text{Milk, Diaper, Beer})}{N} = \frac{2}{5} = 0.4 \]

Confidence:

\[ c = \frac{\sigma(\text{Milk, Diaper, Beer})}{\sigma(\text{Milk, Diaper})} = \frac{2}{3} \approx 0.67 \]

Lift:

\[ \text{Lift} = \frac{c}{\text{Support}(\text{Beer})} = \frac{2/3}{3/5} = \frac{10}{9} \approx 1.11 \]
Interpretation:
Support = 0.4 ⇒ 40% of all transactions include Milk, Diaper, and Beer.
Confidence = 0.67 ⇒ In 67% of cases where Milk and Diaper occur, Beer also appears.
Lift ≈ 1.11 ⇒ There is a positive correlation between the presence of Milk and Diaper and the presence of Beer, so the rule is interesting!
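As a sanity check, the numbers in this worked example can be reproduced directly from the table. The `sigma` helper below is a hypothetical function mirroring the \(\sigma\) count notation:

```python
# The five transactions from the table above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    """Count of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

N = len(transactions)
rule_support = sigma({"Milk", "Diaper", "Beer"}) / N                    # 2/5 = 0.4
rule_conf = sigma({"Milk", "Diaper", "Beer"}) / sigma({"Milk", "Diaper"})  # 2/3
rule_lift = rule_conf / (sigma({"Beer"}) / N)                           # (2/3)/(3/5)
```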
Association Rule Generation / Rule Mining#
Just like mining frequent itemsets, the goal of association rule mining is to discover interesting relationships among items in large datasets. These rules must satisfy certain thresholds like minimum support and minimum confidence.
Two-Step Process#
1. Generate Frequent Itemsets: Use brute-force or algorithms like Apriori or FP-Growth to find all itemsets whose support ≥ `minsup`.
2. Generate Rules from Frequent Itemsets: From each frequent itemset `L`, generate rules of the form
\[ f \rightarrow (L - f) \]
where `f` is a non-empty proper subset of `L`, and the rule satisfies the minimum confidence.
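Step 2 can be sketched as a generator over the non-empty proper subsets of a frequent itemset; `rules_from_itemset` is an illustrative name, not a library function, and confidence filtering would be applied to each candidate it yields:

```python
from itertools import combinations

def rules_from_itemset(L):
    """Enumerate candidate rules f -> (L - f) for every non-empty
    proper subset f of the frequent itemset L."""
    L = frozenset(L)
    for size in range(1, len(L)):            # proper subsets only
        for f in combinations(sorted(L), size):
            lhs = frozenset(f)
            yield lhs, L - lhs
```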
Brute-Force Example#
| TID | Items |
|---|---|
| 1 | Bread, Milk |
| 2 | Bread, Diaper, Beer, Eggs |
| 3 | Milk, Diaper, Beer, Coke |
| 4 | Bread, Milk, Diaper, Beer |
| 5 | Bread, Milk, Diaper, Coke |
Example rules (support s, confidence c):
- {Milk, Diaper} → {Beer} (s=0.4, c=0.67)
- {Milk, Beer} → {Diaper} (s=0.4, c=1.0)
- {Diaper, Beer} → {Milk} (s=0.4, c=0.67)
- {Beer} → {Milk, Diaper} (s=0.4, c=0.67)
- {Diaper} → {Milk, Beer} (s=0.4, c=0.5)
- {Milk} → {Diaper, Beer} (s=0.4, c=0.5)
Why Not Brute-Force?:

- Generating all possible rules is computationally expensive.
- If \(d = 6\), the total number of possible association rules is \(R = 3^d - 2^{d+1} + 1 = 602\).
- Exhaustively checking all possible combinations is inefficient.
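The 602 figure follows from the standard counting formula \(R = 3^d - 2^{d+1} + 1\) (each item goes to the LHS, the RHS, or neither, minus rules with an empty LHS or RHS). A quick check, with a hypothetical helper:

```python
def total_rules(d):
    """Total possible association rules over d items:
    R = 3^d - 2^(d+1) + 1."""
    return 3**d - 2**(d + 1) + 1
```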
Efficient Rule Generation with Apriori#
Using Apriori:
1. First generate all frequent itemsets (efficiently pruned via the anti-monotonicity of support).
2. For each frequent itemset `L`, generate rules by evaluating all non-empty proper subsets `f ⊂ L` such that:

\[ \text{confidence}(f \rightarrow L - f) \geq \text{minconf} \]
Rule Generation Example#
If L = {A, B, C, D}, possible rules:
ABC → D, ABD → C, ACD → B, BCD → A
A → BCD, B → ACD, C → ABD, D → ABC
AB → CD, AC → BD, AD → BC, BC → AD, BD → AC, CD → AB
The total number of rules that can be generated from a frequent itemset of size \(k\) is:

\[ 2^k - 2 \]

(every non-empty proper subset of the itemset can serve as the LHS).
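For \(L = \{A, B, C, D\}\) (\(k = 4\)) this count is \(2^4 - 2 = 14\), matching the 4 + 4 + 6 rules listed above. A quick enumeration check (illustrative code, one candidate rule per choice of LHS):

```python
from itertools import combinations

items = ["A", "B", "C", "D"]          # k = 4
k = len(items)
# One candidate rule per non-empty proper subset chosen as the LHS.
rules = [(lhs, tuple(i for i in items if i not in lhs))
         for size in range(1, k)
         for lhs in combinations(items, size)]
```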
Lattice of Rules#
The rule lattice represents all possible rules derived from a frequent itemset.
Sample Rule Lattice:

Rules lower in the lattice have more items on the RHS.
Many rules have low confidence and can be pruned early.
Red dotted region in the image shows pruned rules due to low confidence.
Confidence: Anti-Monotonicity Property#
While support is anti-monotonic (larger itemsets have equal or lower support), confidence behaves differently:
General Case:#
Confidence is not anti-monotonic across different itemsets: for example, \(c(ABC \rightarrow D)\) can be larger or smaller than \(c(AB \rightarrow D)\).
Within the Same Itemset:#
Confidence is anti-monotonic with respect to the number of items on the RHS.

If L = {A, B, C, D}:

\[ c(ABC \rightarrow D) \geq c(AB \rightarrow CD) \geq c(A \rightarrow BCD) \]

The numerator \(\sigma(ABCD)\) is fixed, while the denominator \(\sigma(\text{LHS})\) can only increase as items move from the LHS to the RHS.
This helps prune low-confidence rules early, improving efficiency.
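This pruning strategy can be sketched as follows. The `generate_rules` name and the `sigma` support-count mapping are hypothetical, and the candidate merging is simplified relative to full Apriori rule generation:

```python
def generate_rules(L, sigma, minconf):
    """Generate confident rules from a frequent itemset L, growing the
    RHS level by level and pruning low-confidence branches early.

    `sigma` maps each frozenset to its support count. If a rule's
    confidence falls below `minconf`, no rule with a larger RHS derived
    from it can pass either, so that RHS is dropped from the candidates.
    """
    L = frozenset(L)
    results = []
    candidates = [frozenset([i]) for i in L]   # RHS candidates of size 1
    while candidates and len(candidates[0]) < len(L):
        kept = []
        for rhs in candidates:
            lhs = L - rhs
            conf = sigma[L] / sigma[lhs]
            if conf >= minconf:
                results.append((lhs, rhs, conf))
                kept.append(rhs)
        # Merge surviving RHSs to form candidates with one more item.
        candidates = list({a | b for a in kept for b in kept
                           if len(a | b) == len(a) + 1})
    return results
```

With the five-transaction dataset from the examples above and `minconf = 0.6`, this yields the four confident rules involving Milk, Diaper, and Beer while pruning the two rules with confidence 0.5 before their branch is extended.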