6.3.4. Pattern Mining#
Pattern mining is an unsupervised learning technique used to discover recurring structures or relationships within a dataset. Unlike supervised learning where we have labeled data, pattern mining works with unlabeled transactional or behavioral data to find hidden associations.
The primary goal is to identify frequently co-occurring items or patterns and quantify their occurrence using metrics like support, confidence, or lift.
A classic example of pattern mining is Market Basket Analysis. Given transactional data, where each transaction is a list of items purchased together, the task is to find sets of items that frequently appear together. These frequent itemsets can be valuable for tasks such as recommendation systems, shelf arrangements, or promotional bundling.
Important
Pattern mining reveals co-occurrence, not causation. Just because two items frequently occur together does not mean one causes the other. Their co-occurrence could be due to an external factor or simply coincidence. Implication means co-occurrence, not causality!
Example: Market Basket Transactions
Let’s consider the following small dataset of 5 transactions:
Transaction ID |
Items Purchased |
|---|---|
1 |
Milk, Bread, Butter |
2 |
Beer, Diapers, Milk |
3 |
Milk, Bread |
4 |
Beer, Diapers, Bread |
5 |
Milk, Diapers, Bread |
From this, we might find that the itemset {Milk, Bread} appears in 3 out of 5 transactions, indicating a frequently occurring pattern.
Another application of pattern mining is Web Usage Mining. Here, instead of physical items, we analyze user behavior on websites such as sequences of pages visited, clicks, or time spent on each section to find common navigation patterns. This can inform website design, improve user experience, or optimize content placement.
Unlike typical machine learning models, which aim to predict outcomes based on input features, pattern mining is more focused on discovering inherent structure within the data. It is especially useful during exploratory data analysis (EDA) to uncover hidden relationships that might not be obvious at first glance.