6.3.4.1. Pattern Mining Terminologies#
Pattern mining relies on a handful of specialized terms. We’ll go over them before diving into further details.
Item#
Each individual entity, commodity, or object is referred to as an item. It represents a single, indivisible unit. In the earlier example, Milk, Butter, and so on are items.
Itemset#
A collection of items is referred to as an itemset. An itemset containing only one item is essentially the item itself. For example, (Milk, Butter) is an itemset of size 2.
Itemsets are often categorized by their size:
A 1-itemset contains one item.
A 2-itemset contains two items.
A 3-itemset contains three items, and so on.
Examples:
1-itemset: (Milk)
3-itemset: (Milk, Bread, Butter)
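The size categories above can be sketched in a few lines of Python. This is only an illustration: the helper name `k_itemsets` is hypothetical, and the three items are taken from the example above.

```python
from itertools import combinations

# Items from the running example.
items = ["Milk", "Bread", "Butter"]

def k_itemsets(items, k):
    """Return every k-itemset that can be formed from `items` (hypothetical helper)."""
    unique_items = sorted(set(items))  # drop duplicates, fix an enumeration order
    return [frozenset(combo) for combo in combinations(unique_items, k)]

one_itemsets = k_itemsets(items, 1)    # three 1-itemsets
two_itemsets = k_itemsets(items, 2)    # three 2-itemsets
three_itemsets = k_itemsets(items, 3)  # a single 3-itemset
```

With three items there are three 1-itemsets, three 2-itemsets, and one 3-itemset.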
Note
The order of items in an itemset does not matter. For instance, (Milk, Beer) and (Beer, Milk) refer to the same itemset.
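One way to make this order-independence concrete is to store itemsets as Python `frozenset` objects. This is a minimal sketch; the variable names are chosen for illustration only.

```python
# A frozenset ignores item order, so both spellings denote the same itemset.
milk_beer = frozenset(["Milk", "Beer"])
beer_milk = frozenset(["Beer", "Milk"])

same_itemset = milk_beer == beer_milk  # True: same items, different order

# Frozensets are hashable, so they can be used as dictionary keys,
# for example to count how often each itemset occurs.
counts = {milk_beer: 1}
counts[beer_milk] += 1  # updates the same entry

# The size of an itemset is just its length.
size = len(frozenset(["Milk", "Bread", "Butter"]))  # 3, i.e. a 3-itemset
```

A plain tuple or list would treat (Milk, Beer) and (Beer, Milk) as different values, which is why a set-like type is the more natural fit here.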
Transaction#
A transaction is an entry in the dataset representing a set of items purchased or grouped together. In the market basket example, each transaction corresponds to the items bought by a customer in a single visit.
Datasets used for pattern mining typically contain transactional data, where each record denotes one transaction.
For example, the transaction with ID 3 might contain (Milk, Bread), indicating that those two items were bought together.
We use transactions to infer which items co-occur and to compute statistics such as the support or frequency of an itemset.
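As a rough sketch of how such statistics are computed, the snippet below counts the transactions that contain a given itemset. The dataset is a made-up toy example (only transaction 3 mirrors the text above). The helper names `support_count` and `support` are hypothetical, and the snippet assumes the common convention that support is the fraction of transactions containing the itemset.

```python
# Hypothetical toy dataset: transaction ID -> set of items bought together.
transactions = {
    1: {"Milk", "Butter"},
    2: {"Bread", "Butter"},
    3: {"Milk", "Bread"},
    4: {"Milk", "Bread", "Butter"},
}

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    target = frozenset(itemset)
    return sum(1 for items in transactions.values() if target <= items)

def support(itemset, transactions):
    """Fraction of transactions containing `itemset` (assumed definition)."""
    return support_count(itemset, transactions) / len(transactions)

milk_bread_count = support_count({"Milk", "Bread"}, transactions)  # 2 (IDs 3 and 4)
milk_bread_support = support({"Milk", "Bread"}, transactions)      # 0.5
```

Here (Milk, Bread) appears in transactions 3 and 4, so its count is 2 and its support is 2/4 = 0.5.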