Charm Algorithm
The Charm algorithm is a method for mining closed frequent
itemsets from a large database of transactions. A closed frequent
itemset is a set of items that occurs frequently in the database,
and has no superset that is also frequent. For example, if {a, b,
c} is a frequent itemset, and {a, b, c, d} is not frequent, then
{a, b, c} is a closed frequent itemset.
The Charm algorithm uses a tree-based data structure to
efficiently enumerate all closed frequent itemsets. It uses
several techniques to prune the search space and reduce the
number of database scans. Some of the main features of the
Charm algorithm are:
• It uses a novel technique called itemset-tidset tree (IT-
tree) to store the itemsets and their corresponding
transaction ids. Each node in the IT-tree is an itemset-
tidset pair, such as {a, b}- {1, 2, 5}. The IT-tree is
constructed by recursively adding items to the current
itemset, while maintaining the tidset as the intersection of
the tidsets of the parent and the child nodes.
• It uses a diffset propagation technique to perform fast
frequency computation. This technique computes the
support of an itemset by subtracting the transactions that
do not contain it from the transactions that contain its
prefix. This reduces the size of the transactions and avoids
counting the same transactions multiple times.
• It uses a property called closure extension to merge nodes
that have the same tidset. This technique eliminates non-
closed itemsets by comparing them with the previously
found closed itemsets, and extending the itemset with the
items that are common to all the transactions in the tidset.
• It uses a property called closure jumping to skip nodes that
are not closed. This technique avoids generating non-closed
itemsets by jumping to the next node that is guaranteed to
be closed.