Indiana University Bloomington

Luddy School of Informatics, Computing, and Engineering

Technical Report TR658:
Probabilistic analysis of success and failure rates of candidate generation algorithms for the frequent itemsets mining problem.

Minh Tang
(Feb 2008), 16 pages pages
[This is an independent study report. I'm currently still looking for some conferences/journal for publication.]
Abstract:
This paper considers the success and failure probability of candidate generation algorithms for the frequent itemsets mining problem under several probability models. Results for one of the models had been obtained previously, but with a complex derivation. Our re-derivation of these results is simpler and employed a concentration inequality for the sum of independent Bernoulli random variables. Our results for the other models employed concentration inequalities for martingales and is applicable to models where there is dependence between the transactions. From the success and failure probability we can estimate the size of the maximum frequent itemset.

Available as: