Indiana University Bloomington

Luddy School of Informatics, Computing, and Engineering

Technical Report TR551:
FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances

Catharine Wyss, Chris Giannella, and Edward Robertson
(Jul 2001), 23 pages pages
Abstract:
Discovering functional dependencies (FDs) from an existing relation instance is an important technique in data mining and database design. To date, even the most efficient solutions are exponential in the number of attributes of the relation (n), even when the size of the output is not exponential in n. Lopes et al. developed an algorithm, Dep-Miner, that works well for large n on randomly-generated integer-valued relation instances. Dep-Miner first reduces the FD discovery problem to that of finding minimal covers for hypergraphs, then employs a levelwise search strategy to determine these minimal covers. Our algorithm, FastFDs, instead employs a depth-first, heuristic driven search strategy for generating minimal covers of hypergraphs. This type of search is commonly used to solve search problems in Artificial Intelligence (AI). Our experimental results indicate that the levelwise strategy that is the hallmark of many successful data mining algorithms is in fact significantly surpassed by the depth-first, heuristic driven strategy FastFDs employs, due to the inherent space efficiency of the search. Furthermore, we revisit the comparison between Dep-Miner and Tane, including FastFDs. We report several tests on distinct benchmark relation instances, comparing the Dep-Miner and FastFDs hypergraph approaches to Tane's partitioning approach for mining FDs from a relation instance. At the end of the paper we provide experimental data comparing FastFDs with a third algorithm, fdep of Flach and Savnik.

Available as: