Indiana University Bloomington

School of Informatics and Computing

Technical Report TR555:
On an Information Theoretic Approximation Measure for Functional Dependencies

Chris Giannella and Edward Robertson
(Aug 2001), 12 pages
We investigate the problem of defining an approximation measure for functional dependencies (FDs). For fixed sets of attributes, X and Y, an approximation measure is a function which maps relation instances to real numbers. The number to which an instance is mapped, intuitively, describes the strength of the dependency, X --> Y, in that instance. We define an approximation measure for FDs based on a connection between Shannon's information theory and relational database theory. Our measure is normalized to lie between zero and one (inclusive), and maps a relation instance to zero if and only if X --> Y holds in the instance. Hence, the smaller the number to which an instance is mapped, the ``closer'' X --> Y is to being an FD in the instance.

To put our measure in context, we compare it to a slight variation of a measure previously defined by Kivinen and Mannila, g_3. We denote the variation as \hat{g_3}, although, our results, essentially, apply unchanged to g_3. The purpose of comparing our measure with \hat{g_3} is to develop a deeper understanding of not only our measure, but also, \hat{g_3}. Moreover, we gain a deeper understanding of the natural intuitive notion of an approximate FD. We observe that our measure and \hat{g_3} agree at their extremes but are quite different in-between. As a result, we conclude that our measure and \hat{g_3} are significantly different. An interesting question emerges from this conclusion: is there a rigorous way to determine when one measure better captures the meaning of the degree to which an FD is approximate?

Available as: