Information Dependencies


Supported by the National Science Foundation under grant #82407.

Abstract

This project focuses on Information Dependency (InD) measures and their application to databases and data mining. InD measures use classical (Shannon) information theory to evaluate the information structure of database relations. This work extends earlier results by the project's investigators, which show how InD measures generalize two concepts important in database design: functional and multivalued dependencies.
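As a rough illustration of how an entropy-based measure can generalize a functional dependency, the sketch below computes the conditional entropy H(Y | X) = H(XY) - H(X) over a relation instance, treating every tuple as equally likely. This is an assumption about the form of the measure, made for illustration only, and not necessarily the project's exact definition. The function names (entropy, ind_measure) and the sample relation are hypothetical. Under this reading, a value of 0 means the functional dependency X -> Y holds in the instance, and larger values mean X leaves more uncertainty about Y.

```python
from collections import Counter
from math import log2


def entropy(rows, attrs):
    """Shannon entropy (in bits) of the projection of rows onto attrs.

    Assumption: each tuple in the instance is equally likely.
    """
    n = len(rows)
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def ind_measure(rows, lhs, rhs):
    """Conditional entropy H(rhs | lhs) = H(lhs + rhs) - H(lhs).

    Hypothetical helper: 0 when lhs -> rhs holds as a functional
    dependency in this instance, positive otherwise.
    """
    both = list(lhs) + [a for a in rhs if a not in lhs]
    return entropy(rows, both) - entropy(rows, lhs)


# Hypothetical sample relation, invented for illustration only.
rows = [
    {"emp": "e1", "dept": "d1", "bldg": "b1"},
    {"emp": "e2", "dept": "d1", "bldg": "b1"},
    {"emp": "e3", "dept": "d2", "bldg": "b2"},
    {"emp": "e4", "dept": "d2", "bldg": "b3"},
]

fd_holds = ind_measure(rows, ["emp"], ["dept"])    # emp determines dept
fd_fails = ind_measure(rows, ["dept"], ["bldg"])   # d2 maps to two buildings
print(fd_holds, fd_fails)
```

In the sample relation, emp determines dept, so the first measure is zero. The second is positive because department d2 appears with two different buildings, so knowing dept does not fully determine bldg.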

Research in this project is taking place across the spectrum from theory to practice. On the theoretical side, properties of InD's are investigated with an eye toward mechanisms for manipulating and applying InD measures, as well as toward the implications of InD's for modeling. In the center, techniques for computing the measures are being investigated. Because the ultimate goal of data mining is to inform the user, investigations also include the interaction of InD and visualization. On the applied side, the major focus is the application of InD measures to data mining. Recognizing that research into applications requires real rather than "toy" targets, this project seeks collaborations involving data mining, the first such collaboration being with researchers in Biology.

All of the activities of this project ultimately lead toward the development of prototype toolkit components based on InD measures.

Staff

Publications

See the Datamining section of our Database Group's publication list.

See also the web reports to the NSF for 2001, 2002, 2003, and 2004, and the final report.


Last modified: Mon Sep 13 17:56:16 EST 2004