INFORMATION DEPENDENCIES
Edward Robertson
Mehmet Dalkilic
Dirk Van Gucht
Affiliation:
Computer Science Department &
School of Informatics
Indiana University
Bloomington, Indiana
Contact Information
Edward Robertson
Computer Science Department
Indiana University
215 Lindley Hall
Bloomington IN 47405
Phone: (812) 855-4954
Fax : (812) 855-4829
Email: edrbtsn@cs.indiana.edu
URL
WWW PAGE
List of Supported Students and Staff
Dennis Groth, research assistant
Project Award Information
Keywords
information theory, information dependency measure, functional and multivalued dependency, entropy, information structure, datamining
Project Summary
This project focuses on Information Dependency (InD) measures and the application of these measures to databases and data mining. InD measures use classical (Shannon) information theory to evaluate the information structure of database relations. This work extends earlier results by the project's investigators, which show how InD measures generalize two concepts important in database design: functional and multivalued dependencies. Research in this project spans the spectrum from theory to practice. On the theoretical side, properties of InDs are investigated with an eye toward mechanisms for manipulating and applying InD measures, as well as toward the implications of InDs for modeling. In the center, techniques for computing the measures are being investigated. Because the ultimate goal of data mining is to inform the user, investigations also include the interaction of InD measures and visualization. On the applied side, the major focus is the application of InD measures to data mining. Recognizing that research into applications requires real rather than "toy" targets, this project seeks collaborations involving data mining, the first such collaboration being with researchers in Biology. All of the activities of this project ultimately lead toward the development of prototype toolkit components based on InD measures.
Publications and Products
Goals, Objectives, and Targeted Activities
Project References
Area Background
Just as traditional information theory investigates the consequences of knowing the "surprisingness" of receiving a particular message, this work looks at measures based on the "surprisingness" of finding a particular value in a database table. These measures are especially significant when considered across multiple columns of a table, allowing us to phrase questions such as "If we know the gender of an employee, does that provide any information about job_classification?" Questions such as this are generalized to information dependency (InD) measures. Requiring that an InD measure satisfy some particular arithmetic constraint may have interesting consequences for the structure of the underlying relation instance. For example, requiring that a particular measure equal 0 forces a functional dependency in the relation; knowing that a functional dependency exists has important consequences for data quality and redundancy. Furthermore, arithmetic inequalities that must always hold between various InD measures in a relation generalize correspondences between various functional dependencies and the like.
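The relationship between a zero-valued measure and a functional dependency can be illustrated with a small sketch. It assumes, for illustration only, that the measure for "X informs Y" is the conditional entropy H(Y|X) = H(XY) - H(X), computed from the empirical frequencies of values in a relation instance; the function names `entropy` and `ind_measure` and the toy `employees` table (including its `dept` column) are hypothetical and do not come from the project's toolkit.

```python
from collections import Counter
from math import log2


def entropy(rows, cols):
    """Shannon entropy (in bits) of the projection of rows onto cols,
    using the relative frequency of each distinct value combination."""
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return -sum((k / n) * log2(k / n) for k in counts.values())


def ind_measure(rows, lhs, rhs):
    """Assumed InD measure: H(rhs | lhs) = H(lhs, rhs) - H(lhs).
    A value of 0 means lhs functionally determines rhs in this instance."""
    return entropy(rows, lhs + rhs) - entropy(rows, lhs)


# Hypothetical relation instance, invented for this example.
employees = [
    {"gender": "F", "job_classification": "engineer", "dept": "R&D"},
    {"gender": "M", "job_classification": "engineer", "dept": "R&D"},
    {"gender": "F", "job_classification": "analyst", "dept": "Finance"},
    {"gender": "M", "job_classification": "analyst", "dept": "Finance"},
]

# Does knowing gender tell us anything about job_classification?
gender_to_job = ind_measure(employees, ["gender"], ["job_classification"])

# Does job_classification determine dept (a functional dependency)?
job_to_dept = ind_measure(employees, ["job_classification"], ["dept"])

print(f"H(job_classification | gender) = {gender_to_job:.3f} bits")
print(f"H(dept | job_classification)   = {job_to_dept:.3f} bits")
```

In this toy instance, gender leaves the full uncertainty about job_classification in place, so the first measure is positive, while job_classification determines dept, so the second measure is 0. Intermediate values would indicate a partial, approximate dependency.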
*All award information can be found on the NSF on-line Awards Abstracts system: http://www.fastlane.nsf.gov/a6/A6Start.htm.