A list of my publications is here.
Below is a list of research projects in which I have been involved.
- AFD Mining
Approximate Functional Dependencies (AFDs) are a probabilistic extension of the
traditional concept of Functional Dependencies. An AFD formalizes a rule that
comes within some measured "closeness" of being a Functional Dependency.
Mined AFDs are used in a wide variety of applications. They have been shown
to be useful in bootstrapping Bayesian classifiers, query optimization, query
answering, and other applications.
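To make the notion of "closeness" concrete, here is a minimal sketch of one widely used error measure, often called g3: the fraction of rows that would have to be removed for a rule to hold as an exact Functional Dependency. The choice of g3, the function name `g3_error`, and the toy rows are assumptions made for illustration; the text above does not say which measure is used.

```python
from collections import Counter, defaultdict

def g3_error(rows, lhs, rhs):
    """Fraction of rows to remove so that lhs -> rhs holds exactly (0.0 = exact FD)."""
    if not rows:
        return 0.0
    # For each distinct left-hand-side value, count the right-hand-side values seen.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in lhs)
        groups[key][row[rhs]] += 1
    # Keeping only the most frequent rhs value per group makes the rule exact.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1.0 - kept / len(rows)

# Hypothetical toy table.
rows = [
    {"make": "Honda", "model": "Civic", "body": "sedan"},
    {"make": "Honda", "model": "Civic", "body": "sedan"},
    {"make": "Honda", "model": "Civic", "body": "coupe"},
    {"make": "Ford", "model": "F150", "body": "truck"},
]
print(g3_error(rows, ["model"], "make"))  # 0.0: an exact dependency
print(g3_error(rows, ["model"], "body"))  # 0.25: one row in four violates it
```

A rule would then count as an AFD when its error falls below a threshold chosen by the application.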
- Framework Approach
Heuristic Lozenge Search (HLS) is a framework developed for mining
AFDs. A framework approach has an advantage over a static system: it uses
independent components which can be altered or replaced in order to tune
the mining process. The goal of this flexibility is to create a mining
system which can meet the needs of an application, instead of the application
having to meet the limits of a static mining system. HLS can provide improved
efficiency and effectiveness over state-of-the-art algorithms.
- Algorithm
The Lozenge Search algorithm is a template search algorithm. It provides
a strategy for searching and allows traditional search algorithms,
such as breadth-first or depth-first search, to be plugged in to determine
how a lozenge is navigated.
Lozenge Search is an iterative algorithm which searches one lozenge in each
iteration. A lozenge is the space of rules that results from adding a
previously unseen attribute to an already searched space.
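The iteration described above can be sketched roughly as follows. This is one reading of the description, not the HLS implementation: the names `lozenge`, `lozenge_search`, `by_level`, and `accept` are hypothetical, and a rule is modeled simply as a pair (left-hand-side attributes, right-hand-side attribute). The `navigate` argument stands in for the pluggable traversal; `by_level` is a rough stand-in for breadth-first order.

```python
from itertools import combinations

def lozenge(seen, new_attr):
    """Rules (lhs, rhs) over seen + [new_attr] that mention new_attr."""
    rules = []
    # new_attr as the right-hand side, determined by subsets of seen attributes.
    for size in range(1, len(seen) + 1):
        for lhs in combinations(seen, size):
            rules.append((lhs, new_attr))
    # new_attr on the left-hand side, determining a seen attribute.
    for rhs in seen:
        others = [a for a in seen if a != rhs]
        for size in range(len(others) + 1):
            for rest in combinations(others, size):
                rules.append((rest + (new_attr,), rhs))
    return rules

def by_level(rules):
    """Visit rules with smaller left-hand sides first."""
    return sorted(rules, key=lambda rule: len(rule[0]))

def lozenge_search(attributes, navigate, accept):
    """Add one unseen attribute per iteration and search the lozenge it creates."""
    seen, found = [], []
    for attr in attributes:
        for rule in navigate(lozenge(seen, attr)):
            if accept(rule):
                found.append(rule)
        seen.append(attr)
    return found

# Accept every rule, just to show which rules each lozenge contributes.
all_rules = lozenge_search(["A", "B", "C"], by_level, lambda rule: True)
for lhs, rhs in all_rules:
    print(",".join(lhs), "->", rhs)
```

Under this reading, every rule is visited exactly once, in the iteration that introduces its last attribute, so swapping `navigate` changes the order of the search without changing its coverage.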
- Effectiveness
HLS not only provides new levels of efficiency, but also incorporates new
abilities directly in the mining process. The two most prominent abilities
are dynamic pruning of attributes from the search space and the ability
to order the results.
- Decisions
HLS is tunable: Lozenge Search exposes points at which a decision can be
made among different variations of the algorithm.
- Schema Matching
I am working with cognitive psychologist Rob Goldstone to adapt his ABSURDIST
system to the schema matching problem. ABSURDIST is a concept mapping system
which uses graph matching to incorporate inter- and intra-system
information for concept mapping. The project poses a number of questions.
The first is how information about the similarity of two concepts/attributes
can be mined, weighted, aggregated, and incorporated into ABSURDIST. The second
is what information can be mined and then used to transform data
in various formats into graphs. The last is how, and to what extent,
these questions can be answered in a fully automated manner.
- Machine Learning
Multiple Instance Learning in Cheminformatics
Over the summer of 2007, I held a research assistantship with Rajarshi Guha.
I implemented the EM algorithm, an iterative optimization algorithm which here uses linear regression,
and worked on improvements. We implemented two new features. The first was seeding the algorithm
with a model based on aggregate data values in order to provide more consistent results than
seeding with a model based on random values. The second improvement
was to use noise to help avoid local minima.